From: Stephan B. <Ste...@MX...> - 2010-07-16 21:07:58
Hi All

Hope this email is not too long and boring; I just want to share some stuff and ask some questions too...

Current Infrastructure:
- 3x locations in different countries, est. 200 servers each and network equipment to match
- Currently running 1 x hobbit/xymon per site (with hobbit clients)

Main goal for Xymon/Devmon implementation:
- Single central monitoring station (lightweight)
- Need to report on historical infrastructure data (availability, resource utilization etc.)
- SNMP functionality to enable monitoring of network equipment, storage and servers
- Network load as low as possible

Work done:
- Base install of CentOS 5.5, 64-bit, on Dell PowerEdge 1955, 8 x 2 GHz, 8 GB RAM
- xymon 4.2.3 installation
- Devmon 0.3.1-beta1 (MULTINODE)
    CYCLETIME=60
    NUMFORKS=14 (tried up to 20)
    MAXPOLLTIME=4
    SNMPTIMEOUT=50 (because currently polling across borders)
    SNMPTRIES=4
    Rest default
- Transferred existing bb-hosts file (+/- 200 devices)
- Customized tests
  - Linux 1955
    - PowerEdge template: removed fans and power
    - Added Disk, DiskIO, IFLoad and Processes from linux-netsnmp template
  - Win 1955
    - PowerEdge template: removed fans, power and cpu
    - Added Disk, IFLoad, cpus and Services from Server2003 template
  - Win/Lin 1950/2950
    - Same as above, except left fans and power in there

Problems experienced:
- Default UDP buffer too small, causing packet receive errors in "netstat -su" (SOLVED)
  - Set in /etc/sysctl.conf:
      net.core.rmem_max = 16777216
      net.core.rmem_default = 8388608
      net.core.wmem_max = 16777216
      net.core.wmem_default = 8388608
- Regularly getting devices that stop monitoring with the following error (white hobbit):
  - This problem is intermittent, across various servers on LAN and WAN
      Missing repeater data for primary OID XXXXXXX
- Recently started getting the following problem (purple hobbit):
  - Also intermittent, started happening since adding some new hosts to monitor
  - Possible cause: I think I have reached the limit for this server (please confirm), with data from devmon not reaching hobbit in time
  - Database reports 750 tests on this main node

Problems with templates:

Hashed these lines out:
[10-07-16@22:16:20] Bad SWITCH case type (1.3.6.1.2.1.25.2.1.4) at /usr/local/devmon/templates/microsoft-win2k3server/disk/transforms, line 6
[10-07-16@22:16:20] Bad SWITCH case type (1.3.6.1.2.1.25.2.1.2) at /usr/local/devmon/templates/microsoft-win2k3server/disk/transforms, line 6
[10-07-16@22:16:20] Bad SWITCH case type (1.3.6.1.2.1.25.2.1.3) at /usr/local/devmon/templates/microsoft-win2k3server/disk/transforms, line 6
[10-07-16@22:16:20] Bad SWITCH case type (1.3.6.1.2.1.25.2.1.7) at /usr/local/devmon/templates/microsoft-win2k3server/disk/transforms, line 6

Don't need these templates, just thought I'd share:
[10-07-16@22:16:20] Missing 'oids' file in /usr/local/devmon/templates/cisco-mds9500/experimental, skipping this test.
[10-07-16@22:16:20] Missing 'message' file in /usr/local/devmon/templates/netscreen-5gt/memory, skipping this test.

Dell-poweredge template on Windows server: cpu does not report, please let me know the correct OID.
Dell-poweredge template on Windows server: memory does not report correctly, problems vary on 2003/2008/2008r2 (probably OIDs too).

Requests:
- Need to know if anybody can help me out with SNMP templates for the following:
  - Dell Chassis Switches PowerConnect 5316M
  - Dell 1955 chassis DRAC
  - EMC Clariion CX3 series storage
  - CISCO ACE
  - Fortigate (Fortinet) Firewall
  - Anything MS SQL server
  - Got the Brocade one, thanks!!!!
  - Any PowerEdge templates specific to Windows versions and Linux

The ones above that I do not find I will attempt to create and share as soon as this is done.
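
To confirm that the sysctl change for the UDP buffer problem above keeps holding as more hosts are added, the kernel's UDP counters can be sampled before and after a polling cycle. The following is only a rough sketch, assuming the counters are read from /proc/net/snmp (the same source "netstat -su" summarizes) and that the InErrors and RcvbufErrors columns are present on the kernel in use; the function names are made up for illustration and are not part of devmon or xymon.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict.

    The file lists each protocol twice: one line of column names,
    then one line of values. Returns {} if no Udp section is found.
    """
    header = None
    for line in snmp_text.splitlines():
        # "Udp:" does not match the separate "UdpLite:" section.
        if not line.startswith("Udp:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields  # first Udp: line holds the column names
        else:
            return dict(zip(header, (int(v) for v in fields)))
    return {}


def receive_errors_grew(before, after):
    """True if InErrors or RcvbufErrors increased between two samples."""
    return any(
        after.get(key, 0) > before.get(key, 0)
        for key in ("InErrors", "RcvbufErrors")
    )


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_udp_counters(handle.read())
```

Calling read_udp_counters() once before and once after a devmon cycle and passing both results to receive_errors_grew() would show whether datagrams are still being dropped at the socket buffer.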

Questions:
- Want to manually assign the owner node of a device to be polled, in the db or otherwise, without it being overwritten by devmon's auto assignment
  - Reason: each node must only poll on its local LAN and send updates to the display server (multi-location)
- What is the relative maximum number of tests per node, as per experience (got 750 on display server)?
- What is the relative maximum number of devices to be monitored on hobbit, keeping RRD in mind?
- If I disable the conn test on hobbit, will this affect polling of devices? If so, how can I disable it?
- If devmon fails to connect to the display server to send polling results, are the results buffered and resent when the display server becomes available, or do they get lost?

More hobbit related:
- Want to pull custom reports relating to test data results. Was thinking of importing all rrd and hobbit history data into an MS SQL db via the rrdtool xport function; any ideas would help.

And lastly, I love what you guys are doing with devmon, keep up the good work!!

Cheers, thanks for the help
Stephan Buys

This email is subject to the MXit email disclaimer, which is available at http://www.mxit.com/email.pdf
If you cannot access the disclaimer, please get a copy from us by sending an email to: su...@mx...
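
On the rrdtool xport idea above, one possible starting point is to flatten the XML that "rrdtool xport" prints into (timestamp, legend, value) rows that any SQL driver could then bulk-insert. This is a minimal sketch, assuming the XML layout described in the rrdxport documentation (legend entries under meta/legend, and one row element per step holding a t timestamp and one v per exported series); the function name is hypothetical, and the database side is left out because the table layout and driver are not known here.

```python
import math
import xml.etree.ElementTree as ET


def xport_rows(xml_text):
    """Flatten rrdtool xport XML into (timestamp, legend, value) tuples.

    Unknown samples, which rrdtool prints as NaN, become None so they
    can be stored as SQL NULL.
    """
    root = ET.fromstring(xml_text)
    legends = [entry.text or "" for entry in root.findall("./meta/legend/entry")]
    rows = []
    for row in root.findall("./data/row"):
        timestamp = int(row.findtext("t"))
        # Values appear in the same order as the legend entries.
        for legend, cell in zip(legends, row.findall("v")):
            value = float(cell.text)
            rows.append((timestamp, legend, None if math.isnan(value) else value))
    return rows
```

The resulting list is in a shape that a DB-API driver's executemany() could take with a three-column insert statement, run once per RRD file.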