From: Stephan B. <Ste...@MX...> - 2010-07-16 21:07:58
|
Hi all,
I hope this email isn't too long and boring; I just want to share some
things and ask a few questions too.
Current infrastructure:
- 3 locations in different countries, each with an estimated 200 servers
and matching network equipment
- Currently running one Hobbit/Xymon server per site (with Hobbit clients)
Main goals for the Xymon/Devmon implementation:
- A single central monitoring station (lightweight)
- Reporting on historical infrastructure data (availability, resource
utilization, etc.)
- SNMP functionality to monitor network equipment, storage and servers
- As low a network load as possible
Work done:
- Base install of CentOS 5.5 64-bit on a Dell PowerEdge 1955 (8 x 2 GHz,
8 GB RAM)
- Xymon 4.2.3 installed
- Devmon 0.3.1-beta1 (MULTINODE) with:
  CYCLETIME=60
  NUMFORKS=14 (tried up to 20)
  MAXPOLLTIME=4
  SNMPTIMEOUT=50 (because we are currently polling across borders)
  SNMPTRIES=4
  All other settings at their defaults
- Transferred the existing bb-hosts file (approx. 200 devices)
- Customized tests:
  - Linux 1955: PowerEdge template with fans and power removed; added
    Disk, DiskIO, IFLoad and Processes from the linux-netsnmp template
  - Windows 1955: PowerEdge template with fans, power and cpu removed;
    added Disk, IFLoad, cpus and Services from the Server2003 template
  - Windows/Linux 1950/2950: same as above, except fans and power were
    kept
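A quick back-of-envelope check on the polling settings above, as a Python sketch. It assumes (not confirmed against the Devmon documentation) that SNMPTIMEOUT is in seconds and applies to each attempt, so a completely unresponsive device could hold a fork for roughly SNMPTIMEOUT x SNMPTRIES seconds. The average poll time used in the capacity estimate is a hypothetical figure; measure your own.

```python
# Back-of-envelope Devmon timing check; a sketch, not Devmon's own scheduling logic.
# ASSUMPTION: SNMPTIMEOUT is in seconds per attempt, and an unresponsive device
# costs SNMPTIMEOUT * SNMPTRIES seconds before the poller gives up on it.


def worst_case_wait(snmp_timeout: float, snmp_tries: int) -> float:
    """Seconds one unresponsive device could block a fork (assumed model)."""
    return snmp_timeout * snmp_tries


def fits_in_cycle(snmp_timeout: float, snmp_tries: int, cycle_time: float) -> bool:
    """True if a dead device would be given up on within a single cycle."""
    return worst_case_wait(snmp_timeout, snmp_tries) <= cycle_time


def max_devices_per_cycle(num_forks: int, cycle_time: float, avg_poll_seconds: float) -> int:
    """Rough upper bound if every fork polls devices back to back for a whole cycle."""
    return int(num_forks * cycle_time // avg_poll_seconds)


# Values from the configuration above; avg_poll_seconds=3.0 is HYPOTHETICAL.
print("worst-case wait per dead device (s):", worst_case_wait(50, 4))
print("fits in one 60 s cycle:", fits_in_cycle(50, 4, 60))
print("devices per cycle at 3 s/device:", max_devices_per_cycle(14, 60, 3.0))
```

If those assumptions hold, a single unresponsive device could outlast a 60 s cycle more than three times over, so a few dead or slow WAN devices per fork would eat heavily into the cycle.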
Problems experienced:
- The default UDP buffer was too small, causing packet receive errors in
the output of "netstat -su" (SOLVED)
  - Fixed by setting the following in /etc/sysctl.conf:
    net.core.rmem_max = 16777216
    net.core.rmem_default = 8388608
    net.core.wmem_max = 16777216
    net.core.wmem_default = 8388608
- Devices regularly stop being monitored with the following error (white
in Hobbit):
    Missing repeater data for primary OID XXXXXXX
  - This problem is intermittent and affects various servers on both the
    LAN and the WAN
- Recently started seeing purple statuses in Hobbit:
  - Also intermittent; it started after adding some new hosts to monitor
  - Possible cause: I think I have reached the limit for this server, so
    data from Devmon is not reaching Hobbit in time. Please confirm.
  - The database reports 750 tests on this main node
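To confirm that the sysctl change really stopped the UDP drops, the "packet receive errors" counter from "netstat -su" can be sampled twice a few minutes apart. The counter is cumulative since boot, so it is the increase between samples that matters. A minimal Python parser follows; the sample text is illustrative and assumes the Linux net-tools layout (section headers such as "Udp:" at column 0, counters indented beneath), so other netstat versions may word the lines differently.

```python
import re
from typing import Optional


def udp_receive_errors(netstat_su: str) -> Optional[int]:
    """Return the 'packet receive errors' counter from the Udp: section, or None."""
    in_udp = False
    for line in netstat_su.splitlines():
        if line and not line[0].isspace():
            # Section headers such as "Udp:" or "UdpLite:" start at column 0.
            in_udp = line.strip() == "Udp:"
            continue
        if in_udp:
            match = re.match(r"\s*(\d+) packet receive errors", line)
            if match:
                return int(match.group(1))
    return None


# Illustrative output in the assumed Linux net-tools layout (numbers made up).
SAMPLE = """Udp:
    52113 packets received
    12 packets to unknown port received.
    345 packet receive errors
    50210 packets sent
UdpLite:
    0 packet receive errors
"""

print(udp_receive_errors(SAMPLE))
```

On the monitoring host the live text can be obtained with subprocess.run(["netstat", "-su"], capture_output=True, text=True).stdout (Python 3.7 or later).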
Problems with templates:
I hashed out (commented out) the lines causing these errors:
[10-07-16@22:16:20] Bad SWITCH case type (1.3.6.1.2.1.25.2.1.4) at /usr/local/devmon/templates/microsoft-win2k3server/disk/transforms, line 6
[10-07-16@22:16:20] Bad SWITCH case type (1.3.6.1.2.1.25.2.1.2) at /usr/local/devmon/templates/microsoft-win2k3server/disk/transforms, line 6
[10-07-16@22:16:20] Bad SWITCH case type (1.3.6.1.2.1.25.2.1.3) at /usr/local/devmon/templates/microsoft-win2k3server/disk/transforms, line 6
[10-07-16@22:16:20] Bad SWITCH case type (1.3.6.1.2.1.25.2.1.7) at /usr/local/devmon/templates/microsoft-win2k3server/disk/transforms, line 6
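For reference, the case values rejected above are hrStorageType OIDs from HOST-RESOURCES-MIB (RFC 2790): ...25.2.1.4 is hrStorageFixedDisk, ...25.2.1.2 hrStorageRam, ...25.2.1.3 hrStorageVirtualMemory and ...25.2.1.7 hrStorageCompactDisc. Why the SWITCH transform rejects them is not clear from the log; one unverified guess is that this Devmon version only accepts numeric case values, not dotted OIDs. The Python sketch below shows the mapping the transform is trying to express, plus the last OID arc as a plain integer, which is the kind of value a numeric-only SWITCH could match on.

```python
# hrStorageTypes from HOST-RESOURCES-MIB (RFC 2790), subtree 1.3.6.1.2.1.25.2.1.
HR_STORAGE_TYPES = {
    "1.3.6.1.2.1.25.2.1.1": "Other",
    "1.3.6.1.2.1.25.2.1.2": "RAM",
    "1.3.6.1.2.1.25.2.1.3": "Virtual Memory",
    "1.3.6.1.2.1.25.2.1.4": "Fixed Disk",
    "1.3.6.1.2.1.25.2.1.5": "Removable Disk",
    "1.3.6.1.2.1.25.2.1.6": "Floppy Disk",
    "1.3.6.1.2.1.25.2.1.7": "Compact Disc",
    "1.3.6.1.2.1.25.2.1.8": "RAM Disk",
    "1.3.6.1.2.1.25.2.1.9": "Flash Memory",
    "1.3.6.1.2.1.25.2.1.10": "Network Disk",
}


def storage_type(oid: str) -> str:
    """Name for an hrStorageType value; tolerates a leading dot, as some tools print."""
    return HR_STORAGE_TYPES.get(oid.strip().lstrip("."), "Unknown")


def last_arc(oid: str) -> int:
    """Final OID arc as an integer, e.g. 4 for hrStorageFixedDisk."""
    return int(oid.strip().rsplit(".", 1)[-1])


# The four OIDs from the log above.
for oid in ("1.3.6.1.2.1.25.2.1.4", "1.3.6.1.2.1.25.2.1.2",
            "1.3.6.1.2.1.25.2.1.3", "1.3.6.1.2.1.25.2.1.7"):
    print(oid, last_arc(oid), storage_type(oid))
```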
I don't need these templates, just thought I'd share:
[10-07-16@22:16:20] Missing 'oids' file in /usr/local/devmon/templates/cisco-mds9500/experimental, skipping this test.
[10-07-16@22:16:20] Missing 'message' file in /usr/local/devmon/templates/netscreen-5gt/memory, skipping this test.
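A small Python sketch for spotting incomplete templates before Devmon logs them. It assumes the layout implied by the log paths (templates/<vendor-model>/<test>/) and assumes each test directory needs the files oids, transforms, thresholds, exceptions and message; only 'oids' and 'message' are confirmed by the errors above, so adjust REQUIRED_FILES to match your Devmon version.

```python
from pathlib import Path
from typing import Dict, List

# ASSUMPTION: files Devmon expects in each test directory. 'oids' and 'message'
# are confirmed by the log messages above; the others are assumed.
REQUIRED_FILES = ("oids", "transforms", "thresholds", "exceptions", "message")


def find_incomplete_tests(template_root: str) -> Dict[str, List[str]]:
    """Map 'vendor-model/test' to the required files missing from that test directory."""
    root = Path(template_root)
    problems: Dict[str, List[str]] = {}
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Plain files at the model level are skipped; only test directories are checked.
        for test_dir in sorted(p for p in model_dir.iterdir() if p.is_dir()):
            missing = [name for name in REQUIRED_FILES if not (test_dir / name).is_file()]
            if missing:
                problems[f"{model_dir.name}/{test_dir.name}"] = missing
    return problems
```

Run against /usr/local/devmon/templates, it would be expected to list cisco-mds9500/experimental and netscreen-5gt/memory among its results, along with any other test directories missing one of the assumed files.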
The Dell PowerEdge template does not report CPU on Windows servers; please
let me know the correct OID.
The Dell PowerEdge template does not report memory correctly on Windows
servers; the problems vary between 2003, 2008 and 2008 R2 (probably the
OIDs differ too).
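Not a confirmed answer for the Dell template, but one widely used source on Windows is the standard HOST-RESOURCES-MIB, which the Windows SNMP service typically exposes: hrProcessorLoad (1.3.6.1.2.1.25.3.3.1.2) gives a per-processor load percentage, and the memory rows in hrStorageTable report hrStorageAllocationUnits, hrStorageSize and hrStorageUsed, where size and used are counted in allocation units rather than bytes. Whether a missing multiplication by the allocation unit explains the differences between 2003, 2008 and 2008 R2 is an assumption. The Python sketch below shows only the arithmetic on values already fetched; it does no SNMP I/O.

```python
from typing import Sequence, Tuple

# Standard HOST-RESOURCES-MIB (RFC 2790) column OIDs; append the row index when querying.
HR_PROCESSOR_LOAD = "1.3.6.1.2.1.25.3.3.1.2"            # per-processor busy %, last minute
HR_STORAGE_DESCR = "1.3.6.1.2.1.25.2.3.1.3"             # e.g. "Physical Memory" (typical on Windows)
HR_STORAGE_ALLOCATION_UNITS = "1.3.6.1.2.1.25.2.3.1.4"  # bytes per allocation unit
HR_STORAGE_SIZE = "1.3.6.1.2.1.25.2.3.1.5"              # size, in allocation units
HR_STORAGE_USED = "1.3.6.1.2.1.25.2.3.1.6"              # used, in allocation units


def average_cpu(loads: Sequence[int]) -> float:
    """Average the per-processor hrProcessorLoad values into one CPU percentage."""
    if not loads:
        raise ValueError("no hrProcessorLoad values")
    return sum(loads) / len(loads)


def memory_usage(allocation_units: int, size: int, used: int) -> Tuple[int, int, float]:
    """Return (total_bytes, used_bytes, used_percent) for one hrStorageTable row."""
    total_bytes = allocation_units * size
    used_bytes = allocation_units * used
    used_percent = 100.0 * used / size if size else 0.0
    return total_bytes, used_bytes, used_percent


print(average_cpu([10, 20, 30, 40]))
print(memory_usage(65536, 131072, 65536))
```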
Requests:
- Can anybody help me out with SNMP templates for the following?
  - Dell Chassis Switches PowerConnect 5316M
  - Dell 1955 chassis DRAC
  - EMC Clariion CX3 series storage
  - Cisco ACE
  - FortiGate (Fortinet) Firewall
  - Anything for MS SQL Server
  - (Got the Brocade one, thanks!)
  - Any PowerEdge templates specific to Windows versions and Linux
Any of the above that I can't find, I will try to create myself and share
once done.
Questions:
- I want to manually assign the owner node of a device (in the DB or
otherwise) so that it is not overwritten by Devmon's automatic assignment
  - Reason: each node must only poll devices on its local LAN and send
    updates to the display server (multi-location setup)
- In your experience, what is the practical maximum number of tests per
node? (I have 750 on the display server)
- What is the practical maximum number of devices Hobbit can monitor,
keeping the RRD load in mind?
- If I disable the conn test in Hobbit, will this affect polling of
devices? And how can I disable it?
- If Devmon fails to connect to the display server to send polling
results, are the results buffered and resent when the display server
becomes available, or are they lost?
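Whether Devmon's multinode database lets an owner assignment be pinned is the open question above, so this Python sketch only illustrates the intended policy: each device goes to the node whose local LAN contains its IP address. The node names and subnets are hypothetical placeholders, and nothing here reads or writes the Devmon database.

```python
import ipaddress
from typing import Dict, List, Optional

# HYPOTHETICAL node names and LAN ranges: replace with the real sites.
NODE_SUBNETS: Dict[str, List[str]] = {
    "devmon-site-a": ["10.1.0.0/16"],
    "devmon-site-b": ["10.2.0.0/16"],
    "devmon-site-c": ["10.3.0.0/16", "192.168.30.0/24"],
}


def owner_node(ip: str, node_subnets: Dict[str, List[str]] = NODE_SUBNETS) -> Optional[str]:
    """Return the node whose local LAN contains ip, or None if no site matches."""
    address = ipaddress.ip_address(ip)
    for node, subnets in node_subnets.items():
        if any(address in ipaddress.ip_network(subnet) for subnet in subnets):
            return node
    return None


for host_ip in ("10.2.5.7", "192.168.30.4", "172.16.0.1"):
    print(host_ip, "->", owner_node(host_ip))
```

If site ranges overlap, the first matching node in the dict wins, so non-overlapping subnets keep the assignment unambiguous. A device that matches no site returns None and should be flagged rather than polled across the WAN.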
More Hobbit-related:
- I want to pull custom reports on test results. I was thinking of
importing all the RRD and Hobbit history data into an MS SQL database
using rrdtool's xport function. Any ideas would help.
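A minimal sketch of the rrdtool xport route: parse the XML that xport prints (a meta block with legend entry names, then data rows of a t timestamp followed by v values) into plain tuples that can be bulk-inserted into SQL Server. The sample document is made up, unknown values ("NaN") become None so they load as NULL, and the exact XML layout may vary between rrdtool versions.

```python
import math
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def parse_xport(xml_text: str) -> Tuple[List[str], List[tuple]]:
    """Return (column names, rows) where each row is (timestamp, value1, value2, ...)."""
    root = ET.fromstring(xml_text)
    legend = [entry.text or "" for entry in root.findall("./meta/legend/entry")]
    rows = []
    for row in root.findall("./data/row"):
        timestamp = int(row.findtext("t"))
        values: List[Optional[float]] = []
        for v in row.findall("v"):
            number = float(v.text)
            # rrdtool writes unknown data as NaN; map it to None (SQL NULL).
            values.append(None if math.isnan(number) else number)
        rows.append((timestamp, *values))
    return legend, rows


# Made-up xport output with two exported series.
SAMPLE = """<xport>
  <meta>
    <start>1279310400</start>
    <step>300</step>
    <end>1279311000</end>
    <rows>3</rows>
    <columns>2</columns>
    <legend>
      <entry>in</entry>
      <entry>out</entry>
    </legend>
  </meta>
  <data>
    <row><t>1279310400</t><v>1.2000000000e+03</v><v>3.4000000000e+02</v></row>
    <row><t>1279310700</t><v>NaN</v><v>NaN</v></row>
    <row><t>1279311000</t><v>1.1000000000e+03</v><v>3.0000000000e+02</v></row>
  </data>
</xport>"""

print(parse_xport(SAMPLE))
```

The XML itself would come from something like rrdtool xport --start -86400 DEF:in=<file>.rrd:<ds>:AVERAGE XPORT:in:in, with the file and data-source names taken from your Xymon RRD directory (placeholders here). The resulting rows could then be loaded with any DB-API driver, for example pyodbc's cursor.executemany.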
And lastly, I love what you guys are doing with Devmon. Keep up the good
work!
Cheers, and thanks for the help,
Stephan Buys