From: SourceForge.net <no...@so...> - 2011-09-09 16:43:29
Bugs item #3406813, was opened at 2011-09-09 10:12
Message generated for change (Comment added) made by blentz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112694&aid=3406813&group_id=12694

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: build/test
Group: aix
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: blentz (blentz)
Assigned to: Nobody/Anonymous (nobody)
Summary: 5.7.1.pre2 hrProcessorLoad broken on AIX

Initial Comment:
On some AIX systems, there is a substantial delay in retrieving any stats from hrProcessorLoad after snmpd startup. On other AIX systems, the problem persists indefinitely.

snmpd -DALL shows the following output initially:

trace: header_hrproc(): mibgroup/host/hr_proc.c, 100: host/hr_proc: var_hrproc: HOST-RESOURCES-MIB::hrProcessorLoad 1
trace: header_hrproc(): mibgroup/host/hr_proc.c, 136: host/hr_proc: ... index out of range
trace: netsnmp_call_handler(): agent_handler.c, 529: handler:returned: handler old_api returned 0
trace: netsnmp_handle_request(): snmp_agent.c, 3286: results: request results (status = 0):
trace: netsnmp_handle_request(): snmp_agent.c, 3289: results: HOST-RESOURCES-MIB::hrProcessorLoad = No Such Instance currently exists at this OID

And, on some systems (not all), eventually shows the following output (success):

trace: header_hrproc(): mibgroup/host/hr_proc.c, 100: host/hr_proc: var_hrproc: HOST-RESOURCES-MIB::hrProcessorLoad.770 0
trace: header_hrproc(): mibgroup/host/hr_proc.c, 146: host/hr_proc: ... get proc stats HOST-RESOURCES-MIB::hrProcessorLoad.771
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx 3 (found)

It seems there is a problem enumerating processors. Somewhere.
Our AIX systems have the ability to support DLPAR (hot add / hot remove), micro-partitioning (integer quantity of CPUs seen by the operating system, each backed by anywhere between 0.1 - 1.0 physical cores), 2 and 4 way SMT (hyperthreading), etc., so I'm not sure what value we're reading here.

----------------------------------------------------------------------

>Comment By: blentz (blentz)
Date: 2011-09-09 11:43

Message:
Using NMON as a cross-reference as another piece of software which leverages libperfstat heavily, the Logical CPU IDs shown are 8, 9, 12, 13, 14, 15 due to repeated DLPAR (hot-remove / hot-add) operations performed on this system. Perhaps this is the issue?

----------------------------------------------------------------------

Comment By: blentz (blentz)
Date: 2011-09-09 11:29

Message:
On a system where stats will eventually be collected:

$ sudo /opt/local/net-snmp/sbin/snmpd -Le -DALL -p /opt/local/net-snmp/var/run/snmpd.pid -a -f 0.0.0.0:16161 2>&1 | grep cpu
mib_init: initializing: cpu
mib_init: initializing: cpu_perfstat
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx -1 (created)
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx 0 (created)
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx 1 (created)
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx 2 (created)
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx 3 (created)
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx -1 (found)
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx 0 (found)
cpu->user_ticks: 78654821
cpu->kern_ticks: 18758349
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx 1 (found)
cpu->user_ticks: 769611
cpu->kern_ticks: 281698
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx 2 (found)
cpu->user_ticks: 61397130
cpu->kern_ticks: 18661941
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx 3 (found)
cpu->user_ticks: 805261
cpu->kern_ticks: 288135

On a system where stats will never be collected (same debugging code copied):

$ sudo /opt/local/net-snmp/sbin/snmpd -Le -DALL -p /opt/local/net-snmp/var/run/snmpd.pid -a -f 0.0.0.0:16161 2>&1 | grep cpu
mib_init: initializing: cpu
mib_init: initializing: cpu_perfstat
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx -1 (created)
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx -1 (found)
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx 0 (created)
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx -1 (found)
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx 0 (found)
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx -1 (found)
trace: netsnmp_cpu_get_byIdx(): mibgroup/hardware/cpu/cpu.c, 78: cpu: cpu_get_byIdx 0 (found)

No idea.. the system affected has had CPUs hot-removed and hot-added several times, not sure if that's causing the CPU indexes to go through the roof or if it's an issue with the data returned by libperfstat which I need to address with IBM...
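
The hypothesis in the two comments above (logical CPU IDs such as 8, 9, 12, 13, 14, 15 after repeated DLPAR operations, versus lookups by index 0, 1, 2, ...) can be illustrated with a small, self-contained sketch. This is not Net-SNMP or libperfstat code and does not claim to be the cause of the bug. The struct `cpu_entry` and the functions `find_by_id` and `count_dense_hits` are hypothetical names made up for illustration. It only shows that when a table is keyed by sparse logical IDs, a caller that asks for dense indexes 0..n-1 may find nothing.

```c
#include <stddef.h>

/* Hypothetical per-CPU record: 'id' is the logical CPU number the OS reports. */
struct cpu_entry {
    int id;
    unsigned long long user_ticks;
    unsigned long long kern_ticks;
};

/* Look a CPU up by its logical ID; gaps in the numbering are harmless here. */
const struct cpu_entry *
find_by_id(const struct cpu_entry *tab, size_t n, int id)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].id == id)
            return &tab[i];
    }
    return NULL;
}

/* Count how many of the dense indexes 0..n-1 actually name a CPU in the table. */
size_t
count_dense_hits(const struct cpu_entry *tab, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (find_by_id(tab, n, (int)i) != NULL)
            hits++;
    }
    return hits;
}
```

With a table holding IDs 0 to 3, every dense index resolves. With a table holding the IDs NMON reported, none of the indexes 0 to 5 resolve, which would look like "index out of range" to a caller that assumes dense numbering.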
----------------------------------------------------------------------

Comment By: blentz (blentz)
Date: 2011-09-09 11:19

Message:
More info:

I added some debug statements to ./agent/mibgroup/hardware/cpu/cpu_perfstat.c in the "Per-CPU statistics" section:

        /* Interrupt stats only apply overall, not per-CPU */
        fprintf(stderr, "CPU %d of %d\n", i, n);
        fprintf(stderr, "cpu->user_ticks: %lu\n", (unsigned long)cs2[i].user);
        fprintf(stderr, "cpu->kern_ticks: %lu\n", (unsigned long)cs2[i].sys);
    }

Then I fire it up and can see that this thing is working:

sudo /opt/local/net-snmp/sbin/snmpd -Le -p /opt/local/net-snmp/var/run/snmpd.pid -a -f 0.0.0.0:16161 &
[1] 1024058
NET-SNMP version 5.7.1.pre2
CPU 0 of 4
cpu->user_ticks: 78651650
cpu->kern_ticks: 18757507
CPU 1 of 4
cpu->user_ticks: 769554
cpu->kern_ticks: 281698
CPU 2 of 4
cpu->user_ticks: 61394479
cpu->kern_ticks: 18661067
CPU 3 of 4
cpu->user_ticks: 805261
cpu->kern_ticks: 288125

$ /net-snmp/bin/snmpwalk -v 2c -c aix2011 localhost:16161 HOST-RESOURCES-MIB::hrProcessorLoad
Received SNMP packet(s) from UDP: [127.0.0.1]:39426->[0.0.0.0]:0
HOST-RESOURCES-MIB::hrProcessorLoad = No Such Instance currently exists at this OID

WTH?

----------------------------------------------------------------------
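
The debug statements added to cpu_perfstat.c print the tick counters through a cast to unsigned long. A minimal sketch of a width-safe alternative follows, assuming the counters are 64-bit values (as libperfstat's u_longlong_t fields suggest; treat that as an assumption to check against libperfstat.h). The helper name `format_ticks` is hypothetical and not part of Net-SNMP.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format one per-CPU debug line into buf.
 * PRIu64 matches uint64_t on both 32-bit and 64-bit builds, so a large
 * counter is not truncated or misprinted by a mismatched conversion.
 * Returns the value snprintf returns.
 */
int
format_ticks(char *buf, size_t len, const char *label, uint64_t ticks)
{
    return snprintf(buf, len, "cpu->%s_ticks: %" PRIu64, label, ticks);
}
```

In the debug code this would be called as, for example, `format_ticks(buf, sizeof buf, "user", (uint64_t)cs2[i].user)` and the buffer then written to stderr.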