[Openipmi-developer] IPMI driver performance problem
From: Nathan S. <ns...@ac...> - 2008-03-17 04:11:32
Hi all,

We're seeing an IPMI-related performance problem on our production servers, which I hope someone can help me with. These are Dell boxes, so I've CC'd Matt in case the answer is already known (sorry for the intrusion if not, Matt).

Background to the problem: we see system time spikes (occasional lengthy periods spent in the kernel) every few minutes, some more prolonged than others. Using the CPU event counters, oprofile attributes the time to the port_inb() routine in ipmi_si.ko. Below is one such prolonged sample, showing 31% of the measured CPU cycles in this routine (yikes!).

CPU: P4 / Xeon with 2 hyper-threads, speed 2992.76 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        app name     symbol name
6891     31.7807  ipmi_si.ko   port_inb
2096      9.6666  vmlinux      schedule
777       3.5835  vmlinux      mwait_idle
720       3.3206  vmlinux      _spin_lock
469       2.1630  vmlinux      _spin_unlock_irqrestore
443       2.0431  vmlinux      _spin_unlock
426       1.9647  ext3.ko      ext3_group_sparse
404       1.8632  ipmi_si.ko   kcs_event
343       1.5819  vmlinux      _spin_lock_irqsave
340       1.5680  vmlinux      find_next_bit
328       1.5127  vmlinux      timer_interrupt
(... chopped remainder of opreport output for brevity ...)

$ sudo /sbin/lsmod | grep ipmi
ipmi_devintf           13385  2
ipmi_si                37449  1
ipmi_msghandler        32041  2 ipmi_devintf,ipmi_si

$ dmesg | grep -i ipmi
ipmi message handler version 33.13
IPMI System Interface driver version 33.13, KCS version 33.13, SMIC version 33.13, BT version 33.13
ipmi_si: Found SMBIOS-specified state machine at I/O address 0xca8, slave address 0x20
IPMI kcs interface initialized
ipmi device interface version 33.13

(This is a RHEL4 kernel.)

I don't have a great deal of knowledge about IPMI, unfortunately.
From what I can intuit from reading some code (take this with a grain of salt, given the above statement) and from observing the /proc/ipmi/0/si_stats file during these times, we seem to be seeing large bursts of "short_timeouts". This, together with oprofile pointing at port_inb(), suggests we may be going through the SI_SM_CALL_WITH_DELAY branch in smi_timeout() a fair bit (with a device poll inside the timeout handling code). Does that sound feasible?

If that's the case, then using an IRQ would seem to be an ideal way to address this issue (the smi_timeout() comment says "Running with interrupts, only do long timeouts.").

So I unloaded all of the IPMI kernel drivers and attempted to run with irqs=X,Y,Z settings. That didn't seem to work, though: the /proc/ipmi/0/si_stats file still reported "interrupts_enabled: 0", and I didn't see any kernel messages telling me whether the option had been accepted or rejected.

Thanks for reading this far! I guess my questions at the moment are: does the above reasoning make sense? How do I know what IRQ number to use (I just picked free ones from /proc/interrupts)? And how do I know whether our hardware supports IPMI in interrupt mode?

cheers.

--
Nathan
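
P.S. For anyone wanting to reproduce the si_stats observation, here is a minimal sketch of how the counter bursts could be watched. It assumes the file consists of "name: value" lines, as the "interrupts_enabled: 0" line quoted above suggests; the exact layout may differ between driver versions. The function names (parse_stats, changed_counters, watch) are made up for illustration and are not part of any IPMI tooling.

```python
import time

# Path as quoted in the message above; interface number 0 is an assumption.
STATS_PATH = "/proc/ipmi/0/si_stats"


def parse_stats(text):
    """Parse 'name: value' lines into a dict of integer counters.

    Lines without a colon or with a non-numeric value are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue
    return stats


def changed_counters(before, after):
    """Return {name: delta} for counters that differ between two samples."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


def watch(interval=1.0, path=STATS_PATH):
    """Print a timestamped line whenever any counter changes (runs forever)."""
    with open(path) as f:
        prev = parse_stats(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            cur = parse_stats(f.read())
        delta = changed_counters(prev, cur)
        if delta:
            print(time.strftime("%H:%M:%S"), delta)
        prev = cur
```

Calling watch() on an affected box while a system time spike is in progress should show whether "short_timeouts" jumps at the same moment, which could then be lined up against the oprofile samples.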