Hi Team,
I am trying to use the getevt option of ipmiutil (v3.15) as shown below:
ipmiutil getevt -t 0 -s -b -r /usr/local/bin/evt.sh
or
ipmiutil getevt -t 0 -s
The getevt daemon starts successfully but exits shortly afterwards, as shown below. The same command works properly with an older version of ipmiutil (v2.79).
e.g.,
NS9100_35# ipmiutil getevt -t 0 -s
ipmiutil getevent ver 3.15
-- BMC version 1.15, IPMI version 2.0
event receiver sa = 20 lun = 00
bmc enables = 0f
igetevent reading sensors ...
Get IPMI SEL events after ID 003a
igetevent waiting for events via method 1 (SEL_events)
Waiting 0 seconds for an event ...
get_event error: ret = 0xfffffffa
ipmiutil getevent exiting.
With the debug option (-x) enabled, this is what I got:
get_sel(779) rv=0 cc=0 id=779 next=ffff
sel ok, id=779 next=ffff
get_sel(779) rv=0 cc=0 id=779 next=ffff
sel ok, id=779 next=ffff
get_sel(779) rv=-3 cc=0 id=0 next=0
sel recid 779 error, rv = -3
Any help on how to keep getevt from exiting would be much appreciated.
Thanks!
That error means that the receive failed (LAN_ERR_RECV_FAIL = -3 in ipmicmd.h).
There are several methods to get the events, mainly the SEL_events method and the GetMessage method.
This by default uses the SEL_events method, and should continue to read the last SEL event until a new event occurs. The fact that it fails to read the SEL event that it was able to read twice before implies some firmware anomaly. Which firmware vendor is this?
It may work better in this case to use -m instead of -s.
If not, perhaps we could add special-case handling of -3 in igetevent.c for this firmware vendor to keep waiting if it gets this error.
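For reference, the hex values that getevt prints for `ret` look like negative C return codes displayed as unsigned 32-bit numbers. Here is a minimal Python sketch of the conversion, assuming a 32-bit two's-complement int (the helper name `to_signed32` is made up for illustration and is not part of ipmiutil):

```python
def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a signed two's-complement int."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


# The two "get_event error: ret = ..." values reported in this thread.
for text in ("0xfffffffa", "0xfffffffd"):
    print(text, "->", to_signed32(int(text, 16)))
```

With this, 0xfffffffd comes out as -3 (the LAN_ERR_RECV_FAIL value mentioned above) and 0xfffffffa as -6.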
Info about these two methods, from the igetevent.c header comment:
Hi Andy Cress,
Thank you very much for your quick response.
I forgot to mention that I also tried the GetMessage method, but it crashed as shown below.
NS9100_35# ipmiutil getevt -t 0
ipmiutil getevent ver 3.15
-- BMC version 1.15, IPMI version 2.0
event receiver sa = 20 lun = 00
bmc enables = 0f
igetevent reading sensors ...
Get IPMI events from kcs driver
igetevent waiting for events via method 2 (GetMessage)
Waiting 0 seconds for an event ...
*** buffer overflow detected ***: terminated
Aborted (core dumped)
[Jyothi] Where can I find the coredump?
NS9100_35# ipmiutil getevt -t 0 -s
ipmiutil getevent ver 3.15
-- BMC version 1.15, IPMI version 2.0
event receiver sa = 20 lun = 00
bmc enables = 0f
igetevent reading sensors ...
Get IPMI SEL events after ID 0019
igetevent waiting for events via method 1 (SEL_events)
Waiting 0 seconds for an event ...
get_event error: ret = 0xfffffffd
ipmiutil getevent exiting.
uname -r
5.4.0-42-generic
Last edit: Jyothi 2022-02-03
got event id 002b, sensor_type = 13
event data: 2b 00 02 a7 96 fb 61 33 00 04 13 05 71 a0 03 18
002b 02/03/22 08:47:35 MIN Bios Critical Interrupt #05 PCIe Cor Sensor PCIe Warn Receiver Error on (03:03.0) 71 [a0 03 18]
Waiting 0 seconds for an event ...
got event id 002c, sensor_type = 13
event data: 2c 00 02 f3 96 fb 61 33 00 04 13 05 71 a0 03 18
002c 02/03/22 08:48:51 MIN Bios Critical Interrupt #05 PCIe Cor Sensor PCIe Warn Receiver Error on (03:03.0) 71 [a0 03 18]
Waiting 0 seconds for an event ...
get_event error: ret = 0xc7
ipmiutil getevent exiting.
I enabled debug logs (-x option) and now the return code is different.
sel ok, id=2c next=ffff
ipmidir Cmd=43 NetFn=0a Lun=00 Sa=20 Data(6): 00 00 2c 00 00 ff
Send Netfn=0a Cmd=43, raw: 00 20 28 43 00 00 2c 00 00 ff
ipmidir Resp(b,43): status=0 cc=00, Data(18): ff ff 2c 00 02 f3 96 fb 61 33 00 04 13 05 71 a0 03 18
get_sel(2c) rv=0 cc=0 id=2c next=ffff
sel ok, id=2c next=ffff
ipmidir Cmd=43 NetFn=0a Lun=00 Sa=20 Data(6): 00 00 2c 00 00 ff
Send Netfn=0a Cmd=43, raw: 00 20 28 43 00 00 2c 00 00 ff
ipmidir Resp(2,43): status=-6 cc=ca, Data(0):
get_sel(2c) rv=-6 cc=0 id=0 next=0
sel recid 2c error, rv = -6
get_event ret = -6
get_event error: ret = 0xfffffffa
sync: recid=2c time=61fb96f3
ipmiutil getevent exiting.
clear_lock rv = -1
It looks like the issue is somewhere else, but we have been unable to figure it out.
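To make the trace easier to read, the Get SEL Entry exchange (NetFn 0a, Cmd 43) can be decoded by hand. The request data `00 00 2c 00 00 ff` is reservation ID 0000, record ID 002c, offset 00, and ff for "read the whole record". The 18 response bytes are the next record ID followed by the 16-byte SEL record. Below is a rough Python sketch of that decoding, assuming the standard IPMI system event record layout (record type 02); the function name and dictionary keys are made up for illustration:

```python
import struct
from datetime import datetime, timezone


def parse_get_sel_entry(hex_data: str) -> dict:
    """Decode the 18 data bytes of a Get SEL Entry response."""
    data = bytes.fromhex(hex_data)  # whitespace between bytes is ignored
    if len(data) != 18:
        raise ValueError(f"expected 18 bytes, got {len(data)}")
    # Little-endian fields: next record ID, record ID, record type,
    # timestamp, generator ID, EvM revision, sensor type, sensor number,
    # event direction/type. The last 3 bytes are event data 1-3.
    (next_id, rec_id, rec_type, ts, gen_id,
     evm_rev, sensor_type, sensor_num, dir_type) = struct.unpack_from(
        "<HHBIHBBBB", data)
    return {
        "next_id": next_id,            # 0xffff means no further records
        "record_id": rec_id,
        "record_type": rec_type,
        "timestamp": datetime.fromtimestamp(ts, tz=timezone.utc),
        "generator_id": gen_id,
        "evm_rev": evm_rev,
        "sensor_type": sensor_type,
        "sensor_num": sensor_num,
        "deassert": bool(dir_type & 0x80),
        "event_type": dir_type & 0x7F,
        "event_data": data[15:].hex(),
    }


# Response data from the successful read in the -x trace.
rec = parse_get_sel_entry(
    "ff ff 2c 00 02 f3 96 fb 61 33 00 04 13 05 71 a0 03 18")
print(f"id={rec['record_id']:04x} next={rec['next_id']:04x} "
      f"time={rec['timestamp'].isoformat()} "
      f"sensor_type={rec['sensor_type']:02x} num={rec['sensor_num']:02x}")
```

The decoded timestamp is 2022-02-03 08:48:51 UTC, which lines up with the 002c event printed by getevt, and next=ffff confirms that 002c is the last record in the SEL. So the first read of record 002c is well-formed, and the failure only shows up on a later read of the same record.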
For this function, it looks like the right approach is fixing the firmware
on this motherboard.
The command 'ipmiutil health -x' will show the make and firmware version
from the get_deviceid function as hex codes, and will show known makes.
The firmware version is 1.15 as shown above, but the make/manufacturer
isn't shown by default.
However, to enter a support request you will probably have to go through
the server manufacturer.
The problem is that the firmware doesn't handle more than 2 requests to get
a given SEL entry in the same session.
As an alternative in the meantime, you could create a shell script to run
'ipmiutil sel' in a loop and take action when a new event occurs.
This would use a fresh session each time.
Andy
Last edit: Andy Cress 2022-04-05
Hi Andy Cress,
Thank you for your inputs.
I went through the sample script 'ipmiutil_evt'; it exits if the mode is driverless, and our appliance uses ipmiutil in the driverless way:
NS9100_35# ipmiutil cmd -k
IPMI access is ok, driver type = kcs
Using driverless method
ipmiutil cmd, completed successfully
I have uploaded the 'ipmiutil health -x' output file 'health_out' to the thread.
Last edit: Jyothi 2022-02-04
Hmmm. I'm surprised. This is Intel firmware on the Intel S4600LH motherboard.
BMC manufacturer = 000157 (Intel), product = 005c (S4600LH)
BMC version = 1.15.4159 (Boot 1.13), IPMI v2.0
BIOS Version = SE5C600.86B.01.07.0002.030620132047
That means this isn't a one-off behavior, so ipmiutil needs to handle it more robustly.
I guess the best approach then is to modify igetevent.c to initiate a new session for each pass.
Okay! Thanks for your input.
History:
We recently upgraded the Linux kernel and all packages. After the upgrade, ipmiutil access via the OpenIPMI driver became extremely slow. To speed things up, we experimented with 'driverless' mode, and access became faster.
OLD:
Linux kernel 4.4.110
ipmiutil ver 2.79 (driver type = open)
NEW:
Linux kernel 5.4.0-42-generic
ipmiutil ver 3.15 (driver type = kcs, using driverless method)
If the 'driverless' method does not have multi-user support, using the OpenIPMI driver looks safer.
But the OpenIPMI driver seems to be extremely slow, and our application APIs get stuck or crash on the latest kernel version.
Do you happen to have any information on a special configuration or mode that we need to enable in the latest kernel? We could not find much information online.
Coming back to this, you have probably already handled it, but you have a few alternatives:
1) Contact the OpenIPMI driver project to resolve the slowness with kernel 5.4.0.
2) Create a bash script that runs 'ipmiutil sel' via the driverless method, so that each run uses only one session. The script would need to save the last event and take action only if the current last event does not match the saved one (i.e. one or more new events have occurred).
3) In ipmiutil, we could add an option to igetevent.c that creates a fresh session for each pass and retries if the return code is 0xc7 or 0xca.
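The polling idea in alternative 2 could look roughly like the sketch below. It is written in Python rather than bash so the comparison logic is easier to follow. It assumes that 'ipmiutil sel' lists records oldest first and that each record line starts with a 4-digit hex record ID followed by a date, like the decoded lines earlier in this thread. The function names, the polling interval, and the handler are all made up for illustration.

```python
import re
import subprocess
import time

# Assumption: a SEL record line looks like
# "002c 02/03/22 08:48:51 MIN Bios Critical Interrupt ..."
RECORD_RE = re.compile(r"^[0-9a-fA-F]{4} \d{2}/\d{2}/\d{2} ")


def parse_records(sel_output):
    """Return the SEL record lines found in the command output, in order."""
    lines = (line.strip() for line in sel_output.splitlines())
    return [line for line in lines if RECORD_RE.match(line)]


def records_after(records, last_seen):
    """Return the records that follow last_seen.

    If last_seen is no longer listed (for example the SEL was cleared),
    every record is treated as new.
    """
    if last_seen in records:
        return records[records.index(last_seen) + 1:]
    return list(records)


def poll(handler, interval_s=10.0, cmd=("ipmiutil", "sel")):
    """Run the SEL command in a loop and call handler for each new record."""
    baseline_done = False
    last_seen = None
    while True:
        # Each run is a separate process, so it uses a fresh session.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            records = parse_records(result.stdout)
            if baseline_done:
                for line in records_after(records, last_seen):
                    handler(line)
            # The first successful pass only records the starting point.
            baseline_done = True
            last_seen = records[-1] if records else None
        # On a non-zero exit status, simply try again on the next pass.
        time.sleep(interval_s)
```

Calling `poll(print)` would print each new record as it appears; in practice the handler would run whatever action the `-r` script was meant to perform. A failed run is skipped and retried on the next pass, which roughly approximates the retry behavior of alternative 3 at the script level.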