openipmi-developer Mailing List for Open IPMI
Brought to you by:
cminyard
You can subscribe to this list here.
| 2002 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
(8) |
Dec
|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2003 |
Jan
(4) |
Feb
(14) |
Mar
(40) |
Apr
(41) |
May
(17) |
Jun
(50) |
Jul
(16) |
Aug
(37) |
Sep
(57) |
Oct
(44) |
Nov
(48) |
Dec
(35) |
| 2004 |
Jan
(12) |
Feb
(3) |
Mar
(8) |
Apr
(8) |
May
(22) |
Jun
(23) |
Jul
(14) |
Aug
(51) |
Sep
(21) |
Oct
(38) |
Nov
(8) |
Dec
(17) |
| 2005 |
Jan
(27) |
Feb
(28) |
Mar
(50) |
Apr
(32) |
May
(55) |
Jun
(38) |
Jul
(26) |
Aug
(40) |
Sep
(67) |
Oct
(86) |
Nov
(25) |
Dec
(29) |
| 2006 |
Jan
(53) |
Feb
(19) |
Mar
(36) |
Apr
(25) |
May
(27) |
Jun
(56) |
Jul
(28) |
Aug
(15) |
Sep
(37) |
Oct
(63) |
Nov
(63) |
Dec
(105) |
| 2007 |
Jan
(54) |
Feb
(29) |
Mar
(23) |
Apr
(42) |
May
(6) |
Jun
(70) |
Jul
(51) |
Aug
(58) |
Sep
(27) |
Oct
(43) |
Nov
(52) |
Dec
(24) |
| 2008 |
Jan
(39) |
Feb
(76) |
Mar
(23) |
Apr
(18) |
May
(5) |
Jun
(7) |
Jul
(12) |
Aug
(7) |
Sep
(2) |
Oct
(6) |
Nov
(22) |
Dec
(31) |
| 2009 |
Jan
(4) |
Feb
(2) |
Mar
(32) |
Apr
(5) |
May
(22) |
Jun
(5) |
Jul
(9) |
Aug
(6) |
Sep
(12) |
Oct
(30) |
Nov
(27) |
Dec
(31) |
| 2010 |
Jan
(17) |
Feb
(2) |
Mar
(41) |
Apr
(8) |
May
(19) |
Jun
(11) |
Jul
(53) |
Aug
(1) |
Sep
(14) |
Oct
(31) |
Nov
(13) |
Dec
(10) |
| 2011 |
Jan
(10) |
Feb
(15) |
Mar
(6) |
Apr
(6) |
May
(4) |
Jun
|
Jul
(6) |
Aug
(5) |
Sep
(6) |
Oct
(9) |
Nov
(2) |
Dec
(3) |
| 2012 |
Jan
|
Feb
(10) |
Mar
(11) |
Apr
(3) |
May
(2) |
Jun
(6) |
Jul
(12) |
Aug
(1) |
Sep
(3) |
Oct
(23) |
Nov
(6) |
Dec
(11) |
| 2013 |
Jan
(9) |
Feb
(2) |
Mar
(8) |
Apr
(7) |
May
(40) |
Jun
(9) |
Jul
(47) |
Aug
(23) |
Sep
(52) |
Oct
(6) |
Nov
(9) |
Dec
(8) |
| 2014 |
Jan
(27) |
Feb
(15) |
Mar
(26) |
Apr
(36) |
May
(33) |
Jun
(4) |
Jul
(15) |
Aug
(2) |
Sep
(11) |
Oct
(120) |
Nov
(32) |
Dec
(27) |
| 2015 |
Jan
(30) |
Feb
(15) |
Mar
(7) |
Apr
(17) |
May
(27) |
Jun
(23) |
Jul
(15) |
Aug
(39) |
Sep
(19) |
Oct
(5) |
Nov
(26) |
Dec
(6) |
| 2016 |
Jan
(37) |
Feb
(35) |
Mar
(51) |
Apr
(18) |
May
(8) |
Jun
(11) |
Jul
(5) |
Aug
(7) |
Sep
(54) |
Oct
(6) |
Nov
(33) |
Dec
(11) |
| 2017 |
Jan
(15) |
Feb
(25) |
Mar
(25) |
Apr
(19) |
May
(17) |
Jun
(28) |
Jul
(11) |
Aug
(56) |
Sep
(53) |
Oct
(15) |
Nov
(19) |
Dec
(30) |
| 2018 |
Jan
(63) |
Feb
(44) |
Mar
(42) |
Apr
(41) |
May
(19) |
Jun
(22) |
Jul
(16) |
Aug
(38) |
Sep
(14) |
Oct
(6) |
Nov
(11) |
Dec
(12) |
| 2019 |
Jan
(44) |
Feb
(7) |
Mar
(11) |
Apr
(58) |
May
(10) |
Jun
(10) |
Jul
(42) |
Aug
(36) |
Sep
(3) |
Oct
(29) |
Nov
(29) |
Dec
(23) |
| 2020 |
Jan
(7) |
Feb
(22) |
Mar
(3) |
Apr
(38) |
May
(14) |
Jun
(7) |
Jul
(12) |
Aug
(48) |
Sep
(85) |
Oct
(71) |
Nov
(14) |
Dec
(4) |
| 2021 |
Jan
(11) |
Feb
(36) |
Mar
(65) |
Apr
(106) |
May
(73) |
Jun
(33) |
Jul
(25) |
Aug
(19) |
Sep
(19) |
Oct
(29) |
Nov
(95) |
Dec
(21) |
| 2022 |
Jan
(91) |
Feb
(30) |
Mar
(43) |
Apr
(95) |
May
(136) |
Jun
(47) |
Jul
(28) |
Aug
(36) |
Sep
(17) |
Oct
(46) |
Nov
(53) |
Dec
(15) |
| 2023 |
Jan
|
Feb
(15) |
Mar
(44) |
Apr
(9) |
May
(20) |
Jun
(18) |
Jul
(8) |
Aug
(18) |
Sep
(41) |
Oct
(67) |
Nov
(44) |
Dec
(2) |
| 2024 |
Jan
(4) |
Feb
(7) |
Mar
(45) |
Apr
(35) |
May
(4) |
Jun
(29) |
Jul
(4) |
Aug
(37) |
Sep
(16) |
Oct
(12) |
Nov
(6) |
Dec
(8) |
| 2025 |
Jan
(179) |
Feb
(49) |
Mar
(8) |
Apr
(41) |
May
(32) |
Jun
(35) |
Jul
(31) |
Aug
(46) |
Sep
(32) |
Oct
(18) |
Nov
(111) |
Dec
(19) |
| 2026 |
Jan
(21) |
Feb
(8) |
Mar
(18) |
Apr
(52) |
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
|
From: Corey M. <co...@mi...> - 2026-04-26 00:04:50
|
On Sat, Apr 25, 2026 at 10:36:05AM +0100, Matt Fleming wrote: > On Tue, Apr 21, 2026 at 07:42:44AM -0500, Corey Minyard wrote: > > The driver would just fetch events and receive messages until the > > BMC said it was done. To avoid issues with BMCs that never say they are > > done, add a limit of 10 fetches at a time. > > > > This is a more general fix than the previous fix for the specific bad > > BMC, but should fix the more general issue of a BMC that won't stop > > saying it has data. > > > > This has been there from the beginning of the driver. > > > > Reported-by: Matt Fleming <mfl...@cl...> > > Closes: https://lore.kernel.org/lkml/202...@re.../ > > Fixes: <1da177e4c3f4> ("Linux-2.6.12-rc2") > > Cc: st...@vg... > > Signed-off-by: Corey Minyard <co...@mi...> > > --- > > drivers/char/ipmi/ipmi_si_intf.c | 15 +++++++++++++++ > > drivers/char/ipmi/ipmi_ssif.c | 15 +++++++++++++++ > > 2 files changed, 30 insertions(+) > > [...] > > > @@ -410,6 +413,7 @@ static void start_getting_msg_queue(struct smi_info *smi_info) > > > > start_new_msg(smi_info, smi_info->curr_msg->data, > > smi_info->curr_msg->data_size); > > + smi_info->num_requests_in_a_row = 0; > > smi_info->si_state = SI_GETTING_MESSAGES; > > } > > > > @@ -421,6 +425,7 @@ static void start_getting_events(struct smi_info *smi_info) > > > > start_new_msg(smi_info, smi_info->curr_msg->data, > > smi_info->curr_msg->data_size); > > + smi_info->num_requests_in_a_row = 0; > > smi_info->si_state = SI_GETTING_EVENTS; > > } > > > > Would it be better to move this zeroing to handle_transaction_done()? > Otherwise we reset the counter in handle_flags() -> > start_getting_events() and the threshold is never reached. Oh, yeah. Moving it to handle_transaction_done() is not ideal, though. If something was spewing receive messages, it would never get to events, which is why I did it like I did. The following should fix this. You could also have different limits for receive messages and events, but I think the following is more clear. diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 2a739123270c..e46f4150ceb5 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -413,8 +413,10 @@ static void start_getting_msg_queue(struct smi_info *smi_info) start_new_msg(smi_info, smi_info->curr_msg->data, smi_info->curr_msg->data_size); - smi_info->num_requests_in_a_row = 0; - smi_info->si_state = SI_GETTING_MESSAGES; + if (smi_info->si_state != SI_GETTING_MESSAGES) { + smi_info->num_requests_in_a_row = 0; + smi_info->si_state = SI_GETTING_MESSAGES; + } } static void start_getting_events(struct smi_info *smi_info) @@ -425,8 +427,10 @@ static void start_getting_events(struct smi_info *smi_info) start_new_msg(smi_info, smi_info->curr_msg->data, smi_info->curr_msg->data_size); - smi_info->num_requests_in_a_row = 0; - smi_info->si_state = SI_GETTING_EVENTS; + if (smi_info->si_state != SI_GETTING_EVENTS) { + smi_info->num_requests_in_a_row = 0; + smi_info->si_state = SI_GETTING_EVENTS; + } } /* > > Thanks, > Matt |
|
From: Matt F. <ma...@re...> - 2026-04-25 09:36:20
|
On Tue, Apr 21, 2026 at 07:42:44AM -0500, Corey Minyard wrote: > The driver would just fetch events and receive messages until the > BMC said it was done. To avoid issues with BMCs that never say they are > done, add a limit of 10 fetches at a time. > > This is a more general fix than the previous fix for the specific bad > BMC, but should fix the more general issue of a BMC that won't stop > saying it has data. > > This has been there from the beginning of the driver. > > Reported-by: Matt Fleming <mfl...@cl...> > Closes: https://lore.kernel.org/lkml/202...@re.../ > Fixes: <1da177e4c3f4> ("Linux-2.6.12-rc2") > Cc: st...@vg... > Signed-off-by: Corey Minyard <co...@mi...> > --- > drivers/char/ipmi/ipmi_si_intf.c | 15 +++++++++++++++ > drivers/char/ipmi/ipmi_ssif.c | 15 +++++++++++++++ > 2 files changed, 30 insertions(+) [...] > @@ -410,6 +413,7 @@ static void start_getting_msg_queue(struct smi_info *smi_info) > > start_new_msg(smi_info, smi_info->curr_msg->data, > smi_info->curr_msg->data_size); > + smi_info->num_requests_in_a_row = 0; > smi_info->si_state = SI_GETTING_MESSAGES; > } > > @@ -421,6 +425,7 @@ static void start_getting_events(struct smi_info *smi_info) > > start_new_msg(smi_info, smi_info->curr_msg->data, > smi_info->curr_msg->data_size); > + smi_info->num_requests_in_a_row = 0; > smi_info->si_state = SI_GETTING_EVENTS; > } > Would it be better to move this zeroing to handle_transaction_done()? Otherwise we reset the counter in handle_flags() -> start_getting_events() and the threshold is never reached. Thanks, Matt |
|
From: Jian Z. <zha...@by...> - 2026-04-22 04:45:36
|
> 2026年4月22日 06:24,Matt Fleming <ma...@re...> 写道: > > On Tue, Apr 21, 2026 at 07:42:42AM -0500, Corey Minyard wrote: >> Matt reported that there were issues with the IPMI driver getting wedged >> in some cases. It turns out that the BMC was not reporting an error as >> it should have (per the spec) when the event queue was empty. The IPMI >> driver would then request the next event, and so on, wedging the driver. > > Thanks for replying so quickly, Corey. I'll test these out. > > One bit of info I pulled out of the stuck machine is that the response > looks properly formed. > > I sampled the first 8 entries and they were all identical 19-byte > successful READ_EVENT_MSG_BUFFER responses: > > 1c 35 00 55 55 c0 41 a7 00 00 00 00 00 3a ff 00 ff ff ff > Perhaps I know where this data comes from. During a previous debugging session (where ipmitool v1.8.19 failed on sensor list due to an underflow in nr_numbers, which has since been fixed), I noticed this behavior. However, I ’m not sure why it is implemented this way or what exactly this command is intended to do. If you are running on OpenBMC, it is very likely related to this part, where a fixed value is always returned (especially if the KCS channel happens to be configured as 15): See: https://github.com/openbmc/phosphor-host-ipmid/blob/master/systemintfcmds.cpp#L35 Jian. > So on this machine, the event replies do not look short or malformed; > they look like repeated successful event-buffer reads with the same > payload. > > Thanks, > Matt > > > _______________________________________________ > Openipmi-developer mailing list > Ope...@li... > https://lists.sourceforge.net/lists/listinfo/openipmi-developer |
|
From: Matt F. <ma...@re...> - 2026-04-21 22:24:26
|
On Tue, Apr 21, 2026 at 07:42:42AM -0500, Corey Minyard wrote: > Matt reported that there were issues with the IPMI driver getting wedged > in some cases. It turns out that the BMC was not reporting an error as > it should have (per the spec) when the event queue was empty. The IPMI > driver would then request the next event, and so on, wedging the driver. Thanks for replying so quickly, Corey. I'll test these out. One bit of info I pulled out of the stuck machine is that the response looks properly formed. I sampled the first 8 entries and they were all identical 19-byte successful READ_EVENT_MSG_BUFFER responses: 1c 35 00 55 55 c0 41 a7 00 00 00 00 00 3a ff 00 ff ff ff So on this machine, the event replies do not look short or malformed; they look like repeated successful event-buffer reads with the same payload. Thanks, Matt |
|
From: Corey M. <co...@mi...> - 2026-04-21 17:55:03
|
Matt reported that there were issues with the IPMI driver getting wedged in some cases. It turns out that the BMC was not reporting an error as it should have (per the spec) when the event queue was empty. The IPMI driver would then request the next event, and so on, wedging the driver. The BMC sits on a fuzzy line between a trusted devices and a remote and possibly untrusted device. If you compromised a BMC you have all sorts of tools you can use to attack the host: the reset line, interrupts, and usually access to write the system firmware and possibly devices like disk drives, serial ports and VGA consoles. So attacking through this interface would not be the first thing you would do. But it is an possible attack point. I'm assuming that the BMC was delivering an empty message when this happens, so the first patch checks the message length to make sure it's a valid message. It's a good check no matter what, so it's in whether that's the issue or not. The second patch limits the number of events or messages that can be fetched at a time to 10. This is a good thing to do, anyway. If more message or events were present, the next flag check should get them. So it's a more general fix. I looked at adding the patch Matt suggested, doing a timeout on the wait, but that introduces some race conditions if the response comes back late. That will require some more thought. The timeouts with IPMI can be pretty long, the spec specifies fairly long timeouts, 5 seconds waiting for the BMC to respond to anything. So failing an operation can take some time, and reducing the timeouts is probably a bad idea. No rationale is given in the spec, but I'm guessing it expects that a BMC in restart can recover within 5 seconds, so it gives timeouts so the BMC is always available within that tie. The spec gives you the gist that the BMC should always be available on a system that has one. So the driver (at the beginning) followed that. Thus the driver tries 10 times for a message before it gives up, giving 50 seconds total failure time for a message. That is not in the spec (I don't think) so that could be made selectable on a per-message basis. There are already mechanisms for this available in the APIs; I'll look at that. -corey |
|
From: Corey M. <co...@mi...> - 2026-04-21 13:26:04
|
The driver would just fetch events and receive messages until the BMC said it was done. To avoid issues with BMCs that never say they are done, add a limit of 10 fetches at a time. This is a more general fix than the previous fix for the specific bad BMC, but should fix the more general issue of a BMC that won't stop saying it has data. This has been there from the beginning of the driver. Reported-by: Matt Fleming <mfl...@cl...> Closes: https://lore.kernel.org/lkml/202...@re.../ Fixes: <1da177e4c3f4> ("Linux-2.6.12-rc2") Cc: st...@vg... Signed-off-by: Corey Minyard <co...@mi...> --- drivers/char/ipmi/ipmi_si_intf.c | 15 +++++++++++++++ drivers/char/ipmi/ipmi_ssif.c | 15 +++++++++++++++ 2 files changed, 30 insertions(+) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 08c208cc64c5..a705aae29867 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -168,6 +168,9 @@ struct smi_info { OEM2_DATA_AVAIL) unsigned char msg_flags; + /* When requesting events and messages, don't do it forever. */ + unsigned int num_requests_in_a_row; + /* Does the BMC have an event buffer? */ bool has_event_buffer; @@ -410,6 +413,7 @@ static void start_getting_msg_queue(struct smi_info *smi_info) start_new_msg(smi_info, smi_info->curr_msg->data, smi_info->curr_msg->data_size); + smi_info->num_requests_in_a_row = 0; smi_info->si_state = SI_GETTING_MESSAGES; } @@ -421,6 +425,7 @@ static void start_getting_events(struct smi_info *smi_info) start_new_msg(smi_info, smi_info->curr_msg->data, smi_info->curr_msg->data_size); + smi_info->num_requests_in_a_row = 0; smi_info->si_state = SI_GETTING_EVENTS; } @@ -646,6 +651,11 @@ static void handle_transaction_done(struct smi_info *smi_info) } else { smi_inc_stat(smi_info, events); + smi_info->num_requests_in_a_row++; + if (smi_info->num_requests_in_a_row > 10) + /* Stop if we do this too many times. */ + smi_info->msg_flags &= ~EVENT_MSG_BUFFER_FULL; + /* * Do this before we deliver the message * because delivering the message releases the @@ -684,6 +694,11 @@ static void handle_transaction_done(struct smi_info *smi_info) } else { smi_inc_stat(smi_info, incoming_messages); + smi_info->num_requests_in_a_row++; + if (smi_info->num_requests_in_a_row > 10) + /* Stop if we do this too many times. */ + smi_info->msg_flags &= ~RECEIVE_MSG_AVAIL; + /* * Do this before we deliver the message * because delivering the message releases the diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c index b49500a1bd36..547447f304ba 100644 --- a/drivers/char/ipmi/ipmi_ssif.c +++ b/drivers/char/ipmi/ipmi_ssif.c @@ -225,6 +225,9 @@ struct ssif_info { bool has_event_buffer; bool supports_alert; + /* When requesting events and messages, don't do it forever. */ + unsigned int num_requests_in_a_row; + /* * Used to tell what we should do with alerts. If we are * waiting on a response, read the data immediately. @@ -413,6 +416,7 @@ static void start_event_fetch(struct ssif_info *ssif_info, unsigned long *flags) } ssif_info->curr_msg = msg; + ssif_info->num_requests_in_a_row = 0; ssif_info->ssif_state = SSIF_GETTING_EVENTS; ipmi_ssif_unlock_cond(ssif_info, flags); @@ -436,6 +440,7 @@ static void start_recv_msg_fetch(struct ssif_info *ssif_info, } ssif_info->curr_msg = msg; + ssif_info->num_requests_in_a_row = 0; ssif_info->ssif_state = SSIF_GETTING_MESSAGES; ipmi_ssif_unlock_cond(ssif_info, flags); @@ -843,6 +848,11 @@ static void msg_done_handler(struct ssif_info *ssif_info, int result, ssif_info->msg_flags &= ~EVENT_MSG_BUFFER_FULL; handle_flags(ssif_info, flags); } else { + ssif_info->num_requests_in_a_row++; + if (ssif_info->num_requests_in_a_row > 10) + /* Stop if we do this too many times. */ + ssif_info->msg_flags &= ~EVENT_MSG_BUFFER_FULL; + handle_flags(ssif_info, flags); ssif_inc_stat(ssif_info, events); deliver_recv_msg(ssif_info, msg); @@ -876,6 +886,11 @@ static void msg_done_handler(struct ssif_info *ssif_info, int result, ssif_info->msg_flags &= ~RECEIVE_MSG_AVAIL; handle_flags(ssif_info, flags); } else { + ssif_info->num_requests_in_a_row++; + if (ssif_info->num_requests_in_a_row > 10) + /* Stop if we do this too many times. */ + ssif_info->msg_flags &= ~RECEIVE_MSG_AVAIL; + ssif_inc_stat(ssif_info, incoming_messages); handle_flags(ssif_info, flags); deliver_recv_msg(ssif_info, msg); -- 2.43.0 |
|
From: Corey M. <co...@mi...> - 2026-04-21 13:26:03
|
The event message buffer response data size got checked later when processing, but check it right after the response comes back. It appears some BMCs may return an empty message instead of an error when fetching events. There are apparently some new BMCs that make this error, so we need to compensate. Reported-by: Matt Fleming <mfl...@cl...> Closes: https://lore.kernel.org/lkml/202...@re.../ Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: <st...@vg...> Signed-off-by: Corey Minyard <co...@mi...> --- drivers/char/ipmi/ipmi_si_intf.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 4a9e9de4d684..08c208cc64c5 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -630,7 +630,13 @@ static void handle_transaction_done(struct smi_info *smi_info) */ msg = smi_info->curr_msg; smi_info->curr_msg = NULL; - if (msg->rsp[2] != 0) { + /* + * It appears some BMCs, with no event data, return no + * data in the message and not a 0x80 error as the + * spec says they should. Shut down processing if + * the data is not the right length. + */ + if (msg->rsp[2] != 0 || msg->rsp_size != 19) { /* Error getting event, probably done. */ msg->done(msg); -- 2.43.0 |
|
From: Dan C. <er...@gm...> - 2026-04-21 04:52:22
|
Hello Corey Minyard,
Commit 75c486cb1bca ("ipmi:ssif: Clean up kthread on errors") from
Apr 13, 2026 (linux-next), leads to the following Smatch static
checker warning:
drivers/char/ipmi/ipmi_ssif.c:1923 ssif_probe()
error: 'ssif_info->thread' dereferencing possible ERR_PTR()
drivers/char/ipmi/ipmi_ssif.c
1874 ssif_info->handlers.owner = THIS_MODULE;
1875 ssif_info->handlers.start_processing = ssif_start_processing;
1876 ssif_info->handlers.shutdown = shutdown_ssif;
1877 ssif_info->handlers.get_smi_info = get_smi_info;
1878 ssif_info->handlers.sender = sender;
1879 ssif_info->handlers.request_events = request_events;
1880 ssif_info->handlers.set_need_watch = ssif_set_need_watch;
1881
1882 thread_num = ((i2c_adapter_id(ssif_info->client->adapter) << 8) |
1883 ssif_info->client->addr);
1884 init_completion(&ssif_info->wake_thread);
1885 ssif_info->thread = kthread_run(ipmi_ssif_thread, ssif_info,
1886 "kssif%4.4x", thread_num);
1887 if (IS_ERR(ssif_info->thread)) {
^^^^^^^^^^^^^^^^^
1888 rv = PTR_ERR(ssif_info->thread);
1889 dev_notice(&ssif_info->client->dev,
1890 "Could not start kernel thread: error %d\n",
1891 rv);
1892 goto out;
1893 }
1894
1895 dev_set_drvdata(&ssif_info->client->dev, ssif_info);
1896 rv = device_add_group(&ssif_info->client->dev,
1897 &ipmi_ssif_dev_attr_group);
1898 if (rv) {
1899 dev_err(&ssif_info->client->dev,
1900 "Unable to add device attributes: error %d\n",
1901 rv);
1902 goto out;
1903 }
1904
1905 rv = ipmi_register_smi(&ssif_info->handlers,
1906 ssif_info,
1907 &ssif_info->client->dev,
1908 slave_addr);
1909 if (rv) {
1910 dev_err(&ssif_info->client->dev,
1911 "Unable to register device: error %d\n", rv);
1912 goto out_remove_attr;
1913 }
1914
1915 out:
1916 if (rv) {
1917 /*
1918 * If ipmi_register_smi() starts the interface, it will
1919 * call shutdown and that will free the thread and set
1920 * it to NULL. Otherwise it must be freed here.
1921 */
1922 if (ssif_info->thread) {
if (!IS_ERR_OR_NULL(ssif_info->thread))
--> 1923 kthread_stop(ssif_info->thread);
1924 ssif_info->thread = NULL;
1925 }
1926 if (addr_info)
1927 addr_info->client = NULL;
1928
1929 dev_err(&ssif_info->client->dev,
1930 "Unable to start IPMI SSIF: %d\n", rv);
1931 i2c_set_clientdata(client, NULL);
1932 kfree(ssif_info);
1933 }
1934 kfree(resp);
1935 mutex_unlock(&ssif_infos_mutex);
1936 return rv;
1937
1938 out_remove_attr:
1939 device_remove_group(&ssif_info->client->dev, &ipmi_ssif_dev_attr_group);
1940 dev_set_drvdata(&ssif_info->client->dev, NULL);
1941 goto out;
1942 }
This email is a free service from the Smatch-CI project [smatch.sf.net].
regards,
dan carpenter
|
|
From: Corey M. <co...@mi...> - 2026-04-20 18:11:46
|
On Mon, Apr 20, 2026 at 11:33:01AM -0500, Corey Minyard wrote: > On Sun, Apr 19, 2026 at 09:50:38PM +0100, Matt Fleming wrote: > > On Fri, Apr 17, 2026 at 06:53:55PM -0500, Corey Minyard wrote: > > > > > > The EVENT_MSG_BUFFER_FULL flag only gets cleared when a unsuccessful > > > READ_EVENT_MSG_BUFFER command completes. Getting data from the > > > BMC has higher priority than sending data to the BMC. > > > > > > If the BMC continually reports success from READ_EVENT_MSG_BUFFER, then > > > that would certainly wedge the driver. But it would have to continually > > > report success for that command, which would be strange as its supposed > > > to error out when the queue is empty. > > > > That does indeed appear to be what's happening. > > > > The implementation of intel-ipmi-oem's OpenBMC READ_EVENT_MSG_BUFFER > > handler does not fail when there is nothing to read, > > > > https://github.com/openbmc/intel-ipmi-oem/blob/master/src/bridgingcommands.cpp#L704 > > Actually, that is so clearly wrong that it's hard to imagine they made > this mistake. Anyway, a defect needs to be filed against it. It will > certainly break other software on other OSes. > > I think I have an easy workaround, though I'm guessing. I'm guessing > they are returning zero data bytes. There's no check on the size at that > point in the code (it's later). > > Can you try the following patch? Actually ignore that, I was thinking too much about the other stuff and didn't actually read my patch. Here's a working patch: diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 4a9e9de4d684..08c208cc64c5 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -630,7 +630,13 @@ static void handle_transaction_done(struct smi_info *smi_info) */ msg = smi_info->curr_msg; smi_info->curr_msg = NULL; - if (msg->rsp[2] != 0) { + /* + * It appears some BMCs, with no event data, return no + * data in the message and not a 0x80 error as the + * spec says they should. Shut down processing if + * the data is not the right length. + */ + if (msg->rsp[2] != 0 || msg->rsp_size != 19) { /* Error getting event, probably done. */ msg->done(msg); > > diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c > index 4a9e9de4d684..cf8674a93af1 100644 > --- a/drivers/char/ipmi/ipmi_si_intf.c > +++ b/drivers/char/ipmi/ipmi_si_intf.c > @@ -630,7 +630,13 @@ static void handle_transaction_done(struct smi_info *smi_info) > */ > msg = smi_info->curr_msg; > smi_info->curr_msg = NULL; > - if (msg->rsp[2] != 0) { > + /* > + * It appears some BMCs, with no event data, return no > + * data in the message and not a 0x80 error as the > + * spec says they should. Shut down processing if > + * the data is not the right length. > + */ > + if (msg->rsp[2] != 0 || smi_info->curr_msg->rsp_size != 19) { > /* Error getting event, probably done. */ > msg->done(msg); > > > With your approval on that, I'll send it to Linus after letting it sit > in the next tree for a bit. Actually, I'll probably add it in any case, > as it's a good check to do. > > > > > > If it's really something like that, I could also look at adding limits > > > for those operations. > > > > That would be great. Me and Fred would be happy to test out any patch. > > > > I still think the original patch I sent is a worthwhile defense. > > > > Our periodic monitoring scripts cause TASK_UNINTERRUPTIBLE tasks to > > block behind one another when we hit these kinds of issues in the IPMI > > code. Untangling that across thousands of machines can be time > > consuming and a more explicit EIO or ETIMEDOUT would help with triage. > > Unfortunately, that might have other issues, similar to the ones the > people with the watchdog issue found. I'll look at it a bit, but those > sorts of things would have to be scattered all over the code, not just > in that one place. As you say, it would make debugging easier. > > I think adding a counter to the number of operations occuring from a > single flag fetch would be a way to avoid this issue. That's going to > take a little more time, but I'll definitely work on that. > > Thanks, > > -corey |
|
From: Corey M. <co...@mi...> - 2026-04-20 16:33:18
|
On Sun, Apr 19, 2026 at 09:50:38PM +0100, Matt Fleming wrote: > On Fri, Apr 17, 2026 at 06:53:55PM -0500, Corey Minyard wrote: > > > > The EVENT_MSG_BUFFER_FULL flag only gets cleared when a unsuccessful > > READ_EVENT_MSG_BUFFER command completes. Getting data from the > > BMC has higher priority than sending data to the BMC. > > > > If the BMC continually reports success from READ_EVENT_MSG_BUFFER, then > > that would certainly wedge the driver. But it would have to continually > > report success for that command, which would be strange as its supposed > > to error out when the queue is empty. > > That does indeed appear to be what's happening. > > The implementation of intel-ipmi-oem's OpenBMC READ_EVENT_MSG_BUFFER > handler does not fail when there is nothing to read, > > https://github.com/openbmc/intel-ipmi-oem/blob/master/src/bridgingcommands.cpp#L704 Actually, that is so clearly wrong that it's hard to imagine they made this mistake. Anyway, a defect needs to be filed against it. It will certainly break other software on other OSes. I think I have an easy workaround, though I'm guessing. I'm guessing they are returning zero data bytes. There's no check on the size at that point in the code (it's later). Can you try the following patch? diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 4a9e9de4d684..cf8674a93af1 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -630,7 +630,13 @@ static void handle_transaction_done(struct smi_info *smi_info) */ msg = smi_info->curr_msg; smi_info->curr_msg = NULL; - if (msg->rsp[2] != 0) { + /* + * It appears some BMCs, with no event data, return no + * data in the message and not a 0x80 error as the + * spec says they should. Shut down processing if + * the data is not the right length. + */ + if (msg->rsp[2] != 0 || smi_info->curr_msg->rsp_size != 19) { /* Error getting event, probably done. */ msg->done(msg); With your approval on that, I'll send it to Linus after letting it sit in the next tree for a bit. Actually, I'll probably add it in any case, as it's a good check to do. > > > If it's really something like that, I could also look at adding limits > > for those operations. > > That would be great. Me and Fred would be happy to test out any patch. > > I still think the original patch I sent is a worthwhile defense. > > Our periodic monitoring scripts cause TASK_UNINTERRUPTIBLE tasks to > block behind one another when we hit these kinds of issues in the IPMI > code. Untangling that across thousands of machines can be time > consuming and a more explicit EIO or ETIMEDOUT would help with triage. Unfortunately, that might have other issues, similar to the ones the people with the watchdog issue found. I'll look at it a bit, but those sorts of things would have to be scattered all over the code, not just in that one place. As you say, it would make debugging easier. I think adding a counter to the number of operations occuring from a single flag fetch would be a way to avoid this issue. That's going to take a little more time, but I'll definitely work on that. Thanks, -corey |
|
From: Sasha L. <sa...@ke...> - 2026-04-20 13:31:05
|
From: Jian Zhang <zha...@by...>
[ Upstream commit 7fc3e2546cf3fa9a28a2acc92a512c779a8e5038 ]
The response timer can stay armed across device teardown. If it fires after
remove, the callback dereferences the SSIF context and the i2c client after
teardown has started.
Cancel the timer in remove so the callback cannot run after the device is
unregistered.
Signed-off-by: Jian Zhang <zha...@by...>
Message-ID: <202...@by...>
Signed-off-by: Corey Minyard <co...@mi...>
Signed-off-by: Sasha Levin <sa...@ke...>
---
LLM Generated explanations, may be completely bogus:
Now I have enough information to do the full analysis.
---
## Phase 1: Commit Message Forensics
**Step 1.1 — Subject parsing**
Record: Subsystem `ipmi: ssif_bmc:`; action verb `cancel` (equivalent to
"fix/prevent"); summary: cancel response timer in remove path so a
queued timer callback cannot dereference a torn-down context.
**Step 1.2 — Commit message tags**
Record:
- `Signed-off-by: Jian Zhang <zha...@by...>` (author)
- `Message-ID: <202...@by...>`
— lore msgid
- `Signed-off-by: Corey Minyard <co...@mi...>` — IPMI subsystem
maintainer applied it
- No `Fixes:` tag, no `Cc: stable`, no `Reported-by:`, no `Reviewed-by:`
/ `Acked-by:` in the applied version (absence of Fixes/Cc stable is
expected in this pipeline and not a negative signal)
- Related patches 2/5, 3/5, 4/5 in the same series DO carry `Fixes:
dd2bc5cc9e25 ("ipmi: ssif_bmc: Add SSIF BMC driver")`
**Step 1.3 — Body analysis**
Record: Bug description is explicit — "response timer can stay armed
across device teardown"; failure mode is "the callback dereferences the
SSIF context and the i2c client after teardown has started". This is a
classic **use-after-free** scenario. The author names the exact
mechanism (pending timer firing after `ssif_bmc_remove()`).
**Step 1.4 — Hidden bug detection**
Record: Not hidden. Clearly presented as a UAF-prevention fix even
without the word "fix" in the subject. The phrase "so the callback
cannot run after the device is unregistered" is unambiguous fix
language.
## Phase 2: Diff Analysis
**Step 2.1 — Inventory**
Record: 1 file (`drivers/char/ipmi/ssif_bmc.c`), +1/-0 lines, inside
`ssif_bmc_remove()`. Single-file surgical fix.
**Step 2.2 — Flow change**
Record: Before: `ssif_bmc_remove()` calls `i2c_slave_unregister()` then
`misc_deregister()` and returns, leaving `response_timer` possibly
pending. After: `timer_delete_sync(&ssif_bmc->response_timer)` is called
first, which both cancels a pending timer and waits for any in-flight
callback on another CPU to finish before continuing.
**Step 2.3 — Bug mechanism**
Record: Category (d) memory safety / (b) synchronization with teardown.
Root cause:
- `ssif_bmc` is `devm_kzalloc`'d at probe (line 809), so it is freed by
devres AFTER `.remove` returns.
- `response_timer` is armed via `mod_timer(&ssif_bmc->response_timer,
jiffies + 500 ms)` inside `handle_request()` (line 335).
- `response_timeout()` callback dereferences `ssif_bmc`
(`timer_container_of`), takes `ssif_bmc->lock` and touches several
fields (lines 300–315).
- Without `timer_delete_sync()` in remove, the timer can fire after
remove returns and after devres frees `ssif_bmc`, producing a UAF on
`ssif_bmc` and on `ssif_bmc->client`. On module unload the callback
address itself may also be in freed module text.
**Step 2.4 — Fix quality**
Record: 1 line, the canonical pattern (`timer_delete_sync()` in remove
for a driver-owned timer). V1 of the patch used the non-sync variant;
review led to v2 using `timer_delete_sync()`, which is the correct
choice because it also waits for a concurrent callback on another CPU.
Zero-initialized `timer_list` (never armed because `handle_request`
never ran) is safely handled by `timer_delete_sync()` —
`timer_pending()` returns false on a zeroed list entry. No regression
risk.
## Phase 3: Git History Investigation
**Step 3.1 — Blame**
Record: `git blame -L 820,870` shows `ssif_bmc_remove` untouched since
`dd2bc5cc9e2555` (Quan Nguyen, 2022-10-04), which first appears in
**v6.2**. So the bug has been present in every release from v6.2 onward.
**Step 3.2 — Fixes: tag follow-up**
Record: No explicit `Fixes:` tag on this patch, but companion patches
2/5, 3/5, 4/5 all point to `dd2bc5cc9e25`. The same commit introduces
both the timer and the buggy `ssif_bmc_remove()`. `dd2bc5cc9e25` is
present in stable branches from 6.6.y onward (verified via `git show
pending-6.6:drivers/char/ipmi/ssif_bmc.c`) but NOT in 6.1.y (`git show
pending-6.1:drivers/char/ipmi/ssif_bmc.c` reports the file does not
exist — driver was added after 6.1 branched).
**Step 3.3 — File history**
Record: Recent file commits: 41cb08555c416 (treewide
`timer_container_of()` rename, v6.16), 8fa7292fee5c5 (treewide
`timer_delete[_sync]()` rename, v6.15), plus prior IPMI fixes. No
prerequisites for this patch. It is self-contained.
**Step 3.4 — Author context**
Record: Jian Zhang has multiple prior kernel contributions (NCSI, MCTP,
i2c-aspeed, etc.). Not the maintainer, but the patch was applied by
`co...@mi...`, the IPMI maintainer, and was discussed with Quan
Nguyen, the original driver author.
**Step 3.5 — Dependencies**
Record: None. A single `timer_delete_sync()` call inside `.remove` does
not rely on any other patch in the series. For stable branches below
v6.15, the API is `del_timer_sync()` — a trivial rename that the stable
maintainers routinely handle as a backport adjustment.
## Phase 4: Mailing List Research
**Step 4.1 — Original submission**
Record: `b4 dig -c 7fc3e2546cf3f` found the thread at `https://lore.kern
el.org/all/202...@by.../`. This
is patch **1/5** in a 5-patch series of IPMI SSIF BMC fixes.
**`b4 dig -a` — revisions**: v1 (2026-04-02) and v2 (2026-04-03). v2
changelog: "use `timer_delete_sync()` to cancel the timer" — review
feedback upgraded v1's non-sync delete to the sync variant.
In-thread discussion from Corey Minyard (reply to v2 1/5): "Thanks for
the updates on this. I have the new version in my tree." — explicit
maintainer acceptance.
**Step 4.2 — Recipients**
Record: CC list on v2 was Corey Minyard (IPMI maintainer), Quan Nguyen
(original author of the driver / MAINTAINERS entry), openipmi-developer
list, linux-kernel. Correct audience; reviewed by the right people.
**Step 4.3 — Bug report**
Record: No syzbot / bugzilla / user report cited. The bug was apparently
found through code review or internal testing at Bytedance. Unverified
whether there is a field occurrence.
**Step 4.4 — Series context**
Record: Siblings in the same series (all for
`drivers/char/ipmi/ssif_bmc.c`, all with `Fixes: dd2bc5cc9e25`):
copy_to_user partial failure fix (2/5), message desynchronization after
truncated response (3/5), log-level change (4/5), and a kunit test
(5/5). This patch is independent and applies on its own.
**Step 4.5 — Stable-list discussion**
Record: None visible. Anubis anti-bot blocked direct lore searches, but
the archived mbox I pulled via `b4 dig -m` contained no stable-list
cross-post or stable nomination.
## Phase 5: Code Semantic Analysis
**Step 5.1 — Modified function**: `ssif_bmc_remove()`.
**Step 5.2 — Callers**
Record: `ssif_bmc_remove` is assigned to `i2c_driver.remove` and invoked
by the i2c bus core when the device is unbound (rmmod, device unbind via
sysfs, or driver removal).
**Step 5.3 — Callees/related**
Record: The fix target is `response_timer`. Arming site is
`handle_request()` (`mod_timer(... RESPONSE_TIMEOUT=500 ms)`). Callback
is `response_timeout()` which locks `ssif_bmc->lock` and touches
`ssif_bmc->busy`, `response_timer_inited`, `aborting` — all of which are
inside a `devm_kzalloc`'d struct.
**Step 5.4 — Reachability**
Record: Triggered every time a SSIF IPMI request is handled from the
host side; the window to teardown can be up to 500 ms per request.
Remove path is reached when the module is unloaded or the i2c device is
unbound — a real, not theoretical, code path in BMC firmware
environments (OpenBMC) that allow driver rebinding.
**Step 5.5 — Similar patterns**
Record: `timer_delete_sync()` in driver `.remove` is the standard idiom
for any driver-owned timer with a pointer to devres-managed state. The
same pattern is also added in the kunit test (patch 5/5) for correctness
of the test fixture.
## Phase 6: Stable-tree Analysis
**Step 6.1 — Exists in stable?**
Record: Buggy `ssif_bmc_remove()` exists identically in 6.6.y, 6.12.y,
6.15.y, 6.16.y and later (verified by `git show
pending-6.X:drivers/char/ipmi/ssif_bmc.c`). Not present in 6.1.y (driver
added after 6.1 branched).
**Step 6.2 — Backport complications**
Record: Mainline (7.0) uses `timer_delete_sync()`; stable branches <
v6.15 expose it as `del_timer_sync()`. This is a trivial rename that
stable maintainers handle routinely. Otherwise the patch applies cleanly
— the 3 surrounding lines (`struct ssif_bmc_ctx *ssif_bmc = ...;
i2c_slave_unregister(client); misc_deregister(&ssif_bmc->miscdev);`) are
identical in all stable branches I checked.
**Step 6.3 — Prior stable fixes**
Record: No earlier fix for this UAF window has been applied to any
stable branch I checked.
## Phase 7: Subsystem Context
**Step 7.1 — Criticality**: `drivers/char/ipmi/ssif_bmc.c` — PERIPHERAL
(BMC-side SSIF; used on Linux-running BMCs, e.g. OpenBMC). Real users,
narrow hardware scope.
**Step 7.2 — Activity**: Low-frequency but active; Corey Minyard
actively maintains; Jian Zhang's 5-patch series is a recent hardening
round.
## Phase 8: Impact and Risk
**Step 8.1 — Affected users**: Users of Linux BMC firmware that exposes
SSIF to a host CPU (OpenBMC et al.).
**Step 8.2 — Trigger**: Requires device unbind/rmmod while a timer from
a recent IPMI request is still armed (≤500 ms window per request). Not
routinely triggered at runtime on production BMCs that never unbind, but
reachable on development / test / field-service paths and by privileged
userspace actions.
**Step 8.3 — Failure mode**: UAF on `ssif_bmc` (and potentially on
`client`), plus potential call into freed module text on module unload.
Severity: HIGH — potential kernel crash / memory corruption; the
callback also takes a spinlock on the freed struct.
**Step 8.4 — Risk/benefit**
Record: Benefit = eliminates a real UAF window on device removal in a
driver present in many live stable trees. Risk = near-zero; one line,
canonical idiom, no control-flow change, safe on a never-initialized
`timer_list`. Ratio is strongly favorable.
## Phase 9: Synthesis
**Evidence for**: Classic UAF-prevention pattern; minimal 1-line diff;
obviously correct; review feedback already incorporated (v1 → v2
upgraded non-sync to sync); IPMI maintainer applied it; original driver
author was on CC; self-contained (no dependencies); bug present since
driver introduction (v6.2).
**Evidence against**: No explicit `Fixes:` tag on this specific patch
(though companion patches in the series tag dd2bc5cc9e25); no
syzbot/user report cited; backport to < 6.15 needs `del_timer_sync()`
rename.
**Stable-rules checklist**
1. Obviously correct and tested — YES (1 line, canonical, in maintainer
tree)
2. Real bug — YES (UAF on timer callback after teardown)
3. Important issue — YES (UAF / possible kernel panic)
4. Small and contained — YES (+1/-0 line, single function)
5. No new features/APIs — YES (adds a cleanup call only)
6. Applies to stable — YES, with a trivial API rename for < v6.15
**Exception categories**: Not applicable; stands on bug-fix merit alone.
**Decision**: YES. Small UAF fix in a never-canceled timer on device
teardown, accepted by the subsystem maintainer, safe and trivial to
backport to all stable trees that carry the driver (6.6.y and later).
## Verification
- [Phase 1] Parsed subject / body / tags directly from the commit and
companion series — confirmed no `Fixes:`/`Cc: stable`, but sibling
patches in series carry `Fixes: dd2bc5cc9e25`.
- [Phase 2] Read `ssif_bmc.c` lines 77–100 (struct fields including
`response_timer`), 200–228 (`timer_delete()` in write path), 298–336
(`response_timeout` callback + `mod_timer` arming site), 804–848
(probe/remove) — confirmed `ssif_bmc` is `devm_kzalloc`'d (line 809)
and that `response_timeout` dereferences fields inside it.
- [Phase 3] `git blame -L 820,870 drivers/char/ipmi/ssif_bmc.c` — buggy
`ssif_bmc_remove` unchanged since `dd2bc5cc9e2555`.
- [Phase 3] `git show --stat dd2bc5cc9e255` — initial driver add (Oct
2022).
- [Phase 3] `git tag --contains dd2bc5cc9e255` — earliest release is
**v6.2**.
- [Phase 3] `git tag --contains 8fa7292fee5c5` — `timer_delete_sync()`
rename lands in v6.15.
- [Phase 3] `git tag --contains 41cb08555c416` — `timer_container_of()`
rename lands in v6.16.
- [Phase 4] `b4 dig -c 7fc3e2546cf3f` — matched series at `lore.kernel.o
rg/all/202...@by.../`.
- [Phase 4] `b4 dig -c 7fc3e2546cf3f -a` — two revisions (v1 and v2);
applied version is the latest (v2).
- [Phase 4] `b4 dig -m /tmp/ssif_thread.mbx` — full thread saved; Corey
Minyard acknowledges acceptance: "I have the new version in my tree."
v2 changelog: "use timer_delete_sync() to cancel the timer".
- [Phase 5] `Grep response_timer` in the file — arming site in
`handle_request`, callback `response_timeout` dereferences `ssif_bmc`
(line 300), uses `ssif_bmc->lock`.
- [Phase 6] `git show pending-6.6:drivers/char/ipmi/ssif_bmc.c` and
`pending-6.12`, `pending-6.15`, `pending-6.16` — confirmed identical
buggy `ssif_bmc_remove()` in all of them; confirmed older trees use
`del_timer_sync()` / `from_timer()`.
- [Phase 6] `git show pending-6.1:drivers/char/ipmi/ssif_bmc.c` — file
does not exist (driver added after 6.1).
- [Phase 8] Failure mode: timer callback on freed devres memory → UAF /
possible panic on module unload (inferred from code inspection of the
callback plus `devm_kzalloc` semantics).
- UNVERIFIED: no reachable syzbot/bugzilla/user report cited; impact in
field is plausible but I did not locate a concrete crash report.
- UNVERIFIED: did not retrieve v1 body directly (Anubis blocked lore
fetch), but v2 changelog and the maintainer reply unambiguously
document the v1 → v2 change.
**YES**
drivers/char/ipmi/ssif_bmc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/char/ipmi/ssif_bmc.c b/drivers/char/ipmi/ssif_bmc.c
index 7a52e3ea49ed8..dc1d5bb4a4604 100644
--- a/drivers/char/ipmi/ssif_bmc.c
+++ b/drivers/char/ipmi/ssif_bmc.c
@@ -843,6 +843,7 @@ static void ssif_bmc_remove(struct i2c_client *client)
{
struct ssif_bmc_ctx *ssif_bmc = i2c_get_clientdata(client);
+ timer_delete_sync(&ssif_bmc->response_timer);
i2c_slave_unregister(client);
misc_deregister(&ssif_bmc->miscdev);
}
--
2.53.0
|
|
From: Matt F. <ma...@re...> - 2026-04-19 20:50:47
|
On Fri, Apr 17, 2026 at 06:53:55PM -0500, Corey Minyard wrote: > > The EVENT_MSG_BUFFER_FULL flag only gets cleared when a unsuccessful > READ_EVENT_MSG_BUFFER command completes. Getting data from the > BMC has higher priority than sending data to the BMC. > > If the BMC continually reports success from READ_EVENT_MSG_BUFFER, then > that would certainly wedge the driver. But it would have to continually > report success for that command, which would be strange as its supposed > to error out when the queue is empty. That does indeed appear to be what's happening. The implementation of intel-ipmi-oem's OpenBMC READ_EVENT_MSG_BUFFER handler does not fail when there is nothing to read, https://github.com/openbmc/intel-ipmi-oem/blob/master/src/bridgingcommands.cpp#L704 > If it's really something like that, I could also look at adding limits > for those operations. That would be great. Me and Fred would be happy to test out any patch. I still think the original patch I sent is a worthwhile defense. Our periodic monitoring scripts cause TASK_UNINTERRUPTIBLE tasks to block behind one another when we hit these kinds of issues in the IPMI code. Untangling that across thousands of machines can be time consuming and a more explicit EIO or ETIMEDOUT would help with triage. |
|
From: <pr-...@ke...> - 2026-04-18 18:30:03
|
The pull request you sent on Fri, 17 Apr 2026 20:24:34 -0500: > https://github.com/cminyard/linux-ipmi.git tags/for-linus-7.1-1 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/1e769656963e0329b91d32ec76955e077966b603 Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html |
|
From: Corey M. <co...@mi...> - 2026-04-18 01:24:51
|
The following changes since commit af4e9ef3d78420feb8fe58cd9a1ab80c501b3c08: uaccess: Fix scoped_user_read_access() for 'pointer to const' (2026-03-02 09:24:32 -0800) are available in the Git repository at: https://github.com/cminyard/linux-ipmi.git tags/for-linus-7.1-1 for you to fetch changes up to 75c486cb1bcaa1a3ec3a6438498176a3a4998ae4: ipmi:ssif: Clean up kthread on errors (2026-04-17 06:47:40 -0500) ---------------------------------------------------------------- ipmi: Small updates and fixes Mostly fixes to the BMC software. Fix one issue in the host side driver where a kthread can be left running on a specific memory allocation failre at probe time. Replace system_wq with system_percpu_wq so system_wq can eventually go away. ---------------------------------------------------------------- Corey Minyard (2): ipmi:ssif: Remove unnecessary indention ipmi:ssif: Clean up kthread on errors Jian Zhang (6): ipmi: ssif_bmc: cancel response timer on remove ipmi: ssif_bmc: fix missing check for copy_to_user() partial failure ipmi: ssif_bmc: fix message desynchronization after truncated response ipmi: ssif_bmc: change log level to dbg in irq callback ipmi: ssif_bmc: add unit test for state machine ipmi: ssif_bmc: Fix KUnit test link failure when KUNIT=m Marco Crivellari (1): ipmi: Replace use of system_wq with system_percpu_wq drivers/char/ipmi/Kconfig | 10 + drivers/char/ipmi/ipmi_msghandler.c | 10 +- drivers/char/ipmi/ipmi_ssif.c | 41 ++-- drivers/char/ipmi/ssif_bmc.c | 405 +++++++++++++++++++++++++++++++++++- 4 files changed, 435 insertions(+), 31 deletions(-) |
|
From: Corey M. <co...@mi...> - 2026-04-17 23:54:11
|
On Fri, Apr 17, 2026 at 11:23:03PM +0100, Matt Fleming wrote: > On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote: > > > > The lower level driver should never not return an answer, it is supposed > > to guarantee that it returns an error if the BMC doesn't respond. > > > > So the bug is not here, the bug is elsewhere. My guess is that there > > is some new failure mode where a BMC is not working but it responds well > > enough that it sort of works and fools the driver. But that's only a > > guess. > > I can now reproduce this pretty reliably by running concurrent > ipmitool commands (sensor/sel/mc info) + sysfs readers + periodic > ipmitool mc reset cold. It wedges in a few minutes. Hmm. If you are sending cold resets, then the driver is going into reset maintenance mode and it should be rejecting messages for 30 seconds after you send that command. You can disable that by changing is_maintenance_mode_cmd() in ipmi_msghandler.c to always return false. > > My working theory is handle_flags() in ipmi_si_intf.c can loop on > flag-driven commands (e.g. READ_EVENT_MSG_BUFFER) without ever calling > start_next_msg(), starving waiting_msg indefinitely. > > Captured state at wedge: > > si_state=SI_GETTING_EVENTS msg_flags=0x02 > si_curr cycling cmd=0x35 (READ_EVENT_MSG_BUFFER) > si_wait frozen cmd=0x08 (GET_DEVICE_GUID, never promoted) > > The cold reset makes the BMC report EVENT_MSG_BUFFER_FULL during > re-init, which drives the flag loop. The EVENT_MSG_BUFFER_FULL flag only gets cleared when a unsuccessful READ_EVENT_MSG_BUFFER command completes. Getting data from the BMC has higher priority than sending data to the BMC. If the BMC continually reports success from READ_EVENT_MSG_BUFFER, then that would certainly wedge the driver. But it would have to continually report success for that command, which would be strange as its supposed to error out when the queue is empty. If it's really something like that, I could also look at adding limits for those operations. To debug things like this I often add module_params that let me see what is going on. But you can look at the "invalid_events" counter to see if the data is bogus. Or there should be an "Event queue full, discarding incoming events" log coming out once at the beginning of when this happens. -corey > > Thanks, > Matt |
|
From: Matt F. <ma...@re...> - 2026-04-17 22:23:17
|
On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote: > > The lower level driver should never not return an answer, it is supposed > to guarantee that it returns an error if the BMC doesn't respond. > > So the bug is not here, the bug is elsewhere. My guess is that there > is some new failure mode where a BMC is not working but it responds well > enough that it sort of works and fools the driver. But that's only a > guess. I can now reproduce this pretty reliably by running concurrent ipmitool commands (sensor/sel/mc info) + sysfs readers + periodic ipmitool mc reset cold. It wedges in a few minutes. My working theory is handle_flags() in ipmi_si_intf.c can loop on flag-driven commands (e.g. READ_EVENT_MSG_BUFFER) without ever calling start_next_msg(), starving waiting_msg indefinitely. Captured state at wedge: si_state=SI_GETTING_EVENTS msg_flags=0x02 si_curr cycling cmd=0x35 (READ_EVENT_MSG_BUFFER) si_wait frozen cmd=0x08 (GET_DEVICE_GUID, never promoted) The cold reset makes the BMC report EVENT_MSG_BUFFER_FULL during re-init, which drives the flag loop. Thanks, Matt |
|
From: Matt F. <ma...@re...> - 2026-04-17 16:11:08
|
On Wed, Apr 15, 2026 at 07:16:53AM -0500, Corey Minyard wrote:
>
> I've seen this before in several scenarios, including a system that put
> IPMI in the ACPI tree and it sort of worked but there was no BMC
> present. I had to disable that particular device.
>
> What hardware is involved here?
I'm fairly sure we've seen this across a bunch of different BMCs, so
it's not a BMC hardware thing. Almost certainly a driver issue.
> Can you give a more detailed example of what's happening in the
> low-level hardware? If it's KCS there's a debug flag in the
> drivers/char/ipmi/ipmi_kcs_sm.c file that should help.
Yep, it's KCS. Unfortunately I haven't found a way to reproduce this
reliably yet.
Looking at a wedged machine (cat /sys/class/ipmi/.../firmware_revision)
with drgn I can see that there's 99,846 messages sat on intf->xmit_msgs
and the KCS SM is idle (it's responding to internal traffic like Get
Global Enables and Get Msg Flags). So it looks like completions are
getting dropped.
We're running a 6.18.18 kernel which includes c08ec55617cb ("ipmi: Fix
use-after-free and list corruption on sender error"), so it's not that.
Here's a dump of some of the data structures.
intf = 0xffff9d2f4a5a0000
intf->curr_msg = 0xffff9d34f21a9c00
intf->xmit_msgs.next = 0xffff9d30c28e3c80
intf->waiting_rcv_msgs = empty
intf->maintenance_mode = 0
intf->maintenance_mode_state = 0
intf->in_shutdown = false
intf->seq_table = 0/64 slots used
intf->smi_work.pending = 0
The stuck message itself — intf->curr_msg:
msg @ 0xffff9d34f21a9c00
.data = { 0x18, 0x01 } # NetFn 0x06 (App), cmd 0x01 = Get Device ID
.data_size = 2
.rsp_size = 38
.rsp[0..7] = 2c 01 00 00 ...
.done = free_smi_msg
.user_data = NULL
.msgid = (internal GDI poll)
.type = IPMI_SMI_MSG_TYPE_NORMAL
smi_info = 0xffff9d2f4a010000
smi_info->si_state = SI_NORMAL (0)
smi_info->curr_msg = 0xffff9d2f48c7b800
smi_info->waiting_msg = NULL
smi_info->interrupt_disabled = false
smi_info->supports_event_msg_buff = true
smi_info->io.irq = 0
smi_info->run_to_completion = false
smi_info->in_maintenance_mode = 0
Let me know if you want any other info. I'll try to trace this and
catch it reproducing.
|
|
From: Matt F. <ma...@re...> - 2026-04-17 16:09:54
|
On Thu, Apr 16, 2026 at 10:28:50AM -0400, Tony Camuso wrote: > > In my testing with updates from the Linus tree, after a BMC cold reset: > 1. The KCS driver returned -EBUSY to callers (good) > 2. The watchdog daemon received the error and initiated shutdown > 3. No D-state hang > > My tests, conducted on a Dell PER640, verified that Corey's upstream fixes > cause the driver to properly return errors instead of blocking. > At least on that platform. > > Which hich low-level driver are you using (KCS, BT, SSIF)? > The PER640 uses KCS. > # cat /sys/class/ipmi/ipmi0/device/params 2>/dev/null > kcs,i/o,0xca8,rsp=4,rsi=1,rsh=0,irq=10,ipmb=32 $ cat /sys/class/ipmi/ipmi0/device/params kcs,i/o,0xca2,rsp=1,rsi=1,rsh=0,irq=0,ipmb=32 attentions 3 complete_transactions 7080342 events 3 flag_fetches 0 hosed_count 1 idles 25359147 incoming_messages 0 interrupts 0 long_timeouts 264790 short_timeouts 13723711 watchdog_pretimeouts 0 > Actually, no. The 54 commits I backported simply bring my RHEL-9 test kernel > to parity with the Linus tree, which includes [2] and ... > cae66f1a1dcd 2026-02-13 co...@mi... ipmi:si: Fix check for a misbehaving BMC Ah, I see we have some machines on v6.18.20 which includes this and they're still triggering this problem. |
|
From: Tony C. <tc...@re...> - 2026-04-16 14:29:05
|
On 4/15/26 5:22 PM, Frederick Lawler wrote: > Hi Corey & Tony, > > On Wed, Apr 15, 2026 at 11:46:27AM -0400, 'Tony Camuso' via kernel-team wrote: >> On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote: >>> From: Matt Fleming <mfl...@cl...> >>> >>> When the BMC does not respond to a "Get Device ID" command, the >>> wait_event() in __get_device_id() blocks forever in >>> TASK_UNINTERRUPTIBLE while holding bmc->dyn_mutex. Every subsequent >>> sysfs reader then piles up in D state. Replace with >>> wait_event_timeout() to return -EIO after 1 second. >> >> On Wed, Apr 15, 2026 at 12:17:04PM, Corey Minyard wrote: >>> This is the second report I have of something like this. So >>> something is up. I'm adding Tony, who reported something like this >>> dealing with the watchdog. >>> >>> The lower level driver should never not return an answer, it is >>> supposed to guarantee that it returns an error if the BMC doesn't >>> respond. So the bug is not here, the bug is elsewhere. > > This is a bit of a throwback to our previous discussions around [1]. > > I did end up applying [2] based on that discussion, and had limited > success, but we still have external resets that cause us to enter > this undesirable state :( In my testing with updates from the Linus tree, after a BMC cold reset: 1. The KCS driver returned -EBUSY to callers (good) 2. The watchdog daemon received the error and initiated shutdown 3. No D-state hang My tests, conducted on a Dell PER640, verified that Corey's upstream fixes cause the driver to properly return errors instead of blocking. At least on that platform. Which hich low-level driver are you using (KCS, BT, SSIF)? The PER640 uses KCS. # cat /sys/class/ipmi/ipmi0/device/params 2>/dev/null kcs,i/o,0xca8,rsp=4,rsi=1,rsh=0,irq=10,ipmb=32 > [1]: https://lore.kernel.org/all/aJU...@ma.../ > [2]: https://lore.kernel.org/all/202...@mi.../ >> >> I've been tracking a related issue (RHEL customer case) where BMC >> reset while the IPMI watchdog is active causes D-state hangs. This >> appears to be the same root cause Matt is hitting. >> >> I backported the recent upstream KCS/SI fixes to a RHEL 9 test kernel >> (54 patches bringing it to mainline parity) and tested today on a >> Dell R640. > > I assume this patch series: "ipmi:watchdog: Fix panic, D-state hang, and > lost protection on BMC reset" [3]? > > [3]: https://lore.kernel.org/all/202...@re.../ > Actually, no. The 54 commits I backported simply bring my RHEL-9 test kernel to parity with the Linus tree, which includes [2] and ... cae66f1a1dcd 2026-02-13 co...@mi... ipmi:si: Fix check for a misbehaving BMC |
|
From: Frederick L. <fr...@cl...> - 2026-04-15 22:19:27
|
Hi Corey & Tony, On Wed, Apr 15, 2026 at 11:46:27AM -0400, 'Tony Camuso' via kernel-team wrote: > On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote: > > From: Matt Fleming <mfl...@cl...> > > > > When the BMC does not respond to a "Get Device ID" command, the > > wait_event() in __get_device_id() blocks forever in > > TASK_UNINTERRUPTIBLE while holding bmc->dyn_mutex. Every subsequent > > sysfs reader then piles up in D state. Replace with > > wait_event_timeout() to return -EIO after 1 second. > > On Wed, Apr 15, 2026 at 12:17:04PM, Corey Minyard wrote: > > This is the second report I have of something like this. So > > something is up. I'm adding Tony, who reported something like this > > dealing with the watchdog. > > > > The lower level driver should never not return an answer, it is > > supposed to guarantee that it returns an error if the BMC doesn't > > respond. So the bug is not here, the bug is elsewhere. This is a bit of a throwback to our previous discussions around [1]. I did end up applying [2] based on that discussion, and had limited success, but we still have external resets that cause us to enter this undesirable state :( [1]: https://lore.kernel.org/all/aJU...@ma.../ [2]: https://lore.kernel.org/all/202...@mi.../ > > I've been tracking a related issue (RHEL customer case) where BMC > reset while the IPMI watchdog is active causes D-state hangs. This > appears to be the same root cause Matt is hitting. > > I backported the recent upstream KCS/SI fixes to a RHEL 9 test kernel > (54 patches bringing it to mainline parity) and tested today on a > Dell R640. I assume this patch series: "ipmi:watchdog: Fix panic, D-state hang, and lost protection on BMC reset" [3]? [3]: https://lore.kernel.org/all/202...@re.../ |
|
From: Tony C. <tc...@re...> - 2026-04-15 15:46:45
|
On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote: > From: Matt Fleming <mfl...@cl...> > > When the BMC does not respond to a "Get Device ID" command, the > wait_event() in __get_device_id() blocks forever in > TASK_UNINTERRUPTIBLE while holding bmc->dyn_mutex. Every subsequent > sysfs reader then piles up in D state. Replace with > wait_event_timeout() to return -EIO after 1 second. On Wed, Apr 15, 2026 at 12:17:04PM, Corey Minyard wrote: > This is the second report I have of something like this. So > something is up. I'm adding Tony, who reported something like this > dealing with the watchdog. > > The lower level driver should never not return an answer, it is > supposed to guarantee that it returns an error if the BMC doesn't > respond. So the bug is not here, the bug is elsewhere. I've been tracking a related issue (RHEL customer case) where BMC reset while the IPMI watchdog is active causes D-state hangs. This appears to be the same root cause Matt is hitting. I backported the recent upstream KCS/SI fixes to a RHEL 9 test kernel (54 patches bringing it to mainline parity) and tested today on a Dell R640. Test: Trigger `ipmitool mc reset cold` while watchdog daemon is running. Results with backported fixes: [ 245.376402] IPMI Watchdog: heartbeat completion received [ 246.376392] IPMI Watchdog: heartbeat send failure: -16 [ 247.377560] IPMI Watchdog: heartbeat send failure: -16 ... [ 252.413240] IPMI Watchdog: set timeout error: -16 The watchdog daemon received error 16 (EBUSY) and eventually initiated orderly shutdown: write watchdog device gave error 16 = 'Device or resource busy' shutting down the system because of error 16 Key finding: With the upstream fixes, the driver returns -EBUSY instead of blocking forever. No D-state hang. The watchdog daemon handles the error and initiates orderly reboot. Note: There was still a delay of several minutes before the daemon timed out and triggered shutdown. The driver returned errors promptly, but the watchdog daemon's retry logic (error retry time-out = 120 seconds) extended the overall recovery time. This may warrant a separate look at whether the daemon's retry behavior is appropriate when the BMC is completely unresponsive. This confirms Corey's assessment - the bug is in the lower-level driver not returning errors, not in __get_device_id(). Matt's timeout patch would be a defensive fallback, but the real fix is ensuring KCS/SI properly returns errors when the BMC is unresponsive. Tony |
|
From: Matt F. <ma...@re...> - 2026-04-15 12:28:59
|
From: Matt Fleming <mfl...@cl...>
When the BMC does not respond to a "Get Device ID" command, the
wait_event() in __get_device_id() blocks forever in TASK_UNINTERRUPTIBLE
while holding bmc->dyn_mutex. Every subsequent sysfs reader then piles
up in D state. Replace with wait_event_timeout() to return -EIO after 1
second.
Signed-off-by: Matt Fleming <ma...@re...>
---
drivers/char/ipmi/ipmi_msghandler.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index c41f51c82edd..efa9588e8210 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -2599,7 +2599,13 @@ static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc)
if (rv)
goto out_reset_handler;
- wait_event(intf->waitq, bmc->dyn_id_set != 2);
+ if (!wait_event_timeout(intf->waitq, bmc->dyn_id_set != 2,
+ msecs_to_jiffies(1000))) {
+ dev_warn(intf->si_dev,
+ "Timed out waiting for get bmc device id response\n");
+ rv = -EIO;
+ goto out_reset_handler;
+ }
if (!bmc->dyn_id_set) {
if (bmc->cc != IPMI_CC_NO_ERROR &&
--
2.43.0
|
|
From: Corey M. <co...@mi...> - 2026-04-15 12:17:04
|
On Wed, Apr 15, 2026 at 12:59:30PM +0100, Matt Fleming wrote:
> From: Matt Fleming <mfl...@cl...>
>
> When the BMC does not respond to a "Get Device ID" command, the
> wait_event() in __get_device_id() blocks forever in TASK_UNINTERRUPTIBLE
> while holding bmc->dyn_mutex. Every subsequent sysfs reader then piles
> up in D state. Replace with wait_event_timeout() to return -EIO after 1
> second.
This is the second report I have of something like this. So something
is up. I'm adding Tony, who reported something like this dealing with
the watchdog.
The lower level driver should never not return an answer, it is supposed
to guarantee that it returns an error if the BMC doesn't respond.
So the bug is not here, the bug is elsewhere. My guess is that there
is some new failure mode where a BMC is not working but it responds well
enough that it sort of works and fools the driver. But that's only a
guess.
I've seen this before in several scenarios, including a system that put
IPMI in the ACPI tree and it sort of worked but there was no BMC
present. I had to disable that particular device.
What hardware is involved here?
Can you give a more detailed example of what's happening in the
low-level hardware? If it's KCS there's a debug flag in the
drivers/char/ipmi/ipmi_kcs_sm.c file that should help.
Thanks,
-corey
>
> Signed-off-by: Matt Fleming <ma...@re...>
> ---
> drivers/char/ipmi/ipmi_msghandler.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index c41f51c82edd..efa9588e8210 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -2599,7 +2599,13 @@ static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc)
> if (rv)
> goto out_reset_handler;
>
> - wait_event(intf->waitq, bmc->dyn_id_set != 2);
> + if (!wait_event_timeout(intf->waitq, bmc->dyn_id_set != 2,
> + msecs_to_jiffies(1000))) {
> + dev_warn(intf->si_dev,
> + "Timed out waiting for get bmc device id response\n");
> + rv = -EIO;
> + goto out_reset_handler;
> + }
>
> if (!bmc->dyn_id_set) {
> if (bmc->cc != IPMI_CC_NO_ERROR &&
> --
> 2.43.0
>
|
|
From: Tony C. <tc...@re...> - 2026-04-09 14:33:20
|
On 4/7/2026 5:54 PM, Corey Minyard wrote:
> On Tue, Apr 07, 2026 at 01:51:32PM -0400, Tony Camuso wrote:
>> When the BMC resets while the IPMI watchdog is active, the driver has
>> three failure modes depending on timing:
>>
>> 1. list_add double add panic -- the watchdog daemon retries while the
>> static smi_msg/recv_msg structures are still queued in the IPMI
>> layer from the previous (unanswered) request.
>
> I'm trying to make sense of this. Are you sure this didn't start
> happening after you added a timeout on the wait_for_completion()?
> Otherwise it would never return, the mutex would be held, and no new
> message could be added.
>
> Just timing out in wait_for_completion() there could cause all kinds of
> bad things to happen.
>
You're right. This work was done on a RHEL 9 kernel that did not yet have
your recent upstream KCS/SI fixes applied, so some of the behavior I
observed may have been caused/influenced by bugs you've more recently
addressed.
>>
>> 2. D-state hang -- wait_for_completion() blocks indefinitely because
>> the BMC never delivers a response.
>
> This is an issue. The lower level driver is *always* supposed to return
> a failure. Something else needs to be fixed.
>
> I have seen several creative ways in which BMCs "fail to respond" that
> have confused the lower level drivers. If my guess is correct, there's
> a bug in the low level driver that's causing it to not time out the
> message.
>
> If we don't fix this, it will cause other issues outside the watchdog.
>
Agreed -- the D-state hang is a symptom, not the root cause. If the
KCS driver correctly transitions through error recovery to
SI_SM_HOSED, and the SI layer returns an error completion to the
caller, then wait_for_completion() should never block indefinitely.
To get to the bottom of this, I've instrumented three layers:
- ipmi_kcs_sm.c: trace entry into start_error_recovery() and the
transition to KCS_HOSED after MAX_ERROR_RETRIES
- ipmi_si_intf.c: trace return_hosed_msg(), the SI_SM_HOSED
handler in smi_event_handler(), and HOSED recovery in
smi_timeout()
- ipmi_watchdog.c: trace message send/completion in
_ipmi_set_timeout() and __ipmi_heartbeat(), and the completion
code received in ipmi_wdog_msg_handler()
I've applied your recent upstream patches to my test kernel, so the
KCS/SI code is congruent with current mainline. The traces will show
whether the error recovery chain works correctly with your fixes in
place, or whether the BMC is doing something that still confuses the
low-level driver.
I'll collect the data and follow up.
>>
>> 3. Silent loss of watchdog protection -- the BMC returns a non-zero
>> completion code, the driver's internal state becomes inconsistent,
>> writes to /dev/watchdog return -EINVAL, and the daemon gives up.
>> The system continues running without hardware watchdog coverage.
>
> Again, are you sure this didn't start happening after you added the
> timeout?
>
I think this one is pre-existing, independent of any timeout
changes. When the BMC comes back after a reset and returns a
non-zero completion code (e.g. 0xD5 or 0xFF), the watchdog handler
treats this as a permanent failure. The userspace daemon sees
-EINVAL on subsequent writes to /dev/watchdog and stops retrying.
The system continues running without hardware watchdog coverage,
with no indication to the administrator.
But I need to confirm this with the instrumented traces on the
the patched kernel.
I should have traces collected within the next week or so.
Tony
>>
>> All three stem from the same root cause: the static message structures
>> and unbounded completion waits were never designed for a BMC that
>> disappears mid-transaction.
>
> All that is supposed to be protected by a mutex. That mutex is claimed
> on all IPMI watchdog operations, and it shouldn't be released until all
> resources have been freed. Anything that violates that is asking for
> trouble.
>
> You don't mention the lower level interface (KCS, BT, SMIC, SSIF) but I
> think we need to start looking there.
>
> It may be that the timeouts on the watchdog messages need to be
> adjusted. The whole IPMI driver was designed on the presumption that
> the BMC would go away for only a short period of time (5-10 seconds) and
> not permanantly. That has slowly been fixed over time, but things might
> need to be adjusted in the watchdog.
>
> -corey
>
>>>> This has been independently reported by Kenta Akagi on a Dell PowerEdge
>> R640 running 6.18.7, also triggered by a BMC reset with the watchdog
>> active:
>>
>> https://sourceforge.net/p/openipmi/mailman/message/59292850/
>>
>> The fix takes a simple, deterministic approach: detect the failure via
>> BMC error response, guard against structure reuse (msg_in_flight) and
>> indefinite waits (completion timeout), then initiate orderly_reboot()
>> when the watchdog is active. This produces the same outcome the
>> hardware watchdog would have -- a system reset -- but through a
>> controlled path with clear logging and no panics or hangs.
>>
>> If the watchdog is stopped when the BMC resets, no reboot occurs and
>> the system continues normally.
>>
>> Tested on Dell PowerEdge R640 with kernel 5.14 (RHEL 9) and verified
>> against mainline (both patches apply cleanly).
>>
>> Corey Minyard's recent fix for list corruption in smi_work()
>> (ipmi_msghandler.c) addresses a related but separate code path. The
>> watchdog driver's own static structure reuse requires this fix.
>>
>> Tony Camuso (2):
>> ipmi:watchdog: Reboot cleanly on BMC reset
>> Documentation: ipmi: Update BMC reset behavior for watchdog
>>
>> Documentation/driver-api/ipmi.rst | 61 ++++++++++++++++++
>> drivers/char/ipmi/ipmi_watchdog.c | 101 ++++++++++++++++++++++++------
>> 2 files changed, 144 insertions(+), 18 deletions(-)
>>
>> --
>> 2.53.0
>>
>
|
|
From: Corey M. <co...@mi...> - 2026-04-07 21:54:59
|
On Tue, Apr 07, 2026 at 01:51:32PM -0400, Tony Camuso wrote: > When the BMC resets while the IPMI watchdog is active, the driver has > three failure modes depending on timing: > > 1. list_add double add panic -- the watchdog daemon retries while the > static smi_msg/recv_msg structures are still queued in the IPMI > layer from the previous (unanswered) request. I'm trying to make sense of this. Are you sure this didn't start happening after you added a timeout on the wait_for_completion()? Otherwise it would never return, the mutex would be held, and no new message could be added. Just timing out in wait_for_completion() there could cause all kinds of bad things to happen. > > 2. D-state hang -- wait_for_completion() blocks indefinitely because > the BMC never delivers a response. This is an issue. The lower level driver is *always* supposed to return a failure. Something else needs to be fixed. I have seen several creative ways in which BMCs "fail to respond" that have confused the lower level drivers. If my guess is correct, there's a bug in the low level driver that's causing it to not time out the message. If we don't fix this, it will cause other issues outside the watchdog. > > 3. Silent loss of watchdog protection -- the BMC returns a non-zero > completion code, the driver's internal state becomes inconsistent, > writes to /dev/watchdog return -EINVAL, and the daemon gives up. > The system continues running without hardware watchdog coverage. Again, are you sure this didn't start happening after you added the timeout? > > All three stem from the same root cause: the static message structures > and unbounded completion waits were never designed for a BMC that > disappears mid-transaction. All that is supposed to be protected by a mutex. That mutex is claimed on all IPMI watchdog operations, and it shouldn't be released until all resources have been freed. Anything that violates that is asking for trouble. You don't mention the lower level interface (KCS, BT, SMIC, SSIF) but I think we need to start looking there. It may be that the timeouts on the watchdog messages need to be adjusted. The whole IPMI driver was designed on the presumption that the BMC would go away for only a short period of time (5-10 seconds) and not permanantly. That has slowly been fixed over time, but things might need to be adjusted in the watchdog. -corey > > This has been independently reported by Kenta Akagi on a Dell PowerEdge > R640 running 6.18.7, also triggered by a BMC reset with the watchdog > active: > > https://sourceforge.net/p/openipmi/mailman/message/59292850/ > > The fix takes a simple, deterministic approach: detect the failure via > BMC error response, guard against structure reuse (msg_in_flight) and > indefinite waits (completion timeout), then initiate orderly_reboot() > when the watchdog is active. This produces the same outcome the > hardware watchdog would have -- a system reset -- but through a > controlled path with clear logging and no panics or hangs. > > If the watchdog is stopped when the BMC resets, no reboot occurs and > the system continues normally. > > Tested on Dell PowerEdge R640 with kernel 5.14 (RHEL 9) and verified > against mainline (both patches apply cleanly). > > Corey Minyard's recent fix for list corruption in smi_work() > (ipmi_msghandler.c) addresses a related but separate code path. The > watchdog driver's own static structure reuse requires this fix. > > Tony Camuso (2): > ipmi:watchdog: Reboot cleanly on BMC reset > Documentation: ipmi: Update BMC reset behavior for watchdog > > Documentation/driver-api/ipmi.rst | 61 ++++++++++++++++++ > drivers/char/ipmi/ipmi_watchdog.c | 101 ++++++++++++++++++++++++------ > 2 files changed, 144 insertions(+), 18 deletions(-) > > -- > 2.53.0 > |