openipmi-developer Mailing List for Open IPMI

Brought to you by: cminyard

openipmi-developer — Information for development of IPMI code and drivers

Re: [Openipmi-developer] [RFC] Patches to disable messages during BMC reset

From: Corey M. <co...@mi...> - 2025-08-16 02:02:50

On Fri, Aug 15, 2025 at 04:23:08PM -0500, Frederick Lawler wrote:
> Hi Corey,
> 
> On Thu, Aug 07, 2025 at 06:02:31PM -0500, Corey Minyard wrote:
> > I went ahead and did some patches for this, since it was on my mind.
> > 
> > With these, if a reset is sent to the BMC, the driver will disable
> > messages to the BMC for a time, defaulting to 30 seconds.  Don't
> > modify message timing, since no messages are allowed, anyway.
> > 
> > If a firmware update command is sent to the BMC, then just reject
> > sysfs commands that query the BMC.  Modify message timing and
> > allow direct messages through the driver interface.
> > 
> > Hopefully this will work around the problem, and it's a good idea,
> > anyway.
> > 
> > -corey
> > 
> 
> Thanks for the patches, and sorry for the delay in response.
> It's one of _those weeks_. Anyway, I backported the patch series
> to 6.12, and the changes seem reasonable to me overall. Ran it
> through our infra on a single node, and nothing seemed to break.
> 
> I did observe with testing that resetting BMC via ipmitool on the host
> did kick out sysfs reads as expected.

Ok, I took the liberty of adding a "Tested-by" line with your name.  If
that's not ok, I can pull it out.

> 
> Resetting the BMC remotely was not handled (this seems obvious given that
> the state changes are handled via the ipmi_msg handler). Would the BMC send
> an event to the kernel letting it know it's resetting so that case could be
> handled?

Unfortunately not.  It's one of the many things that would be nice to
have...

In general, dealing with a BMC being reset is a real pain.  They tend to
do all kinds of different things.  The worst is when they sort of act
like they are operational, but then do strange things.

I haven't thought of a good general purpose way to handle this.  I'm
toying with the idea of making it so if the BMC gets an error, just shut
things down for a second or so and then test it to see if it's working.
During this time just return errors, like the new patches do during
reset.
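
The idea in the paragraph above (go quiet for a short time after an error, return errors meanwhile, then test the BMC before resuming) can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct bmc_gate`, `gate_note_error()`, `gate_try_send()` and the `GATE_*` codes are hypothetical names, the caller supplies the clock and the probe result, and none of it is taken from the driver or the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the driver's code. */
enum gate_state { GATE_OPEN, GATE_QUIET };

#define GATE_OK     0
#define GATE_EBUSY  (-1)  /* stand-in for the error handed back to callers */

struct bmc_gate {
	enum gate_state state;
	uint64_t quiet_until_ms;  /* end of the current quiet period */
	uint64_t quiet_ms;        /* length of a quiet period */
};

static void gate_init(struct bmc_gate *g, uint64_t quiet_ms)
{
	assert(quiet_ms > 0);
	g->state = GATE_OPEN;
	g->quiet_until_ms = 0;
	g->quiet_ms = quiet_ms;
}

/* Record a BMC error: start (or restart) the quiet period. */
static void gate_note_error(struct bmc_gate *g, uint64_t now_ms)
{
	g->state = GATE_QUIET;
	g->quiet_until_ms = now_ms + g->quiet_ms;
}

/*
 * Decide whether a message may go out at now_ms.  probe_ok is the
 * result of a caller-supplied "is the BMC working?" test; it is only
 * consulted once the quiet period has expired.
 */
static int gate_try_send(struct bmc_gate *g, uint64_t now_ms, bool probe_ok)
{
	if (g->state == GATE_OPEN)
		return GATE_OK;
	if (now_ms < g->quiet_until_ms)
		return GATE_EBUSY;           /* still quiet: fail fast */
	if (!probe_ok) {
		gate_note_error(g, now_ms);  /* still broken: wait again */
		return GATE_EBUSY;
	}
	g->state = GATE_OPEN;
	return GATE_OK;
}
```

With a 1000 ms quiet period and an error noted at t=100, a send at t=500 is rejected, a failed probe at t=1100 extends the quiet period to t=2100, and a successful probe at t=2100 reopens the gate.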

Thanks for testing these.

-corey

> 
> Best,
> Fred

Re: [Openipmi-developer] [RFC] Patches to disable messages during BMC reset

From: Frederick L. <fr...@cl...> - 2025-08-15 21:23:17

Hi Corey,

On Thu, Aug 07, 2025 at 06:02:31PM -0500, Corey Minyard wrote:
> I went ahead and did some patches for this, since it was on my mind.
> 
> With these, if a reset is sent to the BMC, the driver will disable
> messages to the BMC for a time, defaulting to 30 seconds.  Don't
> modify message timing, since no messages are allowed, anyway.
> 
> If a firmware update command is sent to the BMC, then just reject
> sysfs commands that query the BMC.  Modify message timing and
> allow direct messages through the driver interface.
> 
> Hopefully this will work around the problem, and it's a good idea,
> anyway.
> 
> -corey
> 

Thanks for the patches, and sorry for the delay in response.
It's one of _those weeks_. Anyway, I backported the patch series
to 6.12, and the changes seem reasonable to me overall. Ran it
through our infra on a single node, and nothing seemed to break.

I did observe with testing that resetting BMC via ipmitool on the host
did kick out sysfs reads as expected.

Resetting the BMC remotely was not handled (this seems obvious given that
the state changes are handled via the ipmi_msg handler). Would the BMC send
an event to the kernel letting it know it's resetting so that case could be
handled?

Best,
Fred

Re: [Openipmi-developer] [TEST PATCH] ipmi:si: Delay when an error is discovered in error recovery

From: Corey M. <co...@mi...> - 2025-08-14 18:09:48

On Thu, Aug 14, 2025 at 06:23:23PM +0100, Mark Bannister wrote:
> > > Thanks for the bug report and debugging info.  I think I know what is
> > > going on, I've attached a patch that should hopefully fix it.
> > > Basically, it looks like the BMC is alive enough that it sort of
> > > responds to the host, but not alive enough to actually complete a
> > > transaction.  The driver needs to not immediately retry in that case, it
> > > needs to delay a bit.
> > >
> > > It passes all my tests, but the situation you are in would be hard to
> > > manufacture for me.
> > >
> > > Can you try this patch?
> >
> > Thanks for the super quick response, I'll try out this patch and report
> > back my findings.
> >
> > Best regards
> > Mark
> 
> The patch looks good.  Without the patch I was able to reproduce the
> problem on kernels 6.6 and 6.12 (but not 6.1) after 5-20 attempts of
> running 'ipmitool mc reset cold' every 2 minutes.  With the patch, I have
> run it 50 times without incident.

Perfect, I'll queue it for the next kernel release.  I can get it into
the current release if it's urgent.

The change that caused this was c608966f3f9c "ipmi: fix msg stack when
IPMI is disconnected" and it came in between 6.1 and 6.6.  I'm adding
the author of that patch because this change may affect that.

In hindsight I think the fix that caused this is wrong.  I'm not sure
how the situation the author described could actually happen.  There's
a limit of 100 messages per user.  I am inclined right now to revert
that change.

> The hosed counter isn't as much of an
> indicator as I thought, I saw it in the tens of thousands with and without
> the patch, I have also seen it in the hundreds of thousands without the
> patch and on other hardware I have seen it reach 5 million in one hour
> without the patch (but also without incident).

Yeah, that's just a count of how many issues it has with the BMC.  You
will still see it go up.

-corey

> 
> We will incorporate your patch into our builds so that we avoid hitting
> this problem in production again.
> 
> Best regards
> Mark

Re: [Openipmi-developer] [TEST PATCH] ipmi:si: Delay when an error is discovered in error recovery

From: Mark B. <mba...@ja...> - 2025-08-14 17:23:42

> > Thanks for the bug report and debugging info.  I think I know what is
> > going on, I've attached a patch that should hopefully fix it.
> > Basically, it looks like the BMC is alive enough that it sort of
> > responds to the host, but not alive enough to actually complete a
> > transaction.  The driver needs to not immediately retry in that case, it
> > needs to delay a bit.
> >
> > It passes all my tests, but the situation you are in would be hard to
> > manufacture for me.
> >
> > Can you try this patch?
>
> Thanks for the super quick response, I'll try out this patch and report
> back my findings.
>
> Best regards
> Mark

The patch looks good.  Without the patch I was able to reproduce the
problem on kernels 6.6 and 6.12 (but not 6.1) after 5-20 attempts of
running 'ipmitool mc reset cold' every 2 minutes.  With the patch, I have
run it 50 times without incident.  The hosed counter isn't as much of an
indicator as I thought, I saw it in the tens of thousands with and without
the patch, I have also seen it in the hundreds of thousands without the
patch and on other hardware I have seen it reach 5 million in one hour
without the patch (but also without incident).

We will incorporate your patch into our builds so that we avoid hitting
this problem in production again.

Best regards
Mark

Re: [Openipmi-developer] [TEST PATCH] ipmi:si: Delay when an error is discovered in error recovery

From: Mark B. <mba...@ja...> - 2025-08-14 14:21:19

> Thanks for the bug report and debugging info.  I think I know what is
> going on, I've attached a patch that should hopefully fix it.
> Basically, it looks like the BMC is alive enough that it sort of
> responds to the host, but not alive enough to actually complete a
> transaction.  The driver needs to not immediately retry in that case, it
> needs to delay a bit.
>
> It passes all my tests, but the situation you are in would be hard to
> manufacture for me.
>
> Can you try this patch?

Thanks for the super quick response, I'll try out this patch and report
back my findings.

Best regards
Mark

[Openipmi-developer] [TEST PATCH] ipmi:si: Delay when an error is discovered in error recovery

From: Corey M. <co...@mi...> - 2025-08-14 14:00:24

If the BMC is in a state where it is partially responding but not really
there, the driver could go into an infinite loop trying error recovery
over and over.

The device should eventually come back, but we don't want to be
continually retrying.  Add a delay between retries.

Signed-off-by: Corey Minyard <co...@mi...>
---
 drivers/char/ipmi/ipmi_kcs_sm.c  | 4 ++--
 drivers/char/ipmi/ipmi_si_intf.c | 9 +++++++--
 2 files changed, 9 insertions(+), 4 deletions(-)

Thanks for the bug report and debugging info.  I think I know what is
going on, I've attached a patch that should hopefully fix it.
Basically, it looks like the BMC is alive enough that it sort of
responds to the host, but not alive enough to actually complete a
transaction.  The driver needs to not immediately retry in that case, it
needs to delay a bit.

It passes all my tests, but the situation you are in would be hard to
manufacture for me.

Can you try this patch?

-corey


diff --git a/drivers/char/ipmi/ipmi_kcs_sm.c b/drivers/char/ipmi/ipmi_kcs_sm.c
index ecfcb50302f6..20f3611c5444 100644
--- a/drivers/char/ipmi/ipmi_kcs_sm.c
+++ b/drivers/char/ipmi/ipmi_kcs_sm.c
@@ -467,7 +467,7 @@ static enum si_sm_result kcs_event(struct si_sm_data *kcs, long time)
 		if (state != KCS_READ_STATE) {
 			start_error_recovery(kcs,
 					     "Not in read state for error2");
-			break;
+			return SI_SM_CALL_WITH_TICK_DELAY;
 		}
 		if (!check_obf(kcs, status, time))
 			return SI_SM_CALL_WITH_DELAY;
@@ -481,7 +481,7 @@ static enum si_sm_result kcs_event(struct si_sm_data *kcs, long time)
 		if (state != KCS_IDLE_STATE) {
 			start_error_recovery(kcs,
 					     "Not in idle state for error3");
-			break;
+			return SI_SM_CALL_WITH_TICK_DELAY;
 		}
 
 		if (!check_obf(kcs, status, time))
diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index 8b5524069c15..3f4747ae5ddb 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -790,7 +790,10 @@ static enum si_sm_result smi_event_handler(struct smi_info *smi_info,
 			 */
 			return_hosed_msg(smi_info, IPMI_ERR_UNSPECIFIED);
 		}
-		goto restart;
+		/*
+		 * If the device isn't working, we want a delay before
+		 * trying again.
+		 */
 	}
 
 	/*
@@ -888,15 +891,17 @@ static void flush_messages(void *send_info)
 {
 	struct smi_info *smi_info = send_info;
 	enum si_sm_result result;
+	int loops_left = 10000; /* Don't try forever. */
 
 	/*
 	 * Currently, this function is called only in run-to-completion
 	 * mode.  This means we are single-threaded, no need for locks.
 	 */
 	result = smi_event_handler(smi_info, 0);
-	while (result != SI_SM_IDLE) {
+	while (result != SI_SM_IDLE && loops_left > 0) {
 		udelay(SI_SHORT_TIMEOUT_USEC);
 		result = smi_event_handler(smi_info, SI_SHORT_TIMEOUT_USEC);
+		loops_left--;
 	}
 }
 
-- 
2.43.0
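
The `loops_left` change to `flush_messages()` in the patch above follows a common pattern: poll until idle, but give up after a fixed number of attempts. A minimal user-space sketch of that pattern follows. The fake device, `fake_event()` and `flush_bounded()` are hypothetical stand-ins, and the per-iteration `udelay()` from the driver is left out so the model runs instantly.

```c
#include <assert.h>

/* Simplified stand-in for the driver's result codes (hypothetical). */
enum sm_result { SM_IDLE, SM_BUSY };

/* A fake device that reports busy for busy_polls polls, then idle. */
struct fake_dev {
	long busy_polls;
	long polls_seen;
};

static enum sm_result fake_event(struct fake_dev *d)
{
	d->polls_seen++;
	if (d->busy_polls > 0) {
		d->busy_polls--;
		return SM_BUSY;
	}
	return SM_IDLE;
}

/*
 * Poll until the device is idle or the loop budget is used up.
 * Returns 1 if idle was reached, 0 if we gave up.
 */
static int flush_bounded(struct fake_dev *d, int max_loops)
{
	int loops_left = max_loops;  /* don't try forever */
	enum sm_result result;

	assert(max_loops >= 0);
	result = fake_event(d);
	while (result != SM_IDLE && loops_left > 0) {
		/* The driver delays here between polls; omitted in this model. */
		result = fake_event(d);
		loops_left--;
	}
	return result == SM_IDLE;
}
```

In this model a device that stays busy for 5 polls is flushed after 6 calls to `fake_event()`, while a device that never goes idle is abandoned after 1 + `max_loops` calls instead of spinning forever.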

[Openipmi-developer] [BUG] ipmi_si: watchdog: hard LOCKUP in smi_event_handler/kcs_event

From: Mark B. <mba...@ja...> - 2025-08-14 09:16:23

Hi Corey

I crashed a machine on 1st August after issuing 'ipmitool mc reset cold' to
reset a BMC.  I got a crash dump from this event which I have been
analyzing.  The crash occurred when the NMI watchdog detected a hard LOCKUP
in an interrupt handler:

[144482.968722] CPU: 1 PID: 96220 Comm: process-finder Kdump: loaded Tainted: G        W  O       6.6.93-1.el8.x86_64 #1
[144482.968724] RIP: 0010:port_outb+0x13/0x20 [ipmi_si]
[144482.968735] Code: 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 66 0f af 77 18 89 d0 0f b7 57 28 01 f2 ee <c3> cc cc cc cc 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
[144482.968736] RSP: 0018:ff626798c007ce50 EFLAGS: 00000002
[144482.968737] RAX: 0000000000000000 RBX: ff2e8eaa120b1c00 RCX: ff2e8ee87e860640
[144482.968738] RDX: 0000000000000ca2 RSI: 0000000000000000 RDI: ff2e8ee98e8c0840
[144482.968738] RBP: 0000000000000001 R08: ff2e8ee87e860668 R09: ff626798c007cf08
[144482.968739] R10: 0000000000000006 R11: 000000000000044d R12: 0000000000000000
[144482.968739] R13: ff2e8ee98e8c0800 R14: ffffffffc27ad210 R15: ff626798c007cf00
[144482.968740] FS:  00007fffe8bff700(0000) GS:ff2e8ee87e840000(0000) knlGS:0000000000000000
[144482.968740] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[144482.968741] CR2: 00007ffff7ceb528 CR3: 000000047de9e001 CR4: 0000000000771ee0
[144482.968742] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[144482.968742] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[144482.968743] PKRU: 55555554
[144482.968743] Call Trace:
[144482.968745]  <IRQ>
[144482.968746]  kcs_event+0x253/0x960 [ipmi_si]
[144482.968751]  smi_event_handler+0x5b/0x280 [ipmi_si]
[144482.968756]  smi_timeout+0x3b/0xc0 [ipmi_si]
[144482.968760]  ? __pfx_smi_timeout+0x10/0x10 [ipmi_si]
[144482.968764]  call_timer_fn+0x24/0x130
[144482.968769]  __run_timers.part.0+0x1d8/0x280
[144482.968771]  ? enqueue_hrtimer+0x35/0x90
[144482.968772]  ? __hrtimer_run_queues+0x141/0x2b0
[144482.968772]  ? sched_clock+0xc/0x30
[144482.968775]  run_timer_softirq+0x26/0x50
[144482.968776]  handle_softirqs+0xdd/0x2d0
[144482.968779]  irq_exit_rcu+0xa8/0xd0
[144482.968781]  sysvec_apic_timer_interrupt+0x6e/0x90
[144482.968784]  </IRQ>

I was able to reproduce the crash two days ago (12th August) by running
'ipmitool mc reset cold' in a loop with 2 minute sleeps between on
identical test hardware running the same kernel version, although so far
when I have reproduced the crash I have not been able to get another crash
dump.

# c=0; while :; do ((c+=1)); echo $(date) - $c; ipmitool mc reset cold; sleep 120; done
Tue 12 Aug 07:02:28 EDT 2025 - 1
Sent cold reset command to MC
Tue 12 Aug 07:04:28 EDT 2025 - 2
Sent cold reset command to MC
Tue 12 Aug 07:06:28 EDT 2025 - 3
Sent cold reset command to MC
Tue 12 Aug 07:08:28 EDT 2025 - 4
Sent cold reset command to MC
Tue 12 Aug 07:10:28 EDT 2025 - 5
Sent cold reset command to MC
Tue 12 Aug 07:12:28 EDT 2025 - 6
Sent cold reset command to MC
Tue 12 Aug 07:14:28 EDT 2025 - 7
Sent cold reset command to MC
Tue 12 Aug 07:16:28 EDT 2025 - 8
Sent cold reset command to MC
Tue 12 Aug 07:18:28 EDT 2025 - 9
Sent cold reset command to MC
Tue 12 Aug 07:20:28 EDT 2025 - 10
Sent cold reset command to MC
Tue 12 Aug 07:22:28 EDT 2025 - 11
Sent cold reset command to MC
Tue 12 Aug 07:24:28 EDT 2025 - 12
Sent cold reset command to MC
Tue 12 Aug 07:26:28 EDT 2025 - 13
Sent cold reset command to MC
EXIT STATUS 255

I have tried (and so far failed) to reproduce the problem on
kernel 6.1.144-1.el8.x86_64, but admittedly I haven't tried very hard yet
so that might not be a reliable data point.

On the reproducer, I was gathering debug data from the ipmi_si module using
'echo 7 > /sys/module/ipmi_si/parameters/kcs_debug' and was running
'journalctl -f' in a terminal window at the time of the crash, where the
terminal buffer is filled up with thousands of lines like this, which were
produced as the BMC was resetting:

Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: ipmi_kcs_sm: kcs hosed: Not in read state for error2
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: ipmi_kcs_sm: kcs hosed: Not in read state for error2
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: ipmi_kcs_sm: kcs hosed: Not in read state for error2
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: ipmi_kcs_sm: kcs hosed: Not in read state for error2
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 7, c9
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 8, c1
Aug 12 07:27:44 kernel: ipmi_si IPI0001:00: KCS: State = 6, c1

I collected some more debug data from the vmcore file collected on 1st
August:

$ crash --zero_excluded /usr/lib/debug/lib/modules/6.6.93-1.el8.x86_64/vmlinux vmcore
...
crash> mod -s ipmi_si
     MODULE       NAME                              TEXT_BASE         SIZE  OBJECT FILE
ffffffffc27dde80  ipmi_si                        ffffffffc27ab000    86016  /usr/lib/debug/lib/modules/6.6.93-1.el8.x86_64/kernel/drivers/char/ipmi/ipmi_si.ko.debug


crash> struct smi_info 0xff2e8ee98e8c0800
struct smi_info {
  si_num = 0,
  intf = 0xff2e8ee98fbaa000,
  si_sm = 0xff2e8eaa120b1c00,
  handlers = 0xffffffffc27e4240 <kcs_smi_handlers>,
  si_lock = {
    {
      rlock = {
        raw_lock = {
          {
            val = {
              counter = 257
            },
            {
              locked = 1 '\001',
              pending = 1 '\001'
            },
            {
              locked_pending = 257,
              tail = 0
            }
          }
        }
      }
    }
  },
  waiting_msg = 0x0,
  curr_msg = 0x0,
  si_state = SI_NORMAL,
  io = {
    inputb = 0xffffffffc27b1940 <port_inb>,
    outputb = 0xffffffffc27b1970 <port_outb>,
    addr = 0x0,
    regspacing = 1,
    regsize = 1,
    regshift = 0,
    addr_space = IPMI_IO_ADDR_SPACE,
    addr_data = 3234,
    addr_source = SI_ACPI,
    addr_info = {
      acpi_info = {
        acpi_handle = 0xff2e8ee9891e2f30
      }
    },
    io_setup = 0xffffffffc27b1ac0 <ipmi_si_port_setup>,
    io_cleanup = 0xffffffffc27b1a60 <port_cleanup>,
    io_size = 2,
    irq = 0,
    irq_setup = 0x0,
    irq_handler_data = 0x0,
    irq_cleanup = 0x0,
    slave_addr = 32 ' ',
    si_type = SI_KCS,
    dev = 0xff2e8ee98ac6c010
  },
  oem_data_avail_handler = 0x0,
  msg_flags = 0 '\000',
  has_event_buffer = false,
  req_events = {
    counter = 0
  },
  run_to_completion = false,
  si_timer = {
    entry = {
      next = 0xdead000000000122,
      pprev = 0x0
    },
    expires = 4439136548,
    function = 0xffffffffc27ad210 <smi_timeout>,
    flags = 155189249
  },
  timer_can_start = true,
  timer_running = true,
  last_timeout_jiffies = 4439136547,
  need_watch = {
    counter = 0
  },
  interrupt_disabled = true,
  supports_event_msg_buff = false,
  cannot_disable_irq = false,
  irq_enable_broken = false,
  in_maintenance_mode = true,
  got_attn = false,
  device_id = {
    device_id = 32 ' ',
    device_revision = 2 '\002',
    firmware_revision_1 = 1 '\001',
    firmware_revision_2 = 0 '\000',
    ipmi_version = 2 '\002',
    additional_device_support = 191 '\277',
    manufacturer_id = 10876,
    product_id = 7496,
    aux_firmware_revision = "!\001\000 ",
    aux_firmware_revision_set = 1
  },
  dev_group_added = true,
  stats = {{
      counter = 13470
    }, {
      counter = 1809
    }, {
      counter = 358202
    }, {
      counter = 0
    }, {
      counter = 0
    }, {
      counter = 0
    }, {
      counter = 24503
    }, {
      counter = 357924
    }, {
      counter = 0
    }, {
      counter = 0
    }, {
      counter = 0
    }},
  thread = 0xff2e8eaa82124100,
  link = {
    next = 0xffffffffc27dd780 <smi_infos>,
    prev = 0xffffffffc27dd780 <smi_infos>
  }
}

crash> struct si_sm_data 0xff2e8eaa120b1c00
struct si_sm_data {
  state = KCS_ERROR1,
  io = 0xff2e8ee98e8c0840,
  write_data = "\030\001\003\001\000\000&\030@\000\000\000\000\000\000\000\330\002\000\000\000\000\000\000\330\002\000\000\000\000\000\000\b\000\000\000\000\000\000\000\003\000\000\000\004\000\000\000\030\003\000\000\000\000\000\000\030\003\000\000\000\000\000\000\030\003\000\000\000\000\000\000\034\000\000\000\000\000\000\000\034\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\004\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\n\000\000\000\000\000\000\000\n\000\000\000\000\000\000\000\020\000\000\000\000\000\000\001\000\000\000\005\000\000\000\000\020\000\000\000\000\000\000\000\020\000\000\000\000\000\000\000\020\000\000\000\000\000\000E\005\000\000\000\000\000\000E\005\000\000\000\000\000\000\000\020\000\000\000\000\000\000\001\000\000\000\004\000\000\000\000
\000\000\000\000\000\000\000 \000\000\000\000\000\000\000
\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000"...,
  write_pos = 0,
  write_count = 0,
  orig_write_count = 0,
  read_data = "\034\002\000 \002\001\000\002\277|*\000H\035!\001\000
\034\000@SDA Temp\000\a-C\374\177\200KF\000\000\006\000\000\000@
-\000\000\000\000\000\000@=\000\000\000\000\000\000@=\000\000\000\000\000\000
\002\000\000\000\000\000\000
\002\000\000\000\000\000\000\b\000\000\000\000\000\000\000\004\000\000\000\004\000\000\000\070\003\000\000\000\000\000\000\070\003\000\000\000\000\000\000\070\003\000\000\000\000\000\000
\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\b\000\000\000\000\000\000\000\004\000\000\000\004\000\000\000X\003\000\000\000\000\000\000X\003\000\000\000\000\000\000X\003\000\000\000\000\000\000D\000\000\000\000\000\000\000D\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000S\345td\004\000\000\000pz\350\320\023u\023\376\070\003\000\000\000\000\000\000\070\003\000\000\000\000\000\000
\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\b\000\000\000\000\000\000\000"...,
  read_pos = 0,
  truncated = 0,
  error_retries = 6,
  ibf_timeout = 5000000,
  obf_timeout = 5000000,
  error0_timeout = 4439151592
}
crash>

From the above it looks like, at the time of the crash, the state machine
was at KCS_ERROR1 (si_sm_data.state) having at that moment in time handled
6 retries (si_sm_data.error_retries), but having a hosed counter of 24,503
(smi_info.stats[6]).
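
To make the `stats[]` positions in the dump easier to read, here is a small lookup sketch. The order of the names is an assumption based on the driver's statistics index enum as I understand it (the `hosed_count` position agrees with the `stats[6]` reading above); check it against `ipmi_si_intf.c` in your own kernel tree before relying on it. `stat_by_name()` is a hypothetical helper, and the values are copied from the `smi_info` dump above.

```c
#include <assert.h>
#include <string.h>

enum { SI_NUM_STATS = 11 };

/* Assumed index order; verify against your kernel's ipmi_si_intf.c. */
static const char *const si_stat_names[SI_NUM_STATS] = {
	"short_timeouts", "long_timeouts", "idles", "interrupts",
	"attentions", "flag_fetches", "hosed_count",
	"complete_transactions", "events", "watchdog_pretimeouts",
	"incoming_messages",
};

/* Counter values copied in order from the smi_info dump. */
static const long dumped_stats[SI_NUM_STATS] = {
	13470, 1809, 358202, 0, 0, 0, 24503, 357924, 0, 0, 0,
};

/* Hypothetical helper: return the dumped counter for a name, or -1. */
static long stat_by_name(const char *name)
{
	assert(name != NULL);
	for (int i = 0; i < SI_NUM_STATS; i++) {
		if (strcmp(si_stat_names[i], name) == 0)
			return dumped_stats[i];
	}
	return -1;
}
```

Under that assumed ordering, `stat_by_name("hosed_count")` returns 24503, matching the figure quoted above.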

Looking in the smi_event_handler code, I wasn't immediately sure whether a
result of SI_SM_HOSED would cause the interrupt handler to keep looping
around and not allow other interrupts to fire, but the symptoms might
suggest that?  Although if that was the case I'm surprised we haven't seen
the problem more often, we have lots of machines.
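
The looping question above can be illustrated with a deliberately simplified user-space model of the two control flows: restarting immediately when the state machine reports a hosed result, versus returning so that the caller's timer spaces out the next attempt. Everything here (`struct fake_bmc`, `sm_event()`, `handler_restart()`, `handler_return()`) is hypothetical and only mirrors the shape of the `goto restart` visible in the test patch, not the real `smi_event_handler()`.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the state machine's result codes. */
enum sm_result { SM_IDLE, SM_HOSED };

/* A fake BMC that looks hosed for hosed_polls_left polls, then idle. */
struct fake_bmc {
	long hosed_polls_left;
	long events;  /* how many times the state machine was run */
};

static enum sm_result sm_event(struct fake_bmc *b)
{
	assert(b != NULL);
	b->events++;
	if (b->hosed_polls_left > 0) {
		b->hosed_polls_left--;
		return SM_HOSED;
	}
	return SM_IDLE;
}

/*
 * Restart-on-hosed flow: one handler call keeps re-running the state
 * machine for as long as the BMC looks hosed.  If the BMC never
 * recovers, this call never returns.
 */
static enum sm_result handler_restart(struct fake_bmc *b)
{
	enum sm_result r;
restart:
	r = sm_event(b);
	if (r == SM_HOSED)
		goto restart;
	return r;
}

/*
 * Return-on-hosed flow: run the state machine once and hand the result
 * back, leaving it to the caller (a timer, in the driver) to retry later.
 */
static enum sm_result handler_return(struct fake_bmc *b)
{
	return sm_event(b);
}
```

For a fake BMC that stays hosed for 1000 polls, a single `handler_restart()` call runs the state machine 1001 times before returning, whereas a single `handler_return()` call runs it once and returns `SM_HOSED`.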

My presumption was that this:

[144482.968724] RIP: 0010:port_outb+0x13/0x20 [ipmi_si]
[144482.968735] Code: 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 66 0f af 77 18 89 d0 0f b7 57 28 01 f2 ee <c3> cc cc cc cc 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90

... as well as the thousands of state transitions I saw when reproducing
the problem with debug output, and the hosed counter being very high in the
vmcore, suggested that it wasn't actually stuck at a ret instruction (c3)
in port_outb, but that's just where RIP was at the point the crash was
taken.

Happy to collect more info from the vmcore as needed or test patches etc.

Best regards
Mark

[Openipmi-developer] [PATCH v9 3/3] ipmi: Add Loongson-2K BMC support

From: Binbin Z. <zho...@lo...> - 2025-08-12 12:00:23

This patch adds Loongson-2K BMC IPMI support.

According to the existing design, we use software simulation to
implement the KCS interface registers: Status/Command/Data_Out/Data_In.

Also, since both the host side and the BMC side read and write the KCS
status, a FIFO flag is used to ensure data consistency.

The single KCS message block is as follows:

+-------------------------------------------------------------------------+
|FIFO flags| KCS register data | CMD data | KCS version | WR REQ | WR ACK |
+-------------------------------------------------------------------------+

Co-developed-by: Chong Qiao <qia...@lo...>
Signed-off-by: Chong Qiao <qia...@lo...>
Reviewed-by: Huacai Chen <che...@lo...>
Acked-by: Corey Minyard <co...@mi...>
Signed-off-by: Binbin Zhou <zho...@lo...>
---
 MAINTAINERS                      |   1 +
 drivers/char/ipmi/Kconfig        |   7 ++
 drivers/char/ipmi/Makefile       |   1 +
 drivers/char/ipmi/ipmi_si.h      |   7 ++
 drivers/char/ipmi/ipmi_si_intf.c |   4 +
 drivers/char/ipmi/ipmi_si_ls2k.c | 189 +++++++++++++++++++++++++++++++
 6 files changed, 209 insertions(+)
 create mode 100644 drivers/char/ipmi/ipmi_si_ls2k.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d50b2c3b2bb8..ce1fdc47e9f3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14210,6 +14210,7 @@ LOONGSON-2K Board Management Controller (BMC) DRIVER
 M:	Binbin Zhou <zho...@lo...>
 M:	Chong Qiao <qia...@lo...>
 S:	Maintained
+F:	drivers/char/ipmi/ipmi_si_ls2k.c
 F:	drivers/mfd/ls2k-bmc-core.c
 
 LOONGSON EDAC DRIVER
diff --git a/drivers/char/ipmi/Kconfig b/drivers/char/ipmi/Kconfig
index f4adc6feb3b2..92bed266d07c 100644
--- a/drivers/char/ipmi/Kconfig
+++ b/drivers/char/ipmi/Kconfig
@@ -84,6 +84,13 @@ config IPMI_IPMB
 	  bus, and it also supports direct messaging on the bus using
 	  IPMB direct messages.  This module requires I2C support.
 
+config IPMI_LS2K
+	bool 'Loongson-2K IPMI interface'
+	depends on LOONGARCH
+	select MFD_LS2K_BMC_CORE
+	help
+	  Provides a driver for Loongson-2K IPMI interfaces.
+
 config IPMI_POWERNV
 	depends on PPC_POWERNV
 	tristate 'POWERNV (OPAL firmware) IPMI interface'
diff --git a/drivers/char/ipmi/Makefile b/drivers/char/ipmi/Makefile
index e0944547c9d0..4ea450a82242 100644
--- a/drivers/char/ipmi/Makefile
+++ b/drivers/char/ipmi/Makefile
@@ -8,6 +8,7 @@ ipmi_si-y := ipmi_si_intf.o ipmi_kcs_sm.o ipmi_smic_sm.o ipmi_bt_sm.o \
 	ipmi_si_mem_io.o
 ipmi_si-$(CONFIG_HAS_IOPORT) += ipmi_si_port_io.o
 ipmi_si-$(CONFIG_PCI) += ipmi_si_pci.o
+ipmi_si-$(CONFIG_IPMI_LS2K) += ipmi_si_ls2k.o
 ipmi_si-$(CONFIG_PARISC) += ipmi_si_parisc.o
 
 obj-$(CONFIG_IPMI_HANDLER) += ipmi_msghandler.o
diff --git a/drivers/char/ipmi/ipmi_si.h b/drivers/char/ipmi/ipmi_si.h
index 508c3fd45877..687835b53da5 100644
--- a/drivers/char/ipmi/ipmi_si.h
+++ b/drivers/char/ipmi/ipmi_si.h
@@ -101,6 +101,13 @@ void ipmi_si_pci_shutdown(void);
 static inline void ipmi_si_pci_init(void) { }
 static inline void ipmi_si_pci_shutdown(void) { }
 #endif
+#ifdef CONFIG_IPMI_LS2K
+void ipmi_si_ls2k_init(void);
+void ipmi_si_ls2k_shutdown(void);
+#else
+static inline void ipmi_si_ls2k_init(void) { }
+static inline void ipmi_si_ls2k_shutdown(void) { }
+#endif
 #ifdef CONFIG_PARISC
 void ipmi_si_parisc_init(void);
 void ipmi_si_parisc_shutdown(void);
diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index bb42dfe1c6a8..9c38aca16fd0 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -2121,6 +2121,8 @@ static int __init init_ipmi_si(void)
 
 	ipmi_si_pci_init();
 
+	ipmi_si_ls2k_init();
+
 	ipmi_si_parisc_init();
 
 	mutex_lock(&smi_infos_lock);
@@ -2335,6 +2337,8 @@ static void cleanup_ipmi_si(void)
 
 	ipmi_si_pci_shutdown();
 
+	ipmi_si_ls2k_shutdown();
+
 	ipmi_si_parisc_shutdown();
 
 	ipmi_si_platform_shutdown();
diff --git a/drivers/char/ipmi/ipmi_si_ls2k.c b/drivers/char/ipmi/ipmi_si_ls2k.c
new file mode 100644
index 000000000000..45442c257efd
--- /dev/null
+++ b/drivers/char/ipmi/ipmi_si_ls2k.c
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Driver for Loongson-2K BMC IPMI interface
+ *
+ * Copyright (C) 2024-2025 Loongson Technology Corporation Limited.
+ *
+ * Authors:
+ *	Chong Qiao <qia...@lo...>
+ *	Binbin Zhou <zho...@lo...>
+ */
+
+#include <linux/bitfield.h>
+#include <linux/ioport.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#include "ipmi_si.h"
+
+#define LS2K_KCS_FIFO_IBFH	0x0
+#define LS2K_KCS_FIFO_IBFT	0x1
+#define LS2K_KCS_FIFO_OBFH	0x2
+#define LS2K_KCS_FIFO_OBFT	0x3
+
+/* KCS registers */
+#define LS2K_KCS_REG_STS	0x4
+#define LS2K_KCS_REG_DATA_OUT	0x5
+#define LS2K_KCS_REG_DATA_IN	0x6
+#define LS2K_KCS_REG_CMD	0x8
+
+#define LS2K_KCS_CMD_DATA	0xa
+#define LS2K_KCS_VERSION	0xb
+#define LS2K_KCS_WR_REQ		0xc
+#define LS2K_KCS_WR_ACK		0x10
+
+#define LS2K_KCS_STS_OBF	BIT(0)
+#define LS2K_KCS_STS_IBF	BIT(1)
+#define LS2K_KCS_STS_SMS_ATN	BIT(2)
+#define LS2K_KCS_STS_CMD	BIT(3)
+
+#define LS2K_KCS_DATA_MASK	(LS2K_KCS_STS_OBF | LS2K_KCS_STS_IBF | LS2K_KCS_STS_CMD)
+
+static bool ls2k_registered;
+
+static unsigned char ls2k_mem_inb_v0(const struct si_sm_io *io, unsigned int offset)
+{
+	void __iomem *addr = io->addr;
+	int reg_offset;
+
+	if (offset & BIT(0)) {
+		reg_offset = LS2K_KCS_REG_STS;
+	} else {
+		writeb(readb(addr + LS2K_KCS_REG_STS) & ~LS2K_KCS_STS_OBF, addr + LS2K_KCS_REG_STS);
+		reg_offset = LS2K_KCS_REG_DATA_OUT;
+	}
+
+	return readb(addr + reg_offset);
+}
+
+static unsigned char ls2k_mem_inb_v1(const struct si_sm_io *io, unsigned int offset)
+{
+	void __iomem *addr = io->addr;
+	unsigned char inb = 0, cmd;
+	bool obf, ibf;
+
+	obf = readb(addr + LS2K_KCS_FIFO_OBFH) ^ readb(addr + LS2K_KCS_FIFO_OBFT);
+	ibf = readb(addr + LS2K_KCS_FIFO_IBFH) ^ readb(addr + LS2K_KCS_FIFO_IBFT);
+	cmd = readb(addr + LS2K_KCS_CMD_DATA);
+
+	if (offset & BIT(0)) {
+		inb = readb(addr + LS2K_KCS_REG_STS) & ~LS2K_KCS_DATA_MASK;
+		inb |= FIELD_PREP(LS2K_KCS_STS_OBF, obf)
+		    | FIELD_PREP(LS2K_KCS_STS_IBF, ibf)
+		    | FIELD_PREP(LS2K_KCS_STS_CMD, cmd);
+	} else {
+		inb = readb(addr + LS2K_KCS_REG_DATA_OUT);
+		writeb(readb(addr + LS2K_KCS_FIFO_OBFH), addr + LS2K_KCS_FIFO_OBFT);
+	}
+
+	return inb;
+}
+
+static void ls2k_mem_outb_v0(const struct si_sm_io *io, unsigned int offset,
+			     unsigned char val)
+{
+	void __iomem *addr = io->addr;
+	unsigned char sts = readb(addr + LS2K_KCS_REG_STS);
+	int reg_offset;
+
+	if (sts & LS2K_KCS_STS_IBF)
+		return;
+
+	if (offset & BIT(0)) {
+		reg_offset = LS2K_KCS_REG_CMD;
+		sts |= LS2K_KCS_STS_CMD;
+	} else {
+		reg_offset = LS2K_KCS_REG_DATA_IN;
+		sts &= ~LS2K_KCS_STS_CMD;
+	}
+
+	writew(val, addr + reg_offset);
+	writeb(sts | LS2K_KCS_STS_IBF, addr + LS2K_KCS_REG_STS);
+	writel(readl(addr + LS2K_KCS_WR_REQ) + 1, addr + LS2K_KCS_WR_REQ);
+}
+
+static void ls2k_mem_outb_v1(const struct si_sm_io *io, unsigned int offset,
+			     unsigned char val)
+{
+	void __iomem *addr = io->addr;
+	unsigned char ibfh, ibft;
+	int reg_offset;
+
+	ibfh = readb(addr + LS2K_KCS_FIFO_IBFH);
+	ibft = readb(addr + LS2K_KCS_FIFO_IBFT);
+
+	if (ibfh ^ ibft)
+		return;
+
+	reg_offset = (offset & BIT(0)) ? LS2K_KCS_REG_CMD : LS2K_KCS_REG_DATA_IN;
+	writew(val, addr + reg_offset);
+
+	writeb(offset & BIT(0), addr + LS2K_KCS_CMD_DATA);
+	writeb(!ibft, addr + LS2K_KCS_FIFO_IBFH);
+	writel(readl(addr + LS2K_KCS_WR_REQ) + 1, addr + LS2K_KCS_WR_REQ);
+}
+
+static void ls2k_mem_cleanup(struct si_sm_io *io)
+{
+	if (io->addr)
+		iounmap(io->addr);
+}
+
+static int ipmi_ls2k_mem_setup(struct si_sm_io *io)
+{
+	unsigned char version;
+
+	io->addr = ioremap(io->addr_data, io->regspacing);
+	if (!io->addr)
+		return -EIO;
+
+	version = readb(io->addr + LS2K_KCS_VERSION);
+
+	io->inputb = version ? ls2k_mem_inb_v1 : ls2k_mem_inb_v0;
+	io->outputb = version ? ls2k_mem_outb_v1 : ls2k_mem_outb_v0;
+	io->io_cleanup = ls2k_mem_cleanup;
+
+	return 0;
+}
+
+static int ipmi_ls2k_probe(struct platform_device *pdev)
+{
+	struct si_sm_io io;
+
+	memset(&io, 0, sizeof(io));
+
+	io.si_info	= &ipmi_kcs_si_info;
+	io.io_setup	= ipmi_ls2k_mem_setup;
+	io.addr_data	= pdev->resource[0].start;
+	io.regspacing	= resource_size(&pdev->resource[0]);
+	io.dev		= &pdev->dev;
+
+	dev_dbg(&pdev->dev, "addr 0x%lx, spacing %d.\n", io.addr_data, io.regspacing);
+
+	return ipmi_si_add_smi(&io);
+}
+
+static void ipmi_ls2k_remove(struct platform_device *pdev)
+{
+	ipmi_si_remove_by_dev(&pdev->dev);
+}
+
+struct platform_driver ipmi_ls2k_platform_driver = {
+	.driver = {
+		.name = "ls2k-ipmi-si",
+	},
+	.probe	= ipmi_ls2k_probe,
+	.remove	= ipmi_ls2k_remove,
+};
+
+void ipmi_si_ls2k_init(void)
+{
+	platform_driver_register(&ipmi_ls2k_platform_driver);
+	ls2k_registered = true;
+}
+
+void ipmi_si_ls2k_shutdown(void)
+{
+	if (ls2k_registered)
+		platform_driver_unregister(&ipmi_ls2k_platform_driver);
+}
-- 
2.47.3
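
The head/tail flag handshake used by the v1 accessors in the patch above
(ls2k_mem_inb_v1/ls2k_mem_outb_v1) can be sketched in plain userspace C.
This is only an illustrative model, not the driver: struct kcs_block and
all function names are hypothetical, the BMC-side behaviour (bmc_consume,
bmc_publish) is an assumption since the patch only shows the host side, and
host_write returns a bool where the driver silently drops the write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for one shared KCS message block. */
struct kcs_block {
	uint8_t ibfh, ibft;	/* input-buffer head/tail flags */
	uint8_t obfh, obft;	/* output-buffer head/tail flags */
	uint8_t data_in;	/* byte travelling host -> BMC */
	uint8_t data_out;	/* byte travelling BMC -> host */
	uint8_t is_cmd;		/* 1 if the last host write was a command */
	uint32_t wr_req;	/* write-request counter */
};

/* A buffer counts as full while its head and tail flags differ. */
static bool kcs_ibf(const struct kcs_block *b) { return b->ibfh ^ b->ibft; }
static bool kcs_obf(const struct kcs_block *b) { return b->obfh ^ b->obft; }

/* Host side, same shape as ls2k_mem_outb_v1(); false means "still busy". */
static bool host_write(struct kcs_block *b, uint8_t val, bool is_cmd)
{
	if (kcs_ibf(b))
		return false;
	b->data_in = val;
	b->is_cmd = is_cmd;
	b->ibfh = !b->ibft;	/* head != tail: input buffer is now full */
	b->wr_req++;
	return true;
}

/* BMC side (assumed): take the byte, then make tail catch up with head. */
static uint8_t bmc_consume(struct kcs_block *b)
{
	assert(kcs_ibf(b));	/* caller must have seen IBF set */
	uint8_t val = b->data_in;

	b->ibft = b->ibfh;
	return val;
}

/* BMC side (assumed): publish a byte by flipping the output head flag. */
static void bmc_publish(struct kcs_block *b, uint8_t val)
{
	b->data_out = val;
	b->obfh = !b->obft;
}

/* Host side, same shape as the data branch of ls2k_mem_inb_v1(). */
static uint8_t host_read(struct kcs_block *b)
{
	uint8_t val = b->data_out;

	b->obft = b->obfh;	/* head == tail: output buffer is empty again */
	return val;
}
```

In this model each flag has exactly one writer per direction (the producer
moves the head, the consumer moves the tail), which is presumably why the
v1 layout avoids the read-modify-write of the shared status byte seen in
the v0 accessors.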

[Openipmi-developer] [PATCH v9 1/3] mfd: ls2kbmc: Introduce Loongson-2K BMC core driver

From: Binbin Z. <zho...@lo...> - 2025-08-12 12:00:14

The Loongson-2K Board Management Controller provides a PCIe interface
to the host to access the features implemented in the BMC.

The BMC is assembled on a server similar to the server machine with
Loongson-3 CPU. It supports multiple sub-devices like DRM and IPMI.

Co-developed-by: Chong Qiao <qia...@lo...>
Signed-off-by: Chong Qiao <qia...@lo...>
Reviewed-by: Huacai Chen <che...@lo...>
Acked-by: Corey Minyard <co...@mi...>
Signed-off-by: Binbin Zhou <zho...@lo...>
---
 MAINTAINERS                 |   6 ++
 drivers/mfd/Kconfig         |  13 +++
 drivers/mfd/Makefile        |   2 +
 drivers/mfd/ls2k-bmc-core.c | 189 ++++++++++++++++++++++++++++++++++++
 4 files changed, 210 insertions(+)
 create mode 100644 drivers/mfd/ls2k-bmc-core.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 0f84051ef044..d50b2c3b2bb8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14206,6 +14206,12 @@ S:	Maintained
 F:	Documentation/devicetree/bindings/thermal/loongson,ls2k-thermal.yaml
 F:	drivers/thermal/loongson2_thermal.c
 
+LOONGSON-2K Board Management Controller (BMC) DRIVER
+M:	Binbin Zhou <zho...@lo...>
+M:	Chong Qiao <qia...@lo...>
+S:	Maintained
+F:	drivers/mfd/ls2k-bmc-core.c
+
 LOONGSON EDAC DRIVER
 M:	Zhao Qunqin <zha...@lo...>
 L:	lin...@vg...
diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index 425c5fba6cb1..55fbeba2ca33 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -2428,6 +2428,19 @@ config MFD_INTEL_M10_BMC_PMCI
 	  additional drivers must be enabled in order to use the functionality
 	  of the device.
 
+config MFD_LS2K_BMC_CORE
+	bool "Loongson-2K Board Management Controller Support"
+	depends on PCI && ACPI_GENERIC_GSI
+	select MFD_CORE
+	help
+	  Say yes here to add support for the Loongson-2K BMC which is a Board
+	  Management Controller connected to the PCIe bus. The device supports
+	  multiple sub-devices like display and IPMI. This driver provides common
+	  support for accessing the devices.
+
+	  The display is enabled by default in the driver, while the IPMI interface
+	  is enabled independently through the IPMI_LS2K option in the IPMI section.
+
 config MFD_QNAP_MCU
 	tristate "QNAP microcontroller unit core driver"
 	depends on SERIAL_DEV_BUS
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index f7bdedd5a66d..a950e670efba 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -286,6 +286,8 @@ obj-$(CONFIG_MFD_INTEL_M10_BMC_CORE)   += intel-m10-bmc-core.o
 obj-$(CONFIG_MFD_INTEL_M10_BMC_SPI)    += intel-m10-bmc-spi.o
 obj-$(CONFIG_MFD_INTEL_M10_BMC_PMCI)   += intel-m10-bmc-pmci.o
 
+obj-$(CONFIG_MFD_LS2K_BMC_CORE)		+= ls2k-bmc-core.o
+
 obj-$(CONFIG_MFD_ATC260X)	+= atc260x-core.o
 obj-$(CONFIG_MFD_ATC260X_I2C)	+= atc260x-i2c.o
 
diff --git a/drivers/mfd/ls2k-bmc-core.c b/drivers/mfd/ls2k-bmc-core.c
new file mode 100644
index 000000000000..39cc481d9ba1
--- /dev/null
+++ b/drivers/mfd/ls2k-bmc-core.c
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Loongson-2K Board Management Controller (BMC) Core Driver.
+ *
+ * Copyright (C) 2024-2025 Loongson Technology Corporation Limited.
+ *
+ * Authors:
+ *	Chong Qiao <qia...@lo...>
+ *	Binbin Zhou <zho...@lo...>
+ */
+
+#include <linux/aperture.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mfd/core.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/pci_ids.h>
+#include <linux/platform_data/simplefb.h>
+#include <linux/platform_device.h>
+
+/* LS2K BMC resources */
+#define LS2K_DISPLAY_RES_START		(SZ_16M + SZ_2M)
+#define LS2K_IPMI_RES_SIZE		0x1C
+#define LS2K_IPMI0_RES_START		(SZ_16M + 0xF00000)
+#define LS2K_IPMI1_RES_START		(LS2K_IPMI0_RES_START + LS2K_IPMI_RES_SIZE)
+#define LS2K_IPMI2_RES_START		(LS2K_IPMI1_RES_START + LS2K_IPMI_RES_SIZE)
+#define LS2K_IPMI3_RES_START		(LS2K_IPMI2_RES_START + LS2K_IPMI_RES_SIZE)
+#define LS2K_IPMI4_RES_START		(LS2K_IPMI3_RES_START + LS2K_IPMI_RES_SIZE)
+
+enum {
+	LS2K_BMC_DISPLAY,
+	LS2K_BMC_IPMI0,
+	LS2K_BMC_IPMI1,
+	LS2K_BMC_IPMI2,
+	LS2K_BMC_IPMI3,
+	LS2K_BMC_IPMI4,
+};
+
+static struct resource ls2k_display_resources[] = {
+	DEFINE_RES_MEM_NAMED(LS2K_DISPLAY_RES_START, SZ_4M, "simpledrm-res"),
+};
+
+static struct resource ls2k_ipmi0_resources[] = {
+	DEFINE_RES_MEM_NAMED(LS2K_IPMI0_RES_START, LS2K_IPMI_RES_SIZE, "ipmi0-res"),
+};
+
+static struct resource ls2k_ipmi1_resources[] = {
+	DEFINE_RES_MEM_NAMED(LS2K_IPMI1_RES_START, LS2K_IPMI_RES_SIZE, "ipmi1-res"),
+};
+
+static struct resource ls2k_ipmi2_resources[] = {
+	DEFINE_RES_MEM_NAMED(LS2K_IPMI2_RES_START, LS2K_IPMI_RES_SIZE, "ipmi2-res"),
+};
+
+static struct resource ls2k_ipmi3_resources[] = {
+	DEFINE_RES_MEM_NAMED(LS2K_IPMI3_RES_START, LS2K_IPMI_RES_SIZE, "ipmi3-res"),
+};
+
+static struct resource ls2k_ipmi4_resources[] = {
+	DEFINE_RES_MEM_NAMED(LS2K_IPMI4_RES_START, LS2K_IPMI_RES_SIZE, "ipmi4-res"),
+};
+
+static struct mfd_cell ls2k_bmc_cells[] = {
+	[LS2K_BMC_DISPLAY] = {
+		.name = "simple-framebuffer",
+		.num_resources = ARRAY_SIZE(ls2k_display_resources),
+		.resources = ls2k_display_resources
+	},
+	[LS2K_BMC_IPMI0] = {
+		.name = "ls2k-ipmi-si",
+		.num_resources = ARRAY_SIZE(ls2k_ipmi0_resources),
+		.resources = ls2k_ipmi0_resources
+	},
+	[LS2K_BMC_IPMI1] = {
+		.name = "ls2k-ipmi-si",
+		.num_resources = ARRAY_SIZE(ls2k_ipmi1_resources),
+		.resources = ls2k_ipmi1_resources
+	},
+	[LS2K_BMC_IPMI2] = {
+		.name = "ls2k-ipmi-si",
+		.num_resources = ARRAY_SIZE(ls2k_ipmi2_resources),
+		.resources = ls2k_ipmi2_resources
+	},
+	[LS2K_BMC_IPMI3] = {
+		.name = "ls2k-ipmi-si",
+		.num_resources = ARRAY_SIZE(ls2k_ipmi3_resources),
+		.resources = ls2k_ipmi3_resources
+	},
+	[LS2K_BMC_IPMI4] = {
+		.name = "ls2k-ipmi-si",
+		.num_resources = ARRAY_SIZE(ls2k_ipmi4_resources),
+		.resources = ls2k_ipmi4_resources
+	},
+};
+
+/*
+ * Currently the Loongson-2K BMC hardware does not have an I2C interface to adapt to the
+ * resolution. We set the resolution by presetting "video=1280x1024-16@2M" to the BMC memory.
+ */
+static int ls2k_bmc_parse_mode(struct pci_dev *pdev, struct simplefb_platform_data *pd)
+{
+	char *mode;
+	int depth, ret;
+
+	/* The last 16M of PCI BAR0 is used to store the resolution string. */
+	mode = devm_ioremap(&pdev->dev, pci_resource_start(pdev, 0) + SZ_16M, SZ_16M);
+	if (!mode)
+		return -ENOMEM;
+
+	/* The resolution field starts with the flag "video=". */
+	if (!strncmp(mode, "video=", 6))
+		mode = mode + 6;
+
+	ret = kstrtoint(strsep(&mode, "x"), 10, &pd->width);
+	if (ret)
+		return ret;
+
+	ret = kstrtoint(strsep(&mode, "-"), 10, &pd->height);
+	if (ret)
+		return ret;
+
+	ret = kstrtoint(strsep(&mode, "@"), 10, &depth);
+	if (ret)
+		return ret;
+
+	pd->stride = pd->width * depth / 8;
+	pd->format = depth == 32 ? "a8r8g8b8" : "r5g6b5";
+
+	return 0;
+}
+
+static int ls2k_bmc_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+	struct simplefb_platform_data pd;
+	resource_size_t base;
+	int ret;
+
+	ret = pci_enable_device(dev);
+	if (ret)
+		return ret;
+
+	ret = ls2k_bmc_parse_mode(dev, &pd);
+	if (ret)
+		goto disable_pci;
+
+	ls2k_bmc_cells[LS2K_BMC_DISPLAY].platform_data = &pd;
+	ls2k_bmc_cells[LS2K_BMC_DISPLAY].pdata_size = sizeof(pd);
+	base = dev->resource[0].start + LS2K_DISPLAY_RES_START;
+
+	/* Remove conflicting efifb device */
+	ret = aperture_remove_conflicting_devices(base, SZ_4M, "simple-framebuffer");
+	if (ret) {
+		dev_err(&dev->dev, "Failed to remove firmware framebuffers: %d\n", ret);
+		goto disable_pci;
+	}
+
+	return devm_mfd_add_devices(&dev->dev, PLATFORM_DEVID_AUTO,
+				    ls2k_bmc_cells, ARRAY_SIZE(ls2k_bmc_cells),
+				    &dev->resource[0], 0, NULL);
+
+disable_pci:
+	pci_disable_device(dev);
+	return ret;
+}
+
+static void ls2k_bmc_remove(struct pci_dev *dev)
+{
+	pci_disable_device(dev);
+}
+
+static struct pci_device_id ls2k_bmc_devices[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_LOONGSON, 0x1a05) },
+	{ }
+};
+MODULE_DEVICE_TABLE(pci, ls2k_bmc_devices);
+
+static struct pci_driver ls2k_bmc_driver = {
+	.name = "ls2k-bmc",
+	.id_table = ls2k_bmc_devices,
+	.probe = ls2k_bmc_probe,
+	.remove = ls2k_bmc_remove,
+};
+module_pci_driver(ls2k_bmc_driver);
+
+MODULE_DESCRIPTION("Loongson-2K Board Management Controller (BMC) Core driver");
+MODULE_AUTHOR("Loongson Technology Corporation Limited");
+MODULE_LICENSE("GPL");
-- 
2.47.3
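
The mode-string handling in ls2k_bmc_parse_mode() above can be tried out in
userspace with a small sketch. It parses a string of the documented form
"video=1280x1024-16@2M" and derives stride and pixel format the same way
the patch does. Assumptions: struct fb_mode and parse_mode() are
hypothetical names, sscanf() stands in for the kernel's
strsep()/kstrtoint() pair, and the rejection of non-positive values is an
extra check that the patch does not perform.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace counterpart of the simplefb platform data. */
struct fb_mode {
	int width;
	int height;
	int depth;		/* bits per pixel */
	int stride;		/* bytes per scanline */
	const char *format;
};

/* Returns 0 on success, -1 if the string is not "WxH-D[@...]". */
static int parse_mode(const char *s, struct fb_mode *m)
{
	assert(s && m);

	/* The resolution field may start with the flag "video=". */
	if (!strncmp(s, "video=", 6))
		s += 6;

	if (sscanf(s, "%dx%d-%d", &m->width, &m->height, &m->depth) != 3)
		return -1;

	/* Extra sanity check, not present in the patch. */
	if (m->width <= 0 || m->height <= 0 || m->depth <= 0)
		return -1;

	/* Same derivation as the patch: bytes per line and format name. */
	m->stride = m->width * m->depth / 8;
	m->format = m->depth == 32 ? "a8r8g8b8" : "r5g6b5";
	return 0;
}
```

For the example string from the patch comment, this yields a 1280x1024
mode at 16 bits per pixel with a stride of 1280 * 16 / 8 = 2560 bytes and
the "r5g6b5" format. Anything after the '@' is ignored here, as it is in
the patch.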

[Openipmi-developer] [PATCH v9 2/3] mfd: ls2kbmc: Add Loongson-2K BMC reset function support

From: Binbin Z. <zho...@lo...> - 2025-08-12 12:00:14

Since the display is a sub-function of the Loongson-2K BMC, when the
BMC resets, the entire BMC PCIe connection is dropped, which also
interrupts the display.

Quick overview of the entire LS2K BMC reset process:

There are two types of reset methods: soft reset (a BMC-initiated reboot
or an IPMI reset command) and BMC watchdog reset (watchdog timeout).

First, regardless of the method, an interrupt is generated (PCIe interrupt
for soft reset/GPIO interrupt for watchdog reset);

Second, during the interrupt process, the system enters bmc_reset_work,
clears the bus/IO/mem resources of the LS7A PCI-E bridge, waits for the BMC
reset to begin, then restores the parent device's PCI configuration space,
waits for the BMC reset to complete, and finally restores the BMC PCI
configuration space.

Display restoration occurs last.

Co-developed-by: Chong Qiao <qia...@lo...>
Signed-off-by: Chong Qiao <qia...@lo...>
Reviewed-by: Huacai Chen <che...@lo...>
Acked-by: Corey Minyard <co...@mi...>
Signed-off-by: Binbin Zhou <zho...@lo...>
---
 drivers/mfd/ls2k-bmc-core.c | 336 ++++++++++++++++++++++++++++++++++++
 1 file changed, 336 insertions(+)

diff --git a/drivers/mfd/ls2k-bmc-core.c b/drivers/mfd/ls2k-bmc-core.c
index 39cc481d9ba1..ec94526628aa 100644
--- a/drivers/mfd/ls2k-bmc-core.c
+++ b/drivers/mfd/ls2k-bmc-core.c
@@ -10,8 +10,12 @@
  */
 
 #include <linux/aperture.h>
+#include <linux/bitfield.h>
+#include <linux/delay.h>
 #include <linux/errno.h>
 #include <linux/init.h>
+#include <linux/iopoll.h>
+#include <linux/kbd_kern.h>
 #include <linux/kernel.h>
 #include <linux/mfd/core.h>
 #include <linux/module.h>
@@ -19,6 +23,8 @@
 #include <linux/pci_ids.h>
 #include <linux/platform_data/simplefb.h>
 #include <linux/platform_device.h>
+#include <linux/stop_machine.h>
+#include <linux/vt_kern.h>
 
 /* LS2K BMC resources */
 #define LS2K_DISPLAY_RES_START		(SZ_16M + SZ_2M)
@@ -29,6 +35,48 @@
 #define LS2K_IPMI3_RES_START		(LS2K_IPMI2_RES_START + LS2K_IPMI_RES_SIZE)
 #define LS2K_IPMI4_RES_START		(LS2K_IPMI3_RES_START + LS2K_IPMI_RES_SIZE)
 
+#define LS7A_PCI_CFG_SIZE		0x100
+
+/* LS7A bridge registers */
+#define LS7A_PCIE_PORT_CTL0		0x0
+#define LS7A_PCIE_PORT_STS1		0xC
+#define LS7A_GEN2_CTL			0x80C
+#define LS7A_SYMBOL_TIMER		0x71C
+
+/* Bits of LS7A_PCIE_PORT_CTL0 */
+#define LS2K_BMC_PCIE_LTSSM_ENABLE	BIT(3)
+
+/* Bits of LS7A_PCIE_PORT_STS1 */
+#define LS2K_BMC_PCIE_LTSSM_STS		GENMASK(5, 0)
+#define LS2K_BMC_PCIE_CONNECTED		0x11
+
+#define LS2K_BMC_PCIE_DELAY_US		1000
+#define LS2K_BMC_PCIE_TIMEOUT_US	1000000
+
+/* Bits of LS7A_GEN2_CTL */
+#define LS7A_GEN2_SPEED_CHANG		BIT(17)
+#define LS7A_CONF_PHY_TX		BIT(18)
+
+/* Bits of LS7A_SYMBOL_TIMER */
+#define LS7A_MASK_LEN_MATCH		BIT(26)
+
+/* Interval between interruptions */
+#define LS2K_BMC_INT_INTERVAL		(60 * HZ)
+
+/* Maximum time to wait for U-Boot and DDR to be ready with ms. */
+#define LS2K_BMC_RESET_WAIT_TIME	10000
+
+/* It's an experience value */
+#define LS7A_BAR0_CHECK_MAX_TIMES	2000
+
+#define LS2K_BMC_RESET_GPIO		14
+#define LOONGSON_GPIO_REG_BASE		0x1FE00500
+#define LOONGSON_GPIO_REG_SIZE		0x18
+#define LOONGSON_GPIO_OEN		0x0
+#define LOONGSON_GPIO_FUNC		0x4
+#define LOONGSON_GPIO_INTPOL		0x10
+#define LOONGSON_GPIO_INTEN		0x14
+
 enum {
 	LS2K_BMC_DISPLAY,
 	LS2K_BMC_IPMI0,
@@ -95,6 +143,281 @@ static struct mfd_cell ls2k_bmc_cells[] = {
 	},
 };
 
+/* Index of the BMC PCI configuration space to be restored at BMC reset. */
+struct ls2k_bmc_pci_data {
+	u32 pci_command;
+	u32 base_address0;
+	u32 interrupt_line;
+};
+
+/* Index of the parent PCI configuration space to be restored at BMC reset. */
+struct ls2k_bmc_bridge_pci_data {
+	u32 pci_command;
+	u32 base_address[6];
+	u32 rom_addreess;
+	u32 interrupt_line;
+	u32 msi_hi;
+	u32 msi_lo;
+	u32 devctl;
+	u32 linkcap;
+	u32 linkctl_sts;
+	u32 symbol_timer;
+	u32 gen2_ctrl;
+};
+
+struct ls2k_bmc_pdata {
+	struct device *dev;
+	struct work_struct bmc_reset_work;
+	struct ls2k_bmc_pci_data bmc_pci_data;
+	struct ls2k_bmc_bridge_pci_data bridge_pci_data;
+};
+
+static bool ls2k_bmc_bar0_addr_is_set(struct pci_dev *pdev)
+{
+	u32 addr;
+
+	pci_read_config_dword(pdev, PCI_BASE_ADDRESS_0, &addr);
+
+	return addr & PCI_BASE_ADDRESS_MEM_MASK ? true : false;
+}
+
+static bool ls2k_bmc_pcie_is_connected(struct pci_dev *parent, struct ls2k_bmc_pdata *ddata)
+{
+	void __iomem *base;
+	int val, ret;
+
+	base = pci_iomap(parent, 0, LS7A_PCI_CFG_SIZE);
+	if (!base)
+		return false;
+
+	val = readl(base + LS7A_PCIE_PORT_CTL0);
+	writel(val | LS2K_BMC_PCIE_LTSSM_ENABLE, base + LS7A_PCIE_PORT_CTL0);
+
+	ret = readl_poll_timeout_atomic(base + LS7A_PCIE_PORT_STS1, val,
+					(val & LS2K_BMC_PCIE_LTSSM_STS) == LS2K_BMC_PCIE_CONNECTED,
+					LS2K_BMC_PCIE_DELAY_US, LS2K_BMC_PCIE_TIMEOUT_US);
+	if (ret) {
+		pci_iounmap(parent, base);
+		dev_err(ddata->dev, "PCI-E training failed status=0x%x\n", val);
+		return false;
+	}
+
+	pci_iounmap(parent, base);
+	return true;
+}
+
+static void ls2k_bmc_restore_bridge_pci_data(struct pci_dev *parent, struct ls2k_bmc_pdata *ddata)
+{
+	int base, i = 0;
+
+	pci_write_config_dword(parent, PCI_COMMAND, ddata->bridge_pci_data.pci_command);
+
+	for (base = PCI_BASE_ADDRESS_0; base <= PCI_BASE_ADDRESS_5; base += 4, i++)
+		pci_write_config_dword(parent, base, ddata->bridge_pci_data.base_address[i]);
+
+	pci_write_config_dword(parent, PCI_ROM_ADDRESS, ddata->bridge_pci_data.rom_addreess);
+	pci_write_config_dword(parent, PCI_INTERRUPT_LINE, ddata->bridge_pci_data.interrupt_line);
+
+	pci_write_config_dword(parent, parent->msi_cap + PCI_MSI_ADDRESS_LO,
+			       ddata->bridge_pci_data.msi_lo);
+	pci_write_config_dword(parent, parent->msi_cap + PCI_MSI_ADDRESS_HI,
+			       ddata->bridge_pci_data.msi_hi);
+	pci_write_config_dword(parent, parent->pcie_cap + PCI_EXP_DEVCTL,
+			       ddata->bridge_pci_data.devctl);
+	pci_write_config_dword(parent, parent->pcie_cap + PCI_EXP_LNKCAP,
+			       ddata->bridge_pci_data.linkcap);
+	pci_write_config_dword(parent, parent->pcie_cap + PCI_EXP_LNKCTL,
+			       ddata->bridge_pci_data.linkctl_sts);
+
+	pci_write_config_dword(parent, LS7A_GEN2_CTL, ddata->bridge_pci_data.gen2_ctrl);
+	pci_write_config_dword(parent, LS7A_SYMBOL_TIMER, ddata->bridge_pci_data.symbol_timer);
+}
+
+static int ls2k_bmc_recover_pci_data(void *data)
+{
+	struct ls2k_bmc_pdata *ddata = data;
+	struct pci_dev *pdev = to_pci_dev(ddata->dev);
+	struct pci_dev *parent = pdev->bus->self;
+	u32 i;
+
+	/*
+	 * Clear the bus, io and mem resources of the PCI-E bridge to zero, so that
+	 * the processor can not access the LS2K PCI-E port, to avoid crashing due to
+	 * the lack of return signal from accessing the LS2K PCI-E port.
+	 */
+	pci_write_config_dword(parent, PCI_BASE_ADDRESS_2, 0);
+	pci_write_config_dword(parent, PCI_BASE_ADDRESS_3, 0);
+	pci_write_config_dword(parent, PCI_BASE_ADDRESS_4, 0);
+
+	/*
+	 * When the LS2K BMC is reset, the LS7A PCI-E port is also reset, and its PCI
+	 * BAR0 register is cleared. Due to the time gap between the GPIO interrupt
+	 * generation and the LS2K BMC reset, the LS7A PCI BAR0 register is read to
+	 * determine whether the reset has begun.
+	 */
+	for (i = LS7A_BAR0_CHECK_MAX_TIMES; i > 0 ; i--) {
+		if (!ls2k_bmc_bar0_addr_is_set(parent))
+			break;
+		mdelay(1);
+	};
+
+	if (i == 0)
+		return false;
+
+	ls2k_bmc_restore_bridge_pci_data(parent, ddata);
+
+	/* Check if PCI-E is connected */
+	if (!ls2k_bmc_pcie_is_connected(parent, ddata))
+		return false;
+
+	/* Waiting for U-Boot and DDR ready */
+	mdelay(LS2K_BMC_RESET_WAIT_TIME);
+	if (!ls2k_bmc_bar0_addr_is_set(parent))
+		return false;
+
+	/* Restore LS2K BMC PCI-E config data */
+	pci_write_config_dword(pdev, PCI_COMMAND, ddata->bmc_pci_data.pci_command);
+	pci_write_config_dword(pdev, PCI_BASE_ADDRESS_0, ddata->bmc_pci_data.base_address0);
+	pci_write_config_dword(pdev, PCI_INTERRUPT_LINE, ddata->bmc_pci_data.interrupt_line);
+
+	return 0;
+}
+
+static void ls2k_bmc_events_fn(struct work_struct *work)
+{
+	struct ls2k_bmc_pdata *ddata = container_of(work, struct ls2k_bmc_pdata, bmc_reset_work);
+
+	/*
+	 * The PCI-E is lost when the BMC resets, at which point access to the PCI-E
+	 * from other CPUs is suspended to prevent a crash.
+	 */
+	stop_machine(ls2k_bmc_recover_pci_data, ddata, NULL);
+
+	if (IS_ENABLED(CONFIG_VT)) {
+		/* Re-push the display due to previous PCI-E loss. */
+		set_console(vt_move_to_console(MAX_NR_CONSOLES - 1, 1));
+	}
+}
+
+static irqreturn_t ls2k_bmc_interrupt(int irq, void *arg)
+{
+	struct ls2k_bmc_pdata *ddata = arg;
+	static unsigned long last_jiffies;
+
+	if (system_state != SYSTEM_RUNNING)
+		return IRQ_HANDLED;
+
+	/* Skip interrupt in LS2K_BMC_INT_INTERVAL */
+	if (time_after(jiffies, last_jiffies + LS2K_BMC_INT_INTERVAL)) {
+		schedule_work(&ddata->bmc_reset_work);
+		last_jiffies = jiffies;
+	}
+
+	return IRQ_HANDLED;
+}
+
+/*
+ * Saves the BMC parent device (LS7A) and its own PCI configuration space registers
+ * that need to be restored after BMC reset.
+ */
+static void ls2k_bmc_save_pci_data(struct pci_dev *pdev, struct ls2k_bmc_pdata *ddata)
+{
+	struct pci_dev *parent = pdev->bus->self;
+	int base, i = 0;
+
+	pci_read_config_dword(parent, PCI_COMMAND, &ddata->bridge_pci_data.pci_command);
+
+	for (base = PCI_BASE_ADDRESS_0; base <= PCI_BASE_ADDRESS_5; base += 4, i++)
+		pci_read_config_dword(parent, base, &ddata->bridge_pci_data.base_address[i]);
+
+	pci_read_config_dword(parent, PCI_ROM_ADDRESS, &ddata->bridge_pci_data.rom_addreess);
+	pci_read_config_dword(parent, PCI_INTERRUPT_LINE, &ddata->bridge_pci_data.interrupt_line);
+
+	pci_read_config_dword(parent, parent->msi_cap + PCI_MSI_ADDRESS_LO,
+			      &ddata->bridge_pci_data.msi_lo);
+	pci_read_config_dword(parent, parent->msi_cap + PCI_MSI_ADDRESS_HI,
+			      &ddata->bridge_pci_data.msi_hi);
+
+	pci_read_config_dword(parent, parent->pcie_cap + PCI_EXP_DEVCTL,
+			      &ddata->bridge_pci_data.devctl);
+	pci_read_config_dword(parent, parent->pcie_cap + PCI_EXP_LNKCAP,
+			      &ddata->bridge_pci_data.linkcap);
+	pci_read_config_dword(parent, parent->pcie_cap + PCI_EXP_LNKCTL,
+			      &ddata->bridge_pci_data.linkctl_sts);
+
+	pci_read_config_dword(parent, LS7A_GEN2_CTL, &ddata->bridge_pci_data.gen2_ctrl);
+	ddata->bridge_pci_data.gen2_ctrl |= FIELD_PREP(LS7A_GEN2_SPEED_CHANG, 0x1)
+					| FIELD_PREP(LS7A_CONF_PHY_TX, 0x0);
+
+	pci_read_config_dword(parent, LS7A_SYMBOL_TIMER, &ddata->bridge_pci_data.symbol_timer);
+	ddata->bridge_pci_data.symbol_timer |= LS7A_MASK_LEN_MATCH;
+
+	pci_read_config_dword(pdev, PCI_COMMAND, &ddata->bmc_pci_data.pci_command);
+	pci_read_config_dword(pdev, PCI_BASE_ADDRESS_0, &ddata->bmc_pci_data.base_address0);
+	pci_read_config_dword(pdev, PCI_INTERRUPT_LINE, &ddata->bmc_pci_data.interrupt_line);
+}
+
+static int ls2k_bmc_pdata_initial(struct ls2k_bmc_pdata *ddata)
+{
+	struct pci_dev *pdev = to_pci_dev(ddata->dev);
+	int gsi = 16 + (LS2K_BMC_RESET_GPIO & 7);
+	void __iomem *gpio_base;
+	int irq, ret, val;
+
+	ls2k_bmc_save_pci_data(pdev, ddata);
+
+	INIT_WORK(&ddata->bmc_reset_work, ls2k_bmc_events_fn);
+
+	ret = devm_request_irq(&pdev->dev, pdev->irq, ls2k_bmc_interrupt,
+			       IRQF_SHARED | IRQF_TRIGGER_FALLING, "ls2kbmc pcie", ddata);
+	if (ret) {
+		dev_err(ddata->dev, "Failed to request LS2KBMC PCI-E IRQ %d.\n", pdev->irq);
+		return ret;
+	}
+
+	/*
+	 * Since gpio_chip->to_irq is not implemented in the Loongson-3 GPIO driver,
+	 * acpi_register_gsi() is used to obtain the GPIO IRQ. The GPIO interrupt is a
+	 * watchdog interrupt that is triggered when the BMC resets.
+	 */
+	irq = acpi_register_gsi(NULL, gsi, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW);
+	if (irq < 0)
+		return irq;
+
+	gpio_base = ioremap(LOONGSON_GPIO_REG_BASE, LOONGSON_GPIO_REG_SIZE);
+	if (!gpio_base) {
+		ret = -ENOMEM;
+		goto acpi_failed;
+	}
+
+	/* Disable GPIO output */
+	val = readl(gpio_base + LOONGSON_GPIO_OEN);
+	writel(val | BIT(LS2K_BMC_RESET_GPIO), gpio_base + LOONGSON_GPIO_OEN);
+
+	/* Enable GPIO functionality */
+	val = readl(gpio_base + LOONGSON_GPIO_FUNC);
+	writel(val & ~BIT(LS2K_BMC_RESET_GPIO), gpio_base + LOONGSON_GPIO_FUNC);
+
+	/* Set GPIO interrupts to low-level active */
+	val = readl(gpio_base + LOONGSON_GPIO_INTPOL);
+	writel(val & ~BIT(LS2K_BMC_RESET_GPIO), gpio_base + LOONGSON_GPIO_INTPOL);
+
+	/* Enable GPIO interrupts */
+	val = readl(gpio_base + LOONGSON_GPIO_INTEN);
+	writel(val | BIT(LS2K_BMC_RESET_GPIO), gpio_base + LOONGSON_GPIO_INTEN);
+
+	ret = devm_request_irq(ddata->dev, irq, ls2k_bmc_interrupt,
+			       IRQF_SHARED | IRQF_TRIGGER_FALLING, "ls2kbmc gpio", ddata);
+	if (ret)
+		dev_err(ddata->dev, "Failed to request LS2KBMC GPIO IRQ %d.\n", irq);
+
+	iounmap(gpio_base);
+
+acpi_failed:
+	acpi_unregister_gsi(gsi);
+	return ret;
+}
+
 /*
  * Currently the Loongson-2K BMC hardware does not have an I2C interface to adapt to the
  * resolution. We set the resolution by presetting "video=1280x1024-16@2M" to the BMC memory.
@@ -134,6 +457,7 @@ static int ls2k_bmc_parse_mode(struct pci_dev *pdev, struct simplefb_platform_da
 static int ls2k_bmc_probe(struct pci_dev *dev, const struct pci_device_id *id)
 {
 	struct simplefb_platform_data pd;
+	struct ls2k_bmc_pdata *ddata;
 	resource_size_t base;
 	int ret;
 
@@ -141,6 +465,18 @@ static int ls2k_bmc_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	if (ret)
 		return ret;
 
+	ddata = devm_kzalloc(&dev->dev, sizeof(*ddata), GFP_KERNEL);
+	if (!ddata) {
+		ret = -ENOMEM;
+		goto disable_pci;
+	}
+
+	ddata->dev = &dev->dev;
+
+	ret = ls2k_bmc_pdata_initial(ddata);
+	if (ret)
+		goto disable_pci;
+
 	ret = ls2k_bmc_parse_mode(dev, &pd);
 	if (ret)
 		goto disable_pci;
-- 
2.47.3
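
The interrupt rate limiting in ls2k_bmc_interrupt() above (act on at most
one reset interrupt per LS2K_BMC_INT_INTERVAL) can be modelled in userspace
as follows. This is a sketch under stated assumptions: SKETCH_HZ,
SKETCH_INTERVAL, struct debounce and both function names are hypothetical,
and tick_after() only imitates the wraparound-safe comparison idea behind
the kernel's time_after(). The signed cast relies on the usual
two's-complement behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tick rate; the real HZ depends on the kernel config. */
#define SKETCH_HZ		250UL
#define SKETCH_INTERVAL		(60 * SKETCH_HZ)

/* Wraparound-safe "a is later than b" on a free-running tick counter. */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical per-device state; the patch uses a static last_jiffies. */
struct debounce {
	unsigned long last;	/* tick of the last accepted interrupt */
};

/*
 * Returns true if the event at tick 'now' should trigger reset work,
 * false if it falls inside the interval after the last accepted event.
 */
static bool debounce_accept(struct debounce *d, unsigned long now)
{
	assert(d);

	if (!tick_after(now, d->last + SKETCH_INTERVAL))
		return false;

	d->last = now;
	return true;
}
```

Because the comparison is done on the difference rather than on the raw
values, the check keeps working when the tick counter wraps around. Note
that in this model events in the first interval after the initial value of
'last' are also ignored.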

[Openipmi-developer] [PATCH v9 0/3] LoongArch: Add Loongson-2K BMC support

From: Binbin Z. <zho...@lo...> - 2025-08-12 12:00:05

Hi all:

This patchset introduces the Loongson-2K BMC.

It is a PCIe device present on servers similar to those with Loongson-3
CPUs. It is a multifunction device (MFD); the display, for example, is one
of its sub-functions.

For IPMI, according to the existing design, we use software simulation to
implement the KCS interface registers: Status/Command/Data_Out/Data_In.

Also, since both the host side and the BMC side read and write the KCS
status, we use FIFO pointers to ensure data consistency.

For the display, based on simpledrm, the resolution is read from a fixed
position in the BMC since the hardware does not support auto-detection
of the resolution. Of course, we will try to support multiple
resolutions later, through a vbios-like approach.

In particular, for the BMC reset function, since the display is
disconnected when the BMC resets, we handle it specially by re-pushing
the display.

Based on this, I will present it in three patches:
patch-1: BMC device PCI resource allocation.
patch-2: BMC reset function support
patch-3: IPMI implementation

Thanks.

-------
V9:
Patch (2/3):
 - PCIE -> PCI-E in dev_err();
 - Separate the read from the write;

Link to V8:
https://lore.kernel.org/all/cov...@lo.../

V8:
Patch (1/3):
 - Similar to as3711_subdevs, identify elements in ls2k_bmc_cells.

Patch (2/3):
 - Rename variables using usual names, such as `priv` -> `ddata`;
 - Use if statements instead of #ifery;
 - Rewrite the error message to ensure it is easy to understand;
 - ls2k_bmc_pdata_initial(dev, priv); -> ls2k_bmc_pdata_initial(priv);

Link to V7:
https://lore.kernel.org/all/cov...@lo.../

V7:
Patch (1/3):
  - Fix build warning by lkp: Add depend on ACPI_GENERIC_GSI
    - https://lore.kernel.org/all/202...@in.../

Link to V6:
https://lore.kernel.org/all/cov...@lo.../

V6:
- Add Acked-by tag from Corey, thanks;
Patch (1/3):
  - Fix build warning by lkp: Add depend on PCI
    - https://lore.kernel.org/all/202...@in.../
    - https://lore.kernel.org/all/202...@in.../
    - https://lore.kernel.org/all/202...@in.../
    - https://lore.kernel.org/all/202...@in.../

Link to V5:
https://lore.kernel.org/all/cov...@lo.../

V5:
Patch (1/3):
 - Rename ls2kbmc-mfd.c to ls2k-bmc-core.c;
 - Rename MFD_LS2K_BMC to MFD_LS2K_BMC_CORE and update its help text.
Patch (3/3):
 - Add an IPMI_LS2K config in the IPMI section that enables the IPMI
   interface and selects MFD_LS2K_BMC_CORE.

Link to V4:
https://lore.kernel.org/all/cov...@lo.../

V4:
- Add Reviewed-by tag;
- Change the order of the patches.
Patch (1/3):
  - Fix build warning by lkp: Kconfig tristate -> bool
    - https://lore.kernel.org/all/202...@in.../
 - Update commit message;
 - Move MFD_LS2K_BMC after MFD_INTEL_M10_BMC_PMCI in Kconfig and
   Makefile.
Patch (2/3):
  - Remove unnecessary newlines;
  - Rename ls2k_bmc_check_pcie_connected() to
    ls2k_bmc_pcie_is_connected();
  - Update comment message.
Patch (3/3):
  - Remove unnecessary newlines.

Link to V3:
https://lore.kernel.org/all/cov...@lo.../

V3:
Patch (1/3):
 - Drop "MFD" in title and comment;
 - Formatting code;
 - Add clearer comments.
Patch (2/3):
 - Rebase linux-ipmi/next tree;
 - Use readx()/writex() to read and write IPMI data instead of structure
   pointer references;
 - CONFIG_LOONGARCH -> MFD_LS2K_BMC;
 - Drop unused output.
Patch (3/3):
 - Inline the ls2k_bmc_gpio_reset_handler() function to ls2k_bmc_pdata_initial();
 - Add clearer comments.
 - Use proper multi-line commentary as per the Coding Style documentation;
 - Define all magic numbers.

Link to V2:
https://lore.kernel.org/all/cov...@lo.../

V2:
- Drop ls2kdrm, use simpledrm instead.
Patch (1/3):
 - Use DEFINE_RES_MEM_NAMED/MFD_CELL_RES simplified code;
 - Add resolution fetching due to replacing the original display
   solution with simpledrm; 
 - Add aperture_remove_conflicting_devices() to avoid efifb
   conflict with simpledrm.
Patch (3/3):
 - This part of the function, moved from the original ls2kdrm to mfd;
 - Use set_console to implement the Re-push display function.

Link to V1:
https://lore.kernel.org/all/cov...@lo.../

Binbin Zhou (3):
  mfd: ls2kbmc: Introduce Loongson-2K BMC core driver
  mfd: ls2kbmc: Add Loongson-2K BMC reset function support
  ipmi: Add Loongson-2K BMC support

 MAINTAINERS                      |   7 +
 drivers/char/ipmi/Kconfig        |   7 +
 drivers/char/ipmi/Makefile       |   1 +
 drivers/char/ipmi/ipmi_si.h      |   7 +
 drivers/char/ipmi/ipmi_si_intf.c |   4 +
 drivers/char/ipmi/ipmi_si_ls2k.c | 189 +++++++++++
 drivers/mfd/Kconfig              |  13 +
 drivers/mfd/Makefile             |   2 +
 drivers/mfd/ls2k-bmc-core.c      | 525 +++++++++++++++++++++++++++++++
 9 files changed, 755 insertions(+)
 create mode 100644 drivers/char/ipmi/ipmi_si_ls2k.c
 create mode 100644 drivers/mfd/ls2k-bmc-core.c


base-commit: 006aa8f57f55dd5bf68c4ada1e0d3f4e59027d71
-- 
2.47.3

[Openipmi-developer] [PATCH AUTOSEL 6.16-5.4] ipmi: Use dev_warn_ratelimited() for incorrect message warnings

From: Sasha L. <sa...@ke...> - 2025-08-10 00:21:47

From: Breno Leitao <le...@de...>

[ Upstream commit ec50ec378e3fd83bde9b3d622ceac3509a60b6b5 ]

During BMC firmware upgrades on live systems, the ipmi_msghandler
generates excessive "BMC returned incorrect response" warnings
while the BMC is temporarily offline. This can flood system logs
in large deployments.

Replace dev_warn() with dev_warn_ratelimited() to throttle these
warnings and prevent log spam during BMC maintenance operations.

Signed-off-by: Breno Leitao <le...@de...>
Message-ID: <202...@de...>
Signed-off-by: Corey Minyard <co...@mi...>
Signed-off-by: Sasha Levin <sa...@ke...>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit is an excellent candidate for backporting to stable kernel
trees for the following reasons:

1. **Fixes a Real User-Impacting Bug**: The commit addresses log
   flooding during BMC firmware upgrades or when BMC is temporarily
   offline. This is a practical operational issue that affects
   production systems, especially in large deployments where BMC
   maintenance is routine.

2. **Minimal and Safe Change**: The change is trivial - simply replacing
   `dev_warn()` with `dev_warn_ratelimited()`. This is:
   - A single-line change
   - Uses well-established kernel infrastructure (ratelimited logging)
   - Has no functional impact on the IPMI driver's operation
   - Cannot introduce regressions in driver functionality

3. **Clear Problem Statement**: The commit message clearly describes a
   specific scenario (BMC firmware upgrades) where excessive "BMC
   returned incorrect response" warnings flood system logs. This is a
   documented, reproducible issue.

4. **Follows Established Pattern**: The kernel already has precedent for
   this type of fix. Commit ca8c1c53b03b ("ipmi: rate limit ipmi
   smi_event failure message") from kernel v5.15 made similar changes to
   address log flooding in the same subsystem, and that was successfully
   integrated into stable trees.

5. **Low Risk**: The change only affects logging behavior, not the
   actual handling of IPMI messages. The error path (goto
   return_unspecified) remains unchanged, maintaining the same error
   handling logic.

6. **Addresses Operational Concerns**: In production environments with
   many servers, BMC maintenance is common. Without rate limiting, logs
   can grow excessively, potentially:
   - Filling up disk space
   - Making real issues harder to spot
   - Degrading system performance due to excessive logging I/O

7. **Code Context Shows Repeated Condition**: Looking at the code path
   (lines 4604-4615), this warning triggers when the BMC response
   doesn't match the expected NetFN and Command. During BMC firmware
   updates or reboots, this mismatch can occur repeatedly until the BMC
   comes back online, making rate limiting essential.

The change meets all criteria for stable backporting: it fixes a bug
affecting users, is small and contained, has minimal risk of regression,
and addresses a clear operational problem without introducing new
features or architectural changes.
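
To make the throttling behaviour concrete, here is a minimal userspace
sketch of interval/burst rate limiting, the general technique behind
`dev_warn_ratelimited()`. It is a simplified model, not the kernel's
implementation: the struct and function names are hypothetical, and time
is an abstract tick count supplied by the caller. The kernel's default
limit is, to my knowledge, 10 messages per 5-second window.

```c
#include <stdbool.h>

/* Hypothetical model of an interval/burst rate limiter. */
struct ratelimit {
	long interval; /* window length, in caller-defined ticks */
	int burst;     /* messages allowed per window */
	long begin;    /* start of the current window */
	int printed;   /* messages emitted in the current window */
	int missed;    /* messages suppressed in the current window */
};

/* Returns true if a message may be emitted at time 'now'. */
static bool ratelimit_allow(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* A new window starts: reset the counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	/* Over budget for this window: suppress and count it. */
	rs->missed++;
	return false;
}
```

With `interval = 5` and `burst = 10`, a storm of 25 warnings inside one
window yields 10 emitted messages and 15 suppressed ones; the next window
starts with a fresh budget.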

 drivers/char/ipmi/ipmi_msghandler.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index 064944ae9fdc..8e9050f99e9e 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -4607,10 +4607,10 @@ static int handle_one_recv_msg(struct ipmi_smi *intf,
 		 * The NetFN and Command in the response is not even
 		 * marginally correct.
 		 */
-		dev_warn(intf->si_dev,
-			 "BMC returned incorrect response, expected netfn %x cmd %x, got netfn %x cmd %x\n",
-			 (msg->data[0] >> 2) | 1, msg->data[1],
-			 msg->rsp[0] >> 2, msg->rsp[1]);
+		dev_warn_ratelimited(intf->si_dev,
+				     "BMC returned incorrect response, expected netfn %x cmd %x, got netfn %x cmd %x\n",
+				     (msg->data[0] >> 2) | 1, msg->data[1],
+				     msg->rsp[0] >> 2, msg->rsp[1]);
 
 		goto return_unspecified;
 	}
-- 
2.39.5

[Openipmi-developer] [PATCH AUTOSEL 6.16-5.4] ipmi: Fix strcpy source and destination the same

From: Sasha L. <sa...@ke...> - 2025-08-10 00:21:34

From: Corey Minyard <co...@mi...>

[ Upstream commit 8ffcb7560b4a15faf821df95e3ab532b2b020f8c ]

The source and destination of some strcpy operations were the same.
Split out the part of the operations that needed to be done for those
particular calls so the unnecessary copy wasn't done.

Reported-by: kernel test robot <lk...@in...>
Closes: https://lore.kernel.org/oe-kbuild-all/202...@in.../
Signed-off-by: Corey Minyard <co...@mi...>
Signed-off-by: Sasha Levin <sa...@ke...>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit is an excellent candidate for backporting to stable kernel
trees. Here's my extensive analysis:

## Bug Description and Impact

The commit fixes a critical bug where `strcpy()` operations were copying
a string to itself (source and destination being the same). Looking at
the original code:

1. In `action_op()` at line 1213: `strcpy(action, inval);` - but when
   called from `ipmi_wdog_init()` line 1294 with `action_op(action,
   NULL)`, the first parameter `action` is passed, and if validation
   passes, it copies `action` to itself.

2. Similar issues in `preaction_op()` line 1237 and `preop_op()` line
   1257 when called from initialization.

This undefined behavior could lead to:
- Memory corruption
- Unpredictable watchdog behavior during initialization
- Potential system instability in the IPMI watchdog subsystem

## Fix Analysis

The fix properly refactors the code by:
1. Creating separate `*_set_val()` functions that only validate and set
   the value variables
2. Modifying the original functions to use these new helpers
3. In `ipmi_wdog_init()`, directly calling the `*_set_val()` functions
   for validation without the problematic strcpy

This is a minimal, targeted fix that:
- Doesn't introduce new features
- Maintains existing functionality
- Only restructures code to eliminate undefined behavior
- Has clear boundaries within the IPMI watchdog driver

## Stable Tree Criteria Met

1. **Fixes a real bug**: Yes - strcpy with same source/destination is
   undefined behavior
2. **Small and contained**: Yes - changes are limited to one file and
   involve straightforward refactoring
3. **No major architectural changes**: Correct - only refactors existing
   functions
4. **Minimal regression risk**: Yes - the logic remains identical, just
   properly separated
5. **Critical subsystem**: IPMI watchdog is important for system
   reliability and recovery
6. **Automated detection**: The bug was found by kernel test robot,
   indicating it's a real issue

## Additional Evidence

- The commit was reported by the kernel test robot with a specific bug
  report link
- The fix is defensive programming that eliminates undefined behavior
- IPMI watchdog is used in production servers for system recovery
- The bug could manifest differently across compilers/architectures

This meets all stable kernel criteria for backporting as it fixes a
concrete bug with minimal risk.
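
The shape of the refactor can be sketched in userspace as follows. This
is an illustrative reduction, not the driver code: the `DEMO_*` values
and the function names `demo_action_set_val()` / `demo_action_set()` are
hypothetical stand-ins for the driver's `action_op_set_val()` /
`action_op()` pair.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

enum { DEMO_NONE = 0, DEMO_RESET = 1 }; /* hypothetical values */

static char action[16] = "reset"; /* user-visible string parameter */
static int action_val = DEMO_RESET; /* decoded form used by the driver */

/* Validate and decode only; never writes to the 'action' string. */
static int demo_action_set_val(const char *inval)
{
	if (strcmp(inval, "reset") == 0)
		action_val = DEMO_RESET;
	else if (strcmp(inval, "none") == 0)
		action_val = DEMO_NONE;
	else
		return -EINVAL;
	return 0;
}

/*
 * Validate, then store the new string. Callers must not pass 'action'
 * itself as 'inval'; init code calls demo_action_set_val(action) instead.
 */
static int demo_action_set(const char *inval)
{
	int rv = demo_action_set_val(inval);

	if (!rv)
		snprintf(action, sizeof(action), "%s", inval);
	return rv;
}
```

At init time the current string is only validated with
`demo_action_set_val(action)`, so no copy with identical source and
destination ever happens; runtime updates go through `demo_action_set()`.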

 drivers/char/ipmi/ipmi_watchdog.c | 59 ++++++++++++++++++++++---------
 1 file changed, 42 insertions(+), 17 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index ab759b492fdd..a013ddbf1466 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -1146,14 +1146,8 @@ static struct ipmi_smi_watcher smi_watcher = {
 	.smi_gone = ipmi_smi_gone
 };
 
-static int action_op(const char *inval, char *outval)
+static int action_op_set_val(const char *inval)
 {
-	if (outval)
-		strcpy(outval, action);
-
-	if (!inval)
-		return 0;
-
 	if (strcmp(inval, "reset") == 0)
 		action_val = WDOG_TIMEOUT_RESET;
 	else if (strcmp(inval, "none") == 0)
@@ -1164,18 +1158,26 @@ static int action_op(const char *inval, char *outval)
 		action_val = WDOG_TIMEOUT_POWER_DOWN;
 	else
 		return -EINVAL;
-	strcpy(action, inval);
 	return 0;
 }
 
-static int preaction_op(const char *inval, char *outval)
+static int action_op(const char *inval, char *outval)
 {
+	int rv;
+
 	if (outval)
-		strcpy(outval, preaction);
+		strcpy(outval, action);
 
 	if (!inval)
 		return 0;
+	rv = action_op_set_val(inval);
+	if (!rv)
+		strcpy(action, inval);
+	return rv;
+}
 
+static int preaction_op_set_val(const char *inval)
+{
 	if (strcmp(inval, "pre_none") == 0)
 		preaction_val = WDOG_PRETIMEOUT_NONE;
 	else if (strcmp(inval, "pre_smi") == 0)
@@ -1188,18 +1190,26 @@ static int preaction_op(const char *inval, char *outval)
 		preaction_val = WDOG_PRETIMEOUT_MSG_INT;
 	else
 		return -EINVAL;
-	strcpy(preaction, inval);
 	return 0;
 }
 
-static int preop_op(const char *inval, char *outval)
+static int preaction_op(const char *inval, char *outval)
 {
+	int rv;
+
 	if (outval)
-		strcpy(outval, preop);
+		strcpy(outval, preaction);
 
 	if (!inval)
 		return 0;
+	rv = preaction_op_set_val(inval);
+	if (!rv)
+		strcpy(preaction, inval);
+	return rv;
+}
 
+static int preop_op_set_val(const char *inval)
+{
 	if (strcmp(inval, "preop_none") == 0)
 		preop_val = WDOG_PREOP_NONE;
 	else if (strcmp(inval, "preop_panic") == 0)
@@ -1208,7 +1218,22 @@ static int preop_op(const char *inval, char *outval)
 		preop_val = WDOG_PREOP_GIVE_DATA;
 	else
 		return -EINVAL;
-	strcpy(preop, inval);
+	return 0;
+}
+
+static int preop_op(const char *inval, char *outval)
+{
+	int rv;
+
+	if (outval)
+		strcpy(outval, preop);
+
+	if (!inval)
+		return 0;
+
+	rv = preop_op_set_val(inval);
+	if (!rv)
+		strcpy(preop, inval);
 	return rv;
 }
 
@@ -1245,18 +1270,18 @@ static int __init ipmi_wdog_init(void)
 {
 	int rv;
 
-	if (action_op(action, NULL)) {
+	if (action_op_set_val(action)) {
 		action_op("reset", NULL);
 		pr_info("Unknown action '%s', defaulting to reset\n", action);
 	}
 
-	if (preaction_op(preaction, NULL)) {
+	if (preaction_op_set_val(preaction)) {
 		preaction_op("pre_none", NULL);
 		pr_info("Unknown preaction '%s', defaulting to none\n",
 			preaction);
 	}
 
-	if (preop_op(preop, NULL)) {
+	if (preop_op_set_val(preop)) {
 		preop_op("preop_none", NULL);
 		pr_info("Unknown preop '%s', defaulting to none\n", preop);
 	}
-- 
2.39.5

Re: [Openipmi-developer] [PATCH 2/4] ipmi: Disable sysfs access and requests in maintenance mode

From: Corey M. <co...@mi...> - 2025-08-08 22:28:27

On Fri, Aug 08, 2025 at 03:37:51PM -0500, Frederick Lawler wrote:
> Hi Corey,
> 
> On Thu, Aug 07, 2025 at 06:02:33PM -0500, Corey Minyard wrote:
> > If the driver goes into any maintenance mode, disable sysfs access until
> > it is done.
> >
> 
> Why specifically sysfs reads during FW update state? Is there an expectation
> that during a FW update, that redfish/ipmi/etc... are chunking/buffering the
> FW payloads to the device, thus needs write access? I'm assuming that the
> device is blocking waiting for payloads to finish, so sending additional messages
> just gets ignored?

In my experience, when the BMC goes into firmware update mode, it
doesn't behave normally.  But it's just my experience.  In general, it's
best not to mess with something during an update.

-corey

> 
> > If the driver goes into reset maintenance mode, disable all messages
> > until it is done.
> > 
> > Signed-off-by: Corey Minyard <co...@mi...>
> > ---
> >  drivers/char/ipmi/ipmi_msghandler.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> > index f124c0b33db8..72f5f4a0c056 100644
> > --- a/drivers/char/ipmi/ipmi_msghandler.c
> > +++ b/drivers/char/ipmi/ipmi_msghandler.c
> > @@ -2338,6 +2338,11 @@ static int i_ipmi_request(struct ipmi_user     *user,
> >  
> >  	if (!run_to_completion)
> >  		mutex_lock(&intf->users_mutex);
> > +	if (intf->maintenance_mode_state == IPMI_MAINTENANCE_MODE_STATE_RESET) {
> > +		/* No messages while the BMC is in reset. */
> > +		rv = -EBUSY;
> > +		goto out_err;
> > +	}
> >  	if (intf->in_shutdown) {
> >  		rv = -ENODEV;
> >  		goto out_err;
> > @@ -2639,6 +2644,12 @@ static int __bmc_get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc,
> >  	    (bmc->dyn_id_set && time_is_after_jiffies(bmc->dyn_id_expiry)))
> >  		goto out_noprocessing;
> >  
> > +	/* Don't allow sysfs access when in maintenance mode. */
> > +	if (intf->maintenance_mode_state) {
> > +		rv = -EBUSY;
> > +		goto out_noprocessing;
> > +	}
> > +
> >  	prev_guid_set = bmc->dyn_guid_set;
> >  	__get_guid(intf);
> >  
> > -- 
> > 2.43.0
> > 
> 
> Best, Fred

Re: [Openipmi-developer] [PATCH 2/4] ipmi: Disable sysfs access and requests in maintenance mode

From: Frederick L. <fr...@cl...> - 2025-08-08 20:38:05

Hi Corey,

On Thu, Aug 07, 2025 at 06:02:33PM -0500, Corey Minyard wrote:
> If the driver goes into any maintenance mode, disable sysfs access until
> it is done.
>

Why specifically sysfs reads during FW update state? Is there an expectation
that during a FW update, that redfish/ipmi/etc... are chunking/buffering the
FW payloads to the device, thus needs write access? I'm assuming that the
device is blocking waiting for payloads to finish, so sending additional messages
just gets ignored?

> If the driver goes into reset maintenance mode, disable all messages
> until it is done.
> 
> Signed-off-by: Corey Minyard <co...@mi...>
> ---
>  drivers/char/ipmi/ipmi_msghandler.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index f124c0b33db8..72f5f4a0c056 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -2338,6 +2338,11 @@ static int i_ipmi_request(struct ipmi_user     *user,
>  
>  	if (!run_to_completion)
>  		mutex_lock(&intf->users_mutex);
> +	if (intf->maintenance_mode_state == IPMI_MAINTENANCE_MODE_STATE_RESET) {
> +		/* No messages while the BMC is in reset. */
> +		rv = -EBUSY;
> +		goto out_err;
> +	}
>  	if (intf->in_shutdown) {
>  		rv = -ENODEV;
>  		goto out_err;
> @@ -2639,6 +2644,12 @@ static int __bmc_get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc,
>  	    (bmc->dyn_id_set && time_is_after_jiffies(bmc->dyn_id_expiry)))
>  		goto out_noprocessing;
>  
> +	/* Don't allow sysfs access when in maintenance mode. */
> +	if (intf->maintenance_mode_state) {
> +		rv = -EBUSY;
> +		goto out_noprocessing;
> +	}
> +
>  	prev_guid_set = bmc->dyn_guid_set;
>  	__get_guid(intf);
>  
> -- 
> 2.43.0
> 

Best, Fred

Re: [Openipmi-developer] [PATCH] dt-bindings: ipmi: aspeed, ast2400-kcs-bmc: Add missing "clocks" property

From: Corey M. <co...@mi...> - 2025-08-08 14:48:20

On Fri, Aug 08, 2025 at 11:17:29AM +0930, Andrew Jeffery wrote:
> On Thu, 2025-08-07 at 08:28 -0500, Rob Herring (Arm) wrote:
> > The ASpeed kcs-bmc nodes have a "clocks" property which isn't
> > documented. It looks like all the LPC child devices have the same clock
> > source and some of the drivers manage their clock. Perhaps it is the
> > parent device that should have the clock, but it's too late for that.
> > 
> > Signed-off-by: Rob Herring (Arm) <ro...@ke...>
> 
> Thanks Rob.
> 
> Acked-by: Andrew Jeffery <an...@co...>

Queued for 6.18, I'll add it to the next tree when 6.17-rc1 releases.

Thanks,

-corey

Re: [Openipmi-developer] [PATCH] dt-bindings: ipmi: aspeed, ast2400-kcs-bmc: Add missing "clocks" property

From: Andrew J. <an...@co...> - 2025-08-08 02:06:19

On Thu, 2025-08-07 at 08:28 -0500, Rob Herring (Arm) wrote:
> The ASpeed kcs-bmc nodes have a "clocks" property which isn't
> documented. It looks like all the LPC child devices have the same clock
> source and some of the drivers manage their clock. Perhaps it is the
> parent device that should have the clock, but it's too late for that.
> 
> Signed-off-by: Rob Herring (Arm) <ro...@ke...>

Thanks Rob.

Acked-by: Andrew Jeffery <an...@co...>

[Openipmi-developer] [PATCH 3/4] ipmi: Add a maintenance mode sysfs file

From: Corey M. <co...@mi...> - 2025-08-07 23:31:54

So you can see if it's in maintenance mode and see how long is left.

Signed-off-by: Corey Minyard <co...@mi...>
---
 drivers/char/ipmi/ipmi_msghandler.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index 72f5f4a0c056..5ff35c473b50 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -432,6 +432,7 @@ struct ipmi_smi {
 	atomic_t nr_users;
 	struct device_attribute nr_users_devattr;
 	struct device_attribute nr_msgs_devattr;
+	struct device_attribute maintenance_mode_devattr;
 
 
 	/* Used for wake ups at startup. */
@@ -3545,6 +3546,19 @@ static ssize_t nr_msgs_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(nr_msgs);
 
+static ssize_t maintenance_mode_show(struct device *dev,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	struct ipmi_smi *intf = container_of(attr,
+					     struct ipmi_smi,
+					     maintenance_mode_devattr);
+
+	return sysfs_emit(buf, "%u %d\n", intf->maintenance_mode_state,
+			  intf->auto_maintenance_timeout);
+}
+static DEVICE_ATTR_RO(maintenance_mode);
+
 static void redo_bmc_reg(struct work_struct *work)
 {
 	struct ipmi_smi *intf = container_of(work, struct ipmi_smi,
@@ -3681,6 +3695,14 @@ int ipmi_add_smi(struct module         *owner,
 		goto out_err_bmc_reg;
 	}
 
+	intf->maintenance_mode_devattr = dev_attr_maintenance_mode;
+	sysfs_attr_init(&intf->maintenance_mode_devattr.attr);
+	rv = device_create_file(intf->si_dev, &intf->maintenance_mode_devattr);
+	if (rv) {
+		device_remove_file(intf->si_dev, &intf->nr_users_devattr);
+		goto out_err_bmc_reg;
+	}
+
 	intf->intf_num = i;
 	mutex_unlock(&ipmi_interfaces_mutex);
 
@@ -3788,6 +3810,7 @@ void ipmi_unregister_smi(struct ipmi_smi *intf)
 	if (intf->handlers->shutdown)
 		intf->handlers->shutdown(intf->send_info);
 
+	device_remove_file(intf->si_dev, &intf->maintenance_mode_devattr);
 	device_remove_file(intf->si_dev, &intf->nr_msgs_devattr);
 	device_remove_file(intf->si_dev, &intf->nr_users_devattr);
 
-- 
2.43.0

[Openipmi-developer] [PATCH 4/4] ipmi: Set a timer for maintenance mode

From: Corey M. <co...@mi...> - 2025-08-07 23:07:11

Now that maintenance mode rejects all messages, there's nothing to
run the timer.  Make sure the timer is running in maintenance mode.

Signed-off-by: Corey Minyard <co...@mi...>
---
 drivers/char/ipmi/ipmi_msghandler.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index 5ff35c473b50..786c71eb00f4 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -50,6 +50,8 @@ static void intf_free(struct kref *ref);
 static bool initialized;
 static bool drvregistered;
 
+static struct timer_list ipmi_timer;
+
 /* Numbers in this enumerator should be mapped to ipmi_panic_event_str */
 enum ipmi_panic_event_op {
 	IPMI_SEND_PANIC_EVENT_NONE,
@@ -1948,6 +1950,7 @@ static int i_ipmi_req_sysintf(struct ipmi_smi        *intf,
 				&& intf->maintenance_mode_state < newst) {
 			intf->maintenance_mode_state = newst;
 			maintenance_mode_update(intf);
+			mod_timer(&ipmi_timer, jiffies + IPMI_TIMEOUT_JIFFIES);
 		}
 		spin_unlock_irqrestore(&intf->maintenance_mode_lock,
 				       flags);
@@ -5136,6 +5139,7 @@ static bool ipmi_timeout_handler(struct ipmi_smi *intf,
 			    && (intf->auto_maintenance_timeout <= 0)) {
 				intf->maintenance_mode_state =
 					IPMI_MAINTENANCE_MODE_STATE_OFF;
+				intf->auto_maintenance_timeout = 0;
 				maintenance_mode_update(intf);
 			}
 		}
@@ -5158,8 +5162,6 @@ static void ipmi_request_event(struct ipmi_smi *intf)
 		intf->handlers->request_events(intf->send_info);
 }
 
-static struct timer_list ipmi_timer;
-
 static atomic_t stop_operation;
 
 static void ipmi_timeout_work(struct work_struct *work)
@@ -5183,6 +5185,8 @@ static void ipmi_timeout_work(struct work_struct *work)
 			}
 			need_timer = true;
 		}
+		if (intf->maintenance_mode_state)
+			need_timer = true;
 
 		need_timer |= ipmi_timeout_handler(intf, IPMI_TIMEOUT_TIME);
 	}
-- 
2.43.0

[Openipmi-developer] [PATCH 2/4] ipmi: Disable sysfs access and requests in maintenance mode

From: Corey M. <co...@mi...> - 2025-08-07 23:07:07

If the driver goes into any maintenance mode, disable sysfs access until
it is done.

If the driver goes into reset maintenance mode, disable all messages
until it is done.

Signed-off-by: Corey Minyard <co...@mi...>
---
 drivers/char/ipmi/ipmi_msghandler.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index f124c0b33db8..72f5f4a0c056 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -2338,6 +2338,11 @@ static int i_ipmi_request(struct ipmi_user     *user,
 
 	if (!run_to_completion)
 		mutex_lock(&intf->users_mutex);
+	if (intf->maintenance_mode_state == IPMI_MAINTENANCE_MODE_STATE_RESET) {
+		/* No messages while the BMC is in reset. */
+		rv = -EBUSY;
+		goto out_err;
+	}
 	if (intf->in_shutdown) {
 		rv = -ENODEV;
 		goto out_err;
@@ -2639,6 +2644,12 @@ static int __bmc_get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc,
 	    (bmc->dyn_id_set && time_is_after_jiffies(bmc->dyn_id_expiry)))
 		goto out_noprocessing;
 
+	/* Don't allow sysfs access when in maintenance mode. */
+	if (intf->maintenance_mode_state) {
+		rv = -EBUSY;
+		goto out_noprocessing;
+	}
+
 	prev_guid_set = bmc->dyn_guid_set;
 	__get_guid(intf);
 
-- 
2.43.0

[Openipmi-developer] [PATCH 1/4] ipmi: Differentiate between reset and firmware update in maintenance

From: Corey M. <co...@mi...> - 2025-08-07 23:07:06

This allows later changes to have different behaviour during a reset
versus a firmware update.

Signed-off-by: Corey Minyard <co...@mi...>
---
 drivers/char/ipmi/ipmi_msghandler.c | 42 ++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index 8e9050f99e9e..f124c0b33db8 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -539,7 +539,11 @@ struct ipmi_smi {
 
 	/* For handling of maintenance mode. */
 	int maintenance_mode;
-	bool maintenance_mode_enable;
+
+#define IPMI_MAINTENANCE_MODE_STATE_OFF		0
+#define IPMI_MAINTENANCE_MODE_STATE_FIRMWARE	1
+#define IPMI_MAINTENANCE_MODE_STATE_RESET	2
+	int maintenance_mode_state;
 	int auto_maintenance_timeout;
 	spinlock_t maintenance_mode_lock; /* Used in a timer... */
 
@@ -1534,8 +1538,15 @@ EXPORT_SYMBOL(ipmi_get_maintenance_mode);
 static void maintenance_mode_update(struct ipmi_smi *intf)
 {
 	if (intf->handlers->set_maintenance_mode)
+		/*
+		 * Lower level drivers only care about firmware mode
+		 * as it affects their timing.  They don't care about
+		 * reset, which disables all commands for a while.
+		 */
 		intf->handlers->set_maintenance_mode(
-			intf->send_info, intf->maintenance_mode_enable);
+			intf->send_info,
+			(intf->maintenance_mode_state ==
+			 IPMI_MAINTENANCE_MODE_STATE_FIRMWARE));
 }
 
 int ipmi_set_maintenance_mode(struct ipmi_user *user, int mode)
@@ -1552,16 +1563,17 @@ int ipmi_set_maintenance_mode(struct ipmi_user *user, int mode)
 	if (intf->maintenance_mode != mode) {
 		switch (mode) {
 		case IPMI_MAINTENANCE_MODE_AUTO:
-			intf->maintenance_mode_enable
-				= (intf->auto_maintenance_timeout > 0);
+			/* Just leave it alone. */
 			break;
 
 		case IPMI_MAINTENANCE_MODE_OFF:
-			intf->maintenance_mode_enable = false;
+			intf->maintenance_mode_state =
+				IPMI_MAINTENANCE_MODE_STATE_OFF;
 			break;
 
 		case IPMI_MAINTENANCE_MODE_ON:
-			intf->maintenance_mode_enable = true;
+			intf->maintenance_mode_state =
+				IPMI_MAINTENANCE_MODE_STATE_FIRMWARE;
 			break;
 
 		default:
@@ -1922,13 +1934,18 @@ static int i_ipmi_req_sysintf(struct ipmi_smi        *intf,
 
 	if (is_maintenance_mode_cmd(msg)) {
 		unsigned long flags;
+		int newst;
+
+		if (msg->netfn == IPMI_NETFN_FIRMWARE_REQUEST)
+			newst = IPMI_MAINTENANCE_MODE_STATE_FIRMWARE;
+		else
+			newst = IPMI_MAINTENANCE_MODE_STATE_RESET;
 
 		spin_lock_irqsave(&intf->maintenance_mode_lock, flags);
-		intf->auto_maintenance_timeout
-			= maintenance_mode_timeout_ms;
+		intf->auto_maintenance_timeout = maintenance_mode_timeout_ms;
 		if (!intf->maintenance_mode
-		    && !intf->maintenance_mode_enable) {
-			intf->maintenance_mode_enable = true;
+				&& intf->maintenance_mode_state < newst) {
+			intf->maintenance_mode_state = newst;
 			maintenance_mode_update(intf);
 		}
 		spin_unlock_irqrestore(&intf->maintenance_mode_lock,
@@ -5083,7 +5100,8 @@ static bool ipmi_timeout_handler(struct ipmi_smi *intf,
 				-= timeout_period;
 			if (!intf->maintenance_mode
 			    && (intf->auto_maintenance_timeout <= 0)) {
-				intf->maintenance_mode_enable = false;
+				intf->maintenance_mode_state =
+					IPMI_MAINTENANCE_MODE_STATE_OFF;
 				maintenance_mode_update(intf);
 			}
 		}
@@ -5099,7 +5117,7 @@ static bool ipmi_timeout_handler(struct ipmi_smi *intf,
 static void ipmi_request_event(struct ipmi_smi *intf)
 {
 	/* No event requests when in maintenance mode. */
-	if (intf->maintenance_mode_enable)
+	if (intf->maintenance_mode_state)
 		return;
 
 	if (!intf->in_shutdown)
-- 
2.43.0

[Openipmi-developer] [RFC] Patches to disable messages during BMC reset

From: Corey M. <co...@mi...> - 2025-08-07 23:07:04

I went ahead and did some patches for this, since it was on my mind.

With these, if a reset is sent to the BMC, the driver will disable
messages to the BMC for a time, defaulting to 30 seconds.  Don't
modify message timing, since no messages are allowed, anyway.

If a firmware update command is sent to the BMC, then just reject
sysfs commands that query the BMC.  Modify message timing and
allow direct messages through the driver interface.

Hopefully this will work around the problem, and it's a good idea,
anyway.
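
The gating described above can be summarized in a small sketch. The state
values mirror the IPMI_MAINTENANCE_MODE_STATE_* defines from patch 1/4;
the enum and function names here are hypothetical, and the real checks
live inside i_ipmi_request() and __bmc_get_device_id() rather than in
standalone helpers.

```c
#include <errno.h>

/* Values mirror IPMI_MAINTENANCE_MODE_STATE_{OFF,FIRMWARE,RESET}. */
enum maint_state { MAINT_OFF = 0, MAINT_FIRMWARE = 1, MAINT_RESET = 2 };

/* Direct requests through the driver interface: blocked only in reset. */
static int demo_request_check(enum maint_state st)
{
	return st == MAINT_RESET ? -EBUSY : 0;
}

/* sysfs queries that talk to the BMC: blocked in any maintenance state. */
static int demo_sysfs_check(enum maint_state st)
{
	return st != MAINT_OFF ? -EBUSY : 0;
}
```

So during a firmware update a user can still send messages directly,
while sysfs reads fail fast with -EBUSY; during a reset both paths fail.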

-corey

Re: [Openipmi-developer] [BUG] ipmi_si: watchdog: Watchdog detected hard LOCKUP

From: Corey M. <co...@mi...> - 2025-08-07 20:29:19

On Thu, Aug 07, 2025 at 02:43:14PM -0500, Frederick Lawler wrote:
> 
> It occurred to me last night that I'd probably like a rate limit on the KCS
> messages as well. I didn't see if a patch for that was made. I can whip
> that up sometime next week, that could be of use to anyone.

That jogged my memory a bit; there is something called "maintenance
mode" in the IPMI driver.  It's used primarily for firmware updates,
but it's triggered by reset commands in addition to firmware update
commands.  It has three basic effects:

* It turns off automatic messages sent to the BMC by the driver
  (only fetching flags, I think).

* It changes the way the timing works to check for the BMC being ready
  a lot more often.  (This is a hardware check and shouldn't affect
  the BMC, but maybe it does on some.)

* It changes the timing for messages routed to the IPMB bus to give
  them more time.

It solved two problems:

* For systems without IPMI interrupts, firmware updates were taking
  forever.  

* When you would reset the BMC, the driver's automatic messages would
  generally time out.  And IPMB messages pending would time out.

The theory was that if the user reset the BMC, they wouldn't issue any
IPMI commands, and the driver wouldn't either, so it would leave the
BMC interface alone until it's done resetting.

It's not perfect, the reset or firmware update can happen over the LAN
interface, but it seemed to help a lot of people.

Anyway, after that long explanation, maybe that needs to be extended
and if the driver goes into maintenance mode have all sysfs accesses
to the BMC return an error.

It also might be a good idea to differentiate between resets and
firmware update commands.  After a reset nothing will probably work, but
the BMC is still partially functional during a firmware update.  So no
IPMI commands at all for a little while after a reset.  That is a
behavioral change, but it's probably not a lot different than what would
happen, anyway.  The error just comes back faster.
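
To make the idea above concrete, here is a minimal user-space sketch in
Python, not driver code.  It models a gate that fails sysfs reads while in
maintenance mode, rejects every command for a hold-off period after a reset,
and still lets commands through during a firmware update.  The NetFn and
command numbers are the standard IPMI values; the class name `BmcGate`, the
hold-off durations, and the choice of `EBUSY` are assumptions made purely for
illustration.

```python
import errno
from enum import Enum

NETFN_APP = 0x06        # IPMI application network function
NETFN_FIRMWARE = 0x08   # IPMI firmware network function
CMD_COLD_RESET = 0x02
CMD_WARM_RESET = 0x03

# Hypothetical hold-off times in seconds; not taken from the driver.
RESET_HOLDOFF = 30.0
FW_UPDATE_HOLDOFF = 30.0


class Mode(Enum):
    NORMAL = "normal"
    FW_UPDATE = "fw_update"
    RESET = "reset"


class BmcGate:
    """Toy model of the gating idea; the clock is passed in explicitly."""

    def __init__(self):
        self.mode = Mode.NORMAL
        self.expires = 0.0

    def _refresh(self, now):
        # Leave maintenance mode once the hold-off has elapsed.
        if self.mode is not Mode.NORMAL and now >= self.expires:
            self.mode = Mode.NORMAL

    def send_command(self, netfn, cmd, now):
        self._refresh(now)
        if self.mode is Mode.RESET:
            # After a reset nothing is expected to work, so fail fast.
            raise OSError(errno.EBUSY, "BMC is resetting")
        if netfn == NETFN_APP and cmd in (CMD_COLD_RESET, CMD_WARM_RESET):
            self.mode, self.expires = Mode.RESET, now + RESET_HOLDOFF
        elif netfn == NETFN_FIRMWARE:
            self.mode, self.expires = Mode.FW_UPDATE, now + FW_UPDATE_HOLDOFF

    def sysfs_read(self, now):
        self._refresh(now)
        if self.mode is not Mode.NORMAL:
            # Any maintenance mode makes sysfs accesses return an error.
            raise OSError(errno.EBUSY, "BMC in maintenance mode")
        return "ok"
```

In this model a reset still only closes the gate when the local driver sees
the command, so a reset issued over the LAN interface would go unnoticed.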

None of this solves the basic issue, though.

I'm not exactly sure what you mean by a rate limit on KCS messages.  It
would lower the probability, perhaps, but it wouldn't eliminate the
problem, either.  Just not allowing anything during these times is
probably better.
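
For reference, one possible reading of "rate limit" is an interval/burst
limiter: at most `burst` events are let through per `interval`, and the rest
are counted as suppressed.  The sketch below is a small Python model of that
idea, not kernel code; the class name `RateLimiter` and its parameters are
hypothetical.

```python
class RateLimiter:
    """Allow at most `burst` events in each `interval`-second window."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.window_start = None
        self.used = 0
        self.suppressed = 0

    def allow(self, now):
        # Start a new window on first use or once the interval has elapsed.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.used = 0
        if self.used < self.burst:
            self.used += 1
            return True
        self.suppressed += 1
        return False


limiter = RateLimiter(interval=5.0, burst=2)
decisions = [limiter.allow(t) for t in (0.0, 1.0, 2.0, 6.0)]
```

A limiter like this only thins out events; some still get through in every
window, which matches the point that it lowers the probability without
removing the problem.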

> 
> [1533534.869508] [Hardware Error]: Corrected error, no action required.
> [1533534.884635] [Hardware Error]: CPU:1 (17:31:0) MC18_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
> [1533534.912122] [Hardware Error]: Error Addr: 0x0000000313c7a020
> [1533534.926641] [Hardware Error]: IPID: 0x0000009600350f00, Syndrome: 0x9fec08000a800a01
> [1533534.943278] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0
> [1533534.946635] EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#1channel#3 (csrow:1 channel:3 page:0x0 offset:0x0 grain:64 syndrome:0x800)
> [1533535.369487] INFO: task cat:1844873 blocked for more than 10 seconds.
> [1533535.385145]       Tainted: G        W  O       6.12.35-cloudflare-2025.6.15 #1
> [1533535.401614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [1533535.418715] task:cat             state:D stack:0     pid:1844873 tgid:1844873 ppid:1844872 task_flags:0x400000 flags:0x00004002
> [1533535.447475] Call Trace:
> [1533535.458691]  <TASK>
> [1533535.469154]  __schedule+0x4fa/0xbf0
> [1533535.481433]  schedule+0x27/0xf0
> [1533535.493181]  __get_guid+0xf4/0x130 [ipmi_msghandler]
> [1533535.506325]  ? __pfx_autoremove_wake_function+0x10/0x10
> [1533535.519910]  __bmc_get_device_id+0xd6/0xa30 [ipmi_msghandler]

Yeah, this is what I would expect to see if you are doing this operation
and the BMC is in reset.  It's going to sit there until it times out and
returns an error.
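
On the user-space side, a monitoring script can at least treat that error as
"no data" rather than crashing.  The following is a minimal Python sketch
under that assumption; the helper name `read_bmc_attr` is hypothetical, and
the path is the one shown in the `crash> files` output in this thread.

```python
from pathlib import Path

# Path taken from the `crash> files` output quoted in this thread.
AUX_FW_REV = Path("/sys/devices/platform/ipmi_bmc.0/aux_firmware_revision")


def read_bmc_attr(path=AUX_FW_REV):
    """Return the attribute's stripped contents, or None if the read fails."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        # Covers a missing file as well as an error returned by the driver.
        return None
```

This does not shorten the wait: the read still blocks until the driver gives
up, and only then does the error reach the caller.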

-corey

> [1533535.534459]  ? srso_return_thunk+0x5/0x5f
> [1533535.546509]  ? srso_return_thunk+0x5/0x5f
> [1533535.558540]  ? __memcg_slab_post_alloc_hook+0x21b/0x410
> [1533535.571722]  aux_firmware_rev_show+0x38/0x90 [ipmi_msghandler]
> [1533535.585304]  ? __kmalloc_node_noprof+0x3f6/0x450
> [1533535.598144]  ? seq_read_iter+0x376/0x460
> [1533535.609621]  dev_attr_show+0x1c/0x40
> [1533535.621024]  sysfs_kf_seq_show+0x8f/0xe0
> [1533535.632316]  seq_read_iter+0x11f/0x460
> [1533535.643172]  ? security_file_permission+0x9/0xb0
> [1533535.655102]  vfs_read+0x260/0x330
> [1533535.665368]  ksys_read+0x65/0xe0
> [1533535.675559]  do_syscall_64+0x4b/0x110
> [1533535.686324]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [1533535.698530] RIP: 0033:0x7f72b587125d
> [1533535.708857] RSP: 002b:00007ffccc21bb48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [1533535.723411] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f72b587125d
> [1533535.737361] RDX: 0000000000020000 RSI: 00007f72b5755000 RDI: 0000000000000003
> [1533535.751191] RBP: 0000000000020000 R08: 00000000ffffffff R09: 0000000000000000
> [1533535.764847] R10: 00007f72b5788b60 R11: 0000000000000246 R12: 00007f72b5755000
> [1533535.778536] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000000000
> [1533535.792210]  </TASK>
> 
> crash> bt -l 1781073
> PID: 1781073  TASK: ffff9d91c7040000  CPU: 81   COMMAND: "/usr/bin/python"
>  #0 [ffffb3a171683c00] __schedule at ffffffff9d559eea
>     /cfsetup_build/build/linux/kernel/sched/core.c: 5338
>  #1 [ffffb3a171683c80] schedule at ffffffff9d55a617
>     /cfsetup_build/build/linux/arch/x86/include/asm/preempt.h: 84
>  #2 [ffffb3a171683c90] __get_guid at ffffffffc22aa574 [ipmi_msghandler]
>  #3 [ffffb3a171683ce8] __bmc_get_device_id at ffffffffc22aa696 [ipmi_msghandler]
>  #4 [ffffb3a171683da0] aux_firmware_rev_show at ffffffffc22ab1c8 [ipmi_msghandler]
>  #5 [ffffb3a171683dd0] dev_attr_show at ffffffff9d1175dc
>     /cfsetup_build/build/linux/drivers/base/core.c: 2425
>  #6 [ffffb3a171683de8] sysfs_kf_seq_show at ffffffff9cc64caf
>     /cfsetup_build/build/linux/fs/sysfs/file.c: 60
>  #7 [ffffb3a171683e10] seq_read_iter at ffffffff9cbddf7f
>     /cfsetup_build/build/linux/fs/seq_file.c: 230
>  #8 [ffffb3a171683e68] vfs_read at ffffffff9cba8590
>     /cfsetup_build/build/linux/fs/read_write.c: 489
>  #9 [ffffb3a171683f00] ksys_read at ffffffff9cba9165
>     /cfsetup_build/build/linux/fs/read_write.c: 713
> #10 [ffffb3a171683f38] do_syscall_64 at ffffffff9d550c8b
>     /cfsetup_build/build/linux/arch/x86/entry/common.c: 52
> #11 [ffffb3a171683f50] entry_SYSCALL_64_after_hwframe at ffffffff9d60012f
>     /cfsetup_build/build/linux/arch/x86/entry/entry_64.S: 130
>     RIP: 00007f04e1b7c29c  RSP: 00007ffea7aaf6c0  RFLAGS: 00000246
>     RAX: ffffffffffffffda  RBX: 0000000000a840f8  RCX: 00007f04e1b7c29c
>     RDX: 0000000000001001  RSI: 000000002fd06ef0  RDI: 00000000000000c1
>     RBP: 00007f04e1a82fc0   R8: 0000000000000000   R9: 0000000000000000
>     R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000001001
>     R13: 000000002fd06ef0  R14: 00000000000000c1  R15: 0000000000a41520
>     ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
> 
> crash> files 1781073
> ...
> 193 ffff9db5132e5800 ffff9dafb18bd200 ffff9da7b780bcf0 REG  /sys/devices/platform/ipmi_bmc.0/aux_firmware_revision
> 
> crash> log -c
> ...
> [1533553.998160] [      C7] ipmi_si IPI0001:00: KCS in invalid state 6
> [1533554.009156] [      C7] ipmi_si IPI0001:00: KCS in invalid state 8
> [1533554.019973] [T1844873] ipmi_si IPI0001:00: KCS in invalid state 9
> [1533554.031005] [     C81] ipmi_si IPI0001:00: IPMI message handler: device id fetch failed: 0xd5
>

Re: [Openipmi-developer] [BUG] ipmi_si: watchdog: Watchdog detected hard LOCKUP

From: Frederick L. <fr...@cl...> - 2025-08-07 19:43:28

On Wed, Aug 06, 2025 at 05:51:29PM -0500, Corey Minyard wrote:
> On Wed, Aug 06, 2025 at 04:36:41PM -0500, Frederick Lawler wrote:
> > On Wed, Aug 06, 2025 at 04:16:18PM -0500, Corey Minyard wrote:
> > > On Wed, Aug 06, 2025 at 03:19:02PM -0500, Fred Lawler wrote:
> > > > + CC: Corey Minyard <co...@mi...>
> > > > 
> > 
> > > I'm wondering if something is happening with the BMC resetting and
> > > interactions with ACPI involved in that.  Adding the extra part of
> > > trying to talk to the BMC while it's being reset could cause the BMC to
> > > get confused and do bad things?
> > > 
> > 
> > Sure, it's a possibility we explored. We have a lot of automation,
> > predominantly a Prometheus module exporting IPMI information
> > from the sysfs files. We also have config management that queries
> > sysfs files to regulate updates, etc. Sometimes the config management
> > automation will attempt to reset the BMC.
> 
> Ok.  I have tests that do BMC resets, but I can't run at the scale you
> do, and I'm running in a simulator so it's not going to behave the
> same.
> 
> The other possibility is the processor goes into the idle code while
> interrupts are off, but I think the kernel has checks all around that.
> I can't think of how else a processor would get stuck in idle.
> 

Yes, it's a bit of an odd case. There's nothing obvious reported by the
crash utility. By the time we get the NMI/panic, the CPUs are off doing
something else in our typical crash case. That said, earlier this week I got a
hard lockup outside of a BMC reset, but the node had too many MCE
correctable memory errors.

For sake of completeness, I'll post that stack trace here anyway since
that may provide some more context clues. In this case, I did catch two
separate reads to sysfs files, and then they appear to have competed.
The cat process seemed to already be off CPU, but the KCS
message is still coming in at the same time the python script was being
processed too. Only the python run was on CPU at the time of the crash,
but the NMI panic was still on an idle CPU. Unfortunately, I didn't write
down all the logs for this one, so it's missing the idle state NMI for
watchdog, but hopefully the snippets show what's happening. I posted this
below.

> > 
> > > > >
> > > > > I also tried to load the CPUs with stress-ng, but the best I can do
> > > > > are the hung tasks.
> > > > >
> > > > > I identified that smi_send()[1] could be locked behind the
> > > > > spin_lock_irqsave() and within the KCS send handler, there's another irq
> > > > > save lock. I suspect this is where we're getting hung up. Below is a
> > > > > sample stack trace + log output.
> > > 
> > > Yeah, I don't see that in the traceback.  There is a lock in the KCS
> > > sender, but I don't see how that could do anything.
> > > 
> > > Maybe you could try changing the cpuidle handler?  That would be at
> > > least something to try.
> > > 
> > 
> > Would that help in forming a reproducer? I'd need to deploy any kernel
> > modifications fleet wide to cast a wide enough net. The lockups aren't
> > extremely consistent. We may get a couple or more a week.
> 
> Ah, so this isn't readily reproducible.  Bummer.
> 
> If the problem goes away if you change the cpuidle handler to something
> non-ACPI, that would be a big clue that it's an ACPI issue.
> 
> > 
> > Lastly, I have the rate limit patch backported. I'll be able to start
> > testing with that tomorrow, and same with loading the IPMI watchdog
> > module.
> 
> Ok.  I don't have much hope for it making much difference, but it's safe
> and will be coming in the next kernel release.
>

It occurred to me last night that I'd probably like a rate limit on the KCS
messages as well. I didn't see if a patch for that was made. I can whip
that up sometime next week, that could be of use to anyone.

[1533534.869508] [Hardware Error]: Corrected error, no action required.
[1533534.884635] [Hardware Error]: CPU:1 (17:31:0) MC18_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
[1533534.912122] [Hardware Error]: Error Addr: 0x0000000313c7a020
[1533534.926641] [Hardware Error]: IPID: 0x0000009600350f00, Syndrome: 0x9fec08000a800a01
[1533534.943278] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0
[1533534.946635] EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#1channel#3 (csrow:1 channel:3 page:0x0 offset:0x0 grain:64 syndrome:0x800)
[1533535.369487] INFO: task cat:1844873 blocked for more than 10 seconds.
[1533535.385145]       Tainted: G        W  O       6.12.35-cloudflare-2025.6.15 #1
[1533535.401614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1533535.418715] task:cat             state:D stack:0     pid:1844873 tgid:1844873 ppid:1844872 task_flags:0x400000 flags:0x00004002
[1533535.447475] Call Trace:
[1533535.458691]  <TASK>
[1533535.469154]  __schedule+0x4fa/0xbf0
[1533535.481433]  schedule+0x27/0xf0
[1533535.493181]  __get_guid+0xf4/0x130 [ipmi_msghandler]
[1533535.506325]  ? __pfx_autoremove_wake_function+0x10/0x10
[1533535.519910]  __bmc_get_device_id+0xd6/0xa30 [ipmi_msghandler]
[1533535.534459]  ? srso_return_thunk+0x5/0x5f
[1533535.546509]  ? srso_return_thunk+0x5/0x5f
[1533535.558540]  ? __memcg_slab_post_alloc_hook+0x21b/0x410
[1533535.571722]  aux_firmware_rev_show+0x38/0x90 [ipmi_msghandler]
[1533535.585304]  ? __kmalloc_node_noprof+0x3f6/0x450
[1533535.598144]  ? seq_read_iter+0x376/0x460
[1533535.609621]  dev_attr_show+0x1c/0x40
[1533535.621024]  sysfs_kf_seq_show+0x8f/0xe0
[1533535.632316]  seq_read_iter+0x11f/0x460
[1533535.643172]  ? security_file_permission+0x9/0xb0
[1533535.655102]  vfs_read+0x260/0x330
[1533535.665368]  ksys_read+0x65/0xe0
[1533535.675559]  do_syscall_64+0x4b/0x110
[1533535.686324]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[1533535.698530] RIP: 0033:0x7f72b587125d
[1533535.708857] RSP: 002b:00007ffccc21bb48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[1533535.723411] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f72b587125d
[1533535.737361] RDX: 0000000000020000 RSI: 00007f72b5755000 RDI: 0000000000000003
[1533535.751191] RBP: 0000000000020000 R08: 00000000ffffffff R09: 0000000000000000
[1533535.764847] R10: 00007f72b5788b60 R11: 0000000000000246 R12: 00007f72b5755000
[1533535.778536] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000000000
[1533535.792210]  </TASK>

crash> bt -l 1781073
PID: 1781073  TASK: ffff9d91c7040000  CPU: 81   COMMAND: "/usr/bin/python"
 #0 [ffffb3a171683c00] __schedule at ffffffff9d559eea
    /cfsetup_build/build/linux/kernel/sched/core.c: 5338
 #1 [ffffb3a171683c80] schedule at ffffffff9d55a617
    /cfsetup_build/build/linux/arch/x86/include/asm/preempt.h: 84
 #2 [ffffb3a171683c90] __get_guid at ffffffffc22aa574 [ipmi_msghandler]
 #3 [ffffb3a171683ce8] __bmc_get_device_id at ffffffffc22aa696 [ipmi_msghandler]
 #4 [ffffb3a171683da0] aux_firmware_rev_show at ffffffffc22ab1c8 [ipmi_msghandler]
 #5 [ffffb3a171683dd0] dev_attr_show at ffffffff9d1175dc
    /cfsetup_build/build/linux/drivers/base/core.c: 2425
 #6 [ffffb3a171683de8] sysfs_kf_seq_show at ffffffff9cc64caf
    /cfsetup_build/build/linux/fs/sysfs/file.c: 60
 #7 [ffffb3a171683e10] seq_read_iter at ffffffff9cbddf7f
    /cfsetup_build/build/linux/fs/seq_file.c: 230
 #8 [ffffb3a171683e68] vfs_read at ffffffff9cba8590
    /cfsetup_build/build/linux/fs/read_write.c: 489
 #9 [ffffb3a171683f00] ksys_read at ffffffff9cba9165
    /cfsetup_build/build/linux/fs/read_write.c: 713
#10 [ffffb3a171683f38] do_syscall_64 at ffffffff9d550c8b
    /cfsetup_build/build/linux/arch/x86/entry/common.c: 52
#11 [ffffb3a171683f50] entry_SYSCALL_64_after_hwframe at ffffffff9d60012f
    /cfsetup_build/build/linux/arch/x86/entry/entry_64.S: 130
    RIP: 00007f04e1b7c29c  RSP: 00007ffea7aaf6c0  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000a840f8  RCX: 00007f04e1b7c29c
    RDX: 0000000000001001  RSI: 000000002fd06ef0  RDI: 00000000000000c1
    RBP: 00007f04e1a82fc0   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000001001
    R13: 000000002fd06ef0  R14: 00000000000000c1  R15: 0000000000a41520
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

crash> files 1781073
...
193 ffff9db5132e5800 ffff9dafb18bd200 ffff9da7b780bcf0 REG  /sys/devices/platform/ipmi_bmc.0/aux_firmware_revision

crash> log -c
...
[1533553.998160] [      C7] ipmi_si IPI0001:00: KCS in invalid state 6
[1533554.009156] [      C7] ipmi_si IPI0001:00: KCS in invalid state 8
[1533554.019973] [T1844873] ipmi_si IPI0001:00: KCS in invalid state 9
[1533554.031005] [     C81] ipmi_si IPI0001:00: IPMI message handler: device id fetch failed: 0xd5

[Openipmi-developer] [PATCH] dt-bindings: ipmi: aspeed, ast2400-kcs-bmc: Add missing "clocks" property

From: Rob H. (Arm) <ro...@ke...> - 2025-08-07 13:29:12

The ASpeed kcs-bmc nodes have a "clocks" property which isn't
documented. It looks like all the LPC child devices have the same clock
source and some of the drivers manage their clock. Perhaps it is the
parent device that should have the clock, but it's too late for that.

Signed-off-by: Rob Herring (Arm) <ro...@ke...>
---
 .../devicetree/bindings/ipmi/aspeed,ast2400-kcs-bmc.yaml       | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/ipmi/aspeed,ast2400-kcs-bmc.yaml b/Documentation/devicetree/bindings/ipmi/aspeed,ast2400-kcs-bmc.yaml
index 129e32c4c774..610c79863208 100644
--- a/Documentation/devicetree/bindings/ipmi/aspeed,ast2400-kcs-bmc.yaml
+++ b/Documentation/devicetree/bindings/ipmi/aspeed,ast2400-kcs-bmc.yaml
@@ -40,6 +40,9 @@ properties:
       - description: ODR register
       - description: STR register
 
+  clocks:
+    maxItems: 1
+
   aspeed,lpc-io-reg:
     $ref: /schemas/types.yaml#/definitions/uint32-array
     minItems: 1
-- 
2.47.2
