kgdb-bugreport Mailing List for kgdb
Status: Beta
Brought to you by:
jwessel
You can subscribe to this list here.
2001 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(3) |
Nov
(10) |
Dec
(14) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2002 |
Jan
(22) |
Feb
(7) |
Mar
(14) |
Apr
(9) |
May
(4) |
Jun
(7) |
Jul
(17) |
Aug
(15) |
Sep
(10) |
Oct
(9) |
Nov
(10) |
Dec
(27) |
2003 |
Jan
(7) |
Feb
(7) |
Mar
(9) |
Apr
(18) |
May
(14) |
Jun
(4) |
Jul
(18) |
Aug
(18) |
Sep
(8) |
Oct
(10) |
Nov
(6) |
Dec
(32) |
2004 |
Jan
(103) |
Feb
(131) |
Mar
(161) |
Apr
(58) |
May
(38) |
Jun
(16) |
Jul
(60) |
Aug
(50) |
Sep
(115) |
Oct
(153) |
Nov
(117) |
Dec
(87) |
2005 |
Jan
(40) |
Feb
(20) |
Mar
(73) |
Apr
(40) |
May
(22) |
Jun
(54) |
Jul
(61) |
Aug
(45) |
Sep
(21) |
Oct
(36) |
Nov
(79) |
Dec
(88) |
2006 |
Jan
(180) |
Feb
(51) |
Mar
(119) |
Apr
(108) |
May
(68) |
Jun
(44) |
Jul
(47) |
Aug
(95) |
Sep
(175) |
Oct
(100) |
Nov
(27) |
Dec
(47) |
2007 |
Jan
(25) |
Feb
(41) |
Mar
(50) |
Apr
(43) |
May
(210) |
Jun
(62) |
Jul
(109) |
Aug
(94) |
Sep
(77) |
Oct
(38) |
Nov
(48) |
Dec
(56) |
2008 |
Jan
(41) |
Feb
(20) |
Mar
(59) |
Apr
(46) |
May
(50) |
Jun
(152) |
Jul
(149) |
Aug
(33) |
Sep
(53) |
Oct
(89) |
Nov
(55) |
Dec
(68) |
2009 |
Jan
(59) |
Feb
(94) |
Mar
(105) |
Apr
(264) |
May
(216) |
Jun
(123) |
Jul
(290) |
Aug
(223) |
Sep
(10) |
Oct
(2) |
Nov
(1) |
Dec
(88) |
2010 |
Jan
(133) |
Feb
(148) |
Mar
(45) |
Apr
(35) |
May
(42) |
Jun
(18) |
Jul
(68) |
Aug
(62) |
Sep
(26) |
Oct
(48) |
Nov
(46) |
Dec
(10) |
2011 |
Jan
|
Feb
|
Mar
(12) |
Apr
(12) |
May
(10) |
Jun
(12) |
Jul
(12) |
Aug
(10) |
Sep
(13) |
Oct
(3) |
Nov
(11) |
Dec
(6) |
2012 |
Jan
(4) |
Feb
(36) |
Mar
(76) |
Apr
(4) |
May
(1) |
Jun
|
Jul
(69) |
Aug
(15) |
Sep
(76) |
Oct
(11) |
Nov
(40) |
Dec
|
2013 |
Jan
(32) |
Feb
(7) |
Mar
(47) |
Apr
(5) |
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(16) |
Nov
(15) |
Dec
|
2014 |
Jan
(1) |
Feb
|
Mar
(2) |
Apr
(28) |
May
(12) |
Jun
|
Jul
(12) |
Aug
(8) |
Sep
(13) |
Oct
(53) |
Nov
(22) |
Dec
(13) |
2015 |
Jan
(86) |
Feb
(15) |
Mar
(4) |
Apr
|
May
(1) |
Jun
(2) |
Jul
(6) |
Aug
(3) |
Sep
|
Oct
(2) |
Nov
|
Dec
|
2016 |
Jan
|
Feb
|
Mar
(3) |
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
(29) |
Oct
(5) |
Nov
(3) |
Dec
|
2017 |
Jan
|
Feb
|
Mar
|
Apr
(2) |
May
|
Jun
(8) |
Jul
|
Aug
|
Sep
|
Oct
(1) |
Nov
(2) |
Dec
(24) |
2018 |
Jan
(18) |
Feb
(2) |
Mar
|
Apr
(11) |
May
(26) |
Jun
(7) |
Jul
(7) |
Aug
(32) |
Sep
(32) |
Oct
(47) |
Nov
(49) |
Dec
(35) |
2019 |
Jan
(3) |
Feb
(11) |
Mar
(44) |
Apr
(18) |
May
(13) |
Jun
(11) |
Jul
(25) |
Aug
(3) |
Sep
(11) |
Oct
(45) |
Nov
(52) |
Dec
|
2020 |
Jan
(10) |
Feb
(7) |
Mar
(13) |
Apr
(102) |
May
(146) |
Jun
(125) |
Jul
(97) |
Aug
(94) |
Sep
(44) |
Oct
(59) |
Nov
(20) |
Dec
(7) |
2021 |
Jan
(47) |
Feb
(61) |
Mar
(44) |
Apr
(3) |
May
(40) |
Jun
(42) |
Jul
(31) |
Aug
(18) |
Sep
(9) |
Oct
(2) |
Nov
(5) |
Dec
(1) |
2022 |
Jan
(41) |
Feb
(80) |
Mar
(72) |
Apr
(28) |
May
(18) |
Jun
(2) |
Jul
(5) |
Aug
(3) |
Sep
(16) |
Oct
(28) |
Nov
(57) |
Dec
(4) |
2023 |
Jan
(8) |
Feb
|
Mar
(34) |
Apr
(23) |
May
(127) |
Jun
(63) |
Jul
(11) |
Aug
(78) |
Sep
(13) |
Oct
(4) |
Nov
(12) |
Dec
(18) |
2024 |
Jan
|
Feb
(2) |
Mar
(30) |
Apr
(77) |
May
(2) |
Jun
(38) |
Jul
|
Aug
(23) |
Sep
(6) |
Oct
(22) |
Nov
(10) |
Dec
(25) |
2025 |
Jan
(52) |
Feb
|
Mar
(2) |
Apr
|
May
|
Jun
(20) |
Jul
(6) |
Aug
(17) |
Sep
(24) |
Oct
(8) |
Nov
|
Dec
|
From: Petr M. <pm...@su...> - 2025-10-06 13:58:56
|
On Wed 2025-10-01 17:02:41, John Ogness wrote: > On 2025-09-30, Marcos Paulo de Souza <mpd...@su...> wrote: > > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c > > index 558ef31779760340ce42608294d91d5401239f1d..c23abed5933527cb7c6bcc42057fadbb44a43446 100644 > > --- a/kernel/printk/nbcon.c > > +++ b/kernel/printk/nbcon.c > > +/** > > + * nbcon_kdb_release - Exit unsafe section and release the nbcon console > > + * > > + * @wctxt: The nbcon write context initialized by a successful > > + * nbcon_kdb_try_acquire() > > + * > > + * Context: Under console_srcu_read_lock() for emiting a single kdb message > > emitting ---^^^^^^^ > > > + * using the given con->write_atomic() callback. Can be called > > + * only when the console is usable at the moment. > > I do not think the "Context" is relevant. It must be called if > a previous call to nbcon_kdb_try_acquire() was successful. Makes sense. I am fine with removing the "Context:" secion completely from nbcon_kdb_release(). Just to be sure. I think that we do not need to mention that it can be called only when nbcon_kdb_try_acquire() succeeded. It is kind of obvious. Best Regards, Petr > > + */ > > +void nbcon_kdb_release(struct nbcon_write_context *wctxt) > > +{ > > + struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt); > > + > > + if (!nbcon_context_exit_unsafe(ctxt)) > > + return; > > + > > + nbcon_context_release(ctxt); > > + > > + /* > > + * Flush any new printk() messages added when the console was blocked. > > + * Only the console used by the given write context was blocked. > > + * The console was locked only when the write_atomic() callback > > + * was usable. > > + */ > > + __nbcon_atomic_flush_pending_con(ctxt->console, > > + prb_next_reserve_seq(prb), false); > > This can all be one line. 100 characters is the official limit for code. > > > +} |
From: John O. <joh...@li...> - 2025-10-06 08:31:27
|
On 2025-10-02, Marcos Paulo de Souza <mpd...@su...> wrote: > I fixed all your suggestions and published my branch here[1]. The only > change that I didn't fixed was about the "Context:" change that you > suggested, since I'll let Petr to thing about it, since he usually has > comments about it. > > [1]: https://github.com/marcosps/linux/tree/nbcon-kgdboc-v5 Thanks. For v6 you can add: Reviewed-by: John Ogness <joh...@li...> for the rest of the series (patches 1-4). John |
From: Doug A. <dia...@ch...> - 2025-10-03 16:45:24
|
Hi, On Fri, Oct 3, 2025 at 8:13 AM Matvey Kovalev <mat...@is...> wrote: > > Bits of scancode are dropped except 7 low-order ones. > So, scancode can't be equal 0xe0. > > Found by Linux Verification Center (linuxtesting.org) with SVACE. > > Signed-off-by: Matvey Kovalev <mat...@is...> > --- > kernel/debug/kdb/kdb_keyboard.c | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/kernel/debug/kdb/kdb_keyboard.c b/kernel/debug/kdb/kdb_keyboard.c > index 3a74604fdb8a7..386d30e530b78 100644 > --- a/kernel/debug/kdb/kdb_keyboard.c > +++ b/kernel/debug/kdb/kdb_keyboard.c > @@ -145,9 +145,6 @@ int kdb_get_kbd_char(void) > return CTRL('F'); > } > > - if (scancode == 0xe0) > - return -1; > - The same patch has already been sent and was landed: https://lore.kernel.org/all/202...@gm.../ |
From: John O. <joh...@li...> - 2025-10-01 15:22:04
|
On 2025-09-30, Marcos Paulo de Souza <mpd...@su...> wrote: > Function kdb_msg_write was calling con->write for any found console, > but it won't work on NBCON consoles. In this case we should acquire the > ownership of the console using NBCON_PRIO_EMERGENCY, since printing > kdb messages should only be interrupted by a panic. > > At this point, the console is required to use the atomic callback. The > console is skipped if the write_atomic callback is not set or if the > context could not be acquired. The validation of NBCON is done by the > console_is_usable helper. The context is released right after > write_atomic finishes. > > The oops_in_progress handling is only needed in the legacy consoles, > so it was moved around the con->write callback. > > Suggested-by: Petr Mladek <pm...@su...> > Reviewed-by: Petr Mladek <pm...@su...> > Signed-off-by: Marcos Paulo de Souza <mpd...@su...> Reviewed-by: John Ogness <joh...@li...> |
From: John O. <joh...@li...> - 2025-10-01 15:09:09
|
On 2025-09-30, Marcos Paulo de Souza <mpd...@su...> wrote: > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c > index 6e4641350fe8985438f53bcd32f3adf72d9d6835..2492b14bd272562378c4cdb465eea2269638e127 100644 > --- a/kernel/printk/nbcon.c > +++ b/kernel/printk/nbcon.c > @@ -854,7 +854,7 @@ static bool __nbcon_context_update_unsafe(struct nbcon_context *ctxt, bool unsaf > return nbcon_context_can_proceed(ctxt, &cur); > } > > -static void nbcon_write_context_set_buf(struct nbcon_write_context *wctxt, > +void nbcon_write_context_set_buf(struct nbcon_write_context *wctxt, > char *buf, unsigned int len) Could we also update the whitespace on the 2nd line? And it would all fit on one line if you want. John |
From: John O. <joh...@li...> - 2025-10-01 15:05:29
|
On 2025-09-30, Marcos Paulo de Souza <mpd...@su...> wrote: > diff --git a/include/linux/kdb.h b/include/linux/kdb.h > index ecbf819deeca118f27e98bf71bb37dd27a257ebb..36b46c85733fe1df28cde7760e5c26e96de40c05 100644 > --- a/include/linux/kdb.h > +++ b/include/linux/kdb.h > @@ -207,11 +207,26 @@ static inline const char *kdb_walk_kallsyms(loff_t *pos) > /* Dynamic kdb shell command registration */ > extern int kdb_register(kdbtab_t *cmd); > extern void kdb_unregister(kdbtab_t *cmd); > + > +/* Return true when KDB as locked for printing a message on this CPU. */ > +static inline > +bool kdb_printf_on_this_cpu(void) > +{ > + /* > + * We can use raw_smp_processor_id() here because the task could > + * not get migrated when KDB has locked for printing on this CPU. > + */ > + return unlikely(READ_ONCE(kdb_printf_cpu) == raw_smp_processor_id()); > +} > + > #else /* ! CONFIG_KGDB_KDB */ > static inline __printf(1, 2) int kdb_printf(const char *fmt, ...) { return 0; } > static inline void kdb_init(int level) {} > static inline int kdb_register(kdbtab_t *cmd) { return 0; } > static inline void kdb_unregister(kdbtab_t *cmd) {} > + > +static inline bool kdb_printf_on_this_cpu(void) { return false }; > + > #endif /* CONFIG_KGDB_KDB */ > enum { > KDB_NOT_INITIALIZED, The include for raw_smp_processor_id() is also needed: diff --git a/include/linux/kdb.h b/include/linux/kdb.h index 36b46c85733fe..db9d73b12a1af 100644 --- a/include/linux/kdb.h +++ b/include/linux/kdb.h @@ -14,6 +14,7 @@ */ #include <linux/list.h> +#include <linux/smp.h> /* Shifted versions of the command enable bits are be used if the command * has no arguments (see kdb_check_flags). This allows commands, such as |
From: John O. <joh...@li...> - 2025-10-01 14:56:49
|
On 2025-09-30, Marcos Paulo de Souza <mpd...@su...> wrote: > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c > index 558ef31779760340ce42608294d91d5401239f1d..c23abed5933527cb7c6bcc42057fadbb44a43446 100644 > --- a/kernel/printk/nbcon.c > +++ b/kernel/printk/nbcon.c > @@ -1855,3 +1855,69 @@ void nbcon_device_release(struct console *con) > console_srcu_read_unlock(cookie); > } > EXPORT_SYMBOL_GPL(nbcon_device_release); > + > +/** > + * nbcon_kdb_try_acquire - Try to acquire nbcon console, enter unsafe > + * section, and initialized nbcon write context initialize ---^^^^^^^^^^^ And technically it is not initializing the write context, just the console ownership context. I'm not sure it is really necessary to mention that. > + * @con: The nbcon console to acquire > + * @wctxt: The nbcon write context to be used on success > + * > + * Context: Under console_srcu_read_lock() for emiting a single kdb message emitting ---^^^^^^^ "./scripts/checkpatch.pl --codespell" is your friend. ;-) > + * using the given con->write_atomic() callback. Can be called > + * only when the console is usable at the moment. > + * > + * Return: True if the console was acquired. False otherwise. > + * > + * kdb emits messages on consoles registered for printk() without > + * storing them into the ring buffer. It has to acquire the console > + * ownerhip so that it could call con->write_atomic() callback a safe way. > + * > + * This function acquires the nbcon console using priority NBCON_PRIO_EMERGENCY > + * and marks it unsafe for handover/takeover. > + */ > +bool nbcon_kdb_try_acquire(struct console *con, > + struct nbcon_write_context *wctxt) > +{ > + struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt); > + > + memset(ctxt, 0, sizeof(*ctxt)); > + ctxt->console = con; > + ctxt->prio = NBCON_PRIO_EMERGENCY; > + > + if (!nbcon_context_try_acquire(ctxt, false)) > + return false; > + > + if (!nbcon_context_enter_unsafe(ctxt)) > + return false; > + > + return true; > +} > + > +/** > + * nbcon_kdb_release - Exit unsafe section and release the nbcon console > + * > + * @wctxt: The nbcon write context initialized by a successful > + * nbcon_kdb_try_acquire() > + * > + * Context: Under console_srcu_read_lock() for emiting a single kdb message emitting ---^^^^^^^ > + * using the given con->write_atomic() callback. Can be called > + * only when the console is usable at the moment. I do not think the "Context" is relevant. It must be called if a previous call to nbcon_kdb_try_acquire() was successful. > + */ > +void nbcon_kdb_release(struct nbcon_write_context *wctxt) > +{ > + struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt); > + > + if (!nbcon_context_exit_unsafe(ctxt)) > + return; > + > + nbcon_context_release(ctxt); > + > + /* > + * Flush any new printk() messages added when the console was blocked. > + * Only the console used by the given write context was blocked. > + * The console was locked only when the write_atomic() callback > + * was usable. > + */ > + __nbcon_atomic_flush_pending_con(ctxt->console, > + prb_next_reserve_seq(prb), false); This can all be one line. 100 characters is the official limit for code. > +} |
From: John O. <joh...@li...> - 2025-10-01 14:36:48
|
On 2025-09-30, Marcos Paulo de Souza <mpd...@su...> wrote: > The helper will be used on KDB code in the next commits. > > Reviewed-by: Petr Mladek <pm...@su...> > Signed-off-by: Marcos Paulo de Souza <mpd...@su...> > --- > include/linux/console.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ > kernel/printk/internal.h | 44 -------------------------------------------- > 2 files changed, 44 insertions(+), 44 deletions(-) > > diff --git a/include/linux/console.h b/include/linux/console.h > index 8f10d0a85bb4536e4b0dda4e8ccbdf87978bbb4a..67af483574727c00eea1d5a1eacc994755c92607 100644 > --- a/include/linux/console.h > +++ b/include/linux/console.h > @@ -605,6 +605,48 @@ extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt); > extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt); > extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt); > extern void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt); > + > +/* > + * Check if the given console is currently capable and allowed to print > + * records. Note that this function does not consider the current context, > + * which can also play a role in deciding if @con can be used to print > + * records. > + */ > +static inline bool console_is_usable(struct console *con, short flags, bool use_atomic) > +{ > + if (!(flags & CON_ENABLED)) > + return false; > + > + if ((flags & CON_SUSPENDED)) > + return false; > + > + if (flags & CON_NBCON) { > + /* The write_atomic() callback is optional. */ > + if (use_atomic && !con->write_atomic) > + return false; > + > + /* > + * For the !use_atomic case, @printk_kthreads_running is not > + * checked because the write_thread() callback is also used > + * via the legacy loop when the printer threads are not > + * available. > + */ > + } else { > + if (!con->write) > + return false; > + } > + > + /* > + * Console drivers may assume that per-cpu resources have been > + * allocated. So unless they're explicitly marked as being able to > + * cope (CON_ANYTIME) don't call them until this CPU is officially up. > + */ > + if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME)) > + return false; > + > + return true; > +} > + > #else > static inline void nbcon_cpu_emergency_enter(void) { } > static inline void nbcon_cpu_emergency_exit(void) { } > @@ -612,6 +654,8 @@ static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return > static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; } > static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; } > static inline void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt) { } > +static inline bool console_is_usable(struct console *con, short flags, > + bool use_atomic) { return false; } > #endif > > extern int console_set_on_cmdline; > diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h > index f72bbfa266d6c9bbc533661c40386aa5f0df6c8f..7238da161ff9814fe8a22e4836624e50ee5b71d3 100644 > --- a/kernel/printk/internal.h > +++ b/kernel/printk/internal.h > @@ -112,47 +112,6 @@ bool nbcon_kthread_create(struct console *con); > void nbcon_kthread_stop(struct console *con); > void nbcon_kthreads_wake(void); > > -/* > - * Check if the given console is currently capable and allowed to print > - * records. Note that this function does not consider the current context, > - * which can also play a role in deciding if @con can be used to print > - * records. > - */ > -static inline bool console_is_usable(struct console *con, short flags, bool use_atomic) > -{ > - if (!(flags & CON_ENABLED)) > - return false; > - > - if ((flags & CON_SUSPENDED)) > - return false; > - > - if (flags & CON_NBCON) { > - /* The write_atomic() callback is optional. */ > - if (use_atomic && !con->write_atomic) > - return false; > - > - /* > - * For the !use_atomic case, @printk_kthreads_running is not > - * checked because the write_thread() callback is also used > - * via the legacy loop when the printer threads are not > - * available. > - */ > - } else { > - if (!con->write) > - return false; > - } > - > - /* > - * Console drivers may assume that per-cpu resources have been > - * allocated. So unless they're explicitly marked as being able to > - * cope (CON_ANYTIME) don't call them until this CPU is officially up. > - */ > - if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME)) > - return false; > - > - return true; > -} > - > /** > * nbcon_kthread_wake - Wake up a console printing thread > * @con: Console to operate on > @@ -204,9 +163,6 @@ static inline bool nbcon_legacy_emit_next_record(struct console *con, bool *hand > static inline void nbcon_kthread_wake(struct console *con) { } > static inline void nbcon_kthreads_wake(void) { } > > -static inline bool console_is_usable(struct console *con, short flags, > - bool use_atomic) { return false; } > - > #endif /* CONFIG_PRINTK */ > > extern bool have_boot_console; This also needs the required includes moved over as well. smp.h is probably more appropriate than the higher level percpu.h: diff --git a/include/linux/console.h b/include/linux/console.h index b34c5a0b86303..9406342b27db4 100644 --- a/include/linux/console.h +++ b/include/linux/console.h @@ -19,6 +19,7 @@ #include <linux/irq_work.h> #include <linux/rculist.h> #include <linux/rcuwait.h> +#include <linux/smp.h> #include <linux/types.h> #include <linux/vesa.h> diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 077927ed56c5f..7203b7b969c06 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -3,7 +3,6 @@ * internal.h - printk internal definitions */ #include <linux/console.h> -#include <linux/percpu.h> #include <linux/types.h> #if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL) On a side note, we are missing <linux/rcuwait.h> in kernel/printk/internal.h. It currently relies on console.h as a proxy. But I guess that is out of scope for this series. John |
From: Petr M. <pm...@su...> - 2025-09-18 13:02:40
|
On Wed 2025-09-17 10:07:47, Konstantin Ryabitsev wrote: > On Wed, Sep 17, 2025 at 09:21:48AM -0300, Marcos Paulo de Souza wrote: > > > Signed-off-by: Marcos Paulo de Souza <mpd...@su...> > > > Reviewed-by: Petr Mladek <pm...@su...> > > > > > > It means that Macros developed a patch and Petr reviewed it. > > > The patch was not longer modified. > > That's not entirely correct. The signed-off trailer is used as the boundary > indicating who was the person responsible for collecting the trailers. When > the trailers are collected by the original author as part of their iteration > cycles, their signed-off-by trailer goes below any trailers they have > received. This is an interesting interpretation. I am just curious. Has this already been discussed anywhere, please? It seems that the ordering of the trailers is not much described in the process documentation, see https://docs.kernel.org/process/submitting-patches.html Best Regards, Petr |
From: Petr M. <pm...@su...> - 2025-09-17 13:18:27
|
On Mon 2025-09-15 08:20:34, Marcos Paulo de Souza wrote: > Function kdb_msg_write was calling con->write for any found console, > but it won't work on NBCON consoles. In this case we should acquire the > ownership of the console using NBCON_PRIO_EMERGENCY, since printing > kdb messages should only be interrupted by a panic. > > At this point, the console is required to use the atomic callback. The > console is skipped if the write_atomic callback is not set or if the > context could not be acquired. The validation of NBCON is done by the > console_is_usable helper. The context is released right after > write_atomic finishes. > > The oops_in_progress handling is only needed in the legacy consoles, > so it was moved around the con->write callback. > > Suggested-by: Petr Mladek <pm...@su...> > Signed-off-by: Marcos Paulo de Souza <mpd...@su...> It looks good to me: Reviewed-by: Petr Mladek <pm...@su...> See one note below. > --- a/kernel/debug/kdb/kdb_io.c > +++ b/kernel/debug/kdb/kdb_io.c > @@ -589,24 +589,41 @@ static void kdb_msg_write(const char *msg, int msg_len) > */ > cookie = console_srcu_read_lock(); > for_each_console_srcu(c) { > - if (!(console_srcu_read_flags(c) & CON_ENABLED)) > + short flags = console_srcu_read_flags(c); > + > + if (!console_is_usable(c, flags, true)) > continue; > if (c == dbg_io_ops->cons) > continue; > - if (!c->write) > - continue; > - /* > - * Set oops_in_progress to encourage the console drivers to > - * disregard their internal spin locks: in the current calling > - * context the risk of deadlock is a bigger problem than risks > - * due to re-entering the console driver. We operate directly on > - * oops_in_progress rather than using bust_spinlocks() because > - * the calls bust_spinlocks() makes on exit are not appropriate > - * for this calling context. > - */ > - ++oops_in_progress; > - c->write(c, msg, msg_len); > - --oops_in_progress; > + > + if (flags & CON_NBCON) { > + struct nbcon_write_context wctxt = { }; > + > + /* > + * Do not continue if the console is NBCON and the context > + * can't be acquired. > + */ > + if (!nbcon_kdb_try_acquire(c, &wctxt)) > + continue; > + > + nbcon_write_context_set_buf(&wctxt, (char *)msg, msg_len); I have overlooked the (char *) cast in the earlier versions of the patchset. It would be nice to fix the nbcon API so that the parameter could be passed as (const char *). It looks that the API using struct nbcon_write_context never modifies the given buffer so it would be the right thing. But it is beyond the scope of this patchset. It would be material for a separate code clean up ;-) Best Regards, Petr |
From: Petr M. <pm...@su...> - 2025-09-17 12:18:30
|
On Mon 2025-09-15 08:20:33, Marcos Paulo de Souza wrote: > This function will be used in the next patch to allow a driver to set > both the message and message length of a nbcon_write_context. This is > necessary because the function also initializes the ->unsafe_takeover > struct member. By using this helper we ensure that the struct is > initialized correctly. > > Signed-off-by: Marcos Paulo de Souza <mpd...@su...> Reviewed-by: Petr Mladek <pm...@su...> Best Regards, Petr |
From: Petr M. <pm...@su...> - 2025-09-17 12:09:51
|
On Mon 2025-09-15 08:20:32, Marcos Paulo de Souza wrote: > KDB can interrupt any console to execute the "mirrored printing" at any > time, so add an exception to nbcon_context_try_acquire_direct to allow > to get the context if the current CPU is the same as kdb_printf_cpu. > > This change will be necessary for the next patch, which fixes > kdb_msg_write to work with NBCON consoles by calling ->write_atomic on > such consoles. But to print it first needs to acquire the ownership of > the console, so nbcon_context_try_acquire_direct is fixed here. > > --- a/include/linux/kdb.h > +++ b/include/linux/kdb.h > @@ -207,11 +207,17 @@ static inline const char *kdb_walk_kallsyms(loff_t *pos) > /* Dynamic kdb shell command registration */ > extern int kdb_register(kdbtab_t *cmd); > extern void kdb_unregister(kdbtab_t *cmd); > + > +#define KDB_IS_ACTIVE() (READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id()) The condition looks inverted. It should be true when the CPU ID matches. I actually think about using similar approach and naming scheme as for the similar API checking @panic_cpu. There are patches in -mm tree which consolidated that API, see https://lore.kernel.org/r/202...@gm... In our case, the similar API would be: /* Return true when KDB has locked for printing a message on this CPU. */ static inline bool kdb_printf_on_this_cpu(void) { /* * We can use raw_smp_processor_id() here because the task could * not get migrated when KDB has locked for printing on this CPU. */ return unlikely(READ_ONCE(kdb_printf_cpu) == raw_smp_processor_id()); } > + > #else /* ! CONFIG_KGDB_KDB */ > static inline __printf(1, 2) int kdb_printf(const char *fmt, ...) { return 0; } > static inline void kdb_init(int level) {} > static inline int kdb_register(kdbtab_t *cmd) { return 0; } > static inline void kdb_unregister(kdbtab_t *cmd) {} > + > +#define KDB_IS_ACTIVE() false and here to match the style above: static inline bool kdb_printf_on_this_cpu(void) { return false }; > + > #endif /* CONFIG_KGDB_KDB */ > enum { > KDB_NOT_INITIALIZED, > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c > index ff218e95a505fd10521c2c4dfb00ad5ec5773953..8644e019e2391797e623fcc124d37ed4d460ccd9 100644 > --- a/kernel/printk/nbcon.c > +++ b/kernel/printk/nbcon.c > @@ -248,13 +249,17 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt, > * since all non-panic CPUs are stopped during panic(), it > * is safer to have them avoid gaining console ownership. > * > - * If this acquire is a reacquire (and an unsafe takeover > + * One exception is if kdb is active, which may print > + * from multiple CPUs during a panic. Also here the "active" is a bit ambiguous term. I would use: * One exception is when kdb has locked for printing on this * CPU. > + * > + * Second exception is a reacquire (and an unsafe takeover > * has not previously occurred) then it is allowed to attempt > * a direct acquire in panic. This gives console drivers an > * opportunity to perform any necessary cleanup if they were > * interrupted by the panic CPU while printing. > */ > if (other_cpu_in_panic() && > + !KDB_IS_ACTIVE() && > (!is_reacquire || cur->unsafe_takeover)) { > return -EPERM; > } I am sorry that I did not suggested the better names already when this new API was discussed in v3. Best Regards, Petr |
From: Petr M. <pm...@su...> - 2025-09-17 11:16:55
|
On Mon 2025-09-15 08:20:31, Marcos Paulo de Souza wrote: > These helpers will be used when calling console->write_atomic on > KDB code in the next patch. It's basically the same implementation > as nbcon_device_try_acquire, but using NBCON_PRIO_EMERGENCY when > acquiring the context. > > If the acquire succeeds, the message and message length are assigned to > nbcon_write_context so ->write_atomic can print the message. > > After release try to flush the console since there may be a backlog of > messages in the ringbuffer. The kthread console printers do not get a > chance to run while kdb is active. > > Signed-off-by: Marcos Paulo de Souza <mpd...@su...> Reviewed-by: Petr Mladek <pm...@su...> Best Regards, Petr |
From: Petr M. <pm...@su...> - 2025-09-17 08:57:57
|
On Mon 2025-09-15 08:20:30, Marcos Paulo de Souza wrote: > The helper will be used on KDB code in the next commits. > > Reviewed-by: Petr Mladek <pm...@su...> > Signed-off-by: Marcos Paulo de Souza <mpd...@su...> Nit :-) The ordering of the tags is important. It defines the timeline of the related actions. The above ordering might be understood as that Petr reviewed an older version of patch. But it was later modified by Marcos. The expected ordering is: Signed-off-by: Marcos Paulo de Souza <mpd...@su...> Reviewed-by: Petr Mladek <pm...@su...> It means that Macros developed a patch and Petr reviewed it. The patch was not longer modified. Note the Reviewed-by tag might be preserved even when Marcos later did some cosmetic changes, e.g. fixed a typo, formatting, or rebased. Best Regards, Petr PS: There is no need to send v5 just because of this. I could fix it when committing the patch... |
From: Petr M. <pm...@su...> - 2025-09-09 15:03:33
|
On Tue 2025-09-09 16:29:50, John Ogness wrote: > On 2025-09-09, Petr Mladek <pm...@su...> wrote: > > The problem is that wctxt->unsafe_takeover would need to get updated > > after acquiring the context. And might be reused for different > > consoles, ... > > You are right. I think it is best to make nbcon_write_context_set_buf() > available. I am fine with it. > > But wait. I do not see any code using wctxt->unsafe_takeover. > > > > It seems that the motivation was that console drivers might > > do something else when there was an unsafe_takeover in the past, > > see https://lore.kernel.org/lkml/87c...@jo.../ > > But it seems that no console driver is using it. > > > > So, I would prefer to remove the "unsafe_takeover" field from > > struct nbcon_write_context and keep this kdb code as it is now. > > No one is using it because the nbcon drivers are still implementing the > "hope and pray" model on unsafe takeovers. The flag is an important part > of the API to allow drivers to maximize their safety. > > I think it is too early to remove the flag when there are still so few > nbcon drivers in existance. I feel that that I should be more strict and push for removing the flag because it is not used and complicates the design. I am sure that there are cleaner ways how to provide the information when anyone would need it. But I do not want to fight for it. It is not worth a blood (changing code back and forth). I am fine with exporting nbcon_write_context_set_buf() for now. We might know more about real users next time it causes problems ;-) Best Regards, Petr |
From: John O. <joh...@li...> - 2025-09-09 14:38:53
|
On 2025-09-09, Petr Mladek <pm...@su...> wrote: >> > Honestly, I think that the flush is not much important because >> > it will most offten have nothing to do. >> > >> > I am just not sure whether it is better to have it there >> > or avoid it. It might be better to remove it after all. >> > And just document the decision. >> >> IMHO keeping the flush is fine. There are cases where there might be >> something to print. And since a printing kthread will get no chance to >> print as long as kdb is alive, we should have kdb flushing that >> console. >> >> Note that this is the only console that will actually see the new >> messages immediately as all the other CPUs and quiesced. > > I do not understand this argument. IMHO, this new > try_acquire()/release() API should primary flush only > the console which was (b)locked by this API. > > It will be called in kdb_msg_write() which tries to write > to all registered consoles. So the other nbcon consoles will > get flushed when the try_acquire() succeeds on them. And the > legacy conosles were never flushed. Right. I oversaw that it acquires each of the nbcon's. > I would prefer to keep __nbcon_atomic_flush_pending_con(). > I mean to flush only the console which was blocked. Agreed. >> After release try to flush all consoles since there may be a backlog of >> messages in the ringbuffer. The kthread console printers do not get a >> chance to run while kdb is active. > > I like this text. OK, but then change it to talk only about the one console. After release try to flush the console since there may be a backlog of messages in the ringbuffer. The kthread console printers do not get a chance to run while kdb is active. John |
From: John O. <joh...@li...> - 2025-09-09 14:24:01
|
On 2025-09-09, Petr Mladek <pm...@su...> wrote: > The problem is that wctxt->unsafe_takeover would need to get updated > after acquiring the context. And might be reused for different > consoles, ... You are right. I think it is best to make nbcon_write_context_set_buf() available. > But wait. I do not see any code using wctxt->unsafe_takeover. > > It seems that the motivation was that console drivers might > do something else when there was an unsafe_takeover in the past, > see https://lore.kernel.org/lkml/87c...@jo.../ > But it seems that no console driver is using it. > > So, I would prefer to remove the "unsafe_takeover" field from > struct nbcon_write_context and keep this kdb code as it is now. No one is using it because the nbcon drivers are still implementing the "hope and pray" model on unsafe takeovers. The flag is an important part of the API to allow drivers to maximize their safety. I think it is too early to remove the flag when there are still so few nbcon drivers in existance. John |
From: Petr M. <pm...@su...> - 2025-09-09 14:21:32
|
On Mon 2025-09-08 14:15:08, John Ogness wrote: > On 2025-09-05, Petr Mladek <pm...@su...> wrote: > > On Tue 2025-09-02 15:33:53, Marcos Paulo de Souza wrote: > >> These helpers will be used when calling console->write_atomic on > >> KDB code in the next patch. It's basically the same implementaion > >> as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when > >> acquiring the context. > >> > >> For release we need to flush the console, since some messages could be > >> added before the context was acquired, as KDB emits the messages using > >> con->{write,write_atomic} instead of storing them on the ring buffer. > > > > I am a bit confused by the last paragraph. It is a very long sentence. > > > > Sigh, I wanted to propose a simple and clear alternative. But I ended > > in a rabbit hole and with a rather complex text: > > > > <proposal> > > The atomic flush in the release function is questionable. vkdb_printf() > > is primary called only when other CPUs are quiescent in kdb_main_loop() > > and do not call the classic printk(). But, for example, the > > write_atomic() callback might print debug messages. Or there is > > one kdb_printf() called in kgdb_panic() before other CPUs are > > quiescent. So the flush might be useful. Especially, when > > the kdb code fails to quiescent the CPUs and returns early. > > > > Let's keep it simple and just call __nbcon_atomic_flush_pending_con(). > > It uses write_atomic() callback which is used by the locked kdb code > > anyway. > > > > The legacy loop (console_trylock()/console_unlock()) is not > > usable in kdb context. > > > > It might make sense to trigger the flush via the printk kthread. > > But it would not work in panic() where is the only known kdb_printf() > > called when other CPUs are not quiescent. So, it does not look > > worth it. > > </proposal> > > > > What do you think? > > > > My opinion: > > > > Honestly, I think that the flush is not much important because > > it will most offten have nothing to do. > > > > I am just not sure whether it is better to have it there > > or avoid it. It might be better to remove it after all. > > And just document the decision. > > IMHO keeping the flush is fine. There are cases where there might be > something to print. And since a printing kthread will get no chance to > print as long as kdb is alive, we should have kdb flushing that > console. > > Note that this is the only console that will actually see the new > messages immediately as all the other CPUs and quiesced. I do not understand this argument. IMHO, this new try_acquire()/release() API should primary flush only the console which was (b)locked by this API. It will be called in kdb_msg_write() which tries to write to all registered consoles. So the other nbcon consoles will get flushed when the try_acquire() succeeds on them. And the legacy conosles were never flushed. > For this reason > we probably want to use __nbcon_atomic_flush_pending() to try to flush > _all_ the consoles. I would prefer to keep __nbcon_atomic_flush_pending_con(). I mean to flush only the console which was blocked. Note that we would need to increment oops_in_progress if we wanted to flush legacy consoles in this context... which would spread the mess into nbcon code... > As to the last paragraph of the commit message, I would keep it simple: > > After release try to flush all consoles since there may be a backlog of > messages in the ringbuffer. The kthread console printers do not get a > chance to run while kdb is active. I like this text. Best Regards, Petr |
From: Petr M. <pm...@su...> - 2025-09-09 13:48:49
|
On Tue 2025-09-09 10:03:05, John Ogness wrote: > On 2025-09-08, Marcos Paulo de Souza <mpd...@su...> wrote: > >> > --- a/kernel/debug/kdb/kdb_io.c > >> > +++ b/kernel/debug/kdb/kdb_io.c > >> > @@ -589,24 +589,40 @@ static void kdb_msg_write(const char *msg, > >> > int msg_len) > >> > */ > >> > cookie = console_srcu_read_lock(); > >> > for_each_console_srcu(c) { > >> > - if (!(console_srcu_read_flags(c) & CON_ENABLED)) > >> > + struct nbcon_write_context wctxt = { }; > >> > + short flags = console_srcu_read_flags(c); > >> > + > >> > + if (!console_is_usable(c, flags, true)) > >> > continue; > >> > if (c == dbg_io_ops->cons) > >> > continue; > >> > - if (!c->write) > >> > - continue; > >> > - /* > >> > - * Set oops_in_progress to encourage the console > >> > drivers to > >> > - * disregard their internal spin locks: in the > >> > current calling > >> > - * context the risk of deadlock is a bigger > >> > problem than risks > >> > - * due to re-entering the console driver. We > >> > operate directly on > >> > - * oops_in_progress rather than using > >> > bust_spinlocks() because > >> > - * the calls bust_spinlocks() makes on exit are > >> > not appropriate > >> > - * for this calling context. > >> > - */ > >> > - ++oops_in_progress; > >> > - c->write(c, msg, msg_len); > >> > - --oops_in_progress; > >> > + > >> > + if (flags & CON_NBCON) { > >> > + /* > >> > + * Do not continue if the console is NBCON > >> > and the context > >> > + * can't be acquired. > >> > + */ > >> > + if (!nbcon_kdb_try_acquire(c, &wctxt)) > >> > + continue; > >> > + > >> > + wctxt.outbuf = (char *)msg; > >> > + wctxt.len = msg_len; > >> > >> I double checked whether we initialized all members of the structure > >> correctly. And I found that we didn't. We should call here: > >> > >> nbcon_write_context_set_buf(&wctxt, > >> &pmsg.pbufs- > >> >outbuf[0], > >> > >> pmsg.outbuf_len); > > Nice catch. > > >> Sigh, this is easy to miss. I remember that I was not super happy > >> about this design. I looked for details. I described my concerns at https://lore.kernel.org/lkml/ZNY5gPNyyw9RTo4X@alley/#t > >> But the original code initialized the structure > >> on a single place... > > > > Ok, so I'll need to also export nbcon_write_context_set_buf, since it's > > currently static inside kernel/printk/nbcon.c. I'll do it for the next > > version. > > How about modifying nbcon_kdb_try_acquire() to also take @msg and > @msg_len. Then, on success, @wctxt is already prepared. I do not like > the idea of external code fiddling with the fields. I was thinking about another solution, e.g. an nbcon_wctxt_init() function. The problem is that wctxt->unsafe_takeover would need to get updated after acquiring the context. And might be reused for different consoles, ... But wait. I do not see any code using wctxt->unsafe_takeover. It seems that the motivation was that console drivers might do something else when there was an unsafe_takeover in the past, see https://lore.kernel.org/lkml/87c...@jo.../ But it seems that no console driver is using it. So, I would prefer to remove the "unsafe_takeover" field from struct nbcon_write_context and keep this kdb code as it is now. We could always add it back when really needed. Alternatively, we could create an API which could read this information from struct wctxt.ctxt.con. But I would create this API only when there is an user. Best Regards, Petr |
From: John O. <joh...@li...> - 2025-09-09 07:57:14
|
On 2025-09-08, Marcos Paulo de Souza <mpd...@su...> wrote: >> > --- a/kernel/debug/kdb/kdb_io.c >> > +++ b/kernel/debug/kdb/kdb_io.c >> > @@ -589,24 +589,40 @@ static void kdb_msg_write(const char *msg, >> > int msg_len) >> > */ >> > cookie = console_srcu_read_lock(); >> > for_each_console_srcu(c) { >> > - if (!(console_srcu_read_flags(c) & CON_ENABLED)) >> > + struct nbcon_write_context wctxt = { }; >> > + short flags = console_srcu_read_flags(c); >> > + >> > + if (!console_is_usable(c, flags, true)) >> > continue; >> > if (c == dbg_io_ops->cons) >> > continue; >> > - if (!c->write) >> > - continue; >> > - /* >> > - * Set oops_in_progress to encourage the console >> > drivers to >> > - * disregard their internal spin locks: in the >> > current calling >> > - * context the risk of deadlock is a bigger >> > problem than risks >> > - * due to re-entering the console driver. We >> > operate directly on >> > - * oops_in_progress rather than using >> > bust_spinlocks() because >> > - * the calls bust_spinlocks() makes on exit are >> > not appropriate >> > - * for this calling context. >> > - */ >> > - ++oops_in_progress; >> > - c->write(c, msg, msg_len); >> > - --oops_in_progress; >> > + >> > + if (flags & CON_NBCON) { >> > + /* >> > + * Do not continue if the console is NBCON >> > and the context >> > + * can't be acquired. >> > + */ >> > + if (!nbcon_kdb_try_acquire(c, &wctxt)) >> > + continue; >> > + >> > + wctxt.outbuf = (char *)msg; >> > + wctxt.len = msg_len; >> >> I double checked whether we initialized all members of the structure >> correctly. And I found that we didn't. We should call here: >> >> nbcon_write_context_set_buf(&wctxt, >> &pmsg.pbufs- >> >outbuf[0], >> >> pmsg.outbuf_len); Nice catch. >> Sigh, this is easy to miss. I remember that I was not super happy >> about this design. But the original code initialized the structure >> on a single place... > > Ok, so I'll need to also export nbcon_write_context_set_buf, since it's > currently static inside kernel/printk/nbcon.c. I'll do it for the next > version. How about modifying nbcon_kdb_try_acquire() to also take @msg and @msg_len. Then, on success, @wctxt is already prepared. I do not like the idea of external code fiddling with the fields. John |
From: John O. <joh...@li...> - 2025-09-09 07:53:27
|
On 2025-09-08, Petr Mladek <pm...@su...> wrote: > On Fri 2025-09-05 16:58:34, John Ogness wrote: >> On 2025-09-02, Marcos Paulo de Souza <mpd...@su...> wrote: >> > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c >> > index ff218e95a505fd10521c2c4dfb00ad5ec5773953..352235a0eb4a484caccf86d3a57d1a149218ecec 100644 >> > --- a/kernel/printk/nbcon.c >> > +++ b/kernel/printk/nbcon.c >> > @@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt, >> > * interrupted by the panic CPU while printing. >> > */ >> > if (other_cpu_in_panic() && >> > + READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id() && >> >> This needs some sort of ifdef CONFIG_KGDB_KDB wrapped around it. Maybe >> it would be cleaner to have some macro. > > Great catch! > >> #ifdef CONFIG_KGDB_KDB >> #define KGDB_IS_ACTIVE() (READ_ONCE(kdb_printf_cpu) == raw_smp_processor_id()) >> #else >> #define KGDB_IS_ACTIVE() false >> #endif > > I like this. It would fit into include/linux/kdb.h which already > contains the #ifdef CONFIG_KGDB_KDB / #else / #endif sections. BTW, if there is such a macro created, it should be KDB_IS_ACTIVE() rather than KGDB_IS_ACTIVE(). John |
From: Petr M. <pm...@su...> - 2025-09-08 15:51:21
|
On Tue 2025-09-02 15:33:55, Marcos Paulo de Souza wrote: > Function kdb_msg_write was calling con->write for any found console, > but it won't work on NBCON ones. In this case we should acquire the > ownership of the console using NBCON_PRIO_EMERGENCY, since printing > kdb messages should only be interrupted by a panic I would end the paragraph here. > in the case it was > triggered by sysrq debug option. This is not important. Also there are more ways how to trigger panic(). For example, it might happen by an error in the kdb code. Or I heard rumors that some HW even had a physical "trigger NMI" button. > This is done by the > nbcon_kdb_{acquire,release} functions. I think that this is more or less obvious. > At this point, the console is required to use the atomic callback. The > console is skipped if the write_atomic callback is not set or if the > context could not be acquired. The validation of NBCON is done by the > console_is_usable helper. The context is released right after > write_atomic finishes. > > The oops_in_progress handling is only needed in the legacy consoles, > so it was moved around the con->write callback. > --- a/kernel/debug/kdb/kdb_io.c > +++ b/kernel/debug/kdb/kdb_io.c > @@ -589,24 +589,40 @@ static void kdb_msg_write(const char *msg, int msg_len) > */ > cookie = console_srcu_read_lock(); > for_each_console_srcu(c) { > - if (!(console_srcu_read_flags(c) & CON_ENABLED)) > + struct nbcon_write_context wctxt = { }; > + short flags = console_srcu_read_flags(c); > + > + if (!console_is_usable(c, flags, true)) > continue; > if (c == dbg_io_ops->cons) > continue; > - if (!c->write) > - continue; > - /* > - * Set oops_in_progress to encourage the console drivers to > - * disregard their internal spin locks: in the current calling > - * context the risk of deadlock is a bigger problem than risks > - * due to re-entering the console driver. We operate directly on > - * oops_in_progress rather than using bust_spinlocks() because > - * the calls bust_spinlocks() makes on exit are not appropriate > - * for this calling context. > - */ > - ++oops_in_progress; > - c->write(c, msg, msg_len); > - --oops_in_progress; > + > + if (flags & CON_NBCON) { > + /* > + * Do not continue if the console is NBCON and the context > + * can't be acquired. > + */ > + if (!nbcon_kdb_try_acquire(c, &wctxt)) > + continue; > + > + wctxt.outbuf = (char *)msg; > + wctxt.len = msg_len; I double checked whether we initialized all members of the structure correctly. And I found that we didn't. We should call here: nbcon_write_context_set_buf(&wctxt, &pmsg.pbufs->outbuf[0], pmsg.outbuf_len); Sigh, this is easy to miss. I remember that I was not super happy about this design. But the original code initialized the structure on a single place... > + c->write_atomic(c, &wctxt); > + nbcon_kdb_release(&wctxt); > + } else { > + /* > + * Set oops_in_progress to encourage the console drivers to > + * disregard their internal spin locks: in the current calling > + * context the risk of deadlock is a bigger problem than risks > + * due to re-entering the console driver. We operate directly on > + * oops_in_progress rather than using bust_spinlocks() because > + * the calls bust_spinlocks() makes on exit are not appropriate > + * for this calling context. > + */ > + ++oops_in_progress; > + c->write(c, msg, msg_len); > + --oops_in_progress; > + } > touch_nmi_watchdog(); > } > console_srcu_read_unlock(cookie); Best Regards, Petr |
From: Petr M. <pm...@su...> - 2025-09-08 15:39:39
|
On Tue 2025-09-02 15:33:54, Marcos Paulo de Souza wrote: > KDB can interrupt any console to execute the "mirrored printing" at any > time, so add an exception to nbcon_context_try_acquire_direct to allow > to get the context if the current CPU is the same as kdb_printf_cpu. > > This change will be necessary for the next patch, which fixes > kdb_msg_write to work with NBCON consoles by calling ->write_atomic on > such consoles. But to print it first needs to acquire the ownership of > the console, so nbcon_context_try_acquire_direct is fixed here. > > --- a/kernel/printk/nbcon.c > +++ b/kernel/printk/nbcon.c > @@ -247,6 +248,8 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt, > * Panic does not imply that the console is owned. However, > * since all non-panic CPUs are stopped during panic(), it > * is safer to have them avoid gaining console ownership. > + * The only exception is if kdb is active, which may print > + * from multiple CPUs during a panic. Strictly speaking this is not the only exception. The reacquire is another one. I would put this into a separate paragraph and write: * One exception is when kdb is active, which may print * from multiple CPUs during a panic. > * If this acquire is a reacquire (and an unsafe takeover And here start the paragrah with * Second exception is a reacquire (and an usafe ... > * has not previously occurred) then it is allowed to attempt Best Regards, Petr |
From: Petr M. <pm...@su...> - 2025-09-08 15:29:39
|
On Fri 2025-09-05 16:58:34, John Ogness wrote: > On 2025-09-02, Marcos Paulo de Souza <mpd...@su...> wrote: > > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c > > index ff218e95a505fd10521c2c4dfb00ad5ec5773953..352235a0eb4a484caccf86d3a57d1a149218ecec 100644 > > --- a/kernel/printk/nbcon.c > > +++ b/kernel/printk/nbcon.c > > @@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt, > > * interrupted by the panic CPU while printing. > > */ > > if (other_cpu_in_panic() && > > + READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id() && > > This needs some sort of ifdef CONFIG_KGDB_KDB wrapped around it. Maybe > it would be cleaner to have some macro. Great catch! > #ifdef CONFIG_KGDB_KDB > #define KGDB_IS_ACTIVE() (READ_ONCE(kdb_printf_cpu) == raw_smp_processor_id()) > #else > #define KGDB_IS_ACTIVE() false > #endif I like this. It would fit into include/linux/kdb.h which already contains the #ifdef CONFIG_KGDB_KDB / #else / #endif sections. > and then something like: > > if (other_cpu_in_panic() && > !KGDB_IS_ACTIVE() && > (!is_reacquire || cur->unsafe_takeover)) { > return -EPERM; > } > > Or however you prefer. We need to compile for !CONFIG_KGDB_KDB. Best Regards, Petr |
From: John O. <joh...@li...> - 2025-09-08 12:09:22
|
On 2025-09-05, Petr Mladek <pm...@su...> wrote: > On Tue 2025-09-02 15:33:53, Marcos Paulo de Souza wrote: >> These helpers will be used when calling console->write_atomic on >> KDB code in the next patch. It's basically the same implementaion >> as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when >> acquiring the context. >> >> For release we need to flush the console, since some messages could be >> added before the context was acquired, as KDB emits the messages using >> con->{write,write_atomic} instead of storing them on the ring buffer. > > I am a bit confused by the last paragraph. It is a very long sentence. > > Sigh, I wanted to propose a simple and clear alternative. But I ended > in a rabbit hole and with a rather complex text: > > <proposal> > The atomic flush in the release function is questionable. vkdb_printf() > is primary called only when other CPUs are quiescent in kdb_main_loop() > and do not call the classic printk(). But, for example, the > write_atomic() callback might print debug messages. Or there is > one kdb_printf() called in kgdb_panic() before other CPUs are > quiescent. So the flush might be useful. Especially, when > the kdb code fails to quiescent the CPUs and returns early. > > Let's keep it simple and just call __nbcon_atomic_flush_pending_con(). > It uses write_atomic() callback which is used by the locked kdb code > anyway. > > The legacy loop (console_trylock()/console_unlock()) is not > usable in kdb context. > > It might make sense to trigger the flush via the printk kthread. > But it would not work in panic() where is the only known kdb_printf() > called when other CPUs are not quiescent. So, it does not look > worth it. > </proposal> > > What do you think? > > My opinion: > > Honestly, I think that the flush is not much important because > it will most offten have nothing to do. > > I am just not sure whether it is better to have it there > or avoid it. It might be better to remove it after all. > And just document the decision. IMHO keeping the flush is fine. There are cases where there might be something to print. And since a printing kthread will get no chance to print as long as kdb is alive, we should have kdb flushing that console. Note that this is the only console that will actually see the new messages immediately as all the other CPUs and quiesced. For this reason we probably want to use __nbcon_atomic_flush_pending() to try to flush _all_ the consoles. As to the last paragraph of the commit message, I would keep it simple: After release try to flush all consoles since there may be a backlog of messages in the ringbuffer. The kthread console printers do not get a chance to run while kdb is active. John |