kgdb-bugreport Mailing List for kgdb
Status: Beta
Brought to you by:
jwessel
You can subscribe to this list here.
2001 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(3) |
Nov
(10) |
Dec
(14) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2002 |
Jan
(22) |
Feb
(7) |
Mar
(14) |
Apr
(9) |
May
(4) |
Jun
(7) |
Jul
(17) |
Aug
(15) |
Sep
(10) |
Oct
(9) |
Nov
(10) |
Dec
(27) |
2003 |
Jan
(7) |
Feb
(7) |
Mar
(9) |
Apr
(18) |
May
(14) |
Jun
(4) |
Jul
(18) |
Aug
(18) |
Sep
(8) |
Oct
(10) |
Nov
(6) |
Dec
(32) |
2004 |
Jan
(103) |
Feb
(131) |
Mar
(161) |
Apr
(58) |
May
(38) |
Jun
(16) |
Jul
(60) |
Aug
(50) |
Sep
(115) |
Oct
(153) |
Nov
(117) |
Dec
(87) |
2005 |
Jan
(40) |
Feb
(20) |
Mar
(73) |
Apr
(40) |
May
(22) |
Jun
(54) |
Jul
(61) |
Aug
(45) |
Sep
(21) |
Oct
(36) |
Nov
(79) |
Dec
(88) |
2006 |
Jan
(180) |
Feb
(51) |
Mar
(119) |
Apr
(108) |
May
(68) |
Jun
(44) |
Jul
(47) |
Aug
(95) |
Sep
(175) |
Oct
(100) |
Nov
(27) |
Dec
(47) |
2007 |
Jan
(25) |
Feb
(41) |
Mar
(50) |
Apr
(43) |
May
(210) |
Jun
(62) |
Jul
(109) |
Aug
(94) |
Sep
(77) |
Oct
(38) |
Nov
(48) |
Dec
(56) |
2008 |
Jan
(41) |
Feb
(20) |
Mar
(59) |
Apr
(46) |
May
(50) |
Jun
(152) |
Jul
(149) |
Aug
(33) |
Sep
(53) |
Oct
(89) |
Nov
(55) |
Dec
(68) |
2009 |
Jan
(59) |
Feb
(94) |
Mar
(105) |
Apr
(264) |
May
(216) |
Jun
(123) |
Jul
(290) |
Aug
(223) |
Sep
(10) |
Oct
(2) |
Nov
(1) |
Dec
(88) |
2010 |
Jan
(133) |
Feb
(148) |
Mar
(45) |
Apr
(35) |
May
(42) |
Jun
(18) |
Jul
(68) |
Aug
(62) |
Sep
(26) |
Oct
(48) |
Nov
(46) |
Dec
(10) |
2011 |
Jan
|
Feb
|
Mar
(12) |
Apr
(12) |
May
(10) |
Jun
(12) |
Jul
(12) |
Aug
(10) |
Sep
(13) |
Oct
(3) |
Nov
(11) |
Dec
(6) |
2012 |
Jan
(4) |
Feb
(36) |
Mar
(76) |
Apr
(4) |
May
(1) |
Jun
|
Jul
(69) |
Aug
(15) |
Sep
(76) |
Oct
(11) |
Nov
(40) |
Dec
|
2013 |
Jan
(32) |
Feb
(7) |
Mar
(47) |
Apr
(5) |
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(16) |
Nov
(15) |
Dec
|
2014 |
Jan
(1) |
Feb
|
Mar
(2) |
Apr
(28) |
May
(12) |
Jun
|
Jul
(12) |
Aug
(8) |
Sep
(13) |
Oct
(53) |
Nov
(22) |
Dec
(13) |
2015 |
Jan
(86) |
Feb
(15) |
Mar
(4) |
Apr
|
May
(1) |
Jun
(2) |
Jul
(6) |
Aug
(3) |
Sep
|
Oct
(2) |
Nov
|
Dec
|
2016 |
Jan
|
Feb
|
Mar
(3) |
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
(29) |
Oct
(5) |
Nov
(3) |
Dec
|
2017 |
Jan
|
Feb
|
Mar
|
Apr
(2) |
May
|
Jun
(8) |
Jul
|
Aug
|
Sep
|
Oct
(1) |
Nov
(2) |
Dec
(24) |
2018 |
Jan
(18) |
Feb
(2) |
Mar
|
Apr
(11) |
May
(26) |
Jun
(7) |
Jul
(7) |
Aug
(32) |
Sep
(32) |
Oct
(47) |
Nov
(49) |
Dec
(35) |
2019 |
Jan
(3) |
Feb
(11) |
Mar
(44) |
Apr
(18) |
May
(13) |
Jun
(11) |
Jul
(25) |
Aug
(3) |
Sep
(11) |
Oct
(45) |
Nov
(52) |
Dec
|
2020 |
Jan
(10) |
Feb
(7) |
Mar
(13) |
Apr
(102) |
May
(146) |
Jun
(125) |
Jul
(97) |
Aug
(94) |
Sep
(44) |
Oct
(59) |
Nov
(20) |
Dec
(7) |
2021 |
Jan
(47) |
Feb
(61) |
Mar
(44) |
Apr
(3) |
May
(40) |
Jun
(42) |
Jul
(31) |
Aug
(18) |
Sep
(9) |
Oct
(2) |
Nov
(5) |
Dec
(1) |
2022 |
Jan
(41) |
Feb
(80) |
Mar
(72) |
Apr
(28) |
May
(18) |
Jun
(2) |
Jul
(5) |
Aug
(3) |
Sep
(16) |
Oct
(28) |
Nov
(57) |
Dec
(4) |
2023 |
Jan
(8) |
Feb
|
Mar
(34) |
Apr
(23) |
May
(127) |
Jun
(63) |
Jul
(11) |
Aug
(78) |
Sep
(13) |
Oct
(4) |
Nov
(12) |
Dec
(18) |
2024 |
Jan
|
Feb
(2) |
Mar
(30) |
Apr
(77) |
May
(2) |
Jun
(38) |
Jul
|
Aug
(23) |
Sep
(6) |
Oct
|
Nov
|
Dec
|
From: Jonathan C. <co...@lw...> - 2024-09-30 17:39:13
|
Daniel Thompson <dan...@li...> writes: > On Wed, Sep 25, 2024 at 03:07:42AM -0700, Changhuang Liang wrote: >> Module kgdb had been converted to debug_core since commit c433820971ff >> ("Move kernel/kgdb.c to kernel/debug/debug_core.c") be added, so let's >> correct the module parameter path. >> >> Fixes: c433820971ff ("Move kernel/kgdb.c to kernel/debug/debug_core.c") >> Signed-off-by: Changhuang Liang <cha...@st...> >> Reviewed-by: Douglas Anderson <dia...@ch...> > > Reviewed-by: Daniel Thompson <dan...@li...> > > @Jon: Do you want me to hoover this up or will you take it? If you are > happy to grab it then feel free to treat my Rb: as an Acked-by: too! Sure, I can grab it in a bit. Thanks, jon |
From: Daniel T. <dan...@li...> - 2024-09-30 09:58:46
|
On Wed, Sep 25, 2024 at 03:07:42AM -0700, Changhuang Liang wrote: > Module kgdb had been converted to debug_core since commit c433820971ff > ("Move kernel/kgdb.c to kernel/debug/debug_core.c") be added, so let's > correct the module parameter path. > > Fixes: c433820971ff ("Move kernel/kgdb.c to kernel/debug/debug_core.c") > Signed-off-by: Changhuang Liang <cha...@st...> > Reviewed-by: Douglas Anderson <dia...@ch...> Reviewed-by: Daniel Thompson <dan...@li...> @Jon: Do you want me to hoover this up or will you take it? If you are happy to grab it then feel free to treat my Rb: as an Acked-by: too! Daniel. > --- > > Hi, > > v3: I wrote a error change log in v2. Now fix it. > > Documentation/dev-tools/kgdb.rst | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/Documentation/dev-tools/kgdb.rst b/Documentation/dev-tools/kgdb.rst > index f83ba2601e55..a87a58e6509a 100644 > --- a/Documentation/dev-tools/kgdb.rst > +++ b/Documentation/dev-tools/kgdb.rst > @@ -329,7 +329,7 @@ ways to activate this feature. > > 2. Use sysfs before configuring an I/O driver:: > > - echo 1 > /sys/module/kgdb/parameters/kgdb_use_con > + echo 1 > /sys/module/debug_core/parameters/kgdb_use_con > > .. note:: > > -- > 2.25.1 |
From: Changhuang L. <cha...@st...> - 2024-09-25 10:23:45
|
Module kgdb had been converted to debug_core since commit c433820971ff ("Move kernel/kgdb.c to kernel/debug/debug_core.c") be added, so let's correct the module parameter path. Fixes: c433820971ff ("Move kernel/kgdb.c to kernel/debug/debug_core.c") Signed-off-by: Changhuang Liang <cha...@st...> Reviewed-by: Douglas Anderson <dia...@ch...> --- Hi, v3: I wrote a error change log in v2. Now fix it. Documentation/dev-tools/kgdb.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/dev-tools/kgdb.rst b/Documentation/dev-tools/kgdb.rst index f83ba2601e55..a87a58e6509a 100644 --- a/Documentation/dev-tools/kgdb.rst +++ b/Documentation/dev-tools/kgdb.rst @@ -329,7 +329,7 @@ ways to activate this feature. 2. Use sysfs before configuring an I/O driver:: - echo 1 > /sys/module/kgdb/parameters/kgdb_use_con + echo 1 > /sys/module/debug_core/parameters/kgdb_use_con .. note:: -- 2.25.1 |
From: Changhuang L. <cha...@st...> - 2024-09-18 01:46:39
|
Module kgdb had been converted to debug_core since commit c433820971ff ("Move kernel/kgdb.c to kernel/debug/debug_core.c") be added, so let's correct the module parameter path. Fixes: c433820971ff ("Move kernel/kgdb.c to kernel/debug/debug_core.c") Signed-off-by: Changhuang Liang <cha...@st...> Reviewed-by: Douglas Anderson <dia...@ch...> --- Add "Fixes". --- --- Documentation/dev-tools/kgdb.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/dev-tools/kgdb.rst b/Documentation/dev-tools/kgdb.rst index f83ba2601e55..a87a58e6509a 100644 --- a/Documentation/dev-tools/kgdb.rst +++ b/Documentation/dev-tools/kgdb.rst @@ -329,7 +329,7 @@ ways to activate this feature. 2. Use sysfs before configuring an I/O driver:: - echo 1 > /sys/module/kgdb/parameters/kgdb_use_con + echo 1 > /sys/module/debug_core/parameters/kgdb_use_con .. note:: -- 2.25.1 |
From: Doug A. <dia...@ch...> - 2024-09-16 07:49:36
|
Hi, On Sat, Sep 14, 2024 at 12:03 AM Changhuang Liang <cha...@st...> wrote: > > Module kgdb had been converted to debug_core since commit c433820971ff > ("Move kernel/kgdb.c to kernel/debug/debug_core.c") be added, so let's > correct the module parameter path. > > Signed-off-by: Changhuang Liang <cha...@st...> > --- > Documentation/dev-tools/kgdb.rst | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) I guess this could have a "Fixes" based on the commit mentioned in the commit message. Patch looks right to me: Reviewed-by: Douglas Anderson <dia...@ch...> |
From: Changhuang L. <cha...@st...> - 2024-09-14 07:19:05
|
Module kgdb had been converted to debug_core since commit c433820971ff ("Move kernel/kgdb.c to kernel/debug/debug_core.c") be added, so let's correct the module parameter path. Signed-off-by: Changhuang Liang <cha...@st...> --- Documentation/dev-tools/kgdb.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/dev-tools/kgdb.rst b/Documentation/dev-tools/kgdb.rst index f83ba2601e55..a87a58e6509a 100644 --- a/Documentation/dev-tools/kgdb.rst +++ b/Documentation/dev-tools/kgdb.rst @@ -329,7 +329,7 @@ ways to activate this feature. 2. Use sysfs before configuring an I/O driver:: - echo 1 > /sys/module/kgdb/parameters/kgdb_use_con + echo 1 > /sys/module/debug_core/parameters/kgdb_use_con .. note:: -- 2.25.1 |
From: kernel t. r. <lk...@in...> - 2024-08-16 11:13:10
|
Hi Florian, kernel test robot noticed the following build errors: [auto build test ERROR on tip/master] [also build test ERROR on tip/x86/core linus/master v6.11-rc3 next-20240816] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Florian-Rommel/x86-kgdb-trampolines-for-shadowed-instructions/20240814-230936 base: tip/master patch link: https://lore.kernel.org/r/20240814085141.171564-1-mail%40florommel.de patch subject: [PATCH WIP] x86/kgdb: trampolines for shadowed instructions config: x86_64-buildonly-randconfig-004-20240816 (https://download.01.org/0day-ci/archive/20240816/202...@in.../config) compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240816/202...@in.../reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lk...@in...> | Closes: https://lore.kernel.org/oe-kbuild-all/202...@in.../ All errors (new ones prefixed by >>): >> ld.lld: error: undefined symbol: execmem_alloc >>> referenced by usercopy_64.c >>> vmlinux.o:(kgdb_arch_init) -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki |
From: kernel t. r. <lk...@in...> - 2024-08-15 19:52:38
|
Hi Florian, kernel test robot noticed the following build errors: [auto build test ERROR on tip/master] [also build test ERROR on tip/x86/core linus/master v6.11-rc3 next-20240815] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Florian-Rommel/x86-kgdb-trampolines-for-shadowed-instructions/20240814-230936 base: tip/master patch link: https://lore.kernel.org/r/20240814085141.171564-1-mail%40florommel.de patch subject: [PATCH WIP] x86/kgdb: trampolines for shadowed instructions config: x86_64-buildonly-randconfig-004-20240815 (https://download.01.org/0day-ci/archive/20240816/202...@in.../config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240816/202...@in.../reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lk...@in...> | Closes: https://lore.kernel.org/oe-kbuild-all/202...@in.../ All errors (new ones prefixed by >>): ld: vmlinux.o: in function `kgdb_arch_init': >> arch/x86/kernel/kgdb.c:416: undefined reference to `execmem_alloc' vim +416 arch/x86/kernel/kgdb.c 410 411 static int kgdb_init_trampolines(void) 412 { 413 /* FIXME: We should reserve enough space for all breakpoints 414 or make the trampoline table dynamic somehow.. */ 415 /* FIXME: Currently borrowing EXECMEM_KPROBES */ > 416 kgdb_trampoline_page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE); 417 if (!kgdb_trampoline_page) 418 return 1; 419 memset(kgdb_trampoline_page, INT3_INSN_OPCODE, PAGE_SIZE); 420 set_memory_rox((unsigned long)kgdb_trampoline_page, 1); 421 kgdb_trampoline_page_curr_slot = kgdb_trampoline_page; 422 return 0; 423 } 424 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki |
From: Greg Kroah-H. <gr...@li...> - 2024-08-14 15:55:29
|
On Wed, Aug 14, 2024 at 04:38:43PM +0100, Daniel Thompson wrote: > On Mon, Aug 12, 2024 at 01:04:22PM -0700, Doug Anderson wrote: > > Hi, > > > > On Mon, Aug 12, 2024 at 1:55 AM Florian Rommel <ma...@fl...> wrote: > > > > > > The test for access watchpoints (hw_access_break_test) was broken > > > (always failed) because the compiler optimized out the write to the > > > static helper variable (hw_break_val2), as it is never read anywhere. > > > This resulted in the target variable (hw_break_val) not being accessed > > > and thus the breakpoint not being triggered. > > > > > > Remove the helper variable (hw_break_val2), and use READ_ONCE to force > > > reading the target variable (hw_break_val). > > > > > > Signed-off-by: Florian Rommel <ma...@fl...> > > > --- > > > drivers/misc/kgdbts.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > Reviewed-by: Douglas Anderson <dia...@ch...> > > Looks good. I pushed this through kgdbtest and it likes it too. I can > turn one of the XFAILs off (yay). > > Acked-by: Daniel Thompson <dan...@li...> > Tested-by: Daniel Thompson <dan...@li...> > > Arnd/Greg: Are you happy to pick this up or should I take it through the kgdb > tree? FWIW right now there are zero other patches for kgdb this cycle, although > that could change! I've already picked it up, thanks! greg k-h |
From: Daniel T. <dan...@li...> - 2024-08-14 15:38:53
|
On Mon, Aug 12, 2024 at 01:04:22PM -0700, Doug Anderson wrote: > Hi, > > On Mon, Aug 12, 2024 at 1:55 AM Florian Rommel <ma...@fl...> wrote: > > > > The test for access watchpoints (hw_access_break_test) was broken > > (always failed) because the compiler optimized out the write to the > > static helper variable (hw_break_val2), as it is never read anywhere. > > This resulted in the target variable (hw_break_val) not being accessed > > and thus the breakpoint not being triggered. > > > > Remove the helper variable (hw_break_val2), and use READ_ONCE to force > > reading the target variable (hw_break_val). > > > > Signed-off-by: Florian Rommel <ma...@fl...> > > --- > > drivers/misc/kgdbts.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > Reviewed-by: Douglas Anderson <dia...@ch...> Looks good. I pushed this through kgdbtest and it likes it too. I can turn one of the XFAILs off (yay). Acked-by: Daniel Thompson <dan...@li...> Tested-by: Daniel Thompson <dan...@li...> Arnd/Greg: Are you happy to pick this up or should I take it through the kgdb tree? FWIW right now there are zero other patches for kgdb this cycle, although that could change! Daniel. |
From: Thomas G. <tg...@li...> - 2024-08-14 13:52:50
|
On Wed, Aug 14 2024 at 11:29, Daniel Thompson wrote: > On Wed, Aug 14, 2024 at 10:51:41AM +0200, Florian Rommel wrote: > That's enough to eventuallyremove the int3 instructions but it relies > on entering the debug trap handler and there's no limit on how long > could take before that happens. For that reason I think the core should > also attempt to transition BP_REMOVE_PENDING breakpoints to BP_REMOVE > after kgdb_skipexception() returns true. That means if we keep trapping > on a disabled breakpoint eventually we will hit a window where the > text_mutex is free and clean things up. Even when text_mutex is uncontended then text_poke_kgdb() is completely broken in the KGDB NMI context when the NMI hit into anything related to mm switching and tlb flushing, which is utilized in __text_poke(). The same problem is obviously true for installing a breakpoint from that context. I'm starting to be more convinced that the only sane solution for all of this is to disable CET when KGDB is on and use CRO.WP to work around all of this. Thanks, tglx |
From: Daniel T. <dan...@li...> - 2024-08-14 10:29:55
|
On Wed, Aug 14, 2024 at 10:51:41AM +0200, Florian Rommel wrote: > Experimental implementation of Thomas' trampoline idea. > Link: https://lore.kernel.org/all/87wmkkpf5v.ffs@tglx/ > > Every breakpoint gets a trampoline entry containing the original > instruction (with a corrected displacement if necessary) and a jump > back into the function to the logically subsequent instruction. > With this, KGDB can skip a still present debug trap for a removed > breakpoint by routing the control flow through the trampoline. > > In this experimental implementation, the actual removal of the debug > trap instructions is completely disabled. So, all trap instructions > planted by KGDB currently remain in the code, and KGDB will skip all > these breakpoints through the trampolines when they are in the removed > state. There is not yet a dedicated breakpoint state for the > to-be-removed-but-still-present breakpoints. > > Inspect the trampolines via: > (gdb) x/16i kgdb_trampoline_page > > Signed-off-by: Florian Rommel <ma...@fl...> > --- > I have been experimenting a bit with Thomas' idea of the trampoline > approach. It seems to work so far but of course has a lot of rough > edges. Interesting. Perhaps a dumb question but is this trampoline code doing the same thing as the out-of-line single-step code in kprobes? > I am stuck for now, as I am not sure about the best way to implement > the safe context where KGDB finally removes the int3 instructions. I think this would actually fit really nicely into the debug core code. Firstly dbg_deactivate_sw_breakpoints() should strictly maintain BP_ACTIVE and BP_SET states (e.g. if the kgdb_arch_remove_breakpoint() fails then do not transition from BP_ACTIVE and BP_SET). There would be a little bit of extra logic to clean things up here (each equality check on BP_SET needs to be reviewed) but the resulting state tracking is more correct. Once we've done that we can add a new state BP_REMOVE_PENDING and arrange for this to be set by dbg_remove_sw_break() if the breakpoint is BP_ACTIVE. At this stage we can also arrange for dbg_deactivate_sw_breakpoints() to handle BP_REMOVE_PENDING to BP_REMOVE transitions by calling kgdb_arch_remove_breakpoint(). That's enough to eventuallyremove the int3 instructions but it relies on entering the debug trap handler and there's no limit on how long could take before that happens. For that reason I think the core should also attempt to transition BP_REMOVE_PENDING breakpoints to BP_REMOVE after kgdb_skipexception() returns true. That means if we keep trapping on a disabled breakpoint eventually we will hit a window where the text_mutex is free and clean things up. Daniel. |
From: Florian R. <ma...@fl...> - 2024-08-14 08:52:49
|
Experimental implementation of Thomas' trampoline idea. Link: https://lore.kernel.org/all/87wmkkpf5v.ffs@tglx/ Every breakpoint gets a trampoline entry containing the original instruction (with a corrected displacement if necessary) and a jump back into the function to the logically subsequent instruction. With this, KGDB can skip a still present debug trap for a removed breakpoint by routing the control flow through the trampoline. In this experimental implementation, the actual removal of the debug trap instructions is completely disabled. So, all trap instructions planted by KGDB currently remain in the code, and KGDB will skip all these breakpoints through the trampolines when they are in the removed state. There is not yet a dedicated breakpoint state for the to-be-removed-but-still-present breakpoints. Inspect the trampolines via: (gdb) x/16i kgdb_trampoline_page Signed-off-by: Florian Rommel <ma...@fl...> --- I have been experimenting a bit with Thomas' idea of the trampoline approach. It seems to work so far but of course has a lot of rough edges. I am stuck for now, as I am not sure about the best way to implement the safe context where KGDB finally removes the int3 instructions. Maybe, this is useful.. at least it was a fun ride. arch/x86/kernel/kgdb.c | 128 +++++++++++++++++++++++++++++++++----- include/linux/kgdb.h | 2 + kernel/debug/debug_core.c | 12 ++++ 3 files changed, 128 insertions(+), 14 deletions(-) diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c index 9c9faa1634fb..e09090f14894 100644 --- a/arch/x86/kernel/kgdb.c +++ b/arch/x86/kernel/kgdb.c @@ -35,6 +35,7 @@ #include <linux/hw_breakpoint.h> #include <linux/uaccess.h> #include <linux/memory.h> +#include <linux/execmem.h> #include <asm/text-patching.h> #include <asm/debugreg.h> @@ -42,6 +43,11 @@ #include <asm/apic.h> #include <asm/nmi.h> #include <asm/switch_to.h> +#include <asm/insn.h> +#include <asm/set_memory.h> + +static void *kgdb_trampoline_page; +static void *kgdb_trampoline_page_curr_slot; struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = { @@ -402,6 +408,60 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs) } } +static int kgdb_init_trampolines(void) +{ + /* FIXME: We should reserve enough space for all breakpoints + or make the trampoline table dynamic somehow.. */ + /* FIXME: Currently borrowing EXECMEM_KPROBES */ + kgdb_trampoline_page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE); + if (!kgdb_trampoline_page) + return 1; + memset(kgdb_trampoline_page, INT3_INSN_OPCODE, PAGE_SIZE); + set_memory_rox((unsigned long)kgdb_trampoline_page, 1); + kgdb_trampoline_page_curr_slot = kgdb_trampoline_page; + return 0; +} + +static int kgdb_make_trampoline(struct kgdb_bkpt *bpt, void *orig_insn) +{ + void *kernel_addr, *slot_addr, *jmp_addr, *jmp_code; + int slot_size; + struct insn insn; + u8 insn_buff[MAX_INSN_SIZE + JMP32_INSN_SIZE]; + + if (WARN_ON(insn_decode_kernel(&insn, orig_insn))) + return -EINVAL; + + slot_size = insn.length + JMP32_INSN_SIZE; + + if (!bpt->trampoline) { + /* FIXME: Not enough space for all possible breakpoints */ + if (kgdb_trampoline_page_curr_slot + slot_size + > kgdb_trampoline_page + PAGE_SIZE) + return -ENOSPC; + + bpt->trampoline = kgdb_trampoline_page_curr_slot; + kgdb_trampoline_page_curr_slot += slot_size; + } + + kernel_addr = (void *)bpt->bpt_addr; + slot_addr = bpt->trampoline; + jmp_addr = slot_addr + insn.length; + jmp_code = text_gen_insn(JMP32_INSN_OPCODE, jmp_addr, + kernel_addr + insn.length); + + memcpy(insn_buff, kernel_addr, insn.length); + memcpy(insn_buff + insn.length, jmp_code, JMP32_INSN_SIZE); + apply_relocation(insn_buff, slot_addr, insn.length, + kernel_addr, insn.length); + + if (mutex_is_locked(&text_mutex)) + return -EBUSY; + text_poke_kgdb(slot_addr, insn_buff, insn.length + JMP32_INSN_SIZE); + + return 0; +} + #ifdef CONFIG_SMP /** * kgdb_roundup_cpus - Get other CPUs into a holding pattern @@ -598,6 +658,10 @@ int kgdb_arch_init(void) { int retval; + retval = kgdb_init_trampolines(); + if (retval) + goto out; + retval = register_die_notifier(&kgdb_notifier); if (retval) goto out; @@ -708,11 +772,18 @@ void kgdb_arch_exit(void) */ int kgdb_skipexception(int exception, struct pt_regs *regs) { - if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) { - regs->ip -= 1; - return 1; + struct kgdb_bkpt *bp; + + if (exception != 3) + return false; + + bp = kgdb_get_removedbreak(regs->ip - 1); + if (bp) { + regs->ip = (unsigned long)bp->trampoline; + return true; } - return 0; + + return false; } unsigned long kgdb_arch_pc(int exception, struct pt_regs *regs) @@ -730,12 +801,33 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long ip) int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt) { int err; + char orig_insn[MAX_INSN_SIZE]; bpt->type = BP_BREAKPOINT; - err = copy_from_kernel_nofault(bpt->saved_instr, (char *)bpt->bpt_addr, - BREAK_INSTR_SIZE); + + /* + * FIXME: Currently, debug traps are not removed, so all logically + * removed breakpoints are still there. + * For now, abort if the trampoline is already present. + * Later, we should use a dedicated breakpoint state to signal this. + */ + if (bpt->trampoline) { + bpt->type = BP_POKE_BREAKPOINT; + return 0; + } + + /* FIXME: What if we cannot read all of MAX_INSN_SIZE */ + err = copy_from_kernel_nofault(orig_insn, (char *)bpt->bpt_addr, + MAX_INSN_SIZE); if (err) return err; + + memcpy(bpt->saved_instr, orig_insn, BREAK_INSTR_SIZE); + + err = kgdb_make_trampoline(bpt, orig_insn); + if (err) + return err; + err = copy_to_kernel_nofault((char *)bpt->bpt_addr, arch_kgdb_ops.gdb_bpt_instr, BREAK_INSTR_SIZE); if (!err) @@ -755,16 +847,11 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt) int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt) { + /* FIXME: Do not mess with early breakpoints for now */ if (bpt->type != BP_POKE_BREAKPOINT) goto knl_write; - /* - * It is safe to call text_poke_kgdb() because normal kernel execution - * is stopped on all cores, so long as the text_mutex is not locked. - */ - if (mutex_is_locked(&text_mutex)) - goto knl_write; - text_poke_kgdb((void *)bpt->bpt_addr, bpt->saved_instr, - BREAK_INSTR_SIZE); + + /* FIXME: Removal is not necessary anymore.. for now */ return 0; knl_write: @@ -772,6 +859,19 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt) (char *)bpt->saved_instr, BREAK_INSTR_SIZE); } +int kgdb_validate_break_address(unsigned long addr) +{ + if (kgdb_within_blocklist(addr)) + return -EINVAL; + + /* + * With breakpoint trampolines, there is no need to further validate + * setting and removal of breakpoints. + */ + + return 0; +} + const struct kgdb_arch arch_kgdb_ops = { /* Breakpoint instruction: */ .gdb_bpt_instr = { 0xcc }, diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h index 76e891ee9e37..19c2d8ee8124 100644 --- a/include/linux/kgdb.h +++ b/include/linux/kgdb.h @@ -79,6 +79,7 @@ struct kgdb_bkpt { unsigned char saved_instr[BREAK_INSTR_SIZE]; enum kgdb_bptype type; enum kgdb_bpstate state; + void *trampoline; }; struct dbg_reg_def_t { @@ -324,6 +325,7 @@ extern int kgdb_hex2long(char **ptr, unsigned long *long_val); extern char *kgdb_mem2hex(char *mem, char *buf, int count); extern int kgdb_hex2mem(char *buf, char *mem, int count); +extern struct kgdb_bkpt *kgdb_get_removedbreak(unsigned long addr); extern int kgdb_isremovedbreak(unsigned long addr); extern int kgdb_has_hit_break(unsigned long addr); diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c index ce1bb2301c06..60bc63d1000a 100644 --- a/kernel/debug/debug_core.c +++ b/kernel/debug/debug_core.c @@ -393,6 +393,18 @@ int dbg_remove_sw_break(unsigned long addr) return -ENOENT; } +struct kgdb_bkpt *kgdb_get_removedbreak(unsigned long addr) +{ + int i; + + for (i = 0; i < KGDB_MAX_BREAKPOINTS; i++) { + if ((kgdb_break[i].state == BP_REMOVED) && + (kgdb_break[i].bpt_addr == addr)) + return &kgdb_break[i]; + } + return NULL; +} + int kgdb_isremovedbreak(unsigned long addr) { int i; -- 2.46.0 |
From: Florian R. <ma...@fl...> - 2024-08-14 00:33:38
|
On Tue, 2024-08-13 at 17:10 -0700, Andrew Morton wrote: > On Mon, 12 Aug 2024 01:22:06 +0200 Florian Rommel <ma...@fl...> wrote: > > > This series fixes two problems with KGDB on x86 concerning the removal > > of breakpoints, causing the kernel to hang. Note that breakpoint > > removal is not only performed when explicitly deleting a breakpoint, > > but also happens before continuing execution or single stepping. > > Neat. It would be nice to fix earlier kernels; for that it is > desirable to identify a Fixes: target. From a quick look it appears > this issue is more than a decade old, in which case I don't believe a > Fixes: is needed - our request becomes "please backport to everything". > Thanks. There is already a v2 (due to my negligence on the details) and a bit of discussion: https://lore.kernel.org/all/202...@fl.../ Rgeards, Florian |
From: Andrew M. <ak...@li...> - 2024-08-14 00:28:56
|
On Mon, 12 Aug 2024 01:22:06 +0200 Florian Rommel <ma...@fl...> wrote: > This series fixes two problems with KGDB on x86 concerning the removal > of breakpoints, causing the kernel to hang. Note that breakpoint > removal is not only performed when explicitly deleting a breakpoint, > but also happens before continuing execution or single stepping. Neat. It would be nice to fix earlier kernels; for that it is desirable to identify a Fixes: target. From a quick look it appears this issue is more than a decade old, in which case I don't believe a Fixes: is needed - our request becomes "please backport to everything". |
From: Thomas G. <tg...@li...> - 2024-08-13 16:21:25
|
Daniel! On Tue, Aug 13 2024 at 12:31, Daniel Thompson wrote: > On Mon, Aug 12, 2024 at 11:04:13PM +0200, Thomas Gleixner wrote: >> Btw, kgdb_skipexception() is a gross nisnomer because it rewinds the >> instruction pointer to the exception address and does not skip anything, >> but that's an orthogonal issue though it could be cleaned up along the >> way... > > kgdb_skipexception() is not a directive from debug core to architecture. > It is a question to the archictecture: should the debug core skip normal > debug exception handling and resume immediately?. Ah. This code is so exceptionally intuitive .... > It allows an architecture to direct the debug core to ignore a spurious > trap that, according to the comments, can occur on some > (micro)architectures when we have already restored the original > not-a-breakpoint instruction. Potentially due to the lack of sync_core() after the copy. And of course the removal can fail as Florian discussed. > Florian's patch changes things so that we will also skip debug exception > handling if we can successfully poke to the text section. I don't think > it is sufficient on its own since the text_mutex could be owned by the > core that is stuck trapping on the int3 that we can't remove. I was asking exactly that: > What guarantees the release of text mutex? :) >> Aside of that the same problem exists on PowerPC. So you can move the >> attempt to remove the breakpoint into the generic code, no? > > Getting the debug core to track breakpoints that are stuck on would be a > good improvement. > > We would be able to use that logic to retry the removal of stuck > breakpoint once other SMP cores are running again. That would be great > for architectures like arm64 that use spin-with-irqsave locking in their > noinstr text_poke() machinery. > > However it won't solve the problem for x86 since it uses mutex locking. > > A partial solution might be to get kgdb_arch_remove_breakpoint() to > disregard the text_mutex completely if kdb_continue_catastrophic is set > (and adding stuck breakpoints to the reasons to inhibit a continue). The interrupted owner of text_mutex might be in the middle of modifying the poking_mm page tables or in the worst case modifying the code which kgdb wants to play with. Dragons are guaranteed. > This is a partial solution in the sense that it is not safe: we will > simply tell the kernel dev driving the debugger that they are > responsible for the safety of the continue and then disable the safety > rails. > > I haven't yet come up with a full fix that doesn't require > text_poke_kgdb() to not require text_mutex to be free. I did note that > comment in __text_poke() that says calling get_locked_pte() "is not > really needed, but this allows to avoid open-coding" but I got a bit lost > trying to figure out other locks and nesting concerns. So there are no other locks involved. The PTE lock is not strictly required because the access to poking_mm is fully serialized. You'd need to add a separate kgdb_poking_mm and kgdb_poking_addr. Now make __text_poke() take a @mm and @pokeaddr argument and let it operate on them. But that solves only part of the problem: 1) A concurrent modification of the same code will result in undefined behaviour. Not sure whether that's actually an issue, but I would not bet on it. 2) switch_mm() and the related code are not reentrancy safe either 3) TLB flushing. That can't use tlb_flush_mm_range() simply because that's not reentrant. Which makes me wonder about #2. As this stuff can run in NMI context, then even if text_mutex is uncontended, then tlb_flush_mm_range() might be what had been interrupted, so reentrancy would be fatal. That's all a horrorshow as you can't play with CR0.WP either at least not when CR4.CET is set. So if you can force disable CET when KGDB is enabled, then you might get away with: bp->code = *p; clear_cr0_wp(); *p = int3; set_cr0_wp(); Though the hardening people might not be really happy about this. But let's take a step back. Installing a breakpoint is done by the debugger. The debugger knows the original code, right? So why cant the debugger provide a trampoline: ORIG_OP // with fixed up displacements JMP NEXT_INSN Now you stick that into a trampoline page slot and then patch the INT3 in. Now on removal can be a two stage approach: bp->state = INACTIVE; kick_breakpoint_gc(); breakpoint_gc() removes the breakpoint and invalidates the trampoline from a safe context and up to that point kgdb_skipexception() can do: bool kgdb_skip_exception(int exception, struct pt_regs *regs) { struct kgdb_bkpt *bp; if (exception != 3) return false; bp = kgdb_get_inactive_breakpoint(--regs->ip, BP_BREAKPOINT); if (bp) regs->ip = bp->trampoline; return true; } Hmm? Thanks, tglx |
From: Florian R. <ma...@fl...> - 2024-08-13 15:06:54
|
On Tue, 2024-08-13 at 12:31 +0100, Daniel Thompson wrote: > On Mon, Aug 12, 2024 at 11:04:13PM +0200, Thomas Gleixner wrote: > > > > > Btw, kgdb_skipexception() is a gross nisnomer because it rewinds the > > instruction pointer to the exception address and does not skip anything, > > but that's an orthogonal issue though it could be cleaned up along the > > way... > > kgdb_skipexception() is not a directive from debug core to architecture. > It is a question to the archictecture: should the debug core skip normal > debug exception handling and resume immediately?. > > It allows an architecture to direct the debug core to ignore a spurious > trap that, according to the comments, can occur on some > (micro)architectures when we have already restored the original > not-a-breakpoint instruction. > > Florian's patch changes things so that we will also skip debug exception > handling if we can successfully poke to the text section. I don't think > it is sufficient on its own since the text_mutex could be owned by the > core that is stuck trapping on the int3 that we can't remove. Yes, if the text_mutex is owned by the trapping core, that's a inherent problem. This won't be solved by my patch. But this will probably be difficult to be solved at all.. The patch only adds an removal attempt, without changing the result of kgdb_skipexception(). It was thought as a best-effort solution: Better try removing the breakpoint than getting stuck in the int3 loop for sure. Of course, there is no guarantee for this being successful. > > > > > { > > > if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) { > > > + struct kgdb_bkpt *bpt; > > > + int i, error; > > > + > > > regs->ip -= 1; > > > + > > > + /* > > > + * Try to remove the spurious int3 instruction. > > > + * These int3s can result from failed breakpoint removals > > > + * in kgdb_arch_remove_breakpoint. > > > + */ > > > + for (bpt = NULL, i = 0; i < KGDB_MAX_BREAKPOINTS; i++) { > > > + if (kgdb_break[i].bpt_addr == regs->ip && > > > + kgdb_break[i].state == BP_REMOVED && > > > + (kgdb_break[i].type == BP_BREAKPOINT || > > > + kgdb_break[i].type == BP_POKE_BREAKPOINT)) { > > > + bpt = &kgdb_break[i]; > > > + break; > > > + } > > > + } > > > > Seriously? The KGBD core already walked that array in > > kgdb_isremovedbreak() just so you can walk it again here. > > > > struct kkgdb_bkpt *kgdb_get_removed_breakpoint(unsigned long addr) > > { > > struct kgdb_bkpt *bp = kgdb_break; > > > > for (int i = 0; i < KGDB_MAX_BREAKPOINTS; i++, bp++) { > > if (bp>.state == BP_REMOVED && bp->kgdb_bpt_addr == addr) > > return bp; > > } > > return NULL; > > } > > > > bool kgdb_isremovedbreak(unsigned long addr) > > { > > return !!kgdb_get_removed_breakpoint(addr); > > } > > > > bool kgdb_rewind_exception(int exception, struct pt_regs *regs) > > { > > struct kgdb_bkpt *bp; > > > > if (exception != 3) > > return false; > > > > bp = kgdb_get_removed_breakpoint(--regs->ip); > > if (!bp || !bp->type == BP_BREAKPOINT) > > return false; > > > > Hmm? > > > > > + error = kgdb_arch_remove_breakpoint(bpt); > > > + if (error) > > > + pr_err("skipexception: breakpoint remove failed: %lx\n", > > > + bpt->bpt_addr); > > > > Lacks curly brackets. See Documentation. > > > > return !error; > > > > Aside of that the same problem exists on PowerPC. So you can move the > > attempt to remove the breakpoint into the generic code, no? > > Getting the debug core to track breakpoints that are stuck on would be a > good improvement. Do you mean tracking failed removals with an extra kgdb_bptype variant? > > We would be able to use that logic to retry the removal of stuck > breakpoint once other SMP cores are running again. That would be great > for architectures like arm64 that use spin-with-irqsave locking in their > noinstr text_poke() machinery. > > However it won't solve the problem for x86 since it uses mutex locking. > > A partial solution might be to get kgdb_arch_remove_breakpoint() to > disregard the text_mutex completely if kdb_continue_catastrophic is set > (and adding stuck breakpoints to the reasons to inhibit a continue). > This is a partial solution in the sense that it is not safe: we will > simply tell the kernel dev driving the debugger that they are > responsible for the safety of the continue and then disable the safety > rails. > > I haven't yet come up with a full fix that doesn't require > text_poke_kgdb() to not require text_mutex to be free. I did note that > comment in __text_poke() that says calling get_locked_pte() "is not > really needed, but this allows to avoid open-coding" but I got a bit lost > trying to figure out other locks and nesting concerns. > This is a bit off-topic, but isn't setting software breakpoints in combination with other text modifications unsafe anyway? If we remove a breakpoint in a code location that has been modified in the meantime, we would restore an outdated saved_instr. What am I missing here? Best regards, Flo |
From: Florian R. <ma...@fl...> - 2024-08-13 15:05:50
|
Hi Thomas, Thanks for the feedback. On Mon, 2024-08-12 at 23:04 +0200, Thomas Gleixner wrote: > Either you know it or not. Speculation is reserved for CPUs. I have now checked it, it is always the static_key patching mechanism when I reproduce the issue (in QEMU with a varying number of CPUs), but we can skip this statement. > > > { > > if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) { > > + struct kgdb_bkpt *bpt; > > + int i, error; > > + > > regs->ip -= 1; > > + > > + /* > > + * Try to remove the spurious int3 instruction. > > + * These int3s can result from failed breakpoint removals > > + * in kgdb_arch_remove_breakpoint. > > + */ > > + for (bpt = NULL, i = 0; i < KGDB_MAX_BREAKPOINTS; i++) { > > + if (kgdb_break[i].bpt_addr == regs->ip && > > + kgdb_break[i].state == BP_REMOVED && > > + (kgdb_break[i].type == BP_BREAKPOINT || > > + kgdb_break[i].type == BP_POKE_BREAKPOINT)) { > > + bpt = &kgdb_break[i]; > > + break; > > + } > > + } > > Seriously? The KGBD core already walked that array in > kgdb_isremovedbreak() just so you can walk it again here. > > struct kkgdb_bkpt *kgdb_get_removed_breakpoint(unsigned long addr) > { > struct kgdb_bkpt *bp = kgdb_break; > > for (int i = 0; i < KGDB_MAX_BREAKPOINTS; i++, bp++) { > if (bp>.state == BP_REMOVED && bp->kgdb_bpt_addr == addr) > return bp; > } > return NULL; > } > > bool kgdb_isremovedbreak(unsigned long addr) > { > return !!kgdb_get_removed_breakpoint(addr); > } > > bool kgdb_rewind_exception(int exception, struct pt_regs *regs) > { > struct kgdb_bkpt *bp; > > if (exception != 3) > return false; > > bp = kgdb_get_removed_breakpoint(--regs->ip); > if (!bp || !bp->type == BP_BREAKPOINT) > return false; > > Hmm? Ok, ok, looks much better. My intention was to keep the changes in the generic debug core minimal, especially since efficiency is not important here.. but I see, it should be done right. Best regards, Flo |
From: Daniel T. <dan...@li...> - 2024-08-13 11:55:29
|
On Mon, Aug 12, 2024 at 11:04:13PM +0200, Thomas Gleixner wrote: > Florian! > > On Mon, Aug 12 2024 at 19:43, Florian Rommel wrote: > > On x86, occasionally, the removal of a breakpoint (i.e., removal of > > the int3 instruction) fails because the text_mutex is taken by another > > CPU (mainly due to the static_key mechanism, I think). > > Either you know it or not. Speculation is reserved for CPUs. > > > The function kgdb_skipexception catches exceptions from these spurious > > int3 instructions, bails out of KGDB, and continues execution from the > > previous PC address. > > > > However, this led to an endless loop between the int3 instruction and > > kgdb_skipexception since the int3 instruction (being still present) > > triggered again. This effectively caused the system to hang. > > > > With this patch, we try to remove the concerned spurious int3 > > instruction in kgdb_skipexception before continuing execution. This > > may take a few attempts until the concurrent holders of the text_mutex > > have released it, but eventually succeeds and the kernel can continue. > > What guarantees the release of text mutex? > > > Signed-off-by: Florian Rommel <ma...@fl...> > > --- > > arch/x86/kernel/kgdb.c | 24 ++++++++++++++++++++++++ > > 1 file changed, 24 insertions(+) > > > > diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c > > index 64c332151af7..585a7a72af74 100644 > > --- a/arch/x86/kernel/kgdb.c > > +++ b/arch/x86/kernel/kgdb.c > > @@ -723,7 +723,31 @@ void kgdb_arch_exit(void) > > int kgdb_skipexception(int exception, struct pt_regs *regs) > > Btw, kgdb_skipexception() is a gross nisnomer because it rewinds the > instruction pointer to the exception address and does not skip anything, > but that's an orthogonal issue though it could be cleaned up along the > way... kgdb_skipexception() is not a directive from debug core to architecture. It is a question to the archictecture: should the debug core skip normal debug exception handling and resume immediately?. It allows an architecture to direct the debug core to ignore a spurious trap that, according to the comments, can occur on some (micro)architectures when we have already restored the original not-a-breakpoint instruction. Florian's patch changes things so that we will also skip debug exception handling if we can successfully poke to the text section. I don't think it is sufficient on its own since the text_mutex could be owned by the core that is stuck trapping on the int3 that we can't remove. > > { > > if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) { > > + struct kgdb_bkpt *bpt; > > + int i, error; > > + > > regs->ip -= 1; > > + > > + /* > > + * Try to remove the spurious int3 instruction. > > + * These int3s can result from failed breakpoint removals > > + * in kgdb_arch_remove_breakpoint. > > + */ > > + for (bpt = NULL, i = 0; i < KGDB_MAX_BREAKPOINTS; i++) { > > + if (kgdb_break[i].bpt_addr == regs->ip && > > + kgdb_break[i].state == BP_REMOVED && > > + (kgdb_break[i].type == BP_BREAKPOINT || > > + kgdb_break[i].type == BP_POKE_BREAKPOINT)) { > > + bpt = &kgdb_break[i]; > > + break; > > + } > > + } > > Seriously? The KGBD core already walked that array in > kgdb_isremovedbreak() just so you can walk it again here. > > struct kkgdb_bkpt *kgdb_get_removed_breakpoint(unsigned long addr) > { > struct kgdb_bkpt *bp = kgdb_break; > > for (int i = 0; i < KGDB_MAX_BREAKPOINTS; i++, bp++) { > if (bp>.state == BP_REMOVED && bp->kgdb_bpt_addr == addr) > return bp; > } > return NULL; > } > > bool kgdb_isremovedbreak(unsigned long addr) > { > return !!kgdb_get_removed_breakpoint(addr); > } > > bool kgdb_rewind_exception(int exception, struct pt_regs *regs) > { > struct kgdb_bkpt *bp; > > if (exception != 3) > return false; > > bp = kgdb_get_removed_breakpoint(--regs->ip); > if (!bp || !bp->type == BP_BREAKPOINT) > return false; > > Hmm? > > > + error = kgdb_arch_remove_breakpoint(bpt); > > + if (error) > > + pr_err("skipexception: breakpoint remove failed: %lx\n", > > + bpt->bpt_addr); > > Lacks curly brackets. See Documentation. > > return !error; > > Aside of that the same problem exists on PowerPC. So you can move the > attempt to remove the breakpoint into the generic code, no? Getting the debug core to track breakpoints that are stuck on would be a good improvement. We would be able to use that logic to retry the removal of stuck breakpoint once other SMP cores are running again. That would be great for architectures like arm64 that use spin-with-irqsave locking in their noinstr text_poke() machinery. However it won't solve the problem for x86 since it uses mutex locking. A partial solution might be to get kgdb_arch_remove_breakpoint() to disregard the text_mutex completely if kdb_continue_catastrophic is set (and adding stuck breakpoints to the reasons to inhibit a continue). This is a partial solution in the sense that it is not safe: we will simply tell the kernel dev driving the debugger that they are responsible for the safety of the continue and then disable the safety rails. I haven't yet come up with a full fix that doesn't require text_poke_kgdb() to not require text_mutex to be free. I did note that comment in __text_poke() that says calling get_locked_pte() "is not really needed, but this allows to avoid open-coding" but I got a bit lost trying to figure out other locks and nesting concerns. Daniel. [1] I try to avoid calling them "users" because |
From: Doug A. <dia...@ch...> - 2024-08-12 21:07:17
|
Hi, On Mon, Aug 12, 2024 at 1:55 AM Florian Rommel <ma...@fl...> wrote: > > The test for access watchpoints (hw_access_break_test) was broken > (always failed) because the compiler optimized out the write to the > static helper variable (hw_break_val2), as it is never read anywhere. > This resulted in the target variable (hw_break_val) not being accessed > and thus the breakpoint not being triggered. > > Remove the helper variable (hw_break_val2), and use READ_ONCE to force > reading the target variable (hw_break_val). > > Signed-off-by: Florian Rommel <ma...@fl...> > --- > drivers/misc/kgdbts.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Reviewed-by: Douglas Anderson <dia...@ch...> |
From: Thomas G. <tg...@li...> - 2024-08-12 21:04:22
|
Florian! On Mon, Aug 12 2024 at 19:43, Florian Rommel wrote: > On x86, occasionally, the removal of a breakpoint (i.e., removal of > the int3 instruction) fails because the text_mutex is taken by another > CPU (mainly due to the static_key mechanism, I think). Either you know it or not. Speculation is reserved for CPUs. > The function kgdb_skipexception catches exceptions from these spurious > int3 instructions, bails out of KGDB, and continues execution from the > previous PC address. > > However, this led to an endless loop between the int3 instruction and > kgdb_skipexception since the int3 instruction (being still present) > triggered again. This effectively caused the system to hang. > > With this patch, we try to remove the concerned spurious int3 > instruction in kgdb_skipexception before continuing execution. This > may take a few attempts until the concurrent holders of the text_mutex > have released it, but eventually succeeds and the kernel can continue. What guarantees the release of text mutex? > Signed-off-by: Florian Rommel <ma...@fl...> > --- > arch/x86/kernel/kgdb.c | 24 ++++++++++++++++++++++++ > 1 file changed, 24 insertions(+) > > diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c > index 64c332151af7..585a7a72af74 100644 > --- a/arch/x86/kernel/kgdb.c > +++ b/arch/x86/kernel/kgdb.c > @@ -723,7 +723,31 @@ void kgdb_arch_exit(void) > int kgdb_skipexception(int exception, struct pt_regs *regs) Btw, kgdb_skipexception() is a gross nisnomer because it rewinds the instruction pointer to the exception address and does not skip anything, but that's an orthogonal issue though it could be cleaned up along the way... > { > if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) { > + struct kgdb_bkpt *bpt; > + int i, error; > + > regs->ip -= 1; > + > + /* > + * Try to remove the spurious int3 instruction. > + * These int3s can result from failed breakpoint removals > + * in kgdb_arch_remove_breakpoint. > + */ > + for (bpt = NULL, i = 0; i < KGDB_MAX_BREAKPOINTS; i++) { > + if (kgdb_break[i].bpt_addr == regs->ip && > + kgdb_break[i].state == BP_REMOVED && > + (kgdb_break[i].type == BP_BREAKPOINT || > + kgdb_break[i].type == BP_POKE_BREAKPOINT)) { > + bpt = &kgdb_break[i]; > + break; > + } > + } Seriously? The KGBD core already walked that array in kgdb_isremovedbreak() just so you can walk it again here. struct kkgdb_bkpt *kgdb_get_removed_breakpoint(unsigned long addr) { struct kgdb_bkpt *bp = kgdb_break; for (int i = 0; i < KGDB_MAX_BREAKPOINTS; i++, bp++) { if (bp>.state == BP_REMOVED && bp->kgdb_bpt_addr == addr) return bp; } return NULL; } bool kgdb_isremovedbreak(unsigned long addr) { return !!kgdb_get_removed_breakpoint(addr); } bool kgdb_rewind_exception(int exception, struct pt_regs *regs) { struct kgdb_bkpt *bp; if (exception != 3) return false; bp = kgdb_get_removed_breakpoint(--regs->ip); if (!bp || !bp->type == BP_BREAKPOINT) return false; Hmm? > + error = kgdb_arch_remove_breakpoint(bpt); > + if (error) > + pr_err("skipexception: breakpoint remove failed: %lx\n", > + bpt->bpt_addr); Lacks curly brackets. See Documentation. return !error; Aside of that the same problem exists on PowerPC. So you can move the attempt to remove the breakpoint into the generic code, no? Thanks, tglx |
From: Thomas G. <tg...@li...> - 2024-08-12 19:19:29
|
Florian! On Mon, Aug 12 2024 at 19:43, Florian Rommel wrote: > On x86, after booting, the kernel text is read-only. Then, KGDB has to > use the text_poke mechanism to install software breakpoints. KGDB > uses a special (x86-specific) breakpoint type for these kinds of > breakpoints (BP_POKE_BREAKPOINT). When removing a breakpoint, KGDB > always adheres to the breakpoint's original installment method, which is > determined by its type. > > Before this fix, early (non-"poke") breakpoints could not be removed > after the kernel text was set as read-only since the original code > patching mechanism was no longer allowed to remove the breakpoints. > Eventually, this even caused the kernel to hang (loop between int3 > instruction and the function kgdb_skipexception). > > With this patch, we convert early breakpoints to "poke" breakpoints > after the kernel text has been made read-only. This makes them > removable later. Please check Documentation/process/ including maintainers.tip for change log rules. But aside of that why having this BP_TYPE dance in the first place? kgdb_arch_set_breakpoint(...) { if (system_state == SYSTEM_BOOTING) { text_poke_early(...); return; } if (mutex_is_locked(&text_mutex)) return -EBUSY; text_poke_kgdb(...); } See? No breakpoint type, no magic post readonly fixup, nothing. Similar for arch_remove_breakpoint(). I reply to that gem on the other patch. Thanks, tglx |
From: Florian R. <ma...@fl...> - 2024-08-12 17:45:39
|
On x86, occasionally, the removal of a breakpoint (i.e., removal of the int3 instruction) fails because the text_mutex is taken by another CPU (mainly due to the static_key mechanism, I think). The function kgdb_skipexception catches exceptions from these spurious int3 instructions, bails out of KGDB, and continues execution from the previous PC address. However, this led to an endless loop between the int3 instruction and kgdb_skipexception since the int3 instruction (being still present) triggered again. This effectively caused the system to hang. With this patch, we try to remove the concerned spurious int3 instruction in kgdb_skipexception before continuing execution. This may take a few attempts until the concurrent holders of the text_mutex have released it, but eventually succeeds and the kernel can continue. Signed-off-by: Florian Rommel <ma...@fl...> --- arch/x86/kernel/kgdb.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c index 64c332151af7..585a7a72af74 100644 --- a/arch/x86/kernel/kgdb.c +++ b/arch/x86/kernel/kgdb.c @@ -723,7 +723,31 @@ void kgdb_arch_exit(void) int kgdb_skipexception(int exception, struct pt_regs *regs) { if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) { + struct kgdb_bkpt *bpt; + int i, error; + regs->ip -= 1; + + /* + * Try to remove the spurious int3 instruction. + * These int3s can result from failed breakpoint removals + * in kgdb_arch_remove_breakpoint. + */ + for (bpt = NULL, i = 0; i < KGDB_MAX_BREAKPOINTS; i++) { + if (kgdb_break[i].bpt_addr == regs->ip && + kgdb_break[i].state == BP_REMOVED && + (kgdb_break[i].type == BP_BREAKPOINT || + kgdb_break[i].type == BP_POKE_BREAKPOINT)) { + bpt = &kgdb_break[i]; + break; + } + } + if (!bpt) + return 1; + error = kgdb_arch_remove_breakpoint(bpt); + if (error) + pr_err("skipexception: breakpoint remove failed: %lx\n", + bpt->bpt_addr); return 1; } return 0; -- 2.46.0 |
From: Florian R. <ma...@fl...> - 2024-08-12 17:45:31
|
On x86, after booting, the kernel text is read-only. Then, KGDB has to use the text_poke mechanism to install software breakpoints. KGDB uses a special (x86-specific) breakpoint type for these kinds of breakpoints (BP_POKE_BREAKPOINT). When removing a breakpoint, KGDB always adheres to the breakpoint's original installment method, which is determined by its type. Before this fix, early (non-"poke") breakpoints could not be removed after the kernel text was set as read-only since the original code patching mechanism was no longer allowed to remove the breakpoints. Eventually, this even caused the kernel to hang (loop between int3 instruction and the function kgdb_skipexception). With this patch, we convert early breakpoints to "poke" breakpoints after the kernel text has been made read-only. This makes them removable later. Signed-off-by: Florian Rommel <ma...@fl...> --- v2: Add missing stub implementation for kgdb_after_mark_readonly A patch for this problem has already been proposed by Stefan Saecherl and Lorena Kretzschmar [1]. Their solution is different from the one I suggest here (it fixes the problem on removal, not "in advance"). Unfortunately, Lorena and Stefan's patch has not been accepted / the conversation has fallen asleep. One point of criticism concerned possible problems with reused init code pages. This should not be a problem with my patch. [1] https://lore.kernel.org/all/202...@fa.../ arch/x86/kernel/kgdb.c | 14 ++++++++++++++ include/linux/kgdb.h | 4 ++++ init/main.c | 1 + kernel/debug/debug_core.c | 7 ++++++- 4 files changed, 25 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c index 9c9faa1634fb..64c332151af7 100644 --- a/arch/x86/kernel/kgdb.c +++ b/arch/x86/kernel/kgdb.c @@ -623,6 +623,20 @@ int kgdb_arch_init(void) return retval; } +void kgdb_after_mark_readonly(void) +{ + int i; + + /* Convert all breakpoints in rodata to BP_POKE_BREAKPOINT. */ + for (i = 0; i < KGDB_MAX_BREAKPOINTS; i++) { + if (kgdb_break[i].state != BP_UNDEFINED && + kgdb_break[i].type == BP_BREAKPOINT && + is_kernel_text(kgdb_break[i].bpt_addr)) { + kgdb_break[i].type = BP_POKE_BREAKPOINT; + } + } +} + static void kgdb_hw_overflow_handler(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs) { diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h index 76e891ee9e37..903bf833dca1 100644 --- a/include/linux/kgdb.h +++ b/include/linux/kgdb.h @@ -98,6 +98,8 @@ extern int dbg_set_reg(int regno, void *mem, struct pt_regs *regs); # define KGDB_MAX_BREAKPOINTS 1000 #endif +extern struct kgdb_bkpt kgdb_break[KGDB_MAX_BREAKPOINTS]; + #define KGDB_HW_BREAKPOINT 1 /* @@ -360,11 +362,13 @@ extern bool dbg_is_early; extern void __init dbg_late_init(void); extern void kgdb_panic(const char *msg); extern void kgdb_free_init_mem(void); +extern void kgdb_after_mark_readonly(void); #else /* ! CONFIG_KGDB */ #define in_dbg_master() (0) #define dbg_late_init() static inline void kgdb_panic(const char *msg) {} static inline void kgdb_free_init_mem(void) { } +static inline void kgdb_after_mark_readonly(void) {} static inline int kgdb_nmicallback(int cpu, void *regs) { return 1; } #endif /* ! CONFIG_KGDB */ #endif /* _KGDB_H_ */ diff --git a/init/main.c b/init/main.c index 206acdde51f5..33b6e092fed3 100644 --- a/init/main.c +++ b/init/main.c @@ -1441,6 +1441,7 @@ static void mark_readonly(void) mark_rodata_ro(); debug_checkwx(); rodata_test(); + kgdb_after_mark_readonly(); } else if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)) { pr_info("Kernel memory protection disabled.\n"); } else if (IS_ENABLED(CONFIG_ARCH_HAS_STRICT_KERNEL_RWX)) { diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c index ce1bb2301c06..9dd6b69f1679 100644 --- a/kernel/debug/debug_core.c +++ b/kernel/debug/debug_core.c @@ -98,7 +98,7 @@ module_param(kgdbreboot, int, 0644); * Holds information about breakpoints in a kernel. These breakpoints are * added and removed by gdb. */ -static struct kgdb_bkpt kgdb_break[KGDB_MAX_BREAKPOINTS] = { +struct kgdb_bkpt kgdb_break[KGDB_MAX_BREAKPOINTS] = { [0 ... KGDB_MAX_BREAKPOINTS-1] = { .state = BP_UNDEFINED } }; @@ -452,6 +452,11 @@ void kgdb_free_init_mem(void) } } +void __weak kgdb_after_mark_readonly(void) +{ + /* Weak implementation, may be overridden by arch code */ +} + #ifdef CONFIG_KGDB_KDB void kdb_dump_stack_on_cpu(int cpu) { -- 2.46.0 |
From: Florian R. <ma...@fl...> - 2024-08-12 17:45:27
|
This series fixes two problems with KGDB on x86 concerning the removal of breakpoints, causing the kernel to hang. Note that breakpoint removal is not only performed when explicitly deleting a breakpoint, but also happens before continuing execution or single stepping. v2: - Add missing stub implementation for kgdb_after_mark_readonly in [1/2] when KGDB disabled - Link to v1: https://lore.kernel.org/all/202...@fl.../ Florian Rommel (2): x86/kgdb: convert early breakpoints to poke breakpoints x86/kgdb: fix hang on failed breakpoint removal arch/x86/kernel/kgdb.c | 38 ++++++++++++++++++++++++++++++++++++++ include/linux/kgdb.h | 4 ++++ init/main.c | 1 + kernel/debug/debug_core.c | 7 ++++++- 4 files changed, 49 insertions(+), 1 deletion(-) -- 2.46.0 |