Thread: [Kgdb-bugreport] [PATCH] ARM: change definition of cpu_relax() for ARM11MPCore
Status: Beta
Brought to you by:
jwessel
From: Will D. <wil...@ar...> - 2010-03-09 16:06:41
|
The cpu_relax() macro is often used in the body of busy-wait loops to ensure that the variable being spun on is re-loaded for each iteration. On the ARM11MPCore processor [where loads are prioritised over stores], spinning in such a loop will prevent the write buffer from draining. If a write contained in the write buffer indirectly affects the variable being spun on, there is a potential for deadlock. This deadlock is experienced when executing the KGDB testsuite on an SMP ARM11MPCore configuration. This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores, forcing a flushing of the write buffer on SMP systems. If the Kernel is not compiled for SMP support, this will expand to a barrier() as before. Cc: KGDB Mailing List <kgd...@li...> Acked-by: Catalin Marinas <cat...@ar...> Signed-off-by: Will Deacon <wil...@ar...> --- arch/arm/include/asm/processor.h | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h index 6a89567..7bed3da 100644 --- a/arch/arm/include/asm/processor.h +++ b/arch/arm/include/asm/processor.h @@ -91,7 +91,11 @@ extern void release_thread(struct task_struct *); unsigned long get_wchan(struct task_struct *p); +#if __LINUX_ARM_ARCH__ == 6 +#define cpu_relax() smp_mb() +#else #define cpu_relax() barrier() +#endif /* * Create a new kernel thread -- 1.6.3.3 |
From: Will D. <wil...@ar...> - 2010-04-12 17:25:09
|
Linux expects that if a CPU modifies a memory location, then that modification will eventually become visible to other CPUs in the system. On an ARM11MPCore processor, loads are prioritised over stores so it is possible for a store operation to be postponed if a polling loop immediately follows it. If the variable being polled indirectly depends on the outstanding store [for example, another CPU may be polling the variable that is pending modification] then there is the potential for deadlock if interrupts are disabled. This deadlock occurs in the KGDB testsuire when executing on an SMP ARM11MPCore configuration. This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores, forcing the write buffer to drain while inside a polling loop on an SMP system. If the Kernel is not compiled for SMP support, this will expand to a barrier() as before. Cc: KGDB Mailing List <kgd...@li...> Cc: Russell King - ARM Linux <li...@ar...> Cc: Linus Torvalds <tor...@li...> Cc: Jason Wessel <jas...@wi...> Acked-by: Catalin Marinas <cat...@ar...> Signed-off-by: Will Deacon <wil...@ar...> --- arch/arm/include/asm/processor.h | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h index 6a89567..da2078e 100644 --- a/arch/arm/include/asm/processor.h +++ b/arch/arm/include/asm/processor.h @@ -91,7 +91,16 @@ extern void release_thread(struct task_struct *); unsigned long get_wchan(struct task_struct *p); +#if __LINUX_ARM_ARCH__ == 6 +/* + * The store buffer on ARM11MPCore is not guaranteed to drain when the CPU + * is performing aggresive loads. This is usually the case in a polling loop, + * so we add a memory barrier to allow any pending stores to complete. + */ +#define cpu_relax() smp_mb() +#else #define cpu_relax() barrier() +#endif /* * Create a new kernel thread -- 1.6.3.3 |
From: Russell K. - A. L. <li...@ar...> - 2010-04-12 17:33:13
|
On Mon, Apr 12, 2010 at 06:23:58PM +0100, Will Deacon wrote: > This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores, > forcing the write buffer to drain while inside a polling loop on an SMP system. > If the Kernel is not compiled for SMP support, this will expand to a barrier() > as before. Linus asked how expensive (in terms of power rather than performance) this was; so far that question has remained unanswered. Can someone please answer his question? |
From: George G. D. <gd...@mv...> - 2010-04-15 17:42:26
|
Hi, On Mon, Apr 12, 2010 at 06:32:47PM +0100, Russell King - ARM Linux wrote: > On Mon, Apr 12, 2010 at 06:23:58PM +0100, Will Deacon wrote: > > This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores, > > forcing the write buffer to drain while inside a polling loop on an SMP system. > > If the Kernel is not compiled for SMP support, this will expand to a barrier() > > as before. If I've followed these threads [1][2] correctly, this ARM11 MPCore issue was discovered while running the "KGDB: internal test suite" (KGDB_TESTS) and that problem is resolved via "kgdb: use atomic_inc and atomic_dec instead of atomic_set" [3]. If so, isn't the original ARM11 MPCore KGDB cpu_relax() issue just a red herring? Shouldn't any polling loops which depend on specific (hardware) write/read order implement appropriate barriers rather than rely on cpu_relax() to guarantee order? If perhaps there are indeed other cases where cpu_relax() is being used incorrectly, then maybe those should be fixed instead? Just curious... > Linus asked how expensive (in terms of power rather than performance) > this was; so far that question has remained unanswered. Can someone > please answer his question? Thanks! -- Regards, George [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-March/010691.html [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-March/011076.html [3] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=ae6bf53e0255c8ab04b6fe31806e318432570e3c |
From: Will D. <wil...@ar...> - 2010-04-16 13:55:10
|
Hi George, > Hi, > > On Mon, Apr 12, 2010 at 06:32:47PM +0100, Russell King - ARM Linux wrote: > > On Mon, Apr 12, 2010 at 06:23:58PM +0100, Will Deacon wrote: > > > This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores, > > > forcing the write buffer to drain while inside a polling loop on an SMP system. > > > If the Kernel is not compiled for SMP support, this will expand to a barrier() > > > as before. > > If I've followed these threads [1][2] correctly, this ARM11 MPCore issue > was discovered while running the "KGDB: internal test suite" (KGDB_TESTS) > and that problem is resolved via "kgdb: use atomic_inc and atomic_dec > instead of atomic_set" [3]. Yes, I encountered the issue in the KGDB testsuite. However, the patch [3] in mainline doesn't resolve the issue as pointed out here: http://lkml.org/lkml/2010/4/8/214 > If so, isn't the original ARM11 MPCore KGDB > cpu_relax() issue just a red herring? Shouldn't any polling loops > which depend on specific (hardware) write/read order implement appropriate > barriers rather than rely on cpu_relax() to guarantee order? If > perhaps there are indeed other cases where cpu_relax() is being used > incorrectly, then maybe those should be fixed instead? Just curious... As pointed out by Jamie Lokier, this isn't about ordering, it's about ensuring that stores eventually make it to memory. According to Linus: `Linux does expect that if some other CPU modifies a memory location, then we _will_ see that modification eventually. If the CPU needs help to do so, then cpu_relax() needs to do that.' Since the ARM11MPCore doesn't make this guarantee in hardware, then we have to fix it up in software. We can either fix up every polling loop in the Kernel individually or [as suggested above] modify cpu_relax() to do the work for us. Note that we only need to modify cpu_relax() for this particular core; other ARM SMP variants behave more sanely. Thanks, Will |
From: Jamie L. <ja...@sh...> - 2010-04-15 21:03:37
|
George G. Davis wrote: > Hi, > > On Mon, Apr 12, 2010 at 06:32:47PM +0100, Russell King - ARM Linux wrote: > > On Mon, Apr 12, 2010 at 06:23:58PM +0100, Will Deacon wrote: > > > This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores, > > > forcing the write buffer to drain while inside a polling loop on an SMP system. > > > If the Kernel is not compiled for SMP support, this will expand to a barrier() > > > as before. > > If I've followed these threads [1][2] correctly, this ARM11 MPCore issue > was discovered while running the "KGDB: internal test suite" (KGDB_TESTS) > and that problem is resolved via "kgdb: use atomic_inc and atomic_dec > instead of atomic_set" [3]. If so, isn't the original ARM11 MPCore KGDB > cpu_relax() issue just a red herring? Shouldn't any polling loops > which depend on specific (hardware) write/read order implement appropriate > barriers rather than rely on cpu_relax() to guarantee order? Note that the need to force the write buffer to drain is _not_ an ordering issue. It's a buffer draining issue. :-) I'm not sure if Linux smp_wmb() guarantees to ensure prior writes will be visible to other CPUs in a short time, or if it only guarantees write order. -- Jamie |
From: Will D. <wil...@ar...> - 2010-04-19 14:39:43
|
Hi Russell, > On Mon, Apr 12, 2010 at 06:23:58PM +0100, Will Deacon wrote: > > This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores, > > forcing the write buffer to drain while inside a polling loop on an SMP system. > > If the Kernel is not compiled for SMP support, this will expand to a barrier() > > as before. > > Linus asked how expensive (in terms of power rather than performance) > this was; so far that question has remained unanswered. Can someone > please answer his question? I gave my opinion on the potential performance impact here: http://sourceforge.net/mailarchive/message.php?msg_name=000001cada64%24b9f84200%242de8c600%24%40deacon%40arm.com To summarise: in the context of polling loops, I believe that the performance impact of adding an smp_mb() to cpu_relax() for v6 SMP is minimal. If cpu_relax() is used in other scenarios, the performance impact can only be reviewed on a case-by-case basis. Are there any other issues that need to be considered, or can I submit this to the patch system? Thanks, Will |
From: Russell K. - A. L. <li...@ar...> - 2010-03-09 16:41:25
|
On Tue, Mar 09, 2010 at 04:06:08PM +0000, Will Deacon wrote: > The cpu_relax() macro is often used in the body of busy-wait loops to ensure > that the variable being spun on is re-loaded for each iteration. No, cpu_relax() exists to avoid x86 CPUs overheating - if you spin like so: for(;;); the CPU will overheat, so it's conventional to write: for(;;) cpu_relax(); so that architectures can prevent those kinds of problems occuring. cpu_relax() is also defined to be a compiler barrier so that the compiler reloads the variable on every iteration. > This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores, > forcing a flushing of the write buffer on SMP systems. If the Kernel is not > compiled for SMP support, this will expand to a barrier() as before. I don't think this is correct. You're making a macro do something on ARM which no other platform, apart from blackfin (which I believe is wrong) makes it do. |
From: Will D. <wil...@ar...> - 2010-03-09 16:35:24
|
Hi Russell, > On Tue, Mar 09, 2010 at 04:06:08PM +0000, Will Deacon wrote: > > The cpu_relax() macro is often used in the body of busy-wait loops to ensure > > that the variable being spun on is re-loaded for each iteration. > > No, cpu_relax() exists to avoid x86 CPUs overheating - if you spin like > so: > > for(;;); > > the CPU will overheat, so it's conventional to write: > > for(;;) > cpu_relax(); > > so that architectures can prevent those kinds of problems occuring. Ok. I was going by the comments in Documentation/volatile-considered-harmful.txt where cpu_relax() is also used as a memory barrier. > cpu_relax() is also defined to be a compiler barrier so that the compiler > reloads the variable on every iteration. > > > This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores, > > forcing a flushing of the write buffer on SMP systems. If the Kernel is not > > compiled for SMP support, this will expand to a barrier() as before. > > I don't think this is correct. You're making a macro do something on ARM > which no other platform, apart from blackfin (which I believe is wrong) > makes it do. In the KGDB case [where this cropped up], if cpu_relax() is left as it is, then an smp_mb() is required in the architecture independent code. This also seems wrong because it's only needed for the ARM11MPCore. There may also potentially be other situations in the Kernel which are prone to deadlock because it is assumed that the write buffer will always drain. Since both solutions to this problem are nasty, it would be good to get some clarification on the definition of cpu_relax() across the Kernel. If the KGDB guys don't mind adding an extra memory barrier, we can solve the problem that way instead and hope that it doesn't occur elsewhere. Thoughts? Cheers, Will |
From: Russell K. - A. L. <li...@ar...> - 2010-03-09 16:49:54
|
On Tue, Mar 09, 2010 at 04:35:00PM -0000, Will Deacon wrote: > Hi Russell, > > > On Tue, Mar 09, 2010 at 04:06:08PM +0000, Will Deacon wrote: > > > The cpu_relax() macro is often used in the body of busy-wait loops to ensure > > > that the variable being spun on is re-loaded for each iteration. > > > > No, cpu_relax() exists to avoid x86 CPUs overheating - if you spin like > > so: > > > > for(;;); > > > > the CPU will overheat, so it's conventional to write: > > > > for(;;) > > cpu_relax(); > > > > so that architectures can prevent those kinds of problems occuring. > > Ok. I was going by the comments in Documentation/volatile-considered-harmful.txt > where cpu_relax() is also used as a memory barrier. I thought you might; I've just submitted a patch for that to akpm, lkml and linux-arch. > In the KGDB case [where this cropped up], if cpu_relax() is left as it is, then > an smp_mb() is required in the architecture independent code. This also seems wrong > because it's only needed for the ARM11MPCore. There may also potentially be other > situations in the Kernel which are prone to deadlock because it is assumed that the > write buffer will always drain. Why is KGDB being special about this? Ah yes, it's being brain dead: static atomic_t passive_cpu_wait[NR_CPUS]; static atomic_t cpu_in_kgdb[NR_CPUS]; while (atomic_read(&passive_cpu_wait[cpu])) cpu_relax(); for (i = 0; i < NR_CPUS; i++) atomic_set(&passive_cpu_wait[i], 1); for (i = NR_CPUS-1; i >= 0; i--) atomic_set(&passive_cpu_wait[i], 0); atomic_set(&cpu_in_kgdb[cpu], 1); atomic_set(&cpu_in_kgdb[cpu], 0); atomic_set(&cpu_in_kgdb[ks->cpu], 1); for_each_online_cpu(i) { while (!atomic_read(&cpu_in_kgdb[i])) cpu_relax(); } atomic_set(&cpu_in_kgdb[ks->cpu], 0); for_each_online_cpu(i) { while (atomic_read(&cpu_in_kgdb[i])) cpu_relax(); } if (!atomic_read(&cpu_in_kgdb[cpu]) && atomic_read(&kgdb_active) != cpu && atomic_read(&cpu_in_kgdb[atomic_read(&kgdb_active)])) { None of the above is atomic in any way, and I suspect that all these places in kernel/kgdb.c are buggy. As I've said in the past, just because something is called 'atomic' does not make it so: On atomic_set(): The setting is atomic in that the return values of the atomic operations by all threads are guaranteed to be correct reflecting either the value that has been set with this operation or set with another operation. A proper implicit or explicit memory barrier is needed before the value set with the operation is guaranteed to be readable with atomic_read from another thread. On atomic_read(): The read is atomic in that the return value is guaranteed to be one of the values initialized or modified with the interface operations if a proper implicit or explicit memory barrier is used after possible runtime initialization by any other thread and the value is modified only with the interface operations. atomic_read does not guarantee that the runtime initialization by any other thread is visible yet, so the user of the interface must take care of that with a proper implicit or explicit memory barrier. Clearly, kgdb is using atomic_set()/atomic_read() in a way which does not match this documentation - it's certainly missing the barriers as required by the above quoted paragraphs. Let me repeat: atomic_set() and atomic_read() are NOT atomic. There's nothing atomic about them. All they do is provide a pair of accessors to the underlying value in the atomic type. They are no different to declaring a volatile int and reading/writing it directly. |
From: Will D. <wil...@ar...> - 2010-03-09 18:00:01
|
> > Ok. I was going by the comments in Documentation/volatile-considered-harmful.txt > > where cpu_relax() is also used as a memory barrier. > > I thought you might; I've just submitted a patch for that to akpm, lkml > and linux-arch. Cheers, I'll take a read. > > In the KGDB case [where this cropped up], if cpu_relax() is left as it is, then > > an smp_mb() is required in the architecture independent code. This also seems wrong > > because it's only needed for the ARM11MPCore. There may also potentially be other > > situations in the Kernel which are prone to deadlock because it is assumed that the > > write buffer will always drain. > > Why is KGDB being special about this? Ah yes, it's being brain dead: <snip> > Clearly, kgdb is using atomic_set()/atomic_read() in a way which does not > match this documentation - it's certainly missing the barriers as required > by the above quoted paragraphs. > > Let me repeat: atomic_set() and atomic_read() are NOT atomic. There's > nothing atomic about them. All they do is provide a pair of accessors > to the underlying value in the atomic type. They are no different to > declaring a volatile int and reading/writing it directly. Indeed. I'm not familiar enough with KGDB internals to dive in and look at all the potential barrier conflicts, so I'll submit a patch that addresses the one that's bitten me so far. Maybe it will motivate somebody else to take a closer look! Cheers, Will |
From: DDD <don...@wi...> - 2010-03-11 02:40:41
|
Jason Wessel wrote: > Will Deacon wrote: > >>> Clearly, kgdb is using atomic_set()/atomic_read() in a way which does not >>> match this documentation - it's certainly missing the barriers as required >>> by the above quoted paragraphs. >>> >>> Let me repeat: atomic_set() and atomic_read() are NOT atomic. There's >>> nothing atomic about them. All they do is provide a pair of accessors >>> to the underlying value in the atomic type. They are no different to >>> declaring a volatile int and reading/writing it directly. > > > Clearly the docs state this. > >> Indeed. I'm not familiar enough with KGDB internals to dive in and look at all the >> potential barrier conflicts, so I'll submit a patch that addresses the one that's >> bitten me so far. Maybe it will motivate somebody else to take a closer look! >> > > > Do you think you can try the patch below? > > It seems we might not need to change to using the atomic_add_return(0,...) because using the atomic_inc() and atomic_dec() will end up using the memory barriers. > Hi Jason, Maybe we should initial the atomic_t variable before we using such as atomic_inc/dec() directly. Dongdong. --- a/kernel/kgdb.c +++ b/kernel/kgdb.c @@ -227,6 +227,17 @@ kgdb_post_primary_code(struct pt_regs *regs, int e_vector, int err_code) return; } +static void kgdb_initial_atomic_var() +{ + int i; + for (i = NR_CPUS-1; i >= 0; i--) { + atomic_set(&passive_cpu_wait[i], 0); + atomic_set(&cpu_in_kgdb[i], 0); + } + + atomic_set(&kgdb_setting_breakpoint, 0); +} + /** * kgdb_disable_hw_debug - Disable hardware debugging while we in kgdb. * @regs: Current &struct pt_regs. @@ -1619,6 +1630,7 @@ static void kgdb_register_callbacks(void) { if (!kgdb_io_module_registered) { kgdb_io_module_registered = 1; + kgdb_initial_atomic_var(); kgdb_arch_init(); #ifdef CONFIG_MAGIC_SYSRQ register_sysrq_key('g', &sysrq_gdb_op); > I would certainly rather fix kgdb vs mucking with the internals of cpu_relax(). > > > Jason. > > --- a/kernel/kgdb.c > +++ b/kernel/kgdb.c > @@ -580,14 +580,13 @@ static void kgdb_wait(struct pt_regs *re > * Make sure the above info reaches the primary CPU before > * our cpu_in_kgdb[] flag setting does: > */ > - smp_wmb(); > - atomic_set(&cpu_in_kgdb[cpu], 1); > + atomic_inc(&cpu_in_kgdb[cpu]); > > /* Disable any cpu specific hw breakpoints */ > kgdb_disable_hw_debug(regs); > > /* Wait till primary CPU is done with debugging */ > - while (atomic_read(&passive_cpu_wait[cpu])) > + while (atomic_add_return(0, &passive_cpu_wait[cpu])) > cpu_relax(); > > kgdb_info[cpu].debuggerinfo = NULL; > @@ -598,7 +597,7 @@ static void kgdb_wait(struct pt_regs *re > arch_kgdb_ops.correct_hw_break(); > > /* Signal the primary CPU that we are done: */ > - atomic_set(&cpu_in_kgdb[cpu], 0); > + atomic_dec(&cpu_in_kgdb[cpu]); > touch_softlockup_watchdog_sync(); > clocksource_touch_watchdog(); > local_irq_restore(flags); > @@ -1493,7 +1492,7 @@ acquirelock: > * spin_lock code is good enough as a barrier so we don't > * need one here: > */ > - atomic_set(&cpu_in_kgdb[ks->cpu], 1); > + atomic_inc(&cpu_in_kgdb[ks->cpu]); > > #ifdef CONFIG_SMP > /* Signal the other CPUs to enter kgdb_wait() */ > @@ -1505,7 +1504,7 @@ acquirelock: > * Wait for the other CPUs to be notified and be waiting for us: > */ > for_each_online_cpu(i) { > - while (!atomic_read(&cpu_in_kgdb[i])) > + while (!atomic_add_return(0, &cpu_in_kgdb[i])) > cpu_relax(); > } > > @@ -1528,7 +1527,7 @@ acquirelock: > > kgdb_info[ks->cpu].debuggerinfo = NULL; > kgdb_info[ks->cpu].task = NULL; > - atomic_set(&cpu_in_kgdb[ks->cpu], 0); > + atomic_dec(&cpu_in_kgdb[ks->cpu]); > > if (!kgdb_single_step) { > for (i = NR_CPUS-1; i >= 0; i--) > @@ -1538,7 +1537,7 @@ acquirelock: > * from the debugger. > */ > for_each_online_cpu(i) { > - while (atomic_read(&cpu_in_kgdb[i])) > + while (atomic_add_return(0, &cpu_in_kgdb[i])) > cpu_relax(); > } > } > @@ -1742,11 +1741,11 @@ EXPORT_SYMBOL_GPL(kgdb_unregister_io_mod > */ > void kgdb_breakpoint(void) > { > - atomic_set(&kgdb_setting_breakpoint, 1); > + atomic_inc(&kgdb_setting_breakpoint); > wmb(); /* Sync point before breakpoint */ > arch_kgdb_breakpoint(); > wmb(); /* Sync point after breakpoint */ > - atomic_set(&kgdb_setting_breakpoint, 0); > + atomic_dec(&kgdb_setting_breakpoint); > } > EXPORT_SYMBOL_GPL(kgdb_breakpoint); > > > ------------------------------------------------------------------------------ > Download Intel® Parallel Studio Eval > Try the new software tools for yourself. Speed compiling, find bugs > proactively, and fine-tune applications for parallel performance. > See why Intel Parallel Studio got high marks during beta. > http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > Kgdb-bugreport mailing list > Kgd...@li... > https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport > |
From: Will D. <wil...@ar...> - 2010-03-11 13:53:22
|
> Maybe we should initial the atomic_t variable before we using such as > atomic_inc/dec() directly. > > Dongdong. > > > --- a/kernel/kgdb.c > +++ b/kernel/kgdb.c > @@ -227,6 +227,17 @@ kgdb_post_primary_code(struct pt_regs *regs, int > e_vector, int err_code) > return; > } > > +static void kgdb_initial_atomic_var() > +{ > + int i; > + for (i = NR_CPUS-1; i >= 0; i--) { > + atomic_set(&passive_cpu_wait[i], 0); > + atomic_set(&cpu_in_kgdb[i], 0); > + } > + > + atomic_set(&kgdb_setting_breakpoint, 0); > +} > + Given that passive_cpu_wait and cpu_in_kgdb are static, I think those guys are alright as-is. kgdb_setting_breakpoint should probably be reset though. Will |
From: Jason W. <jas...@wi...> - 2010-03-10 22:06:56
|
Will Deacon wrote: >> Clearly, kgdb is using atomic_set()/atomic_read() in a way which does not >> match this documentation - it's certainly missing the barriers as required >> by the above quoted paragraphs. >> >> Let me repeat: atomic_set() and atomic_read() are NOT atomic. There's >> nothing atomic about them. All they do is provide a pair of accessors >> to the underlying value in the atomic type. They are no different to >> declaring a volatile int and reading/writing it directly. Clearly the docs state this. > > Indeed. I'm not familiar enough with KGDB internals to dive in and look at all the > potential barrier conflicts, so I'll submit a patch that addresses the one that's > bitten me so far. Maybe it will motivate somebody else to take a closer look! > Do you think you can try the patch below? It seems we might not need to change to using the atomic_add_return(0,...) because using the atomic_inc() and atomic_dec() will end up using the memory barriers. I would certainly rather fix kgdb vs mucking with the internals of cpu_relax(). Jason. --- a/kernel/kgdb.c +++ b/kernel/kgdb.c @@ -580,14 +580,13 @@ static void kgdb_wait(struct pt_regs *re * Make sure the above info reaches the primary CPU before * our cpu_in_kgdb[] flag setting does: */ - smp_wmb(); - atomic_set(&cpu_in_kgdb[cpu], 1); + atomic_inc(&cpu_in_kgdb[cpu]); /* Disable any cpu specific hw breakpoints */ kgdb_disable_hw_debug(regs); /* Wait till primary CPU is done with debugging */ - while (atomic_read(&passive_cpu_wait[cpu])) + while (atomic_add_return(0, &passive_cpu_wait[cpu])) cpu_relax(); kgdb_info[cpu].debuggerinfo = NULL; @@ -598,7 +597,7 @@ static void kgdb_wait(struct pt_regs *re arch_kgdb_ops.correct_hw_break(); /* Signal the primary CPU that we are done: */ - atomic_set(&cpu_in_kgdb[cpu], 0); + atomic_dec(&cpu_in_kgdb[cpu]); touch_softlockup_watchdog_sync(); clocksource_touch_watchdog(); local_irq_restore(flags); @@ -1493,7 +1492,7 @@ acquirelock: * spin_lock code is good enough as a barrier so we don't * need one here: */ - atomic_set(&cpu_in_kgdb[ks->cpu], 1); + atomic_inc(&cpu_in_kgdb[ks->cpu]); #ifdef CONFIG_SMP /* Signal the other CPUs to enter kgdb_wait() */ @@ -1505,7 +1504,7 @@ acquirelock: * Wait for the other CPUs to be notified and be waiting for us: */ for_each_online_cpu(i) { - while (!atomic_read(&cpu_in_kgdb[i])) + while (!atomic_add_return(0, &cpu_in_kgdb[i])) cpu_relax(); } @@ -1528,7 +1527,7 @@ acquirelock: kgdb_info[ks->cpu].debuggerinfo = NULL; kgdb_info[ks->cpu].task = NULL; - atomic_set(&cpu_in_kgdb[ks->cpu], 0); + atomic_dec(&cpu_in_kgdb[ks->cpu]); if (!kgdb_single_step) { for (i = NR_CPUS-1; i >= 0; i--) @@ -1538,7 +1537,7 @@ acquirelock: * from the debugger. */ for_each_online_cpu(i) { - while (atomic_read(&cpu_in_kgdb[i])) + while (atomic_add_return(0, &cpu_in_kgdb[i])) cpu_relax(); } } @@ -1742,11 +1741,11 @@ EXPORT_SYMBOL_GPL(kgdb_unregister_io_mod */ void kgdb_breakpoint(void) { - atomic_set(&kgdb_setting_breakpoint, 1); + atomic_inc(&kgdb_setting_breakpoint); wmb(); /* Sync point before breakpoint */ arch_kgdb_breakpoint(); wmb(); /* Sync point after breakpoint */ - atomic_set(&kgdb_setting_breakpoint, 0); + atomic_dec(&kgdb_setting_breakpoint); } EXPORT_SYMBOL_GPL(kgdb_breakpoint); |
From: Will D. <wil...@ar...> - 2010-03-11 13:29:55
|
Hello Jason, > Do you think you can try the patch below? Sure - I'll give it a go this afternoon. > It seems we might not need to change to using the atomic_add_return(0,...) because using the > atomic_inc() and atomic_dec() will end up using the memory barriers. The fix I posted to the LKML here: http://lkml.org/lkml/2010/3/9/260 fixed the issue for me, but your patch seems more general [although maybe overkill?]. > I would certainly rather fix kgdb vs mucking with the internals of cpu_relax(). I think that's the general consensus. I was worried about changing the general KGDB code to fix an ARM specific problem, but it turns out that the problem is an issue of semantics. I'll let you know whether your patch fixes the problems for me. Will |
From: Will D. <wil...@ar...> - 2010-03-11 14:52:35
|
> Do you think you can try the patch below? <snip> > --- a/kernel/kgdb.c > +++ b/kernel/kgdb.c > @@ -580,14 +580,13 @@ static void kgdb_wait(struct pt_regs *re > * Make sure the above info reaches the primary CPU before > * our cpu_in_kgdb[] flag setting does: > */ > - smp_wmb(); > - atomic_set(&cpu_in_kgdb[cpu], 1); > + atomic_inc(&cpu_in_kgdb[cpu]); > > /* Disable any cpu specific hw breakpoints */ > kgdb_disable_hw_debug(regs); > > /* Wait till primary CPU is done with debugging */ > - while (atomic_read(&passive_cpu_wait[cpu])) > + while (atomic_add_return(0, &passive_cpu_wait[cpu])) > cpu_relax(); > > kgdb_info[cpu].debuggerinfo = NULL; > @@ -598,7 +597,7 @@ static void kgdb_wait(struct pt_regs *re > arch_kgdb_ops.correct_hw_break(); > > /* Signal the primary CPU that we are done: */ > - atomic_set(&cpu_in_kgdb[cpu], 0); > + atomic_dec(&cpu_in_kgdb[cpu]); > touch_softlockup_watchdog_sync(); > clocksource_touch_watchdog(); > local_irq_restore(flags); > @@ -1493,7 +1492,7 @@ acquirelock: > * spin_lock code is good enough as a barrier so we don't > * need one here: > */ > - atomic_set(&cpu_in_kgdb[ks->cpu], 1); > + atomic_inc(&cpu_in_kgdb[ks->cpu]); > > #ifdef CONFIG_SMP > /* Signal the other CPUs to enter kgdb_wait() */ > @@ -1505,7 +1504,7 @@ acquirelock: > * Wait for the other CPUs to be notified and be waiting for us: > */ > for_each_online_cpu(i) { > - while (!atomic_read(&cpu_in_kgdb[i])) > + while (!atomic_add_return(0, &cpu_in_kgdb[i])) > cpu_relax(); > } > > @@ -1528,7 +1527,7 @@ acquirelock: > > kgdb_info[ks->cpu].debuggerinfo = NULL; > kgdb_info[ks->cpu].task = NULL; > - atomic_set(&cpu_in_kgdb[ks->cpu], 0); > + atomic_dec(&cpu_in_kgdb[ks->cpu]); > > if (!kgdb_single_step) { > for (i = NR_CPUS-1; i >= 0; i--) > @@ -1538,7 +1537,7 @@ acquirelock: > * from the debugger. > */ > for_each_online_cpu(i) { > - while (atomic_read(&cpu_in_kgdb[i])) > + while (atomic_add_return(0, &cpu_in_kgdb[i])) > cpu_relax(); > } > } > @@ -1742,11 +1741,11 @@ EXPORT_SYMBOL_GPL(kgdb_unregister_io_mod > */ > void kgdb_breakpoint(void) > { > - atomic_set(&kgdb_setting_breakpoint, 1); > + atomic_inc(&kgdb_setting_breakpoint); > wmb(); /* Sync point before breakpoint */ > arch_kgdb_breakpoint(); > wmb(); /* Sync point after breakpoint */ > - atomic_set(&kgdb_setting_breakpoint, 0); > + atomic_dec(&kgdb_setting_breakpoint); > } > EXPORT_SYMBOL_GPL(kgdb_breakpoint); Appears to work, but it's hard to be 100% with these sorts of things. I ran the basic testsuite, the fork tests and the breakpoint tests as mentioned in drivers/misc/kgdbts.c. The patch might introduce more barriers than we really need, but this isn't performance-critical code and without it, KGDB is broken, so: Tested-by: Will Deacon <wil...@ar...> Cheers, Will |
From: Will D. <wil...@ar...> - 2010-03-16 17:27:33
|
Hi Jason, > Do you think you can try the patch below? > > It seems we might not need to change to using the atomic_add_return(0,...) because using the > atomic_inc() and atomic_dec() will end up using the memory barriers. > > I would certainly rather fix kgdb vs mucking with the internals of cpu_relax(). > > > Jason. > > --- a/kernel/kgdb.c > +++ b/kernel/kgdb.c > @@ -580,14 +580,13 @@ static void kgdb_wait(struct pt_regs *re > * Make sure the above info reaches the primary CPU before > * our cpu_in_kgdb[] flag setting does: > */ > - smp_wmb(); > - atomic_set(&cpu_in_kgdb[cpu], 1); > + atomic_inc(&cpu_in_kgdb[cpu]); > > /* Disable any cpu specific hw breakpoints */ > kgdb_disable_hw_debug(regs); > > /* Wait till primary CPU is done with debugging */ > - while (atomic_read(&passive_cpu_wait[cpu])) > + while (atomic_add_return(0, &passive_cpu_wait[cpu])) > cpu_relax(); > > kgdb_info[cpu].debuggerinfo = NULL; > @@ -598,7 +597,7 @@ static void kgdb_wait(struct pt_regs *re > arch_kgdb_ops.correct_hw_break(); > > /* Signal the primary CPU that we are done: */ > - atomic_set(&cpu_in_kgdb[cpu], 0); > + atomic_dec(&cpu_in_kgdb[cpu]); > touch_softlockup_watchdog_sync(); > clocksource_touch_watchdog(); > local_irq_restore(flags); > @@ -1493,7 +1492,7 @@ acquirelock: > * spin_lock code is good enough as a barrier so we don't > * need one here: > */ > - atomic_set(&cpu_in_kgdb[ks->cpu], 1); > + atomic_inc(&cpu_in_kgdb[ks->cpu]); > > #ifdef CONFIG_SMP > /* Signal the other CPUs to enter kgdb_wait() */ > @@ -1505,7 +1504,7 @@ acquirelock: > * Wait for the other CPUs to be notified and be waiting for us: > */ > for_each_online_cpu(i) { > - while (!atomic_read(&cpu_in_kgdb[i])) > + while (!atomic_add_return(0, &cpu_in_kgdb[i])) > cpu_relax(); > } > > @@ -1528,7 +1527,7 @@ acquirelock: > > kgdb_info[ks->cpu].debuggerinfo = NULL; > kgdb_info[ks->cpu].task = NULL; > - atomic_set(&cpu_in_kgdb[ks->cpu], 0); > + atomic_dec(&cpu_in_kgdb[ks->cpu]); > > if (!kgdb_single_step) { > for (i = NR_CPUS-1; i >= 0; i--) > @@ -1538,7 +1537,7 @@ acquirelock: > * from the debugger. > */ > for_each_online_cpu(i) { > - while (atomic_read(&cpu_in_kgdb[i])) > + while (atomic_add_return(0, &cpu_in_kgdb[i])) > cpu_relax(); > } > } > @@ -1742,11 +1741,11 @@ EXPORT_SYMBOL_GPL(kgdb_unregister_io_mod > */ > void kgdb_breakpoint(void) > { > - atomic_set(&kgdb_setting_breakpoint, 1); > + atomic_inc(&kgdb_setting_breakpoint); > wmb(); /* Sync point before breakpoint */ > arch_kgdb_breakpoint(); > wmb(); /* Sync point after breakpoint */ > - atomic_set(&kgdb_setting_breakpoint, 0); > + atomic_dec(&kgdb_setting_breakpoint); > } > EXPORT_SYMBOL_GPL(kgdb_breakpoint); > What's the status on this patch? Russell has applied the ARM SMP code to his tree now, so it would be good to get this patch into KGDB to avoid the deadlock on ARM11MPCore. I only ask because I can't see it in -next [although I see kgdb.c has moved!]. Cheers, Will |
From: Jason W. <jas...@wi...> - 2010-03-16 18:52:42
|
On 03/16/2010 12:26 PM, Will Deacon wrote: > Hi Jason, > > > What's the status on this patch? Russell has applied the ARM SMP code to his > tree now, so it would be good to get this patch into KGDB to avoid the deadlock > on ARM11MPCore. > > Linux-next presently reflects the state of the outstanding pull request I have to Linus. I have yet to hear back any word as to what happened with my pull request. > I only ask because I can't see it in -next [although I see kgdb.c has moved!]. > It lives in kernel/debug/debug_core.c, assuming the code in linux-next gets pulled into Linus's tree. I already have an updated version of the patch for the new structure, but not pushed to linux-next while my pull request is in limbo. Jason. |