From: Fengguang Wu <fen...@in...> - 2013-10-10 07:03:42
|
On Thu, Oct 10, 2013 at 08:52:33AM +0200, Geert Uytterhoeven wrote: > On Thu, Oct 10, 2013 at 4:46 AM, Fengguang Wu <fen...@in...> wrote: > > On Wed, Oct 09, 2013 at 11:47:33PM +0200, Jan Kara wrote: > >> On Wed 09-10-13 20:43:50, Richard Weinberger wrote: > >> > Am 09.10.2013 19:26, schrieb Toralf Förster: > >> > > On 10/08/2013 10:07 PM, Geert Uytterhoeven wrote: > >> > >> On Sun, Oct 6, 2013 at 11:01 PM, Toralf Förster <tor...@gm...> wrote: > >> > >>>> Hmm, now pages_dirtied is zero, according to the backtrace, but the BUG_ON() > >> > >>>> asserts its strict positive?!? > >> > >>>> > >> > >>>> Can you please try the following instead of the BUG_ON(): > >> > >>>> > >> > >>>> if (pause < 0) { > >> > >>>> printk("pages_dirtied = %lu\n", pages_dirtied); > >> > >>>> printk("task_ratelimit = %lu\n", task_ratelimit); > >> > >>>> printk("pause = %ld\n", pause); > > >> > >>> I tried it in different ways already - I'm completely unsuccessful in getting any printk output. > >> > >>> As soon as the issue happens I do have a > >> > >>> > >> > >>> BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521] > >> > >>> > >> > >>> at stderr of the UML and then no further input is accepted. With uml_mconsole I'm however able > >> > >>> to run very basic commands like a crash dump, sysrq ond so on. > >> > >> > >> > >> You may get an idea of the magnitude of pages_dirtied by using a chain of > >> > >> BUG_ON()s, like: > >> > >> > >> > >> BUG_ON(pages_dirtied > 2000000000); > >> > >> BUG_ON(pages_dirtied > 1000000000); > >> > >> BUG_ON(pages_dirtied > 100000000); > >> > >> BUG_ON(pages_dirtied > 10000000); > >> > >> BUG_ON(pages_dirtied > 1000000); > >> > >> > >> > >> Probably 1 million is already too much for normal operation? > >> > >> > >> > > period = HZ * pages_dirtied / task_ratelimit; > >> > > BUG_ON(pages_dirtied > 2000000000); > >> > > BUG_ON(pages_dirtied > 1000000000); <-------------- this is line 1467 > >> > > >> > Summary for mm people: > >> > > >> > Toralf runs trinty on UML/i386. > >> > After some time pages_dirtied becomes very large. > >> > More than 1000000000 pages in this case. > >> Huh, this is really strange. pages_dirtied is passed into > >> balance_dirty_pages() from current->nr_dirtied. So I wonder how a value > >> over 10^9 can get there. > > > > I noticed aio_setup_ring() in the call trace and find it recently > > added a SetPageDirty() call in a loop by commit 36bc08cc01 ("fs/aio: > > Add support to aio ring pages migration"). So added CC to its authors. > > > >> After all that is over 4TB so I somewhat doubt the > >> task was ever able to dirty that much during its lifetime (but correct me > >> if I'm wrong here, with UML and memory backed disks it is not totally > >> impossible)... I went through the logic of handling ->nr_dirtied but > >> I didn't find any obvious problem there. Hum, maybe one thing - what > >> 'task_ratelimit' values do you see in balance_dirty_pages? If that one was > >> huge, we could possibly accumulate huge current->nr_dirtied. > >> > >> > Thus, period = HZ * pages_dirtied / task_ratelimit overflows > >> > and period/pause becomes extremely large. > > period/pause are signed long, so they become negative instead of > extremely large when overflowing. Yeah. For that we have underflow detect as well: if (pause < min_pause) { ... break; } So we'll break out of the loop -- but yeah, whether the break is the right behavior on underflow is still questionable. > >> > It looks like io_schedule_timeout() get's called with a very large timeout. > >> > I don't know why "if (unlikely(pause > max_pause)) {" does not help. > > Because pause is now negative. So here io_schedule_timeout() won't be called with negative pause. And if ever io_schedule_timeout() calls schedule_timeout() with negative timeout, the latter will emit a warning and break out, too: if (timeout < 0) { printk(KERN_ERR "schedule_timeout: wrong timeout " "value %lx\n", timeout); dump_stack(); current->state = TASK_RUNNING; goto out; } Thanks, Fengguang |
From: Toralf F. <tor...@gm...> - 2013-10-10 16:49:42
|
On 10/10/2013 12:33 AM, Richard Weinberger wrote: > Am 09.10.2013 23:47, schrieb Jan Kara: >> On Wed 09-10-13 20:43:50, Richard Weinberger wrote: >>> CC'ing mm folks. >>> Please see below. >> Added Fenguang to CC since he is the author of this code. > > Thx, get_maintainer.pl didn't list him. > >>> Am 09.10.2013 19:26, schrieb Toralf Förster: >>>> On 10/08/2013 10:07 PM, Geert Uytterhoeven wrote: >>>>> On Sun, Oct 6, 2013 at 11:01 PM, Toralf Förster <tor...@gm...> wrote: >>>>>>> Hmm, now pages_dirtied is zero, according to the backtrace, but the BUG_ON() >>>>>>> asserts its strict positive?!? >>>>>>> >>>>>>> Can you please try the following instead of the BUG_ON(): >>>>>>> >>>>>>> if (pause < 0) { >>>>>>> printk("pages_dirtied = %lu\n", pages_dirtied); >>>>>>> printk("task_ratelimit = %lu\n", task_ratelimit); >>>>>>> printk("pause = %ld\n", pause); >>>>>>> } >>>>>>> >>>>>>> Gr{oetje,eeting}s, >>>>>>> >>>>>>> Geert >>>>>> I tried it in different ways already - I'm completely unsuccessful in getting any printk output. >>>>>> As soon as the issue happens I do have a >>>>>> >>>>>> BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521] >>>>>> >>>>>> at stderr of the UML and then no further input is accepted. With uml_mconsole I'm however able >>>>>> to run very basic commands like a crash dump, sysrq ond so on. >>>>> >>>>> You may get an idea of the magnitude of pages_dirtied by using a chain of >>>>> BUG_ON()s, like: >>>>> >>>>> BUG_ON(pages_dirtied > 2000000000); >>>>> BUG_ON(pages_dirtied > 1000000000); >>>>> BUG_ON(pages_dirtied > 100000000); >>>>> BUG_ON(pages_dirtied > 10000000); >>>>> BUG_ON(pages_dirtied > 1000000); >>>>> >>>>> Probably 1 million is already too much for normal operation? >>>>> >>>> period = HZ * pages_dirtied / task_ratelimit; >>>> BUG_ON(pages_dirtied > 2000000000); >>>> BUG_ON(pages_dirtied > 1000000000); <-------------- this is line 1467 >>> >>> Summary for mm people: >>> >>> Toralf runs trinty on UML/i386. >>> After some time pages_dirtied becomes very large. >>> More than 1000000000 pages in this case. >> Huh, this is really strange. pages_dirtied is passed into >> balance_dirty_pages() from current->nr_dirtied. So I wonder how a value >> over 10^9 can get there. After all that is over 4TB so I somewhat doubt the >> task was ever able to dirty that much during its lifetime (but correct me >> if I'm wrong here, with UML and memory backed disks it is not totally >> impossible)... I went through the logic of handling ->nr_dirtied but >> I didn't find any obvious problem there. Hum, maybe one thing - what >> 'task_ratelimit' values do you see in balance_dirty_pages? If that one was >> huge, we could possibly accumulate huge current->nr_dirtied. > > Toralf, you can try a snipplet like this one to get the values printed out: > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index f5236f8..a80e520 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1463,6 +1463,12 @@ static void balance_dirty_pages(struct address_space *mapping, > goto pause; > } > period = HZ * pages_dirtied / task_ratelimit; > + > + { > + extern int printf(char *, ...); > + printf("---> task_ratelimit: %lu\n", task_ratelimit); > + } > + > pause = period; > if (current->dirty_paused_when) > pause -= now - current->dirty_paused_when; > > > Yes, printf(), not printk(). > Using this hack we print directly to host's stdout. :) > *head smack* ofc - works fine. So given this diff : iff --git a/mm/page-writeback.c b/mm/page-writeback.c index f5236f8..5a2c337 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1464,6 +1464,13 @@ static void balance_dirty_pages(struct address_space *mapping, } period = HZ * pages_dirtied / task_ratelimit; pause = period; + if (pause < 0) { + extern int printf(char *, ...); + printf("overflow : pause : %li\n", pause); + printf("overflow : pages_dirtied : %lu\n", pages_dirtied); + printf("overflow : task_ratelimit: %lu\n", task_ratelimit); + BUG_ON(1); + } if (current->dirty_paused_when) pause -= now - current->dirty_paused_when; /* @@ -1503,6 +1510,13 @@ static void balance_dirty_pages(struct address_space *mapping, } pause: + if (pause < 0) { + extern int printf(char *, ...); + printf("ick : pause : %li\n", pause); + printf("ick: pages_dirtied : %lu\n", pages_dirtied); + printf("ick: task_ratelimit: %lu\n", task_ratelimit); + BUG_ON(1); + } trace_balance_dirty_pages(bdi, dirty_thresh, background_thresh, I got this : * Starting local net.core.warnings = 0 [ ok ] ick : pause : -984 ick: pages_dirtied : 0 ick: task_ratelimit: 0 Kernel panic - not syncing: BUG! CPU: 0 PID: 1434 Comm: trinity-child2 Not tainted 3.12.0-rc4-00029-g0e7a3ed-dirty #12 47397c84 47397cb0 0841b5a0 084c30e8 085f76e0 084b4745 47397cbc 00000000 fffffc28 01fff278 085cb4a0 47397d2c 0841c5a0 084b4745 084c5398 000005ee 08432cf0 43e47600 471757f8 47397cf0 ffffff0c 47397cdc 5256d8b9 3417ec18 47397c5c: [<08060b2c>] show_stack+0x7c/0xd0 47397c7c: [<0841e34d>] dump_stack+0x26/0x28 47397c8c: [<0841b5a0>] panic+0x7a/0x180 47397cb4: [<0841c5a0>] balance_dirty_pages.isra.32+0x4e3/0x5ad 47397d30: [<080d3595>] balance_dirty_pages_ratelimited+0xf5/0x100 47397d44: [<080e4a3f>] __do_fault+0x3cf/0x440 47397d9c: [<080e6e0f>] handle_mm_fault+0xef/0x7c0 47397dec: [<080e7817>] __get_user_pages+0x227/0x420 47397e24: [<080e7ae3>] get_user_pages+0x63/0x70 47397e4c: [<08143dc6>] SyS_io_setup+0x3c6/0x760 47397eb0: [<08062984>] handle_syscall+0x64/0x80 47397ef0: [<08074fb5>] userspace+0x475/0x5f0 47397fec: [<0805f750>] fork_handler+0x60/0x70 47397ffc: [<00000000>] 0x0 EIP: 0073:[<40001282>] CPU: 0 Not tainted ESP: 007b:bfb348f8 EFLAGS: 00000246 Not tainted EAX: ffffffda EBX: 00001000 ECX: 080d0000 EDX: 80000048 ESI: 80fbff1f EDI: ffe02f77 EBP: 90f6e2a3 DS: 007b ES: 007b 47397c0c: [<0807947f>] show_regs+0x10f/0x120 47397c28: [<080623a9>] panic_exit+0x29/0x50 47397c38: [<0809ba86>] notifier_call_chain+0x36/0x60 47397c60: [<0809bba1>] __atomic_notifier_call_chain+0x21/0x30 47397c70: [<0809bbdf>] atomic_notifier_call_chain+0x2f/0x40 47397c8c: [<0841b5bc>] panic+0x96/0x180 47397cb4: [<0841c5a0>] balance_dirty_pages.isra.32+0x4e3/0x5ad 47397d30: [<080d3595>] balance_dirty_pages_ratelimited+0xf5/0x100 47397d44: [<080e4a3f>] __do_fault+0x3cf/0x440 47397d9c: [<080e6e0f>] handle_mm_fault+0xef/0x7c0 47397dec: [<080e7817>] __get_user_pages+0x227/0x420 47397e24: [<080e7ae3>] get_user_pages+0x63/0x70 47397e4c: [<08143dc6>] SyS_io_setup+0x3c6/0x760 47397eb0: [<08062984>] handle_syscall+0x64/0x80 47397ef0: [<08074fb5>] userspace+0x475/0x5f0 47397fec: [<0805f750>] fork_handler+0x60/0x70 47397ffc: [<00000000>] 0x0 /home/tfoerste/workspace/bin/start_uml.sh: line 115: 18718 Aborted (core dumped) $LINUX earlyprintk ubda=$ROOTFS ubdb=$SWAP eth0=$NET mem=$MEM $TTY umid=uml_$NAME rootfstype=ext4 "$ARGS" >From what I see there are 2 different types of issues - and this is an example of the other of both > Thanks, > //richard > -- MfG/Sincerely Toralf Förster pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3 |
From: Fengguang Wu <fen...@in...> - 2013-10-11 01:17:02
|
On Thu, Oct 10, 2013 at 06:49:30PM +0200, Toralf Förster wrote: > On 10/10/2013 12:33 AM, Richard Weinberger wrote: > > Am 09.10.2013 23:47, schrieb Jan Kara: > >> On Wed 09-10-13 20:43:50, Richard Weinberger wrote: > >>> CC'ing mm folks. > >>> Please see below. > >> Added Fenguang to CC since he is the author of this code. > > > > Thx, get_maintainer.pl didn't list him. > > > >>> Am 09.10.2013 19:26, schrieb Toralf Förster: > >>>> On 10/08/2013 10:07 PM, Geert Uytterhoeven wrote: > >>>>> On Sun, Oct 6, 2013 at 11:01 PM, Toralf Förster <tor...@gm...> wrote: > >>>>>>> Hmm, now pages_dirtied is zero, according to the backtrace, but the BUG_ON() > >>>>>>> asserts its strict positive?!? > >>>>>>> > >>>>>>> Can you please try the following instead of the BUG_ON(): > >>>>>>> > >>>>>>> if (pause < 0) { > >>>>>>> printk("pages_dirtied = %lu\n", pages_dirtied); > >>>>>>> printk("task_ratelimit = %lu\n", task_ratelimit); > >>>>>>> printk("pause = %ld\n", pause); > >>>>>>> } > >>>>>>> > >>>>>>> Gr{oetje,eeting}s, > >>>>>>> > >>>>>>> Geert > >>>>>> I tried it in different ways already - I'm completely unsuccessful in getting any printk output. > >>>>>> As soon as the issue happens I do have a > >>>>>> > >>>>>> BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521] > >>>>>> > >>>>>> at stderr of the UML and then no further input is accepted. With uml_mconsole I'm however able > >>>>>> to run very basic commands like a crash dump, sysrq ond so on. > >>>>> > >>>>> You may get an idea of the magnitude of pages_dirtied by using a chain of > >>>>> BUG_ON()s, like: > >>>>> > >>>>> BUG_ON(pages_dirtied > 2000000000); > >>>>> BUG_ON(pages_dirtied > 1000000000); > >>>>> BUG_ON(pages_dirtied > 100000000); > >>>>> BUG_ON(pages_dirtied > 10000000); > >>>>> BUG_ON(pages_dirtied > 1000000); > >>>>> > >>>>> Probably 1 million is already too much for normal operation? > >>>>> > >>>> period = HZ * pages_dirtied / task_ratelimit; > >>>> BUG_ON(pages_dirtied > 2000000000); > >>>> BUG_ON(pages_dirtied > 1000000000); <-------------- this is line 1467 > >>> > >>> Summary for mm people: > >>> > >>> Toralf runs trinty on UML/i386. > >>> After some time pages_dirtied becomes very large. > >>> More than 1000000000 pages in this case. > >> Huh, this is really strange. pages_dirtied is passed into > >> balance_dirty_pages() from current->nr_dirtied. So I wonder how a value > >> over 10^9 can get there. After all that is over 4TB so I somewhat doubt the > >> task was ever able to dirty that much during its lifetime (but correct me > >> if I'm wrong here, with UML and memory backed disks it is not totally > >> impossible)... I went through the logic of handling ->nr_dirtied but > >> I didn't find any obvious problem there. Hum, maybe one thing - what > >> 'task_ratelimit' values do you see in balance_dirty_pages? If that one was > >> huge, we could possibly accumulate huge current->nr_dirtied. > > > > Toralf, you can try a snipplet like this one to get the values printed out: > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > > index f5236f8..a80e520 100644 > > --- a/mm/page-writeback.c > > +++ b/mm/page-writeback.c > > @@ -1463,6 +1463,12 @@ static void balance_dirty_pages(struct address_space *mapping, > > goto pause; > > } > > period = HZ * pages_dirtied / task_ratelimit; > > + > > + { > > + extern int printf(char *, ...); > > + printf("---> task_ratelimit: %lu\n", task_ratelimit); > > + } > > + > > pause = period; > > if (current->dirty_paused_when) > > pause -= now - current->dirty_paused_when; > > > > > > Yes, printf(), not printk(). > > Using this hack we print directly to host's stdout. :) > > > *head smack* ofc - works fine. > So given this diff : > > iff --git a/mm/page-writeback.c b/mm/page-writeback.c > index f5236f8..5a2c337 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1464,6 +1464,13 @@ static void balance_dirty_pages(struct address_space *mapping, > } > period = HZ * pages_dirtied / task_ratelimit; > pause = period; > + if (pause < 0) { > + extern int printf(char *, ...); > + printf("overflow : pause : %li\n", pause); > + printf("overflow : pages_dirtied : %lu\n", pages_dirtied); > + printf("overflow : task_ratelimit: %lu\n", task_ratelimit); > + BUG_ON(1); > + } > if (current->dirty_paused_when) > pause -= now - current->dirty_paused_when; > /* > @@ -1503,6 +1510,13 @@ static void balance_dirty_pages(struct address_space *mapping, > } > > pause: > + if (pause < 0) { Oops, we got ick : pause : -984 here! Since pause is bounded by [min_pause, max_pause], it means something goes wrong with the bounds. Would you help print min_pause and max_pause as well? It seems there are "long <=> unsigned long" conversion problems in bdi_max_pause(), which might make max_pause a negative number. And the min_pause calculation is based on max_pause to some degree.. Thanks, Fengguang > + extern int printf(char *, ...); > + printf("ick : pause : %li\n", pause); > + printf("ick: pages_dirtied : %lu\n", pages_dirtied); > + printf("ick: task_ratelimit: %lu\n", task_ratelimit); > + BUG_ON(1); > + } > trace_balance_dirty_pages(bdi, > dirty_thresh, > background_thresh, > > > I got this : > > > > > * Starting local > net.core.warnings = 0 [ ok ] > ick : pause : -984 > ick: pages_dirtied : 0 > ick: task_ratelimit: 0 > Kernel panic - not syncing: BUG! > CPU: 0 PID: 1434 Comm: trinity-child2 Not tainted 3.12.0-rc4-00029-g0e7a3ed-dirty #12 > 47397c84 47397cb0 0841b5a0 084c30e8 085f76e0 084b4745 47397cbc 00000000 > fffffc28 01fff278 085cb4a0 47397d2c 0841c5a0 084b4745 084c5398 000005ee > 08432cf0 43e47600 471757f8 47397cf0 ffffff0c 47397cdc 5256d8b9 3417ec18 47397c5c: [<08060b2c>] show_stack+0x7c/0xd0 > 47397c7c: [<0841e34d>] dump_stack+0x26/0x28 > 47397c8c: [<0841b5a0>] panic+0x7a/0x180 > 47397cb4: [<0841c5a0>] balance_dirty_pages.isra.32+0x4e3/0x5ad > 47397d30: [<080d3595>] balance_dirty_pages_ratelimited+0xf5/0x100 > 47397d44: [<080e4a3f>] __do_fault+0x3cf/0x440 > 47397d9c: [<080e6e0f>] handle_mm_fault+0xef/0x7c0 > 47397dec: [<080e7817>] __get_user_pages+0x227/0x420 > 47397e24: [<080e7ae3>] get_user_pages+0x63/0x70 > 47397e4c: [<08143dc6>] SyS_io_setup+0x3c6/0x760 > 47397eb0: [<08062984>] handle_syscall+0x64/0x80 > 47397ef0: [<08074fb5>] userspace+0x475/0x5f0 > 47397fec: [<0805f750>] fork_handler+0x60/0x70 > 47397ffc: [<00000000>] 0x0 > > > EIP: 0073:[<40001282>] CPU: 0 Not tainted ESP: 007b:bfb348f8 EFLAGS: 00000246 > Not tainted > EAX: ffffffda EBX: 00001000 ECX: 080d0000 EDX: 80000048 > ESI: 80fbff1f EDI: ffe02f77 EBP: 90f6e2a3 DS: 007b ES: 007b > 47397c0c: [<0807947f>] show_regs+0x10f/0x120 > 47397c28: [<080623a9>] panic_exit+0x29/0x50 > 47397c38: [<0809ba86>] notifier_call_chain+0x36/0x60 > 47397c60: [<0809bba1>] __atomic_notifier_call_chain+0x21/0x30 > 47397c70: [<0809bbdf>] atomic_notifier_call_chain+0x2f/0x40 > 47397c8c: [<0841b5bc>] panic+0x96/0x180 > 47397cb4: [<0841c5a0>] balance_dirty_pages.isra.32+0x4e3/0x5ad > 47397d30: [<080d3595>] balance_dirty_pages_ratelimited+0xf5/0x100 > 47397d44: [<080e4a3f>] __do_fault+0x3cf/0x440 > 47397d9c: [<080e6e0f>] handle_mm_fault+0xef/0x7c0 > 47397dec: [<080e7817>] __get_user_pages+0x227/0x420 > 47397e24: [<080e7ae3>] get_user_pages+0x63/0x70 > 47397e4c: [<08143dc6>] SyS_io_setup+0x3c6/0x760 > 47397eb0: [<08062984>] handle_syscall+0x64/0x80 > 47397ef0: [<08074fb5>] userspace+0x475/0x5f0 > 47397fec: [<0805f750>] fork_handler+0x60/0x70 > 47397ffc: [<00000000>] 0x0 > > /home/tfoerste/workspace/bin/start_uml.sh: line 115: 18718 Aborted (core dumped) $LINUX earlyprintk ubda=$ROOTFS ubdb=$SWAP eth0=$NET mem=$MEM $TTY umid=uml_$NAME rootfstype=ext4 "$ARGS" > > > >From what I see there are 2 different types of issues - and this is an example of the other of both > > > Thanks, > > //richard > > > > > -- > MfG/Sincerely > Toralf Förster > pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3 |
From: Toralf F. <tor...@gm...> - 2013-10-08 19:57:04
|
Well, the quick&dirty hack below at least works for the moment to overcome the soft lookup and the hang/unresponsiveness of the 32 bit user mode linux guest : diff --git a/mm/page-writeback.c b/mm/page-writeback.c index f5236f8..7e9483c 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1503,6 +1503,8 @@ static void balance_dirty_pages(struct address_space *mapping, } pause: + if (pause < 0) + break; trace_balance_dirty_pages(bdi, dirty_thresh, background_thresh, I'm not proud of it but after starring at the source code in mm/page-writeback.c too often and too long currently I don't have any better clue. WRT to debug of the culprit: neither printk nor friends worked (maybe b/c the affected process is just hanging ?) and BUG_ON doesn't gave me any new clues. On 10/06/2013 10:26 PM, Geert Uytterhoeven wrote: > On Sun, Oct 6, 2013 at 10:08 PM, Toralf Förster <tor...@gm...> wrote: >> On 10/06/2013 08:38 PM, Geert Uytterhoeven wrote: >>> On Sun, Oct 6, 2013 at 4:17 PM, Toralf Förster <tor...@gm...> wrote: >>>> The UML stopped here : >>>> ... >>>> if (unlikely(task_ratelimit == 0)) { >>>> period = max_pause; >>>> pause = max_pause; >>>> BUG_ON(pause < 0); >>>> goto pause; >>>> } >>>> BUG_ON(pages_dirtied < 0); >>>> BUG_ON(task_ratelimit < 0); >>>> period = HZ * pages_dirtied / task_ratelimit; >>>> BUG_ON(period < 0); <----------------------here >>> >>> So pages_dirtied becomes that big compared to task_ratelimit (both are >>> "unsigned long"), that period (which is "long", just like "pause") overflows >>> into a negative number. >>> >>> This is indeed much more likely to happen on 32-bit. >>> >>>> The back trace is : >>> >>>> #9 0x08411c64 in balance_dirty_pages (pages_dirtied=9, mapping=<optimized out>) at mm/page-writeback.c:1471 >>> >>> But here pages_dirtied is only 9?? > >> Well, this points to an overflow or ? : > > Negative indicates an overflow, but pages_dirtied doesn't. > >> tfoerste@n22 ~/devel/linux $ nl -ba mm/page-writeback.c | grep -A 5 -B 5 1468 >> 1463 BUG_ON(pause < 0); >> 1464 goto pause; >> 1465 } >> 1466 period = HZ * pages_dirtied / task_ratelimit; >> 1467 pause = period; >> 1468 BUG_ON(pause < 0 && pages_dirtied > 0 && task_ratelimit > 0); >> 1469 if (current->dirty_paused_when) >> 1470 pause -= now - current->dirty_paused_when; >> 1471 /* >> 1472 * For less than 1s think time (ext3/4 may block the dirtier >> 1473 * for up to 800ms from time to time on 1-HDD; so does xfs, >> >> >> and the back trace is : >> >> #9 0x08411c6c in balance_dirty_pages (pages_dirtied=0, mapping=<optimized out>) at mm/page-writeback.c:1468 > > Hmm, now pages_dirtied is zero, according to the backtrace, but the BUG_ON() > asserts its strict positive?!? > > Can you please try the following instead of the BUG_ON(): > > if (pause < 0) { > printk("pages_dirtied = %lu\n", pages_dirtied); > printk("task_ratelimit = %lu\n", task_ratelimit); > printk("pause = %ld\n", pause); > } > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@li... > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds > -- MfG/Sincerely Toralf Förster pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3 |
From: <st...@ni...> - 2013-10-09 10:56:31
|
Den 2013-10-08 21:56, skrev Toralf Förster: > Well, the quick&dirty hack below at least works for the moment to > overcome the soft lookup and the hang/unresponsiveness of the 32 bit > user mode linux guest : > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index f5236f8..7e9483c 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1503,6 +1503,8 @@ static void balance_dirty_pages(struct > address_space *mapping, > } > > pause: > + if (pause < 0) > + break; can we combine in the debug printk() mentioned below, that prints out pages_dirtied and task_ratelimit: >> if (pause < 0) { >> printk("pages_dirtied = %lu\n", pages_dirtied); >> printk("task_ratelimit = %lu\n", task_ratelimit); >> printk("pause = %ld\n", pause); break; >> } I have not read the source, so what is the definition of HZ? Are pages_dirtied and task_ratelimit atomically safe under UML here, and are what datatype are they? (unsigned long)? The defininion of HZ will all in all decide what kind of multiplication and division we actually perform here; so let us hope it is not something like (signed char) >>>>> BUG_ON(pages_dirtied < 0); >>>>> BUG_ON(task_ratelimit < 0); These will never hit if pages_dirtied and task_ratelimit are unsigned. Stian Skjelstad |
From: Jan K. <ja...@su...> - 2013-10-09 22:03:51
|
On Wed 09-10-13 20:43:50, Richard Weinberger wrote: > CC'ing mm folks. > Please see below. Added Fenguang to CC since he is the author of this code. > Am 09.10.2013 19:26, schrieb Toralf Förster: > > On 10/08/2013 10:07 PM, Geert Uytterhoeven wrote: > >> On Sun, Oct 6, 2013 at 11:01 PM, Toralf Förster <tor...@gm...> wrote: > >>>> Hmm, now pages_dirtied is zero, according to the backtrace, but the BUG_ON() > >>>> asserts its strict positive?!? > >>>> > >>>> Can you please try the following instead of the BUG_ON(): > >>>> > >>>> if (pause < 0) { > >>>> printk("pages_dirtied = %lu\n", pages_dirtied); > >>>> printk("task_ratelimit = %lu\n", task_ratelimit); > >>>> printk("pause = %ld\n", pause); > >>>> } > >>>> > >>>> Gr{oetje,eeting}s, > >>>> > >>>> Geert > >>> I tried it in different ways already - I'm completely unsuccessful in getting any printk output. > >>> As soon as the issue happens I do have a > >>> > >>> BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521] > >>> > >>> at stderr of the UML and then no further input is accepted. With uml_mconsole I'm however able > >>> to run very basic commands like a crash dump, sysrq ond so on. > >> > >> You may get an idea of the magnitude of pages_dirtied by using a chain of > >> BUG_ON()s, like: > >> > >> BUG_ON(pages_dirtied > 2000000000); > >> BUG_ON(pages_dirtied > 1000000000); > >> BUG_ON(pages_dirtied > 100000000); > >> BUG_ON(pages_dirtied > 10000000); > >> BUG_ON(pages_dirtied > 1000000); > >> > >> Probably 1 million is already too much for normal operation? > >> > > period = HZ * pages_dirtied / task_ratelimit; > > BUG_ON(pages_dirtied > 2000000000); > > BUG_ON(pages_dirtied > 1000000000); <-------------- this is line 1467 > > Summary for mm people: > > Toralf runs trinty on UML/i386. > After some time pages_dirtied becomes very large. > More than 1000000000 pages in this case. Huh, this is really strange. pages_dirtied is passed into balance_dirty_pages() from current->nr_dirtied. So I wonder how a value over 10^9 can get there. After all that is over 4TB so I somewhat doubt the task was ever able to dirty that much during its lifetime (but correct me if I'm wrong here, with UML and memory backed disks it is not totally impossible)... I went through the logic of handling ->nr_dirtied but I didn't find any obvious problem there. Hum, maybe one thing - what 'task_ratelimit' values do you see in balance_dirty_pages? If that one was huge, we could possibly accumulate huge current->nr_dirtied. > Thus, period = HZ * pages_dirtied / task_ratelimit overflows > and period/pause becomes extremely large. > > It looks like io_schedule_timeout() get's called with a very large timeout. > I don't know why "if (unlikely(pause > max_pause)) {" does not help. > > > > the back trace is : > > > > tfoerste@n22 ~/devel/linux $ gdb --core=/mnt/ramdisk/core /home/tfoerste/devel/linux/linux -batch -ex bt > > [New LWP 6911] > > Core was generated by `/home/tfoerste/devel/linux/linux earlyprintk ubda=/home/tfoerste/virtual/uml/tr'. > > Program terminated with signal 6, Aborted. > > #0 0xb77a7424 in __kernel_vsyscall () > > #0 0xb77a7424 in __kernel_vsyscall () > > #1 0x083bdf35 in kill () > > #2 0x0807296d in uml_abort () at arch/um/os-Linux/util.c:93 > > #3 0x08072ca5 in os_dump_core () at arch/um/os-Linux/util.c:148 > > #4 0x080623c4 in panic_exit (self=0x85c1558 <panic_exit_notifier>, unused1=0, unused2=0x85f76e0 <buf.16221>) at arch/um/kernel/um_arch.c:240 > > #5 0x0809ba86 in notifier_call_chain (nl=0x0, val=0, v=0x85f76e0 <buf.16221>, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93 > > #6 0x0809bba1 in __atomic_notifier_call_chain (nh=0x85f76c4 <panic_notifier_list>, val=0, v=0x85f76e0 <buf.16221>, nr_to_call=0, nr_calls=0x0) at kernel/notifier.c:182 > > #7 0x0809bbdf in atomic_notifier_call_chain (nh=0x0, val=0, v=0x0) at kernel/notifier.c:191 > > #8 0x0841b5bc in panic (fmt=0x0) at kernel/panic.c:130 > > #9 0x0841c470 in balance_dirty_pages (pages_dirtied=23, mapping=<optimized out>) at mm/page-writeback.c:1467 > > #10 0x080d3595 in balance_dirty_pages_ratelimited (mapping=0x6) at mm/page-writeback.c:1661 > > #11 0x080e4a3f in __do_fault (mm=0x45ba3600, vma=0x48777b90, address=1084678144, pmd=0x0, pgoff=0, flags=0, orig_pte=<incomplete type>) at mm/memory.c:3452 > > #12 0x080e6e0f in do_linear_fault (orig_pte=..., flags=<optimized out>, pmd=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>, page_table=<optimized out>) at mm/memory.c:3486 > > #13 handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3710 > > #14 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3845 > > #15 handle_mm_fault (mm=0x45ba3600, vma=0x487034c8, address=1084678144, flags=1) at mm/memory.c:3868 > > #16 0x080e7817 in __get_user_pages (tsk=0x48705800, mm=0x45ba3600, start=1084678144, nr_pages=1025, gup_flags=519, pages=0x48558000, vmas=0x0, nonblocking=0x0) at mm/memory.c:1822 > > #17 0x080e7ae3 in get_user_pages (tsk=0x0, mm=0x0, start=0, nr_pages=0, write=1, force=0, pages=0x48777b90, vmas=0x6) at mm/memory.c:2019 > > #18 0x08143dc6 in aio_setup_ring (ctx=<optimized out>) at fs/aio.c:340 > > #19 ioctx_alloc (nr_events=<optimized out>) at fs/aio.c:605 > > #20 SYSC_io_setup (ctxp=<optimized out>, nr_events=<optimized out>) at fs/aio.c:1122 > > #21 SyS_io_setup (nr_events=65535, ctxp=135081984) at fs/aio.c:1105 > > #22 0x08062984 in handle_syscall (r=0x487059d4) at arch/um/kernel/skas/syscall.c:35 > > #23 0x08074fb5 in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Linux/skas/process.c:198 > > #24 userspace (regs=0x487059d4) at arch/um/os-Linux/skas/process.c:431 > > #25 0x0805f750 in fork_handler () at arch/um/kernel/process.c:160 > > #26 0x00000000 in ?? () Honza -- Jan Kara <ja...@su...> SUSE Labs, CR |
From: Fengguang Wu <fen...@in...> - 2013-10-10 02:46:30
|
On Wed, Oct 09, 2013 at 11:47:33PM +0200, Jan Kara wrote: > On Wed 09-10-13 20:43:50, Richard Weinberger wrote: > > CC'ing mm folks. > > Please see below. > Added Fenguang to CC since he is the author of this code. Thanks! > > Am 09.10.2013 19:26, schrieb Toralf Förster: > > > On 10/08/2013 10:07 PM, Geert Uytterhoeven wrote: > > >> On Sun, Oct 6, 2013 at 11:01 PM, Toralf Förster <tor...@gm...> wrote: > > >>>> Hmm, now pages_dirtied is zero, according to the backtrace, but the BUG_ON() > > >>>> asserts its strict positive?!? > > >>>> > > >>>> Can you please try the following instead of the BUG_ON(): > > >>>> > > >>>> if (pause < 0) { > > >>>> printk("pages_dirtied = %lu\n", pages_dirtied); > > >>>> printk("task_ratelimit = %lu\n", task_ratelimit); > > >>>> printk("pause = %ld\n", pause); > > >>>> } > > >>>> > > >>>> Gr{oetje,eeting}s, > > >>>> > > >>>> Geert > > >>> I tried it in different ways already - I'm completely unsuccessful in getting any printk output. > > >>> As soon as the issue happens I do have a > > >>> > > >>> BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521] > > >>> > > >>> at stderr of the UML and then no further input is accepted. With uml_mconsole I'm however able > > >>> to run very basic commands like a crash dump, sysrq ond so on. > > >> > > >> You may get an idea of the magnitude of pages_dirtied by using a chain of > > >> BUG_ON()s, like: > > >> > > >> BUG_ON(pages_dirtied > 2000000000); > > >> BUG_ON(pages_dirtied > 1000000000); > > >> BUG_ON(pages_dirtied > 100000000); > > >> BUG_ON(pages_dirtied > 10000000); > > >> BUG_ON(pages_dirtied > 1000000); > > >> > > >> Probably 1 million is already too much for normal operation? > > >> > > > period = HZ * pages_dirtied / task_ratelimit; > > > BUG_ON(pages_dirtied > 2000000000); > > > BUG_ON(pages_dirtied > 1000000000); <-------------- this is line 1467 > > > > Summary for mm people: > > > > Toralf runs trinty on UML/i386. > > After some time pages_dirtied becomes very large. > > More than 1000000000 pages in this case. > Huh, this is really strange. pages_dirtied is passed into > balance_dirty_pages() from current->nr_dirtied. So I wonder how a value > over 10^9 can get there. I noticed aio_setup_ring() in the call trace and find it recently added a SetPageDirty() call in a loop by commit 36bc08cc01 ("fs/aio: Add support to aio ring pages migration"). So added CC to its authors. > After all that is over 4TB so I somewhat doubt the > task was ever able to dirty that much during its lifetime (but correct me > if I'm wrong here, with UML and memory backed disks it is not totally > impossible)... I went through the logic of handling ->nr_dirtied but > I didn't find any obvious problem there. Hum, maybe one thing - what > 'task_ratelimit' values do you see in balance_dirty_pages? If that one was > huge, we could possibly accumulate huge current->nr_dirtied. > > > Thus, period = HZ * pages_dirtied / task_ratelimit overflows > > and period/pause becomes extremely large. Yes, that's possible. > > It looks like io_schedule_timeout() get's called with a very large timeout. > > I don't know why "if (unlikely(pause > max_pause)) {" does not help. The test will sure work and limit pause to <= max_pause. However it's very possible balance_dirty_pages() cannot break out of the loop (or being called repeatedly) and block the task. I'm afraid there are no one to clear the dirty pages, which makes balance_dirty_pages() waiting for ever. Thanks, Fengguang > > > > > the back trace is : > > > > > > tfoerste@n22 ~/devel/linux $ gdb --core=/mnt/ramdisk/core /home/tfoerste/devel/linux/linux -batch -ex bt > > > [New LWP 6911] > > > Core was generated by `/home/tfoerste/devel/linux/linux earlyprintk ubda=/home/tfoerste/virtual/uml/tr'. > > > Program terminated with signal 6, Aborted. > > > #0 0xb77a7424 in __kernel_vsyscall () > > > #0 0xb77a7424 in __kernel_vsyscall () > > > #1 0x083bdf35 in kill () > > > #2 0x0807296d in uml_abort () at arch/um/os-Linux/util.c:93 > > > #3 0x08072ca5 in os_dump_core () at arch/um/os-Linux/util.c:148 > > > #4 0x080623c4 in panic_exit (self=0x85c1558 <panic_exit_notifier>, unused1=0, unused2=0x85f76e0 <buf.16221>) at arch/um/kernel/um_arch.c:240 > > > #5 0x0809ba86 in notifier_call_chain (nl=0x0, val=0, v=0x85f76e0 <buf.16221>, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93 > > > #6 0x0809bba1 in __atomic_notifier_call_chain (nh=0x85f76c4 <panic_notifier_list>, val=0, v=0x85f76e0 <buf.16221>, nr_to_call=0, nr_calls=0x0) at kernel/notifier.c:182 > > > #7 0x0809bbdf in atomic_notifier_call_chain (nh=0x0, val=0, v=0x0) at kernel/notifier.c:191 > > > #8 0x0841b5bc in panic (fmt=0x0) at kernel/panic.c:130 > > > #9 0x0841c470 in balance_dirty_pages (pages_dirtied=23, mapping=<optimized out>) at mm/page-writeback.c:1467 > > > #10 0x080d3595 in balance_dirty_pages_ratelimited (mapping=0x6) at mm/page-writeback.c:1661 > > > #11 0x080e4a3f in __do_fault (mm=0x45ba3600, vma=0x48777b90, address=1084678144, pmd=0x0, pgoff=0, flags=0, orig_pte=<incomplete type>) at mm/memory.c:3452 > > > #12 0x080e6e0f in do_linear_fault (orig_pte=..., flags=<optimized out>, pmd=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>, page_table=<optimized out>) at mm/memory.c:3486 > > > #13 handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3710 > > > #14 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3845 > > > #15 handle_mm_fault (mm=0x45ba3600, vma=0x487034c8, address=1084678144, flags=1) at mm/memory.c:3868 > > > #16 0x080e7817 in __get_user_pages (tsk=0x48705800, mm=0x45ba3600, start=1084678144, nr_pages=1025, gup_flags=519, pages=0x48558000, vmas=0x0, nonblocking=0x0) at mm/memory.c:1822 > > > #17 0x080e7ae3 in get_user_pages (tsk=0x0, mm=0x0, start=0, nr_pages=0, write=1, force=0, pages=0x48777b90, vmas=0x6) at mm/memory.c:2019 > > > #18 0x08143dc6 in aio_setup_ring (ctx=<optimized out>) at fs/aio.c:340 > > > #19 ioctx_alloc (nr_events=<optimized out>) at fs/aio.c:605 > > > #20 SYSC_io_setup (ctxp=<optimized out>, nr_events=<optimized out>) at fs/aio.c:1122 > > > #21 SyS_io_setup (nr_events=65535, ctxp=135081984) at fs/aio.c:1105 > > > #22 0x08062984 in handle_syscall (r=0x487059d4) at arch/um/kernel/skas/syscall.c:35 > > > #23 0x08074fb5 in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Linux/skas/process.c:198 > > > #24 userspace (regs=0x487059d4) at arch/um/os-Linux/skas/process.c:431 > > > #25 0x0805f750 in fork_handler () at arch/um/kernel/process.c:160 > > > #26 0x00000000 in ?? () > > Honza > -- > Jan Kara <ja...@su...> > SUSE Labs, CR |
From: Richard W. <ri...@no...> - 2013-10-09 22:33:22
|
Am 09.10.2013 23:47, schrieb Jan Kara: > On Wed 09-10-13 20:43:50, Richard Weinberger wrote: >> CC'ing mm folks. >> Please see below. > Added Fenguang to CC since he is the author of this code. Thx, get_maintainer.pl didn't list him. >> Am 09.10.2013 19:26, schrieb Toralf Förster: >>> On 10/08/2013 10:07 PM, Geert Uytterhoeven wrote: >>>> On Sun, Oct 6, 2013 at 11:01 PM, Toralf Förster <tor...@gm...> wrote: >>>>>> Hmm, now pages_dirtied is zero, according to the backtrace, but the BUG_ON() >>>>>> asserts its strict positive?!? >>>>>> >>>>>> Can you please try the following instead of the BUG_ON(): >>>>>> >>>>>> if (pause < 0) { >>>>>> printk("pages_dirtied = %lu\n", pages_dirtied); >>>>>> printk("task_ratelimit = %lu\n", task_ratelimit); >>>>>> printk("pause = %ld\n", pause); >>>>>> } >>>>>> >>>>>> Gr{oetje,eeting}s, >>>>>> >>>>>> Geert >>>>> I tried it in different ways already - I'm completely unsuccessful in getting any printk output. >>>>> As soon as the issue happens I do have a >>>>> >>>>> BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521] >>>>> >>>>> at stderr of the UML and then no further input is accepted. With uml_mconsole I'm however able >>>>> to run very basic commands like a crash dump, sysrq ond so on. >>>> >>>> You may get an idea of the magnitude of pages_dirtied by using a chain of >>>> BUG_ON()s, like: >>>> >>>> BUG_ON(pages_dirtied > 2000000000); >>>> BUG_ON(pages_dirtied > 1000000000); >>>> BUG_ON(pages_dirtied > 100000000); >>>> BUG_ON(pages_dirtied > 10000000); >>>> BUG_ON(pages_dirtied > 1000000); >>>> >>>> Probably 1 million is already too much for normal operation? >>>> >>> period = HZ * pages_dirtied / task_ratelimit; >>> BUG_ON(pages_dirtied > 2000000000); >>> BUG_ON(pages_dirtied > 1000000000); <-------------- this is line 1467 >> >> Summary for mm people: >> >> Toralf runs trinty on UML/i386. >> After some time pages_dirtied becomes very large. >> More than 1000000000 pages in this case. > Huh, this is really strange. pages_dirtied is passed into > balance_dirty_pages() from current->nr_dirtied. So I wonder how a value > over 10^9 can get there. After all that is over 4TB so I somewhat doubt the > task was ever able to dirty that much during its lifetime (but correct me > if I'm wrong here, with UML and memory backed disks it is not totally > impossible)... I went through the logic of handling ->nr_dirtied but > I didn't find any obvious problem there. Hum, maybe one thing - what > 'task_ratelimit' values do you see in balance_dirty_pages? If that one was > huge, we could possibly accumulate huge current->nr_dirtied. Toralf, you can try a snipplet like this one to get the values printed out: diff --git a/mm/page-writeback.c b/mm/page-writeback.c index f5236f8..a80e520 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1463,6 +1463,12 @@ static void balance_dirty_pages(struct address_space *mapping, goto pause; } period = HZ * pages_dirtied / task_ratelimit; + + { + extern int printf(char *, ...); + printf("---> task_ratelimit: %lu\n", task_ratelimit); + } + pause = period; if (current->dirty_paused_when) pause -= now - current->dirty_paused_when; Yes, printf(), not printk(). Using this hack we print directly to host's stdout. :) Thanks, //richard |