From: Joel F. <agn...@gm...> - 2018-03-28 10:28:11
|
Hi,

I wrote a kernel module to play with the hrtimer subsystem, and it hangs
with UML. Any ideas on why it may be hanging? It doesn't hang on any of my
other machines. Hopefully I'm not doing something stupid, but I don't think
I am.

It appears the timer handler does fire. However, the UML process is
continuously doing a kill(SIGALRM) to the host, and the shell hangs.
Here's the continuous strace output of UML's process at the time of the
hang: https://hastebin.com/ikehadapon.sql

To build UML, I do:
make ARCH=um x86_64_defconfig

UML kernel version is v4.16-rc4

Here's the module I'm loading:

static enum hrtimer_restart bigtimer_handle(struct hrtimer *timer)
{
	printk(KERN_ERR "timer fired 2\n");
	spin_lock(&il->biglock);
	spin_unlock(&il->biglock);
	release_now = 1;
	return HRTIMER_NORESTART;
}

void init_bigstr(struct bigstr *b)
{
	spin_lock_init(&b->biglock);
}

static int __init test_module_init(void)
{
	struct bigstr b1, b2;
	struct hrtimer *timer;

	release_now = 0;
	init_bigstr(&b1);
	init_bigstr(&b2);

	timer = &bigtimer;
	timer->debug = 1;
	hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	timer->function = bigtimer_handle;

	il = &b2;

	spin_lock(&b1.biglock);

	printk(KERN_ERR "Starting timer\n");
	hrtimer_start(timer, ns_to_ktime(50000ULL), HRTIMER_MODE_REL_PINNED);

	while(release_now == 0);

	spin_unlock(&b1.biglock);

	return -1;
}

Thanks for any debug thoughts! I'll also try to hook up gdb tomorrow and
see if I find anything.

Regards,
- Joel
From: Richard W. <ri...@no...> - 2018-03-28 11:22:45
|
On Wednesday, 28 March 2018, 12:28:02 CEST, Joel Fernandes wrote:
> Hi,
>
> I wrote a kernel module to play with the hrtimer subsystem, and it hangs
> with UML. Any ideas on why it may be hanging? It doesn't hang on any of my
> other machines. Hopefully I'm not doing something stupid, but I don't think
> I am.
>
> It appears the timer handler does fire. However, the UML process is
> continuously doing a kill(SIGALRM) to the host, and the shell hangs.
> Here's the continuous strace output of UML's process at the time of the
> hang: https://hastebin.com/ikehadapon.sql
>
> To build UML, I do:
> make ARCH=um x86_64_defconfig
>
> UML kernel version is v4.16-rc4
>
> Here's the module I'm loading:

Please share the full sources that compile. Then I can have a look.

Thanks,
//richard
From: Geert U. <ge...@li...> - 2018-03-28 13:11:39
|
On Wed, Mar 28, 2018 at 12:28 PM, Joel Fernandes <agn...@gm...> wrote:
> while(release_now == 0);

while (release_now == 0)
	cpu_relax();

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@li...

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
From: Richard W. <ri...@no...> - 2018-03-28 13:19:22
|
On Wednesday, 28 March 2018, 15:11:29 CEST, Geert Uytterhoeven wrote:
> On Wed, Mar 28, 2018 at 12:28 PM, Joel Fernandes <agn...@gm...> wrote:
> > while(release_now == 0);
>
> while (release_now == 0)
> 	cpu_relax();

Not sure whether a cpu_relax() fixes the problem.
I guess the root of the problem is that UML is UP and non-preemptive.
Therefore the loop is never interrupted.
To verify this, I asked for the full source.

Thanks,
//richard
From: Joel F. <agn...@gm...> - 2018-03-28 22:19:49
Attachments:
ldtest1.c
|
Thanks for the quick reply.

On Wed, Mar 28, 2018 at 6:19 AM, Richard Weinberger <ri...@no...> wrote:
> On Wednesday, 28 March 2018, 15:11:29 CEST, Geert Uytterhoeven wrote:
>> On Wed, Mar 28, 2018 at 12:28 PM, Joel Fernandes <agn...@gm...> wrote:
>> > while(release_now == 0);
>>
>> while (release_now == 0)
>> 	cpu_relax();
>
> Not sure whether a cpu_relax() fixes the problem.
> I guess the root of the problem is that UML is UP and non-preemptive.
> Therefore the loop is never interrupted.
> To verify this, I asked for the full source.
>

cpu_relax actually worked!

Any thoughts on why it helps? Even if it's non-preemptive, I did
receive the timer interrupt, so I expected the variable to be set.

Module is attached.

Thanks,
-Joel
From: Richard W. <ri...@no...> - 2018-03-28 22:35:23
|
On Thursday, 29 March 2018, 00:19:39 CEST, Joel Fernandes wrote:
> Thanks for the quick reply.
>
> On Wed, Mar 28, 2018 at 6:19 AM, Richard Weinberger <ri...@no...> wrote:
> > On Wednesday, 28 March 2018, 15:11:29 CEST, Geert Uytterhoeven wrote:
> >> On Wed, Mar 28, 2018 at 12:28 PM, Joel Fernandes <agn...@gm...> wrote:
> >> > while(release_now == 0);
> >>
> >> while (release_now == 0)
> >> 	cpu_relax();
> >
> > Not sure whether a cpu_relax() fixes the problem.
> > I guess the root of the problem is that UML is UP and non-preemptive.
> > Therefore the loop is never interrupted.
> > To verify this, I asked for the full source.
> >
>
> cpu_relax actually worked!

Interesting.

> Any thoughts on why it helps? Even if it's non-preemptive, I did
> receive the timer interrupt, so I expected the variable to be set.

Timers also fire with preemption off, I forgot...
I think cpu_relax() internally issues a barrier, so the release_now
variable is read again on every iteration.
Can you try barrier() instead of cpu_relax()? I bet it works too.
Same if you mark release_now as volatile.

Thanks,
//richard
From: Geert U. <ge...@li...> - 2018-03-29 06:04:30
|
On Thu, Mar 29, 2018 at 12:35 AM, Richard Weinberger <ri...@no...> wrote:
> On Thursday, 29 March 2018, 00:19:39 CEST, Joel Fernandes wrote:
>> On Wed, Mar 28, 2018 at 6:19 AM, Richard Weinberger <ri...@no...> wrote:
>> > On Wednesday, 28 March 2018, 15:11:29 CEST, Geert Uytterhoeven wrote:
>> >> On Wed, Mar 28, 2018 at 12:28 PM, Joel Fernandes <agn...@gm...> wrote:
>> >> > while(release_now == 0);
>> >>
>> >> while (release_now == 0)
>> >> 	cpu_relax();
>> >
>> > Not sure whether a cpu_relax() fixes the problem.
>> > I guess the root of the problem is that UML is UP and non-preemptive.
>> > Therefore the loop is never interrupted.
>> > To verify this, I asked for the full source.
>> >
>>
>> cpu_relax actually worked!
>
> Interesting.
>
>> Any thoughts on why it helps? Even if it's non-preemptive, I did
>> receive the timer interrupt, so I expected the variable to be set.
>
> Timers also fire with preemption off, I forgot...
> I think cpu_relax() internally issues a barrier, so the release_now
> variable is read again on every iteration.
> Can you try barrier() instead of cpu_relax()? I bet it works too.
> Same if you mark release_now as volatile.

Without cpu_relax()/barrier()/volatile, the compiler can assume release_now
never changes, and thus may "optimize" the loop to an infinite loop.

Gr{oetje,eeting}s,

                        Geert
From: Joel F. <agn...@gm...> - 2018-03-29 20:20:56
|
On Wed, Mar 28, 2018 at 11:04 PM, Geert Uytterhoeven <ge...@li...> wrote:
> On Thu, Mar 29, 2018 at 12:35 AM, Richard Weinberger <ri...@no...> wrote:
>> On Thursday, 29 March 2018, 00:19:39 CEST, Joel Fernandes wrote:
>>> On Wed, Mar 28, 2018 at 6:19 AM, Richard Weinberger <ri...@no...> wrote:
>>> > On Wednesday, 28 March 2018, 15:11:29 CEST, Geert Uytterhoeven wrote:
>>> >> On Wed, Mar 28, 2018 at 12:28 PM, Joel Fernandes <agn...@gm...> wrote:
>>> >> > while(release_now == 0);
>>> >>
>>> >> while (release_now == 0)
>>> >> 	cpu_relax();
>>> >
>>> > Not sure whether a cpu_relax() fixes the problem.
>>> > I guess the root of the problem is that UML is UP and non-preemptive.
>>> > Therefore the loop is never interrupted.
>>> > To verify this, I asked for the full source.
>>> >
>>>
>>> cpu_relax actually worked!
>>
>> Interesting.
>>
>>> Any thoughts on why it helps? Even if it's non-preemptive, I did
>>> receive the timer interrupt, so I expected the variable to be set.
>>
>> Timers also fire with preemption off, I forgot...
>> I think cpu_relax() internally issues a barrier, so the release_now
>> variable is read again on every iteration.
>> Can you try barrier() instead of cpu_relax()? I bet it works too.
>> Same if you mark release_now as volatile.
>
> Without cpu_relax()/barrier()/volatile, the compiler can assume release_now
> never changes, and thus may "optimize" the loop to an infinite loop.
>

Thanks a lot! I am wondering why the test works with the same compiler on a
regular image. Maybe different compiler flags. Anyway, good to learn this.

Also, one more slightly OT question: why does UML only do UP? Is it
extremely hard to do SMP for UML? I also happened to notice that QEMU has
one thread per emulated core...

thanks,
- Joel
From: Richard W. <ri...@no...> - 2018-03-29 20:34:55
|
On Thursday, 29 March 2018, 22:20:47 CEST, Joel Fernandes wrote:
> Thanks a lot! I am wondering why the test works with the same compiler on a
> regular image. Maybe different compiler flags. Anyway, good to learn this.
>
> Also, one more slightly OT question: why does UML only do UP? Is it
> extremely hard to do SMP for UML?

Long story short, nobody has implemented SMP so far. :-)
Before SKAS3/SKAS0, we had an SMP implementation of TT mode.

For UML, implementing SMP means having multiple threads that handle the
userspace loop in arch/um/os-Linux/skas/process.c.

We could also do a poor man's SMP implementation first, where only user
processes run in parallel. In other words, userspace() in
arch/um/os-Linux/skas/process.c would still be a single thread, but it would
let up to N userspace threads run, and only when they call into the kernel
would we degrade to UP.

Adding SMP is not extremely hard, but it requires a lot of rework of the
UML core and introduces tons of new issues.
That said, volunteers are welcome!

Thanks,
//richard