From: Anton I. <ant...@ko...> - 2015-11-20 16:43:26

[snip]

>> I hope the change in arch/um/kernel/skas/mmu.c isn't the cause of all
>> this trouble!
> No it is not, but it is one that needs refinement. Fix coming up in 10
> minutes or so, I was just cleaning up the patches for submission.
>
> The issue is reentrancy of the core interrupt logic which the code in
> unblock_signals is supposed to prevent, but it fails (for some reason)
> to do so.

Some of it looks like a heisenbug from debugging by the way.

If you look in softirq.c you see:

#ifdef CONFIG_TRACE_IRQFLAGS
	local_irq_enable <http://lxr.free-electrons.com/ident?i=local_irq_enable>()
#endif

That in UML translates to unblock_signals(). That in turn will re-execute
hardware interrupts out of a software IRQ bottom half context.

There is still an inherent reentrancy issue, but I suspect that it is
less with the debugging off.

Patchset coming up in next email. It throws bh warnings if you have
CONFIG_TRACE_IRQFLAGS, but that is not something I can fix in first pass.

A.

From: Anton I. <ant...@ko...> - 2015-11-20 16:22:22

On 20/11/15 15:21, Thomas Meyer wrote:
> On 20.11.2015 at 3:08 PM, Anton Ivanov <ant...@ko...> wrote:
>> On 20/11/15 13:48, st...@ni... wrote:
>>> On 2015-11-20 13:50, Anton Ivanov wrote:
>>>> On 20/11/15 12:26, st...@ni... wrote:
>>>>>>> 4. While I can propose a brutal patch for signal.c which sets
>>>>>>> guards against reentrancy which works fine, I suggest we
>>>>>>> actually get to the bottom of this. Why does the code in
>>>>>>> unblock_signals() not guard correctly against that?
>>>>>> Thanks for hunting this issue.
>>>>>> I fear I'll have to grab my speleologist's hat to figure out why
>>>>>> UML works this way.
>>>>>> Cc'ing Al, do you have an idea?
>>>>> In the few stack-traces that I have seen posted here, I could see
>>>>> multiple calls to unlocking of signals (with a signal occurring
>>>>> directly after). That probably should not happen. Do we count the
>>>>> number of times we try to block/unblock signals and only actually
>>>>> perform the action when the counter reaches/leaves 0?
>>>>>
>>>>> if this series of calls happens:
>>>>> block()
>>>>> foo()
>>>>> block()
>>>>> bar()
>>>>> unblock() <- this should be a no-op
>>>>> foobar()
>>>>> unblock() <- first here the signals should be unblocked again
>>>> Block/unblock are not counting the number of enable/disable at
>>>> present. It is either on or off.
>>>>
>>>> Any unblock will immediately re-trigger all pending interrupts.
>>>>
>>>> Some of the errata patches I have out of investigating this do
>>>> exactly that - change:
>>>>
>>>> block to flags = set_signals(0); bar(); set_signals(flags);
>>>>
>>>> This, if nested, should be a NOP.
>>>>
>>>> However, even after fixing all of them (and their corresponding
>>>> kernel side counterparts), I still get reentrancy, so there is
>>>> something else at play too.
>>> Please, share a stack-trace if possible.
>>>
>>> As a side-note:
>>> The small issue with the code example above I can see is that what if
>>> flags should have changed during bar().
>> I see it too, but I have not figured out how to deal with it.
>>
>>> And code inside bar can do
>>> set_signals() magic.
>> Correct, which is to some extent our issue.
>>
>>> I am not a Linux kernel ABI expert.
>>>
>>> To me, it seems to be safer to have an ABI that tracks each signal
>>> blocked mask individually, and have a ref-counted
>>> block-all/unblock-all call. This would be like how you normally
>>> program on a CPU. You have an interrupt controller that you setup
>>> (masks), and a master interrupt enable/disable flag.
>> That is what signal.c is trying to simulate - you have a mask for ALRM
>> (or VTALRM with the older timers) and SIGIO and a global on/off.
>>
>> What that fails to emulate, however, is that an IRQ is usually blocked
>> until it is fully serviced. This, depending on IRQ controller design,
>> may block all IRQs, all lower priority IRQs or none.
>>
>> The current code in uml tries to block all while processing an IRQ,
>> but for some reason fails.
>>
>> I will submit a patch to put some duct tape over this for the time
>> being, but we should understand what the root cause is.
> I hope the change in arch/um/kernel/skas/mmu.c isn't the cause of all
> this trouble!

No it is not, but it is one that needs refinement. Fix coming up in 10
minutes or so, I was just cleaning up the patches for submission.

The issue is reentrancy of the core interrupt logic which the code in
unblock_signals is supposed to prevent, but it fails (for some reason)
to do so.

A.

> I wanted to disable interrupt processing and so the forwarding of timer
> interrupts to the user space process when the user space is currently
> in a critical section of forking itself, and no signal handler is
> installed yet!
>
>> A.
>>
>>> Stian

From: Thomas M. <th...@m3...> - 2015-11-20 15:22:03

On 20.11.2015 at 3:08 PM, Anton Ivanov <ant...@ko...> wrote:
> On 20/11/15 13:48, st...@ni... wrote:
>> On 2015-11-20 13:50, Anton Ivanov wrote:
>>> On 20/11/15 12:26, st...@ni... wrote:
>>>>>> 4. While I can propose a brutal patch for signal.c which sets
>>>>>> guards against reentrancy which works fine, I suggest we
>>>>>> actually get to the bottom of this. Why does the code in
>>>>>> unblock_signals() not guard correctly against that?
>>>>> Thanks for hunting this issue.
>>>>> I fear I'll have to grab my speleologist's hat to figure out why
>>>>> UML works this way.
>>>>> Cc'ing Al, do you have an idea?
>>>> In the few stack-traces that I have seen posted here, I could see
>>>> multiple calls to unlocking of signals (with a signal occurring
>>>> directly after). That probably should not happen. Do we count the
>>>> number of times we try to block/unblock signals and only actually
>>>> perform the action when the counter reaches/leaves 0?
>>>>
>>>> if this series of calls happens:
>>>> block()
>>>> foo()
>>>> block()
>>>> bar()
>>>> unblock() <- this should be a no-op
>>>> foobar()
>>>> unblock() <- first here the signals should be unblocked again
>>> Block/unblock are not counting the number of enable/disable at
>>> present. It is either on or off.
>>>
>>> Any unblock will immediately re-trigger all pending interrupts.
>>>
>>> Some of the errata patches I have out of investigating this do
>>> exactly that - change:
>>>
>>> block to flags = set_signals(0); bar(); set_signals(flags);
>>>
>>> This, if nested, should be a NOP.
>>>
>>> However, even after fixing all of them (and their corresponding
>>> kernel side counterparts), I still get reentrancy, so there is
>>> something else at play too.
>> Please, share a stack-trace if possible.
>>
>> As a side-note:
>> The small issue with the code example above I can see is that what if
>> flags should have changed during bar().
> I see it too, but I have not figured out how to deal with it.
>
>> And code inside bar can do
>> set_signals() magic.
> Correct, which is to some extent our issue.
>
>> I am not a Linux kernel ABI expert.
>>
>> To me, it seems to be safer to have an ABI that tracks each signal
>> blocked mask individually, and have a ref-counted
>> block-all/unblock-all call. This would be like how you normally
>> program on a CPU. You have an interrupt controller that you setup
>> (masks), and a master interrupt enable/disable flag.
> That is what signal.c is trying to simulate - you have a mask for ALRM
> (or VTALRM with the older timers) and SIGIO and a global on/off.
>
> What that fails to emulate, however, is that an IRQ is usually blocked
> until it is fully serviced. This, depending on IRQ controller design,
> may block all IRQs, all lower priority IRQs or none.
>
> The current code in uml tries to block all while processing an IRQ,
> but for some reason fails.
>
> I will submit a patch to put some duct tape over this for the time
> being, but we should understand what the root cause is.

I hope the change in arch/um/kernel/skas/mmu.c isn't the cause of all
this trouble!

I wanted to disable interrupt processing and so the forwarding of timer
interrupts to the user space process when the user space is currently in
a critical section of forking itself, and no signal handler is installed
yet!

> A.
>
>> Stian

From: Anton I. <ant...@ko...> - 2015-11-20 14:09:03

On 20/11/15 13:48, st...@ni... wrote:
> On 2015-11-20 13:50, Anton Ivanov wrote:
>> On 20/11/15 12:26, st...@ni... wrote:
>>>>> 4. While I can propose a brutal patch for signal.c which sets
>>>>> guards against reentrancy which works fine, I suggest we
>>>>> actually get to the bottom of this. Why does the code in
>>>>> unblock_signals() not guard correctly against that?
>>>> Thanks for hunting this issue.
>>>> I fear I'll have to grab my speleologist's hat to figure out why
>>>> UML works this way.
>>>> Cc'ing Al, do you have an idea?
>>> In the few stack-traces that I have seen posted here, I could see
>>> multiple calls to unlocking of signals (with a signal occurring
>>> directly after). That probably should not happen. Do we count the
>>> number of times we try to block/unblock signals and only actually
>>> perform the action when the counter reaches/leaves 0?
>>>
>>> if this series of calls happens:
>>> block()
>>> foo()
>>> block()
>>> bar()
>>> unblock() <- this should be a no-op
>>> foobar()
>>> unblock() <- first here the signals should be unblocked again
>> Block/unblock are not counting the number of enable/disable at
>> present. It is either on or off.
>>
>> Any unblock will immediately re-trigger all pending interrupts.
>>
>> Some of the errata patches I have out of investigating this do
>> exactly that - change:
>>
>> block to flags = set_signals(0); bar(); set_signals(flags);
>>
>> This, if nested, should be a NOP.
>>
>> However, even after fixing all of them (and their corresponding
>> kernel side counterparts), I still get reentrancy, so there is
>> something else at play too.
> Please, share a stack-trace if possible.
>
> As a side-note:
> The small issue with the code example above I can see is that what if
> flags should have changed during bar().

I see it too, but I have not figured out how to deal with it.

> And code inside bar can do
> set_signals() magic.

Correct, which is to some extent our issue.

> I am not a Linux kernel ABI expert.
>
> To me, it seems to be safer to have an ABI that tracks each signal
> blocked mask individually, and have a ref-counted block-all/unblock-all
> call. This would be like how you normally program on a CPU. You have an
> interrupt controller that you setup (masks), and a master interrupt
> enable/disable flag.

That is what signal.c is trying to simulate - you have a mask for ALRM
(or VTALRM with the older timers) and SIGIO and a global on/off.

What that fails to emulate, however, is that an IRQ is usually blocked
until it is fully serviced. This, depending on IRQ controller design,
may block all IRQs, all lower priority IRQs or none.

The current code in uml tries to block all while processing an IRQ, but
for some reason fails.

I will submit a patch to put some duct tape over this for the time
being, but we should understand what the root cause is.

A.

> --
> Stian
>
> ------------------------------------------------------------------------------
> _______________________________________________
> User-mode-linux-devel mailing list
> Use...@li...
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

From: <st...@ni...> - 2015-11-20 13:55:40

On 2015-11-20 13:50, Anton Ivanov wrote:
> On 20/11/15 12:26, st...@ni... wrote:
>>>> 4. While I can propose a brutal patch for signal.c which sets
>>>> guards against reentrancy which works fine, I suggest we
>>>> actually get to the bottom of this. Why does the code in
>>>> unblock_signals() not guard correctly against that?
>>> Thanks for hunting this issue.
>>> I fear I'll have to grab my speleologist's hat to figure out why
>>> UML works this way.
>>> Cc'ing Al, do you have an idea?
>> In the few stack-traces that I have seen posted here, I could see
>> multiple calls to unlocking of signals (with a signal occurring
>> directly after). That probably should not happen. Do we count the
>> number of times we try to block/unblock signals and only actually
>> perform the action when the counter reaches/leaves 0?
>>
>> if this series of calls happens:
>> block()
>> foo()
>> block()
>> bar()
>> unblock() <- this should be a no-op
>> foobar()
>> unblock() <- first here the signals should be unblocked again
>
> Block/unblock are not counting the number of enable/disable at
> present. It is either on or off.
>
> Any unblock will immediately re-trigger all pending interrupts.
>
> Some of the errata patches I have out of investigating this do
> exactly that - change:
>
> block to flags = set_signals(0); bar(); set_signals(flags);
>
> This, if nested, should be a NOP.
>
> However, even after fixing all of them (and their corresponding
> kernel side counterparts), I still get reentrancy, so there is
> something else at play too.

Please, share a stack-trace if possible.

As a side-note:
The small issue with the code example above I can see is that what if
flags should have changed during bar(). And code inside bar can do
set_signals() magic.

I am not a Linux kernel ABI expert.

To me, it seems to be safer to have an ABI that tracks each signal
blocked mask individually, and have a ref-counted block-all/unblock-all
call. This would be like how you normally program on a CPU. You have an
interrupt controller that you setup (masks), and a master interrupt
enable/disable flag.

--
Stian

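The ref-counted block-all/unblock-all idea in the message above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: the names block_signals_nested(), unblock_signals_nested(), block_depth and install_handler() are hypothetical, SIGUSR1 stands in for the real SIGIO/SIGALRM, and none of this is the code in UML's signal.c. Only the outermost block masks the signal and only the matching outermost unblock unmasks it, so a nested unblock becomes a no-op.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical names; this is not the code in UML's signal.c. */
static int block_depth;                /* nesting level of block calls */
static volatile sig_atomic_t handled;  /* bumped by the demo handler */

static void on_signal(int sig)
{
	(void)sig;
	handled++;
}

/* Install the demo handler for SIGUSR1 (stand-in for SIGIO/SIGALRM). */
static void install_handler(void)
{
	struct sigaction sa;

	sa.sa_handler = on_signal;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR1, &sa, NULL);
}

/* Only the outermost call actually masks the signal. */
static void block_signals_nested(void)
{
	sigset_t set;

	if (block_depth++ == 0) {
		sigemptyset(&set);
		sigaddset(&set, SIGUSR1);
		sigprocmask(SIG_BLOCK, &set, NULL);
	}
}

/* Only the call that brings the depth back to 0 unmasks the signal. */
static void unblock_signals_nested(void)
{
	sigset_t set;

	assert(block_depth > 0);
	if (--block_depth == 0) {
		sigemptyset(&set);
		sigaddset(&set, SIGUSR1);
		sigprocmask(SIG_UNBLOCK, &set, NULL);
	}
}
```

With two nested block calls, a signal raised in between stays pending through the first unblock and is delivered only when the second unblock brings the depth back to zero. A single-threaded process is assumed; the sketch does not address what happens if code between the calls changes the signal mask by other means, which is the concern raised in the side-note.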
From: Anton I. <ant...@ko...> - 2015-11-20 12:50:25

On 20/11/15 12:26, st...@ni... wrote:
>>> 4. While I can propose a brutal patch for signal.c which sets guards
>>> against reentrancy which works fine, I suggest we actually get to
>>> the bottom of this. Why does the code in unblock_signals() not
>>> guard correctly against that?
>> Thanks for hunting this issue.
>> I fear I'll have to grab my speleologist's hat to figure out why UML
>> works this way.
>> Cc'ing Al, do you have an idea?
> In the few stack-traces that I have seen posted here, I could see
> multiple calls to unlocking of signals (with a signal occurring
> directly after). That probably should not happen. Do we count the
> number of times we try to block/unblock signals and only actually
> perform the action when the counter reaches/leaves 0?
>
> if this series of calls happens:
> block()
> foo()
> block()
> bar()
> unblock() <- this should be a no-op
> foobar()
> unblock() <- first here the signals should be unblocked again

Block/unblock are not counting the number of enable/disable at present.
It is either on or off.

Any unblock will immediately re-trigger all pending interrupts.

Some of the errata patches I have out of investigating this do exactly
that - change:

block to flags = set_signals(0); bar(); set_signals(flags);

This, if nested, should be a NOP.

However, even after fixing all of them (and their corresponding kernel
side counterparts), I still get reentrancy, so there is something else
at play too.

In any case, the errata should be fixed, I will sort it out, organize it
into a patch set and send it out by Monday.

A.

> Stian
>
> ------------------------------------------------------------------------------
> _______________________________________________
> User-mode-linux-devel mailing list
> Use...@li...
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

From: Anton I. <ant...@ko...> - 2015-11-20 12:45:54

It works correctly if I insert a guard around the interrupt handlers as
well as into unblock_signals which prevents re-entrancy.

I can clean that and send it in as well as the various irq/signal errata
I have dug out while hunting this one.

A

On 20/11/15 12:16, Richard Weinberger wrote:
> On Fri, Nov 20, 2015 at 1:05 PM, Anton Ivanov
> <ant...@ko...> wrote:
>> I have gotten to the bottom of this.
>>
>> 1. The IRQ handler re-entrancy issue predates the timer patch. Adding a
>> simple guard with a WARN_ON_ONCE around the device loop in the
>> sigio_handler catches it in plain 4.3
>>
>> diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
>> index 23cb935..ac0bbce 100644
>> --- a/arch/um/kernel/irq.c
>> +++ b/arch/um/kernel/irq.c
>> @@ -30,12 +30,17 @@ static struct irq_fd **last_irq_ptr = &active_fds;
>>
>>  extern void free_irqs(void);
>>
>> +static int in_poll_handler = 0;
>> +
>>  void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
>>  {
>>  	struct irq_fd *irq_fd;
>>  	int n;
>>
>> +	WARN_ON_ONCE(in_poll_handler == 1);
>> +
>>  	while (1) {
>> +		in_poll_handler = 1;
>>  		n = os_waiting_for_events(active_fds);
>>  		if (n <= 0) {
>>  			if (n == -EINTR)
>> @@ -51,6 +56,7 @@ void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
>>  			}
>>  		}
>>  	}
>> +	in_poll_handler = 0;
>>
>>  	free_irqs();
>>  }
>>
>> This is dangerously broken - you can under heavy IO exhaust the stack,
>> you can get packets out of order, etc. Most IO is reasonably atomic so
>> corruption is not likely, but not impossible (especially if one or more
>> drivers are optimized to use multi-read/multi-write).
>>
>> 2. I cannot catch what is wrong with the current code in signal.c. When
>> I read it, it should not produce re-entrancy. But it does.
>>
>> 3. I found 2-3 minor issues with signal handling and the timer patch
>> which I will submit a hot-fix for, including a proper fix for the
>> hang-in-sleep issue.
>>
>> 4. While I can propose a brutal patch for signal.c which sets guards
>> against reentrancy which works fine, I suggest we actually get to the
>> bottom of this. Why does the code in unblock_signals() not guard
>> correctly against that?
> Thanks for hunting this issue.
> I fear I'll have to grab my speleologist's hat to figure out why UML
> works this way.
> Cc'ing Al, do you have an idea?

From: <st...@ni...> - 2015-11-20 12:33:47

>> 4. While I can propose a brutal patch for signal.c which sets guards
>> against reentrancy which works fine, I suggest we actually get to
>> the bottom of this. Why does the code in unblock_signals() not
>> guard correctly against that?
>
> Thanks for hunting this issue.
> I fear I'll have to grab my speleologist's hat to figure out why UML
> works this way.
> Cc'ing Al, do you have an idea?

In the few stack-traces that I have seen posted here, I could see
multiple calls to unlocking of signals (with a signal occurring directly
after). That probably should not happen. Do we count the number of times
we try to block/unblock signals and only actually perform the action
when the counter reaches/leaves 0?

if this series of calls happens:
block()
foo()
block()
bar()
unblock() <- this should be a no-op
foobar()
unblock() <- first here the signals should be unblocked again

Stian

From: Richard W. <ric...@gm...> - 2015-11-20 12:16:54

On Fri, Nov 20, 2015 at 1:05 PM, Anton Ivanov
<ant...@ko...> wrote:
> I have gotten to the bottom of this.
>
> 1. The IRQ handler re-entrancy issue predates the timer patch. Adding a
> simple guard with a WARN_ON_ONCE around the device loop in the
> sigio_handler catches it in plain 4.3
>
> diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
> index 23cb935..ac0bbce 100644
> --- a/arch/um/kernel/irq.c
> +++ b/arch/um/kernel/irq.c
> @@ -30,12 +30,17 @@ static struct irq_fd **last_irq_ptr = &active_fds;
>
>  extern void free_irqs(void);
>
> +static int in_poll_handler = 0;
> +
>  void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
>  {
>  	struct irq_fd *irq_fd;
>  	int n;
>
> +	WARN_ON_ONCE(in_poll_handler == 1);
> +
>  	while (1) {
> +		in_poll_handler = 1;
>  		n = os_waiting_for_events(active_fds);
>  		if (n <= 0) {
>  			if (n == -EINTR)
> @@ -51,6 +56,7 @@ void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
>  			}
>  		}
>  	}
> +	in_poll_handler = 0;
>
>  	free_irqs();
>  }
>
> This is dangerously broken - you can under heavy IO exhaust the stack,
> you can get packets out of order, etc. Most IO is reasonably atomic so
> corruption is not likely, but not impossible (especially if one or more
> drivers are optimized to use multi-read/multi-write).
>
> 2. I cannot catch what is wrong with the current code in signal.c. When
> I read it, it should not produce re-entrancy. But it does.
>
> 3. I found 2-3 minor issues with signal handling and the timer patch
> which I will submit a hot-fix for, including a proper fix for the
> hang-in-sleep issue.
>
> 4. While I can propose a brutal patch for signal.c which sets guards
> against reentrancy which works fine, I suggest we actually get to the
> bottom of this. Why does the code in unblock_signals() not guard
> correctly against that?

Thanks for hunting this issue.
I fear I'll have to grab my speleologist's hat to figure out why UML
works this way.
Cc'ing Al, do you have an idea?

--
Thanks,
//richard

From: Anton I. <ant...@ko...> - 2015-11-20 12:05:09

I have gotten to the bottom of this.

1. The IRQ handler re-entrancy issue predates the timer patch. Adding a
simple guard with a WARN_ON_ONCE around the device loop in the
sigio_handler catches it in plain 4.3

diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
index 23cb935..ac0bbce 100644
--- a/arch/um/kernel/irq.c
+++ b/arch/um/kernel/irq.c
@@ -30,12 +30,17 @@ static struct irq_fd **last_irq_ptr = &active_fds;
 
 extern void free_irqs(void);
 
+static int in_poll_handler = 0;
+
 void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
 {
 	struct irq_fd *irq_fd;
 	int n;
 
+	WARN_ON_ONCE(in_poll_handler == 1);
+
 	while (1) {
+		in_poll_handler = 1;
 		n = os_waiting_for_events(active_fds);
 		if (n <= 0) {
 			if (n == -EINTR)
@@ -51,6 +56,7 @@ void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
 			}
 		}
 	}
+	in_poll_handler = 0;
 
 	free_irqs();
 }

This is dangerously broken - you can under heavy IO exhaust the stack,
you can get packets out of order, etc. Most IO is reasonably atomic so
corruption is not likely, but not impossible (especially if one or more
drivers are optimized to use multi-read/multi-write).

2. I cannot catch what is wrong with the current code in signal.c. When
I read it, it should not produce re-entrancy. But it does.

3. I found 2-3 minor issues with signal handling and the timer patch
which I will submit a hot-fix for, including a proper fix for the
hang-in-sleep issue.

4. While I can propose a brutal patch for signal.c which sets guards
against reentrancy which works fine, I suggest we actually get to the
bottom of this. Why does the code in unblock_signals() not guard
correctly against that?

A.

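The patch above only warns when the handler is re-entered. A guard that actually prevents the recursion, in the spirit of point 4, might look like the user-space sketch below. It is not the patch Anton refers to: guarded_handler(), do_work() and all the counters are hypothetical names, and the nested interrupt is simulated by a direct call. A nested entry is not serviced on the spot; it sets a flag and returns, and the outermost invocation loops once more to pick up the deferred work, so the stack depth stays bounded.

```c
#include <assert.h>

static int in_handler;       /* set while the handler body is running */
static int rerun_requested;  /* a nested entry arrived and was deferred */
static int depth;            /* current nesting of the handler body */
static int max_depth;        /* deepest nesting actually observed */
static int passes;           /* how many times the body has run */

static void guarded_handler(void);

/* Stand-in for driver work that re-enables IRQs and so re-enters us. */
static void do_work(void)
{
	if (passes == 1)
		guarded_handler();  /* simulated nested IRQ, first pass only */
}

static void guarded_handler(void)
{
	if (in_handler) {
		/* Nested entry: do not recurse, ask the outer call to loop. */
		rerun_requested = 1;
		return;
	}

	in_handler = 1;
	do {
		rerun_requested = 0;
		depth++;
		if (depth > max_depth)
			max_depth = depth;
		passes++;
		do_work();
		depth--;
	} while (rerun_requested);
	in_handler = 0;
}
```

One call to guarded_handler() runs the body twice, once for the original event and once for the deferred one, while the nesting depth never exceeds 1. In a real signal handler the flags would at least need to be volatile sig_atomic_t; plain int is used here only to keep the sketch short.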
From: Lorenzo C. <lo...@go...> - 2015-11-18 14:37:28

On gcc Ubuntu 4.8.4-2ubuntu1~14.04, linking vmlinux fails with:

arch/um/os-Linux/built-in.o: In function `os_timer_create':
/android/kernel/android/arch/um/os-Linux/time.c:51: undefined reference to `timer_create'
arch/um/os-Linux/built-in.o: In function `os_timer_set_interval':
/android/kernel/android/arch/um/os-Linux/time.c:84: undefined reference to `timer_settime'
arch/um/os-Linux/built-in.o: In function `os_timer_remain':
/android/kernel/android/arch/um/os-Linux/time.c:109: undefined reference to `timer_gettime'
arch/um/os-Linux/built-in.o: In function `os_timer_one_shot':
/android/kernel/android/arch/um/os-Linux/time.c:132: undefined reference to `timer_settime'
arch/um/os-Linux/built-in.o: In function `os_timer_disable':
/android/kernel/android/arch/um/os-Linux/time.c:145: undefined reference to `timer_settime'

This is because -lrt appears in the generated link commandline
before arch/um/os-Linux/built-in.o. Fix this by removing -lrt from
arch/um/Makefile and adding it to the UM-specific section of
scripts/link-vmlinux.sh.

Signed-off-by: Lorenzo Colitti <lo...@go...>
---
 arch/um/Makefile        | 2 +-
 scripts/link-vmlinux.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/um/Makefile b/arch/um/Makefile
index 25ed409..e3abe6f 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -131,7 +131,7 @@ export LDS_ELF_FORMAT := $(ELF_FORMAT)
 # The wrappers will select whether using "malloc" or the kernel allocator.
 LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc
 
-LD_FLAGS_CMDLINE = $(foreach opt,$(LDFLAGS),-Wl,$(opt)) -lrt
+LD_FLAGS_CMDLINE = $(foreach opt,$(LDFLAGS),-Wl,$(opt))
 
 # Used by link-vmlinux.sh which has special support for um link
 export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 1a10d8a..dacf71a 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -62,7 +62,7 @@ vmlinux_link()
 			-Wl,--start-group                                    \
 				 ${KBUILD_VMLINUX_MAIN}                      \
 			-Wl,--end-group                                      \
-			-lutil ${1}
+			-lutil -lrt ${1}
 		rm -f linux
 	fi
 }
--
2.6.0.rc2.230.g3dd15c0

From: Richard W. <ri...@no...> - 2015-11-18 13:40:51

Hi!

On 18.11.2015 at 14:32, Lorenzo Colitti wrote:
> On Wed, Nov 18, 2015 at 5:06 PM, Richard Weinberger <ri...@no...> wrote:
>>
>>> That command line doesn't work, but if you remove the -lrt and put it
>>> at the end of the line, it starts working. Is the order significant?
>>> Or is it a bug in GCC's command line parsing?
>>
>> The order matters.
>> -lrt has to be placed after all object files which need the rt library.
>> Can you double check whether this is the case?
>
> No, it's not the case. The error is when linking arch/um/os-Linux/built-in.o:
>
> arch/um/os-Linux/built-in.o: In function `os_timer_create':
> /android/kernel/android/arch/um/os-Linux/time.c:51: undefined
> reference to `timer_create'
>
> and in the generated command line, -lrt appears
> before arch/um/os-Linux/built-in.o:
>
> + gcc -Wl,-rpath,/lib64 -m64 -Wl,-rpath,/lib -Wl,--wrap,malloc
> -Wl,--wrap,free -Wl,--wrap,calloc -Wl,-m -Wl,elf_x86_64 -lrt -o
> .tmp_vmlinux1 -Wl,-T,./arch/um/kernel/vmlinux.lds init/built-in.o
> -Wl,--start-group usr/built-in.o arch/um/kernel/built-in.o
> arch/um/drivers/built-in.o arch/um/os-Linux/built-in.o
> arch/x86/crypto/built-in.o arch/x86/um/built-in.o kernel/built-in.o
> certs/built-in.o mm/built-in.o fs/built-in.o ipc/built-in.o
> security/built-in.o crypto/built-in.o block/built-in.o lib/lib.a
> lib/built-in.o drivers/built-in.o sound/built-in.o firmware/built-in.o
> net/built-in.o virt/built-in.o -Wl,--end-group -lutil
>
> Taking -lrt out of arch/um/Makefile and putting it into
> link-vmlinux.sh, as per the patch I suggested above, results in -lrt
> being the last thing on the command line, after -lutil.
>

Okay, please send a proper patch. :-)

Thanks,
//richard

From: Lorenzo C. <lo...@go...> - 2015-11-18 13:32:30

On Wed, Nov 18, 2015 at 5:06 PM, Richard Weinberger <ri...@no...> wrote:
>
> > That command line doesn't work, but if you remove the -lrt and put it
> > at the end of the line, it starts working. Is the order significant?
> > Or is it a bug in GCC's command line parsing?
>
> The order matters.
> -lrt has to be placed after all object files which need the rt library.
> Can you double check whether this is the case?

No, it's not the case. The error is when linking arch/um/os-Linux/built-in.o:

arch/um/os-Linux/built-in.o: In function `os_timer_create':
/android/kernel/android/arch/um/os-Linux/time.c:51: undefined reference to `timer_create'

and in the generated command line, -lrt appears before
arch/um/os-Linux/built-in.o:

+ gcc -Wl,-rpath,/lib64 -m64 -Wl,-rpath,/lib -Wl,--wrap,malloc
-Wl,--wrap,free -Wl,--wrap,calloc -Wl,-m -Wl,elf_x86_64 -lrt -o
.tmp_vmlinux1 -Wl,-T,./arch/um/kernel/vmlinux.lds init/built-in.o
-Wl,--start-group usr/built-in.o arch/um/kernel/built-in.o
arch/um/drivers/built-in.o arch/um/os-Linux/built-in.o
arch/x86/crypto/built-in.o arch/x86/um/built-in.o kernel/built-in.o
certs/built-in.o mm/built-in.o fs/built-in.o ipc/built-in.o
security/built-in.o crypto/built-in.o block/built-in.o lib/lib.a
lib/built-in.o drivers/built-in.o sound/built-in.o firmware/built-in.o
net/built-in.o virt/built-in.o -Wl,--end-group -lutil

Taking -lrt out of arch/um/Makefile and putting it into
link-vmlinux.sh, as per the patch I suggested above, results in -lrt
being the last thing on the command line, after -lutil.

From: Richard W. <ri...@no...> - 2015-11-18 08:51:57

If get_signal() returns us a signal to post
we must not call it again, otherwise the already
posted signal will be overridden.
Before commit a610d6e672d this was the case as we stopped
the while after a successful handle_signal().

Cc: <st...@vg...> # 3.10-
Fixes: a610d6e672d ("pull clearing RESTORE_SIGMASK into block_sigmask()")
Signed-off-by: Richard Weinberger <ri...@no...>
---
 arch/um/kernel/signal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 57acbd6..fc8be0e 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -69,7 +69,7 @@ void do_signal(struct pt_regs *regs)
 	struct ksignal ksig;
 	int handled_sig = 0;
 
-	while (get_signal(&ksig)) {
+	if (get_signal(&ksig)) {
 		handled_sig = 1;
 		/* Whee! Actually deliver the signal. */
 		handle_signal(&ksig, regs);
--
2.5.0

From: Anton I. <ant...@ko...> - 2015-11-18 08:33:46
|
Hi all,

I have looked through this and I still have not figured it out completely. Moving from the old poll controller to the new epoll one triggers deep recursive invocations of the interrupt handlers. I still do not understand why it does this, so I will park it for now and come back to it next week.

I did, however, find a number of small issues. Namely, various patches and fixes over the years have used calls to block/unblock signals and block/unblock irqs in a few places where these can create a recursion race (even with the old controller).

If I understand os-Linux/irq.c correctly, block/unblock in UML does not block or unblock the signals. It blocks/unblocks the processing of them, and unblock can (and will) result in the immediate processing of any pending signals.

So in most places, that should not be block/unblock (and respectively local_irq_enable/local_irq_disable, which invoke that). It should be save+block and restore. Otherwise you recurse by invoking the IRQ handler the moment you unblock_signals().

Additionally, if you just want to wait and be interrupted by a signal, you do not need to enable/disable IRQs - signals are always received at present. If I understand the situation correctly, irq on/off only changes whether they are processed or not.

I am going to roll my tree back to the timer patch now and go through the ones I found so far one by one and submit them separately. Once that is out of the way we can look again at the epoll controller patch. It has potential, but it makes all gremlins come out of the woodwork, so we might as well get the gremlins out of the way first.

[snip]

A. |
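The difference between block/unblock and save+block/restore described above can be sketched with a toy model. All names here (`toy_unblock`, `toy_critical_section`, `toy_handler`, `toy_run`) are hypothetical and are not UML's functions. The model only assumes what the message states: unblocking immediately processes pending signals, and handlers run with processing switched off.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these names are hypothetical, not UML's functions. */
static bool enabled;    /* is deferred signal processing switched on? */
static int pending;     /* signals received but not yet processed */
static int arrivals;    /* signals still to arrive inside critical sections */
static int depth;       /* current handler nesting */
static int max_depth;   /* deepest nesting observed */

static void toy_handler(bool save_restore);

/* Switch processing on and immediately run whatever is pending. */
static void toy_unblock(bool save_restore)
{
    enabled = true;
    while (pending > 0) {
        pending--;
        toy_handler(save_restore);
    }
}

/* A critical section somewhere below the handler. */
static void toy_critical_section(bool save_restore)
{
    bool saved = enabled;   /* state on entry */

    enabled = false;        /* block */
    if (arrivals > 0) {     /* a signal arrives and is deferred */
        arrivals--;
        pending++;
    }
    /* Plain unblock always re-enables; restore only if it was on before. */
    if (!save_restore || saved)
        toy_unblock(save_restore);
}

/* Handlers run with processing switched off. */
static void toy_handler(bool save_restore)
{
    bool on_entry = enabled;

    enabled = false;
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    toy_critical_section(save_restore);
    depth--;
    enabled = on_entry;
}

/* Deliver one signal, with n more arriving later; return peak nesting. */
static int toy_run(bool save_restore, int n)
{
    assert(n >= 0);
    enabled = false;
    pending = 1;
    arrivals = n;
    depth = 0;
    max_depth = 0;
    toy_unblock(save_restore);
    return max_depth;
}
```

With three later arrivals, `toy_run(false, 3)` nests handlers four deep because every unconditional unblock re-enters the handler, while `toy_run(true, 3)` never nests beyond one level because the restore leaves processing off inside the handler.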
From: Richard W. <ri...@no...> - 2015-11-18 08:06:40
|
On 18.11.2015 at 08:10, Lorenzo Colitti wrote:
> On Wed, Nov 18, 2015 at 4:00 PM, Anton Ivanov
> <ant...@ko...> wrote:
>> It is.
>>
>> You need -lrt to link in HR timers. However, the original patch should add
>> that to the library list. I need to understand why it does not in the dm
>> tree.
>
> I noticed that command already does contain "-lrt", just in a
> different place. It starts off with:
>
> gcc -Wl,-rpath,/lib64 -m64 -Wl,-rpath,/lib -Wl,--wrap,malloc
> -Wl,--wrap,free -Wl,--wrap,calloc -Wl,-m -Wl,elf_x86_64 -lrt <...>
>
> That command line doesn't work, but if you remove the -lrt and put it
> at the end of the line, it starts working. Is the order significant?
> Or is it a bug in GCC's command line parsing?

The order matters.
-lrt has to be placed after all object files which need the rt library.
Can you double check whether this is the case?

Thanks,
//richard |
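Why the position of -lrt matters can be sketched with a deliberately simplified model of a single left-to-right link pass. This is an assumption-laden toy, not how ld is implemented: `toy_link` and the `toy_item` type are invented names, each item carries only one symbol, and real linkers also deal with shared libraries, `--as-needed` and `--start-group`. The one rule modelled is that a library only satisfies references that were already seen before it on the command line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only; a real linker is far more involved. */
enum toy_kind { TOY_OBJECT, TOY_LIBRARY };

struct toy_item {
    enum toy_kind kind;
    const char *symbol;  /* referenced (object) or defined (library) */
};

#define TOY_MAX_UNDEFINED 8

/* One left-to-right pass; returns how many symbols stay undefined. */
static int toy_link(const struct toy_item *items, size_t n)
{
    const char *undefined[TOY_MAX_UNDEFINED];
    int count = 0;

    for (size_t i = 0; i < n; i++) {
        if (items[i].kind == TOY_OBJECT) {
            assert(count < TOY_MAX_UNDEFINED);
            undefined[count++] = items[i].symbol;
            continue;
        }
        /* A library only satisfies references seen before it. */
        for (int j = 0; j < count; ) {
            if (strcmp(undefined[j], items[i].symbol) == 0)
                undefined[j] = undefined[--count];
            else
                j++;
        }
    }
    return count;
}

/* Library first, as in the failing command line. */
static int toy_lrt_first(void)
{
    const struct toy_item cmd[] = {
        { TOY_LIBRARY, "timer_create" },  /* stands in for -lrt */
        { TOY_OBJECT,  "timer_create" },  /* stands in for built-in.o */
    };
    return toy_link(cmd, sizeof cmd / sizeof cmd[0]);
}

/* Library last, as in the working command line. */
static int toy_lrt_last(void)
{
    const struct toy_item cmd[] = {
        { TOY_OBJECT,  "timer_create" },
        { TOY_LIBRARY, "timer_create" },
    };
    return toy_link(cmd, sizeof cmd / sizeof cmd[0]);
}
```

In the model, putting the library first leaves one undefined symbol, and putting it last leaves none, which mirrors the behaviour reported in the thread.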
From: Lorenzo C. <lo...@go...> - 2015-11-18 07:11:01
|
On Wed, Nov 18, 2015 at 4:00 PM, Anton Ivanov <ant...@ko...> wrote:
> It is.
>
> You need -lrt to link in HR timers. However, the original patch should add
> that to the library list. I need to understand why it does not in the dm
> tree.

I noticed that command already does contain "-lrt", just in a different place. It starts off with:

gcc -Wl,-rpath,/lib64 -m64 -Wl,-rpath,/lib -Wl,--wrap,malloc
-Wl,--wrap,free -Wl,--wrap,calloc -Wl,-m -Wl,elf_x86_64 -lrt <...>

That command line doesn't work, but if you remove the -lrt and put it at the end of the line, it starts working. Is the order significant? Or is it a bug in GCC's command line parsing? |
From: Anton I. <ant...@ko...> - 2015-11-18 07:00:20
|
On 18/11/15 01:20, Lorenzo Colitti wrote:
> On Sat, Nov 7, 2015 at 6:56 AM, Richard Weinberger <ri...@no...> wrote:
>> On 02.11.2015 at 17:16, Anton Ivanov wrote:
>>> Background: UML is using an obsolete itimer call for
>>> all timers and "polls" for kernel space timer firing
>>> in its userspace portion resulting in a long list
>>> of bugs and incorrect behaviour(s). It also uses
>>> ITIMER_VIRTUAL for its timer which results in the
>>> timer being dependent on it running and the cpu
>>> load.
>>>
>>> This patch fixes this by moving to posix high resolution
>>> timers firing off CLOCK_MONOTONIC and relaying the timer
>>> correctly to the UML userspace.
>>>
>>> Signed-off-by: Thomas Meyer <th...@m3...>
>>> Signed-off-by: Anton Ivanov <ai...@br...>
>> Applied!
> Looks like this broke ARCH=um SUBARCH=x86-64 builds on (at least)
> David Miller's net-next tree. On ubuntu 14.04, gcc (Ubuntu
> 4.8.4-2ubuntu1~14.04) 4.8.4, I get:
>
> arch/um/os-Linux/built-in.o: In function `os_timer_create':
> /android/kernel/android/arch/um/os-Linux/time.c:51: undefined
> reference to `timer_create'
> arch/um/os-Linux/built-in.o: In function `os_timer_set_interval':
> /android/kernel/android/arch/um/os-Linux/time.c:84: undefined
> reference to `timer_settime'
> arch/um/os-Linux/built-in.o: In function `os_timer_remain':
> /android/kernel/android/arch/um/os-Linux/time.c:109: undefined
> reference to `timer_gettime'
> arch/um/os-Linux/built-in.o: In function `os_timer_one_shot':
> /android/kernel/android/arch/um/os-Linux/time.c:132: undefined
> reference to `timer_settime'
> arch/um/os-Linux/built-in.o: In function `os_timer_disable':
> /android/kernel/android/arch/um/os-Linux/time.c:145: undefined
> reference to `timer_settime'
> collect2: error: ld returned 1 exit status
>
> The command being run is this:
>
> + gcc -Wl,-rpath,/lib64 -m64 -Wl,-rpath,/lib -Wl,--wrap,malloc
> -Wl,--wrap,free -Wl,--wrap,calloc -Wl,-m -Wl,elf_x86_64 -lrt -o
> .tmp_vmlinux1 -Wl,-T,./arch/um/kernel/vmlinux.lds init/built-in.o
> -Wl,--start-group usr/built-in.o arch/um/kernel/built-in.o
> arch/um/drivers/built-in.o arch/um/os-Linux/built-in.o
> arch/x86/crypto/built-in.o arch/x86/um/built-in.o kernel/built-in.o
> certs/built-in.o mm/built-in.o fs/built-in.o ipc/built-in.o
> security/built-in.o crypto/built-in.o block/built-in.o lib/lib.a
> lib/built-in.o drivers/built-in.o sound/built-in.o firmware/built-in.o
> net/built-in.o virt/built-in.o -Wl,--end-group -lutil
>
> The following patch results in a working kernel, but I have no idea if
> it's correct. Thoughts?

It is.

You need -lrt to link in HR timers. However, the original patch should add that to the library list. I need to understand why it does not in the dm tree.

A.

>
> diff --git a/arch/um/Makefile b/arch/um/Makefile
> index 25ed409..e3abe6f 100644
> --- a/arch/um/Makefile
> +++ b/arch/um/Makefile
> @@ -131,7 +131,7 @@ export LDS_ELF_FORMAT := $(ELF_FORMAT)
>  # The wrappers will select whether using "malloc" or the kernel allocator.
>  LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc
>
> -LD_FLAGS_CMDLINE = $(foreach opt,$(LDFLAGS),-Wl,$(opt)) -lrt
> +LD_FLAGS_CMDLINE = $(foreach opt,$(LDFLAGS),-Wl,$(opt))
>
>  # Used by link-vmlinux.sh which has special support for um link
>  export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE)
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 1a10d8a..dacf71a 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -62,7 +62,7 @@ vmlinux_link()
>  -Wl,--start-group \
>  ${KBUILD_VMLINUX_MAIN} \
>  -Wl,--end-group \
> - -lutil ${1}
> + -lutil -lrt ${1}
>  rm -f linux
>  fi
>  } |
From: Lorenzo C. <lo...@go...> - 2015-11-18 01:21:25
|
On Sat, Nov 7, 2015 at 6:56 AM, Richard Weinberger <ri...@no...> wrote:
> On 02.11.2015 at 17:16, Anton Ivanov wrote:
> > Background: UML is using an obsolete itimer call for
> > all timers and "polls" for kernel space timer firing
> > in its userspace portion resulting in a long list
> > of bugs and incorrect behaviour(s). It also uses
> > ITIMER_VIRTUAL for its timer which results in the
> > timer being dependent on it running and the cpu
> > load.
> >
> > This patch fixes this by moving to posix high resolution
> > timers firing off CLOCK_MONOTONIC and relaying the timer
> > correctly to the UML userspace.
> >
> > Signed-off-by: Thomas Meyer <th...@m3...>
> > Signed-off-by: Anton Ivanov <ai...@br...>
>
> Applied!

Looks like this broke ARCH=um SUBARCH=x86-64 builds on (at least) David Miller's net-next tree. On ubuntu 14.04, gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4, I get:

arch/um/os-Linux/built-in.o: In function `os_timer_create':
/android/kernel/android/arch/um/os-Linux/time.c:51: undefined reference to `timer_create'
arch/um/os-Linux/built-in.o: In function `os_timer_set_interval':
/android/kernel/android/arch/um/os-Linux/time.c:84: undefined reference to `timer_settime'
arch/um/os-Linux/built-in.o: In function `os_timer_remain':
/android/kernel/android/arch/um/os-Linux/time.c:109: undefined reference to `timer_gettime'
arch/um/os-Linux/built-in.o: In function `os_timer_one_shot':
/android/kernel/android/arch/um/os-Linux/time.c:132: undefined reference to `timer_settime'
arch/um/os-Linux/built-in.o: In function `os_timer_disable':
/android/kernel/android/arch/um/os-Linux/time.c:145: undefined reference to `timer_settime'
collect2: error: ld returned 1 exit status

The command being run is this:

+ gcc -Wl,-rpath,/lib64 -m64 -Wl,-rpath,/lib -Wl,--wrap,malloc
-Wl,--wrap,free -Wl,--wrap,calloc -Wl,-m -Wl,elf_x86_64 -lrt -o
.tmp_vmlinux1 -Wl,-T,./arch/um/kernel/vmlinux.lds init/built-in.o
-Wl,--start-group usr/built-in.o arch/um/kernel/built-in.o
arch/um/drivers/built-in.o arch/um/os-Linux/built-in.o
arch/x86/crypto/built-in.o arch/x86/um/built-in.o kernel/built-in.o
certs/built-in.o mm/built-in.o fs/built-in.o ipc/built-in.o
security/built-in.o crypto/built-in.o block/built-in.o lib/lib.a
lib/built-in.o drivers/built-in.o sound/built-in.o firmware/built-in.o
net/built-in.o virt/built-in.o -Wl,--end-group -lutil

The following patch results in a working kernel, but I have no idea if it's correct. Thoughts?

diff --git a/arch/um/Makefile b/arch/um/Makefile
index 25ed409..e3abe6f 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -131,7 +131,7 @@ export LDS_ELF_FORMAT := $(ELF_FORMAT)
 # The wrappers will select whether using "malloc" or the kernel allocator.
 LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc
 
-LD_FLAGS_CMDLINE = $(foreach opt,$(LDFLAGS),-Wl,$(opt)) -lrt
+LD_FLAGS_CMDLINE = $(foreach opt,$(LDFLAGS),-Wl,$(opt))
 
 # Used by link-vmlinux.sh which has special support for um link
 export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 1a10d8a..dacf71a 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -62,7 +62,7 @@ vmlinux_link()
 		-Wl,--start-group \
 		${KBUILD_VMLINUX_MAIN} \
 		-Wl,--end-group \
-		-lutil ${1}
+		-lutil -lrt ${1}
 		rm -f linux
 	fi
 } |
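For readers unfamiliar with the three functions named in the link errors above, here is a minimal sketch of a one-shot POSIX timer on CLOCK_MONOTONIC. It is not the code from arch/um/os-Linux/time.c: the helper name `sketch_arm_and_query` is invented, and `SIGEV_NONE` is used as a simplifying assumption so that the timer is only polled and never raises a signal. On older glibc versions these calls live in librt, which is why -lrt has to be on the link line at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/*
 * Arm a one-shot CLOCK_MONOTONIC timer for `seconds` and return how many
 * nanoseconds remain right after arming, or -1 on error.
 * Hypothetical helper for illustration only.
 */
static long long sketch_arm_and_query(time_t seconds)
{
    struct sigevent sev;
    struct itimerspec its, left;
    timer_t timer;
    long long remaining;

    assert(seconds > 0);

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_NONE;  /* no signal: the timer is only polled */
    if (timer_create(CLOCK_MONOTONIC, &sev, &timer) != 0)
        return -1;

    memset(&its, 0, sizeof(its));
    its.it_value.tv_sec = seconds;  /* it_interval stays zero: one shot */
    if (timer_settime(timer, 0, &its, NULL) != 0 ||
        timer_gettime(timer, &left) != 0) {
        timer_delete(timer);
        return -1;
    }

    remaining = (long long)left.it_value.tv_sec * 1000000000LL
              + left.it_value.tv_nsec;
    timer_delete(timer);
    return remaining;
}
```

Calling `sketch_arm_and_query(5)` should return a positive value no larger than five seconds' worth of nanoseconds. When building against a glibc that still keeps these functions in librt, the library goes after the object file, for example `gcc sketch.o -lrt`.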
From: Vegard N. <veg...@or...> - 2015-11-16 17:46:32
|
On 11/16/2015 06:32 PM, Richard Weinberger wrote:
> On 16.11.2015 at 12:49, Vegard Nossum wrote:
>> On 11/16/2015 12:44 PM, Richard Weinberger wrote:
>>> On 16.11.2015 at 10:43, Vegard Nossum wrote:
>>>> Starting UML like this:
>>>>
>>>> ./vmlinux rootfstype=hostfs rw ignore_console_loglevel con=xterm init=/bin/bash
>>>>
>>>> Results in unpredictable behaviour, most of the time an xterm flashes on
>>>> my screen but the process aborts with only "Aborted" on the console
>>>> where I ran the command, sometimes the xterm remains there but frozen,
>>>> sometimes the xterm spews this warning non-stop:
>>>
>>> Hmm, is this a new regression?
>>> I bet it only happens with con=xterm, right?
>>
>> It's the first UML kernel I compile in a few years, so I don't know if
>> it's old or new, sorry.
>>
>> Yes, only con=xterm triggers this.
>
> /me found some odd stuff.
>
> arch/um/drivers/chan_user.c tries to call sigsuspend() on the host side.
> But sadly the kernel has also a function with the same name.
> So, chan_user.c calls into the UML kernel instead of the host.
> This seems to work by accident but confuses the Linux signal logic
> and you trigger from time to time the WARN_ON().
>
> From a quick look, the kernel sigsuspend() has no users except in the same
> object file. So we can mark it static and UML calls the real one.
>
> Does the attached patch help?
> I'm sure we need more work as this clearly never worked as expected. :-(

Perfect, with your patch xterm works every time.

(With the earlyprintk tip you gave earlier, I was also able to use con0=pts -- con0=pty doesn't seem to find any host devices, but that's probably expected.)

Thanks a lot!

Vegard |
From: Richard W. <ri...@no...> - 2015-11-16 17:32:51
|
On 16.11.2015 at 12:49, Vegard Nossum wrote:
> On 11/16/2015 12:44 PM, Richard Weinberger wrote:
>> On 16.11.2015 at 10:43, Vegard Nossum wrote:
>>> Starting UML like this:
>>>
>>> ./vmlinux rootfstype=hostfs rw ignore_console_loglevel con=xterm init=/bin/bash
>>>
>>> Results in unpredictable behaviour, most of the time an xterm flashes on
>>> my screen but the process aborts with only "Aborted" on the console
>>> where I ran the command, sometimes the xterm remains there but frozen,
>>> sometimes the xterm spews this warning non-stop:
>>
>> Hmm, is this a new regression?
>> I bet it only happens with con=xterm, right?
>
> It's the first UML kernel I compile in a few years, so I don't know if
> it's old or new, sorry.
>
> Yes, only con=xterm triggers this.

/me found some odd stuff.

arch/um/drivers/chan_user.c tries to call sigsuspend() on the host side. But sadly the kernel also has a function with the same name. So, chan_user.c calls into the UML kernel instead of the host. This seems to work by accident but confuses the Linux signal logic, and from time to time you trigger the WARN_ON().

From a quick look, the kernel sigsuspend() has no users except in the same object file. So we can mark it static and UML calls the real one.

Does the attached patch help?
I'm sure we need more work as this clearly never worked as expected. :-(

Thanks,
//richard |
From: Anton I. <ai...@Br...> - 2015-11-16 11:58:04
|
On 16/11/15 11:53, Richard Weinberger wrote:
> On 16.11.2015 at 12:49, Vegard Nossum wrote:
>> On 11/16/2015 12:44 PM, Richard Weinberger wrote:
>>> On 16.11.2015 at 10:43, Vegard Nossum wrote:
>>>> Starting UML like this:
>>>>
>>>> ./vmlinux rootfstype=hostfs rw ignore_console_loglevel con=xterm init=/bin/bash
>>>>
>>>> Results in unpredictable behaviour, most of the time an xterm flashes on
>>>> my screen but the process aborts with only "Aborted" on the console
>>>> where I ran the command, sometimes the xterm remains there but frozen,
>>>> sometimes the xterm spews this warning non-stop:
>>> Hmm, is this a new regression?
>>> I bet it only happens with con=xterm, right?
>> It's the first UML kernel I compile in a few years, so I don't know if
>> it's old or new, sorry.
>>
>> Yes, only con=xterm triggers this.
> Okay, let me see. I use xterm very seldom these days. And maybe nobody else noticed. :D

ignore_console_loglevel triggers this. If you do not use it, the xterm console actually works fine (I use it all the time with a Debian userspace).

A.

>
>> (For the record, I also couldn't get con=pty or con=pts to work either, it
>> just results in "Aborted."
> You can use the "earlyprintk" parameter to get more output.
>
> Thanks,
> //richard
>
> _______________________________________________
> User-mode-linux-devel mailing list
> Use...@li...
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
> |
From: Richard W. <ri...@no...> - 2015-11-16 11:53:18
|
On 16.11.2015 at 12:49, Vegard Nossum wrote:
> On 11/16/2015 12:44 PM, Richard Weinberger wrote:
>> On 16.11.2015 at 10:43, Vegard Nossum wrote:
>>> Starting UML like this:
>>>
>>> ./vmlinux rootfstype=hostfs rw ignore_console_loglevel con=xterm init=/bin/bash
>>>
>>> Results in unpredictable behaviour, most of the time an xterm flashes on
>>> my screen but the process aborts with only "Aborted" on the console
>>> where I ran the command, sometimes the xterm remains there but frozen,
>>> sometimes the xterm spews this warning non-stop:
>>
>> Hmm, is this a new regression?
>> I bet it only happens with con=xterm, right?
>
> It's the first UML kernel I compile in a few years, so I don't know if
> it's old or new, sorry.
>
> Yes, only con=xterm triggers this.

Okay, let me see. I use xterm very seldom these days.
And maybe nobody else noticed. :D

> (For the record, I also couldn't get con=pty or con=pts to work either, it
> just results in "Aborted."

You can use the "earlyprintk" parameter to get more output.

Thanks,
//richard |
From: Vegard N. <veg...@or...> - 2015-11-16 11:49:55
|
On 11/16/2015 12:44 PM, Richard Weinberger wrote:
> On 16.11.2015 at 10:43, Vegard Nossum wrote:
>> Starting UML like this:
>>
>> ./vmlinux rootfstype=hostfs rw ignore_console_loglevel con=xterm init=/bin/bash
>>
>> Results in unpredictable behaviour, most of the time an xterm flashes on
>> my screen but the process aborts with only "Aborted" on the console
>> where I ran the command, sometimes the xterm remains there but frozen,
>> sometimes the xterm spews this warning non-stop:
>
> Hmm, is this a new regression?
> I bet it only happens with con=xterm, right?

It's the first UML kernel I compile in a few years, so I don't know if it's old or new, sorry.

Yes, only con=xterm triggers this.

(For the record, I also couldn't get con=pty or con=pts to work either, it just results in "Aborted." on the console where I ran ./vmlinux -- if I try con1=pty or con1=pts it boots normally, but there is no message in the kernel log indicating the host device that was used for the console. But this could be a PEBKAC, I only mention it in case it might be relevant. Also, normal ./vmlinux without any con*= arguments works fine, but I only get the plain stdin/stdout console.)

Thanks,

Vegard |
From: Richard W. <ri...@no...> - 2015-11-16 11:44:17
|
Hi!

On 16.11.2015 at 10:43, Vegard Nossum wrote:
> Hi,
>
> Starting UML like this:
>
> ./vmlinux rootfstype=hostfs rw ignore_console_loglevel con=xterm init=/bin/bash
>
> Results in unpredictable behaviour, most of the time an xterm flashes on
> my screen but the process aborts with only "Aborted" on the console
> where I ran the command, sometimes the xterm remains there but frozen,
> sometimes the xterm spews this warning non-stop:

Hmm, is this a new regression?
I bet it only happens with con=xterm, right?

Thanks,
//richard |