Thread: [uml-devel] UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

[uml-devel] UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Armin M. W. <Armin.Warda@ePost.De> - 2004-12-14 08:59:15

  Hi,

running UML 2.6.9-bb4 on SuSE Professional 9.2 with host kernel SuSE
2.6.8-24.5 (with SKAS) I notice the strange effect, that /tmp keeps
filling irreversibly.

/tmp space is not only consumed as long as UMLs are running (as seen
with "lsof|grep deleted"), /tmp space is _not_released_ when the UMLs
are gone. When all UMLs are gone "lsof|grep deleted" returns nothing,
but "df /tmp" still shows the allocations. In this situation attempts
to umount (or remount ro) the /tmp fail, even in single user mode.
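
A rough way to check for this state from the host, as a sketch only:
compare what "df" reports as used with what "du" can still see in the
directory tree. It assumes /tmp is its own mount and that "df -Pk" and
"du -skx" are available; the function names are made up for illustration.
A large positive difference while "lsof|grep deleted" prints nothing
would match the symptom described above.

```shell
# used_kb: read `df -Pk` output on stdin, print the "Used" column (KiB).
used_kb() {
    awk 'NR == 2 { print $3 }'
}

# visible_kb: read `du -sk` output on stdin, print the size column (KiB).
visible_kb() {
    awk 'NR == 1 { print $1 }'
}

# unaccounted_kb DIR: space the filesystem counts as used that no
# visible file under DIR explains (hypothetical helper, not a UML tool).
unaccounted_kb() {
    u=$(df -Pk "$1" | used_kb)
    v=$(du -skx "$1" 2>/dev/null | visible_kb)
    # Bail out rather than print a misleading number if either probe failed.
    [ -n "$u" ] && [ -n "$v" ] || return 1
    echo $((u - v))
}
```

Usage would be "unaccounted_kb /tmp". Small differences are normal
(filesystem metadata, block rounding); only a difference that grows with
each UML run and never shrinks points at the leak.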

The only way to recover from such a /tmp full situation is to reboot
the host (then unmount of /tmp and the rootfs fail during shutdown,
which causes fscks at the reboot...)

I tried out different filesystem types for /tmp: reiserfs 3.6, ext3
and tmpfs, no difference.

Anybody seen such behaviour before?

I suppose this must be a bug of the _host_ kernel, not of the UML
kernel. Do you agree?

  Armin.

-- 
   --- May the Source be with you! Linux. ---
   --- secure eMail: http://www.gnupg.de/ ---

[uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Armin M. W. <Arm...@gm...> - 2004-12-16 18:51:07

  Hi,

no difference with SuSE host kernel 2.6.8-20041214185031.

  Armin.

On Wednesday 15 December 2004 13:20, Gerd Knorr wrote:
> On Tuesday 14 December 2004 09:58, Armin M. Warda wrote:
> > running UML 2.6.9-bb4 on SuSE Professional 9.2 with host kernel
> > SuSE 2.6.8-24.5 (with SKAS) I notice the strange effect, that
> > /tmp keeps filling irreversibly.
>
> Any change with a kernel from
> ftp://ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD/ ?
>
> These have a slightly newer skas patch in.
>
>   Gerd

[uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Armin M. W. <Arm...@gm...> - 2004-12-30 16:07:49

  Hi,

just tried again with SuSE host kernel 2.6.8-24.10: same problem.

(Is 2.6.8-24.10 a 2.6.8 with back-ported fixes from 2.6.10 ?)

  Armin.

On Thursday 16 December 2004 20:13, Paolo Giarrusso wrote:
> On Thursday 16 December 2004 19:36, Gerd Knorr wrote:
> > > Well, this matches some other reports of a different problem
> > > (impossible to unmount a fs used by UML) which probably relates
> > > to 2.6.9 host bugs.
> > >
> > > I'd suggest trying to reproduce the bug with a 2.6.8.1 +
> > > SKAS3-v7 patch, or checking what's different in the SuSE kernel
> > > from vanilla 2.6.8.1 (I'm particularly suspicious about the
> > > TASK_TRACED introduction).
> >
> > The suse 2.6.8 kernel is actually 2.6.9-rc2, thus closer to
> > 2.6.9 than 2.6.8.  In particular, it already has the TASK_TRACED
> > stuff and I've used the 2.6.9 version of the skas v7 patch
> >
> > (and that's also why I've
> > suggested to check against 2.6.9 not 2.6.8.1 vanilla to see if
> > the issue is still there).
>
> Remember that there was an invisible reference from dead UMLs that
> kept the tmpfs from being unmounted (reported multiple times, I
> experienced that too). Well, if the file is kept in use, then it will
> keep using the tmpfs space. So nothing new....

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Christopher S. A. <ca...@th...> - 2004-12-30 16:14:58

You need to run 2.6.10, as it doesn't exhibit the tmpfs filling up 
problem like 2.6.9 variants do.  The tmpfs problem is also compounded 
by the fact that UMLs without the fix-kill patch don't exit properly, 
but the 2.6.9-bb4 patchset has the fix for UML.

For 2.4-um, you can find the fix-kill patch here:
http://www.theshore.net/~caker/uml/patches/2.4.27-1um/

-Chris

[uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Gerd K. <kr...@by...> - 2005-01-03 10:40:37

> (Is 2.6.8-24.10 a 2.6.8 with back-ported fixes from 2.6.10 ?)

Yep, there is a batch of fixes backported to fix the breakouts ...

  Gerd

-- 
#define printk(args...) fprintf(stderr, ## args)

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Armin M. W. <Arm...@gm...> - 2004-12-31 11:23:24

On Thursday 30 December 2004 17:14, Christopher S. Aker wrote:
> You need to run 2.6.10,

That's supposed to be a 2.6.10 _Host_-Kernel? (Not 2.6.10 UM-Kernel?)

> as it doesn't exhibit the tmpfs filling up
> problem like 2.6.9 variants do.

Any chance to have a 2.6.10 SuSE Kernel soon?

(As I use some non-GPL kernel modules - Atheros, Soft-Modem - that
come with the SuSE kernels but are not in kernel.org's, I would
prefer to stick with the SuSE-Kernel and not switch to the vanilla
2.6.10 kernel. But if this would be the only possibility to solve
this problem, I would indeed switch...)

> The tmpfs problem is also
> compounded by the fact that UMLs without the fix-kill patch don't
> exit properly, but the 2.6.9-bb4 patchset has the fix for UML.

You say UM-Kernel 2.6.9-bb4 has that fix? (Yes, I am running 2.6.9-bb4
UM-Kernels)

> For 2.4-um, you can find the fix-kill patch here:
> http://www.theshore.net/~caker/uml/patches/2.4.27-1um/
>
> -Chris

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Christopher S. A. <ca...@th...> - 2005-01-01 20:10:10

> That's supposed to be a 2.6.10 _Host_-Kernel? (Not 2.6.10 UM-Kernel?)

Correct.

>> The tmpfs problem is also  
>> compounded by the fact that UMLs without the fix-kill patch don't
>> exit properly, but the 2.6.9-bb4 patchset has the fix for UML.

> You say UM-Kernel 2.6.9-bb4 has that fix? (Yes, I am running 2.6.9-bb4 
> UM-Kernels)

Correct.

>> For 2.4-um, you can find the fix-kill patch here:
>> http://www.theshore.net/~caker/uml/patches/2.4.27-1um/

-Chris

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Armin M. W. <Arm...@gm...> - 2005-01-02 09:25:51

  Hi,

I can positively confirm that the problem with /tmp filling
irreversibly and being unable to unmount /tmp is solved under
host kernel 2.6.10.

Unfortunately I am experiencing some new problems after the
change from host kernel SuSE 2.6.8-24.10 to vanilla 2.6.10:
submount-0.9: name of iocharset gets truncated when calling
df, slmodem-2.9.10: modules complain about unknown symbols,
swsusp: ohci1394 fails to release. But that's a different
story, has nothing to do with UML.

Hopefully the guys at SuSE pick up the /tmp-fixes from 2.6.10
and backport them to their SuSE-Kernel...

  Armin

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Armin M. W. <Arm...@gm...> - 2005-01-03 14:31:58

That was host-kernel 2.6.10 plus skas3-v7 (and UM-kernel 2.6.9-bb4).

On Sunday 02 January 2005 10:25, I wrote:
> I can positively confirm that the problem with /tmp filling
> irreversibly and being unable to unmount /tmp is solved under
> host kernel 2.6.10.

[uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Armin M. W. <Arm...@gm...> - 2005-01-03 11:53:06

  Hi Gerd,

On Monday 03 January 2005 11:13, Gerd Knorr wrote:
> > (Is 2.6.8-24.10 a 2.6.8 with back-ported fixes from 2.6.10 ?)
>
> Yep, there is a batch of fixes backported to fix the breakouts ...

Is there any chance that the bugfix for the
"/tmp-fills-irreversibly-and-cannot-be-unmounted" problem,
which seems to be fixed starting with 2.6.10-rc3-bk10 (and was
positively tested by me in the final 2.6.10), will be backported
to a SuSE 2.6.8-XX.YY host kernel, too?

...and a happy new year!

  Armin.

[uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Armin M. W. <Arm...@gm...> - 2005-01-03 14:32:05

  Hi Gerd,

On Monday 03 January 2005 14:34, you wrote:
> host or uml kernel fix?  URL?  Havn't tracked stuff at all over
> xmas holidays ...

Must be a host-Kernel fix, not a uml-kernel fix, because um-kernel
2.6.9-bb4 running on host-kernels

 -  SuSE 2.6.8-24-default
 -  SuSE 2.6.8-24.3-default
 -  SuSE 2.6.8-24.5-default
 -  SuSE 2.6.8-24.10-default

(which are actually 2.6.9-rc2, as you stated in earlier mail) makes
the problem occur (I tested it), and Christopher S. Aker reports the
same problem with host-kernel

 -  2.6.9-ck3
 -  2.6.10-rc2-bk7

The same um-kernel 2.6.9-bb4 on a plain kernel.org host-kernel

 +  2.6.10

and on a kernel.org host-kernel

 +  2.6.10 with skas3-v7

does not show the problem (I tested it), and Christopher S. Aker
reports no problem with host-kernel

 +  2.6.10-rc3-bk10

Thus the bugfix must have been introduced somewhere between
2.6.10-rc2-bk7 and 2.6.10-rc3-bk10.

I scanned the changelog for 2.6.10 but until now failed to identify
the patch:

  http://www.de.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.10

Can somebody else on the list help me to identify the patch for
this "problem with /tmp filling irreversibly and being unable to
unmount /tmp" bugfix? Thanks!

  Armin.

[uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Gerd K. <kr...@by...> - 2005-01-04 12:20:18

> Thus the bugfix must have been introduced somewhere between 
> 2.6.10-rc2-bk7 and 2.6.10-rc3-bk10.

That's a pretty wide range (Nov 22 => Dec 16), anyone tried kernels
in between and can narrow that a bit?

Interestingly mm/shmem.c (which implements shmfs/tmpfs) hasn't been
touched since Nov 19 according to bitkeeper ...
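
Narrowing the range can be done as a plain binary search over the
snapshots someone has actually built. The following is only a sketch:
"first_good" and the predicate it calls are hypothetical names, and it
assumes the first snapshot in the list is known bad, the last is known
good, and the fix went in exactly once in between.

```shell
# first_good PREDICATE v1 v2 ... vN
# PREDICATE is a command (or shell function) that exits 0 when the given
# snapshot no longer shows the bug. v1 must be bad, vN must be good.
first_good() {
    pred=$1
    shift
    lo=1      # index of the newest snapshot known to be bad
    hi=$#     # index of the oldest snapshot known to be good
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        # Fetch the mid-th positional parameter.
        eval "candidate=\${$mid}"
        if "$pred" "$candidate"; then
            hi=$mid
        else
            lo=$mid
        fi
    done
    eval "printf '%s\n' \"\${$hi}\""
}
```

The predicate would have to boot the host kernel in question, run the
reproduction and report whether the umount succeeded, so in practice
each step is a manual test; the function only keeps track of which
snapshot to try next.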

  Gerd

-- 
#define printk(args...) fprintf(stderr, ## args)

[uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Armin M. W. <Arm...@gm...> - 2005-01-04 14:06:29

On Tuesday 04 January 2005 12:59, you wrote:
> Interestingly mm/shmem.c (which implements shmfs/tmpfs)
> hasn't been touched since Nov 19 according to bitkeeper ...

I don't think the problem is in tmpfs!

When I saw the problem for the 1st time, of course I immediately
blamed tmpfs and switched to a reiserfs /tmp, but to my surprise
the problem did not go away. Finally I even tested an ext3 /tmp -
just to be sure - and the problem was reproduced, too.

That was with SuSE-2.6.8-24* Host-Kernels.

  Armin.

[uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Gerd K. <kr...@by...> - 2005-01-04 15:00:25

> I don't think the problem is in tmpfs! 

Ok, good to know.

> That was with SuSE-2.6.8-24* Host-Kernels.

Hmm, it's not reproducible here, I can umount tmpfs after running
uml just fine.  Are any other conditions needed to trigger it?

  Gerd

-- 
#define printk(args...) fprintf(stderr, ## args)

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Michael R. <mc...@sa...> - 2005-01-04 15:06:30

>>>>> "Armin" == Armin M Warda <Arm...@gm...> writes:
    Armin> I don't think the problem is in tmpfs!

    Armin> When I saw the problem for the 1st time, of course I
    Armin> immediately blamed tmpfs and switched to a reiserfs /tmp, but
    Armin> to my surprise the problem did not go away. Finally I even
    Armin> tested an ext3 /tmp - just to be sure - and the problem was
    Armin> reproduced, too.

  This isn't a new problem to us.
  We see it regularly. In *most* cases (but I can't say *ALL*) it is
because of swap files for UMLs that are no longer running, but for which
there are T-linux processes hanging around.
  kill -9/-CONT/etc. them and the disk space comes back.
  Most of the time.

  (this relates to the thread where we read /proc/XXX/environ, and so
couldn't recognize that the processes could be killed, and also to the
thread where I complained about having to -CONT a process before it
could be -9'ed)
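
The cleanup described above could be scripted roughly as follows. This
is a sketch under assumptions: it relies on procps-style "ps" output
with the state in the second column, it takes the UML binary to be
named "linux" (as in "T-linux processes"), and both function names are
invented here.

```shell
# stopped_uml_pids [NAME]: read `ps -eo pid=,stat=,comm=` output on stdin
# and print the PIDs of processes called NAME (default: linux) whose
# state starts with T or t (stopped or being traced).
stopped_uml_pids() {
    awk -v name="${1:-linux}" '$2 ~ /^[Tt]/ && $3 == name { print $1 }'
}

# resume_and_kill PID...: send SIGCONT first, then SIGKILL, since some
# leftovers reportedly could not be killed until they were continued.
resume_and_kill() {
    for pid in "$@"; do
        kill -CONT "$pid" 2>/dev/null
        kill -KILL "$pid" 2>/dev/null
    done
}
```

A possible invocation is
"resume_and_kill $(ps -eo pid=,stat=,comm= | stopped_uml_pids)",
after checking by hand that the listed PIDs really belong to UMLs that
are no longer wanted. As noted above, this brings the space back most
of the time, not always.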

-- 
]       ON HUMILITY: to err is human. To moo, bovine.           |  firewalls  [
]   Michael Richardson,    Xelerance Corporation, Ottawa, ON    |net architect[
] mc...@xe...      http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Paolo G. <bla...@ya...> - 2005-01-08 12:15:46

On Monday 03 January 2005 15:31, Armin M. Warda wrote:
>   Hi Gerd,

> On Monday 03 January 2005 14:34, you wrote:
> > host or uml kernel fix?  URL?  Havn't tracked stuff at all over
> > xmas holidays ...
>
> Must be a host-Kernel fix, not a uml-kernel fix, because um-kernel
> 2.6.9-bb4 running on host-kernels
[...]
Yes, this is definitely a host bug, which was reported to the appropriate 
people and fixed...

> I scanned the changelog for 2.6.10 but until now failed to identify
> the patch:
>
>   http://www.de.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.10

I had read the Changelog from -rc3 to final (after hearing from Christopher 
Aker that it had been fixed in -bk), and I took a quick guess at 
this... it was a fix from Roland McGrath about something strange, which made 
me guess it was the fix...

Roland McGrath:

  o fix bogus ECHILD return from wait* with zombie group leader

I'm not sure this is the fix, but it is possible indeed, given that the 
problem UML triggered was, IIRC, that when it exited, there wasn't a proper 
cleanup of the status, and the process became invisible but still kept a 
reference to the file in /tmp, preventing it from being deleted...

The changelog does not mention our particular bug, but there is a good 
reason for not mentioning it, and it's related to TASK_TRACED, which is *the* 
problem in 2.6.9 with UML.

I think you can search for it on linux.bkbits.net... I'll do that when I have 
time...

However, I've received (after 2.6.10) the answer from Roland McGrath (ptrace 
coder / maintainer) about this problem, and he sent two more patches for 
this, which I'll bundle in next SKAS as a temp. solution and forward to 
you... I'll check if it is a complete solution.

> Can somebody else on the list help me to identify the patch for
> this "problem with /tmp filling irreversibly and being unable to
> unmount /tmp" bugfix? Thanks!

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Gerd K. <kr...@by...> - 2005-01-10 13:20:26

> Roland McGrath:
> 
>   o fix bogus ECHILD return from wait* with zombie group leader
> 
> I'm not sure this is the fix, but it is possible indeed, given that the 
> problem UML triggered was, IIRC, that when it exited, there wasn't a proper 
> cleanup of the status, and the process became invisible but still kept a 
> reference to the file in /tmp, preventing it from being deleted...

Yep, that makes sense.  The race thing also explains why it doesn't
always happen.  I still can't reproduce it on my machine btw, so it's
hard for me to test whether that really fixes it or not.

> I think you can search for it on linux.bkbits.net... I'll do that when I have 
> time...

Attached below for reference.

  Gerd

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/12/17 09:18:41-08:00 ro...@re... 
#   [PATCH] fix bogus ECHILD return from wait* with zombie group leader
#   
#   Klaus Dittrich observed this bug and posted a test case for it.
#   
#   This patch fixes both that failure mode and some others possible.  What
#   Klaus saw was a false negative (i.e.  ECHILD when there was a child)
#   when the group leader was a zombie but delayed because other children
#   live; in the test program this happens in a race between the two threads
#   dying on a signal.
#   
#   The change to the TASK_TRACED case avoids a potential false positive
#   (blocking, or WNOHANG returning 0, when there are really no children
#   left), in the race condition where my_ptrace_child returns zero.
#   
#   Signed-off-by: Roland McGrath <ro...@re...>
#   Signed-off-by: Andrew Morton <ak...@os...>
#   Signed-off-by: Linus Torvalds <tor...@os...>
# 
# kernel/exit.c
#   2004/12/17 00:09:08-08:00 ro...@re... +13 -2
#   fix bogus ECHILD return from wait* with zombie group leader
# 
diff -Nru a/kernel/exit.c b/kernel/exit.c
--- a/kernel/exit.c	2005-01-10 14:00:38 +01:00
+++ b/kernel/exit.c	2005-01-10 14:00:38 +01:00
@@ -1319,6 +1319,10 @@
 
 	add_wait_queue(&current->wait_chldexit,&wait);
 repeat:
+	/*
+	 * We will set this flag if we see any child that might later
+	 * match our criteria, even if we are not able to reap it yet.
+	 */
 	flag = 0;
 	current->state = TASK_INTERRUPTIBLE;
 	read_lock(&tasklist_lock);
@@ -1337,11 +1341,14 @@
 
 			switch (p->state) {
 			case TASK_TRACED:
-				flag = 1;
 				if (!my_ptrace_child(p))
 					continue;
 				/*FALLTHROUGH*/
 			case TASK_STOPPED:
+				/*
+				 * It's stopped now, so it might later
+				 * continue, exit, or stop again.
+				 */
 				flag = 1;
 				if (!(options & WUNTRACED) &&
 				    !my_ptrace_child(p))
@@ -1377,8 +1384,12 @@
 						goto end;
 					break;
 				}
-				flag = 1;
 check_continued:
+				/*
+				 * It's running now, so it might later
+				 * exit, stop, or stop and then continue.
+				 */
+				flag = 1;
 				if (!unlikely(options & WCONTINUED))
 					continue;
 				retval = wait_task_continued(

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Christopher S. A. <ca...@th...> - 2005-01-04 15:32:33

> Hmm, it's not reproducable here, I can umount tmpfs after running
> uml just fine.  Any other conditions are needed to trigger it?
> 
>   Gerd

Here's a post of mine which details a sure fire way to reproduce it every time

http://marc.theaimsgroup.com/?l=user-mode-linux-devel&m=110324354001225&w=2

> My method for causing the problem every time (example works under 2.6.9-ck3 and
> 2.6.10-rc2-bk7)
> 
> # mount tmpfs tmp/ -t tmpfs
> # TMPDIR=./tmp/ /kernels/non-fix-kill-patched-kernel rootfs=debian.fs
> # uml_mconsole .uml/xxx/mconsole cad
> # kill -KILL <pid1> <pid2>
> # umount tmp/
> umount: /root/tmp: device is busy
> 
> (cad is set to shutdown in the UML's inittab.  After sending CAD, it hangs, which
> requires kill)

Any old kernel that doesn't have the fix-kill patch will do, but here's one:
http://www.theshore.net/~caker/uml/kernels/2.4.26-linode31-1um

-Chris

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Gerd K. <kr...@by...> - 2005-01-05 11:40:36

> > # mount tmpfs tmp/ -t tmpfs
> > # TMPDIR=./tmp/ /kernels/non-fix-kill-patched-kernel rootfs=debian.fs
> > # uml_mconsole .uml/xxx/mconsole cad
> > # kill -KILL <pid1> <pid2>

That doesn't kill all the processes, I additionally need 'kill -CONT'
for one of them.

> > # umount tmp/
> > umount: /root/tmp: device is busy

Works for me, at least when no uml kernel process left over.  Sure there
are no uml processes hanging around any more at this point?

  Gerd

-- 
#define printk(args...) fprintf(stderr, ## args)

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Armin M. W. <Arm...@gm...> - 2005-01-05 17:11:05

On Wednesday 05 January 2005 12:28, Gerd Knorr wrote:
> > > # mount tmpfs tmp/ -t tmpfs
> > > # TMPDIR=./tmp/ /kernels/non-fix-kill-patched-kernel rootfs=debian.fs
> > > # uml_mconsole .uml/xxx/mconsole cad
> > > # kill -KILL <pid1> <pid2>
>
> That doesn't kill all the processes, I additionally need 'kill
> -CONT' for one of them.

I remember that behaviour you describe from the uml-kernel that was
delivered on SuSE 9.2's Distribution media, um-host-kernel-2.6.8-24.*

But my 2.6.9-bb4 um-kernel does not show that problem, I think because
bb4 has that fix-kill patch inside.

> > > # umount tmp/
> > > umount: /root/tmp: device is busy
>
> Works for me, at least when no uml kernel process left over.
> Sure there are no uml processes hanging around any more
> at this point?

Yes, I am very sure: ps and lsof show no more uml processes, lsof
shows no processes using the /tmp any more, but still I cannot umount
it. (As I described in my original posting, see:
http://marc.theaimsgroup.com/?l=user-mode-linux-devel&m=110301482900866&w=2)

  Armin.

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Gerd K. <kr...@by...> - 2005-01-07 11:40:11

"Armin M. Warda" <Arm...@gm...> writes:

> On Wednesday 05 January 2005 12:28, Gerd Knorr wrote:
> > > > # mount tmpfs tmp/ -t tmpfs
> > > > # TMPDIR=./tmp/ /kernels/non-fix-kill-patched-kernel rootfs=debian.fs
> > > > # uml_mconsole .uml/xxx/mconsole cad
> > > > # kill -KILL <pid1> <pid2>
> >
> > That doesn't kill all the processes, I additionally need 'kill
> > -CONT' for one of them.
> 
> I remember that behaviour you describe from the uml-kernel that was 
> delivered on SuSE 9.2's Distribution media, um-host-kernel-2.6.8-24.* 
> 
> But my 2.6.9-bb4 um-kernel does not show that problem, I think because 
> bb4 has that fix-kill patch inside.

Yep, but without the fix-kill patch it's reportedly easier to trigger.  I
still can't reproduce it on my machine though for some reason, hints
very welcome.  It happens to me only when some uml process still hangs
around, and after killing it off I can umount the filesystem just fine.

  Gerd

-- 
#define printk(args...) fprintf(stderr, ## args)

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Armin M. W. <Arm...@gm...> - 2005-01-12 15:53:59

On Tuesday 11 January 2005 17:58, Gerd Knorr wrote:
> Once it is tested and confirmed that this patch actually fixes the
> issue I can take care to put it into the suse kernel and get it
> released with the next security update kernel.
> [...]
> So it would be nice if one of you guys can try that
> [...]
> The kernel-source rpm has a fully patched suse source tree, you can
> try to apply the patch there and rebuild a kernel.

#  2004/12/17 09:18:41-08:00 ro...@re...
#  [PATCH] fix bogus ECHILD return from wait* with zombie group leader

I tried to apply this patch to the SuSE kernel-source-2.6.8-24.10, but
it failed. The reason is obvious: there are other patches necessary
before applying this one.

The kernel/exit.c of SuSE 2.6.8-24.10 corresponds to bitkeeper rev
1.155, and the above "bogus ECHILD return" fix is in bk rev 1.170 of
kernel/exit.c, thus there are at least 15 additional fixes to be
considered for backporting from 2.6.10 to SuSE's 2.6.8-24.10.

This is too much for me, as I am no kernel hacker. If no one picks up
that job, then my only hope is that SuSE will finally move its SuSE
9.2 forward to 2.6.10.

On Saturday 08 January 2005 13:15, Paolo Giarrusso wrote:
> The changelog does not mention our particular bug it but there is a
> good reason for not mentioning it, and it's related to TASK_TRACED
> which is *the* problem in 2.6.9 with UML.

  Armin.

Re: [uml-devel] Re: UML fills /tmp irreversibly (2.6.9-bb4 on SuSE 9.2)

From: Gerd K. <kr...@by...> - 2005-01-17 15:30:29

> The kernel/exit.c of SuSE 2.6.8-24.10 corresponds to bitkeeper rev 
> 1.155, and the above "bogus ECHILD return" fix is in bk rev 1.170 of 
> kernel/exit.c, thus there are at least 15 additional fixes to be 
> considered for backporting from 2.6.10 to SuSE's 2.6.8-24.10.

Hmm, well, it's only 5 patches which actually touch that area, but
partly they are big and complex :-/

Sneaking in a small obviouslycorrect[tm] fix isn't a big issue, but
that's a bit too much.

> This is too much for me, as I am no kernel hacker. If no one picks up 
> that job, then my only hope is that SuSE will finally move its SuSE 
> 9.2 forward to 2.6.10.

The releases usually never ever get a new kernel version, so a 2.6.10
update for 9.2 isn't going to happen.  We are already working on the
kernel for the next release though, you can fetch experimental rpms
from http://ftp.suse.com/pub/projects/kernel/kotd/i386/HEAD/ and give
them a try.  They are 2.6.10 based.

  Gerd

-- 
#define printk(args...) fprintf(stderr, ## args)