From: Jan P. <Jan...@of...> - 2007-05-08 17:39:43
Jeff Dike <jd...@ad...> wrote on 05/07/2007 11:07:27 PM:

> On Sun, May 06, 2007 at 03:35:49PM +0200, Jan Ploski wrote:
> > This is on x86_64 architecture.
>
> > 2.6.21-mm1. The file system is the Debian root_fs downloaded from the
> > web site. I did an apt-get upgrade and installed a few packages. I'm
> > sending you a download link for the whole "experiment" with separate email.
>
> OK, since you are on x86_64, this is a host bug - you need to apply
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;
> a=commitdiff_plain;h=d18951834216eae82e2f9112416111b4f55f1849
>
> to the host.

Thanks, I will try this out. The bad thing is that if it's a host bug on
x86_64 then it will likely affect other clusters (on the German Grid) as
well. The major advantage I saw/see in wrapping jobs into a UML instance
was/is that the remote admins do NOT need to be bothered about tweaking
their configuration to make foreign applications execute in their
environment...

> > > If that happens again, can you attach gdb to it and see where it is?
> >
> > Ok. From my perspective this type of hang is not a big deal, as I can
> > have my wrapper script watch the log file and kill off an instance which
> > gets stuck at the end.
>
> From my perspective, it's a bug, and it may be significantly more
> trouble for the next person to hit it.

I will keep an eye on it.

> > I can reproduce it any time in my current setup. ps auxwww|grep linux
> > shows no UML processes hanging around.
>
> The offender may not be called linux. You may have processes with
> garbage names still hanging around.

How can I find out the offender? lsof does not show any processes
accessing root_fs.

Best regards -
Jan Ploski

--
Dipl.-Inform. (FH) Jan Ploski
OFFIS
Betriebliches Informationsmanagement
Escherweg 2 - 26121 Oldenburg - Germany
Fon: +49 441 9722 - 184 Fax: +49 441 9722 - 202
E-Mail: Jan...@of... - URL: http://www.offis.de