Jeff Dike wrote:
> On Sat, May 05, 2007 at 10:33:22PM +0200, Jan Ploski wrote:
>>I repeated the tests with linux-2.6.21-rc7-mm2 (didn't compile right
>>away, required a one-liner fix)
> What fix?
kernel/tlb.c:255 contains the statement:
log_info("total flush time - %Ld nsecs\n", end_time - start_time);
"end_time" (or maybe "start_time"?) was undefined, so I just commented
out the line to get it to compile.
>>1) [Note: this has nothing to do with the previously reported problem,
>>but might be interesting anyway:] Compiling the UML kernel on SLES10
>>with its shipped gcc-4.1.0 (ignoring warnings during the compilation) is
>>a bad idea. With such a miscompiled kernel, 80% of my test cases hang at
> Hmmm, I've got 4.1.1 here with no problems and I haven't noticed
> problems with gcc recently, although I haven't kept track of what I
The version I have is gcc-4.1.0-28.4, it reports as
gcc (GCC) 4.1.0 (SUSE Linux). This is on x86_64 architecture.
>>3) I noticed that the UML console output, which I redirected to a file
>>with > in my wrapper shell script, was being randomly truncated. As a
>>remedy, I changed "con0=fd:0,fd:1" to "con0=pts,fd:1" on the command
>>line. Now I'm getting the complete console output from each run
>>collected, which is good.
> One UML per log file, so they're not stepping on each other?
Yes. Each UML has its own working directory.
>>4) I am now experiencing random segmentation faults - for example, in 18
>>out of 842 UML instances in today's test. The root_fs is Debian stable,
>>so I wouldn't blame it. It also does not seem to be flaky hardware, as
>>the instances crash on different hardware nodes. In over half of the
>>faulty cases, fsck on boot will crash:
> This one I need to fix. This is with rc7-mm2 or 2.6.21-mm1? Can you
> point me to the filesystem you're booting?
2.6.21-mm1. The file system is the Debian root_fs downloaded from the
web site. I did an apt-get upgrade and installed a few packages. I'm
sending you a download link for the whole "experiment" in a separate email.
>>5) Something which I observed only once so far: the UML process does not
>>terminate, but instead starts consuming 100% CPU time. The captured
>>console output ends with "System halted." and does not differ from a
> If that happens again, can you attach gdb to it and see where it is?
Ok. From my perspective this type of hang is not a big deal, as I can
have my wrapper script watch the log file and kill off an instance which
gets stuck at the end.
>>6) When running a UML instance to edit my root_fs (with all other
>>instances killed, of course) I get:
>>F_SETLK failed, file already locked by pid 934
>>Failed to lock 'root_fs', err = 11
>>Failed to open 'root_fs', errno = 11
>>ubda: Can't open "root_fs": errno = 11
>>with a subsequent Kernel panic. There is never any process with pid 934,
>>nor any other UML instance which could be the culprit.
> There is some UML process still hanging around - it may not be
> obviously UML, but it should be there. If not, then this is a host
> kernel bug, but I would put good money on there being a UML process
> that you're not noticing.
I can reproduce it any time in my current setup. ps auxwww|grep linux
shows no UML processes hanging around. I have also written a small test
program which just attempts to lock the file. In this program, fcntl(fd,
F_SETLK, &lock) fails with errno = EBADF (Bad file descriptor).
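
For reference, a minimal sketch of such a lock test in C. This is not the
actual test program; the helper name try_write_lock() and its parameters
are made up for illustration, and taking a whole-file write lock is an
assumption about what the UML side attempts. One thing it makes visible:
F_SETLK with F_WRLCK returns EBADF when the descriptor was not opened for
writing, so a "Bad file descriptor" result may come from the open mode
used by the test rather than from the lock state of root_fs. On a real
conflict (EAGAIN, which is errno 11 on Linux), F_GETLK reports the pid
the host kernel believes is holding the lock.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` with `open_flags` and try to take a
 * non-blocking write lock on the whole file.
 *
 * Returns 0 on success, or the errno value of the failing call.
 * If the lock is refused because of a conflict, *holder is set to the
 * pid reported by F_GETLK; otherwise *holder is -1.
 */
int try_write_lock(const char *path, int open_flags, pid_t *holder)
{
    struct flock lock;
    int fd, err = 0;

    *holder = -1;

    fd = open(path, open_flags);
    if (fd < 0)
        return errno;

    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_WRLCK;      /* exclusive (write) lock */
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0;             /* 0 means "to end of file" */

    if (fcntl(fd, F_SETLK, &lock) < 0) {
        err = errno;            /* save before any further calls */
        if (err == EAGAIN || err == EACCES) {
            /* Conflict: ask the kernel who holds the lock. */
            struct flock probe = lock;
            if (fcntl(fd, F_GETLK, &probe) == 0 &&
                probe.l_type != F_UNLCK)
                *holder = probe.l_pid;
        }
    }

    close(fd);                  /* closing also drops our lock, if any */
    return err;
}
```

Calling try_write_lock("root_fs", O_RDWR, &holder) from a small main()
and printing strerror() of the result together with holder should show
whether the host kernel really considers the file locked, and by which
pid. Calling it with O_RDONLY instead is expected to fail with EBADF
regardless of any other lock.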
Best regards -