From: Blaisorblade <bla...@ya...> - 2007-05-05 00:53:04
On Saturday 5 May 2007, Jan Ploski wrote:
> Jeff Dike wrote:
> > On Fri, May 04, 2007 at 07:30:36PM +0200, Jan Ploski wrote:
> >> I am experimenting with UML in a HPC cluster. What I do is basically
> >> start up 60 instances all at once, a bunch of instances on each hardware
> >> node, using the resource manager TORQUE. Each instance gets a different
> >> umid. The instances are configured to boot up, execute a job and halt
> >> after that. Most of the times it works very well. However, every now and
> >> then some instance of the 60 will get stuck with the infamous "INIT: Id
> >> 0 respawning too fast" message at boot and consequently neither run the
> >> job nor terminate.
> >>
> >> So far I have found mentions of two possible causes for this problem: 1)
> >> wrong name of the tty device in inittab 2) /lib/tls problem. Neither
> >> applies in my case (/dev/tty0 is correct, and I have already renamed
> >> /lib/tls, just in case).
> >
> > These would cause problems all the time, not sporadically as you're
> > seeing.
> >
> >> As I can reproduce the problem "statistically" (quite reliably in the
> >> cluster context) but not at will when running a single instance from the
> >> command line, my question is: how should I proceed about troubleshooting
> >> it? Are there any locations in the UML kernel code where I could insert
> >> some debug statements (or maybe delays? maybe the problem is
> >> timing-related somehow?) to gather useful diagnostic information?
> >
> > Is it possible that it is caused by confusion about how quickly real
> > time is progressing compared to how much computation is happening in
> > that time? By default, UML will match its time to the host, with the
> > effect that, on a busy system, it will see time progressing quickly
> > compared to the work it's doing.
> >
> > If so, then disable CONFIG_UML_REAL_TIME_CLOCK, and use
> > 2.6.21-rc7-mm2, which has a fix in this area, and see if that makes
> > any difference.
>
> Jeff,
>
> I'm having trouble applying the 2.6.21-rc7-mm2 patch against 2.6.21
> sources - lots of rejected hunks (but not all) when I run patch -p1 <
> 2.6.21-rc7-mm2, and the kernel does not compile after that. I have never
> used mm kernels before and Google did not help identify my mistake. Can
> you give me a hint about how/against which target to apply this patch?

It will apply perfectly on top of 2.6.21-rc7, which you can find here:
http://www.kernel.org/pub/linux/kernel/v2.6/testing/

Patch (on top of 2.6.20):
http://www.kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.21-rc7.bz2
or full (40M) tarball:
http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.21-rc7.tar.bz2
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
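
To make the patch order explicit, here is a rough sketch of the chain described above: 2.6.20, then patch-2.6.21-rc7 on top of it, then the -mm patch on top of that. The helper names (rc_base, mm_base, print_plan) are made up for illustration, and the script assumes the 2.6.20 tarball, the -rc patch and the -mm patch all sit in the current directory, with the -mm patch uncompressed under the name used in the question. It only prints the commands; it does not run them.

```shell
#!/bin/sh
# Sketch only: helper names and file locations are assumptions, not part
# of any kernel tooling.

# mm_base VERSION: a -mm patch applies to the tree named before "-mm",
# e.g. 2.6.21-rc7-mm2 -> 2.6.21-rc7.
mm_base() {
    echo "${1%-mm*}"
}

# rc_base VERSION: a 2.6.x-rcN prepatch applies to the previous stable
# release, e.g. 2.6.21-rc7 -> 2.6.20.
rc_base() {
    case "$1" in
        2.6.*-rc*)
            minor=${1#2.6.}          # "21-rc7"
            minor=${minor%%-rc*}     # "21"
            echo "2.6.$(($minor - 1))"
            ;;
        *)
            echo "not a 2.6.x-rcN version: $1" >&2
            return 1
            ;;
    esac
}

# print_plan MM_VERSION: print (do not run) the commands that build the
# patched tree, starting from the stable tarball.
print_plan() {
    rc=$(mm_base "$1")
    base=$(rc_base "$rc") || return 1
    echo "tar xjf linux-$base.tar.bz2"
    echo "cd linux-$base"
    echo "bzcat ../patch-$rc.bz2 | patch -p1"
    echo "patch -p1 < ../$1"
}
```

Running `print_plan 2.6.21-rc7-mm2` lists the four steps in order. If you start from the full linux-2.6.21-rc7 tarball instead, only the last patch step is needed, run from inside that tree.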