From: Jeff D. <jd...@ad...> - 2004-07-16 19:23:40
|
This long-awaited patch contains a lot of humfs/hostfs work, plus a bunch of bug fixes which hopefully return UML to the stability of 2.4.24. More specifically:

Fixed some humfs bugs and bumped the version number. Modes are now included in the metadata and directories with subdirectories named 'metadata' now have someplace to put their own metadata.

Fixed hostfs to work with the restructured externfs. It is now also safe against running out of file descriptors. The one problem I see now is an 'inodes in use' message when unmounting a hostfs filesystem.

linux is now the default build target, so that 'make ARCH=um' now does what you'd expect.

Fixed some bugs in the malloc and free wrappers.

The port chan now sets SO_REUSEADDR so ports can be immediately reused.

Directory opens and pipe creations are now handled in filehandle.c to protect them against failing with -EMFILE.

The tt mode switch pipe now uses filehandles to protect process creations from failing with -EMFILE.

Backed out a bug fix which broke modules.

Jeff |
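For readers unfamiliar with the SO_REUSEADDR change mentioned above, here is a minimal sketch of the general technique. It is not taken from the UML port channel source; the helper name listen_reusable() and the choice of the loopback address are assumptions made purely for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not the UML port channel code: open a TCP
 * listening socket on `port` (loopback only) with SO_REUSEADDR set, so
 * that the port can be bound again right after a restart instead of
 * waiting for old connections in TIME_WAIT to expire.
 * Returns the descriptor, or -1 on error. */
int listen_reusable(unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* The option has to be set before bind() to have any effect. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        goto fail;
    if (listen(fd, 1) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

Passing port 0 lets the kernel pick a free port, which is handy for trying the helper out without colliding with a running service.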
From: Nick Craig-W. <ni...@me...> - 2004-07-30 07:48:47
|
I'm having trouble getting this to run stably. On a Fedora core 2 image it locks up with 100% CPU after about 12 hours normally. uml-patch-2.4.26-1 did the same thing (but required quite a bit of patching to even get through the boot process!).

I'm using 2.4.26 + uml-patch-2.4.26-2 plus the patches from /work/2.4 on a SKAS host running 2.4.25.

Here is a gdb backtrace of the locked UML. It still responds to ping and uml_mconsole but it doesn't respond to ssh...

#0  0xa01e68f5 in sigprocmask ()
#1  0xa00e5a10 in change_signals (type=0) at signal_user.c:69
#2  0xa00e5a48 in block_signals () at signal_user.c:75
#3  0xa00165c0 in do_softirq () at softirq.c:95
#4  0xa00e0c42 in do_IRQ (irq=0, regs=0xa0294278) at irq.c:336
#5  0xa00e710e in timer_irq (regs=0xa0294278) at time_kern.c:76
#6  0xa00e7418 in timer_handler (sig=14, regs=0xa0294278) at time_kern.c:163
#7  0xa00ed1d4 in sig_handler_common_skas (sig=14, sc_ptr=0xa0297868) at trap_user.c:35
#8  0xa00e7eed in alarm_handler (sig=14, sc=
      {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43,
       __dsh = 0, edi = 2687057920, esi = 2687057920, ebp = 2687073132,
       esp = 2687073088, ebx = 2687073124, edx = 2687057920, ecx = 0,
       eax = 4294967292, trapno = 14, err = 6, eip = 2686421985, cs = 35,
       __csh = 0, eflags = 643, esp_at_signal = 2687073088, ss = 43,
       __ssh = 0, fpstate = 0x0, oldmask = 0, cr2 = 2827055104})
    at trap_user.c:115
#9  <signal handler called>
#10 0xa01f8be1 in nanosleep ()
#11 0xa00e6fe9 in idle_sleep (secs=10) at time.c:132
#12 0xa00e36ea in cpu_idle () at process_kern.c:212
#13 0xa000e5b1 in rest_init () at init/main.c:346
#14 0xa000255c in start_kernel () at init/main.c:440
#15 0xa00ecb5f in start_kernel_proc (unused=0x0) at process_kern.c:156
#16 0xa00e3236 in run_kernel_thread (fn=0xa00ecb38 <start_kernel_proc>, arg=0x0, jmp_ptr=0xa0294558) at process.c:227
#17 0xa00ec92d in new_thread_handler (sig=10) at process_kern.c:70
#18 <signal handler called>
#19 0xa01e6971 in kill ()
#20 0x0000001c in ?? ()

Let me know if I can produce anything more interesting from gdb - I've left the UML locked up.

--
Nick Craig-Wood Tel: 0800 195 4968 Net: ni...@me... Memset Ltd Web: http://www.memset.com |
From: Nix <ni...@es...> - 2004-07-30 12:08:55
|
On Fri, 30 Jul 2004, Nick Craig-Wood said:
> I'm having trouble getting this to run stably. On a Fedora core 2
> image it locks up with 100% CPU after about 12 hours
> normally.

I think this is caused by the patch to use gettimeofday(), which breaks if time ever moves backwards. Lots of other things break if time moves backwards, too, but most don't care if it moves backwards by only a second or so. UML does care.

To work around it, avoid moving time backwards (ntpd and ntpdate both have switches to avoid this and always use adjtimex() instead.)

--
`The copyright file is for everyone. That we make it available in plain-text, uncompressed form rather than in spinning, throbbing OpenGL-rendered 3D text over a thumping dance music soundtrack is a feature, not a bug.' --- Branden Robinson |
From: Nick Craig-W. <ni...@me...> - 2004-07-30 14:46:41
|
On Fri, Jul 30, 2004 at 01:08:15PM +0100, Nix wrote:
> On Fri, 30 Jul 2004, Nick Craig-Wood said:
> > I'm having trouble getting this to run stably. On a Fedora core 2
> > image it locks up with 100% CPU after about 12 hours
> > normally.
>
> I think this is caused by the patch to use gettimeofday(), which breaks if
> time ever moves backwards. Lots of other things break if time moves
> backwards, too, but most don't care if it moves backwards by only a
> second or so. UML does care.
>
> To work around it, avoid moving time backwards (ntpd and ntpdate both
> have switches to avoid this and always use adjtimex() instead.)

Yes you are right - Thanks! The time of the crash lined up exactly with the time being reset on the host. I've changed it to use adjtimex which should work around the problem.

However I view the above to be a bug in UML - it shouldn't lock up. It's entitled to miss events etc though. gettimeofday(2) isn't guaranteed to be monotonic, but unfortunately there isn't a monotonic timer (like jiffies) that escapes to user space AFAIK.

( We've got a laptop in the office whose time jumps backwards (just by a few ms) quite regularly due to buggy BIOS and speedstep. This causes a few problems (notably with sawfish)... I even sent a patch to fix this to LKML but it was deemed that the BIOS should be fixed instead of the kernel )

--
Nick Craig-Wood Tel: 0800 195 4968 Net: ni...@me... Memset Ltd Web: http://www.memset.com |
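To make the point about gettimeofday(2) concrete, here is a small sketch of one way user-space code can tolerate a wall clock that steps backwards: compute elapsed time from successive readings and treat any negative interval as zero. This is not the UML timer code and not the time-warp patch; the names tick_clock, tick_elapsed() and wall_usec() are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical state for the sketch: the previous wall-clock reading. */
struct tick_clock {
    long long last_usec;  /* last reading, in microseconds */
    int primed;           /* 0 until the first reading has been taken */
};

/* Return the microseconds elapsed since the previous reading, never
 * negative. If the clock stepped backwards, report 0 and resynchronise
 * to the new reading, so some time is missed but nothing ever sees a
 * negative (or, if stored unsigned, enormous) interval. */
long long tick_elapsed(struct tick_clock *c, long long now_usec)
{
    long long delta;

    if (!c->primed) {
        c->primed = 1;
        c->last_usec = now_usec;
        return 0;
    }
    delta = now_usec - c->last_usec;
    c->last_usec = now_usec;
    return delta < 0 ? 0 : delta;
}

/* Current wall-clock time in microseconds, from gettimeofday(2). */
long long wall_usec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (long long) tv.tv_sec * 1000000LL + tv.tv_usec;
}
```

A caller would feed wall_usec() into tick_elapsed() on every timer signal. The trade-off matches what is described above: after a backwards step some events are missed, but the caller keeps running.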
From: Jeff D. <jd...@ad...> - 2004-07-30 16:38:25
|
ni...@me... said:
> However I view the above to be a bug in UML - it shouldn't lock up.
> It's entitled to miss events etc though. gettimeofday(2) isn't
> guaranteed to be monotonic, but unfortunately there isn't a monotonic
> timer (like jiffies) that escapes to user space AFAIK.

Agreed. Try the time-warp patch at http://user-mode-linux.sourceforge.net/patches.html

It WorksForMe (tm).

Jeff |
From: Adam H. <do...@br...> - 2004-07-30 17:02:53
|
On Fri, 30 Jul 2004, Nick Craig-Wood wrote:
> However I view the above to be a bug in UML - it shouldn't lock up.
> It's entitled to miss events etc though. gettimeofday(2) isn't
> guaranteed to be monotonic, but unfortunately there isn't a monotonic
> timer (like jiffies) that escapes to user space AFAIK.

rdtsc, but only on newer cpus. |
From: Jeff D. <jd...@ad...> - 2004-07-30 17:50:41
|
do...@br... said:
> rdtsc, but only on newer cpus.

Is it monotonic on SMP boxes when you get moved from one CPU to another?

This question, and some others, are what convinced me to use gettimeofday instead.

Jeff |
From: Adam H. <do...@br...> - 2004-07-30 19:14:15
|
On Fri, 30 Jul 2004, Jeff Dike wrote:
> do...@br... said:
> > rdtsc, but only on newer cpus.
>
> Is it monotonic on SMP boxes when you get moved from one CPU to another?
>
> This question, and some others, are what convinced me to use gettimeofday
> instead.

Actually, that's a good point. rdtsc is per-cpu, I believe. Which means I have a bug in my java profiler code. :| |