From: Jan R. <ja...@ry...> - 2006-07-03 12:21:12
|
I've just tried the linux-2.6.16 binary UML kernel from the new web site, together with a guest ubuntu installation. Looks good at a first glance, but I've hit several problems:

1) I mount /home with hostfs using the following line in fstab:

none /home hostfs /home,rw 0 0

df shows:

none 10413255443693385415481253412470784 10413248933519167673017998136836096 47190722440083771496586606219886592 19% /home

2) the guest OS sees stale file content on hostfs mounts. E.g. I open a file in the UML machine, look at it, close it, edit on the host, open it again on the UML -- and I don't get the changes saved on the host. This is a major problem.

3) what do these messages mean:

setitimer: mpr (pid = 1646) provided invalid timeval it_value: tv_sec = 0 tv_usec = 1790000
setitimer: mpr (pid = 1646) provided invalid timeval it_value: tv_sec = 0 tv_usec = 1943000
...

and do they have anything to do with:

4) my application sometimes hangs completely, eating CPU, until I type something in another console (!!!).

This is using the web-site-supplied 2.6.16 binary kernel, the host machine is a 2.6.15.6 with skas3 applied, running Gentoo.

--J. |
From: Jim C. <ji...@ma...> - 2006-07-03 17:28:40
|
Please forgive me for interpreting without suggesting how to fix, but the interpretation might be useful. Is there any chance of trying out the 2.6.16 UML kernel on a 2.6.16 host? Version skew, particularly backwards, is very plausible as an explanation for all of these. Also be sure that the guest and host are similar in wordsize, i.e. 32bit vs 64bit. On Mon, 3 Jul 2006, Jan Rychter wrote: > 1) I mount /home with hostfs using the following line in fstab: > none /home hostfs /home,rw 0 0 > df shows: > none 10413255443693385415481253412470784 10413248933519167673017998136836096 47190722440083771496586606219886592 19% /home This says to me that the guest is interpreting a size field as 64 bits while the host is providing 32 bits plus irrelevant high order neighboring fields. > 2) the guest OS sees stale file content on hostfs mounts. E.g. I open a > file in the UML machine, look at it, close it, edit on the host, open > it again on the UML -- and I don't get the changes saved on the > host. This is a major problem. If /dev/ubda were involved, the guest kernel would "know" that only it could change the data, so file data in kernel memory buffers stays valid indefinitely. But that isn't true for a remote filesystem, and I have no clue why the guest's "hardware" interface didn't stat the host's inode, realize that the file had been modified, and invalidate the cached blocks. > 3) what do these messages mean: > > setitimer: mpr (pid = 1646) provided invalid timeval it_value: tv_sec = 0 tv_usec = 1790000 > setitimer: mpr (pid = 1646) provided invalid timeval it_value: tv_sec = 0 tv_usec = 1943000 The microsecond field shows 1 sec + 7.90e5 usec or 1 sec + 9.43e5 usec, which could very well give your app indigestion. But again I have no clue why it happened. Blame it on the 64bit issue? I hope some of this helps! James F. Carter Voice 310 825 2897 FAX 310 206 6673 UCLA-Mathnet; 6115 MSA; 405 Hilgard Ave.; Los Angeles, CA, USA 90095-1555 Email: ji...@ma... 
http://www.math.ucla.edu/~jimc (q.v. for PGP key) |
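The arithmetic behind Jim's reading of the setitimer messages can be sketched as follows. This is only an illustration of why a tv_usec of 1790000 is out of range and what the normalized value would be; the function names are hypothetical and nothing here is taken from glibc or the kernel.

```python
USEC_PER_SEC = 1_000_000


def is_valid_timeval(tv_sec, tv_usec):
    """A timeval is well formed when tv_usec is in [0, 1000000)."""
    return tv_sec >= 0 and 0 <= tv_usec < USEC_PER_SEC


def normalize_timeval(tv_sec, tv_usec):
    """Carry whole seconds out of tv_usec into tv_sec."""
    carry, usec = divmod(tv_usec, USEC_PER_SEC)
    return tv_sec + carry, usec


if __name__ == "__main__":
    # The two values reported in the thread.
    for usec in (1790000, 1943000):
        print(is_valid_timeval(0, usec), normalize_timeval(0, usec))
```

Both reported values fail the range check and normalize to one second plus a sub-second remainder, which matches the "1 sec + 7.90e5 usec" and "1 sec + 9.43e5 usec" interpretation above.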
From: Jeff D. <jd...@ad...> - 2006-07-07 01:26:32
|
On Mon, Jul 03, 2006 at 10:28:22AM -0700, Jim Carter wrote:
> Please forgive me for interpreting without suggesting how to fix, but the
> interpretation might be useful. Is there any chance of trying out the
> 2.6.16 UML kernel on a 2.6.16 host? Version skew, particularly backwards,
> is very plausible as an explanation for all of these. Also be sure that
> the guest and host are similar in wordsize, i.e. 32bit vs 64bit.

Neither version skew nor word size mismatch should cause any problem. Any version of UML should basically run on any host version. The only exception I can think of would be a 2.6 UML on a 2.4 host - NPTL support inside UML depends on host NPTL support.

Similarly, AFAIK, 32-bit UMLs run fine on x86_64.

Jeff |
From: Jan R. <ja...@ry...> - 2006-07-04 08:51:30
|
>>>>> "Jim" == Jim Carter <ji...@ma...> writes: Jim> Please forgive me for interpreting without suggesting how to fix, Jim> but the interpretation might be useful. Is there any chance of Jim> trying out the 2.6.16 UML kernel on a 2.6.16 host? Version skew, Jim> particularly backwards, is very plausible as an explanation for Jim> all of these. Is that really the case? I was under the impression that the whole point of UML was to make these things irrelevant and that any recent kernel would do as a host. No, there is no easy way to upgrade the host. I use software suspend and a number of other things and I'm not keen on changing the status quo, as the current setup kind-of-works after years of upgrading and tinkering. Jim> Also be sure that the guest and host are similar in wordsize, Jim> i.e. 32bit vs 64bit. They are. I'm most puzzled by the fact that the application hangs until I start typing in another console. If there is nothing going on in the other consoles, UML will eat CPU while doing nothing. --J. |
From: Blaisorblade <bla...@ya...> - 2006-07-12 19:06:52
|
On Tuesday 04 July 2006 10:50, Jan Rychter wrote:
> >>>>> "Jim" == Jim Carter <ji...@ma...> writes:
>
> Jim> Please forgive me for interpreting without suggesting how to fix,
> Jim> but the interpretation might be useful. Is there any chance of
> Jim> trying out the 2.6.16 UML kernel on a 2.6.16 host? Version skew,
> Jim> particularly backwards, is very plausible as an explanation for
> Jim> all of these.
>
> Is that really the case?

Normally not. A recent UML should work even on a 2.4 kernel. That suggestion (if it has ever been valid, and I doubt this) is only useful as a starting point for debugging. For instance, the host may change something, UML then doesn't work (because a bug then reveals itself) and afterwards UML is fixed.

At least, that holds as long as the host kernel doesn't change some supposedly stable APIs that UML uses (this happened in 2.6.9 and 2.6.10 because "it wasn't possible to do otherwise", and meanwhile some big bugs were introduced; in 2.6.11 both things were fixed, and it turned out to be possible and easy to avoid the breakage).

> I was under the impression that the whole point
> of UML was to make these things irrelevant and that any recent kernel
> would do as a host.

> No, there is no easy way to upgrade the host. I use software suspend and
> a number of other things and I'm not keen on changing the status quo, as
> the current setup kind-of-works after years of upgrading and tinkering.

> Jim> Also be sure that the guest and host are similar in wordsize,
> Jim> i.e. 32bit vs 64bit.

> They are.

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade |
From: Jeff D. <jd...@ad...> - 2006-07-07 01:33:22
|
On Mon, Jul 03, 2006 at 02:20:16PM +0200, Jan Rychter wrote:
> 1) I mount /home with hostfs using the following line in fstab:
> none /home hostfs /home,rw 0 0
>
> df shows:
>
> none 10413255443693385415481253412470784 10413248933519167673017998136836096 47190722440083771496586606219886592 19% /home
> 2) the guest OS sees stale file content on hostfs mounts. E.g. I open a
> file in the UML machine, look at it, close it, edit on the host, open
> it again on the UML -- and I don't get the changes saved on the
> host. This is a major problem.

This is due to hostfs using the page cache. mount -o sync should help the other direction - modify it inside UML and the changes should be visible on the host.

The direction you want is a bit more complicated given that hostfs uses the page cache. The last time I thought about this, I was thinking about using inotify to know when to invalidate files from the UML page cache when they are modified on the host.

> 3) what do these messages mean:
>
> setitimer: mpr (pid = 1646) provided invalid timeval it_value: tv_sec = 0 tv_usec = 1790000
> setitimer: mpr (pid = 1646) provided invalid timeval it_value:
> tv_sec = 0 tv_usec = 1943000

Where do they come from? They're not kernel messages.

> 4) my application sometimes hangs completely, eating CPU, until I type
> something in another console (!!!).

Can you strace it during this period and see what it's doing? This looks like an interesting bug.

Jeff |
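The stale-data problem comes down to a cache that is never told the backing file changed. As a rough user-space illustration of the invalidation idea, the sketch below revalidates a cached copy by comparing stat() results before serving it. Note that this swaps in stat polling for the inotify approach Jeff mentions, and that the class and method names are hypothetical; it is not how hostfs or the UML page cache is implemented.

```python
import os


class RevalidatingCache:
    """Cache file contents, re-reading when the file's stat signature changes."""

    def __init__(self):
        self._entries = {}  # path -> (signature, data)

    @staticmethod
    def _signature(path):
        # mtime, size and inode together catch in-place edits as well as
        # editors that replace the file via rename.
        st = os.stat(path)
        return (st.st_mtime_ns, st.st_size, st.st_ino)

    def read(self, path):
        sig = self._signature(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == sig:
            return cached[1]  # cached copy still matches the file on disk
        with open(path, "rb") as f:
            data = f.read()
        self._entries[path] = (sig, data)
        return data
```

The cost of this design is one stat() per read, which is why an event-driven mechanism such as inotify is attractive: the cache would only do work when the host actually modifies a file.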
From: Jeff D. <jd...@ad...> - 2006-07-07 01:34:29
|
On Mon, Jul 03, 2006 at 02:20:16PM +0200, Jan Rychter wrote: > 1) I mount /home with hostfs using the following line in fstab: > none /home hostfs /home,rw 0 0 > > df shows: > > none 10413255443693385415481253412470784 10413248933519167673017998136836096 47190722440083771496586606219886592 19% /home As for this problem, I get reasonable numbers here: none 10283212 9183164 577676 95% /mnt Is there anything about the host /home that's unusual that might fake UML into producing insanely large numbers? Jeff |
From: Blaisorblade <bla...@ya...> - 2006-07-12 19:13:25
|
On Friday 07 July 2006 03:34, Jeff Dike wrote: > On Mon, Jul 03, 2006 at 02:20:16PM +0200, Jan Rychter wrote: > > 1) I mount /home with hostfs using the following line in fstab: > > none /home hostfs /home,rw 0 0 > > > > df shows: > > > > none 10413255443693385415481253412470784 > > 10413248933519167673017998136836096 47190722440083771496586606219886592 > > 19% /home > > As for this problem, I get reasonable numbers here: > > none 10283212 9183164 577676 95% /mnt > > Is there anything about the host /home that's unusual that might fake > UML into producing insanely large numbers? I vaguely remember some low-hanging fruit in hostfs's statfs about 32 vs 64 bit, but I can't find it at a first look. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade http://www.user-mode-linux.org/~blaisorblade |
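To make the "32 vs 64 bit" hypothesis concrete, here is a small sketch of what happens when a buffer of adjacent 32-bit fields is read as if its first field were 64 bits wide. The field layout, the little-endian byte order, and the function name are assumptions made purely for illustration; this is not the actual hostfs statfs structure, and the thread does not confirm that this is what produced the df output.

```python
import struct


def misread_as_u64(fields32):
    """Pack unsigned 32-bit fields back to back, then read the first 8 bytes
    as a single unsigned 64-bit value (little-endian assumed)."""
    if len(fields32) < 2:
        raise ValueError("need at least two 32-bit fields to fill 64 bits")
    buf = struct.pack("<%dI" % len(fields32), *fields32)
    (value,) = struct.unpack_from("<Q", buf, 0)
    return value


if __name__ == "__main__":
    # Illustrative values: the first two numbers from Jeff's sane df output.
    total, neighbour = 10283212, 9183164
    wrong = misread_as_u64([total, neighbour])
    print(wrong)
    print(wrong & 0xFFFFFFFF == total)   # low word is the real field
    print(wrong >> 32 == neighbour)      # high word is the neighbouring field
```

The low 32 bits still hold the intended value, while the high 32 bits come from whatever field happens to sit next to it, so the result is wildly inflated in the way Jim describes.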
From: Jan R. <ja...@ry...> - 2006-07-07 07:35:57
|
>>>>> "Jeff" == Jeff Dike <jd...@ad...> writes: Jeff> On Mon, Jul 03, 2006 at 02:20:16PM +0200, Jan Rychter wrote: >> 1) I mount /home with hostfs using the following line in fstab: >> none /home hostfs /home,rw 0 0 >> >> df shows: >> >> none 10413255443693385415481253412470784 >> 10413248933519167673017998136836096 >> 47190722440083771496586606219886592 19% /home Jeff> As for this problem, I get reasonable numbers here: Jeff> none 10283212 9183164 577676 95% /mnt Jeff> Is there anything about the host /home that's unusual that might Jeff> fake UML into producing insanely large numbers? No, not that I know of. It's an ext3 filesystem. /dev/hda4 69924764 59317952 7054844 90% / --J. |
From: Jan R. <ja...@ry...> - 2006-07-07 07:40:17
|
>>>>> "Jeff" == Jeff Dike <jd...@ad...> writes: Jeff> On Mon, Jul 03, 2006 at 02:20:16PM +0200, Jan Rychter wrote: >> 1) I mount /home with hostfs using the following line in fstab: >> none /home hostfs /home,rw 0 0 >> >> df shows: >> >> none 10413255443693385415481253412470784 >> 10413248933519167673017998136836096 >> 47190722440083771496586606219886592 19% /home >> 2) the guest OS sees stale file content on hostfs mounts. E.g. I >> open a >> file in the UML machine, look at it, close it, edit on the host, >> open it again on the UML -- and I don't get the changes saved on the >> host. This is a major problem. Jeff> This is due to hostfs using the page cache. mount -o sync should Jeff> help the other direction - modify it inside UML and the changes Jeff> should be visible on the host. Jeff> The direction you want is a bit more complicated given that Jeff> hostfs uses the page cache. The last time I thought about this, Jeff> I was thinking about using inotify to know when to invalidate Jeff> files from the UML page cache when they are modified on the host. Since it seems to be a known issue, perhaps it's worth documenting it on the web site -- I wouldn't have used hostfs if I knew it could mean stale data for UML (the information is probably in there somewhere, but I missed it). >> 3) what do these messages mean: >> >> setitimer: mpr (pid = 1646) provided invalid timeval it_value: >> tv_sec = 0 tv_usec = 1790000 setitimer: mpr (pid = 1646) provided >> invalid timeval it_value: tv_sec = 0 tv_usec = 1943000 Jeff> Where do they come from? They're not kernel messages. No idea. Probably glibc. >> 4) my application sometimes hangs completely, eating CPU, until I >> type >> something in another console (!!!). Jeff> Can you strace it during this period and see what it's doing? Jeff> This looks like an interesting bug. I think this is the relevant snippet: [...] 
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
times({tms_utime=260, tms_stime=175, tms_cutime=0, tms_cstime=0}) = -21373
getrusage(RUSAGE_SELF, {ru_utime={2, 600000}, ru_stime={1, 750000}, ...}) = 0
gettimeofday({1152257489, 617919}, NULL) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
times({tms_utime=263, tms_stime=175, tms_cutime=0, tms_cstime=0}) = -21369
times({tms_utime=263, tms_stime=175, tms_cutime=0, tms_cstime=0}) = -21369
getrusage(RUSAGE_SELF, {ru_utime={2, 630000}, ru_stime={1, 750000}, ...}) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
times({tms_utime=271, tms_stime=175, tms_cutime=0, tms_cstime=0}) = -21359
getrusage(RUSAGE_SELF, {ru_utime={2, 710000}, ru_stime={1, 750000}, ...}) = 0
gettimeofday({1152257489, 746673}, NULL) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
times({tms_utime=275, tms_stime=175, tms_cutime=0, tms_cstime=0}) = -21354
times({tms_utime=275, tms_stime=175, tms_cutime=0, tms_cstime=0}) = -21354
getrusage(RUSAGE_SELF, {ru_utime={2, 750000}, ru_stime={1, 750000}, ...}) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
times({tms_utime=282, tms_stime=175, tms_cutime=0, tms_cstime=0}) = -21331
getrusage(RUSAGE_SELF, {ru_utime={2, 830000}, ru_stime={1, 750000}, ...}) = 0
gettimeofday({1152257490, 29341}, NULL) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
times({tms_utime=288, tms_stime=175, tms_cutime=0, tms_cstime=0}) = -21324
times({tms_utime=288, tms_stime=175, tms_cutime=0, tms_cstime=0}) = -21324
getrusage(RUSAGE_SELF, {ru_utime={2, 880000}, ru_stime={1, 750000}, ...}) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
times({tms_utime=294, tms_stime=175, tms_cutime=0, tms_cstime=0}) = -21317
getrusage(RUSAGE_SELF, {ru_utime={2, 940000}, ru_stime={1, 750000}, ...}) = 0
gettimeofday({1152257490, 174689}, NULL) = 0
mmap2(0x720e8000, 2097152, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x720e8000
gettimeofday({1152257490, 248411}, NULL) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 times({tms_utime=361, tms_stime=175, tms_cutime=0, tms_cstime=0}) = -21219 times({tms_utime=361, tms_stime=175, tms_cutime=0, tms_cstime=0}) = -21219 getrusage(RUSAGE_SELF, {ru_utime={3, 610000}, ru_stime={1, 750000}, ...}) = 0 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476505 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- 
rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigreturn(0x722bda19) = 1915476513 [... this goes on until there is enough "movement" in another TTY ...] --J. |
From: Jeff D. <jd...@ad...> - 2006-07-07 13:36:08
|
On Fri, Jul 07, 2006 at 09:38:05AM +0200, Jan Rychter wrote: > Since it seems to be a known issue, perhaps it's worth documenting it on > the web site -- I wouldn't have used hostfs if I knew it could mean > stale data for UML (the information is probably in there somewhere, but > I missed it). OK. > rt_sigreturn(0x722bda19) = 1915476513 > --- SIGALRM (Alarm clock) @ 0 (0) --- > rt_sigreturn(0x722bda19) = 1915476513 > [... this goes on until there is enough "movement" in another TTY ...] But what happens here when there is movement elsewhere? Jeff |
From: Jan R. <ja...@ry...> - 2006-07-07 16:37:20
|
> > rt_sigreturn(0x722bda19) = 1915476513 > > --- SIGALRM (Alarm clock) @ 0 (0) --- > > rt_sigreturn(0x722bda19) = 1915476513 > > [... this goes on until there is enough "movement" in another TTY ...] > > But what happens here when there is movement elsewhere? I didn't capture that... I've been trying to reproduce the problem, but for some reason I can't, even though it happened quite often before. In the meantime, I managed to hang the entire UML. I got the usual message: setitimer: mpr (pid = 1705) provided invalid timeval it_value: tv_sec = 0 tv_usec = 1870000 at this point UML became totally unresponsive: none of the TTYs would work, no networking, etc -- a total hang. Afterwards when killing uml I got this: > remove_umid_dir - actually_do_remove failed with err = -2 > remove_umid_dir - actually_do_remove failed with err = -2 > BUG: warning at kernel/irq/manage.c:276/free_irq() > 081d7420: [<08058573>] dump_stack+0x1b/0x1d > 081d7438: [<0808a58b>] free_irq+0x3d/0xdf > 081d7464: [<0805efb9>] close_devices+0x20/0x69 > 081d7478: [<08057584>] do_uml_exitcalls+0x13/0x1b > 081d7484: [<08057b98>] uml_cleanup+0x12/0x19 > 081d748c: [<080650f3>] last_ditch_exit+0x23/0x2a > 081d74ac: [<ffffe420>] _etext+0xf7e66406/0x0 > 081d77d0: [<080645a8>] os_read_file+0x1b/0x1d > 081d77e4: [<0805b0ec>] generic_read+0x10/0x28 > 081d77f8: [<0805bb43>] chan_interrupt+0x65/0xd2 > 081d7820: [<0805bf1b>] line_interrupt+0x1b/0x25 > 081d7838: [<0808a249>] handle_IRQ_event+0x2a/0x5e > 081d7860: [<0808a2d5>] __do_IRQ+0x58/0x9b > 081d7878: [<080562af>] do_IRQ+0x2f/0x3b > 081d7888: [<08055e9c>] sigio_handler+0x48/0x5a > 081d78a0: [<08069877>] sig_handler_common_skas+0xbf/0xda > 081d78c4: [<080666d9>] sig_handler+0x2f/0x3c > 081d78dc: [<ffffe420>] _etext+0xf7e66406/0x0 > 081d7be0: [<08057356>] default_idle+0x22/0x25 > 081d7bf0: [<0805a413>] init_idle_skas+0x24/0x28 > 081d7c00: [<08057361>] cpu_idle+0x8/0xa > 081d7c08: [<080553f5>] rest_init+0x21/0x23 > 081d7c10: [<0804951b>] 
start_kernel+0x13e/0x140 > 081d7c1c: [<0805a441>] start_kernel_proc+0x2a/0x2e > 081d7c28: [<08065de4>] run_kernel_thread+0x43/0x4b > 081d7cdc: [<0805a1c0>] new_thread_handler+0xc3/0xf5 > 081d7d1c: [<ffffe420>] _etext+0xf7e66406/0x0 > > Trying to free free IRQ5 What could be unusual about my configuration? cpufreq changing host clock frequency? 250HZ on the host? --J. |
From: Jeff D. <jd...@ad...> - 2006-07-07 17:21:54
|
On Fri, Jul 07, 2006 at 06:11:00PM +0200, Jan Rychter wrote: > I didn't capture that... I've been trying to reproduce the problem, but > for some reason I can't, even though it happened quite often before. Actually, I just found and fixed a bug which could cause this. Unfortunately, the patch is backed up behind some others which are on their way to mainline. So, let me send it in, and you can see whether it helps. Jeff |
From: Jan R. <ja...@ry...> - 2006-07-07 20:14:03
|
>>>>> "Jeff" == Jeff Dike <jd...@ad...> writes: Jeff> On Fri, Jul 07, 2006 at 06:11:00PM +0200, Jan Rychter wrote: >> I didn't capture that... I've been trying to reproduce the problem, >> but for some reason I can't, even though it happened quite often >> before. Jeff> Actually, I just found and fixed a bug which could cause this. Jeff> Unfortunately, the patch is backed up behind some others which Jeff> are on their way to mainline. So, let me send it in, and you can Jeff> see whether it helps. Would it also fix the hangs of the entire UML? I've just had two more of those. Let me know if you want me to test something. Otherwise, I'll wait until something shows up in the mainline (which might take a while I guess?). --J. |
From: Jeff D. <jd...@ad...> - 2006-07-07 21:01:34
|
On Fri, Jul 07, 2006 at 10:13:33PM +0200, Jan Rychter wrote: > Would it also fix the hangs of the entire UML? I've just had two more of > those. Maybe, that is the symptom. > Let me know if you want me to test something. Otherwise, I'll wait until > something shows up in the mainline (which might take a while I guess?). It's headed for -mm first, and it will probably show up reasonably quickly. Jeff |