From: Nikola C. <nik...@li...> - 2006-09-30 16:31:05
Hello everybody,

I've been trying to get x86_64 guest UMLs working on an SMP x86_64 host machine for quite a long time. Today and yesterday I also checked 2.6.18 (+ both skas & skas-fremap v9pre9), but it's still not working as well as 32bit/32bit has worked for me for the last few years ;).

So, does anybody know what I could be doing wrong?

Running the full SKAS kernel segfaults with the following error:

Kernel panic - not syncing: map_stub_pages : /proc/mm map for code failed, err = 22

(full trace listing below)

I've noticed in some other discussions that SKAS mode is still not fully working on x86_64, so I'd like to ask what the remaining issues are, and whether I could somehow help in fixing them. I wonder why /proc/mm is used. What's /proc/mm64? Shouldn't it be used instead?

Running in noprocmm or skas0 mode is a bit better: the guest system seems to be working quite OK (at least with 2.6.18; older kernels had some networking issues), but when I try to shut down the guest, it hangs at the last stage, with the remaining uml process cycling:

--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigprocmask(SIG_UNBLOCK, [USR1], [USR1 ALRM VTALRM WINCH IO], 8) = 0
gettimeofday({1159632583, 625562}, NULL) = 0
gettimeofday({1159632583, 625582}, NULL) = 0
gettimeofday({1159632583, 625603}, NULL) = 0
rt_sigreturn(0x1) = 2266636588

forever, consuming all CPU time and waiting to be shot by a 9mm signal..

Any ideas?
thanks in advance
NiK.
and the promised SKAS mode segfault trace:

[42949373.810000] Kernel panic - not syncing: map_stub_pages : /proc/mm map for code failed, err = 22
[42949373.810000]
[42949373.810000]
[42949373.810000] Modules linked in:
[42949373.810000] Pid: 1, comm: swapper Not tainted 2.6.18lb2.00_01_PRE01
[42949373.810000] RIP: 0033:[<00000000601ceb99>]
[42949373.810000] RSP: 00002b07c6a7afd0 EFLAGS: 00000246
[42949373.810000] RAX: 0000000000000000 RBX: 0000000000002969 RCX: ffffffffffffffff
[42949373.810000] RDX: 0000000000000000 RSI: 0000000000000013 RDI: 0000000000002969
[42949373.810000] RBP: 0000000000002966 R08: 0000000000000001 R09: 0000000000000000
[42949373.810000] R10: 0000000000000000 R11: 0000000000000246 R12: 00002b07c6a7a000
[42949373.810000] R13: 00007fffe402d328 R14: 00007fffe402d7b8 R15: 00007fffe402d7e8
[42949373.810000] Call Trace:
[42949373.810000] 60ecb968: [<60018d0a>] panic_exit+0x2a/0x50
[42949373.810000] 60ecb978: [<6005dba0>] notifier_call_chain+0x20/0x40
[42949373.810000] 60ecb998: [<60049bdf>] panic+0xcf/0x170
[42949373.810000] 60ecb9e0: [<60014f40>] copy_to_user_proc+0x0/0x30
[42949373.810000] 60ecb9f8: [<600838e6>] prep_new_page+0x126/0x200
[42949373.810000] 60ecba18: [<601fa139>] __syscall_error_1+0x6/0x1d
[42949373.810000] 60ecba28: [<60038ba5>] file_io+0x35/0x110
[42949373.810000] 60ecba38: [<60014f40>] copy_to_user_proc+0x0/0x30
[42949373.810000] 60ecba78: [<6003f83a>] map_stub_pages+0x13a/0x280
[42949373.810000] 60ecbac8: [<60084079>] __alloc_pages+0x59/0x2e0
[42949373.810000] 60ecbb78: [<6001c5d1>] new_mm+0x61/0x80
[42949373.810000] 60ecbb98: [<6001bfd7>] init_new_context_skas+0x117/0x1e0
[42949373.810000] 60ecbba8: [<60013b67>] pgd_alloc+0x67/0x70
[42949373.810000] 60ecbbe8: [<600ad808>] do_execve+0x278/0x290
[42949373.810000] 60ecbc10: [<600121b0>] init+0x0/0x170
[42949373.810000] 60ecbc38: [<60012a74>] execve1+0x24/0x80
[42949373.810000] 60ecbc68: [<60012ada>] um_execve+0xa/0x50
[42949373.810000] 60ecbc78: [<600b53f9>] dupfd+0x79/0xb0
[42949373.810000] 60ecbc88: [<60012179>] run_init_process+0x49/0x80
[42949373.810000] 60ecbc98: [<600121b0>] init+0x0/0x170
[42949373.810000] 60ecbcb8: [<6001227f>] init+0xcf/0x170
[42949373.810000] 60ecbcd8: [<6003aeee>] run_kernel_thread+0x4e/0x60
[42949373.810000] 60ecbcf8: [<600121b0>] init+0x0/0x170
[42949373.810000] 60ecbd08: [<600121b0>] init+0x0/0x170
[42949373.810000] 60ecbd38: [<6003aecb>] run_kernel_thread+0x2b/0x60
[42949373.810000] 60ecbd88: [<60044e52>] schedule_tail+0x32/0x1b0
[42949373.810000] 60ecbdd8: [<6001c2ed>] new_thread_handler+0xdd/0x140
[42949373.810000] 60ecbe38: [<601ce6b0>] __restore_rt+0x0/0x10
[42949373.810000] 60ecbee8: [<601ceb99>] __kill+0x9/0x20
From: Jeff D. <jd...@ad...> - 2006-09-30 21:50:20
On Sat, Sep 30, 2006 at 07:30:35PM +0300, Nikola Ciprich wrote:
> I've been trying to get x86_64 guest UMLs working on SMP x86_64 host
> machine for quite long time, during today and yesterday I've also
> checked 2.6.18 (+ both skas & skas-fremap v9pre9), but it's still not
> working as good as 32bit/32bit was working for me for last few years ;).

skas0 works fine (as well as on i386) afaik.

Putting a stock kernel on the host or "noprocmm" on the UML command line will put you in skas0 mode.

> I've noticed on some other discussions that SKAS mode is still not fully
> working on x86_64, so I'd like to ask, what are remaining issues, and
> maybe could I somehow try helping in fixing them?

Well, the -EINVAL return from writing /proc/mm would be a good place to start looking to fix things :-)

Jeff
From: Nikola C. <nik...@li...> - 2006-09-30 22:11:49
Hi Jeff,
thanks for the reply!
Yup, skas0 seems to be working quite OK, but do You have an idea where the problem with shutting down could be (this "Virtual timer expired" cycling)?
And regarding /proc/mm, shouldn't this mm64 file be used? Or what is it used for?
nik

Jeff Dike wrote:
> On Sat, Sep 30, 2006 at 07:30:35PM +0300, Nikola Ciprich wrote:
>
>> I've been trying to get x86_64 guest UMLs working on SMP x86_64 host
>> machine for quite long time, during today and yesterday I've also
>> checked 2.6.18 (+ both skas & skas-fremap v9pre9), but it's still not
>> working as good as 32bit/32bit was working for me for last few years ;).
>
> skas0 works fine (as well as on i386) afaik.
>
> Putting a stock kernel on the host or "noprocmm" on the UML command line
> will put you in skas0 mode.
>
>> I've noticed on some other discussions that SKAS mode is still not fully
>> working on x86_64, so I'd like to ask, what are remaining issues, and
>> maybe could I somehow try helping in fixing them?
>
> Well, the -EINVAL return from writing /proc/mm would be a good place
> to start looking to fix things :-)
>
> Jeff
>
> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to share your
> opinions on IT & business topics through brief surveys -- and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> _______________________________________________
> User-mode-linux-devel mailing list
> Use...@li...
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
From: Jeff D. <jd...@ad...> - 2006-09-30 23:59:16
On Sun, Oct 01, 2006 at 01:11:37AM +0300, Nikola Ciprich wrote:
> You have an idea where
> could be problem with shutting down? (this Virtual timer expired cycling?)

Oops, missed that in your previous message.

Where in the shutdown process is this? There have been some previous reports of hangs during swapoff which I have been unable to reproduce. I spent part of a day last week pushing UMLs deeply into swap and shutting them down, always successfully.

The SIGVTALRM is normal - that's just the timer. That strace you sent looks like the idle loop, so as far as the kernel is concerned, everything is fine.

> and regarding /proc/mm, shouldn't this mm64 file be used? or what's it
> used for?

/proc/mm64? What would it do that /proc/mm doesn't?

Jeff
From: Nikola C. <nik...@li...> - 2006-10-01 00:04:36
> Where in the shutdown process is this? There have been some previous
> reports of hangs during swapoff which I have been unable to reproduce.
> I spent part of a day last week pushing UMLs deeply into swap and
> shutting them down, always successfully.

it always hangs executing /sbin/halt

> The SIGVTALRM is normal - that's just the timer. That strace you sent
> looks like the idle loop, so as far as the kernel is concerned,
> everything is fine.

it's consuming all available host CPU time, it's certainly cycling as fast as possible

> > and regarding /proc/mm, shouldn't this mm64 file be used? or what's it
> > used for?
>
> /proc/mm64? What would it do that /proc/mm doesn't?

well, I must admit I don't know what the difference between the two is at all, I was just guessing. Is there some documentation about it? I wasn't able to find anything which would explain it...

> Jeff
From: Nikola C. <nik...@li...> - 2006-10-01 00:08:22
hmm, I've also noticed another strange thing (maybe a bug?):
when I run this UML using noprocmm, the arch is incorrectly reported as i686, while if I run it using skas0, it's (correctly) reported as x86_64...
n.
PS: You don't need to CC me, I'm reading this list
From: Nikola C. <nik...@li...> - 2006-10-01 00:19:25
oops, so the difference is not in the mode; the problem seems to be in NPTL usage. The arch is correctly reported only when I set LD_ASSUME_KERNEL to 2.4.1
sorry about the confusion...
From: Nikola C. <nik...@li...> - 2006-10-01 00:45:41
Jeff,
I'm terribly sorry, please ignore the previous two emails. The problem was not in UML itself but in my scripts: it turned out that I was running it with setarch i386, which caused the guest UML to report the i386 architecture, and it made quite a mess when I was testing it...
Once again I'm sorry, I guess I really should get some sleep :(
n.
PS: but anyway, I wonder, is this behavior a bug or a feature? That running an x86_64 UML using setarch i386 causes the guest UML to report the i686 arch (even though it's still x86_64)?
From: Blaisorblade <bla...@ya...> - 2006-10-01 15:11:41
Attachments:
fix-uname-under-setarch-i386
On Sunday 01 October 2006 02:45, Nikola Ciprich wrote:
> Jeff,
> I'm terribly sorry, please ignore previous two emails. Problem was not
> in UML itself, but in my scripts, as it happened that I was running it
> with setarch i386, which caused guest uml report i386 architecture, and
> it made quite a mess when I was testing it...
> once again I'm sorry, I guess I really should rather have some sleep :(
> n.
> PS: but anyways, I wonder, is this behavior bug or feature? that running
> x86_64 uml using setarch i386 then causes guest UML report i686 arch?
> (even it's still x86_64)

It is a bug. Under setarch i386, uname returns i686; however, UML should correct this. A 32bit UML says i686 on a 64bit host.

Patch attached (compile-tested). Please report whether it works or not.
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
From: Blaisorblade <bla...@ya...> - 2006-10-01 15:28:05
On Sunday 01 October 2006 01:57, Jeff Dike wrote:
> On Sun, Oct 01, 2006 at 01:11:37AM +0300, Nikola Ciprich wrote:
> > You have an idea where
> > could be problem with shutting down? (this Virtual timer expired
> > cycling?)
>
> Oops, missed that in your previous message.
>
> Where in the shutdown process is this? There have been some previous
> reports of hangs during swapoff which I have been unable to reproduce.
> I spent part of a day last week pushing UMLs deeply into swap and
> shutting them down, always successfully.
>
> The SIGVTALRM is normal - that's just the timer. That strace you sent
> looks like the idle loop, so as far as the kernel is concerned,
> everything is fine.
>
> > and regarding /proc/mm, shouldn't this mm64 file be used? or what's it
> > used for?
>
> /proc/mm64? What would it do that /proc/mm doesn't?

Well, I'll explain everything (even to Jeff, since I never documented any of this).

The SKAS patch V8 supports an i386 host with an i386 guest well. That's all.
SKAS patch V9-preX tries to support an x86_64 host, with both 32 and 64 bit guests:

* 32 bit guests: a bug prevents using full SKAS3; you must pass noprocmm, and you'll get most of the performance advantages (because faultinfo works). When I worked on the SKAS port to x86_64 I did not discover that something worked, and I never tested support for 64bit guests.
I later discovered the bug is in a different portion of code - Bodo Stroesser unveiled it, and I think this is the same bug that hit even Jeff's 1st attempt to write SKAS4 (all I know can be found in http://user-mode-linux.sourceforge.net/diary.html -> grep skas4 and you'll find what I'm talking about on 23 Sep 2004).

* 64bit guests: they do not support SKAS3, but the host patch gives the needed support - to do that, the 1st thing to do would be to use /proc/mm64 as you point out (and it could even suffice, even if I doubt that), but guest kernels still try to use /proc/mm and it does not work. So you must pass skas0 on the cmdline (I do not remember whether noprocmm works for this case).

-EINVAL is given on purpose: the request struct UML writes to /proc/mm has a different layout for 32 and 64bit requests, or better a different size since pointers are larger (and there's an explicit check for that - IIRC it has always been there). I chose to have a separate /proc/mm64; I did not think to try autodetection. And frankly my first impression is that I'd still do it this way.
--
Chat with your friends in real time! http://it.yahoo.com/mail_it/foot/*http://it.messenger.yahoo.com
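The size mismatch behind the -EINVAL described above can be illustrated with a small sketch. The field layout below is hypothetical (an opcode plus two pointer-sized fields); it is not the real /proc/mm request struct from the SKAS patch, and the assumption that /proc/mm checks against the 32-bit size is only inferred from the explanation in this thread. It merely shows how the same logical request changes size when pointers grow from 4 to 8 bytes, and how a strict size check would then reject it with error 22 (EINVAL), the value seen in the panic message.

```python
import struct

# Hypothetical layouts, for illustration only (not the real SKAS request struct).
LAYOUT_32 = "<iII"    # int op, 32-bit address, 32-bit length
LAYOUT_64 = "<i4xQQ"  # int op, 4 padding bytes, 64-bit address, 64-bit length

EINVAL = 22  # Linux errno value for "Invalid argument"


def write_request(payload: bytes, expected_layout: str) -> int:
    """Mimic a handler that accepts only requests of exactly the expected size."""
    if len(payload) != struct.calcsize(expected_layout):
        return -EINVAL
    return len(payload)


# The same logical request, packed the way a 32-bit and a 64-bit guest would.
req32 = struct.pack(LAYOUT_32, 1, 0x1000, 4096)
req64 = struct.pack(LAYOUT_64, 1, 0x1000, 4096)

print(len(req32), len(req64))
print(write_request(req32, LAYOUT_32))  # size matches, accepted
print(write_request(req64, LAYOUT_32))  # size differs, rejected
```

Under these assumptions, a 64-bit guest writing its larger request to an interface that expects the 32-bit size is rejected outright, which is one way to read why a separate /proc/mm64 entry was chosen over autodetection.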
From: Nikola C. <nik...@li...> - 2006-10-01 15:51:22
Hi Paolo and others,
thanks a lot for the explanation. I'll just add some things I've noticed (and also a few more questions ;):
My setup is a dual-core Opteron machine running SMP 2.6.18, patched with skasv9-pre9 with fremap support.

guests:
vanilla 2.6.18-x86_64 SKAS
 - crashes trying to access /proc/mm
vanilla 2.6.18-x86_64 noprocmm OR skas0
 - works equally well, even performance is pretty much the same (measured on kernel compilation), there's just this problem with the /sbin/halt hang
vanilla 2.6.18-x86 - doesn't matter which arguments are passed
 - hangs, strace shows the following:
   old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffffffff7f40000
   mmap2(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffffffff7f3f000
   clone(child_stack=0xf7f3ffd4, flags=|SIGCHLD) = 8120
   waitpid(8120, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], WSTOPPED) = 8120
   --- SIGCHLD (Child exited) @ 0 (0) ---
   and the child process dies this way:
   getppid() = 8119
   rt_sigprocmask(SIG_BLOCK, [WINCH], [], 8) = 0
   ptrace(PTRACE_TRACEME, 0, 0, 0) = -1 EPERM (Operation not permitted)
   dup(2) = 4
   fcntl64(4, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
   fstat64(0x4, 0xf7f3fa44) = 0
   old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffffffff7f3e000
   _llseek(4, 0, 0xf7f3faa4, SEEK_CUR) = -1 ESPIPE (Illegal seek)
   write(4, "ptrace: Operation not permitted\n", 32) = 32
   close(4) = 0
   munmap(0xf7f3e000, 4096) = 0
   kill(8120, SIGKILL) = 0
   +++ killed by SIGKILL +++
2.6.18-x86_64 with bb1 patch
 - SKAS crashes as expected
 - noprocmm or skas0 is weird:
   if run, panics with:
   [42949373.500000] VFS: Mounted root (ext2 filesystem) readonly.
   Usage: init 0123456SsQqAaBbCcUu
   [42949373.500000] Kernel panic - not syncing: Attempted to kill init!
   - apparently init is being executed in some strange way?
   - given init=/bin/sh it hangs, tracing shows the following loop:
   --- SIGCHLD (Child exited) @ 0 (0) ---
   wait4(1381, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSEGV}], WSTOPPED, NULL) = 1381
   ptrace(PTRACE_GETREGS, 1381, 0, 0x60aec330) = 0
   ptrace(PTRACE_GETFPREGS, 1381, 0, 0x60aec408) = 0
   ptrace(PTRACE_CONT, 1381, 0, SIGSEGV) = 0
   - round and round

I hope it helps in some way. I'm going to try the mm patches now, to see if something behaves differently (better if possible :)
nik.
PS: your setarch patch works, thanks!
From: Blaisorblade <bla...@ya...> - 2006-10-05 21:19:06
On Sunday 01 October 2006 17:51, Nikola Ciprich wrote:
> Hi Paolo and others,
> thanks a lot for explanation. I'll just add some things I've noticed
> (and also few another questions ;):
> My setup is dualcore opteron machine, running SMP 2.6.18, patched with
> skasv9-pre9 with fremap support.

Argh... I thought I marked that as "for developers" (there was a single release for 2.6.18-rc4, but separate ones for 2.6.18...) - that patch (fremap) is IMHO very stable, but I've been its only tester till now.

> guests:
> vanilla 2.6.18-x86_64 SKAS
> - crashes trying to access /proc/mm

When 64bit guests try using /proc/mm they'll crash, as explained, so you can skip testing/reporting this case.

> vanilla 2.6.18-x86_64 noprocmm OR skas0
> - works equally well, even performance is pretty much the same
> (measured on kernel compilation), there's just this problem with
> /sbin/halt hang
> vanilla 2.6.18-x86 - doesn't matter which arguments are passed
> - hangs, strace shows following:
...
> kill(8120, SIGKILL) = 0
> +++ killed by SIGKILL +++

> 2.6.18-x86_64 with bb1 patch
> - SKAS crashes as expected
> - noprocmm or skas0 is weird:
> if run, panics with:
> [42949373.500000] VFS: Mounted root (ext2 filesystem) readonly.
> Usage: init 0123456SsQqAaBbCcUu
> [42949373.500000] Kernel panic - not syncing: Attempted to
> kill init!
> - apparently init is being executed in some strange way?

Ok, you are the second user reporting this problem. Some option is not being parsed correctly by UML, so it is passed to init.

So you obtain this as a regression in -bb1... try unapplying patches/tls/x86-64-support.diff (you find separate patches in the broken-out folder in the archive).

> - given init=/bin/sh hangs, tracing shows following loop:
> --- SIGCHLD (Child exited) @ 0 (0) ---
> wait4(1381, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSEGV}],
> WSTOPPED, NULL) = 1381
> ptrace(PTRACE_GETREGS, 1381, 0, 0x60aec330) = 0
> ptrace(PTRACE_GETFPREGS, 1381, 0, 0x60aec408) = 0
> ptrace(PTRACE_CONT, 1381, 0, SIGSEGV) = 0
> - round and round
>
> I hope it helps in some way.
From: Nikola C. <nik...@li...> - 2006-10-07 14:27:24
Yes, You're right, reverting this particular patch fixed the problem, so it boots now...
All other problems (halt problem with x86_64 guest and hang trying to run i386 guest) persist.
Is there something else I could try to fix those or find where the problem is?
thanks in advance!
nik.

Blaisorblade wrote:
> Ok, you are the second user reporting this problem. Some option is not being
> parsed correctly by UML, so it is passed to init.
>
> So you obtain this as a regression in -bb1... try unapplying
> patches/tls/x86-64-support.diff (you find separate patches in the broken-out
> folder in the archive).
From: Nikola C. <nik...@li...> - 2006-10-01 21:00:59
OK, so I've tested 2.6.18-mm2. Results:
the pcap network transport seems to be broken and doesn't compile, so I had to disable it.
Apart from that, all of the behavior described before remains the same; nothing behaves either better or worse...
n.
From: Blaisorblade <bla...@ya...> - 2006-10-05 21:19:01
Attachments:
const-def-after-decl-pcap.diff
On Sunday 01 October 2006 23:00, Nikola Ciprich wrote:
> OK, so I've tested 2.6.18-mm2, results are:
> pcap network transport seems to be broken, and doesn't compile, I had to
> disable it

Attached patch should fix it (it's against git head but should be the same thing).

> apart from that, all of behavior described before remains the same,
> nothing behaves neither better, nor worse...
> n.
From: Blaisorblade <bla...@ya...> - 2006-10-08 11:07:45
On Thursday 05 October 2006 13:33, Blaisorblade wrote:
> On Sunday 01 October 2006 23:00, Nikola Ciprich wrote:
> > OK, so I've tested 2.6.18-mm2, results are:
> > pcap network transport seems to be broken, and doesn't compile, I had to
> > disable it
>
> Attached patch should fix it (it's against git head but should be the same
> thing).

Since you've said that that patch does not fix pcap compilation, post the compilation error (both with and without the patch) so I know what happens there. Assuming that pcap compiles on your system for stable kernels (i.e. libpcap.a is installed), as you imply in your report.

> > apart from that, all of behavior described before remains the same,
> > nothing behaves neither better, nor worse...
> > n.
From: Nikola C. <nik...@li...> - 2006-10-08 19:08:58
Hello Paolo,
I've spent half a day today trying all variants, and well, I'm no longer able to reproduce the problem :(
In fact, now APPLYING your patch seems to cause the problem, which is of course weird.
So please consider this a false bug report until I'm able to find out how the hell I managed to do it.
Reading my confusing posts to this list, I'm starting to feel a bit stupid, sorry for them :-/
I'll certainly have to track all my attempts and compilations, as I've made quite a mess of them...
n.

> Since you've said that that patch does not fix pcap compilation, post the
> compilation error (both with and without the patch) so I know what happens
> there. Assuming that pcap compiles on your system for stable kernels (i.e.
> libpcap.a is installed), as you imply in your report.
From: Blaisorblade <bla...@ya...> - 2006-10-08 19:34:09
On Sunday 08 October 2006 21:08, Nikola Ciprich wrote:
> Hello Paolo,
> I've spent half a day today trying all variants, and well, I'm no longer
> able to reproduce the problem :(
> In fact now APPLYING your patch seems to cause the problem, which is of
> course weird.

The patch will cause the problem if applied without another certain patch, so it means that 2.6.18-mm2 does not contain it. I can't be sure without the compilation error but I can guess it is the problem.

> So please consider this false bugreport until I'm able to find out how
> the hell did I manage to do it.
> If I'm reading my confusing posts to this list, I'm starting to feel a
> bit stupid, sorry for them :-/

Do not worry, it happens - you've falsified your own bug report; normally someone else must do it (on one occasion I received configuration files from the wrong machine, i.e. "this is my host config" while it was the guest one, wondered a bit and then kindly pointed out this possibility to the poster, who promptly recognized his error).

> I'll certainly have to track all my attempts and compilations, as I'm
> having quite a mess in it then...
> n.
>
> > Since you've said that that patch does not fix pcap compilation, post the
> > compilation error (both with and without the patch) so I know what
> > happens there. Assuming that pcap compiles on your system for stable
> > kernels (i.e. libpcap.a is installed), as you imply in your report.
From: Nikola C. <nik...@li...> - 2006-10-08 20:01:08
That's it!
-mm3 kernel is causing this problem, and Your patch indeed fixes it! I completely forgot that I was also trying mm patches and I certainly mixed them with Your patches :-/
at least I'm glad that I now know where the problem was...

Blaisorblade wrote:
> On Sunday 08 October 2006 21:08, Nikola Ciprich wrote:
>
>> Hello Paolo,
>> I've spent half a day today trying all variants, and well, I'm no longer
>> able to reproduce the problem :(
>> In fact now APPLYING your patch seems to cause the problem, which is of
>> course weird.
>
> The patch will cause the problem if applied without another certain patch, so
> it means that 2.6.18-mm2 does not contain it. I can't be sure without the
> compilation error but I can guess it is the problem.
From: Blaisorblade <bla...@ya...> - 2006-10-14 00:23:18
On Sunday 08 October 2006 22:01, Nikola Ciprich wrote:
> That's it!
> -mm3 kernel is causing this problem, and Your patch indeed fixes it! I
> completely forgot that I was also trying mm patches and I certainly
> mixed them with Your patches :-/
> at least I'm glad that I now know where the problem was...

Fine!
Next time, however, you'll save your time if you cut'n'paste the gcc output and do "head Makefile" (brighter variations are possible, but that should work if you don't use git) to get the exact version string. I don't want to criticize, just to give you a tip :-) !

Bye
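As a sketch of what "head Makefile" gives you: the first lines of a 2.6-era kernel Makefile define VERSION, PATCHLEVEL, SUBLEVEL and EXTRAVERSION, and the version string is those four joined together. The helper name `kernel_version` and the sample text below are illustrative assumptions; in particular, the "-mm3" EXTRAVERSION value is an assumed example of what a patched tree might carry, not output copied from this thread.

```python
import re

KEYS = ("VERSION", "PATCHLEVEL", "SUBLEVEL", "EXTRAVERSION")


def kernel_version(makefile_text: str) -> str:
    """Assemble the version string from the first lines of a kernel Makefile."""
    fields = dict.fromkeys(KEYS, "")
    # Only the header is inspected, like "head Makefile" would show.
    for line in makefile_text.splitlines()[:10]:
        m = re.match(r"^(VERSION|PATCHLEVEL|SUBLEVEL|EXTRAVERSION)\s*=\s*(.*)$", line)
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return "{VERSION}.{PATCHLEVEL}.{SUBLEVEL}{EXTRAVERSION}".format(**fields)


# Hypothetical Makefile header, for illustration only.
sample = """VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 18
EXTRAVERSION = -mm3
"""

print(kernel_version(sample))
```

On a real tree this could be called as `kernel_version(open("Makefile").read())`. Pasting that string together with the gcc output makes it clear which patch set a report refers to.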
From: Nikola C. <nik...@li...> - 2006-10-14 09:24:01
|
Hi,
thanks for the tips! Actually I'm really glad for them, as I'd like to help
in any way and later even join development if I'm able to do something ;).
I have two questions regarding this:
1) You mentioned git, do you use it? Or does Jeff? I haven't noticed any UML
git branch...
2) I'm still worried about this halt problem on amd64, can you give me
some tips on how I could track it down? It's the only thing preventing me
from migrating users to the new machine, otherwise it seems to be working
great (but of course I'm still really looking forward to full SKAS ;))
thanks a lot for your patience with me...
nik
From: Blaisorblade <bla...@ya...> - 2006-10-15 17:39:08
|
On Saturday 14 October 2006 11:23, Nikola Ciprich wrote:
> Hi,
> thanks for tips! actually I'm really glad for them as I'd like to help
> in any way and later even join development if I'll be able to do sth ;).
> Actually there are two questions regarding to this I have:
> 1) You mentioned GIT, do You use it? or Jeff?

I use it; I was thinking of Linus' git branch. If you use that,
"head Makefile" won't tell you the whole truth (it won't tell you which
commit you are on).

> I haven't noticed any UML
> git branch...

Right, we queue our patches with quilt and send them through -mm. Jeff
publishes his *development* quilt tree on the UML homepage.

> 2) I'm still worried about this halt problem on amd64, can You give me
> some tips I could track it down?? I't only thing which is preventing me
> from migrating users on new machine, otherwise it seems to be working
> great (but of course I'm still really looking forward for full SKAS ;))

A 100%-CPU loop without messages is difficult to diagnose. The first answer
is "check whether SMP is enabled in that UML binary, disable SMP, and see if
the problem disappears". SMP does not work; when Jeff announces "SMP support
ready", it will be possible to start using it again.

Otherwise, enable debug info, recompile, and attach gdb to the process
using 100% CPU (type gdb $vmlinux_binary $pid), then type bt at the prompt.
While recompiling, try enabling every debug option under "Kernel hacking"
(excluding "kobject debugging" and a few others). For reference, this is what
I have on testing kernels (it comes from a git head kernel):

CONFIG_PRINTK_TIME=y
CONFIG_ENABLE_MUST_CHECK=y
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_SCHEDSTATS=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_RT_MUTEX_TESTER=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_FS is not set
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_LIST=y
CONFIG_FRAME_POINTER=y
CONFIG_UNWIND_INFO=y
# CONFIG_FORCED_INLINING is not set
CONFIG_RCU_TORTURE_TEST=m
# CONFIG_CMDLINE_ON_HOST is not set
# CONFIG_PT_PROXY is not set
# CONFIG_GCOV is not set

> thanks a lot for Your patience with me...
> nik

-- 
Paolo Giarrusso, aka Blaisorblade
From: Jeff D. <jd...@ad...> - 2006-10-16 15:04:32
|
On Sat, Oct 14, 2006 at 12:23:50PM +0300, Nikola Ciprich wrote:
> 1) You mentioned GIT, do You use it? or Jeff? I haven't noticed any UML
> git branch...

I don't. I use quilt, although I occasionally construct a git tree when
there's something in -linux that I need to look at.

> 2) I'm still worried about this halt problem on amd64, can You give me
> some tips I could track it down??

The hang happens at the very end of the shutdown sequence, after all
services have been turned off?

IIRC, you said that stracing UML shows a VTALRM-handling loop and that
the UML is consuming all available CPU. Those two statements are not
consistent, AFAIK, since there should be no busy loops in UML (although
I can imagine bugs which could induce one) and the VTALRM loop is the
sign of the idle loop, which isn't busy.

So, could you confirm all this, and also get a stack trace from the
hang?

Jeff
From: Blaisorblade <bla...@ya...> - 2006-10-17 23:40:12
|
On Tuesday 17 October 2006 22:21, Nikola Ciprich wrote:
> Hi guys,
> thanks for suggestions. Now I've been able to track the problem, and it
> seems to be arch/um/sys-x86_64/delay.c, function __const_udelay.
> halt process hangs at drivers/md/md.c, function md_notify_reboot(...):
> it never gets after the mdelay(1000*1) line, however according to my
> printk's I've added it seems that this function is being executed
> periodically, but it's certainly because of some overflow.
> I've noticed that it's being called with usecs value of 4294000, and
> I've also noticed following comment at include/linux/delay.h:
> /*
>  * Using udelay() for intervals greater than a few milliseconds can
>  * risk overflow for high loops_per_jiffy (high bogomips) machines. The
>  * mdelay() provides a wrapper to prevent this. For delays greater
>  * than MAX_UDELAY_MS milliseconds, the wrapper is used. Architecture
>  * specific values can be defined in asm-???/delay.h as an override.
>  * The 2nd mdelay() definition ensures GCC will optimize away the
>  * while loop for the common cases where n <= MAX_UDELAY_MS -- Paul G.
>  */
> So maybe calling __const_udelay with such high value is problem? When I
> print value of n, it's always some big value (like 3419150745), so it
> seems that this variable is overflowing). When I compare um-i386 and
> um-x86_64 implementation of __const_udelay, it turns that difference
> between them is that i386 version uses plain unsigned ints, while x86_64
> version uses unsigned longs. So I've tried to change longs to ints (yup,
> it's certainly brain damaged, but I just wanted to try) and things seem
> to started working.

You really changed long to int? In that case, you may only have introduced a
bigger overflow.
Jeff, I vaguely remember a recent (>=2.6.17) patch about udelay fixing a
misbehaviour; would you please complete my recollection and check how it
interacts with this?
Also, since we take __const_udelay and __udelay() from asm/arch/delay.h
(for i386):

#define udelay(n) (__builtin_constant_p(n) ? \
    ((n) > 20000 ? __bad_udelay() : __const_udelay((n) * 0x10c6ul)) : \
    __udelay(n))

#define ndelay(n) (__builtin_constant_p(n) ? \
    ((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
    __ndelay(n))

we then do _not_ take these definitions from arch/i386/lib/delay.c:

void __udelay(unsigned long usecs)
{
    __const_udelay(usecs * 0x000010c7);  /* 2**32 / 1000000 (rounded up) */
}

void __ndelay(unsigned long nsecs)
{
    __const_udelay(nsecs * 0x00005);  /* 2**32 / 1000000000 (rounded up) */
}

Instead, for us __udelay and __const_udelay are identical, which is not
likely to help at all. Their code is correct because the asm "mull"
instruction divides the product by 2^32 by throwing away the low 32 bits
(i.e. EAX, since the mull result is in EDX:EAX); the point of this is to do
the division only at the very end (to improve accuracy).

This bug has been here at least since 2.6.16.
Suggested solution:
1) have _our own_ delay.h, to avoid depending on the subarchs' changes
2) first avoid playing such complex tricks, then maybe copy their
implementation (note that it already differs slightly between x86 and x86_64).

> But I'm sure this is not the right solution, so what
> do You guys suggest? And also is really i386 variant safely implemented?
> thanks a lot!
> nik.

-- 
Paolo Giarrusso, aka Blaisorblade
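The arithmetic in this message can be sketched in plain C. This is only an
illustration, not the UML or i386 kernel code: the function names and the
figure of 1,000,000,000 loops per second are hypothetical, and the "taken as
microseconds" variant is an assumption about what happens when a pre-scaled
argument is handled like a raw microsecond count. It also assumes that the
generic mdelay() wrapper ends up calling udelay(1000) once per millisecond,
which would explain the reported argument of 4294000 (1000 * 0x10c6).

```c
#include <assert.h>
#include <stdint.h>

/* Scale factor from the quoted udelay() macro: 2^32 / 10^6 = 4294.967...,
 * truncated to 4294 (0x10c6). delay.c uses the rounded-up 0x10c7. */
#define USEC_SCALE 0x10c6u

/* Value that the constant branch of the quoted udelay() macro hands to
 * __const_udelay(): microseconds pre-multiplied by 2^32 / 10^6. */
uint64_t const_udelay_arg(uint64_t usecs)
{
    return usecs * USEC_SCALE;
}

/* i386-style conversion: multiply by loops per second and keep only the
 * high 32 bits of the product, i.e. divide by 2^32 at the very end. */
uint64_t loops_from_xloops(uint64_t xloops, uint64_t loops_per_sec)
{
    return (xloops * loops_per_sec) >> 32;
}

/* Hypothetical mis-scaling: the pre-scaled argument is treated as a plain
 * microsecond count and divided by 10^6 instead of 2^32. */
uint64_t loops_if_taken_as_usecs(uint64_t arg, uint64_t loops_per_sec)
{
    return (arg * loops_per_sec) / 1000000u;
}

/* Checks for a 1 ms delay at a made-up 10^9 loops per second. */
void delay_sketch_check(void)
{
    const uint64_t lps = 1000000000ULL;          /* hypothetical value */
    uint64_t arg = const_udelay_arg(1000);

    assert(arg == 4294000ULL);                   /* value seen in the report */
    assert(loops_from_xloops(arg, lps) < 1000000ULL);   /* about 1 ms worth */
    assert(loops_if_taken_as_usecs(arg, lps) == 4294000000ULL);
}
```

With these made-up numbers the correctly scaled count stays just under one
million loops per millisecond, while the mis-scaled count is about 4295 times
larger.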
From: Nikola C. <nik...@li...> - 2006-10-18 07:40:57
|
> You really changed long to int? In this case, you may only have introduced a
> bigger overflow.

Yes, I'm aware of that, I just wanted to see how the i386 variant behaves.
By the way, I just used gdb to find out why an i386 UML on an x86_64 host
hangs right at the beginning, and it turned out to be the same problem with
__const_udelay. Well, that's not a big surprise now, I guess.
So if you need further testing, please feel free to post patches etc. ;)
regards
nik.
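The remark about changing long to int can be illustrated with a small C
sketch. It assumes, purely for illustration, that the delay routine computes
its loop count as (loops_per_sec * usecs) / 1000000; the function names and
the sample values are hypothetical and are not taken from the UML sources.
The point is only that 32-bit arithmetic does not repair the scaling: the
product wraps modulo 2^32, so the result can never exceed 4294 loops, which
may make the hang disappear without producing a correct delay.

```c
#include <assert.h>
#include <stdint.h>

#define MILLION 1000000u

/* 64-bit arithmetic, as with unsigned long on x86_64: the product is kept
 * in full, so an oversized argument gives an oversized loop count. */
uint64_t loop_count_64(uint64_t loops_per_sec, uint64_t usecs)
{
    return (loops_per_sec * usecs) / MILLION;
}

/* 32-bit arithmetic, as with unsigned int: the product wraps modulo 2^32
 * before the division, so the result is at most 4294967295 / 10^6 = 4294. */
uint32_t loop_count_32(uint32_t loops_per_sec, uint32_t usecs)
{
    return (loops_per_sec * usecs) / MILLION;
}

/* Checks with a made-up 10^9 loops per second and the argument 4294000. */
void width_sketch_check(void)
{
    assert(loop_count_64(1000000000u, 4294000u) == 4294000000ULL);
    assert(loop_count_32(1000000000u, 4294000u) <= 4294u);

    /* Small inputs that do not wrap agree in both widths. */
    assert(loop_count_32(1000u, 1000u) == 1u);
    assert(loop_count_64(1000u, 1000u) == 1u);
}
```

In this sketch the 32-bit version returns a small number that has little to
do with the requested delay, which matches the warning that the change may
only have introduced a bigger overflow.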