From: Dan A. <da...@co...> - 2004-03-02 21:29:35
|
Hello,

I've figured out the issue mentioned in the topic. Apparently the optimized kernel code corrupts MMX / FPU registers for userspace while in the path between hardware interrupt handling and forwarding to the host.

It occurs more often when the coLinux machine gets more CPU from the host, since hardware interrupts are caught more often in that situation.

An i386-compatible vmlinux gets around it, but it's not the solution I want.

I'm now working on a fix, while also merging Ballard's patches.

--
Dan Aloni
da...@co...
From: Dan A. <da...@co...> - 2004-03-05 06:29:26
|
On Tue, Mar 02, 2004 at 11:16:29PM +0200, Dan Aloni wrote:
> Hello,
>
> I've figured the issue mentioned in the topic. Well, apparently the optimized
> kernel code corrupts MMX / FPU registers for userspace while in the path
> between hardware interrupt handling and forwarding to the host.
>
> It occurs more often when the coLinux machine gets more CPU from the host,
> since hardware interrupts are caught more often in that situation.
>
> A i386-compatible vmlinux gets around it, but it's not the solution I want.

Okay, *now* I really figured out the issue mentioned in the topic. The vmlinux i686 issue is only the lesser of two evils: more testing led me to the conclusion that the fix for %fs/%gs wasn't complete.

The fix that got into 0.5.4 only prevented recurring segmentation faults, but didn't take one thing into account.

During a context switch of the Linux scheduler, the LDT of the next process is loaded prior to saving the %fs/%gs register pair for the previous one. This means that in that small period of time, a hardware interrupt can occur and cause coLinux to switch back to Windows for it to be handled. On the way back, it tries to restore %gs. However, the LDT isn't valid and %gs is loaded with 0.

When __switch_to finally switches to the new process, it saves the zeroed %gs in ->thread.gs. So, when a switch occurs back to the process with the zero %gs, it dies when the user space code tries to access it.

Considering this can occur only during hardware interrupts, it explains why CPU intensive tasks are affected. When coLinux is idle, it intercepts far fewer hardware interrupts.

The fix is simply to move the saving of %fs and %gs into prepare_to_switch(), which occurs before switch_mm(). I've tested it on my Gentoo setup and it seems the problem is gone. For this milestone I've uploaded a snapshot.
diff -u linux/arch/i386/kernel/process.c linux/arch/i386/kernel/process.c
--- linux/arch/i386/kernel/process.c	2004-03-04 00:16:11.000000000 +0200
+++ linux/arch/i386/kernel/process.c	2004-03-05 07:21:18.000000000 +0200
@@ -671,12 +671,20 @@
 	 */
 	tss->esp0 = next->esp0;
 
+#ifdef CONFIG_COOPERATIVE
+	/*
+	 * We would save %fs and %gs using an atomic operation in the
+	 * just before the LDT of the next process is loaded. It is
+	 * not here, it's in...
+	 */
+#else
 	/*
 	 * Save away %fs and %gs. No need to save %es and %ds, as
 	 * those are always kernel segments while inside the kernel.
 	 */
 	asm volatile("movl %%fs,%0":"=m" (*(int *)&prev->fs));
 	asm volatile("movl %%gs,%0":"=m" (*(int *)&prev->gs));
+#endif
 
 	/*
 	 * Restore %fs and %gs.
only in patch2:
unchanged:
--- linux/include/asm-i386/system.h	2004-02-02 22:39:43.000000000 +0200
+++ linux/include/asm-i386/system.h	2004-03-05 07:17:57.000000000 +0200
@@ -12,7 +12,15 @@
 struct task_struct;	/* one of the stranger aspects of C forward declarations.. */
 extern void FASTCALL(__switch_to(struct task_struct *prev, struct task_struct *next));
 
+#ifdef CONFIG_COOPERATIVE
+#define prepare_to_switch() { \
+	asm volatile("movl %%fs,%0":"=m" (*(int *)&prev->thread.fs)); \
+	asm volatile("movl %%gs,%0":"=m" (*(int *)&prev->thread.gs)); \
+}
+#else
 #define prepare_to_switch()	do { } while(0)
+#endif
+
 #define switch_to(prev,next,last) do {					\
 	asm volatile("pushl %%esi\n\t"					\
 		     "pushl %%edi\n\t"					\

--
Dan Aloni
da...@co...
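To make the ordering problem described in this message concrete, here is a minimal user-space sketch in C. It is not kernel or coLinux code and is not taken from the patch: the types `struct proc` and `struct cpu`, the functions `host_interrupt()`, `switch_old_order()`, `switch_fixed_order()` and `demo()`, and the selector value 0x0f are all hypothetical names and values chosen for illustration. The sketch assumes a simplified model in which a %gs selector survives the round trip to the host only if the currently loaded LDT belongs to the process that owns the selector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model, not kernel code. All names are illustrative. */

struct proc {
    int id;             /* identifies the process and its LDT */
    unsigned saved_gs;  /* stands in for ->thread.gs */
};

struct cpu {
    int ldt_owner;      /* id of the process whose LDT is currently loaded */
    int gs_owner;       /* id of the process whose LDT defines the live %gs */
    unsigned gs;        /* live %gs selector value */
};

/* Models the round trip to the host on a hardware interrupt: on the way
 * back %gs is reloaded, and a selector that the loaded LDT cannot resolve
 * ends up as 0. */
static void host_interrupt(struct cpu *c)
{
    if (c->gs != 0 && c->gs_owner != c->ldt_owner)
        c->gs = 0;
}

/* Original ordering: the next LDT is loaded before %gs is saved. */
static unsigned switch_old_order(struct cpu *c, struct proc *prev,
                                 const struct proc *next, bool irq)
{
    assert(prev->id != next->id);
    c->ldt_owner = next->id;   /* models switch_mm() loading the next LDT */
    if (irq)
        host_interrupt(c);     /* the race window */
    prev->saved_gs = c->gs;    /* models the save in __switch_to(): too late */
    return prev->saved_gs;
}

/* Fixed ordering: %gs is saved first, as the patched prepare_to_switch() does. */
static unsigned switch_fixed_order(struct cpu *c, struct proc *prev,
                                   const struct proc *next, bool irq)
{
    assert(prev->id != next->id);
    prev->saved_gs = c->gs;    /* save while the previous LDT is still loaded */
    c->ldt_owner = next->id;
    if (irq)
        host_interrupt(c);     /* may still zero the live %gs, but the saved copy is safe */
    return prev->saved_gs;
}

/* Runs both orderings with an interrupt in the window.
 * Returns 0 if the model behaves as the message describes, 1 otherwise. */
int demo(void)
{
    struct proc a = { 1, 0 }, b = { 2, 0 };
    struct cpu c1 = { 1, 1, 0x0f };  /* process 1 running; 0x0f is an arbitrary LDT selector */
    struct cpu c2 = c1;

    unsigned old_gs = switch_old_order(&c1, &a, &b, true);
    a.saved_gs = 0;                  /* reset before the second run */
    unsigned new_gs = switch_fixed_order(&c2, &a, &b, true);

    printf("old order saved %%gs = %#x\n", old_gs);
    printf("fixed order saved %%gs = %#x\n", new_gs);
    return (old_gs == 0 && new_gs == 0x0f) ? 0 : 1;
}
```

Calling `demo()` from a `main()` runs both orderings with an interrupt landing in the window. In this model the old ordering stores a zero selector for the previous process, while the fixed ordering stores the original one, which matches the reasoning for moving the save ahead of switch_mm().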
From: Sean B. <sea...@so...> - 2004-03-05 13:07:42
|
----- Original Message -----
From: "Dan Aloni" <da...@co...>
To: "Cooperative Linux Development" <col...@li...>
Sent: Tuesday, March 02, 2004 9:16 PM
Subject: [coLinux-devel] 0.5.4: i686-optimized vmlinux along with i686-optimized userspace - segfaults

| Hello,
|
| I've figured the issue mentioned in the topic. Well, apparently the optimized
| kernel code corrupts MMX / FPU registers for userspace while in the path
| between hardware interrupt handling and forwarding to the host.
|
| It occurs more often when the coLinux machine gets more CPU from the host,
| since hardware interrupts are caught more often in that situation.
|
| A i386-compatible vmlinux gets around it, but it's not the solution I want.
|
| I'm now working on a fix, while also merging Ballard's patches.

Does the latest snapshot (20040305) fix the seg fault issues?

|
| --
| Dan Aloni
| da...@co...
|
| _______________________________________________
| coLinux-devel mailing list
| coL...@li...
| https://lists.sourceforge.net/lists/listinfo/colinux-devel
From: Dan A. <da...@co...> - 2004-03-05 13:10:46
|
On Fri, Mar 05, 2004 at 12:53:34PM -0000, Sean Brook wrote:
> | I'm now working on a fix, while also merging Ballard's patches.
>
> Does the latest snapshot (20040305) fix the seg fault issues?

Yes.

--
Dan Aloni
da...@co...
From: Sean B. <sea...@so...> - 2004-03-05 13:17:40
|
----- Original Message -----
From: "Dan Aloni" <da...@co...>
To: "Sean Brook" <sea...@so...>
Cc: "Dan Aloni" <da...@co...>; "Cooperative Linux Development" <col...@li...>
Sent: Friday, March 05, 2004 12:56 PM
Subject: Re: [coLinux-devel] 0.5.4: i686-optimized vmlinux along with i686-optimized userspace - segfaults

| On Fri, Mar 05, 2004 at 12:53:34PM -0000, Sean Brook wrote:
| > | I'm now working on a fix, while also merging Ballard's patches.
| >
| > Does the latest snapshot (20040305) fix the seg fault issues?
|
| Yes.

Fantastic.

|
| --
| Dan Aloni
| da...@co...
From: Sean B. <sea...@so...> - 2004-03-05 13:35:51
|
Can I use an athlon optimised vmlinux or should I go with i686?

My processor: Athlon 1.4GHz

# cat /proc/cpuinfo

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 4
model name      : AMD Athlon(tm) Processor
stepping        : 4
cpu MHz         : 0.000
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips        : 837.22
From: Sean B. <sea...@so...> - 2004-03-05 19:30:40
|
----- Original Message -----
From: "Sean Brook" <sea...@so...>
To: "Dan Aloni" <da...@co...>
Cc: "Cooperative Linux Development" <col...@li...>
Sent: Friday, March 05, 2004 1:21 PM
Subject: Re: [coLinux-devel] 0.5.4: i686-optimized vmlinux along with i686-optimized userspace - segfaults

| Can I use an athlon optimised vmlinux or should I go with i686?

I get spurious 'Floating Point Exception' errors with an athlon optimised vmlinux, even when doing an

$ mv

I will stick with i686 for now, which is working great.

|
| My processor: Athlon 1.4GHz
|
| # cat /proc/cpuinfo
|
| processor       : 0
| vendor_id       : AuthenticAMD
| cpu family      : 6
| model           : 4
| model name      : AMD Athlon(tm) Processor
| stepping        : 4
| cpu MHz         : 0.000
| cache size      : 256 KB
| fdiv_bug        : no
| hlt_bug         : no
| f00f_bug        : no
| coma_bug        : no
| fpu             : yes
| fpu_exception   : yes
| cpuid level     : 1
| wp              : yes
| flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
| bogomips        : 837.22
From: morfic <mo...@bb...> - 2004-03-05 18:19:25
|
works great here now with a gentoo using

CFLAGS= -march=pentium4 -Os -fomit-frame-pointer -pipe

switched to -Os 'cause i want to keep the build as small as possible, and past experience showed -Os still gives decent speed while really making a nice size difference compared to -O3 or even -O2 (which is what i had run on my athlon-XP, as -O3 caused me serious problems)

anyway, thanks for the fix. once this thing runs stable for a while you'll see a donation from me, but don't expect to see it from this email ;)

now i can go and check on how to make the kernel for colinux and use P4 optimizations, and hopefully make a static kernel. i'm not much of a module fan, especially if i have to copy the modules around since i can't just drag n drop into the vm

anyway, looks great now. biggest problem was tar, which now even unpacks big stuff like glibc/gcc w/o a hitch

Dan Aloni wrote:

>On Fri, Mar 05, 2004 at 12:53:34PM -0000, Sean Brook wrote:
>
>>| I'm now working on a fix, while also merging Ballard's patches.
>>
>>Does the latest snapshot (20040305) fix the seg fault issues?
>
>Yes.