From: Jake H. <jh...@an...> - 2004-10-24 17:47:44
Kristian Van Der Vliet wrote:
> On Saturday 23 October 2004 9:03 pm, Jake Hamby wrote:
>>
>>What's the status on importing my patches for the atomic primitives?
>
> I'll get on with checking them in. Hopefully they'll be in CVS today; I'll
> let you know as soon as I'm done.

Thanks! I assume you checked in the driver and fs patches as well.

> I'll capture a few kernel logs from both an -O3 and -O2 kernel; the symptoms
> appear to be identical.

Got the logs. I'll keep poking around and let you know if I find anything.

>>* Implement lazy FP context switching. Instead of spending the time to
>
> Sounds like a great idea to me. Anything that can save a few ticks on a
> context switch is always a good idea :)

Definitely! I'm playing around with the lmbench-3.0-a4 microbenchmarks right now, trying to figure out why Syllable is so freaking slow under VMware!! I tried running the LiveCD on my desktop machine and Syllable is an order of magnitude faster on a real machine than under VMware.

Unfortunately, I couldn't do a native install because my hard drives are too big and I get a "Drive uses 48bit addressing" error. It looks like I can enable support for that with a kernel flag, but I want to look at the code a little and build a more recent kernel for the install CD before I try that. My laptop hard drive is 60GB, which may be small enough to do a native install there for comparison benchmarks.

However, I really enjoy being able to hack on Syllable in VMware while browsing the code in Visual Studio (even though I can't build it, VS has great code browsing and search capabilities), listening to music, and watching the debug output in a terminal window. On a laptop especially, having good OS support for things like power management and 802.11g is very important as well.

Speaking of 802.11g, I spent many hours a few months back adapting the latest version of the madwifi Linux driver for Atheros chipsets to FreeBSD.
Once I've finished my current crop of projects, I definitely intend to try hacking the 802.11 and Atheros driver code into Syllable. I'm also interested in tackling ACPI, at least well enough to enable Hyperthreading so I have at least a simulation of SMP (though not in VMware of course) to test on.

>>This goes hand-in-hand with my plan to extend save_fpu_state() to use
>>FXSAVE instead of FSAVE on Pentium III and above (including Athlon),
>>which saves the SSE/SSE2 context as well as the FP/MMX context. This
>>would allow us to set the bit to enable SSE/SSE2 support so Pentium3/4
>>and Athlon optimized vector instructions could be used.
>
> Would this allow MMX/SSE/SSE2 instructions in the kernel? Someone was asking
> about this a little while back, although to be honest I forget why..

Here's the deal. MMX uses the same registers as the FPU, so there is no additional OS support needed. A side effect is that you can't use MMX and FP at the same time but have to switch between them. All this is handled by GCC if you use the built-in vector instructions.

SSE/SSE2, on the other hand, uses an additional set of 8 128-bit registers plus a few additional control and status words, so additional OS support is needed, and these instructions stay disabled until the OS signals to the CPU that it can handle them. The changes are not too complicated: basically you have to set aside a 16-byte-aligned 512-byte area per thread to save the FP state with FXSAVE, instead of the 108 bytes that FSAVE writes in 32-bit protected mode. There's an additional exception, vector 19 (#XF), used to signal SIMD floating-point exceptions, much as vector 16 (#MF) is used for regular floating-point exceptions.

The good news is that you can intermix SSE/SSE2 instructions with FP or MMX instructions, since the SSE registers are separate from the regular FP/MMX registers.
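
To make the sizes above concrete, here is a rough C sketch of the FXSAVE save area and of the feature bits involved. The layout follows the 32-bit FXSAVE format, and the bit positions are the documented CPUID.1:EDX and CR4 ones. The names fxsave_area_t, fpu_save_area_size() and cr4_bits_for_sse() are made up for illustration and are not Syllable kernel code. Actually executing CPUID and writing CR4 needs assembly in the kernel and is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID leaf 1, EDX feature bits. */
#define CPUID_EDX_FXSR  (1u << 24)  /* FXSAVE/FXRSTOR available */
#define CPUID_EDX_SSE   (1u << 25)
#define CPUID_EDX_SSE2  (1u << 26)

/* CR4 bits the OS sets to tell the CPU it handles SSE state. */
#define CR4_OSFXSR      (1u << 9)   /* OS uses FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT  (1u << 10)  /* OS handles vector 19 */

#define FSAVE_AREA_SIZE 108         /* 32-bit protected-mode FSAVE image */

/* 32-bit FXSAVE image; field names are illustrative. */
typedef struct {
    uint16_t fcw, fsw;              /* x87 control and status words */
    uint8_t  ftw, reserved0;        /* abridged tag word */
    uint16_t fop;                   /* last x87 opcode */
    uint32_t fip;                   /* x87 instruction pointer offset */
    uint16_t fcs, reserved1;
    uint32_t fdp;                   /* x87 data pointer offset */
    uint16_t fds, reserved2;
    uint32_t mxcsr, mxcsr_mask;     /* SSE control/status */
    uint8_t  st_mm[8][16];          /* ST0-7 / MM0-7, 16 bytes each */
    uint8_t  xmm[8][16];            /* XMM0-7 */
    uint8_t  reserved3[224];
} __attribute__((aligned(16))) fxsave_area_t;

static_assert(sizeof(fxsave_area_t) == 512, "FXSAVE area must be 512 bytes");

/* Per-thread save area size, given the CPUID.1:EDX value. */
size_t fpu_save_area_size(uint32_t cpuid_edx)
{
    return (cpuid_edx & CPUID_EDX_FXSR) ? sizeof(fxsave_area_t)
                                        : FSAVE_AREA_SIZE;
}

/* CR4 bits a kernel would OR in at boot; 0 means stay with FSAVE. */
uint32_t cr4_bits_for_sse(uint32_t cpuid_edx)
{
    uint32_t bits = 0;
    if (cpuid_edx & CPUID_EDX_FXSR) {
        bits |= CR4_OSFXSR;
        if (cpuid_edx & CPUID_EDX_SSE)
            bits |= CR4_OSXMMEXCPT;  /* only useful if a #XF handler exists */
    }
    return bits;
}
```

Taking the CPUID value as a parameter keeps the decision logic separate from the privileged instructions, so it can be checked in user space before anything is wired into the context-switch path.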
For some very basic information on using the vector instructions from GCC without having to write inline assembly code, see:

http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

GCC also supports the <xmmintrin.h> header file for compatibility with MSFT and Intel C++ compilers, which is good because it's a lot easier to use. See here for a simple tutorial with links to other sites:

http://www.codeproject.com/cpp/sseintro.asp

In addition, I believe that compiling with "-march=pentium3", "-march=pentium4", or "-march=athlon-xp" will potentially use the MMX or SSE instructions automatically when beneficial, although I have not seen this first-hand. It will definitely use SSE for floating-point if you add "-mfpmath=sse".

I've attached some very short test programs, one using MMX and the other using SSE, that you can use to test for example Linux vs. Syllable. Right now Syllable runs the MMX one but kills the SSE one with an **Illegal instruction**, which makes sense as SSE is not enabled yet.

-- 
Jake
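
For reference, the GCC vector extensions mentioned above look roughly like this in use. This is only a minimal sketch and not one of the attached test programs, which are not reproduced here; the names v4sf, madd4() and hsum4() are chosen for illustration. When built for an SSE-capable target (for example with "-msse"), GCC would be expected to turn the arithmetic into packed SSE instructions, which is exactly what needs the OS support described above.

```c
/* Four packed floats in one 16-byte vector (GCC vector extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Element-wise a * b + c on four floats at once. */
v4sf madd4(v4sf a, v4sf b, v4sf c)
{
    return a * b + c;
}

/* Sum the four lanes; the union avoids relying on vector subscripting,
   which older GCC releases do not accept. */
float hsum4(v4sf v)
{
    union { v4sf v; float f[4]; } u;
    u.v = v;
    return u.f[0] + u.f[1] + u.f[2] + u.f[3];
}
```

A caller would initialize the vectors with ordinary brace syntax, for example `v4sf a = {1.0f, 2.0f, 3.0f, 4.0f};`, and pass them by value.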