You can subscribe to this list here.
| 2002 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
(1) |
Sep
|
Oct
|
Nov
|
Dec
|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2003 |
Jan
(6) |
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
(3) |
Aug
(2) |
Sep
|
Oct
(1) |
Nov
|
Dec
|
| 2004 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
(10) |
Jul
|
Aug
|
Sep
|
Oct
(39) |
Nov
(14) |
Dec
(8) |
| 2005 |
Jan
(46) |
Feb
(36) |
Mar
(5) |
Apr
(12) |
May
|
Jun
|
Jul
(1) |
Aug
|
Sep
|
Oct
(10) |
Nov
|
Dec
(2) |
| 2006 |
Jan
(3) |
Feb
|
Mar
(2) |
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
(1) |
Oct
(9) |
Nov
(14) |
Dec
(4) |
| 2007 |
Jan
(4) |
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
| 2014 |
Jan
|
Feb
|
Mar
|
Apr
(1) |
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(1) |
Nov
(1) |
Dec
|
| 2015 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
(1) |
| 2016 |
Jan
|
Feb
(3) |
Mar
(2) |
Apr
(2) |
May
|
Jun
(1) |
Jul
(1) |
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
|
From: BurningShadow <syl...@bu...> - 2004-10-31 07:38:54
|
I got a 486 somewhere (and it should be working), so I can test it. But who would run it on a 486? The 486 hasnt been in active sale for many many years, and I dont think that Syllable should be known as a you-can-run-it-on-your-oldest-computer system. Thats the way people (in Denmark) see Linux, and now they cant understand why they cant run Fedora on their pre-pentium system. In fact I think the exact upper side should be done.10 years from now, there will be almost no 32bits left, so at that time it wont make sense to keep 32bit code. It should be wiped out, in order to keep the system up-to-date. >Otherwise, it's probably never going to get fixed. And I think thats the best. >> The NexGen Nx586, AMD/Cyrix 5x86s, 486s etc, don't have it at all. Not that >> Syllable can currently run on those because of a couple of lines of Kurt's >> i586 assembler enthusiasm, but someone's going to want to sooner or later :) > >Besides the code which assumes the presence of the cpuid instruction, is >there any other assembly language code that anyone knows of that is >keeping it from working on 486s? I know how to fix the cpuid code to >detect if the machine doesn't support cpuid, but how many people these >days have access to a 486-class machine with math coprocessor that they >want to run Syllable on? If you do, let me know and I may be able to >work something out. Otherwise, it's probably never going to get fixed. > |
|
From: Maritimus <rea...@ih...> - 2004-10-31 05:55:48
|
> > The NexGen Nx586, AMD/Cyrix 5x86s, 486s etc, don't have it at all. Not that > > Syllable can currently run on those because of a couple of lines of Kurt's > > i586 assembler enthusiasm, but someone's going to want to sooner or later :) > > Besides the code which assumes the presence of the cpuid instruction, is > there any other assembly language code that anyone knows of that is > keeping it from working on 486s? From AtheOS.cx (thanks to archive.org): """ Here is a list of supported hardware: CPU: Pentium or better (Yes AtheOS use Pentium only instructions and will crash and burn on a 486). """ As far as I remember, there were a couple of i586 opcodes in the assembler (I think it was only a few lines) because Kurt owned a Pentium. This was probably talked about on the AtheOS mailing list, if anyone can search their old mailbox. I guess it'd be possible to make the system's utilisation of these instructions dependent on an Intel Pentium or fully compatible having been CPUID'ed, though I don't know whether they're of any benefit or Kurt was just experimenting. > I know how to fix the cpuid code to > detect if the machine doesn't support cpuid, Like so? http://www.paradicesoftware.com/specs/cpuid/ http://www.sandpile.org/ia32/cpuid.htm > but how many people these > days have access to a 486-class machine with math coprocessor that they > want to run Syllable on? If you do, let me know and I may be able to > work something out. Otherwise, it's probably never going to get fixed. I've got a couple of 80486 boxes next to me =) Though it isn't just 486s, as it was a long time after the Pentium was released before all the non Intel x86s had fully implemented the i586 instructions. There is little way Syllable's going to be usable on an SX25, but some of these older chips best a Pentium 100, so it's probably as 'enabling' to Syllable's hardware bredth as adding another video driver or two more or less. |
|
From: Jake H. <jh...@an...> - 2004-10-31 03:49:50
|
Maritimus wrote: > The NexGen Nx586, AMD/Cyrix 5x86s, 486s etc, don't have it at all. Not that > Syllable can currently run on those because of a couple of lines of Kurt's > i586 assembler enthusiasm, but someone's going to want to sooner or later :) Besides the code which assumes the presence of the cpuid instruction, is there any other assembly language code that anyone knows of that is keeping it from working on 486s? I know how to fix the cpuid code to detect if the machine doesn't support cpuid, but how many people these days have access to a 486-class machine with math coprocessor that they want to run Syllable on? If you do, let me know and I may be able to work something out. Otherwise, it's probably never going to get fixed. -Jake |
|
From: Maritimus <rea...@ih...> - 2004-10-30 10:55:58
|
> > This looked interesting: > > http://www.cs.helsinki.fi/linux/linux-kernel/2001-31/0542.html > > Hmm.. That is interesting. I know that the "rdtsc" instruction was > introduced with the Pentium so it should work on all machines that > Syllable currently runs on. From what I've read, at least some or all incarnations of the AMD K5, IDT Winchip, Cyrix 6x86 / MediaGX either don't support rdtsc, or will screw it up. The Linux kernel code seems to have workarounds for chips that don't support the rdtsc instruction. Many non Intel Pentium clones can't be counted on supporting rdtsc. The NexGen Nx586, AMD/Cyrix 5x86s, 486s etc, don't have it at all. Not that Syllable can currently run on those because of a couple of lines of Kurt's i586 assembler enthusiasm, but someone's going to want to sooner or later :) > |
|
From: Jake H. <jh...@an...> - 2004-10-29 21:34:32
|
Maritimus wrote: > > Ok I don't really know this stuff, but will this have a fallback if i586 > time stamp code doesn't work with a particular x86 CPU? > > This looked interesting: > http://www.cs.helsinki.fi/linux/linux-kernel/2001-31/0542.html Hmm.. That is interesting. I know that the "rdtsc" instruction was introduced with the Pentium so it should work on all machines that Syllable currently runs on. Whether it's _accurate_ is a completely different question. In particular, it's definitely not accurate under VMware as it appears to be an emulated instruction, not the real TSC clock, and it slows down massively when the virtual machine manager has to do a lot of emulation of system mode functions. The Linux code appears to fall back to the PIT timer with a warning if the TSC loses too many ticks. The comment at the top of the Linux code for gettimeofday() says that it achieves microsecond accuracy even with the PIT timer so I'm hoping to be able to get the accuracy I need for lmbench one way or the other. Anyway, I'm going to hold off on the timer code for a day or two while I get caught up and submit diffs for my already completed fixes. I've also been writing doxygen comment blocks for all the kernel functions after I got sick of looking at all those empty comment blocks. It's a bit tedious but the end result should be quite nice looking. I'm tagging all of the private functions with "\ingroup Internal" so they show up in their own module, and I will make a group for each set of related kernel functions, e.g. Areas, Semaphores, Networking, VFS. -Jake |
|
From: Maritimus <rea...@ih...> - 2004-10-29 05:53:06
|
Jake Hamby wrote: > I am busy preparing my next "mega-patch" for the kernel which will: > be great > I'm also halfway done with integrating the latest TSC timer code from > Linux which uses the Pentium internal clock to give sub-microsecond > precision to the udelay() and get_system_time() calls. It's necessary > to have the best possible accuracy for gettimeofday() in order for > LMbench to give accurate results in a short benchmarking run. Right > now, LMbench says it is only accurate to 1000000us, compared to 5000us > precision under Linux, which is the best possible result (1000000us is > the worst possible result). Ok I don't really know this stuff, but will this have a fallback if i586 time stamp code doesn't work with a particular x86 CPU? This looked interesting: http://www.cs.helsinki.fi/linux/linux-kernel/2001-31/0542.html |
|
From: Jake H. <jh...@an...> - 2004-10-28 23:58:47
|
I am busy preparing my next "mega-patch" for the kernel which will: * add support for SSE/SSE2 and lazy floating-point context switching, * inline numerous short assembly language functions from intel.s, * fix a few signal handling and fork/exec context initialization bugs, * correct numerous typos and grammatical errors in the source code, * improve the performance of allocating and freeing memory pages, and * integrate the latest version of <atheos/isa_io.h> and "inc/io_ports.h" from Linux (the second one gives us more readable code since I was able to replace "0x20" with "PIC_MASTER_CMD", "0x43" with "PIT_MODE", and so on throughout the i8259A access code). I'm also halfway done with integrating the latest TSC timer code from Linux which uses the Pentium internal clock to give sub-microsecond precision to the udelay() and get_system_time() calls. It's necessary to have the best possible accuracy for gettimeofday() in order for LMbench to give accurate results in a short benchmarking run. Right now, LMbench says it is only accurate to 1000000us, compared to 5000us precision under Linux, which is the best possible result (1000000us is the worst possible result). Anyway, I will try to have those patches posted to syllable-kernel list tomorrow afternoon for y'all. I finally got around to installing Syllable onto a native partition on my laptop and it is super fast! I was able to make some small improvements to Syllable's performance under VMware, but it is still a very slow OS to emulate. Compiling things takes at least 10x longer in VMware than natively so be forewarned if you are thinking of going that route for development. Cheers, Jake |
|
From: Jake H. <jh...@an...> - 2004-10-24 17:47:44
|
Kristian Van Der Vliet wrote: > On Saturday 23 October 2004 9:03 pm, Jake Hamby wrote: >> >>What's the status on importing my patches for the atomic primitives? > > I'll get on with checking them in. Hopefully they'll be in CVS today; I'll > let you know as soon as I'm done. Thanks! I assume you checked in the driver and fs patches as well. > I'll capture a few kernel logs from both an -O3 and -O2 kernel; the symptoms > appear to be identical. Got the logs. I'll keep poking around and let you know if I find anything. >>* Implement lazy FP context switching. Instead of spending the time to > > Sounds like a great idea to me. Anything that can save a few ticks on a > context switch it always a good idea :) Definitely! I'm playing around with the lmbench-3.0-a4 microbenchmarks right now, trying to figure out why Syllable is so freaking slow under VMware!! I tried running the LiveCD on my desktop machine and Syllable is an order of magnitude faster on a real machine than under VMware. Unfortunately, I couldn't do a native install because my hard drives are too big and I get a "Drive uses 48bit addressing" error. It looks like I can enable support for that with a kernel flag but I want to look at the code a little and build a more recent kernel for the install CD before I try that. My laptop hard drive is 60GB, which may be small enough to do a native install there for comparison benchmarks. However, I really enjoy being able to hack on Syllable in VMware while browsing the code in Visual Studio (even though I can't build it, VS has great code browsing and search capabilities), listening to music, and watching the debug output in a terminal window. On a laptop, especially, having good OS support for things like power management and 802.11g is very important as well. Speaking of 802.11g, I spent many hours a few months back adapting the latest version of the madwifi Linux driver for Atheros chipsets to FreeBSD. Once I've finished my current crop of projects, I definitely intend to try hacking the 802.11 and Atheros driver code into Syllable. I'm also interested in tackling ACPI, at least well enough to enable Hyperthreading so I have at least a simulation of SMP (though not in VMware of course) to test on. >>This goes hand-in-hand with my plan to extend save_fpu_state() to use >>FXSAVE instead of FSAVE on Pentium III and above (including Athlon), >>which saves the SSE/SSE2 context as well as the FP/MMX context. This >>would allow us to set the bit to enable SSE/SSE2 support so Pentium3/4 >>and Athlon optimized vector instructions could be used. > > Would this allow MMX/SSE/SSE2 instructions in the kernel? Someone was asking > about this a little while back, although to be honest I forget why.. Here's the deal. MMX uses the same registers as the FPU so there is no additional OS support needed. A side effect is that you can't use MMX and FP at the same time but have to switch between them. All this is handled by GCC if you use the built-in vector instructions. SSE/SSE2, on the other hand, uses an additional set of 8 128-bit registers plus a few additional control and status words, so there is additional OS support needed and so these instructions are disabled until the OS signals to the CPU that it can handle them. The changes are not too complicated: basically you have to set aside a 512-byte area per thread to save the FP state with FXSAVE instead of the 128-byte area needed for FSAVE. There's an additional exception, vector 19 (#XF) used to signal SIMD floating-point exceptions, much as vector 16 (#MF) is used for regular floating-point exceptions. The good news is that you can intermix SSE/SSE2 and FP or MMX instructions in the regular registers. For some very basic information on using the vector instructions from GCC without having to write inline assembly code, see: http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html GCC also supports the <xmmintrin.h> header file for compatibility with MSFT and Intel C++ compilers, which is good because it's a lot easier to use. See here for a simple tutorial with links to other sites: http://www.codeproject.com/cpp/sseintro.asp In addition, I believe that compiling with "-march=pentium3", "-march=pentium4", or "-march=athlon-xp" will potentially use the MMX or SSE instructions automatically when beneficial, although I have not seen this first-hand. It will definitely use SSE for floating-point if you add "-mfpmath=sse". I've attached some very short test programs, one using MMX and the other using SSE, that you can use to test for example Linux vs. Syllable. Right now Syllable runs the MMX one but kills the SSE one with an **Illegal instruction**, which makes sense as SSE is not enabled yet. -- Jake |
|
From: Kristian V. D. V. <va...@li...> - 2004-10-24 16:15:16
|
All, I have commited all of Jake Hamby's patches which change the atomic_t primitives throughout Syllable. This is part of our ongoing effort to improve SMP stability. As part of this change g++ requires a patch to one of the STL headers. A ZIP is attached which contains the patch, a pre-patched header and an install script. If you are keeping in sync with CVS and wish to build C++ with the atomic_t changes then you must install this patch. Future releases of GCC will include this patch by default. -- Vanders http://syllable.sourceforge.net/ http://www.liqwyd.com |
|
From: Kristian V. D. V. <va...@li...> - 2004-10-24 16:11:45
|
On Saturday 23 October 2004 9:03 pm, Jake Hamby wrote: > What's the status on importing my patches for the atomic primitives? All patches have now been checked in to CVS. They may take a little while to propogate to the anonymous CVS though. -- Vanders http://syllable.sourceforge.net/ http://www.liqwyd.com |
|
From: Kristian V. D. V. <va...@li...> - 2004-10-24 13:41:04
|
On Monday 25 October 2004 2:12 am, Kristian Van Der Vliet wrote: > On Saturday 23 October 2004 9:03 pm, Jake Hamby wrote: >> Anyway, I'm still looking at the SMP code. Can you give me a better >> idea of how things are failing when compiled with -O3? So far, the only >> real error message I got was from the original poster, William Rose, a >> "Divide error" when opening Terminal. What are the other symptoms? > > I'll capture a few kernel logs from both an -O3 and -O2 kernel; the > symptoms appear to be identical. Logs attached. I did six runs; three with a kernel compiled with -O3 and three with a kernel compiled with -O2 -mprefered-stack-boundry=2 Let me know if you need anything else. -- Vanders http://syllable.sourceforge.net/ http://www.liqwyd.com |
|
From: Kristian V. D. V. <va...@li...> - 2004-10-24 11:13:34
|
On Saturday 23 October 2004 9:03 pm, Jake Hamby wrote: > Kristian Van Der Vliet wrote: > > I may still accept these atomic patches as it's probably a good idea to > > move away from direct access of atomic_t types as the kernel and drivers > > do now, and I see no harm in moving to a more Linux-centric way of doing > > things. It will make drivers easier to port at least. But sadly it does > > not fix the immediate problem of SMP crashes. > > > > Back to scratching our heads it seems. > > What's the status on importing my patches for the atomic primitives? I'll get on with checking them in. Hopefully they'll be in CVS today; I'll let you know as soon as I'm done. > Anyway, I'm still looking at the SMP code. Can you give me a better > idea of how things are failing when compiled with -O3? So far, the only > real error message I got was from the original poster, William Rose, a > "Divide error" when opening Terminal. What are the other symptoms? I'll capture a few kernel logs from both an -O3 and -O2 kernel; the symptoms appear to be identical. > * Inlining a number of assembly routines from intel.s, such as > cli()/sti(), get/put_cpu_flags(), isa_read*(), isa_write*(), > flush_tlb(), save/load_fpu_state(), etc. Sounds sensible. > * Implement lazy FP context switching. Instead of spending the time to > save and restore the FPU state on every context switch, a flag can be > set to throw an exception the next time the FPU is used. For threads > that never touch the FPU, nothing has to be saved, and when a thread > accesses the FPU, the exception handler saves the FPU state for the last > thread that was using it, then restores the FPU state for the current > thread. If the current thread is also the last thread that was using > the FPU, then nothing has to be saved and the FPU is simply enabled. Sounds like a great idea to me. Anything that can save a few ticks on a context switch it always a good idea :) > This goes hand-in-hand with my plan to extend save_fpu_state() to use > FXSAVE instead of FSAVE on Pentium III and above (including Athlon), > which saves the SSE/SSE2 context as well as the FP/MMX context. This > would allow us to set the bit to enable SSE/SSE2 support so Pentium3/4 > and Athlon optimized vector instructions could be used. Would this allow MMX/SSE/SSE2 instructions in the kernel? Someone was asking about this a little while back, although to be honest I forget why.. -- Vanders http://syllable.sourceforge.net/ http://www.liqwyd.com |
|
From: Jake H. <jh...@an...> - 2004-10-23 20:03:23
|
Kristian Van Der Vliet wrote: > > I may still accept these atomic patches as it's probably a good idea to move > away from direct access of atomic_t types as the kernel and drivers do now, > and I see no harm in moving to a more Linux-centric way of doing things. It > will make drivers easier to port at least. But sadly it does not fix the > immediate problem of SMP crashes. > > Back to scratching our heads it seems. > What's the status on importing my patches for the atomic primitives? I have a whole slew of new patches that I'm working on and it would be a lot easier for me to create diffs from CVS if the atomic patches were checked in. Also, since the new primitives require updating a file in the GCC include directory before compiling C++ apps, it's best if we make the change now and let everyone on syllable-developer know what's going on, so it'll get plenty of testing before 0.5.5. I don't plan to suggest any more changes that would break source compatibility like this, but in this case, I think the new primitives are definitely the way to go from a performance standpoint. Anyway, I'm still looking at the SMP code. Can you give me a better idea of how things are failing when compiled with -O3? So far, the only real error message I got was from the original poster, William Rose, a "Divide error" when opening Terminal. What are the other symptoms? Here's what I've been hacking on: * Inlining a number of assembly routines from intel.s, such as cli()/sti(), get/put_cpu_flags(), isa_read*(), isa_write*(), flush_tlb(), save/load_fpu_state(), etc. I was particularly concerned with the first two pairs, as they are used whenever interrupts are disabled, such as before acquiring a spinlock. * Change kmalloc() to warn whenever allocating 128K or larger. The current memory allocator (from Linux 2.0.x) is very inefficient when a power-of-two size is requested, as it has to round up to the next higher power-of-two due to a 16-byte overhead subtracted from each block. In other words, a 128K allocation would use up a 256K block, and so on. Even worse, the entire block must consistent of adjacent pages, so memory fragmentation is a real issue. Ultimately I want to import the new slab allocator introduced in Linux 2.2.x, which is more efficient but also much more complex and is still only intended for <128K allocations. In the meantime, introducing the warning allowed me to discover and fix the code in bcache.c to use create_area() instead of kmalloc() for its hash table, which is an exact power-of-two. Another case for potentially large kmalloc()'s is in copy_arg_list() when a _very_ long command line is passed. I'll have to see how Linux handles these types of scenarios as it doesn't allow kmalloc() over 128K at all. * Simplify array.c by removing code to calculate the values nTabCount, nAvgCount, and nMaxCount, which are never used anywhere and so can be safely removed from the structure in inc/array.h. * Implement lazy FP context switching. Instead of spending the time to save and restore the FPU state on every context switch, a flag can be set to throw an exception the next time the FPU is used. For threads that never touch the FPU, nothing has to be saved, and when a thread accesses the FPU, the exception handler saves the FPU state for the last thread that was using it, then restores the FPU state for the current thread. If the current thread is also the last thread that was using the FPU, then nothing has to be saved and the FPU is simply enabled. This goes hand-in-hand with my plan to extend save_fpu_state() to use FXSAVE instead of FSAVE on Pentium III and above (including Athlon), which saves the SSE/SSE2 context as well as the FP/MMX context. This would allow us to set the bit to enable SSE/SSE2 support so Pentium3/4 and Athlon optimized vector instructions could be used. -Jake |
|
From: Kristian V. D. V. <va...@li...> - 2004-10-20 16:29:10
|
On Tuesday 19 October 2004 2:53 am, Jake Hamby wrote: > Kristian Van Der Vliet wrote: >> Attached is the result of tonights work. I've identified smp.c as the >> culprit (Makes sense, at least) I was working with the stock 0.5.4 >> source which did not include your previous patches for -O2, so I've used >> -O3 and -O1, although I will confirm tommorrow with -O1 and -O2 just to >> be 100% sure. > > Well, there is at least one small bug which could break a -O3 build: > the read_cpu_id() function is modifying the %ebx register but doesn't > tell GCC that it is. I've attached a patch. That still doesn't explain > why a -O2 build is also failing. I'll keep looking at it... Even though I don't pretend to fully understand it, the patch looks sensible enough and has been applied. It does not fix an -O3 kernel but it doesn't break anything either. -- Vanders http://syllable.sourceforge.net/ http://www.liqwyd.com |
|
From: Brent P. N. <me...@ot...> - 2004-10-19 21:47:41
|
(Replied from syllable-developer) Quoting Jake Hamby <jh...@an...>: [snip] > So, in terms of the correct fix. First, we should probably add > "-DCONFIG_SMP" to the default Makefile and wrap our SMP-specific stuff > with that, like Linux does. That way, a UP-optimized kernel could be > built if the user desires to. The main challenge is to figure out > exactly what our locking requirements are, and ideally, hold > sched_lock() for as short of a time as possible. This would improve > scheduling latency on both UP and SMP cases. [snip] Do we need to do this now? I ask this honestly. What, precisely, are the benefits, other than increased speed? -- Brent P. Newhall http://brent.other-space.com/ ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. |
|
From: Jake H. <jh...@an...> - 2004-10-19 01:53:51
|
Kristian Van Der Vliet wrote: > > Attached is the result of tonights work. I've identified smp.c as the culprit > (Makes sense, at least) I was working with the stock 0.5.4 source which did > not include your previous patches for -O2, so I've used -O3 and -O1, although > I will confirm tommorrow with -O1 and -O2 just to be 100% sure. Well, there is at least one small bug which could break a -O3 build: the read_cpu_id() function is modifying the %ebx register but doesn't tell GCC that it is. I've attached a patch. That still doesn't explain why a -O2 build is also failing. I'll keep looking at it... -- Jake |
|
From: Daniel G. <da...@fp...> - 2004-10-18 17:15:36
|
On Mon, 2004-10-18 at 05:03 +0100, Kristian Van Der Vliet wrote: > Attached is the result of tonights work. I've identified smp.c as the culprit > (Makes sense, at least) I was working with the stock 0.5.4 source which did > not include your previous patches for -O2, so I've used -O3 and -O1, although > I will confirm tommorrow with -O1 and -O2 just to be 100% sure. I'm not an asm guru, and the results of those two compiles are extremely different. Hardly surprising. :) I did read over the source in smp.c, and I didn't see anything that jumped out at me. One obvious (temprorary) solution is to just build this file with -O1 rather than -O2 or -O3. That would get smp working, and I don't think the optimizations will make any difference in this case. FWIW, the Linux kernel is built with -O2, and says not to use anything higher. Daniel |
|
From: Kristian V. D. V. <va...@li...> - 2004-10-17 22:02:31
|
On Sunday 17 October 2004 7:33 pm, Jake Hamby wrote: > Kristian Van Der Vliet wrote: >> So far the only thing which has made any difference is to compile with >> -O1 instead of -O2, so it must be an over-agresive optimisation going on >> somewhere. It just doesn't appear to be in the atomic code. > > Can you figure out which file(s) need to be compiled with -O1 for the > kernel to work? You should be able to do a binary search: compile half > of the files with -O1 and keep dividing by two until you've isolated the > files which are responsible. Then post your results and I'll look at > the assembly output to see if anything pops out. Hopefully the problem > is isolated to one or two specific source files. Attached is the result of tonights work. I've identified smp.c as the culprit (Makes sense, at least) I was working with the stock 0.5.4 source which did not include your previous patches for -O2, so I've used -O3 and -O1, although I will confirm tommorrow with -O1 and -O2 just to be 100% sure. The zip contains the following files: smp.o.O1 - smp.c compiled with the following flags: -D__KERNEL__ -D__BUILD_KERNEL__ -no-fPIC -c -g -O1 -fno-strict-aliasing -Wall -march=i586 -mcpu=i586 smp.o.O3 - smp.c compiled with the following flags: -D__KERNEL__ -D__BUILD_KERNEL__ -no-fPIC -c -g -O3 -fno-strict-aliasing -Wall -march=i586 -mcpu=i586 smp.s.O1 - smp.c compiled with the following flags: -D__KERNEL__ -D__BUILD_KERNEL__ -no-fPIC -s -g -O1 -fno-strict-aliasing -Wall -march=i586 -mcpu=i586 smp.s.O3 - smp.c compiled with the following flags: -D__KERNEL__ -D__BUILD_KERNEL__ -no-fPIC -s -g -O3 -fno-strict-aliasing -Wall -march=i586 -mcpu=i586 smp.O1.dump - Output from objdump -ds smp.o.O1 smp.O3.dump - Output from objsump -ds smp.o.O3 If you need anything else please give me a shout and I'll get see what I can do. -- Vanders http://syllable.sourceforge.net/ http://www.liqwyd.com |
|
From: Jake H. <jh...@an...> - 2004-10-17 18:33:25
|
Kristian Van Der Vliet wrote: > So far the only thing which has made any difference is to compile with -O1 > instead of -O2, so it must be an over-agresive optimisation going on > somewhere. It just doesn't appear to be in the atomic code. Can you figure out which file(s) need to be compiled with -O1 for the kernel to work? You should be able to do a binary search: compile half of the files with -O1 and keep dividing by two until you've isolated the files which are responsible. Then post your results and I'll look at the assembly output to see if anything pops out. Hopefully the problem is isolated to one or two specific source files. -- Jake |
|
From: Jake H. <jh...@an...> - 2004-10-17 17:21:45
|
Kristian Van Der Vliet wrote: > I'm afraid we seem to be on the wrong track; the patches do not fix the > instability on SMP systems. The symptoms are identical to the current 0.5.4 > kernel when compiled with Gcc 3.3 and -O2 optimisations E.g. random panics > during boot. That's too bad. Anyway, here's my final set of atomic patches for the appserver and filesystem code. The FS patch also fixes a few compiler warnings having to do with missing newlines at the end of some files and a commented-out section with spaces following the "\" token, and adds the "-fno-strict-aliasing" flag to get rid of some warnings. The appserver patch fixes <util/locker.h> and <gui/font.h> which must be installed to "/ainc" before compiling. It also inlines the ddriver_mmx.c code into ddriver.cpp (the only place where it's used), which adds a couple of new "use of memory input without lvalue in asm operand x is deprecated" warnings which appear to be harmless as they also occur when compiling the MPlayer code from which ddriver_mmx.c was taken. Finally, the atomic_gthr_default_h.diff patch must be applied in /usr/gcc/include/c++/3.3.4/i586-pc-syllable/bits before compiling the appserver. In CVS it should be applied to the "gthr-syllable.h" files in system/apps/utils/Builder/packages/gcc-*/patches/gcc. I notice that I missed an opportunity to change "atomic_add( &mutex->count, -1 );" to "atomic_dec( &mutex->count );" in one location so that could also be done for a tiny performance improvement. Like I said in my post to syllable-developer, it would be nice if we had an UPDATING file in CVS to alert users of changes like this where manual intervention is needed. Anyway, I'm sorry to hear that these changes didn't fix your problem, but I'm not too surprised. They do seem to be a necessary precondition for figuring out what the real problem is. If I see anything else suspicious, I'll be sure to let you know. -- Jake |
|
From: Kristian V. D. V. <va...@li...> - 2004-10-17 13:46:55
|
On Saturday 16 October 2004 9:25 pm, Jake Hamby wrote: > As discussed recently on syllable-developer (in the "LiveCD 0.5.4 Aterm > problems" thread), the atomic primitives in Syllable badly needed > updating for SMP safety. > > <snip> > > Amazingly, the new kernel appears to run okay on my VMware > setup, although I don't have an SMP installation to test on, and so this > should be considered highly experimental. I'm afraid we seem to be on the wrong track; the patches do not fix the instability on SMP systems. The symptoms are identical to the current 0.5.4 kernel when compiled with Gcc 3.3 and -O2 optimisations E.g. random panics during boot. So far the only thing which has made any difference is to compile with -O1 instead of -O2, so it must be an over-agresive optimisation going on somewhere. It just doesn't appear to be in the atomic code. I may still accept these atomic patches as it's probably a good idea to move away from direct access of atomic_t types as the kernel and drivers do now, and I see no harm in moving to a more Linux-centric way of doing things. It will make drivers easier to port at least. But sadly it does not fix the immediate problem of SMP crashes. Back to scratching our heads it seems. -- Vanders http://syllable.sourceforge.net/ http://www.liqwyd.com |
|
From: Jake H. <jh...@an...> - 2004-10-17 00:47:09
|
Daniel Gryniewicz wrote: > > Great work. I especially like the atomic_add(..., -1) -> atomic_dec(). > Some of those old uses were quite scary... I'm not surprised we had SMP > issues. > > I didn't notice anything wrong with the patch. You might want to > consider a few new local variables, where consecutive atomic_read()'s > occure on the same variable. For example, this hunk: > > It seems quite possible that p_nCount gets changed between the reads. > It would only cause printing problems, but could make debugging harder, > and should be easy to fix. Yeah, I noticed that too, but didn't want to change too much at once. Anyway, here are my patches to the kernel/drivers directory for the new primitives (install the new headers from my last patch with "make dist" before compiling). One nice thing was that several of the drivers were Linux ports that used the atomic_*() macros properly, so all I had to do for them was comment out the compatibility macros in "linuxcomp.h". I also fixed all the compiler warnings except for a single "implicit declaration of function 'ffs'" warning in sound/emu10k1/src/ac97_codec.c since I didn't know what header file to include and I'm not even sure whether ffs() can be called by drivers since it doesn't seem to be exported by the kernel. Actually, I'm a bit suspicious of the emu10k1 driver since I had to add atomic versions of test_bit() and set_bit() from Linux to emu_wrapper.h. The current version was using atomic_and() in efxmgr.c when it should have been using test_bit(). Unfortunately, I don't have access to an EMU10K1 based sound card so I have no way of testing my changes. -- Jake |
|
From: Daniel G. <da...@fp...> - 2004-10-16 22:39:56
|
On Sat, 2004-10-16 at 13:25 -0700, Jake Hamby wrote:
> As discussed recently on syllable-developer (in the "LiveCD 0.5.4 Aterm
> problems" thread), the atomic primitives in Syllable badly needed
> updating for SMP safety. Based on a suggestion by Daniel Gryniewicz, I
> replaced the primitives in atheos/atomic.h with versions based on Linux
> 2.6.8's include/asm-i386/atomic.h. Many locations in the kernel needed
> to be patched, but the patches were generally minor. Also, I replaced
> occurrences of atomic_add(&foo, 1) and atomic_add(&foo, -1) with
> atomic_inc(&foo) and atomic_dec(&foo), which are slightly faster and
> more self-explanatory.
>
> I also fixed a few typos and changed kernel/debug.c to use ints instead
> of floats. Amazingly, the new kernel appears to run okay on my VMware
> setup, although I don't have an SMP installation to test on, and so this
> should be considered highly experimental. I still need to patch the
> drivers and there may be other code that is broken by this change.
> Fortunately, the size of atomic_t stayed the same so new and old code
> should interoperate. If you have any trouble with the bz2'd attachment,
> I can make the patches available for download.
>
> Enjoy!
> --
> Jake Hamby
Great work. I especially like the atomic_add(..., -1) -> atomic_dec().
Some of those old uses were quite scary... I'm not surprised we had SMP
issues.
I didn't notice anything wrong with the patch. You might want to
consider a few new local variables, where consecutive atomic_read()'s
occure on the same variable. For example, this hunk:
@@ -242,14 +243,14 @@
for ( ; psTmp <= psPage; psTmp++ )
{
- if ( 0 != psTmp->p_nCount )
+ if ( 0 != atomic_read( &psTmp->p_nCount ) )
{
- printk( "Page %d present on free list
with count of %d\n", psTmp->p_nPageNum, psTmp->p_nCount );
+ printk( "Page %d present on free list
with count of %d\n", psTmp->p_nPageNum,
atomic_read( &psTmp->p_nCount ) );
}
- atomic_add( &psTmp->p_nCount, 1 );
+ atomic_inc( &psTmp->p_nCount );
psTmp->p_psNext = NULL;
g_nAllocatedPages++;
- g_sSysBase.ex_nFreePageCount--;
+ atomic_dec( &g_sSysBase.ex_nFreePageCount );
}
break;
}
It seems quite possible that p_nCount gets changed between the reads.
It would only cause printing problems, but could make debugging harder,
and should be easy to fix.
Reading this patch made me think I should be re-visiting the atomic
usage in the irq code... <sigh>
Great work, thank you much.
Daniel
|
|
From: Jake H. <jh...@an...> - 2004-10-16 20:25:38
|
As discussed recently on syllable-developer (in the "LiveCD 0.5.4 Aterm problems" thread), the atomic primitives in Syllable badly needed updating for SMP safety. Based on a suggestion by Daniel Gryniewicz, I replaced the primitives in atheos/atomic.h with versions based on Linux 2.6.8's include/asm-i386/atomic.h. Many locations in the kernel needed to be patched, but the patches were generally minor. Also, I replaced occurrences of atomic_add(&foo, 1) and atomic_add(&foo, -1) with atomic_inc(&foo) and atomic_dec(&foo), which are slightly faster and more self-explanatory. I also fixed a few typos and changed kernel/debug.c to use ints instead of floats. Amazingly, the new kernel appears to run okay on my VMware setup, although I don't have an SMP installation to test on, and so this should be considered highly experimental. I still need to patch the drivers and there may be other code that is broken by this change. Fortunately, the size of atomic_t stayed the same so new and old code should interoperate. If you have any trouble with the bz2'd attachment, I can make the patches available for download. Enjoy! -- Jake Hamby |
|
From: Kristian V. D. V. <va...@li...> - 2004-10-11 18:53:24
|
On Sunday 10 October 2004 4:24 am, Maritimus wrote: > Simultaneous multiprocessing causes the system to lock up whenever the > Syllable kernel is compiled with GCC 3.x instead of 2.9x. I've just got a dual-CPU machine up and running. I've installed 0.5.4 it, and the kernel in 0.5.4 was compiled with Gcc 2.96 which, I have been told, is an acceptable workaround. With both CPU's enabled on that machine though it's as unstable as a chocolate fire-poker, locking up at random and applications which just do not start up. It looks like a much more general problem with SMP and Syllable 0.5.3/0.5.4 I havn't the faintest what could be causing it. -- Vanders http://syllable.sourceforge.net/ http://www.liqwyd.com |