From: Jake H. <jh...@an...> - 2004-10-23 20:03:23
Kristian Van Der Vliet wrote:

> I may still accept these atomic patches as it's probably a good idea to move
> away from direct access of atomic_t types as the kernel and drivers do now,
> and I see no harm in moving to a more Linux-centric way of doing things. It
> will make drivers easier to port at least. But sadly it does not fix the
> immediate problem of SMP crashes.
>
> Back to scratching our heads it seems.

What's the status on importing my patches for the atomic primitives? I have a
whole slew of new patches that I'm working on, and it would be a lot easier for
me to create diffs from CVS if the atomic patches were checked in.

Also, since the new primitives require updating a file in the GCC include
directory before compiling C++ apps, it's best if we make the change now and
let everyone on syllable-developer know what's going on, so it'll get plenty of
testing before 0.5.5. I don't plan to suggest any more changes that would break
source compatibility like this, but in this case I think the new primitives are
definitely the way to go from a performance standpoint.

Anyway, I'm still looking at the SMP code. Can you give me a better idea of how
things are failing when compiled with -O3? So far, the only real error message
I got was from the original poster, William Rose: a "Divide error" when opening
Terminal. What are the other symptoms?

Here's what I've been hacking on:

* Inlining a number of assembly routines from intel.s, such as cli()/sti(),
get/put_cpu_flags(), isa_read*(), isa_write*(), flush_tlb(),
save/load_fpu_state(), etc. I was particularly concerned with the first two
pairs, as they are used whenever interrupts are disabled, such as before
acquiring a spinlock.

* Change kmalloc() to warn whenever allocating 128K or larger. The current
memory allocator (from Linux 2.0.x) is very inefficient when a power-of-two
size is requested, as it has to round up to the next higher power of two due
to a 16-byte overhead subtracted from each block.
In other words, a 128K allocation would use up a 256K block, and so on. Even
worse, the entire block must consist of adjacent pages, so memory
fragmentation is a real issue. Ultimately I want to import the new slab
allocator introduced in Linux 2.2.x, which is more efficient but also much
more complex, and is still only intended for <128K allocations. In the
meantime, introducing the warning allowed me to discover and fix the code in
bcache.c to use create_area() instead of kmalloc() for its hash table, which
is an exact power of two. Another case for potentially large kmalloc() calls
is in copy_arg_list() when a _very_ long command line is passed. I'll have to
see how Linux handles these types of scenarios, as it doesn't allow kmalloc()
over 128K at all.

* Simplify array.c by removing the code that calculates the values nTabCount,
nAvgCount, and nMaxCount, which are never used anywhere and so can be safely
removed from the structure in inc/array.h.

* Implement lazy FP context switching. Instead of spending the time to save
and restore the FPU state on every context switch, a flag can be set to throw
an exception the next time the FPU is used. For threads that never touch the
FPU, nothing has to be saved, and when a thread accesses the FPU, the
exception handler saves the FPU state for the last thread that was using it,
then restores the FPU state for the current thread. If the current thread is
also the last thread that was using the FPU, nothing has to be saved and the
FPU is simply enabled. This goes hand-in-hand with my plan to extend
save_fpu_state() to use FXSAVE instead of FSAVE on the Pentium III and above
(including the Athlon), which saves the SSE/SSE2 context as well as the
FP/MMX context. This would allow us to set the bit that enables SSE/SSE2
support, so Pentium 3/4 and Athlon optimized vector instructions could be
used.

-Jake