Re: [lc-devel] WKdm to kernel mode

Hello Scott Kaplan,
>   
>> -- current WKdm uses Wk_word temp[1200] to store unpacked 10-bit
>> patterns, i.e. 1 word (32/64-bit) per 10 bits. Why doesn't it use
>> short (16-bit) to store these 10-bit patterns in unpacked
>> state? Although the 2k memory saving is insignificant, why
>> waste it for nothing? Was it done for some optimization purpose?
>>     
>
> The goal was efficiency.  Some architectures don't handle half-words as
> quickly as full words.  For the x86, there may be no difference.  I'd
> handle this as an empirical issue:  Compile it using shorts and test the
> speed; change it back to ints/longs and see if it's any faster.  If not,
> keep it as 16-bit values.
>
>   
I am currently having trouble getting the timing information needed to
see how long de/compress() takes, but I will definitely compare speeds
for these cases. Depending on the results, arch-specific compile
directives may be useful: switching between shorts and ints/longs needs
only minimal code changes, so adding an arch-specific optimization
would not be a problem.
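
A rough sketch of what such a compile-time switch could look like. The
macro WK_USE_FULL_WORDS, the type wk_lowbits_t and the helper names are
hypothetical (they are not in the WKdm source); only the 1200 entries
and the 10-bit patterns come from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical build switch: define WK_USE_FULL_WORDS on architectures
 * where half-word accesses turn out to be slower than full-word ones.
 */
#ifdef WK_USE_FULL_WORDS
typedef unsigned long wk_lowbits_t;   /* one full word per 10-bit pattern */
#else
typedef uint16_t wk_lowbits_t;        /* 16 bits per 10-bit pattern */
#endif

#define WK_LOWBITS_MASK  0x3FFu       /* low 10 bits of a word */
#define WK_TEMP_ENTRIES  1200         /* size of the temp array in current WKdm */

/* A 10-bit pattern must fit in the narrow type. */
static_assert(WK_LOWBITS_MASK <= UINT16_MAX, "10-bit pattern must fit in 16 bits");

/* Extract the low 10 bits of a word in whichever unpacked type is selected. */
static inline wk_lowbits_t wk_store_low_bits(unsigned long word)
{
    return (wk_lowbits_t)(word & WK_LOWBITS_MASK);
}

/* Bytes taken by the unpacked temp array for the selected type. */
static inline size_t wk_temp_bytes(void)
{
    return WK_TEMP_ENTRIES * sizeof(wk_lowbits_t);
}
```

With this, the rest of the code only refers to wk_lowbits_t, so the
short vs. int/long comparison becomes a matter of rebuilding with or
without the define.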

>> -- WKdm_compress() does kmalloc()/kfree() pair thrice per 
>> call to this function. Is it too much overhead? Should these 
>> temp buffers be made global and use locking to serialize 
>> request for them (this isn't performance friendly either)?
>>     
>
> The original code, which ran only in user-land, did no heap allocating.
> Given that the compressor will be called very frequently, it needs to be
> fast.  If you can allocate these spaces statically or on the stack, then
> do it and save the malloc/free operations.
>
>   
The kernel-mode stack is only 4k/8k, so these temporary buffers (about
4k when shorts are used and about 7k when ints/longs are used) cannot
be allocated on the stack. My idea is to bring the temp buffers down to
<4k so that I can allocate and free a single (0-order) page for them.
De/allocating 0-order pages is fast, and this would eliminate the need
for the triple kmalloc()/kfree().
Or maybe allocating a double page is still more efficient than the
triple kmalloc()/kfree(). I'll benchmark these cases.
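
To illustrate the single-page idea, here is a user-space sketch that
packs all three temp buffers into one struct and checks at compile time
that it fits in a page. The struct, the function names and the tags/qpos
sizes are made up for illustration; only the 1200 low-bits entries come
from the current code. malloc()/free() stand in for what would be
__get_free_page()/free_page() in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define WK_PAGE_SIZE        4096u

/* 1200 is from the current code; the other two sizes are hypothetical. */
#define WK_LOWBITS_ENTRIES  1200u
#define WK_TAGS_BYTES       512u
#define WK_QPOS_BYTES       1024u

/* All three temp buffers laid out back to back in one allocation. */
struct wk_scratch {
    uint16_t low_bits[WK_LOWBITS_ENTRIES];  /* unpacked 10-bit patterns */
    uint8_t  tags[WK_TAGS_BYTES];
    uint8_t  qpos[WK_QPOS_BYTES];
};

/* Build fails if the buffers ever outgrow a single page. */
static_assert(sizeof(struct wk_scratch) <= WK_PAGE_SIZE,
              "scratch buffers must fit in one 0-order page");

/* One allocation per call instead of three. */
static struct wk_scratch *wk_scratch_alloc(void)
{
    return malloc(WK_PAGE_SIZE);
}

static void wk_scratch_free(struct wk_scratch *s)
{
    free(s);
}
```

The compile-time check is the useful part: if switching back to
ints/longs pushes the total over 4k, the build breaks instead of the
buffers silently overflowing the page.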
>> If kernel preempts in this ceil(), will value of 'x' remain 
>> valid (kernel does not save and restore the floating point processor's
>> state) ?
>>     
>
> The problem is worse than that:  Since the kernel does not preserve FP
> registers, use of FP values in the kernel may clobber legitimate
> user-land values.  You can't do that.  You have to change the use of
> doubles to some kind of integer for kernel code.
>
>   
ceil() is now replaced with simple integer operations that compute the
ceiling value.
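
For reference, one common way to get a ceiling without floating point
is shown below. This is only a sketch of the general technique; the
helper name wk_div_round_up is hypothetical and the actual expression
used in the code may differ.

```c
#include <assert.h>

/*
 * Integer ceiling of n / d without touching the FPU.
 * Written as quotient plus a remainder check so that it cannot
 * overflow the way (n + d - 1) / d can for very large n.
 */
static inline unsigned long wk_div_round_up(unsigned long n, unsigned long d)
{
    assert(d != 0);
    return n / d + (n % d != 0);
}
```

For example, wk_div_round_up(10, 4) gives 3 and wk_div_round_up(8, 4)
gives 2.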


Best Regards,
Nitin