Re: [lc-devel] WKdm to kernel mode
From: Nitin G. <nit...@gm...> - 2006-04-24 19:10:47
Hello Scott Kaplan,

>
>> -- The current WKdm uses Wk_word temp[1200] to store the unpacked
>> 10-bit values, i.e. one word (32/64-bit) per 10-bit value. Why
>> doesn't it use a short (16-bit) to store these 10-bit patterns in
>> the unpacked state? The 2k memory saving is insignificant, but why
>> waste it for nothing? Was it done for some optimization purpose?
>>
>
> The goal was efficiency. Some architectures don't handle half-words as
> quickly as full words. For the x86, there may be no difference. I'd
> handle this as an empirical issue: compile it using shorts and test the
> speed; change it back to ints/longs and see if it's any faster. If not,
> keep it as 16-bit values.
>

Currently I am having trouble getting the timing information needed to
see how much time de/compress() takes. I will definitely compare speeds
for these cases. Depending on the results, it may be useful to have
arch-specific compile directives; the code needs only minimal changes to
switch between shorts and ints/longs, so adding an arch-specific
optimization would not be a problem.

>> -- WKdm_compress() does a kmalloc()/kfree() pair three times per call.
>> Is that too much overhead? Should these temp buffers be made global,
>> with locking to serialize requests for them (which isn't performance
>> friendly either)?
>>
>
> The original code, which ran only in user-land, did no heap allocating.
> Given that the compressor will be called very frequently, it needs to be
> fast. If you can allocate these spaces statically or on the stack, then
> do it and save the malloc/free operations.
>

The kernel-mode stack is only 4k/8k, so it is not possible to allocate
these temporary buffers (about 4k when shorts are used and about 7k when
ints/longs are used) on the stack. I was thinking of bringing the temp
buffers down to under 4k so that I can allocate and free a single
(order-0) page for them. Allocating and freeing order-0 pages is fast,
and this would eliminate the three kmalloc()/kfree() pairs.
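A minimal sketch of what "all temp buffers in one page" could look like, written as user-space C so it can be compiled and checked outside the kernel. The type `wk_unpacked_t`, the `WK_UNPACKED_WORDS` switch, `struct wk_scratch`, and the 300-entry sizes of the tag and queue-position arrays are all hypothetical; only the 1200-entry array for the 10-bit values comes from this thread. In the kernel, the region would come from `__get_free_page(GFP_KERNEL)` and be released with `free_page()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compile-time switch: define WK_UNPACKED_WORDS on
 * architectures where full-word loads/stores turn out to be faster. */
#ifdef WK_UNPACKED_WORDS
typedef uint32_t wk_unpacked_t;   /* one 32-bit word per 10-bit value */
#else
typedef uint16_t wk_unpacked_t;   /* one 16-bit half-word per 10-bit value */
#endif

#define WK_PAGE_SIZE       4096u  /* assumed 4 KiB page (x86) */
#define WK_TAGS_ENTRIES     300u  /* assumed size, for illustration only */
#define WK_QPOS_ENTRIES     300u  /* assumed size, for illustration only */
#define WK_LOWBITS_ENTRIES 1200u  /* from the thread: temp[1200] */

/* All scratch arrays carved out of one contiguous region, so a single
 * order-0 page can replace three kmalloc()/kfree() pairs. For simplicity
 * the three arrays share one element type here. */
struct wk_scratch {
    wk_unpacked_t tags[WK_TAGS_ENTRIES];
    wk_unpacked_t qpos[WK_QPOS_ENTRIES];
    wk_unpacked_t lowbits[WK_LOWBITS_ENTRIES];
};

#ifndef WK_UNPACKED_WORDS
/* 16-bit entries: (300 + 300 + 1200) * 2 = 3600 bytes. */
_Static_assert(sizeof(struct wk_scratch) <= WK_PAGE_SIZE,
               "16-bit scratch buffers must fit in one page");
#endif

/* Store a 10-bit value at index i of the unpacked low-bits array. */
static inline void wk_put_lowbits(struct wk_scratch *s, size_t i, unsigned v)
{
    assert(i < WK_LOWBITS_ENTRIES);
    assert(v < (1u << 10));       /* only 10 significant bits allowed */
    s->lowbits[i] = (wk_unpacked_t)v;
}

/* Read back the 10-bit value at index i. */
static inline unsigned wk_get_lowbits(const struct wk_scratch *s, size_t i)
{
    assert(i < WK_LOWBITS_ENTRIES);
    return s->lowbits[i];
}
```

With 16-bit entries the struct is 3600 bytes and fits in one 4 KiB page; defining `WK_UNPACKED_WORDS` makes it 7200 bytes, which is why the page-size check is only enforced for the 16-bit build.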
Or maybe allocating a double page is still more efficient than three
kmalloc()/kfree() pairs... I'll benchmark these cases.

>> If the kernel preempts in this ceil(), will the value of 'x' remain
>> valid (the kernel does not save and restore the floating-point
>> processor's state)?
>>
>
> The problem is worse than that: since the kernel does not preserve FP
> registers, use of FP values in the kernel may clobber legitimate
> user-land values. You can't do that. You have to change the use of
> doubles to some kind of integer for kernel code.
>

ceil() has now been replaced with simple integer operations that compute
the ceiling value.

Best Regards,
Nitin
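The thread does not show which expression used ceil(), so this is only a sketch of the usual integer replacement: for n >= 0 and d > 0, ceil(n / d) equals (n + d - 1) / d in integer division, which is what the Linux kernel's DIV_ROUND_UP() macro computes. The helper names and the example of packing three 10-bit values per 32-bit word are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Integer ceiling of n / d for d > 0, with no floating point involved.
 * Same formula as the kernel's DIV_ROUND_UP(n, d).
 * Caller must ensure n + d - 1 does not overflow size_t. */
static inline size_t wk_div_round_up(size_t n, size_t d)
{
    assert(d > 0);
    return (n + d - 1) / d;
}

/* Hypothetical use: number of 32-bit words needed to hold n 10-bit
 * values packed three per word (30 of the 32 bits used). */
static inline size_t wk_packed_lowbits_words(size_t n)
{
    return wk_div_round_up(n, 3);
}
```

Because only integer registers are touched, the result stays valid across preemption and cannot disturb any user-land FP state.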