From: Jack S. <st...@sg...> - 2001-12-10 16:33:05
> >Assuming the slab allocator manages by node, kmem_cache_alloc_node() &
> >kmem_cache_alloc_cpu() would be identical (except for spelling :-).
> >Each would pick up the nodeid from the cpu_data struct, then allocate
> >from the slab cache for that node.
>
> kmem_cache_alloc is simple - the complex operation is kmem_cache_free.
>
> The current implementation
> - assumes that virt_to_page() and reading one cacheline from the page
>   structure is fast. Is that true for your setups?
> - uses an array to batch several free calls together: if the array
>   overflows, then up to 120 objects are freed in one call, to reduce
>   cacheline thrashing.
>
> If virt_to_page is fast, then a NUMA allocator would be a simple
> extension of the current implementation:

I can give you 1 data point. This is for the SGI SN1 platform. This is
a NUMA platform & is running with the DISCONTIGMEM patch that is on
sourceforge.

The virt_to_page() function currently generates the following code:

	23 instructions
	18 "real" instructions
	 5 noops (I would like to believe the compiler can eventually
	   use these instruction slots for something else)

The code has
	2 load instructions that always reference node-local memory &
	  have a high probability of hitting in the caches
	1 load to the node that contains the target page

I think I see a couple of opportunities for reducing the amount of
code. However, I consider the code to be "fast enough" for most
purposes.

> 	* one slab chain for each node, one spinlock for each node.
> 	* 2 per-cpu arrays for each cpu: one for "correct node"
> 	  kmem_cache_free calls, one for "foreign node" kmem_cache_free
> 	  calls.
> 	* kmem_cache_alloc allocates from the "correct node" per-cpu
> 	  array, fallback to the per-node slab chain, then fallback to
> 	  __get_free_pages.
> 	* kmem_cache_free checks to which node the freed object belongs
> 	  and adds it to the appropriate per-cpu array. The array
> 	  overflow function then sorts the objects into the correct
> 	  slab chains.
> If virt_to_page is slow we need a different design. Currently it's
> called in every kmem_cache_free/kfree call.

BTW, I think Tony Luck (at Intel) is currently changing the slab
allocator to be numa-aware. Are you coordinating your work with his?

--
Thanks

Jack Steiner    (651-683-5302)   (vnet 233-5302)    st...@sg...