From: skaller <sk...@us...> - 2007-05-31 15:06:23
|
On Thu, 2007-05-31 at 11:42 +1000, skaller wrote: > Ok, I just changed Felix gc to replace a doubly linked list > with a JudyLArray and an auxiliary Judy1 array. The GC now > scans for garbage using JudyLFirst/Next functions instead > of tracing the doubly linked list. Test program timings: > > Without Judy: 1 minute 8 seconds > With Judy: 1 minute 28 seconds Now, I have implemented the interior pointer code; it looks like this: ///////////////////////////////////////////////// for(unsigned int i=0; i<n_offsets; ++i) { void **q = (void**)(void*)(p + offsets[i]); #ifndef JUDYGC if(*q) scan_object(CLIENT_TO_FRAME(*q)); #else Word_t cand = (Word_t)*q; if(cand) { Word_t fp=cand; Word_t *plen = (Word_t*)JudyLLast(ja,&fp,&je); // this is only true for object pointers.. // not interior pointers -- FOR TESTING ONLY! assert(fp == (Word_t)CLIENT_TO_FRAME((void*)cand)); if(plen && fp<=cand && cand < fp+*plen) scan_object((frame_t*)(void*)fp); } #endif } ////////////////////////////////////// Here, the non-Judy code uses a macro CLIENT_TO_FRAME to convert a pointer to the client part of a store to the address of the header which prefixes it. The header contains various information including a pointer to a shape object which includes the array of offsets of all pointers in the object, which this code is scanning. The Judy code does the same job, by taking a pointer and looking BACKWARDS for the previous registered memory block address, then checking the candidate pointer is between that address and the end of the block, which is obtained by adding the block length on. The judy array is a map: pointer -> length. The assertion holds in CURRENT Felix code, since we don't yet allow interior pointers. And I just noticed that the test fp <= cand is redundant too .. since the JudyLLast postcondition is if plen then fp <= cand i.e. if we get a non-null value back, the index returned is less than or equal to the input index. The timing on this code is: Judy with interior pointer check: 1 minute 31 seconds which is only marginally slower than without the check. The extra Judy function call is more or less free. I have no idea why but one can't complain! NOTE: at least part of the reason is almost certainly that the earliest nodes in the Judy tree are invariant and singleton, because all the malloc()d addresses are close to each other (in the 64 bit scheme of things .. ) So although in theory each lookup requires 8 memory accesses, the first 5 are probably permanently wired into the cache, and cost almost no machine cycles. One HUGE advantage of the Judy structure is that this performance optimisation is delivered WITHOUT compromising the ability of the routine to work on a machine with a fully populated zigabyte store (all 64 bits of address space mapped to memory .. :) -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net |