On Mon, Dec 20, 2010 at 2:02 PM, john skaller <skaller@...
> On 21/12/2010, at 8:17 AM, Jim Lloyd wrote:
> > I agree that malloc/free has excellent performance and that it was a good
> choice to have Judy use malloc as opposed to some other memory allocator
> using a single heap. It is only because we want multiple pools that we want
> something other than malloc.
> Is your app multi-threaded? If not, you can use a global variable to store
> allocator data: very low cost compared to passing the allocator around.
> [You set the variable before each operation]
Yes, our app is multi-threaded . Actually, we have several different
applications that share common code and Judy is integral to all of these
applications. Some of these apps are batch and some are long-lived daemons.
All of them scale well up to 16 cores.
> The same method would work for multi-threaded apps with a lock to protect
> the allocator: pretty ugly and much higher performance cost.
No, that's not feasible. We can have literally millions of Judy arrays in
one process. One global lock shared by all Judy arrays would kill the
> > We're prepared to sacrifice a bit of performance due to the increase in
> complexity. Even if the underlying pool allocator is as good as malloc/free,
> there will be a small performance cost because a reference to the allocator
> will have to be stored in the root record for a Judy array and then passed
> down as a function argument in the various operations that traverse the
> internal tree/trie data structures. We believe the benefits will outweigh
> the cost.
> For you. The challenge would be to modify the source so this overhead is
> Say, by having two completely distinct versions of the source.. (Arggg not
> MACRO please .. :) The current build technology is very bad so lets not
> that even worse with more generators: we had to throw that stuff out
> it is too hard to manage.
I agree this is a challenge.
> My primary application for Judy is in a garbage collector, which maps
> to information about the allocated objects. Judy is also available "in
> language" as
> a data structure to hold integers (can't use pointers yet because it
> requires the gc
> know all about Judy, since Judy hides addresses).
> In this context, a small extra cost adding a new address to the Judy data
> would be acceptable: allocation is already expensive, so an extra 5-10%
> would be OK.
That's my assumption. I'd expect the added overhead to be under 1% when
amortized over the total cost (i.e. including the cost of the allocations).
There should be no extra cost to the Judy*Get operations since these don't
ever allocate. A Judy*Insert operation will be marginally more expensive,
even in the case where the key already exists and no allocation is
necessary, due to an additional argument on each stack frame. This is the
worst case for cost, because the other case of a Judy*Insert where the key
doesn't exist will have cost dominated by the cost of the memory allocation.
> The cost on a collection is another story though, i.e. the cost of
> *removing* values
> from a Judy array matters here because all the removals are done "at once"
> the reaping phase of the collector and so are observable by the programmer
> (think games and framerates).
> Do you need to de-allocate as well? Often pools are used to avoid
> costs: you just de-allocate the whole pool.
Yes, de-allocation is one of the big motivating factors for us, primarily in
our long-lived daemon applications.
> john skaller