From: Doug B. <dou...@ya...> - 2009-02-01 08:06:17
Aleksey: First, three things:

1) You don't need to wait a year to ping me next time. I rarely lose email, but apparently lost this one. I really like to respond to feedback when I can.

2) I am very embarrassed that I did not know about "judyhash". My first impression is that I am very pleased with your work. I was recently asked the Judy STL question and answered "I don't know". I now have a good answer.

3) Some early feedback: your performance graphs seem to go only to 1.6 million and are not log10 on the X axis. My current testing of JudyL goes to 1.2 billion, and Judy1 goes way above that. Your measurements would land in the first pixel or two on a linear X-Y plot.

> I personally don't see why it cannot be done or why it is conceptually
> wrong. The fact that Judy is cache-efficient doesn't make an idea wrong.

The idea (a smarter/faster JudyNext()) was pondered way back in 2001-2002. I did some breadboards then, with not very acceptable results. Instruction times were a much bigger portion of overall performance then. Today I see 10+ instructions executed per nanosecond (4.2GHz Core 2 Duo, E8600), while memory speeds are still near 100 nanoseconds. This trend is likely to continue -- suggesting even less improvement from a smarter/faster JudyNext().

Alan Silverstein (co-developer of Judy) even wrote Judy array dump/restore routines that were included in the Judy package but never documented. I felt they were not fast enough, compared to using "JudyNext in a loop", to justify releasing them. Plus the user API was very "iffy".

Actually, I keep thinking about the idea, but it always fell too low on my priority list -- especially now. I am up to my eyeballs in an improved version of Judy. Finally, the breadboards are showing results. I believe the new Judy will be 2X faster when it gets released. Preliminary results for Judy1 are showing Judy1Test() times in the 20ns range when in-cache and 130ns when out-of-cache (population in the billions).
This is contrasted with the speed of a pure bitmap accessed with the same data set: 5ns in-cache and 70ns out-of-cache (one (1) cache-line fill time, with a 512MB bitmap). I am excited about this because it shows random "Get" times, at large populations, of less than 2 cache-line fills. (Note: "in-cache" means the 6MByte cache on this machine; there are smaller/faster caches.)

So "wrong" is not the problem; low priority is. I will certainly take another look at this when the new Judy is more mature.

I would enjoy any feedback you could give on the Judy API. My C++ knowledge could fit in one hand. An example: could a JudyNext be written to return a small array (<100) of Indexes/Values? How would it be invalidated when someone does an Insert/Delete/Free? What is a good API for C++?

Thanks for your interest,
Doug

PS. I got curious about the current speed of JudyNext. It measures about 25-40ns per call on a random data set. I am using a 4.2GHz, 16GB Core 2 Duo (Intel E8600), price in the ~$1000 range. That clocks out to about 120MB-240MB of Indexes per second, which would keep up with current storage devices. However, I will keep tabs on how JudyNext is doing during future development.

----- Original Message ----
From: Aleksey Cheusov <vl...@gm...>
To: Jud...@li...
Sent: Sunday, February 1, 2009 1:10:16 AM
Subject: Re: AW: AW: New Judy version 1 year after

>> I don't know how Judy works internally, but I think you do a search
>> first (like JudyGet) of the parameter passed to JudyNext/JudyPrev and then
>> return the next/prev element. With the cursor interface you can store
>> the current pointers (tree path) in the structure and use this info at
>> the next call (without a new search at each call).
...
> But it's important to note that for a
> cache-smart algorithm like libJudy, the overhead time to do this after a
> previous lookup is virtually zero. That's one reason we didn't worry
> about giving you back some kind of quickref/"cursor" for the next call.
It's true that Judy uses a cache-smart algorithm. This is why Judy's iteration is rather efficient. But CPU usage may also be a significant constituent.

Two years ago, I wrote C++ STL-like templates that implement hash tables (maps and sets) using Judy as a backend instead of linear arrays. I have benchmarks comparing the time of insert/delete/lookup operations and... iteration too. The benchmarks are here:

http://judyhash.sourceforge.net/benchmarks/bench_size65599.html
http://judyhash.sourceforge.net/benchmarks/

See the graph about iteration: std::map and std::set (AVL tree) and Google's SPARSE and DENSE hashes show better performance.

I think the "cursor" idea for the First/Last/Next/Prev functions would make Judy even more efficient, especially when there are lots of sequential keys (n, n+1, etc.) in the array. I personally don't see why it cannot be done or why it is conceptually wrong. The fact that Judy is cache-efficient doesn't make an idea wrong.

-- 
Best regards, Aleksey Cheusov.