From: Doug B. <dou...@ya...> - 2009-02-01 08:06:17
Aleksey: First, three things:

1) You don't need to wait a year to ping me next time. I rarely lose email, but apparently lost this one. I really like to respond to feedback when I can.

2) I am very embarrassed that I did not know about "judyhash". My first impression is that I am very pleased with your work. I was recently asked the Judy STL question and answered "I don't know". I now have a good answer.

3) Some early feedback: your performance graphs seem to go only to 1.6 million and are not log10 on the X axis. My current testing of JudyL goes to 1.2 billion, and Judy1 goes way above that. Your measurements would land in the first pixel or two on a linear X-Y plot.

> I personally don't see why it cannot be done or why it is conceptually
> wrong. The fact that Judy is cache-efficient doesn't make an idea wrong.

The idea (a smarter/faster JudyNext()) was pondered way back in 2001-2002. I did some breadboards then, with not very acceptable results. Instruction times were a much bigger portion of overall performance then. Today I see 10+ instructions executed per nanosecond (4.2GHz Core 2 Duo, E8600), while memory speeds are still near 100 nanoseconds. This trend is likely to continue -- suggesting even less improvement from a smarter/faster JudyNext().

Alan Silverstein (co-developer of Judy) even wrote Judy array dump/restore routines that were included in the Judy package but never documented. I felt they were not fast enough, compared to using "JudyNext in a loop", to justify releasing them. Plus the user API was very "iffy".

Actually, I keep thinking about the idea, but it always fell too low on my priority list -- especially now. I am up to my eyeballs in an improved version of Judy. Finally, the breadboards are showing results. I believe the new Judy will be 2X faster when it gets released. Preliminary results for Judy1 are showing Judy1Test() times in the 20ns range when in-cache and 130ns when out-of-cache (population in the billions).
This is contrasted with the speed of a pure bitmap accessed with the same data set: 5ns in-cache and 70ns out-of-cache (one (1) cache-line fill time, with a 512MB bitmap). I am excited about this because it shows random "Get" times, at large populations, of less than 2 cache-line fills. (Note: "in-cache" means the 6MByte cache on this machine; there are smaller/faster caches.)

So "wrong" is not the problem; low priority is. I will certainly take another look at this when the new Judy is more mature.

I would enjoy any feedback you could give on the Judy API. My C++ knowledge could fit in one hand. An example: could a JudyNext be written to return a small array (<100) of Indexes/Values? How would it be invalidated when someone does an Insert/Delete/Free? What is a good API for C++?

Thanks for your interest,
Doug

PS. I got curious about the current speed of JudyNext. It measures about 25-40ns per call on a random data set. I am using a 4.2GHz, 16GB Core 2 Duo (Intel E8600), price in the ~$1000 range. That clocks out to about 120MB-240MB of Indexes per second, which would keep up with current storage devices. However, I will keep tabs on how JudyNext is doing during future development.

----- Original Message ----
From: Aleksey Cheusov <vl...@gm...>
To: Jud...@li...
Sent: Sunday, February 1, 2009 1:10:16 AM
Subject: Re: AW: AW: New Judy version 1 year after

>> I don't know how Judy works internally, but I think you do a search
>> first (like JudyGet) of the parameter passed to JudyNext/JudyPrev and then
>> return the next/prev element. With the cursor interface you can store
>> the current pointers (tree path) in the structure and use this info at
>> the next call (without a new search at each call).
...
> But it's important to note that for a
> cache-smart algorithm like libJudy, the overhead time to do this after a
> previous lookup is virtually zero. That's one reason we didn't worry
> about giving you back some kind of quickref/"cursor" for the next call.
It's true that Judy uses a cache-smart algorithm. This is why Judy's iteration is rather efficient. But CPU usage may also be a significant constituent.

Two years ago, I wrote C++ STL-like templates that implement hash tables (maps and sets) using Judy as a backend instead of linear arrays. I have benchmarks comparing the time of insert/delete/lookup operations and... iteration too. The benchmarks are here:

http://judyhash.sourceforge.net/benchmarks/bench_size65599.html
http://judyhash.sourceforge.net/benchmarks/

See the graph about iteration: std::map and std::set (AVL tree) and Google's SPARSE and DENSE hashes show better performance.

I think the "cursor" idea for the First/Last/Next/Prev functions would make Judy even more efficient, especially when there are lots of sequential keys (n, n+1, etc.) in the array. I personally don't see why it cannot be done or why it is conceptually wrong. The fact that Judy is cache-efficient doesn't make an idea wrong.

-- 
Best regards, Aleksey Cheusov.