## Re: Judy1

 Re: Judy1 From: Doug Baskins - 2008-05-17 15:21:24 Attachments: Message as HTML ```Hi Simen: First, if I understand your email, I think you could solve the problem a little differently. Do J1N() on both arrays at the same time and advance the one that returns the lessor Index. All the data you need can be extracted when the J1N's return different Indexes. This will only require only one access on each data element in both arrays. For arrays of very large populations, J1N's are very fast because typically most of the data is in the cache from the last access (J1N). Second, I do not understand your statement: > However, it seems to scale not so well when the largest indicex present grows. If in your example that the number "20" were changed to 1234567890, I think the performance would be about the same because Judy1 does not store 0's (missing Indexes) in the array. I can answer more of your questions if you like. I am surprised that Judy1 is used for this "small" of problem. I am use to thinking in 100's of million of Indexes. Judy1 will store any combination of Indexes, in any order up to the 4 billion max while using less than 1GB of RAM. Doug Doug Baskins ----- Original Message ---- From: Simen Kvaal To: judy-devel@... Sent: Friday, May 16, 2008 9:59:17 PM Subject: Judy1 Hi! I came across Judy1 yesterday and have incorporated it into my code, and it works wonderfully. My code heavily relies on a fast and memory-efficient way to represent a sparse pattern of bits -- hence, Judy1! To be more specific, my code repeatedly compares two variable patterns (Judy1 arrays) A and B, both containing a constant number of set bits, to find differing bit positions. For example, if A = (1, 4, 5, 10, 15) B = (1, 5, 6, 10, 20) Then i need the output d(A,B) = (4, 15), d(B,A) = (6, 20) and also j(A,B) = (2, 5) and j(B,A) = (3,5), where j(A,B) are the number in the sequence of the differing bits in A compared to B. What would be the most efficient way to do this? Would it depend on the density of the array? Typically, I have ~1000 indices and ~5 bits set. My present code traverses first A (using J1F and J1N on A) and using J1T on B for each result. The result is then stored. Then, it repeats the process on B. This is fast; about as fast as using bit-wise manipulations on unsigned integers, which was my old representation of the patterns. However, it seems to scale not so well when the largest indicex present grows. Thanks in advance for any advice! Regards, Simen Kvaal. -- ----- Simen Kvaal -- Ph.D student in Physics/Applied Maths ----- Centre of Mathematics for Applications, University of Oslo web: http://folk.uio.no/simenkva/ office: +47 22857708 ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ Judy-devel mailing list Judy-devel@... https://lists.sourceforge.net/lists/listinfo/judy-devel ```

 Re: Judy1 From: Doug Baskins - 2008-05-17 15:21:24 Attachments: Message as HTML ```Hi Simen: First, if I understand your email, I think you could solve the problem a little differently. Do J1N() on both arrays at the same time and advance the one that returns the lessor Index. All the data you need can be extracted when the J1N's return different Indexes. This will only require only one access on each data element in both arrays. For arrays of very large populations, J1N's are very fast because typically most of the data is in the cache from the last access (J1N). Second, I do not understand your statement: > However, it seems to scale not so well when the largest indicex present grows. If in your example that the number "20" were changed to 1234567890, I think the performance would be about the same because Judy1 does not store 0's (missing Indexes) in the array. I can answer more of your questions if you like. I am surprised that Judy1 is used for this "small" of problem. I am use to thinking in 100's of million of Indexes. Judy1 will store any combination of Indexes, in any order up to the 4 billion max while using less than 1GB of RAM. Doug Doug Baskins ----- Original Message ---- From: Simen Kvaal To: judy-devel@... Sent: Friday, May 16, 2008 9:59:17 PM Subject: Judy1 Hi! I came across Judy1 yesterday and have incorporated it into my code, and it works wonderfully. My code heavily relies on a fast and memory-efficient way to represent a sparse pattern of bits -- hence, Judy1! To be more specific, my code repeatedly compares two variable patterns (Judy1 arrays) A and B, both containing a constant number of set bits, to find differing bit positions. For example, if A = (1, 4, 5, 10, 15) B = (1, 5, 6, 10, 20) Then i need the output d(A,B) = (4, 15), d(B,A) = (6, 20) and also j(A,B) = (2, 5) and j(B,A) = (3,5), where j(A,B) are the number in the sequence of the differing bits in A compared to B. What would be the most efficient way to do this? Would it depend on the density of the array? Typically, I have ~1000 indices and ~5 bits set. My present code traverses first A (using J1F and J1N on A) and using J1T on B for each result. The result is then stored. Then, it repeats the process on B. This is fast; about as fast as using bit-wise manipulations on unsigned integers, which was my old representation of the patterns. However, it seems to scale not so well when the largest indicex present grows. Thanks in advance for any advice! Regards, Simen Kvaal. -- ----- Simen Kvaal -- Ph.D student in Physics/Applied Maths ----- Centre of Mathematics for Applications, University of Oslo web: http://folk.uio.no/simenkva/ office: +47 22857708 ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ Judy-devel mailing list Judy-devel@... https://lists.sourceforge.net/lists/listinfo/judy-devel ```
 Re: Judy1 From: Simen Kvaal - 2008-05-19 07:44:21 ```Hi, and thank you all for replies. Doug, your suggestion for improving the algorithm makes perfectly sense, thank you. I will test a modified implentation ASAP. As for the scaling statement, i.e., Doug Baskins wrote: > Second, I do not understand your statement: > >> However, it seems to scale not so well when the largest indicex > present grows. > If in your example that the number "20" were changed to 1234567890, > I think the performance would be about the same because Judy1 > does not store 0's (missing Indexes) in the array. You might be right; it may be other factors that cause this apparent slowdown. I will address this in a proper benchmark run. Doug Baskins wrote: > > I can answer more of your questions if you like. I am surprised that Judy1 > is used for this "small" of problem. I am use to thinking in 100's of > million > of Indexes. > Thank you. The reason why I try Judy1 is this: I work in computational quantum mechanics, in which a finite set of so-called slater determinants are constructed. These are identified with bit patterns -- each bit set corresponds to the whereabouts of a quantum particle. Think of the particles "living" in the set of integers, and a bit set says that the particle at the momemt resides "there". Thus, a bit pattern specifies the state of a many-body quantum system. The number K of particles is fixes, while the number of possible bit positions N varies, and quickly approaches ~1000 for a realistic calculation. The number of bit patterns needed grows exponentially with K, and very fast and efficient way to store and process the bit patterns is needed. The "classical" approach is to store the patterns in real bit bit patterns in words. This is wasteful, especially when N is large. My tests show that Judy1 outperforms this naive approach. This answers, by the way, the comment by John Skaller, i.e, John Skaller wrote: >However given your small set sizes (5) and small maximum element value, >it's not clear how to improve performance algorithmically: it's likely small >machine/compiler dependent quirks will play a significant role in >performance. 1024 bits is 128 bytes, which is pretty close to a cache line >size, so your original bit array was probably quite fast. Yes, the original approach was *very* fast when N was in the range 64-128. However, when adding more words to create a Huge Integer class, things got slow ... Of course, there might be other ways to implement a bit-pattern. I have considered a simple packed storage, simply storing the indices, but I believe it will be slow to to the comparison thing with it. As for the hybrid storage scheme suggested by John: Maybe, it is not entirely clear to me how it could improve performance, and the size K is small, so it could introduce a lot of overhead. ... I will come back with more test results in a short while. Thank you all for your suggestions. regards, Simen Kvaal. -- ----- Simen Kvaal -- Ph.D student in Physics/Applied Maths ----- Centre of Mathematics for Applications, University of Oslo web: http://folk.uio.no/simenkva/ office: +47 22857708 ```