## Re: Judy1

 Re: Judy1 From: john skaller - 2008-05-16 22:34:11 ```On 17/05/2008, at 12:59 AM, Simen Kvaal wrote: > > What would be the most efficient way to do this? Would it depend on > the density of the array? Typically, I have ~1000 indices and ~5 bits > set. > > My present code traverses first A (using J1F and J1N on A) and using > J1T on B for each result. The result is then stored. Then, it repeats > the process on B. > > This is fast; about as fast as using bit-wise manipulations on > unsigned integers, which was my old representation of the patterns. > However, it seems to scale not so well when the largest indicex > present grows. I have two ideas. The first depends on your insert/delete vs read frequency: maintain d(A,B), d(B,A) and the j functions directly, and modify them on insertion and deletion. The second is to use a JudyL, and store a bit pattern in the data word, i.e. use a hybrid data structure. However given your small set sizes (5) and small maximum element value, it's not clear how to improve performance algorithmically: it's likely small machine/compiler dependent quirks will play a significant role in performance. 1024 bits is 128 bytes, which is pretty close to a cache line size, so your original bit array was probably quite fast. -- john skaller skaller@... ```

 Judy1 From: Simen Kvaal - 2008-05-16 14:59:40 ```Hi! I came across Judy1 yesterday and have incorporated it into my code, and it works wonderfully. My code heavily relies on a fast and memory-efficient way to represent a sparse pattern of bits -- hence, Judy1! To be more specific, my code repeatedly compares two variable patterns (Judy1 arrays) A and B, both containing a constant number of set bits, to find differing bit positions. For example, if A = (1, 4, 5, 10, 15) B = (1, 5, 6, 10, 20) Then i need the output d(A,B) = (4, 15), d(B,A) = (6, 20) and also j(A,B) = (2, 5) and j(B,A) = (3,5), where j(A,B) are the number in the sequence of the differing bits in A compared to B. What would be the most efficient way to do this? Would it depend on the density of the array? Typically, I have ~1000 indices and ~5 bits set. My present code traverses first A (using J1F and J1N on A) and using J1T on B for each result. The result is then stored. Then, it repeats the process on B. This is fast; about as fast as using bit-wise manipulations on unsigned integers, which was my old representation of the patterns. However, it seems to scale not so well when the largest indicex present grows. Thanks in advance for any advice! Regards, Simen Kvaal. -- ----- Simen Kvaal -- Ph.D student in Physics/Applied Maths ----- Centre of Mathematics for Applications, University of Oslo web: http://folk.uio.no/simenkva/ office: +47 22857708 ```
 Re: Judy1 From: john skaller - 2008-05-16 22:34:11 ```On 17/05/2008, at 12:59 AM, Simen Kvaal wrote: > > What would be the most efficient way to do this? Would it depend on > the density of the array? Typically, I have ~1000 indices and ~5 bits > set. > > My present code traverses first A (using J1F and J1N on A) and using > J1T on B for each result. The result is then stored. Then, it repeats > the process on B. > > This is fast; about as fast as using bit-wise manipulations on > unsigned integers, which was my old representation of the patterns. > However, it seems to scale not so well when the largest indicex > present grows. I have two ideas. The first depends on your insert/delete vs read frequency: maintain d(A,B), d(B,A) and the j functions directly, and modify them on insertion and deletion. The second is to use a JudyL, and store a bit pattern in the data word, i.e. use a hybrid data structure. However given your small set sizes (5) and small maximum element value, it's not clear how to improve performance algorithmically: it's likely small machine/compiler dependent quirks will play a significant role in performance. 1024 bits is 128 bytes, which is pretty close to a cache line size, so your original bit array was probably quite fast. -- john skaller skaller@... ```