On 11/16/2012 6:47 PM, john skaller wrote:
On 17/11/2012, at 10:43 AM, Tim Margheim wrote:

One of the most frequent operations I need to perform is a prefix 
search, i.e. "return the first element that starts with Index, or return 
null if nothing does".

I'm currently doing this by calling JudySLNext().  If it returns a 
non-null pointer, but the new Index doesn't start with the prefix, I 
still return null.

Two questions:
1.) Does anyone know of a more efficient way to do that with the current 
set of Judy functions?
The searching is close to optimal. The only cost is copying the unwanted
key, i.e. descending into the Judy tree copying characters as we go,
when we don't actually want that key.
In the first question, I was asking if anyone can think of a more efficient way to do a prefix search using the current Judy functions--not asking for something more efficient than the current functions.

It's hard to see how to improve on "JudySLNext() followed by an strncmp() on the returned key", but I thought it couldn't hurt to ask.

However subsequent insertion is expensive since the search
must be repeated, even when the position for a new key is known.

Someone made a "cursor" based version of Judy that fixed this,
however that code is not re-entrant and therefore unacceptable 
to me at least, because their modified Judy only supported
one cursor. 
Hmm, I might have to look into that--my current application isn't affected much by the insertion performance, but it could matter for another application that I have in mind.  And I could make do with one cursor.

However, if 0 is an invalid value for someone's JudyTree, then they don't need to do a search-then-insert-if-it-failed.  You can replace the initial find with an insertion.  If the pointed-to-value returned by the insertion is 0, then you know the key wasn't already in the tree.  If the pointed-to-value is anything other than 0, then you know the key was already in the tree.  So for those kinds of purposes, you don't need a cursor.

In fact, even if 0 is a valid value, you could get around it if your application would allow you to store "value+1" instead of "value".  The only cost is an addition/subtraction on storage & retrieval of the value, plus the slight reduction in the range of values you can store.

Also, if you modified the Judy insert function to accept a pointer to a bool, then the bool could return whether or not the insertion created a new element or returned an already-existing one. 

2.) Doug, have you thought about making a search function that will give 
up as soon as it determines that there are no matches beginning with 
Index?  What do you think about including it in one of the upcoming 
versions?
A more consistent function would be 

	find-between

Good point.  And it would be simple enough to write an inline prefix-search function that auto-generates the upper-bound string and wraps a call to find-between.  (Though if you have to call the wrapper function multiple times, it would involve multiple allocations of the upper-bound buffer, so you might want to use the find-between function directly in that case.  Or make a version of the wrapper where the upper-bound buffer is passed in and then populated by the wrapper.)

On the other hand, you'd also want to carefully compare the performance of a find-between with the performance of a prefix search.  It might be that the find-between requires some additional instructions.  Still, find-between would certainly perform better than running Next and then checking if the returned key is outside your bounds.


---
Tim