Re: Prefix-only search?

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

On 11/16/2012 6:47 PM, john skaller wrote:
> On 17/11/2012, at 10:43 AM, Tim Margheim wrote:
>
>> One of the most frequent operations I need to perform is a prefix
>> search, i.e. "return the first element that starts with Index, or return
>> null if nothing does".
>>
>> I'm currently doing this by calling JudySLNext().  If it returns a
>> non-null pointer, but the new Index doesn't start with the prefix, I
>> still return null.
>>
>> Two questions:
>> 1.) Does anyone know of a more efficient way to do that with the current
>> set of Judy functions?
> The searching is close to optimal. The only cost is copying the unwanted
> key, i.e. descending into the Judy tree copying characters as we go,
> when we don't actually want that key.
In the first question, I was asking if anyone can think of a more 
efficient way to do a prefix search /using/ the current Judy 
functions--not asking for something more efficient /than/ the current 
functions.

It's hard to see how to improve on "JudySLNext() followed by an 
strncmp() on the returned key", but I thought it couldn't hurt to ask.

> However subsequent insertion is expensive since the search
> must be repeated, even when the position for a new key is known.
>
> Someone made a "cursor" based version of Judy that fixed this,
> however that code is not re-entrant and therefore unacceptable
> to me at least, because their modified Judy only supported
> one cursor.
Hmm, I might have to look into that--my current application isn't 
affected much by the insertion performance, but it could matter for 
another application that I have in mind.  And I could make do with one 
cursor.

However, if 0 is an invalid value for someone's JudyTree, then they 
don't need to do a search-then-insert-if-it-failed.  You can replace the 
initial find with an insertion.  If the pointed-to-value returned by the 
insertion is 0, then you know the key wasn't already in the tree.  If 
the pointed-to-value is anything other than 0, then you know the key 
/was/ already in the tree.  So for those kinds of purposes, you don't 
need a cursor.

In fact, even if 0 /is/ a valid value, you could get around it if your 
application would allow you to store "value+1" instead of "value".  The 
only cost is an addition/subtraction on storage & retrieval of the 
value, plus the slight reduction in the range of values you can store.

Also, if you modified the Judy insert function to accept a pointer to a 
bool, then the bool could return whether or not the insertion created a 
new element or returned an already-existing one.

>> 2.) Doug, have you thought about making a search function that will give
>> up as soon as it determines that there are no matches beginning with
>> Index?  What do you think about including it in one of the upcoming
>> versions?
> A more consistent function would be
>
> 	find-between

Good point.  And it would be simple enough to write an inline 
prefix-search function that auto-generates the upper-bound string and 
wraps a call to find-between.  (Though if you have to call the wrapper 
function multiple times, it would involve multiple allocations of the 
upper-bound buffer, so you might want to use the find-between function 
directly in that case.  Or make a version of the wrapper where the 
upper-bound buffer is passed in and then populated by the wrapper.)

On the other hand, you'd also want to carefully compare the performance 
of a find-between with the performance of a prefix search.  It might be 
that the find-between requires some additional instructions.  Still, 
find-between would certainly perform better than running Next and then 
checking if the returned key is outside your bounds.

---
Tim