From: Tim M. <tma...@ne...> - 2012-11-19 16:48:15
|
On 11/16/2012 6:47 PM, john skaller wrote: > On 17/11/2012, at 10:43 AM, Tim Margheim wrote: > >> One of the most frequent operations I need to perform is a prefix >> search, i.e. "return the first element that starts with Index, or return >> null if nothing does". >> >> I'm currently doing this by calling JudySLNext(). If it returns a >> non-null pointer, but the new Index doesn't start with the prefix, I >> still return null. >> >> Two questions: >> 1.) Does anyone know of a more efficient way to do that with the current >> set of Judy functions? > The searching is close to optimal. The only cost is copying the unwanted > key, i.e. descending into the Judy tree copying characters as we go, > when we don't actually want that key. In the first question, I was asking if anyone can think of a more efficient way to do a prefix search /using/ the current Judy functions--not asking for something more efficient /than/ the current functions. It's hard to see how to improve on "JudySLNext() followed by an strncmp() on the returned key", but I thought it couldn't hurt to ask. > However subsequent insertion is expensive since the search > must be repeated, even when the position for a new key is known. > > Someone made a "cursor" based version of Judy that fixed this, > however that code is not re-entrant and therefore unacceptable > to me at least, because their modified Judy only supported > one cursor. Hmm, I might have to look into that--my current application isn't affected much by the insertion performance, but it could matter for another application that I have in mind. And I could make do with one cursor. However, if 0 is an invalid value for someone's JudyTree, then they don't need to do a search-then-insert-if-it-failed. You can replace the initial find with an insertion. If the pointed-to-value returned by the insertion is 0, then you know the key wasn't already in the tree. If the pointed-to-value is anything other than 0, then you know the key /was/ already in the tree. So for those kinds of purposes, you don't need a cursor. In fact, even if 0 /is/ a valid value, you could get around it if your application would allow you to store "value+1" instead of "value". The only cost is an addition/subtraction on storage & retrieval of the value, plus the slight reduction in the range of values you can store. Also, if you modified the Judy insert function to accept a pointer to a bool, then the bool could return whether or not the insertion created a new element or returned an already-existing one. >> 2.) Doug, have you thought about making a search function that will give >> up as soon as it determines that there are no matches beginning with >> Index? What do you think about including it in one of the upcoming >> versions? > A more consistent function would be > > find-between Good point. And it would be simple enough to write an inline prefix-search function that auto-generates the upper-bound string and wraps a call to find-between. (Though if you have to call the wrapper function multiple times, it would involve multiple allocations of the upper-bound buffer, so you might want to use the find-between function directly in that case. Or make a version of the wrapper where the upper-bound buffer is passed in and then populated by the wrapper.) On the other hand, you'd also want to carefully compare the performance of a find-between with the performance of a prefix search. It might be that the find-between requires some additional instructions. Still, find-between would certainly perform better than running Next and then checking if the returned key is outside your bounds. --- Tim |