From: Al P. <adp...@us...> - 2008-10-06 02:59:22
|
I had been happily using JudySL for a text retrieval application and had been using the stored keys as a list of words to search on. Now I need to do case-insensitive searching. Is there a simple way to make JSLF and JSLN match keys without respect to case? I saw in JudySL.c the line where one could define which string comparison function to use for determining order, but changing this from strcmp to strcasecmp is not enough. I delved into the code a bit, but after reviewing the third level of macro definitions, I gave up. The main option that occurs to me is to store all keys in one case, do all searches using a string forced to be in the same case, and store an encoding of the case of the individual characters of the original key in the structure pointed to JudySL.. Thought I'd inquire here before embarking on that. -Al -- Al Pacifico Seattle, WA |
From: Alan S. <aj...@fr...> - 2008-10-06 04:47:36
|
Al and everyone, > Is there a simple way to make JSLF and JSLN match keys without respect > to case? No simple way comes to mind. Bear in mind that libJudy is about fast, efficient mapping from indexes to values, and other features like rudimentary sorting are lucky accidents, not key features. Consider that JudySL, given a string like "AbCdEf", actually stores 0x62416443 as a JudyL index pointing to a value which in turn points to another JudyL array containing index 0x66450000, etc. So what you want to do is to treat wholly different (from the JudyL point of view) indexes as identical... > I saw in JudySL.c the line where one could define which string > comparison function to use for determining order... I don't even recall that, and doubt it would work. > I delved into the code a bit, but after reviewing the third level of > macro definitions, I gave up. Sorry about that. I wish we'd had inline functions. More recently Doug has determined that on modern processors, actual function call overhead is effectively zero and there's no reason to use macros OR inlines. > The main option that occurs to me is to store all keys in one case, do > all searches using a string forced to be in the same case, and store > an encoding of the case of the individual characters of the original > key in the structure pointed to JudySL.. Yeah, that should work. That's how a lot of systems do case-insensitive comparisons. Note well that one beauty of libJudy is that indexes are "compressed" and shared. Unlike a lot of trees, the full index is seldom manifest. Space is saved because common bytes only appear once. If you stick the full string in the JudySL value area, that's wasteful -- if you care. Since all you actually need to know is uppercase versus lowercase for each character (byte), if space is an issue, you could store a bit pattern, say given "AbCdEf", something like 101010 => 0xc80000, for which bytes should be uppercased. Cheers, Alan Silverstein |