I had been happily using JudySL for a text retrieval application and had
been using the stored keys as a list of words to search on. Now I need to do
Is there a simple way to make JSLF and JSLN match keys without respect to
case? I saw in JudySL.c the line where one could define which string
comparison function to use for determining order, but changing this from
strcmp to strcasecmp is not enough. I delved into the code a bit, but after
reviewing the third level of macro definitions, I gave up.
The main option that occurs to me is to store all keys in one case, do all
searches using a string forced to be in the same case, and store an encoding
of the case of the individual characters of the original key in the
structure pointed to JudySL..
Thought I'd inquire here before embarking on that.
From: Alan Silverstein <ajs@fr...> - 2008-10-06 04:47:36
Al and everyone,
> Is there a simple way to make JSLF and JSLN match keys without respect
> to case?
No simple way comes to mind. Bear in mind that libJudy is about fast,
efficient mapping from indexes to values, and other features like
rudimentary sorting are lucky accidents, not key features.
Consider that JudySL, given a string like "AbCdEf", actually stores
0x62416443 as a JudyL index pointing to a value which in turn points to
another JudyL array containing index 0x66450000, etc. So what you want
to do is to treat wholly different (from the JudyL point of view)
indexes as identical...
> I saw in JudySL.c the line where one could define which string
> comparison function to use for determining order...
I don't even recall that, and doubt it would work.
> I delved into the code a bit, but after reviewing the third level of
> macro definitions, I gave up.
Sorry about that. I wish we'd had inline functions. More recently Doug
has determined that on modern processors, actual function call overhead
is effectively zero and there's no reason to use macros OR inlines.
> The main option that occurs to me is to store all keys in one case, do
> all searches using a string forced to be in the same case, and store
> an encoding of the case of the individual characters of the original
> key in the structure pointed to JudySL..
Yeah, that should work. That's how a lot of systems do case-insensitive
Note well that one beauty of libJudy is that indexes are "compressed"
and shared. Unlike a lot of trees, the full index is seldom manifest.
Space is saved because common bytes only appear once. If you stick the
full string in the JudySL value area, that's wasteful -- if you care.
Since all you actually need to know is uppercase versus lowercase for
each character (byte), if space is an issue, you could store a bit
pattern, say given "AbCdEf", something like 101010 => 0xc80000, for
which bytes should be uppercased.