From: Alan S. <aj...@fr...> - 2014-02-25 00:34:29
|
John et al, > WLOG is mathspeak for Without Loss of Generality. Sorry :) OK, WTF. :-) > Ocaml is a programming language. Ocaml strings are multiple words > with null padding at the end plus an indicator how many bytes of > padding got added. OK, I guess if you know the language, then the format is a good metaphor. So how do you know when you're in the last word, to say it's time to stop instead of continuing? Any pattern you can invent with nulls and a count, I can assert to exist somewhere in the middle of a longer string, so long as all bytes 0..255 are valid in strings. (No distinguished values like a terminator.) So far this just sounds to me like a variation on C length-terminated (by nulls) with a fancier termination code. Anyway if I had more time and energy, I'd be inclined to revisit this topic and maybe even write some CODE. Well, maybe someday when I'm caught up with O(N^2) other retirement tasks! OK, at least you motivated me to glance at Judy.h. I found this: // The JLAP_INVALID pattern never appears in a JudyL array pointer. Hence // applications that build trees of JudyL arrays can use this pattern to // distinguish a JudyL array value that is a pointer to another JudyL array // from one that is a pointer to some other, application-defined object. OK I remember now, there's ONE pointer LSB pattern an application can use to embed a non-Judy pointer (JudyL value) in a meta-trie (array of array of JudyL). This should be sufficient! Basically if all common substrings through some point in the meta-trie continue at least one level deeper, then you just make the present JudyL value be a pointer to another JudyL array. But if one or more substrings terminate "at this level" then you store a pointer (to your own user data) with JLAP_INVALID to mark it as such. Of course your meta-library must encode/decode this above libJudy while descending the meta-trie for any operation. "Your own user data" is some kind of node/struct that must include at least one of: A value area for 1..4 or 1..8 terminating substrings (depending on word size), and possibly also one of: A value area holding a pointer to a subsidiary JudyL array for superstrings. For example, suppose I store ALL of these trailing substrings: ... 0x01 ... 0x0100 ... 0x010000 ... 0x01000000 ... 0x01000000 0x05... ... 0x0102 ... 0x010200 ... 0x01020000 ... 0x01020000 0x05... ... 0x010203 ... 0x01020300 ... 0x01020300 0x05... ... 0x01020304 ... 0x01020304 0x05... Each group above occupies one key in a given JudyL array (meta-trie node), but as you can see, in the worst case (the first group), four different terminating strings plus one or more superstrings is represented. All appear under a four-byte key of 0x01000000 (since nulls are appended to shorter substrings). The value associated with that key is a JLAP_INVALID pointer to my own data structure where I list which combinations are valid, like this (informally): slot 1: yes + value # meaning 0x01 exists. slot 2: yes + value # meaning 0x0100 exists (and so on). slot 3: yes + value slot 4: yes + value slot 5: yes + value (which is a pointer to a JudyL array) Having never coded this, I don't know if it would pan out, or how it would perform. The size would have to be at least 5 bits + 5 words, or really 6 words, on a 32-bit machine. Just to be clearer, suppose I only stored these three strings/keys in an entire improved JudyNL array: length = 1: 0x01 length = 5: 0x01000000 0x05 length = 8: 0x01000000 0x05060708 The 0x01000000 key in the top JudyL array would point (with JLAP_INVALID) to a JudyNL special node that says: slot 1: yes + value # meaning 0x01 exists, and here's its value area. slot 2: no slot 3: no slot 4: no slot 5: yes + value (which is a pointer to a JudyL array containing 0x05 and 0x05060708) Got it? Cheers, Alan |