Is it possible to serialize to disk the memory data (binary) used by a judy array? So basically, let's say you create your array then you save it on the disk in binary form, then later you can restore it and have it back without repeating the previous operations.
Could you please provide a sample of how one can accomplish this?
From: Alan Silverstein <ajs@fr...> - 2008-01-26 20:41:34
> Is it possible to serialize to disk the memory data (binary) used by a
> judy array? So basically, let's say you create your array then you
> save it on the disk in binary form, then later you can restore it and
> have it back without repeating the previous operations.
> Could you please provide a sample of how one can accomplish this?
Unfortunately there's no API for this, unless someone has created one
since 2002, when libJudy was open-sourced, and I don't know about it. We
did talk about it a great deal before 2002, but it's another possible
feature that fell by the wayside when the project was canceled.
We called this "persistent Judy arrays." We understand the desire for
them. The closest we got was creating some batch insertion functions
that, if I recall right, are undocumented but in the source code, and,
"not known not to work." You might look for those.
Using them would still require a first/next loop to write out all array
values in some form (ASCII or binary) that is later read back for
re-insertion (batch or not).
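
As a rough illustration of the dump-and-reload approach described above, here is a minimal sketch of the file half of it. To stay free of any libJudy dependency, a plain array of pairs stands in for the Judy array. The names pair_t, dump_pairs() and load_pairs() are hypothetical, and the raw binary record layout is an assumption that is not portable across byte orders. With a real JudyL array, the write loop would be driven by JudyLFirst()/JudyLNext() and each record read back would be passed to JudyLIns().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk record: one array index and its value. */
typedef struct {
    uint64_t index;
    uint64_t value;
} pair_t;

/* Write n pairs to path. Returns 0 on success, -1 on any I/O error. */
int dump_pairs(const char *path, const pair_t *pairs, size_t n)
{
    assert(path != NULL && (pairs != NULL || n == 0));
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    /* With libJudy, a JudyLFirst()/JudyLNext() walk would drive this loop. */
    for (size_t i = 0; i < n; i++) {
        if (fwrite(&pairs[i], sizeof pairs[i], 1, fp) != 1) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Read at most max pairs from path. Returns the count read, or -1 on error. */
long load_pairs(const char *path, pair_t *pairs, size_t max)
{
    assert(path != NULL && (pairs != NULL || max == 0));
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    size_t n = 0;
    /* With libJudy, each record read here would be handed to JudyLIns(). */
    while (n < max && fread(&pairs[n], sizeof pairs[n], 1, fp) == 1)
        n++;
    int err = ferror(fp);
    fclose(fp);
    return err ? -1 : (long)n;
}
```

Because a first/next walk visits indexes in sorted order, the file comes out sorted too, which is the kind of input a batch insertion function would presumably want.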
-- Note that the hard part about saving any binary data structure out of
memory to disk is what's commonly referred to as "pointer fixups". You
can't ensure that the data blocks constituting your "database" will come
back to the same memory addresses. Therefore any node pointing to any
other node is volatile. You'd need a way to keep track of them all
("meta-data") and fix them up upon rereading.
The "meta-data" can be null on disk and created on the fly while
reading, if you have a way to scan the saved data to unambiguously
locate pointers/addresses in order to "fix them up." (You might even
use a temporary JudyL array to map old to new addresses. :-)
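
To make the pointer fixup idea concrete, here is a minimal sketch on a simple linked list, not on Judy's own node layout. Every name in it (node_t, saved_node_t, save_list(), restore_list(), lookup()) is hypothetical. Each saved record carries the node's old address and its raw old pointer. On restore, new nodes are allocated first, then each old pointer is translated to a new one. The linear search in lookup() is only a stand-in for the old-to-new address map, which could be a temporary JudyL array as suggested above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct node {
    struct node *next;
    int value;
} node_t;

/* Saved form of a node: its own old address plus its raw old pointer. */
typedef struct {
    uintptr_t old_self;
    uintptr_t old_next;   /* 0 means NULL */
    int value;
} saved_node_t;

/* Capture up to max nodes of a list into saved[]; returns the count. */
size_t save_list(const node_t *head, saved_node_t *saved, size_t max)
{
    assert(saved != NULL || max == 0);
    size_t n = 0;
    for (const node_t *p = head; p != NULL && n < max; p = p->next, n++) {
        saved[n].old_self = (uintptr_t)p;
        saved[n].old_next = (uintptr_t)p->next;
        saved[n].value = p->value;
    }
    return n;
}

/* Old-to-new address lookup; a linear-search stand-in for a real map. */
static node_t *lookup(const saved_node_t *saved, node_t *const *fresh,
                      size_t n, uintptr_t old)
{
    for (size_t i = 0; i < n; i++)
        if (saved[i].old_self == old)
            return fresh[i];
    return NULL;
}

/* Rebuild the list at new addresses; returns the new head or NULL. */
node_t *restore_list(const saved_node_t *saved, size_t n)
{
    if (n == 0)
        return NULL;
    assert(saved != NULL);
    node_t **fresh = calloc(n, sizeof *fresh);
    if (fresh == NULL)
        return NULL;
    /* Pass 1: allocate every node so all new addresses are known. */
    for (size_t i = 0; i < n; i++) {
        fresh[i] = malloc(sizeof *fresh[i]);
        if (fresh[i] == NULL) {
            while (i > 0)
                free(fresh[--i]);
            free(fresh);
            return NULL;
        }
    }
    /* Pass 2: copy payloads and fix up each saved pointer. */
    for (size_t i = 0; i < n; i++) {
        fresh[i]->value = saved[i].value;
        fresh[i]->next = saved[i].old_next
            ? lookup(saved, fresh, n, saved[i].old_next) : NULL;
    }
    node_t *head = fresh[0];
    free(fresh);
    return head;
}
```

The two passes are the point: no pointer can be fixed up until the new address of its target is known, so the fixup cost grows with the number of pointers saved.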
Also, applications that chunk their databases into large, self-managed
blocks can make pointer fixups (of their own, application-specific data)
simpler/faster. But late in JudyIV development we gave up on having our
own "small block memory manager" because straight malloc()/free() calls
were just as good, if not better, given a decent malloc() library.
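
One way a self-managed block can simplify fixups, sketched here as an assumption rather than as anything libJudy does: if nodes live in one block and link to each other by index within that block instead of by address, the block can be copied (or written out and read back) as a unit with no per-pointer fixups at all. The names block_t, onode_t, block_push(), block_copy() and block_sum() are hypothetical, and memcpy() stands in for the write-then-read round trip.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_NODES 256
#define NO_NODE UINT32_MAX

/* A node that links by index within its block, not by address. */
typedef struct {
    uint32_t next;    /* index of the next node, or NO_NODE */
    int32_t value;
} onode_t;

/* One large, self-managed block holding all nodes. */
typedef struct {
    uint32_t used;
    onode_t nodes[BLOCK_NODES];
} block_t;

/* Push a value in front of head; returns the new head index,
 * or NO_NODE if the block is full. */
uint32_t block_push(block_t *b, uint32_t head, int32_t value)
{
    assert(b != NULL);
    if (b->used >= BLOCK_NODES)
        return NO_NODE;
    uint32_t i = b->used++;
    b->nodes[i].next = head;
    b->nodes[i].value = value;
    return i;
}

/* Stand-in for saving the block and reading it back elsewhere. */
void block_copy(block_t *dst, const block_t *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* Walk the list starting at head and sum its values. */
int64_t block_sum(const block_t *b, uint32_t head)
{
    assert(b != NULL);
    int64_t sum = 0;
    for (uint32_t i = head; i != NO_NODE; i = b->nodes[i].next)
        sum += b->nodes[i].value;
    return sum;
}
```

The same head index is valid in the copy because nothing in the block refers to an absolute address. The price is an extra index-to-address step on every link traversal.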
Also note that, by their nature, Judy arrays have many relatively small
nodes, meaning many pointers, although they are often held in a very
compressed form, not just simple addresses. So you can see, it might
well not pay to try to save and restore the binary data anyway. The
pointer fixup overhead time might swamp any savings over simple
I'm sure Doug will follow up with more/different perspective.