Suitability for multimap

  • Anonymous - 2010-06-08

    Hi. I have some code that builds an index over a potentially large data set. Currently this is implemented as hashtables. Some entries are 1:1, some are 1:N. For instance, one map has around five entries, each containing a collection of around 100,000 values. I tend to use Google's Multimap implementation for this.

    It has got to the point that it is too large for memory. I want to store the index on disk, suitably cached. Do you think JDBM's HTree implementation would be a suitable choice? My worry is that put(key, value) would mean a get, add, then put which, on a 100,000 size collection, would be wasteful. I wonder if anyone had any experience doing this, or could suggest a better data structure?

  • Kevin Day

    Kevin Day - 2010-06-08

    The way I handle this sort of thing is with a B+Tree that uses Key+Value as the key.

    1234:ABCD -> Null Object
    1234:DEFG -> Null Object

    So I don't use the values at all.  This gives me multi-map capability (I can browse the tree starting at 1234: and ending at 1234:zzzzzz or some other suitable maximum value.

    I don't know that this would translate to HTree, though.

    One option would be to store keys in the HTree, with values that are a pointer to a different data structure that contains your list.  BTree would probably be appropriate for your list in this case.

    Personally, I would go with the BTree only approach first and see if there are performance problems - it is by far the simplest approach.

  • Anonymous - 2010-06-10

    Thanks for the suggestion. I implemented this over the past couple of nights and it worked nicely.

    This tool is looking very promising.

    I'm now going to ramble on a few things to see if I can get the best possible use out of JDBM. If you have any input, do let me know.

    I started this because the index I am building got too big for memory. So I wanted to persist to disk somehow, but I didn't want the burden of a SQL store. My concerns were: performance hit, actual memory saved, disk space used. The first two, from my initial stress tests, seem fine. Disk usage, at about 180MB with transaction log switched off, is something I would like to improve on a little. Here are my ideas:

    The multipart keys you suggested are very long. For instance, their structure currently look like this:


    (Obviously this is representative data - my actual data is far, longer, probably in the order of seventy characters per full multipart key).

    My first inclination is to normalise. I could have HTrees for teams, positions and players. E.g:

    england -> A
    forward -> A
    midfielder -> B

    So the keys become:


    This would require (number of key parts) lookups before the full lookup. It should save space though, in principle, although I'm not sure if JDBM does anything similar itself.

    Also, I noticed mention of a 'compressor' in another thread but I see no existence in the version I downloaded.

    If you have any thoughts I'd be very interested.

  • Alex Boisvert

    Alex Boisvert - 2010-06-10

    I suggest you try Kevin's other suggestion then.   Create a HTree with your top-level keys and with pointers (recid; java long) to BTree containing the multiple values in associated with the key.   This should help with space consumption and keep your lookups fairly efficient.


  • Kevin Day

    Kevin Day - 2010-06-11

    The key compressor works very well in these situations - key compression has been in the code base for quite some time.  Here is code I use for constructing me indexes:

                    tree = BTree.createInstance(recman, keyComparator, keySerializer, EmptyValueSerializer.INSTANCE);
                    tree.setKeyCompressionProvider(new LeadingValueCompressionProvider());

  • Anonymous - 2010-06-18

    Thanks. I might be using the wrong version, but setKeyCompressionProvider is definitely not on the BTree API I have. I'm using the src distribution downloaded from

  • Kevin Day

    Kevin Day - 2010-06-23

    Check out and build from here:

    I don't know if there is an actual dstro (pre-built jar file) or not, but the above will get you what you are asking.


Log in to post a comment.

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:

No, thanks