I'm trying to use JDBM for a little analytics job that needs to store a large number of key/value pairs (I previously tried doing this purely in RAM but ran out of heap space :-( ). I'm using the MRU cache policy and storing a hashtable per user over a stream of web log data, since all the records for a given user tend to be processed around the same time.
I think I've finally got it working at a reasonable speed (including by disabling transactions), but now when I run it the file size blows out hugely in a very short space of time. Could this be because my objects are growing in size regularly and the database isn't being packed (as discussed in another thread)?
Also, I'm assuming that the caching we're talking about here (e.g., MRU) happens in RAM. Is that right?
Yes, caching is only in RAM. Remember that you need to commit() periodically even if you do not use transactions, as this releases some internal (transient) data structures.
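For anyone finding this thread later, here is a rough sketch of what that looks like with the JDBM 1.x API as I remember it (class and option names from memory, so double-check against your version's javadoc; `readLog()` and `LogRecord` are placeholders for your own log-reading code):

```java
import java.util.Properties;

import jdbm.RecordManager;
import jdbm.RecordManagerFactory;
import jdbm.RecordManagerOptions;
import jdbm.htree.HTree;

public class LogAggregator {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // MRU is the normal/default cache; tune its size to your heap.
        props.setProperty(RecordManagerOptions.CACHE_SIZE, "10000");
        props.setProperty(RecordManagerOptions.DISABLE_TRANSACTIONS, "true");

        RecordManager recman =
            RecordManagerFactory.createRecordManager("userstats", props);
        HTree perUser = HTree.createInstance(recman);

        long n = 0;
        for (LogRecord rec : readLog()) {   // placeholder iteration
            perUser.put(rec.userId, rec.value);
            // Even with transactions disabled, commit() periodically so
            // JDBM can release its transient bookkeeping structures.
            if (++n % 10_000 == 0) {
                recman.commit();
            }
        }
        recman.commit();
        recman.close();
    }
}
```

The exact commit interval is a trade-off: committing too often costs throughput, too rarely lets the transient structures pile up.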
Packing the database is not necessary unless you're doing a lot of updates or deletes. Continuous insertion shouldn't produce much dead space or fragmentation.
Thanks, alex. I was doing continual updates of hashtable entries where the value of the entry was a list that was growing pretty steadily, so I'm guessing that probably caused the blow out. I've taken a completely different tack on the project now and don't require JDBM but the above is all useful to know for next time :-)