Large BLOB storage

Kevin Day
2005-08-18
2013-06-03
  • Kevin Day
    2005-08-18

    Following up on a comment by bryan about support for large data objects stored in a jdbm file:

    Is there any good reason to *not* use jdbm as a primitive file system?  It seems like it would be quite simple to break a file into chunks (similar to blocks in a real file system) and store each chunk as a record in the record manager, along with next and previous record ids (I would probably store the file's definition in a BTree instead of in a linked list like this, to better allow for random access - but that's an implementation detail).
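    Roughly, the sketch I have in mind looks like the following.  This is just a sketch assuming the JDBM 1.x RecordManager API (insert/fetch/commit); to keep it short it stores the file's "definition" as a simple directory record of chunk recids rather than the BTree I mentioned, but the idea is the same.

        // Sketch only - assumes the JDBM 1.x RecordManager API (insert/fetch/commit).
        // The file "definition" here is a long[] directory of chunk recids stored as a
        // single record; a BTree keyed by chunk number would scale better for huge files.
        import java.io.IOException;
        import java.io.InputStream;
        import java.util.ArrayList;
        import java.util.List;

        import jdbm.RecordManager;

        public class ChunkedFileStore {

            private static final int CHUNK_SIZE = 4 * 1024;   // "block size" - tunable, unlike a real FS

            /** Breaks the stream into CHUNK_SIZE records; returns the recid of the directory record. */
            public static long storeFile(RecordManager recman, InputStream in) throws IOException {
                List chunkIds = new ArrayList();
                byte[] buf = new byte[CHUNK_SIZE];
                int len;
                while ((len = in.read(buf)) > 0) {
                    byte[] chunk = new byte[len];
                    System.arraycopy(buf, 0, chunk, 0, len);
                    chunkIds.add(new Long(recman.insert(chunk)));   // one record per "block"
                }
                long[] directory = new long[chunkIds.size()];
                for (int i = 0; i < directory.length; i++) {
                    directory[i] = ((Long) chunkIds.get(i)).longValue();
                }
                long dirRecid = recman.insert(directory);           // the file's "inode"
                recman.commit();                                    // one log flush for the whole file
                return dirRecid;
            }

            /** Random access: fetch chunk N via the directory - no linked-list walk required. */
            public static byte[] readChunk(RecordManager recman, long dirRecid, int chunkNo)
                    throws IOException {
                long[] directory = (long[]) recman.fetch(dirRecid);
                return (byte[]) recman.fetch(directory[chunkNo]);
            }
        }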

    Most file systems incorporate logging now, so I suppose there would be extra overhead from the jdbm log file writes, but if one were willing to accept that performance hit, jdbm would work quite nicely as a bare-bones file system - and the block size is variable, which opens up some nifty options for data storage that a fixed-block file system doesn't allow.

    I've run some tests inserting large numbers of "block sized" data chunks (4 KB), and the performance of jdbm is fairly consistent throughout the test (I went up to 4 GB).  All tests were performed with my freelogicalXXXXX patch in place.

    In jdbm, I was able to write 350 blocks per second with transactions enabled.  Performance was IO limited (CPU was 30% during the run).

    A test writing directly to the file system achieved 4000 blocks per second.  Performance was IO limited (CPU was around 35-40% during the run).

    So we are talking about roughly an order of magnitude difference (350 vs. 4000 blocks per second) - not great, but also not that bad, depending on your system requirements.

    I may try the jdbm test in the profiler just to see where things are blocking - there may be some areas that could be improved.  One that comes to mind is the free page manager: right now it iterates over all of the pages in the free list looking for an open slot, and depending on memory caching and IO, that could result in a bunch of unnecessary disk reads.
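    To illustrate the concern, here is a toy model (this is not jdbm's actual free page code - the names and numbers are made up).  It just counts how many page probes a front-to-back scan of the free list performs once the early pages have filled up:

        // Toy model only - NOT jdbm's code.  Once the early free-list pages are full,
        // a naive scan still touches (and potentially reads from disk) every one of
        // them on every allocation before it reaches a page with an open slot.
        public class FreeListScanDemo {
            public static void main(String[] args) {
                int pages = 10000;
                int[] freeSlotsPerPage = new int[pages];
                freeSlotsPerPage[pages - 1] = 1000;       // only the last page has room

                long pageProbes = 0;
                for (int alloc = 0; alloc < 1000; alloc++) {
                    for (int p = 0; p < pages; p++) {     // scan from page 0 every time
                        pageProbes++;                     // each probe may be a disk read
                        if (freeSlotsPerPage[p] > 0) {
                            freeSlotsPerPage[p]--;
                            break;
                        }
                    }
                }
                // Prints 10000000 probes for 1000 allocations; remembering the first page
                // known to have an open slot would cut this to roughly 1000.
                System.out.println("page probes for 1000 allocations: " + pageProbes);
            }
        }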

    - K

    • Kevin Day
      2005-08-19

      It occurred to me as I was digging into things that my initial test was not an apples-to-apples comparison: the JDBM part of the test was syncing to disk, and the RandomAccessFile part was not - definitely not a fair comparison.

      So I've re-run the tests, adding sync() calls to the RandomAccessFile at the same interval as my commit() calls to JDBM.

      The results are more promising:

      JDBM, with transactions ENABLED, adding 1500 bytes per insert and committing every 50 inserts, achieves around 650 insertions per second.

      RAF, adding 1500 bytes per insert and calling sync() every 50 writes, achieves around 1500 insertions per second.
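      For reference, the two loops look roughly like this (not the exact test code - it assumes the JDBM 1.x RecordManagerFactory/RecordManager API, and the total record count is just illustrative):

          // Rough reconstruction of the comparison, not the exact benchmark code.
          import java.io.RandomAccessFile;

          import jdbm.RecordManager;
          import jdbm.RecordManagerFactory;

          public class InsertSyncComparison {

              static final int RECORD_SIZE = 1500;       // bytes per insert
              static final int SYNC_INTERVAL = 50;       // commit()/sync() every 50 records
              static final int TOTAL = 50000;            // illustrative record count

              public static void main(String[] args) throws Exception {
                  byte[] data = new byte[RECORD_SIZE];

                  // JDBM: insert records, commit (flushing the log and db file) every 50 inserts
                  RecordManager recman = RecordManagerFactory.createRecordManager("blobtest");
                  long t0 = System.currentTimeMillis();
                  for (int i = 1; i <= TOTAL; i++) {
                      recman.insert(data);
                      if (i % SYNC_INTERVAL == 0) recman.commit();
                  }
                  recman.commit();
                  long jdbmMillis = System.currentTimeMillis() - t0;
                  recman.close();
                  System.out.println("jdbm: " + (TOTAL * 1000L / jdbmMillis) + " inserts/sec");

                  // RAF: append the same payload, sync the file descriptor every 50 writes
                  RandomAccessFile raf = new RandomAccessFile("blobtest.raw", "rw");
                  long t1 = System.currentTimeMillis();
                  for (int i = 1; i <= TOTAL; i++) {
                      raf.write(data);
                      if (i % SYNC_INTERVAL == 0) raf.getFD().sync();
                  }
                  raf.getFD().sync();
                  long rafMillis = System.currentTimeMillis() - t1;
                  raf.close();
                  System.out.println("raf:  " + (TOTAL * 1000L / rafMillis) + " inserts/sec");
              }
          }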

      There's still roughly a factor of 2-3 difference, most of which I attribute to the double write and double sync occurring in jdbm (once to the log and once to the db file).

      Given that the jdbm implementation is guaranteed not to corrupt data if the computer is turned off in the middle of a write, I'd say this is pretty darned good.

      - K