Currently, a single cabinet is used to store both the hashes and the block data. The hash lookups are expensive and highly random, and for unique blocks the search necessarily comes up empty. TC is good at storing massive numbers of tiny records, but doesn't appear to scale well when storing big chunks of data.
What if the block data were stored in a separate physical file (optionally on a raw block device) and the hash cabinet simply mapped each hash key to a block number? Then block number * block size = offset in the block data file (or raw block device, if so configured).
The hash database would be smaller and cheaper to search, and both read and write operations would be far more efficient.
While I understand that this approach would involve some additional work to ensure filesystem integrity, as well as re-thinking the delete/recycle logic, the performance gain would almost certainly be worth the effort.
Good thinking and a correct analysis. I do have to say, though, that the searches are mostly done against the dbu database, which has a tigerhash as key and an unsigned long as value. Searches against the dbdta database are only done when we really need the data.
That said, storing the data in tokyocabinet is not the best thing to do. The biggest problem is that this database fragments easily, and there is no clean way to defragment it. The code that I am working on now will add snapshot support and change the dbdta database to a flat file. In the block database, per inode->blocknr key not only the hash is stored but also the offset in the file and the number of bytes that the block is comprised of. This will enable defragmentation to be implemented in a way that makes sense.
Performance will go up by another 5~10%, as far as I can conclude from a small test on my laptop.