casync
Content-addressable data synchronization tool
...Let's take a large linear data stream, split it into variable-sized chunks (the size of each being a function of the chunk's contents), and store these chunks in individual, compressed files in some directory, each file named after a strong hash value of its contents, so that the hash value may be used to as key for retrieving the full chunk data. Let's call this directory a "chunk store". At the same time, generate a "chunk index" file that lists these chunk hash values plus their respective chunk sizes in a simple linear array. The chunking algorithm is supposed to create variable, but similarly sized chunks from the data stream.