A combination of the rsync algorithm and content-addressable storage. An efficient way to store and retrieve multiple related versions of large file systems or directory trees. An efficient way to deliver and update OS, VM, IoT and container images over the Internet in an HTTP and CDN friendly way.

Let's take a large linear data stream and split it into variable-sized chunks, where the size of each chunk is a function of the chunk's contents. Store these chunks as individual, compressed files in some directory, each file named after a strong hash value of its contents, so that the hash value can be used as the key for retrieving the full chunk data. Let's call this directory a "chunk store".

At the same time, generate a "chunk index" file that lists these chunk hash values plus their respective chunk sizes in a simple linear array. The chunking algorithm is supposed to create chunks of variable but similar size from the data stream.
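The chunking and storing steps described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the text does not specify the rolling hash, the strong hash, or the compression format, so the gear-style rolling hash, SHA-256, zlib, the size limits, and the function names `split_chunks`, `store_stream` and `reassemble` are all assumptions made for this example.

```python
import hashlib
import os
import random
import zlib

# Hypothetical size limits; the source does not state the real values.
MIN_SIZE = 2 * 1024
AVG_SIZE = 8 * 1024          # must be a power of two for the mask below
MAX_SIZE = 64 * 1024
MASK = AVG_SIZE - 1

# Fixed pseudo-random table for a gear-style rolling hash (an assumed
# stand-in for whatever rolling hash the real tool uses).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def split_chunks(data: bytes):
    """Yield chunks whose boundaries depend on the data's contents."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Older bytes are shifted out, so h depends only on recent input.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]


def store_stream(data: bytes, store_dir: str):
    """Write chunks into the chunk store and return the chunk index."""
    os.makedirs(store_dir, exist_ok=True)
    index = []
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        path = os.path.join(store_dir, digest)
        if not os.path.exists(path):      # identical chunks are stored once
            with open(path, "wb") as f:
                f.write(zlib.compress(chunk))
        index.append((digest, len(chunk)))
    return index


def reassemble(index, store_dir: str) -> bytes:
    """Rebuild the original stream from a chunk index and a chunk store."""
    out = bytearray()
    for digest, size in index:
        with open(os.path.join(store_dir, digest), "rb") as f:
            chunk = zlib.decompress(f.read())
        if len(chunk) != size or hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError("corrupt chunk " + digest)
        out += chunk
    return bytes(out)
```

Because boundaries are chosen from the content itself rather than from fixed offsets, inserting or removing bytes in one place tends to change only nearby chunks, so two related versions of a stream can share most of their files in the same chunk store.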
Features
- Operations on directory trees
- Operations on blob index files
- Operations on archives
- Operations involving ssh remoting
- Operations involving the web
- Encoding and decoding