If you go to my school and are in my info tech class, you may have heard of my new "sdfs" project. This project is now official, and I intend to make it an actually useful filesystem, not just a school project.
Currently, the code is not yet complete and the program is not usable, but I have planned how it'll work. The basic structure is this:
- Data is stored in fixed-size units called "chunks". A chunk will typically be about 1 MB to 16 MB, depending on the filesystem size and the bandwidth. The chunk size is defined when a filesystem is created, and can be changed at any time. However, changing the chunk size only applies to new chunks; old chunks keep the size they were created with.
- Each chunk is divided into "blocks". "Blocks" here does not refer to traditional filesystem blocks. Each block can have a different size. A block is generally a data structure in the filesystem, such as a directory listing or a stat structure.
- Each chunk has a unique ID, and each block has a local ID which is only unique within its chunk. A block is referred to by its chunk's ID together with its own ID. Currently chunk IDs are 48 bits and block IDs are 16 bits, and the combined length, 64 bits, is enough to identify a block within a filesystem.
- Each chunk is stored in one or more "stores", aka storage accounts, and the redundancy level defines how many stores each chunk will be stored in. A redundancy level of 1 means no redundancy, and 2 means each chunk is stored in exactly 2 stores. If some chunks are in only one store, a "rebalance" operation will copy them to another store.
- Extents. There are 2 types of extents: chunk extents and file extents. Currently, for performance reasons, a complete index of all the chunks stored on each store has to be downloaded at mount time. To save memory and bandwidth, index entries may describe "chunk extents": groups of chunks with contiguous IDs. File extents are similar. Entries in a "file index" (an index that describes all the blocks in a file) may describe a "file extent". This is when the "size" field in a file index entry is larger than the block that the entry points to. In this case, the rest of the data is stored in the blocks that logically follow the block that the entry points to, for example the next file data block in the same chunk, or the blocks in the next chunk (the first chunk whose ID is greater than the current chunk's)....
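
To make the ID scheme and the chunk extents a bit more concrete, here is a rough Python sketch. It is not the actual sdfs code: the function names, the choice to put the chunk ID in the high 48 bits of the 64-bit reference, and the `(first_id, count)` form of an extent are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

CHUNK_ID_BITS = 48  # from the description above
BLOCK_ID_BITS = 16  # from the description above


def pack_block_ref(chunk_id: int, block_id: int) -> int:
    """Combine a chunk ID and a block ID into one 64-bit reference.

    Assumption: the chunk ID occupies the high 48 bits.
    """
    if not 0 <= chunk_id < (1 << CHUNK_ID_BITS):
        raise ValueError("chunk ID does not fit in 48 bits")
    if not 0 <= block_id < (1 << BLOCK_ID_BITS):
        raise ValueError("block ID does not fit in 16 bits")
    return (chunk_id << BLOCK_ID_BITS) | block_id


def unpack_block_ref(ref: int) -> Tuple[int, int]:
    """Split a 64-bit reference back into (chunk_id, block_id)."""
    return ref >> BLOCK_ID_BITS, ref & ((1 << BLOCK_ID_BITS) - 1)


def to_chunk_extents(chunk_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse chunk IDs into (first_id, count) runs of contiguous IDs.

    Hypothetical representation of a chunk extent; duplicates are ignored.
    """
    extents: List[Tuple[int, int]] = []
    for cid in sorted(set(chunk_ids)):
        if extents and extents[-1][0] + extents[-1][1] == cid:
            # cid directly follows the last run, so extend that run.
            first, count = extents[-1]
            extents[-1] = (first, count + 1)
        else:
            # Gap in the IDs: start a new run.
            extents.append((cid, 1))
    return extents
```

With this layout, a store holding chunks 3, 4, 5, 10, 11 and 20 needs only three index entries instead of six, which is the kind of saving chunk extents are meant to give.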