When doing many changes inside a transaction, many small write operations are performed, which hurts performance. Therefore I added a buffering layer between the storage and the underlying stream (above the transaction stream). All writes during a transaction are now buffered in memory and flushed to the underlying stream on transaction commit.
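To make the idea concrete, here is a minimal sketch of such a buffering layer. The class name and the pending-write list are my own illustration, not the actual TmStorage code:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Illustration only: buffers writes in memory and applies them to the
// underlying stream in one pass when the transaction commits.
class WriteBufferingStream
{
    private readonly Stream underlying;
    private readonly List<(long Position, byte[] Data)> pending = new();

    public WriteBufferingStream(Stream underlying) => this.underlying = underlying;

    public void Write(long position, byte[] data)
    {
        // No I/O yet; just remember what should be written and where.
        pending.Add((position, (byte[])data.Clone()));
    }

    public void Commit()
    {
        // Flush all buffered writes to the underlying stream at once.
        foreach (var (position, data) in pending)
        {
            underlying.Seek(position, SeekOrigin.Begin);
            underlying.Write(data, 0, data.Length);
        }
        underlying.Flush();
        pending.Clear();
    }

    public void Rollback() => pending.Clear(); // discard buffered writes
}
```

A real buffering layer also has to answer reads from the pending buffer so that a transaction sees its own uncommitted writes; the sketch leaves that part out.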
I started making changes to bring back the stream table. One stream is used to hold the stream table that maps each streamId to the position of its first segment. For now it is a simple table which is also fully cached in memory, but the idea is to later replace its structure with a B+ tree.
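As a rough sketch, the current table amounts to something like the following. I'm assuming Guid stream IDs here, and the names are mine rather than the real TmStorage API:

```csharp
using System;
using System.Collections.Generic;

// Illustration only: the simplest possible stream table, a fully in-memory
// map from stream ID to the position of the stream's first segment.
// The plan described in the post is to later swap this dictionary for a
// B+ tree so the table no longer has to live entirely in memory.
class StreamTable
{
    private readonly Dictionary<Guid, long> firstSegmentPosition = new();

    public void Add(Guid streamId, long segmentPosition) =>
        firstSegmentPosition[streamId] = segmentPosition;

    public bool TryGet(Guid streamId, out long segmentPosition) =>
        firstSegmentPosition.TryGetValue(streamId, out segmentPosition);

    public void Remove(Guid streamId) => firstSegmentPosition.Remove(streamId);
}
```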
While making these changes I ran into many problems. I fixed bugs that were difficult to locate, but the storage still crashed under my testing tool. So one day I simply rewrote the core part from scratch. The whole engine that allocates segments is now new; I kept some of the existing parts like StorageStream and the transaction system.
Since I didn't have much time, it took me a while to fix a bug in transaction rollback. I had known that this bug existed for some time, but I just couldn't find the cause because the rollback created a state from which the storage only failed some time later. Just today I finally found where the problem was. I actually found a couple of problems (some only potential), which are now fixed.
Now I need to perform a big high-load test that simulates all forms of usage, including crashing the process running the storage, to see whether the transaction system correctly rolls back the transaction.
The stream table (which maps stream IDs to the location of the first segment in the underlying stream) is something I need to improve. For simplicity, the stream table only exists in memory. When the storage is opened, it has to be rebuilt by going through all segments in the underlying stream and checking whether each segment contains a StreamMetadata structure. If it does, the storage adds it to the stream table.
But since scanning is of course slow, the stream table is cached in the storage when it is closed. It is stored inside the empty space (actually in the stream representing empty space), so the next time the storage opens it loads the stream table back instead of scanning the whole storage.
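The open-time flow is roughly the sketch below. ScannedSegment and the parameters are stand-ins I made up, not the real TmStorage API; only the fast-path/slow-path decision is the point:

```csharp
using System;
using System.Collections.Generic;

// Illustration only: rebuilding or loading the stream table when the storage opens.
record ScannedSegment(long Position, Guid? OwningStreamId);

static class StreamTableOpener
{
    public static Dictionary<Guid, long> Open(
        Dictionary<Guid, long>? cachedTable,      // loaded from the empty-space stream, if present
        IEnumerable<ScannedSegment> allSegments)  // produced by scanning the underlying stream
    {
        if (cachedTable != null)
            return cachedTable;                   // fast path: table saved at last close

        // Slow path: walk every segment; the ones carrying StreamMetadata
        // (modelled here as a non-null OwningStreamId) start a data stream.
        var table = new Dictionary<Guid, long>();
        foreach (var segment in allSegments)
        {
            if (segment.OwningStreamId is Guid id)
                table[id] = segment.Position;
        }
        return table;
    }
}
```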
I have a plan to implement snapshots. Imagine that you have a storage full of data and you want to do some tests on it by changing its contents. Since you don't actually want to change the storage, a snapshot could be used to preserve its state.
The idea is the same as the one implemented in transactions, but turned around. When a snapshot is created, every write to the underlying stream will actually write data to a temporary snapshot instead, keeping the underlying stream unchanged. The snapshot keeps track of the areas that have been written to and contains all the data that has changed. During reading, the snapshot checks whether the data at the current position is contained in the snapshot; if it is, the data is read from the snapshot, otherwise from the underlying stream.
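A minimal sketch of that idea, using made-up names and a byte-granular in-memory overlay in place of a real temporary snapshot stream:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Illustration only: a copy-on-write snapshot layer. Writes go into an
// overlay; the underlying stream is never modified while the snapshot exists.
class SnapshotStream
{
    private readonly Stream underlying;
    private readonly Dictionary<long, byte> overlay = new();

    public SnapshotStream(Stream underlying) => this.underlying = underlying;

    public void Write(long position, byte[] data)
    {
        // Changes live only in the overlay.
        for (int i = 0; i < data.Length; i++)
            overlay[position + i] = data[i];
    }

    public byte[] Read(long position, int count)
    {
        // Start from the unchanged underlying data
        // (short reads near the end of the stream are ignored for brevity)...
        var result = new byte[count];
        underlying.Seek(position, SeekOrigin.Begin);
        underlying.Read(result, 0, count);

        // ...then patch in every byte the snapshot has overwritten.
        for (int i = 0; i < count; i++)
            if (overlay.TryGetValue(position + i, out var b))
                result[i] = b;
        return result;
    }

    public void Discard() => overlay.Clear();  // drop the snapshot, storage stays untouched
}
```

Note how this really is the transaction mechanism turned around: a transaction copies the old data aside and writes the new data in place, while a snapshot writes the new data aside and leaves the old data in place.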
I recently added full transaction support to TmStorage. I've implemented transactions at the lowest level of TmStorage - the underlying stream where the storage stores all its data.
During a transaction, every write to the underlying stream first backs up the data being overwritten into the transaction log. The transaction then marks which parts of the underlying stream are already backed up. On rollback, the transaction simply copies the data from the transaction log back to the underlying stream.
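Roughly, the mechanism looks like the sketch below. The names are mine, and the log is kept in memory here, whereas the post describes a separate transaction-log stream:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Illustration only: an undo-log transaction over a stream. Before a region
// is overwritten for the first time, its old contents are copied into a log;
// rollback replays the log to restore the original data.
class UndoLogTransaction
{
    private readonly Stream underlying;
    private readonly List<(long Position, byte[] OldData)> log = new();
    private readonly HashSet<long> backedUpBlocks = new();
    private const int BlockSize = 4096;

    public UndoLogTransaction(Stream underlying) => this.underlying = underlying;

    public void Write(long position, byte[] data)
    {
        BackupRange(position, data.Length);    // save the old bytes first
        underlying.Seek(position, SeekOrigin.Begin);
        underlying.Write(data, 0, data.Length);
    }

    private void BackupRange(long position, int count)
    {
        long firstBlock = position / BlockSize;
        long lastBlock = (position + count - 1) / BlockSize;
        for (long block = firstBlock; block <= lastBlock; block++)
        {
            if (!backedUpBlocks.Add(block))
                continue;                      // this part is already backed up
            var old = new byte[BlockSize];
            underlying.Seek(block * BlockSize, SeekOrigin.Begin);
            int read = underlying.Read(old, 0, BlockSize);
            log.Add((block * BlockSize, old[..read]));
        }
    }

    public void Rollback()
    {
        // Copy the saved data back; the underlying stream returns to its old state.
        foreach (var (position, oldData) in log)
        {
            underlying.Seek(position, SeekOrigin.Begin);
            underlying.Write(oldData, 0, oldData.Length);
        }
        underlying.Flush();
        log.Clear();
        backedUpBlocks.Clear();
    }

    public void Commit() { log.Clear(); backedUpBlocks.Clear(); }
}
```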
From the beginning I didn't pay much attention to speed. Of course, I didn't want TmStorage to be slow, but I always avoided complexity to prevent hard-to-find bugs from appearing. The end result is a fast storage - at least I think so.
Number of data streams
At the start of development I told myself that when TmStorage contains one million data streams it must still be reasonably fast. Well, I was surprised to see that it was very fast at those numbers. Although I haven't really measured the times, it doesn't even seem to be slower. I'm very pleased to see that I far exceeded my expectations. I will fill it up with ten million data streams one day.
Here it is. After a long time, TmStorage is available as open source. I planned to release it a year ago, but I found some more bugs and I also wanted to change a few things in the heart of the library. Therefore I rewrote it mostly from scratch in just two months. Current results are very promising and TmStorage works perfectly.
Development & history
A long time ago I came up with an idea for managing space inside the storage. Before that I had read articles about how e.g. the FAT32 file system works, but I took a different approach. My idea was to manage space using segments. Each data stream inside the storage is composed of one or more segments linked together. Even the empty space is composed of segments. The storage only needs to allocate or deallocate segments when data streams are enlarged or shrunk. Only two operations on segments are required for everything to work: split and merge. This structure proved to be excellent.
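A rough sketch of what a segment with split and merge could look like (my own names and layout, not the actual TmStorage structures):

```csharp
using System;

// Illustration only: a segment is a contiguous region of the underlying
// stream, linked to the next segment of the same data stream (or of the
// empty-space stream). Split and merge are the only structural operations.
class Segment
{
    public long Position;     // where the segment starts in the underlying stream
    public long Size;         // how many bytes it covers
    public Segment? Next;     // next segment of the same stream, null if last

    public Segment(long position, long size) { Position = position; Size = size; }

    // Split this segment in two; the tail becomes a new segment that keeps
    // the original link, and this segment now points to the tail.
    public Segment Split(long offset)
    {
        if (offset <= 0 || offset >= Size)
            throw new ArgumentOutOfRangeException(nameof(offset));
        var tail = new Segment(Position + offset, Size - offset) { Next = Next };
        Size = offset;
        Next = tail;
        return tail;
    }

    // Merge the immediately following, physically adjacent segment into this one.
    public void MergeWithNext()
    {
        if (Next == null || Next.Position != Position + Size)
            throw new InvalidOperationException("Next segment is not adjacent.");
        Size += Next.Size;
        Next = Next.Next;
    }
}
```

With this in place, growing a data stream means splitting a piece off the empty-space chain and linking it onto the data stream, and shrinking hands segments back to the empty space, where physically adjacent segments can be merged again.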