Recently discovered snapraid while researching RAID options. Pretty impressed with all the features and performance from a tiny 2.5MB binary! On my aging i7-930 system, I get about 125MB/s sync speed (1 data, 1 parity) with 5% cpu usage, whereas Windows RAID1 only gives about 100MB/s.
Have some questions regarding parity disk, etc.
Let's say I start out with 1 parity disk and 1 data disk. Since there is only one data disk, does this mean the parity data is the same as the data itself?
Now, when I add a 2nd (or 3rd) data disk and do a sync, can the new parity be computed from the existing parity plus the new data, or does the new parity need to be recomputed from all of the source data again?
When I use more than one parity disk to protect against 2 HD failures, does the 2nd parity disk contain the same data as the 1st parity disk (i.e., an identical copy, like content), or do they contain different data because the checksum is computed differently for each additional parity disk?
My understanding is that I can use 2x 2TB disks as a single 4TB parity disk (to protect data from a 4TB data disk). Is this correct?
Lastly, as everything is file based (snapraid works on top of an existing FS, not at block level on disk), I am curious how snapraid determines which file gets mapped/matched to what block against other disks. (If it were disk based, there would be a simple, easy-to-visualize 1:1 mapping between blocks on each disk for computing parity.) Is this like a simple queue in some directory tree order (to fill up 'virtual' blocks), or some fancier algorithm to reduce the amount of recalculation needed when data changes?
Thanks!
Let's say I start out with 1 parity disk and 1 data disk. Since there is only one data disk, does this mean the parity data is the same as the data itself?
Yes, except that all file contents are merged into 1 big parity file with zeroes between and all metadata (filename, location, timestamps, etc) is in a separate file called snapraid.content.
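To make the "parity equals data" case concrete, here is a minimal sketch. It assumes the first parity level is a plain XOR across the data disks (RAID 5 style); the function name `xor_parity` is made up for illustration and is not part of snapraid.

```python
from functools import reduce

def xor_parity(blocks):
    """XOR equal-length byte blocks together, one block per data disk."""
    # zip(*blocks) walks the blocks column by column (one byte from each disk).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

disk1 = b"hello world"
disk2 = b"HELLO WORLD"

# With a single data disk there is nothing to XOR against,
# so the parity block is byte-for-byte the data block.
parity_one_disk = xor_parity([disk1])

# With two data disks the parity no longer looks like either disk,
# but XOR-ing it with one disk gives back the other.
parity_two_disks = xor_parity([disk1, disk2])
recovered_disk1 = xor_parity([parity_two_disks, disk2])
```

Here `parity_one_disk` equals `disk1`, and `recovered_disk1` equals `disk1` again after disk2 has been added.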
Now, when I add a 2nd (or 3rd) data disk and do a sync, can the new parity be computed from the existing parity plus the new data, or does the new parity need to be recomputed from all of the source data again?
If you add a new data disk with only 10 MiB data on it, then you only need to recalculate parity for 10 MiB.
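A toy model of this, assuming XOR parity and a tiny 4-byte block size so the output stays readable (snapraid's default block is much larger). The helpers `split_blocks` and `sync` are hypothetical names, and this `sync` simply recomputes everything so that the before/after parity can be compared.

```python
BLOCK = 4  # toy block size in bytes, chosen only for readability

def split_blocks(data, block=BLOCK):
    """Split a disk's contents into fixed-size blocks, zero-padding the last one."""
    padded = data + bytes(-len(data) % block)
    return [padded[i:i + block] for i in range(0, len(padded), block)]

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def sync(disks, block=BLOCK):
    """Compute XOR parity over all disks, one parity block per block index."""
    columns = [split_blocks(d, block) for d in disks]
    count = max((len(c) for c in columns), default=0)
    parity = [bytes(block)] * count
    for column in columns:
        for i, blk in enumerate(column):
            parity[i] = xor_bytes(parity[i], blk)
    return parity

old = sync([b"AAAABBBBCCCCDDDD", b"eeeeffffgggg"])
new = sync([b"AAAABBBBCCCCDDDD", b"eeeeffffgggg", b"zz"])  # small third disk added

# Only the parity blocks that the new disk's data maps onto differ.
changed = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
print(changed)  # [0]
```

The new disk holds less than one block of data, so only parity block 0 changes; the other three parity blocks are identical before and after.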
When I use more than one parity disk to protect against 2 HD failures, does the 2nd parity disk contain the same data as the 1st parity disk (i.e., an identical copy, like content), or do they contain different data because the checksum is computed differently for each additional parity disk?
The parities are different, which is the reason that you can recover from more than one data disk failure, if you have more than one parity file.
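Why two different parities allow two disks to be rebuilt can be sketched with RAID 6 style math: P is a plain XOR, Q is a weighted sum in GF(2^8). This is an illustration under that assumption only; the coefficients snapraid actually uses may differ, and all function names here are invented for the example.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result

def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8)."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    """Multiplicative inverse: a^254 == a^-1 for nonzero a in GF(2^8)."""
    return gf_pow(a, 254)

def make_parity(data):
    """data: equal-length bytes, one per disk. Returns (P, Q)."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, disk in enumerate(data):
        weight = gf_pow(2, i)  # each disk gets a distinct weight in Q
        for j, byte in enumerate(disk):
            p[j] ^= byte
            q[j] ^= gf_mul(weight, byte)
    return bytes(p), bytes(q)

def recover_two(data, x, y, p, q):
    """Rebuild disks x and y (their entries in data are ignored) from P and Q."""
    pxy, qxy = bytearray(p), bytearray(q)
    for i, disk in enumerate(data):
        if i in (x, y):
            continue
        weight = gf_pow(2, i)
        for j, byte in enumerate(disk):
            pxy[j] ^= byte                 # leaves Dx ^ Dy
            qxy[j] ^= gf_mul(weight, byte) # leaves gx*Dx ^ gy*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    # Two equations, two unknowns: solve for Dx, then Dy follows from P.
    dx = bytes(gf_mul(inv, qb ^ gf_mul(gy, pb)) for pb, qb in zip(pxy, qxy))
    dy = bytes(a ^ b for a, b in zip(pxy, dx))
    return dx, dy

disks = [b"disk-one", b"disk-two", b"disk-3!!"]
P, Q = make_parity(disks)
lost = [None, disks[1], None]  # disks 0 and 2 have failed
d0, d2 = recover_two(lost, 0, 2, P, Q)
```

If Q were just a copy of P, the two equations would be identical and only one missing disk could be solved for. Because the weights differ per disk, `d0` and `d2` come back equal to the original `disks[0]` and `disks[2]`.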
My understanding is that I can use 2x 2TB disks as a single 4TB parity disk (to protect data from a 4TB data disk). Is this correct?
Yes, like this: parity X:\p1a.parity,Y:\p1b.parity
Lastly, as everything is file based (snapraid works on top of an existing FS, not at block level on disk), I am curious how snapraid determines which file gets mapped/matched to what block against other disks. (If it were disk based, there would be a simple, easy-to-visualize 1:1 mapping between blocks on each disk for computing parity.) Is this like a simple queue in some directory tree order (to fill up 'virtual' blocks), or some fancier algorithm to reduce the amount of recalculation needed when data changes?
Snapraid uses a parity file that is normally divided into 256 KiB blocks. Files are virtually broken down into 256 KiB blocks that correspond to the parity blocks, and the snapraid.content file keeps track of this mapping. If a file is smaller than 256 KiB, snapraid pretends the gaps are full of zeroes for parity computation purposes.
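As a rough picture of that mapping, the sketch below assigns consecutive block indices to the files on one disk and reports how many zero bytes would pad each file's last block. The layout dictionary is a made-up stand-in, not the real snapraid.content format, and `map_files` is a hypothetical helper.

```python
BLOCK_SIZE = 256 * 1024  # 256 KiB, the block size mentioned above

def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Number of blocks a file occupies (ceiling division)."""
    return -(-file_size // block_size)

def map_files(files, block_size=BLOCK_SIZE):
    """Assign consecutive block indices to each file on one disk.

    files: list of (name, size_in_bytes)
    returns: {name: (first_block, block_count, zero_padding_bytes)}
    """
    layout = {}
    next_block = 0
    for name, size in files:
        count = blocks_needed(size, block_size)
        padding = count * block_size - size  # treated as zeroes for parity
        layout[name] = (next_block, count, padding)
        next_block += count
    return layout

layout = map_files([
    ("a.txt", 100),               # tiny file: one block, mostly padding
    ("movie.mkv", 600 * 1024),    # 600 KiB: three blocks, last one partial
    ("b.bin", 256 * 1024),        # exactly one block, no padding
])
for name, entry in layout.items():
    print(name, entry)
```

A file never shares a block with another file in this model, which is why many small files waste parity space compared with a few large ones.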
Last edit: Leifi Plomeros 2016-11-30
Thank you for the response!
Let's say I have 2 data + 1 parity disk. All 50% full and same size. I add another data disk, 50% full and same size.
The new parity is computed from:
1. the existing parity plus the new data disk, or
2. all three data disks?
In #2, both existing data disks need to be re-read, whereas in #1 only the parity disk is touched and the existing 2 data disks are not re-read.
All three data disks will each be read for 50% of their capacity, and the parity file will be modified by an equal amount.

Snapraid uses a "bottom-up strategy" and maps new data to block 0 first, then block 1, then block 2, then block 3. Like this:
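A toy model of that bottom-up order, and of why a half-full third disk causes the lower half of every disk to be read, might look like the following. It assumes 8-block disks and a "lowest free index first" rule; `allocate` is a hypothetical helper, not snapraid's actual allocator.

```python
def allocate(used, count):
    """Pick the `count` lowest free block indices on one disk (bottom-up).

    used: set of block indices already taken on that disk (updated in place)
    returns: list of newly assigned block indices
    """
    picked = []
    index = 0
    while len(picked) < count:
        if index not in used:
            picked.append(index)
        index += 1
    used.update(picked)
    return picked

# Two existing data disks, each 8 blocks large and 50% full (blocks 0..3).
disk_blocks = {"d1": set(range(4)), "d2": set(range(4)), "d3": set()}

# The new third disk is also 50% full, so its data lands on blocks 0..3.
new_blocks = allocate(disk_blocks["d3"], 4)

# Every parity block that gains data must be recomputed, which means
# reading that block index from each disk that has data there.
reads = {b: sorted(d for d, used in disk_blocks.items() if b in used)
         for b in new_blocks}
print(new_blocks)  # [0, 1, 2, 3]
print(reads[0])    # ['d1', 'd2', 'd3']
```

Because the new data starts at block 0, it overlaps the blocks already used on d1 and d2, so parity blocks 0..3 are rewritten and the upper half of the parity stays untouched.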