
Newbie SnapRAID parity/calculation question

Help
Raymond
2016-11-30
2016-12-01
  • Raymond

    Raymond - 2016-11-30

    Recently discovered SnapRAID while researching RAID options. Pretty impressed with all the features and performance from a tiny 2.5MB binary! On my aging i7-930 system, I get about 125MB/s sync speed (1 data, 1 parity) with 5% CPU usage, whereas Windows RAID1 only gives about 100MB/s.

    Have some questions regarding parity disk, etc.

    Let's say I start out with 1 parity disk and 1 data disk. Since there is only one data disk, does this mean the parity data is the same as the data itself?

    Now, when I add a 2nd (or 3rd) data disk and do a sync, can the new parity be computed from the existing parity plus the new data, or does the new parity need to be re-computed from all of the source data again?

    When I use more than one parity disk, to protect against 2 HD failures, does the 2nd parity disk contain the same data as the 1st parity disk (i.e., an identical copy, like content), or do they contain different data because the checksum is computed differently for each additional parity disk?

    My understanding is that I can use 2x 2TB disks as a single 4TB parity disk (to protect data from a 4TB data disk), is this correct?

    Lastly, as everything is file based (SnapRAID works on top of an existing FS, not at block level on disk), I am curious how SnapRAID determines which file gets mapped/matched to what block against other disks. (If it were disk based, there would be a simple, easy to visualize 1:1 mapping between blocks on each disk for computing parity.) Is this like a simple queue in some directory tree order (to fill up 'virtual' blocks), or some fancier algorithm to reduce the amount of recalculation needed when data changes?

    Thanks!

     
  • Leifi Plomeros

    Leifi Plomeros - 2016-11-30

    Let's say I start out with 1 parity disk and 1 data disk. Since there is only one data disk, does this mean the parity data is the same as the data itself?

    Yes, except that all file contents are merged into one big parity file, with zeroes filling the gaps between them, and all metadata (file name, location, timestamps, etc.) is kept in a separate file called snapraid.content.
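To see why a single data disk gives parity equal to the data, here is a minimal sketch assuming the first parity level is a plain byte-wise XOR across disks. The function name `xor_parity` is made up for illustration and is not part of SnapRAID.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equally sized byte blocks together, one block per data disk."""
    # zip(*blocks) walks the blocks column by column (same byte offset on each disk).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_block = b"hello snapraid"

# With only one data disk there is nothing to XOR against,
# so the parity block is byte-for-byte the data block.
parity = xor_parity([data_block])
print(parity == data_block)
```

With two or more disks the same function mixes the blocks together, which is the point where parity stops looking like any single disk.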

    Now, when I add a 2nd (or 3rd) data disk and do a sync, can the new parity be computed from the existing parity plus the new data, or does the new parity need to be re-computed from all of the source data again?

    If you add a new data disk with only 10 MiB data on it, then you only need to recalculate parity for 10 MiB.
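A toy sketch of that idea, assuming plain XOR parity and a tiny 4-byte block size (real blocks are much larger, and all helper names here are hypothetical): only the block positions occupied by the new disk are recomputed, and the rest of the parity is left untouched.

```python
BLOCK = 4  # toy block size in bytes, chosen only to keep the example small


def split_blocks(data, block_size=BLOCK):
    """Cut data into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if blocks:
        blocks[-1] = blocks[-1].ljust(block_size, b"\0")
    return blocks


def parity_block(disks, index, block_size=BLOCK):
    """XOR the blocks at position `index` from every disk that has one."""
    out = bytearray(block_size)
    for blocks in disks:
        if index < len(blocks):
            for i, byte in enumerate(blocks[index]):
                out[i] ^= byte
    return bytes(out)


def sync_new_disk(parity, disks, new_disk):
    """Recompute parity only at the positions used by `new_disk`."""
    all_disks = disks + [new_disk]
    touched = list(range(len(new_disk)))
    parity = list(parity)
    # Grow the parity if the new disk reaches past its current end.
    parity.extend([bytes(BLOCK)] * (len(new_disk) - len(parity)))
    for index in touched:
        parity[index] = parity_block(all_disks, index)
    return parity, touched


d1 = split_blocks(b"A" * 16)   # 4 blocks
d2 = split_blocks(b"B" * 12)   # 3 blocks
old_parity = [parity_block([d1, d2], i) for i in range(4)]

d3 = split_blocks(b"C" * 6)    # 2 blocks on the newly added disk
new_parity, touched = sync_new_disk(old_parity, [d1, d2], d3)
print("recomputed positions:", touched)
print("untouched tail identical:", new_parity[2:] == old_parity[2:])
```

The amount of work scales with how much data the new disk holds, not with the total size of the array.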

    When I use more than one parity disk, to protect against 2 HD failures, does the 2nd parity disk contain the same data as the 1st parity disk (i.e., an identical copy, like content), or do they contain different data because the checksum is computed differently for each additional parity disk?

    The parities are different, which is the reason that you can recover from more than one data disk failure, if you have more than one parity file.
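A sketch of why two different parities allow recovery from two failures, using the classic RAID-6 style construction: P is a plain XOR, and Q is a weighted sum in the finite field GF(2^8). The polynomial 0x11d and generator 2 are the usual RAID-6 convention and are an assumption here; SnapRAID's actual computation may differ in detail. For brevity each "disk" holds a single byte.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with reduction polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Multiplicative inverse: a^254, since a^255 == 1 for nonzero a."""
    return gf_pow(a, 254)


def compute_pq(disks):
    """P is the XOR of all data bytes; Q weights disk i by 2^i."""
    p = q = 0
    for i, d in enumerate(disks):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(disks, x, y, p, q):
    """Rebuild the bytes of failed disks x and y from P, Q and the survivors."""
    for i, d in enumerate(disks):
        if i not in (x, y):
            # Strip the surviving disks' contribution out of both parities.
            p ^= d
            q ^= gf_mul(gf_pow(2, i), d)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    # Now p = dx ^ dy and q = gx*dx ^ gy*dy; solve the two equations for dx.
    dx = gf_mul(q ^ gf_mul(gy, p), gf_inv(gx ^ gy))
    return dx, p ^ dx


data = [0x12, 0x34, 0x56, 0x78]
p, q = compute_pq(data)
print("P and Q differ:", p != q)

damaged = [None, 0x34, None, 0x78]   # disks 0 and 2 are lost
print(recover_two(damaged, 0, 2, p, q) == (0x12, 0x56))
```

If Q were just a copy of P, the two equations would be identical and the pair of lost values could not be separated.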

    My understanding is that I can use 2x 2TB disks as a single 4TB parity disk (to protect data from a 4TB data disk), is this correct?

    Yes, like this: parity X:\p1a.parity,Y:\p1b.parity

    Lastly, as everything is file based (SnapRAID works on top of an existing FS, not at block level on disk), I am curious how SnapRAID determines which file gets mapped/matched to what block against other disks. (If it were disk based, there would be a simple, easy to visualize 1:1 mapping between blocks on each disk for computing parity.) Is this like a simple queue in some directory tree order (to fill up 'virtual' blocks), or some fancier algorithm to reduce the amount of recalculation needed when data changes?

    SnapRAID uses a parity file which is normally divided into 256 KiB blocks. Files are virtually broken down into 256 KiB blocks that correspond to the parity blocks, and the snapraid.content file keeps track of this mapping. If a file (or its last block) is smaller than 256 KiB, SnapRAID pretends the gap is full of zeroes for parity computation purposes.
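The mapping described above can be pictured with a small sketch. The function names and the simple "files in listed order" assignment are assumptions for illustration only; the real layout is whatever the content file records.

```python
BLOCK_SIZE = 256 * 1024  # 256 KiB, the usual block size mentioned above


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of parity blocks a file of `file_size` bytes occupies."""
    return -(-file_size // block_size)  # ceiling division


def map_files(file_sizes, block_size=BLOCK_SIZE):
    """Assign consecutive parity positions to each file's blocks.

    Entry i of the result is (file_name, block_within_file) for parity
    position i on this one data disk.
    """
    table = []
    for name, size in file_sizes.items():
        for n in range(block_count(size, block_size)):
            table.append((name, n))
    return table


def padded_block(data, index, block_size=BLOCK_SIZE):
    """Return block `index` of a file, zero-filled up to the block size."""
    chunk = data[index * block_size:(index + 1) * block_size]
    return chunk.ljust(block_size, b"\0")


sizes = {"a.txt": 100, "movie.mkv": 600 * 1024}  # hypothetical files
for position, entry in enumerate(map_files(sizes)):
    print(position, entry)
```

Here the 100-byte file still takes a whole parity position, with the remainder treated as zeroes, while the 600 KiB file takes three.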

     

    Last edit: Leifi Plomeros 2016-11-30
    • Raymond

      Raymond - 2016-12-01

      Thank you for the response!

      If you add a new data disk with only 10 MiB data on it, then you only need to recalculate parity for 10 MiB.

      Let's say I have 2 data + 1 parity disk. All 50% full and same size. I add another data disk, 50% full and same size.

      The new parity is computed from:

      1. the existing parity + the new data disk, or
      2. all 3 data disks

      In #2, both existing data disks need to be re-read, whereas in #1 only the parity disk is touched and the 2 existing data disks are not re-read.

       
      • Leifi Plomeros

        Leifi Plomeros - 2016-12-01

        Each of the three data disks will be read for 50% of its total capacity, and the parity file will be modified by an equal amount.
        SnapRAID uses a "bottom-up strategy" and maps new data to block 0 first, then block 1, then block 2, then block 3. Like this:
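One way to picture the bottom-up mapping is the toy sketch below. It assumes each disk holds 8 block positions, that a new block always takes the lowest free position on its disk, and that every disk with data at a touched position is read during sync; the names are hypothetical and this is not SnapRAID's actual allocator.

```python
CAPACITY = 8  # toy number of block positions per disk


def allocate_bottom_up(used, count):
    """Pick the `count` lowest block positions that are not in `used`."""
    picked = []
    position = 0
    while len(picked) < count:
        if position not in used:
            picked.append(position)
        position += 1
    return picked


# Two existing data disks, each 50% full, filled from position 0 upward.
disks = {"d1": set(range(4)), "d2": set(range(4))}

# The new disk is also 50% full; its blocks land on positions 0..3.
new_positions = allocate_bottom_up(set(), 4)
disks["d3"] = set(new_positions)

# During sync, each touched parity position needs the block at that
# position from every disk that has one.
reads = {name: len(used & set(new_positions)) for name, used in disks.items()}
for name, blocks_read in reads.items():
    print(name, f"{blocks_read}/{CAPACITY} blocks read")
```

Because the new disk's blocks start at position 0, they line up with the occupied half of the other disks, which is why all three disks end up being read for half their capacity.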

         
