
Better data resilience with parity groups

  • Leifi Plomeros - 2016-08-13

    I think that a new option for defining parity groups, where data disks in the same parity group are not allowed to share parity blocks with each other, could significantly increase data resilience when users have differently sized data disks.

    Imagine this setup:

    Data D1 E:\ (1 TB)
    Data D2 F:\ (1 TB)
    Data D3 G:\ (1 TB)
    Data D4 H:\ (1 TB)
    Data D5 I:\ (2 TB)
    Data D6 J:\ (2 TB)
    Data D7 K:\ (4 TB)
    Data D8 L:\ (4 TB)
    
    parity X:\parity1.par   (8 TB)
    2-parity Y:\parity2.par (8 TB)
    

    As far as I understand it, parity allocation pretty much works bottom up: when data is added to a data disk and you run sync, SnapRAID allocates the first available parity block not already used by that data disk.
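    A minimal sketch of that bottom-up rule, using hypothetical data structures (SnapRAID itself is written in C; this only illustrates the selection policy as I understand it):

```python
def alloc_parity_block(used_by, disk):
    """Return the lowest parity block index not already used by `disk`.

    `used_by[i]` is the set of disks whose data maps to parity block i.
    """
    i = 0
    while i < len(used_by) and disk in used_by[i]:
        i += 1
    if i == len(used_by):
        used_by.append(set())  # grow the parity file by one block
    used_by[i].add(disk)
    return i
```

    Note how different disks always get packed into the lowest-numbered blocks: this keeps the parity file small, but it also makes the disks share stripes.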

    In the example above, all eight disks would share the first 1 TB of parity blocks, D5-D8 would share the next 1 TB, and D7-D8 the remaining 2 TB.

    If we lose any 3 disks in this setup we permanently lose data, because some parity stripes would then be missing more blocks than the two parity levels can reconstruct.

    But if parity was instead allocated with the data spread out, so that each parity block is shared by at most two data disks (possible here, since the 16 TB of data fits exactly twice into the 8 TB of parity), then we could literally lose all data disks and still be able to restore all files: no stripe would be missing more than two blocks, which two parity levels can always reconstruct.

    The only difference would be that when SnapRAID selects which parity block to use for new data, it selects the first parity block not used by any disk in the same parity group, instead of the first one not used by the data disk itself.
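    A sketch of that proposed selection rule, under the same hypothetical data structures (the `groups` argument mirrors the paritygroup config lines):

```python
def alloc_parity_block_grouped(used_by, disk, groups):
    """Lowest parity block not used by ANY disk in `disk`'s parity group.

    `groups` is a list of sets of disk names; a disk in no group forms a
    group of its own, which reproduces the current per-disk behaviour.
    """
    group = next((g for g in groups if disk in g), {disk})
    i = 0
    while i < len(used_by) and used_by[i] & group:
        i += 1
    if i == len(used_by):
        used_by.append(set())
    used_by[i].add(disk)
    return i
```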

    I imagine it could be defined in the config like this:

    paritygroup D1,D2,D3,D4,D5,D6
    paritygroup D7,D8
    

    It would of course also be nice if there were a sync option --Force-reallocate that reassigned parity for existing data when a parity group has been added or modified later.

     
  • John - 2016-08-16

    Even if it might seem that I find your idea redundant, I don't - at all - it's ALWAYS good to have flexibility.

    However, I think the problem here is that you're using the 8 TB drives as 4 TB drives. If you used them to full capacity, for example making a backup of D1, D2, D3, D4 on P1 and a backup of D5 and D6 on P2, you'd get more or less the same result: you can still recover from all data disks failing.

    Now, granted, maybe this isn't a good example for what you want (nothing that leaves unused space would be, I guess). Let's say you have D1..D8 as 1 TB drives (8 of them), D9 = 8 TB, and one parity P1 = 8 TB (of course we're always ignoring the overhead). Now if you had the disks D1-D8 "parity staggered" like in the proposed improvement, you could very well lose all of D1..D8 (8 disks) and still lose no data!

    That's all good, but what would be the drawbacks? (I suspect there is some law of conservation at play here, so assuming you use all the storage you've got, there is no such thing as a "best setup"; each one has pluses and minuses.)

    If you have the parity of all D1..D8 "overlapping" like in your first picture, then you can recover 7 TB out of D9 (= 8 TB) using parity alone; this would be the difference. Each pair of disks from D1..D8 that don't "parity overlap" can both be recovered in case they both fail (normally impossible with only one parity), but they also eat more and more into the "recoverable 7 TB" of D9 in case at least two disks from D1..D8 fail together with D9.

     
  • Leifi Plomeros - 2016-08-16

    The downside is less parallelization and speed.
    The way it currently works is pretty much optimized for performance.
    If D1 and D2 are lost they can both be recovered in parallel.

    If it worked as proposed, data on D1 and D2 could only be recovered sequentially, but as a bonus it would sometimes be possible to recover more than 2 disks.

    The example was intentionally an extreme setup to make the benefit of the proposal as easy as possible to see.

    A more realistic example would be a user who has 10x4 TB disks and over time adds 6x8 TB more, ending up with a mixed array (very similar to my personal one).

    The last half of the data on the 8 TB disks is very well protected, with only 3 data blocks per parity block (D/P), while the first half is much less protected, with 6 data blocks per parity block.

    With a better spread of the data in relation to the parity, the protection could instead be balanced, resulting in 4.5 data blocks per parity block.
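    The D/P figures above come from the allocation tables, but the underlying calculation is easy to sketch: given each disk's range of parity blocks, count how many data blocks land on each parity block. The ranges below are made up for illustration, not the actual array:

```python
def data_per_parity(ranges, parity_blocks):
    """Count, for each parity block, how many data disks map onto it.

    `ranges` maps disk name -> (start, end) parity block interval,
    end exclusive.
    """
    return [
        sum(1 for s, e in ranges.values() if s <= b < e)
        for b in range(parity_blocks)
    ]
```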

    Of course the argument could then be made that the last half of the 8 TB disks becomes less protected if done like that. While this is true, I don't really have any control over which data that is, so it makes much more sense to have equal protection for all data than varying protection of unknown data subsets.

    However, when I added the D/P column I realized that instead of the user specifying which data disks should be grouped together, maybe it would be a much simpler and better implementation if SnapRAID automatically preferred the least-used parity block whenever new data is added. Or make it an option: optimized for performance, or optimized for resilience.
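    The resilience-optimized variant changes only the selection rule: take the least-used block the disk can use rather than the lowest-numbered one. A sketch under the same hypothetical structures (`max_blocks` caps the parity file size, since a fresh block is always the least used of all):

```python
def alloc_least_used(used_by, disk, max_blocks):
    """Pick the least-used parity block that `disk` doesn't already occupy,
    growing the parity file (up to `max_blocks`) whenever every usable
    existing block already holds some data."""
    candidates = [i for i, s in enumerate(used_by) if disk not in s]
    best = min(candidates, key=lambda j: len(used_by[j])) if candidates else None
    if (best is None or used_by[best]) and len(used_by) < max_blocks:
        used_by.append(set())  # a fresh block has zero users
        best = len(used_by) - 1
    used_by[best].add(disk)
    return best
```

    This makes the tradeoff visible: the policy prefers opening fresh blocks, so the parity file tends toward its maximum size.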

     
  • Andrea Mazzoleni

    Hi Leifi,

    Yep. It's a good idea. I'll add it to the TODO list.

    The only drawback I can see is that the parity file will grow faster, when instead now its size is kept as small as possible. And you know, when the parity fills up the disk, it's not so easy to handle.
    But likely we can think of a parity allocation that avoids this problem, e.g. one that never grows to reach that point if not absolutely necessary.

    Ciao,
    Andrea

     
  • Leifi Plomeros - 2016-08-28

    Maybe you could add the following parameter in the config file:
    parityfreespace xxx
    Parity files would then never grow beyond the point where less than xxx MiB of free space remains on the partition where they are located.

    I think that option would be really helpful for users with split parity, already in v11.

    As for the parity grouping, I now think maybe the best way to go would be to allow users to add an optional lower block boundary for each disk, specified in GB (1,000,000,000 bytes):
    data d1 F:\
    data d2 G:\ 2000
    data d3 H:\ 4000
    data d4 I:\
    data d5 J:\ 1500
    data d6 K:\ 3000

    This way SnapRAID can continue to function the same as today, except that when the user syncs new data it will respect any lower boundary specified in the config.
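    The bookkeeping for this could be as simple as the sketch below, assuming SnapRAID's default block size of 256 KiB (the function name is made up for illustration):

```python
BLOCK_SIZE = 256 * 1024  # SnapRAID's default block size, 256 KiB

def start_block(lower_boundary_gb):
    """Translate a per-disk lower boundary given in GB (10^9 bytes) into
    the first parity block index that disk is allowed to use."""
    return (lower_boundary_gb * 1_000_000_000) // BLOCK_SIZE
```

    A disk with no boundary simply starts at block 0, same as today.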

    If the user wants to force already existing data to respect this setting, they could either create a new array from scratch, or move all data off the data disk, sync, put it back, and sync again. Not super simple, but it would be a one-time job.

     

    Last edit: Leifi Plomeros 2016-08-28
  • Andrea Mazzoleni

    Hi Leifi,

    Not sure about all these new configuration options. Maybe all of this can be automated.

    In v11 I already added special handling for Windows to always leave 256 MB free, to avoid the Windows low-disk-space message. I suppose the same can work also for this case.

    And for grouping, I suppose it should be possible to do this automatically, always selecting the best option.

    Anyway, this looks promising, but for sure not for v11.

    Ciao,
    Andrea

     
  • Leifi Plomeros - 2016-08-31

    256 MB of free space on Windows is a good limit, so I guess there's no need to add an option for it.

    Regarding grouping, it is definitely desirable to have it automatic.

    Originally I figured it could be done at block level, always adding new data to the least-used block number. But after thinking a little more I realized that it would eventually lead to all data disks having partial dependencies, after the user removes and adds data on different disks.

    That is why I switched to proposing that the user be able to select a start block or lower boundary, to effectively define a range of blocks for each data disk.

    I still think the start block or range of blocks for each data disk is a good idea, even if automatically selected.

    For a new array it could be selected pretty much optimally.
    And for existing arrays, whenever a new disk is added, SnapRAID could attempt to select a range that overlaps with as few other data disks as possible.
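    A greedy sketch of that range selection (all names hypothetical): try every candidate start block for the new disk's range and keep the one that would overlap the fewest existing disks.

```python
def pick_start(existing, new_size, parity_size):
    """Choose a start block for a new disk of `new_size` blocks so that its
    range [start, start + new_size) overlaps as few existing disk ranges
    as possible. `existing` holds (start, end) pairs, end exclusive."""
    def overlap_count(start):
        end = start + new_size
        return sum(1 for s, e in existing if s < end and start < e)
    return min(range(parity_size - new_size + 1), key=overlap_count)
```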

    And yes, it is probably best to implement it in a later version than v11 :)

     

    Last edit: Leifi Plomeros 2016-08-31

