
How does parity work, and how do I calculate its space requirement?

abubin
2014-05-03
2014-05-24
  • abubin

    abubin - 2014-05-03

    I have been looking at a lot of software RAIDs, and snapraid suits my usage very well. So I am going to convert my existing fakeraid (dmraid) to snapraid.

    However, I have a question about the way parity works. I have been searching around, but nowhere is it explained how this parity works or how it is calculated.

    How much space do I actually need for parity? Or does it matter?

    For example, my system can support up to 5 drives. I will have 4 x 1TB drives which I will use as content. I will swap my fifth (boot) drive for a 2TB drive, then partition it with 200GB for the Ubuntu OS and the remaining 1.8TB for parity. Is that enough? Or should I use 3 x 1TB + 1.8TB as content and reserve 1 x 1TB as parity?

    What is the ratio of parity to content space? When do I need second parity or third and so on?

    These were never explained in the documentation.

     
  • Leifi Plomeros

    Leifi Plomeros - 2014-05-03

    Each level of parity will require a parity file which is a tiny bit bigger than the largest sum of all data files on a single data disk.
    In your system that means that you will need space to store a ~1TB parity file.

    How it works on a user level:
    1. Set up the config file
    2. Run snapraid sync every time something has changed
    3. Do not delete or modify any data files until you are a bit more used to the tool and understand how to do it safely.
    4. If any of the data disks are lost, just replace it with an empty disk and run snapraid fix.
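As a rough illustration of step 1, a config file for four data disks and one parity disk might look like the fragment below. The mount points and disk names are hypothetical, and the keyword for data disks has varied between releases (older versions use `disk` where newer ones use `data`, as far as I know), so check the manual for your version before copying it.

```
# Hypothetical mount points; adjust to your own system.
parity /mnt/parity/snapraid.parity

# Content files kept on data disks (not on the parity disk).
content /mnt/disk1/snapraid.content
content /mnt/disk2/snapraid.content

# One line per data disk: a name and its mount point.
data d1 /mnt/disk1/
data d2 /mnt/disk2/
data d3 /mnt/disk3/
data d4 /mnt/disk4/
```

With a file like this in place, steps 2 and 4 are the `snapraid sync` and `snapraid fix` commands mentioned above.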

    How it works on a technical level:
    All data files in a computer are really just sequences of 1s and 0s.
    Instead of reading them in sequence, snapraid reads them across disks and writes 0 for even sums and 1 for odd sums in the parity file.
    Example: 0 + 1 + 1 + 1 = 1 (since 3 is an odd number)

    When a disk is lost snapraid can just sum together the remaining disks like this:
    0 + ? + 1 + 1 = 1 (since 0 + 1 + 1 is an even number but the parity says the sum is odd, it is deduced that a 1 is missing from the broken disk).

    In reality it is of course a lot more complex when file system is involved and snapraid must keep track on all changes on all files between syncs, but the core principle is as explained above.
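The even/odd rule above is the same thing as a bitwise XOR, so the single-parity idea can be sketched in a few lines of Python. This is only a toy model of the principle, not how snapraid is actually implemented; the function names and sample bytes are made up for illustration.

```python
from functools import reduce


def compute_parity(blocks):
    """XOR equal-length byte blocks together, one block per data disk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover one lost block by XOR-ing the surviving blocks with the parity."""
    return compute_parity(list(surviving_blocks) + [parity])


# The example from the post: 0 + 1 + 1 + 1 gives an odd sum, so parity is 1.
disks = [b"\x00", b"\x01", b"\x01", b"\x01"]
parity = compute_parity(disks)

# Pretend the second disk died and rebuild it from the other three plus parity.
lost = disks[1]
recovered = rebuild_missing(disks[:1] + disks[2:], parity)
print(parity, recovered == lost)
```

The same XOR works on whole bytes at once, which is why the helper takes byte strings rather than single bits.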

    If one day you expand your system to have more disks and you need a second parity, it will probably require a high degree in mathematics, or preferably an understanding of black magic, to figure out the technical level.

    So when do you need to have more parity files?
    It all comes down to how many disks you are expecting to be lost at the same time.
    If you lose more disks than you have parity then all data on the lost disks is permanently lost.

    So if you have 2 parity levels you can lose 2 disks and still be able to restore all lost data.
    If you have 2 parity levels and lose 3 disks then all data on those 3 disks is permanently lost.

    Andrea has made a recommendation table based on number of data disks in the FAQ, which is probably wise to stick with.

     
  • abubin

    abubin - 2014-05-05

    thanks for the detailed explanation. I understand more how snapshot works now.

    Back to the basic question. You mentioned the parity needs to be slightly larger than the largest content drive size.

    So for example, what if I used 4 x 1TB for content and then 1 x 1TB for parity? That would be insufficient, right? How can this problem be handled?

    Another example: I also mentioned using my 1.8TB partition as content instead of as parity. How would this impact the whole snapraid system?

     
  • Leifi Plomeros

    Leifi Plomeros - 2014-05-05

    Actually no, 4x1TB data and 1x1TB parity would be a perfectly fine setup.
    Just avoid filling up the data disks to the absolute limit.

    With the default block size of 256KB the parity file will be about 1.25GB bigger than the data for each 10,000 files of data, or about 12.5GB for each 100,000 files.
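The figures above match what you get by assuming that each file wastes, on average, half a block in the parity. That assumption is mine, chosen because it reproduces the numbers in the post, and the "GB" here works out as 1000 MiB. A minimal sketch with hypothetical function names:

```python
DEFAULT_BLOCK_SIZE = 256 * 1024  # 256 KiB, the default block size mentioned above


def parity_overhead_bytes(file_count, block_size=DEFAULT_BLOCK_SIZE):
    """Estimate extra parity space, assuming half a block is wasted per file."""
    return file_count * block_size // 2


def to_mib(byte_count):
    """Convert bytes to MiB (1 MiB = 1024 * 1024 bytes)."""
    return byte_count / (1024 * 1024)


for files in (10_000, 100_000):
    print(files, "files ->", to_mib(parity_overhead_bytes(files)), "MiB")
```

Treat the result as a rule of thumb for sizing a partition, not as an exact prediction.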

    So in your case, where you are planning to use a dedicated partition on a bigger disk... Just make that partition a few GB bigger than the data disks.

    Another trick is to put the content files on the data disks but not on the parity disk.
    That way you will automatically have a little less room for data on the data disks than on the parity disk.

    Since snapraid is 100% passive when not syncing, it will have zero impact on the system in general. Unless you do a lot of other disk activity while running sync or scrub, you will not see any real negative impact on the performance of snapraid either.

    In your situation I would simply make a 940 GB partition to put the parity file on, a few hundred GB system partition, and dedicate the rest to a temp partition where you can put ongoing downloads and other stuff that doesn't really need a backup.

     
  • abubin

    abubin - 2014-05-06

    Great!!! Thanks for the advice. It really helps before I dive into snapraid.

    I am getting the 2TB drive tomorrow and then will start planning for the implementation.

    With the little time that I have between work and family, I really can't afford to waste time doing it wrong.

    This is what I am planning for the 2TB boot drive:
    - 100GB debian 7 or ubuntu 14.04LTS (still deciding)
    - 1200GB snapraid parity (slightly extra just in case)
    - balance storing junk/temp files that don't need backup.

     

    Last edit: abubin 2014-05-06
  • sionzris

    sionzris - 2014-05-23

    Hope you don't mind me hijacking this thread. I have a few questions to figure this all out in my head (correctly^^).
    I am planning my snapraid in 2 possible setups:
    5x or 6x 4TB disks.
    What I don't quite get is: when I want to use 6 disks, I need 2 parity disks, compared to 5 disks with 1 parity disk. So I don't get any more space from my 6th disk, only more security. Is this correct?
    If I go for the 5x 4TB setup and all 4 data disks are full, is the 5th (parity) disk no longer sufficient? How much more space would I need here? The only space left in this scenario is my system drive, which is a 120GB SSD, so let's say it has 60GB of free space. Exactly which files could I "outsource" to that disk? The content files? Those are critical for a fix too though, right?

     
  • Leifi Plomeros

    Leifi Plomeros - 2014-05-23

    The parity disk is not to be considered as a data disk.
    The recommendation is to not have more than 4 data disks with a single parity disk.

    So if you plan to follow the recommendation you would have the following alternatives:
    4 data disks + 1 parity disk = 5 disks total
    5 data disks + 2 parity disks = 7 disks total

    With 6 disks you would have to choose:
    A. More safe than recommended
    B. Less safe than recommended

    The number of parity disks has nothing to do with the amount of data stored, only with the number of disks used to store the data.
    If all disks have equal size, then the fullest disk decides how big a parity file is needed on each parity disk.
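To check a planned layout against that rule, here is a rough Python sketch. It assumes the parity file is about as large as the usage of the fullest data disk plus an overhead figure you estimate separately; the function names and sample numbers are hypothetical.

```python
GB = 10**9  # decimal gigabytes, as printed on drive labels


def parity_file_estimate(used_bytes_per_disk, overhead_bytes=0):
    """Estimate parity file size: fullest data disk plus an overhead allowance."""
    return max(used_bytes_per_disk) + overhead_bytes


def parity_fits(parity_capacity_bytes, used_bytes_per_disk, overhead_bytes=0):
    """Return True if the estimated parity file fits on the parity partition."""
    estimate = parity_file_estimate(used_bytes_per_disk, overhead_bytes)
    return estimate <= parity_capacity_bytes


# Four data disks with different fill levels; only the fullest one matters.
used = [900 * GB, 950 * GB, 700 * GB, 800 * GB]
estimate = parity_file_estimate(used, overhead_bytes=5 * GB)
print(estimate // GB, "GB needed;", "fits" if parity_fits(1000 * GB, used, 5 * GB) else "too small")
```

Adding more data disks does not change the estimate unless one of them becomes the fullest disk.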

     
  • sionzris

    sionzris - 2014-05-23

    Ok maybe i wasn't being very clear, that was exactly what i meant^^
    5 Disks = 4 Datadisks + 1 Paritydisks (recommended)
    6 Disks = 4 Datadisks + 2 Paritydisks (more than recommended)
    6 Disks = 5 Datadisks + 1 Paritydisks (not recommended)
    7 Disks = 5 Datadisks + 2 Paritydisks (recommended)

    For now I will just stick with the 5 disk setup, and when I reach the limit, increase the setup by 2 disks.

    My main question is still: with 5 equal disks and at least 1 data disk filled to the brim, is the parity disk big enough, or do I need more space than the maximum space of a data disk?

     

    Last edit: sionzris 2014-05-24
  • Leifi Plomeros

    Leifi Plomeros - 2014-05-23

    Ok... The table you wrote above makes no sense at all...
    5 Data disks + 1 Parity disk = Not recommended.
    All other configurations are recommended or better than recommended.

    You absolutely MUST NOT fill any data disk to the brim when all disks are equal size.
    Either make smaller partitions on the data disks or simply avoid filling the data disks to the brim.
    The total number of files on all data disks added together determines how much bigger the parity file needs to be.

    With the default block size of 256KB the parity file will be about 1.25GB bigger than the data for each 10,000 files of data, or about 12.5GB for each 100,000 files of data.

     
  • sionzris

    sionzris - 2014-05-24

    Yeah, the table was totally messed up... It looked like that before my first edit, which didn't go through it seems. Fixed it now.

    The info on the parity file is great. In that case I'll just decrease each DATA partition by, I dunno, 10-15GB and be on the safe side.

    Some of my software has a tendency to fill a disk and only switch to another disk when it is full. A TV recording service, for example.

     
