
How does SnapRAID react to a failed disk?

Help
2022-07-30
2022-08-13
  • David Sainsbury

    David Sainsbury - 2022-07-30

    I'm evaluating SnapRAID for my use and I'm certain it fits perfectly. I
    plan on using one of the scripts found in the topic "Simple scripts for
    daily runs", but what I'm confused about is this in the manual:

    http://www.snapraid.it/manual#4.4

    4.4 Recovering
    The worst happened, and you lost one or more disks!

    There's something I'm not certain about. How does SnapRAID react if a data
    disk has had a catastrophic data loss? If the diff command is run in such a
    situation, would the missing files be classed as "removed"? If so, is it
    down to the user to interpret the results of the diff command and judge
    whether data loss has occurred before running a sync command?

    Thanks.

     
  • TooMeeK User

    TooMeeK User - 2022-07-31

    I was about to ask the same until I got a VM running and tested how it works.
    Recovery worked perfectly every time, except that the permissions and extended attributes of the recovered files are lost.

    I had a multiple-disk failure in my set due to a controller failure. The disks were all BTRFS, and the filesystem was unable to recover at the time. Only a small amount of the latest data, which was not yet covered by parity, was lost.

    So in the worst-case scenario where D1 fails, recovery depends on the failure itself. If there are media errors and the disk is partially readable, connect a new disk, copy all still-readable data to it, then run snapraid fix on that drive, which greatly speeds up recovery. If the disk fails completely, just replace D1 with a new drive and run fix: all data will be recovered from parity and, importantly, from the remaining drives. So if you have, say, inconsistent data across all disks (which I had in my scenario), it may not be possible to recover everything, as recovery depends on the health of the remaining drives. Most important is to put the replacement in as D1, not under another disk's name, as disks are tied to their position in the array.
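
As a rough illustration of the full-replacement case, here is a minimal sketch. It assumes the snapraid.conf entry for the failed disk already points at the replacement's mount point. The `recover_disk` function and the `SNAPRAID` override variable are hypothetical names added for illustration; the `-d`, `-l` and `-a` options and the `fix` and `check` commands are the ones described in the recovery section of the SnapRAID manual.

```shell
#!/bin/bash
# Sketch: rebuild one replaced data disk, then verify the result.
# SNAPRAID can be overridden for a dry run; by default it is the real binary.
SNAPRAID=${SNAPRAID:-snapraid}

# Usage: recover_disk NAME, where NAME is the disk name from snapraid.conf
# (for example d1). Assumes the config entry for NAME already points at the
# replacement disk's mount point.
recover_disk() {
    local disk=$1
    if [ -z "$disk" ]; then
        echo "usage: recover_disk NAME" >&2
        return 1
    fi
    # Rebuild the files of that disk from parity and the remaining disks,
    # writing a detailed log next to the script.
    "$SNAPRAID" -d "$disk" -l "fix-$disk.log" fix || return 1
    # Audit-only check: verify the recovered files against their stored hashes.
    "$SNAPRAID" -d "$disk" -a check
}
```

Calling `recover_disk d1` would then run the fix followed by the check. Do not run sync until the check looks clean, since sync is the step that makes the loss permanent.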

    Now I'm running XFS on all drives, the controller is stable after some tweaks, and all seems fine with a double-parity setup.

    Your diff result may list missing files, which is correct since the drive no longer has them, but UNTIL YOU run snapraid sync they can be RECOVERED with the snapraid fix command.

    My script for manual SYNC after adding new data:

    #!/bin/bash
    # Date stamp used as the log file prefix, e.g. 20220731
    today=$(date +'%Y%m%d')
    snapraid diff --log "$today.diff"
    snapraid status --log "$today.status"
    snapraid sync --log "$today.sync"
    snapraid scrub -p new --log "$today.scrub"
    snapraid touch --log "$today.touch"
    # The final status run gets its own log so the first one is not overwritten
    snapraid status --log "$today.status-after"
    
     
  • David Sainsbury

    David Sainsbury - 2022-08-03

    Hello TooMeek,

    Thank you for your detailed comment, much appreciated. However, I plan on running SnapRAID on a headless server, and the part I'm asking about is this:

    So in worst case scenario D1 fails

    How exactly would this be noticed on a headless server running a daily script? What indications does SnapRAID give that something has gone wrong requiring intervention?

     
  • TooMeeK User

    TooMeeK User - 2022-08-03

    snapraid status always reports failures, either missing/broken disks or files. The best approach would be a script that e-mails that output daily. It will inform you when, e.g., it cannot
    read the content file, the drive reports something bad, the drive is missing, etc.
    However, you should still schedule smartctl --test=short /dev/sdX and smartctl --test=long /dev/sdX, because with the way SnapRAID works it's up to you to detect a failure before it escalates and takes out more disks. You can also read all protected data with snapraid scrub, for example weekly or monthly. It checks the actual data and parity against the stored checksums, so you also detect silent corruption.
    You can also add the output of snapraid smart, which tries to predict which drive is going to die first.
    For additional information, include the snapraid diff output too, which lists all changes, e.g. moved, deleted and added files.
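
For the headless case, one possible approach is to let the daily script look at the diff summary itself and refuse to sync when too many files have vanished. The sketch below is only an illustration. `count_removed`, `guarded_sync`, `MAX_REMOVED` and the `SNAPRAID` override are hypothetical names, and the parser assumes the diff summary contains a line of the form `       3 removed`, which should be checked against the output of your SnapRAID version.

```shell
#!/bin/bash
# Hypothetical guard for an unattended daily run: refuse to sync when
# "snapraid diff" reports more removed files than expected.

# Assumption: tune this limit to your normal daily churn.
MAX_REMOVED=${MAX_REMOVED:-50}
# Overridable so the logic can be exercised without SnapRAID installed.
SNAPRAID=${SNAPRAID:-snapraid}

# Read "snapraid diff" output on stdin and print the number of removed files.
# Assumes a summary line such as "       3 removed"; fails if none is found.
count_removed() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "removed" { n = $1 }
         END { if (n == "") exit 1; print n }'
}

# Run diff, then sync only if the removed count is within the limit.
guarded_sync() {
    local diff_out removed
    # The exit status of diff is deliberately not used here; only its
    # summary is inspected.
    diff_out=$("$SNAPRAID" diff)
    removed=$(printf '%s\n' "$diff_out" | count_removed) || {
        echo "WARNING: could not read the diff summary; sync skipped" >&2
        return 1
    }
    if [ "$removed" -gt "$MAX_REMOVED" ]; then
        echo "WARNING: $removed files removed (limit $MAX_REMOVED); sync skipped" >&2
        return 1
    fi
    "$SNAPRAID" sync
}
```

Calling `guarded_sync` at the end of the daily script and running it from cron with `MAILTO` set would turn the warning into an e-mail, since cron mails any output a job produces.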

     
  • David Sainsbury

    David Sainsbury - 2022-08-13

    Thank you very much! :)

     
