Can't get rid of recoverable errors

tonwa
2017-03-18
  • tonwa

    tonwa - 2017-03-18

    Hello, I'm stuck with errors reported by the "snapraid check" command. Here's the output:

    $ snapraid check
    Self test...
    Loading state from /media/data0/content...
    Searching disk data0...
    Searching disk data1...
    Searching disk data2...
    Using 2596 MiB of memory for the FileSystem.
    Initializing...
    Checking...
    100% completed, 6130217 MB accessed in 6:01

    1088 errors
       0 unrecoverable errors
    

    WARNING! There are errors!

    After running a check I ran status expecting to see the 1088 errors reported there as well, but got "No error detected". I then tried fixing the errors:

    $ snapraid -e fix
    Self test...
    Loading state from /media/data0/content...
    Searching disk data0...
    Searching disk data1...
    Searching disk data2...
    Filtering...
    Using 2596 MiB of memory for the FileSystem.
    Initializing...
    Fixing...
    Nothing to do
    Everything OK

    "Nothing to do"? That's odd. I then ran a full scrub:

    $ snapraid scrub -p 100
    Self test...
    Loading state from /media/data0/content...
    Using 2016 MiB of memory for the FileSystem.
    Initializing...
    Scrubbing...
    Using 40 MiB of memory for 64 blocks of IO cache.
    100% completed, 5279488 MB accessed in 5:04

           data0 10% | **
           data1  5% | ***
           data2  0% |
          parity 63% | *****
            raid  5% | ***
            hash  8% | ****
           sched  6% | ***
            misc  0% |
                     |______________
           wait time (total, less is better)

    Everything OK
    Saving state to /media/data0/content...
    Saving state to /media/data1/content...
    Saving state to /media/data2/content...
    Verifying /media/data0/content...
    Verifying /media/data1/content...
    Verifying /media/data2/content...

    I then ran snapraid check again to see if the errors remained:

    $ snapraid check
    Self test...
    Loading state from /media/data0/content...
    Searching disk data0...
    Searching disk data1...
    Searching disk data2...
    Using 2596 MiB of memory for the FileSystem.
    Initializing...
    Checking...
    100% completed, 6130217 MB accessed in 6:00

    1088 errors
       0 unrecoverable errors
    

    WARNING! There are errors!

    So no difference. What else can I try to remove the 1088 errors besides rebuilding the full parity? Also, how can I see which files are affected by the errors? Running snapraid 11.0 on Ubuntu 14.04.

     

    Last edit: tonwa 2017-03-18
  • Leifi Plomeros

    Leifi Plomeros - 2017-03-18

    These are false errors reported by the check function for unused parity blocks that have not been cleared (filled with zeroes).

    The logic for scrub and apparently also fix -e does not agree that it is a problem (which it isn't, since whatever is inside these blocks is not used for anything and would be overwritten if they become used again).

    Easiest fix is to simply ignore it.

    But if you don't want to ignore it, you can get rid of the error message like this:

    1. Run snapraid status to find out which blocks have "errors"
    2. Run snapraid fix -S FirstErrorBlock -B 1088
     
  • tonwa

    tonwa - 2017-03-19

    Thanks for the info. The problem is that snapraid status reports no error so I'm unable to determine FirstErrorBlock.

    $ snapraid status -v
    Self test...
    Loading state from /media/data0/content...
    2355511 files
    0 hardlinks
    95932 symlinks
    41853 empty dirs
    Using 2016 MiB of memory for the FileSystem.
    SnapRAID status report:

      Files Fragmented    Excess  Wasted    Used    Free  Use Name
                 Files Fragments      GB      GB      GB
     487456         32       167    53.1    1990     906  69% data0
     867853        149      2047       -    2139     106  95% data1
    1000202          4         6   106.1    1999     843  71% data2

    2355511        185      2220   159.2    6130    1856  77%

    (Chart removed)

    The oldest block was scrubbed 10 days ago, the median 1, the newest 1.

    No sync is in progress.
    The 5% of the array is not scrubbed.
    No file has a zero sub-second timestamp.
    No rehash is in progress or needed.
    No error detected.

    I suppose I'll follow your recommendation and simply ignore the error message from snapraid check.

     
  • mrmessyau

    mrmessyau - 2017-03-22

    I'm getting the same errors as tonwa, and I'm also not getting any references to which blocks are "bad" in the status output. However, if I run the check with full logging, I think the number Leifi is suggesting I use in the fix command is 20055991. Can someone confirm that is correct?

    The first line of errors from check log is below.

    parity_error:20055991:parity: Data error, diff bits 1037974

    tonwa: you can get logs by using -l FILENAME
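
For anyone who wants to pull the block numbers out of such a log without reading it by hand, here is a rough sketch. It assumes every relevant line has the same shape as the one quoted above (`parity_error:<block>:parity: ...`). The function name `summarise_parity_errors` and the log file name `check.log` are made up for illustration and are not part of snapraid.

```shell
#!/bin/sh
# Sketch only: summarise the parity_error lines in a log written with
#   snapraid check -l check.log
# Assumed line format (taken from the single line quoted in this thread):
#   parity_error:<block>:parity: Data error, diff bits <n>
# Prints "<first block> <last block> <number of error lines>",
# or nothing if the log contains no parity_error lines.
summarise_parity_errors() {
    awk -F: '
        $1 == "parity_error" {
            block = $2 + 0                       # force numeric comparison
            if (count == 0 || block < first) first = block
            if (count == 0 || block > last)  last  = block
            count++
        }
        END {
            if (count > 0) printf "%d %d %d\n", first, last, count
        }
    ' "$1"
}

# Example (hypothetical file name):
#   summarise_parity_errors check.log
```

If the three numbers describe one contiguous run (last minus first plus one equals the count), the first value is the candidate for `-S` and the count for `-B` in the fix command Leifi posted. If they do not, the errors are spread over a wider range than the count suggests, and the range would need a closer look before running fix.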

     

    Last edit: mrmessyau 2017-03-22
  • mrmessyau

    mrmessyau - 2017-03-22

    OK, so I looked into exactly what the fix command that Leifi posted does and realised there was no risk in just going ahead and trying it.

    Based on the result it looks like it's all fixed now!

     
