Error but nothing to fix?

Help
alabama
2018-06-16
2018-06-21
  • alabama

    alabama - 2018-06-16

    This has happened 4-5 times already, which confuses me. When I sync there's an error, but no error is detected when I run scrubnew, status or "-e fix" after the sync. Is this because the error block was detected and moved during the sync, and hence is no longer an error? If so, why does it tell me to run "-e fix"?

    I also know it's not a persistent/reproducible error because I have separate hash and parity files and they come out clean each time. Does this mean there was a random one-time read error during the sync?

    Thanks in advance!

    This is the error message I usually get:

    error:3156673:u1:xxx/xxx/yyyy.yyy: Data error at position 429, diff bits 63/128
    msg:error: Data error in file 'E:/XXX/XXX/yyyy.yyy' at position '429', diff bits 63/128
    msg:status:
    msg:status: 0 file errors
    msg:status: 0 io errors
    msg:status: 1 data errors
    msg:fatal: DANGER! Unexpected data errors! The failing blocks are now marked as bad!
    msg:fatal: Use 'snapraid status' to list the bad blocks.
    msg:fatal: Use 'snapraid -e fix' to recover.
    summary:error_file:0
    summary:error_io:0
    summary:error_data:1
    summary:exit:error
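
The tagged lines in the log above lend themselves to simple scripting. A minimal Python sketch, assuming the log keeps the `summary:key:value` shape shown in this post (the function name `parse_summary` is hypothetical and not part of SnapRAID):

```python
def parse_summary(log_text):
    """Collect 'summary:<key>:<value>' lines from a SnapRAID log into a dict."""
    summary = {}
    for line in log_text.splitlines():
        # Split into at most three fields: tag, key, value.
        parts = line.strip().split(":", 2)
        if len(parts) == 3 and parts[0] == "summary":
            summary[parts[1]] = parts[2]
    return summary


# Sample taken from the log lines quoted in the post above.
SAMPLE = """\
msg:status: 1 data errors
summary:error_file:0
summary:error_io:0
summary:error_data:1
summary:exit:error
"""

if __name__ == "__main__":
    print(parse_summary(SAMPLE))
```

A wrapper script could use the `error_data` and `exit` values to decide whether to send a notification, which may help when tracking how often an intermittent error recurs.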

     
  • Leifi Plomeros

    Leifi Plomeros - 2018-06-16

    Most likely you have a bad memory module or some other failing hardware.

    Sync never moves anything. If you scrub the bad block(s) and snapraid finds nothing wrong, then they are not expected to be listed as bad any more.
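
On the "diff bits 63/128" figure in the error message: it reads as the number of bits that differ between the stored and the recomputed 128-bit block hash. That is an interpretation of the message wording, not something confirmed in this thread. A minimal Python sketch of such a bit count follows; `diff_bits` is a hypothetical helper, and MD5 is only a stand-in 128-bit hash, not necessarily the one SnapRAID uses:

```python
import hashlib


def diff_bits(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # XOR leaves a 1 bit wherever the two inputs disagree.
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")


if __name__ == "__main__":
    good = b"some block content"
    bad = b"some block c0ntent"  # one byte changed
    h_good = hashlib.md5(good).digest()  # stand-in 128-bit hash (assumption)
    h_bad = hashlib.md5(bad).digest()
    print(f"diff bits {diff_bits(h_good, h_bad)}/{len(h_good) * 8}")
```

With a well-mixing hash, any change in the data is expected to flip roughly half of the hash bits, so values near 64/128 would be consistent with the data as read differing from the data that was hashed earlier, rather than with a single flipped bit in the stored hash.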

     

    Last edit: Leifi Plomeros 2018-06-16
  • alabama

    alabama - 2018-06-18

    Understood, so this is a symptom of hardware/non-ECC memory... worrying...

    So I ran status immediately after the error was detected and it came back with no errors. Does this mean status rechecks/rescrubs the error block to see if the error still exists before it reports back? I.e., snapraid knows there was an error (because of the original sync), proceeds to rescrub/retest it, realises the error no longer exists, and so reports back that no errors were found?

    I think I would have preferred it to confirm the error block and then indicate that the error went away without fixing, which would point to an intermittent error and perhaps failing hardware.

     
  • Leifi Plomeros

    Leifi Plomeros - 2018-06-18

    Snapraid status is supposed to list the block as bad until sync, fix or scrub confirms that there is nothing wrong.
    Are you sure you didn't do any of those things before running snapraid status?

    Edit: I just did a little experiment, introducing a data error in a file, and this was the result:

    C:\Snapraid>snapraid sync
    ...
    Syncing...
    Using 24 MiB of memory for 32 blocks of IO cache.
    Data error in file 'E:/F.txt' at position '0', diff bits 76/128
    ...
           0 file errors
           0 io errors
           1 data errors
    DANGER! Unexpected data errors! The failing blocks are now marked as bad!
    Use 'snapraid status' to list the bad blocks.
    Use 'snapraid -e fix' to recover.
    ...
    C:\Snapraid>snapraid status
    ...
    WARNING! The array is NOT fully synced.
    You have a sync in progress at 99%.
    The 100% of the array is not scrubbed.
    You have 10 files with zero sub-second timestamp.
    Run the 'touch' command to set it to a not zero value.
    No rehash is in progress or needed.
    DANGER! In the array there are 1 errors!
    
    They are from block 0 to 0, specifically at blocks: 0
    
    To fix them use the command 'snapraid -e fix'.
    The errors will disappear from the 'status' at the next 'scrub' command.
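
The status output above can also be checked from a script. A minimal Python sketch, assuming the wording of the "In the array there are N errors" and "specifically at blocks:" lines stays as shown; `parse_bad_blocks` is a hypothetical helper, and the handling of more than one listed block is an assumption:

```python
import re


def parse_bad_blocks(status_text):
    """Return (error_count, block_list) parsed from 'snapraid status' text."""
    count = 0
    blocks = []
    m = re.search(r"In the array there are (\d+) errors", status_text)
    if m:
        count = int(m.group(1))
    # Only look at the remainder of the line that lists the blocks.
    m = re.search(r"specifically at blocks:([^\n]*)", status_text)
    if m:
        blocks = [int(tok) for tok in re.findall(r"\d+", m.group(1))]
    return count, blocks


# Sample taken from the status output quoted in the post above.
SAMPLE = """\
DANGER! In the array there are 1 errors!

They are from block 0 to 0, specifically at blocks: 0

To fix them use the command 'snapraid -e fix'.
"""

if __name__ == "__main__":
    print(parse_bad_blocks(SAMPLE))
```

Saving the result right after a failed sync, before any scrub or fix is run, would keep a record of which blocks were flagged even if a later scrub clears them.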
    
     

    Last edit: Leifi Plomeros 2018-06-18
  • Webmaster33

    Webmaster33 - 2018-06-21

    Interesting, I had similar errors.

    root@omvnas:~# /etc/scripts/snapraid-cron/snapraid_diff_n_sync.sh
    Data error in file '/srv/dev-disk-by-label-SR1D1/Backup/OMV/omvbackup/etc/apache2/mods-available/authn_core.load' at position '0', diff bits 62/128
    Data error in file '/srv/dev-disk-by-label-SR1D1/Backup/OMV/omvbackup/etc/apache2/mods-available/authz_owner.load' at position '0', diff bits 68/128
    DANGER! Unexpected data errors! The failing blocks are now marked as bad!
    Use 'snapraid status' to list the bad blocks.
    Use 'snapraid -e fix' to recover.
    

    I checked the files authn_core.load and authz_owner.load; both are readable, and since each contains just one line, they seem to be fine.
    But then what is the error?
    Should I fix it?

     
