Hi everyone,

I have SnapRAID 10 configured with 3 data drives and 1 parity drive. Today, one hard drive was discovered to have 41 bad sectors, and there is a single video file which is corrupted. I figured this was no big deal, since the content has been backed up with SnapRAID, so I issued the following command from an elevated command prompt:

snapraid fix -f "filename.mp4"

The recovery starts, gets to about 75%, and then fails with the following:

msg:fatal: Unexpected Windows error 23.
msg:error: Error reading file 'D:/PoolPart.52ecd30f-4ea2-413f-bb92-1a7fdeddc9c7/Shares/Video/filename.mp4' at offset 3664248832 for size 262144. Input/output error [5/23].
error:3260895:d1:Shares/Video/filename.mp4: Read error at position 13978
entry:0:change:lost:bad:d1:Shares/Video/filename.mp4:13978:
strategy_error:3260895: No strategy to recover from 1 failures with 1 parity without hash
recover_sync:3260895:1: Failed with no attempts
recover_unsync:3260895:1: Skipped for nothing to recover
unrecoverable:3260895:d1:Shares/Video/filename.mp4: Unrecoverable error at position 13978
msg:fatal: Unexpected Windows error 23.
msg:error: Error reading file 'D:/PoolPart.52ecd30f-4ea2-413f-bb92-1a7fdeddc9c7/Shares/Video/filename.mp4' at offset 3674996736 for size 262144. Input/output error [5/23].
error:3260936:d1:Shares/Video/filename.mp4: Read error at position 14019
entry:0:change:lost:bad:d1:Shares/Video/filename.mp4:14019:
strategy_error:3260936: No strategy to recover from 1 failures with 1 parity without hash
recover_sync:3260936:1: Failed with no attempts
recover_unsync:3260936:1: Skipped for nothing to recover
unrecoverable:3260936:d1:Shares/Video/filename.mp4: Unrecoverable error at position 14019
msg:fatal: Unexpected Windows error 23.
msg:error: Error reading file 'D:/PoolPart.52ecd30f-4ea2-413f-bb92-1a7fdeddc9c7/Shares/Video/filename.mp4' at offset 3676831744 for size 262144. Input/output error [5/23].
error:3260943:d1:Shares/Video/filename.mp4: Read error at position 14026
entry:0:change:lost:bad:d1:Shares/Video/filename.mp4:14026:
strategy_error:3260943: No strategy to recover from 1 failures with 1 parity without hash
recover_sync:3260943:1: Failed with no attempts
recover_unsync:3260943:1: Skipped for nothing to recover
unrecoverable:3260943:d1:Shares/Video/filename.mp4: Unrecoverable error at position 14026
msg:fatal: Unexpected Windows error 23.
msg:error: Error reading file 'D:/PoolPart.52ecd30f-4ea2-413f-bb92-1a7fdeddc9c7/Shares/Video/filename.mp4' at offset 3678666752 for size 262144. Input/output error [5/23].
error:3260950:d1:Shares/Video/filename.mp4: Read error at position 14033
entry:0:block:known:bad:d1:Shares/Video/filename.mp4:14033:
msg:fatal: Unexpected Windows error 23.
msg:fatal: Error reading file 'D:/PoolPart.52ecd30f-4ea2-413f-bb92-1a7fdeddc9c7/Shares/Video/filename.mp4'. Input/output error [5/23].

It almost appears that SnapRAID is trying to recover the file directly from its source location, which is corrupted, rather than rebuilding it using parity? I also see a message saying it can't recover "without hash". Do I have something set up incorrectly?

Thank you!
John
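As an aside on reading that log: the "position" in the error lines appears to be a block index within the file, and assuming SnapRAID's default 256 KiB block size (which these numbers are consistent with), it maps directly onto the byte offsets reported:

```python
# SnapRAID's default block size is 256 KiB; multiplying the reported
# block "position" by it reproduces the byte offsets in the log above.
BLOCK_SIZE = 256 * 1024

for position, offset in [(13978, 3664248832),
                         (14019, 3674996736),
                         (14026, 3676831744)]:
    assert position * BLOCK_SIZE == offset
    print(f"block {position} -> byte offset {position * BLOCK_SIZE}")
```

So the four failing blocks are consecutive regions a few MiB apart in the same file, which fits a cluster of bad sectors.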
SnapRAID is trying to read the existing file, which as you noted has bad sectors. This is the expected behaviour. Delete the file and then run fix. If the bad sectors are not writable in the future, the drive will automatically remap those sectors.
Last edit: Quaraxkad 2018-02-09
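To make that concrete, the sequence from an elevated command prompt would look roughly like this (the PoolPart path is the one from the error message above; adjust it to your own layout):

```
del "D:\PoolPart.52ecd30f-4ea2-413f-bb92-1a7fdeddc9c7\Shares\Video\filename.mp4"
snapraid fix -f "filename.mp4"
```

With the damaged copy gone, fix has nothing to re-read from the failing disk and rebuilds the missing blocks from parity instead.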
Thanks for the information. I deleted the file and ran the fix again. Note that I only typed the file name, not a full path to the file. However, when I do this now, I get a slew of messages like:
Reading data from missing file 'D:/PoolPart.52ecd30f-4ea2-413f-bb92-1a7fdeddc9c7/Shares/Video/filename.mp4' at offset 4824498176.
Reading data from missing file 'D:/PoolPart.52ecd30f-4ea2-413f-bb92-1a7fdeddc9c7/Shares/Video/filename.mp4' at offset 4824760320.
unrecoverable D:/PoolPart.52ecd30f-4ea2-413f-bb92-1a7fdeddc9c7/Shares/Video/filename.mp4
100% completed, 19300 MB processed in 0:07
18406 errors
18402 recovered errors
4 UNRECOVERABLE errors
DANGER! There are unrecoverable errors!
c:\snapraid-10.0>
The filename is now 'filename.mp4.unrecoverable'.
Therefore, I'm assuming it cannot rebuild the file? I'm OK with that because, luckily, I do have another copy of it elsewhere. However, I'm trying to determine what I did wrong that isn't allowing the parity to work as it should. If this had been a whole-drive failure rather than an early detection, I would be up a creek without a paddle, as they say.
Any help is appreciated!
[EDIT] The D: drive is the one that failed. I had deleted the file from it as you stated.
Last edit: Jnick 2018-02-14
Most likely the array was not fully synced. Run it again with "-l fix.log". The log file will give you more information about why those 4 blocks (out of 18,406) were unrecoverable. Chances are, one (or more) file(s) on either (or both) of the other two data drives in your array had changes or were deleted, and those changes/deletions were not synced.
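For example, with the same filter as before (fix.log is written to the current directory):

```
snapraid fix -f "filename.mp4" -l fix.log
```

The log records a per-block outcome (the entry:, recover_sync:, and unrecoverable: lines seen above), which is what identifies why specific blocks failed.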
It's very important to keep the array synced at all times when you only have one parity drive. Adding parity drives not only increases the ability to recover from drive failures, but also the ability to recover from unsynced arrays.
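For reference, adding a second parity level is a one-line change in snapraid.conf; the drive letters and paths here are illustrative, not taken from this setup:

```
parity   F:\snapraid.parity
2-parity G:\snapraid.2-parity
content  C:\snapraid\snapraid.content
data d1  D:\
data d2  E:\
data d3  H:\
```

After adding the 2-parity line (with the new drive at least as large as the biggest data drive), a full sync builds the new parity.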
Quaraxkad,

Thanks for the information. I am running it again with logging enabled. That said, the sync function runs every day at 4 AM; I have a script scheduled in Task Scheduler and have confirmed through the history report that it has been running. The one thing I will say is that I do not know when the failure first happened. Days or weeks could have passed before I realized the drive was acting up. If that was the case, would the sync have screwed itself up by trying to sync AFTER the drive had already gone bad?
OK, this is weird... I ran the command again and it finished within 45 seconds, reporting 100% success. NOTE: I did not delete the file recovered last time (the one that had 4 missing blocks). This was the log:
msg:progress: Filtering...
msg:verbose: filename.mp4
memory:used:190090858
memory:block:17
memory:chunk:88
memory:file:192
memory:link:88
memory:dir:80
msg:progress: Using 181 MiB of memory for the FileSystem.
msg:progress: Initializing...
msg:progress: Fixing...
entry:0:change:lost:good:d1:Shares/Video/filename.mp4:13978:
recover_sync:3260895:0: Skipped for already recovered
entry:0:change:lost:good:d1:Shares/Video/filename.mp4:14019:
recover_sync:3260936:0: Skipped for already recovered
entry:0:change:lost:good:d1:Shares/Video/filename.mp4:14026:
recover_sync:3260943:0: Skipped for already recovered
entry:0:change:lost:good:d1:Shares/Video/filename.mp4:14074:
recover_sync:3260991:0: Skipped for already recovered
msg:status: Everything OK
summary:error:0
summary:error_recovered:0
summary:error_unrecoverable:0
summary:exit:ok
The file name no longer has '.unrecoverable' appended to it. Am I to believe it is really fixed? I'll be able to check the file later tonight. I'm confused about why the fix was successful now but wasn't yesterday. The only thing I can think of is that when I initially ran the command to fix it AFTER I deleted the file, I did NOT re-sync the array; the array would then have re-synced overnight. Could that be the difference? Even though I deleted the file, was the sync still trying to pull it from the bad sectors?
Are you saying that a sync was run after the first fix, but before this one? You should never allow a sync to run until all repairs/restores are completed; syncing first essentially updates the parity files with the wrong information.
If that is the case, I don't know exactly what happened here. I don't think those 4 bad blocks were truly recovered, so I'm not sure why SnapRAID removed the .unrecoverable tag.
> If this was the case, would the sync have screwed itself up if it was trying to sync AFTER the drive had already gone bad?
In short, no. Sync will not have used the "bad" file with unreadable sectors for parity updates. If it had tried to read that file it would have given you an error message in the log (your nightly automated task does save log files, right?).
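If the nightly task doesn't already keep logs, the scheduled command can be as simple as this (path illustrative):

```
snapraid sync -l C:\snapraid-10.0\sync.log
```

Note that this overwrites the log each night; putting a date in the file name instead keeps a history you can check after an incident like this one.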