Hi all,
Have a problem with my array
so I ran snapraid -e fix as suggested
Any ideas why it can't fix it and how should I proceed from here?
Can I still use the scrub command to carry on checking the rest of the array?
I do have another copy of the file on an external disk; what would be the correct way to try to replace the file, assuming that the copy is still fine?
Can I just delete the file and run a sync to get the array back into a happy state?
Thanks for any help
First, to learn all we can about this error, do:
Please post the result, and contents of the logfile.
Hi UhClem
Result from cmd window below and logfile attached
Thanks.
I'm hesitant to advise a next step in light of incomplete info:
Did you, in fact, delete that file? (in the time period between your asking that question, and the time you acted upon my request)
Also,
(Prior to you starting this thread,) It appears to me that there was an ERROR reported by SnapRAID, probably in a scrub. (Do you, by chance, have any record of such error?) [It is very important to take note of any/all ERROR reports, so that you can provide them when seeking help. Also, unresolved ERRORs can metastasize (yes, like cancer) with subsequent sync (or even fix) commands.]
Hi UhClem,
No, I've not made any changes to the files in the array since the error was reported. The filename has been changed, I assume by the program itself, to have .unrecoverable on the end.
The error was reported by the last scrub that was done a couple of days ago as can be seen in the status report in the first post, no other errors were reported before this.
As it was late, I tried the "snapraid -e fix" command the next day, which only took a few minutes to fail with the same message as in the first post; I may also have tried it a second time afterwards with the same result.
After going to the SnapRAID website I saw there was an update available, so I downloaded it with the thought it might help; the result of that is what I posted in the first post, with the main difference being that it took over 2.5 hours to fail this time.
Note that I can't be certain what version I was running before I updated, sorry, but given the difference in run time I assume it was pre the 11.6 update. When I downloaded 12.1 I did notice that at some point I had already downloaded 12.0, as the zip file was there, but I have no idea if I actually installed it.
Something I have thought of: as the file has been renamed, does the fix process attempt to create a new, correct file, and as such need enough free space for another copy of the file? There is not enough room on the disk to do that.
Thanks for your help, let me know if you need any more info.
Thanks for the added info. (It was also good that you reminded me about the (renamed) .unrecoverable file)
I'm still a little bit uncomfortable about proceeding without knowing the precise nature of the error that SnapRAID reported. However, we can hit "Replay" and get a second look ... if you un-rename that file (by removing the ".unrecoverable" suffix), then you can do a
and attach the LogFile and post the command result [just the output after the "Initializing ..." line is OK]
[Realize that the check command makes NO changes at all (to data or parity or the .content files)]
Hi UhClem,
Some interesting extra information from something I tried last night.
For the file in question I also had a .sha256 file, created by HashCheck when the file was synced into the array, so I could double-check it had copied properly compared to the original file.
So I took a copy of the file to a different disk outside the array so I could rename it back and check it against the .sha256 file without affecting anything to do with the array.
Expecting it to end with a fail, I was surprised to find that it ended with a match, which to me means the file hasn't changed; not sure what this means with regard to the error though.
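For anyone wanting to repeat that off-array double-check without HashCheck, here is a minimal sketch using Python's hashlib. The file names and contents below are made up purely so the demo is self-contained, and it assumes the usual "&lt;hex digest&gt; *&lt;filename&gt;" line format for the .sha256 file:

```python
import hashlib

# Hash the copied file and compare it with the digest recorded at sync time.
def sha256_of(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

with open("demo.vhdx", "wb") as f:              # stand-in for the copied VHDX
    f.write(b"example contents")
with open("demo.sha256", "w") as f:             # stand-in for HashCheck's record
    f.write(sha256_of("demo.vhdx") + " *demo.vhdx\n")

expected = open("demo.sha256").read().split()[0]
print("match" if sha256_of("demo.vhdx") == expected else "FAIL")  # prints "match"
```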
Will try your suggestion above when I get home in about 8 hours from posting this unless you want me to try something else instead or as well as.
Thanks again.
Result below and logfile attached; note that I had to put a / in front of Misc to get it to work.

Using 2173 MiB of memory for the file-system.
Initializing...
Selecting...
Checking...
unrecoverable Misc/xxxxxx/vhd's/xxxxxx-SERVER-2.VHDX
100% completed, 3735982 MB accessed in 2:36

2 errors
1 UNRECOVERABLE errors
WARNING! There are errors!
DANGER! There are unrecoverable errors!

C:\Array\Snapraid>
OK, I've got a pretty good idea what happened.
Summary:
It's easily rectified (i.e., getting your array back to a clean state).
What I believe happened is that when this file was sync'd into the array, the hash for one (256KB) block (#33415 of the file) got glitched as it was being copied into the in-core database (of the .content file), and then got written into the .content files. This went unnoticed until that array_block was scrub'd; that would have produced a report of a "Data error" for this file/block#.
It appears that SnapRAID is "prejudiced" in this situation, believing the stored hash to be correct and assuming the (file) data to be in error; so when you ran the fix command, SR used the other disks' data blocks and the parity block, for array_block 22937272, to re-create the "correct" data block for our subject file (which, of course, has been correct all the while). Fine and dandy, but the final verification is that the hash for this re-created data block matches the one stored in the .content file. Whoops!!! Hence, it squawks, and hands down a verdict of ".unrecoverable".
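That dead end can be sketched in a few lines. This is a toy model, not SnapRAID's actual code, and MD5 here merely stands in for SnapRAID's internal 16-byte block hash:

```python
import hashlib

BLOCK = 256 * 1024                              # SnapRAID's default block size
data = b"\x42" * BLOCK                          # the on-disk block (actually fine)
stored = bytearray(hashlib.md5(data).digest())  # 16-byte hash kept in .content
stored[0] ^= 0x01                               # the hypothesised one-bit glitch

# fix: rebuild the block from the other disks plus parity ...
rebuilt = data                                  # ... which reproduces the same, good data
# ... then verify the rebuilt block against the stored (corrupted) hash:
ok = hashlib.md5(rebuilt).digest() == bytes(stored)
print(ok)  # False: correct data can never match a corrupted hash, hence ".unrecoverable"
```

No amount of rebuilding can ever satisfy the check, because the reference hash itself is what's wrong.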
To get things clean again ...
BUT, before doing so, I'd like to get a logfile ("on record", so to speak) for the fix command, so please do:
and attach to next reply. (Command output isn't needed.)
OK, safest way to put things right:
Be sure you have a good copy of this file OFF-array. Then delete the file from the array. Do a sync command. Copy the file back into the array. Do another sync command. Now do a

snapraid -p new scrub
Through this procedure, keep an eye on the output for any error reports, of course.
Going forward, be very watchful, and wary, since that glitch might NOT be a one-time fluke.
You could look into running a "memtest" type program; also a Prime95 blend test.
"Once bitten, twice shy." :)
Hi UhClem,
All done and fixlog attached, no errors occurred during the process.
Question about your thoughts on what happened, though. As I read it, you're saying that the hash for that block was entered into the database wrongly somehow, causing the error to happen when it tried to scrub that block; however, I always do a "-p new scrub" after inserting any data, and the whole array has also been fully scrubbed many times since that file was inserted.
Would that be pointing more to some other issue like memory problems then?
Thanks again for your help.
Ah-hah! Then, I must alter my "What I believe happened is ..." to:
At some point, long after that file was initially sync'ed (& -p new scrub'ed), either during the creation of the in-core database (when "Loading state from ...") or during its saving (when "Saving state to ..."), a bit got flipped in the 16-byte hash for that file's 33415th data block. And then, only when that data block was next scrub'ed (or was a "participant" in a sync) [i.e., a verification of its hash], would that original glitch have come to light, as a "Data error/hash mismatch".
Note that in both assessment scenarios, it points to a memory problem (the crime is the same, but the crime scene is different).