First of all, thank you for writing this amazing software; it is very well thought out.
I'm a relatively new user and have read all the manuals and the FAQ, and browsed a lot of the forum. One thing I would like to understand is: is there a way to scrub recently added or recently fixed files?
I tried to scrub by folder (the next best alternative), but the scrub command doesn't seem to work with the folder filter.
The reason I'm doing this is to ensure that the recently added files were synced correctly, instead of waiting for the 20% weekly scrub to cycle through them, or re-running a very time-consuming 100% scrub just to cover the recently added files.
If I found any errors, I could easily fix them with the available copies instead of waiting several weeks to find out.
I do use pre-hash during a sync, but I don't believe SnapRAID scrubs the recently added files during the sync process. If it does, then I guess it wouldn't be necessary for me to scrub the recently added files.
Let me know; perhaps my logic is flawed. Thanks!
It sounds like what you really want is not scrub -- not exactly -- although scrubbing does eventually accomplish the same thing.
What you really want, I think, is an option for snapraid to complete the sync, then read back the newly added files and verify their checksums. That would roughly double the sync time, but it would give you some peace of mind that there was no temporary silent read error.
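As a rough illustration, here is a minimal C sketch of that read-back idea, assuming a hypothetical file path and using a trivial FNV-1a checksum as a stand-in for SnapRAID's real block hashes:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /* FNV-1a over a whole file; only a stand-in to keep the sketch
     * self-contained, not the hash SnapRAID actually uses */
    static uint64_t hash_file(const char *path)
    {
        uint64_t h = 14695981039346656037ULL;
        FILE *f = fopen(path, "rb");
        int c;
        if (!f) {
            perror(path);
            exit(EXIT_FAILURE);
        }
        while ((c = fgetc(f)) != EOF) {
            h ^= (uint64_t)(unsigned char)c;
            h *= 1099511628211ULL;
        }
        fclose(f);
        return h;
    }

    int main(void)
    {
        const char *path = "/mnt/disk1/new-file.bin"; /* hypothetical */

        /* hash taken while the file is read during sync */
        uint64_t synced = hash_file(path);

        /* ...sync completes and parity is written; then read the
         * file back and compare against the stored hash... */
        uint64_t reread = hash_file(path);

        if (synced != reread)
            fprintf(stderr, "silent read error detected on %s\n", path);
        else
            printf("verified: %s\n", path);
        return 0;
    }

For the read-back to mean anything, the second read must not be served from the page cache, which is one more reason such a verification pass roughly doubles the sync time.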
Last edit: Jessie Taylor 2015-05-27
alabama,
Which one would you suggest/recommend:
Jessie, thank you for the concise summary. Yes, that's exactly what I would want.
xad, I would suggest the first "--validate" option; my logic being that there's not much reason to scrub recent files if a re-validate option is available.
I'm not sure if I'm correct, but I would think that having a post-sync validate option would eliminate the need for a pre-hash. Also, a validate would not only detect silent read errors (which pre-hash does), it would also detect silent write errors during the hash and parity generation phase (which pre-hash does not).
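To make that distinction concrete, here is a hedged sketch (hypothetical names, trivial checksum again) of what a post-sync validate could catch that a pre-hash cannot: a pre-hash only compares two reads of the source data taken before the write, while a validate compares what was supposed to be written against what actually landed on disk:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* FNV-1a over a buffer; a stand-in for the real block hash */
    static uint64_t hash_buf(const unsigned char *p, size_t n)
    {
        uint64_t h = 14695981039346656037ULL;
        size_t i;
        for (i = 0; i < n; ++i) {
            h ^= p[i];
            h *= 1099511628211ULL;
        }
        return h;
    }

    int main(void)
    {
        unsigned char data[4096] = { 1, 2, 3 };  /* data as read during sync */
        uint64_t committed = hash_buf(data, sizeof(data)); /* hash stored by sync */

        /* ...data and parity are written to disk; simulate a silent
         * write error by flipping one bit in what lands on disk... */
        unsigned char on_disk[4096];
        memcpy(on_disk, data, sizeof(on_disk));
        on_disk[100] ^= 0x01;

        /* post-sync validate: re-read the disk content and compare it
         * against the committed hash; the corruption is detected even
         * though both pre-write reads of the source would have agreed */
        if (hash_buf(on_disk, sizeof(on_disk)) != committed)
            fprintf(stderr, "silent write error detected\n");
        return 0;
    }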
PS: I replied twice but somehow it never posted; apologies if I seem to repeat myself.
Last edit: alabama 2015-05-28
Hi alabama,
It's a nice idea. I'll add it to the TODO list.
Ciao,
Andrea
Thank you, Andrea. I'm sure you've heard this many times: this is an excellent piece of software!
This is a draft 7.1 version, for whatever it is worth, as I also wanted this function (changed to include only the related change #002).
*** edit: example code removed, OBE ***
Last edit: xad 2015-07-10
Andrea,
In the "scrub" code is there a specific reason to use "blockmax" instead of "count" when calculating countlimit, lastlimit, timelimit?
The selection of blocks to scrub is also duplicated (top of "state_scrub_process"). Is there special reasons for this or could be done once?
Could the selection even be done in the calling "state_scrub"?
Could the selection of block order be done in the calling "state_scrub" according to oldest first or would the performance penalty be to big?
After testing "--validate" option I found the need of performing re-validation from a previous point in time. By adding an optional argument it then both supports sync post-validation and forced re-validation (1-100: day offset; >1000000000L: unix time).
If you implement a "validate" option to scrub I would like to request a similar option to force re-validation from a specific time (using "status" it is easy to identify the right unix time, but it is even easier when adding the datetime in human readable format).
/X
PS: An example of the local ISO datetime added to the "status" output:
info_time:1436440356:2 (ISO:2015-06-09 13:12:36, utime:1436440356)
info_time:1436441552:4 (ISO:2015-06-09 13:32:32, utime:1436441552)
info_time:1436441676:2 (ISO:2015-06-09 13:34:36, utime:1436441676)
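A minimal sketch of the proposed argument mapping and of the datetime formatting, where the 1-100 day-offset and >1000000000L unix-time convention comes from the post above and every identifier is hypothetical:

    #include <stdio.h>
    #include <time.h>

    /* map the optional argument to an absolute point in time:
     * 1-100 is a day offset back from now, anything above
     * 1000000000L is taken as a raw unix time */
    static time_t validate_since(long arg)
    {
        if (arg >= 1 && arg <= 100)
            return time(NULL) - arg * 24L * 60L * 60L;
        if (arg > 1000000000L)
            return (time_t)arg;
        return 0; /* no re-validation requested */
    }

    int main(void)
    {
        time_t since = validate_since(7); /* re-validate the last 7 days */
        printf("re-validate blocks older than utime %lld\n", (long long)since);

        /* format one info_time line the way the example above does */
        time_t utime = 1436440356;
        char iso[32];
        strftime(iso, sizeof(iso), "%Y-%m-%d %H:%M:%S", localtime(&utime));
        printf("info_time:%lld:%d (ISO:%s, utime:%lld)\n",
               (long long)utime, 2, iso, (long long)utime);
        return 0;
    }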
Hi xad,
The double processing is done to count the blocks that are going to be processed. This is how we compute the countmax value, which is used to report the percentage of work done and remaining.
The rest of the algorithm takes care of processing the exact percentage requested, while still processing all the blocks in their natural order, as scrubbing the oldest first would cause too big a slowdown.
Most of this complexity is required to handle some corner cases, like having 1% of blocks at the oldest time 1 at the end of the array, and 50% of blocks at time 2, with the user wanting to scrub 12%. This means scrubbing the oldest 1%, plus 11% of the blocks at time 2.
This explains the need for the timelimit/lastlimit variables.
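As an illustration of that corner case, here is a simplified sketch (not the actual SnapRAID code) of how countlimit, timelimit and lastlimit fall out of it:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static int cmp_time(const void *a, const void *b)
    {
        time_t ta = *(const time_t *)a;
        time_t tb = *(const time_t *)b;
        return (ta > tb) - (ta < tb);
    }

    int main(void)
    {
        /* the corner case above: 1% of blocks at time 1, the rest at time 2 */
        enum { BLOCKMAX = 100 };
        time_t block_time[BLOCKMAX], sorted[BLOCKMAX];
        int i, percentage = 12;

        for (i = 0; i < BLOCKMAX; ++i)
            sorted[i] = block_time[i] = (i < 1) ? 1 : 2;

        /* how many blocks the requested percentage allows */
        int countlimit = BLOCKMAX * percentage / 100;     /* 12 */

        /* cut-off time: the time of the last block that still fits */
        qsort(sorted, BLOCKMAX, sizeof(sorted[0]), cmp_time);
        time_t timelimit = sorted[countlimit - 1];        /* 2 */

        /* blocks strictly older than timelimit are always scrubbed */
        int older = 0;
        for (i = 0; i < BLOCKMAX; ++i)
            if (block_time[i] < timelimit)
                ++older;                                  /* 1 */

        /* of the blocks at exactly timelimit, only enough are taken
         * to reach countlimit, in natural block order */
        int lastlimit = countlimit - older;               /* 11 */

        printf("countlimit=%d timelimit=%lld lastlimit=%d\n",
               countlimit, (long long)timelimit, lastlimit);
        return 0;
    }

Run as-is, this prints countlimit=12, timelimit=2 and lastlimit=11, matching the "oldest 1% plus 11% at time 2" split described above.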
Anyway, I have not yet decided how to implement this behavior. For now, the best option seems to be to simply avoid writing the time info during sync, to ensure that newly synced data is scrubbed first, like the oldest data.
In fact, it's a bit misleading that just-synced data is counted as already scrubbed once, as this has not really happened.
This would make it the default behavior, without the need for any additional option.
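A tiny sketch of that idea, with hypothetical field names: if sync leaves the per-block time unset, an oldest-first cut-off naturally selects just-synced blocks together with the oldest ones:

    #include <stdio.h>
    #include <time.h>

    /* hypothetical per-block metadata */
    struct block_info {
        time_t scrub_time; /* 0 = time info not written, i.e. just synced */
    };

    /* an unset time (0) compares as older than any real timestamp, so
     * just-synced blocks fall inside any oldest-first selection */
    static int scrub_first(const struct block_info *b, time_t timelimit)
    {
        return b->scrub_time < timelimit;
    }

    int main(void)
    {
        struct block_info just_synced = { 0 };
        struct block_info recently_scrubbed = { time(NULL) };
        time_t timelimit = time(NULL) - 7 * 24 * 3600; /* one week ago */

        printf("just synced: %d, recently scrubbed: %d\n",
               scrub_first(&just_synced, timelimit),
               scrub_first(&recently_scrubbed, timelimit));
        return 0;
    }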
Ciao,
Andrea
Last edit: Andrea Mazzoleni 2015-07-13
Hi,
Thanks for the response.
/X
Last edit: xad 2015-07-14