Can anyone explain this to me? The exact same array takes 4 minutes to hash on Linux and 30 minutes on Windows.
Linux was run in a virtual machine on the same server as the Windows NTFS array, which was mounted as an SMB share in Linux to perform the hash check.
root@znode1 /fpool# snapraid -c ./snapraid.conf check -a -D -U -v -f '/{B529D1FE-2533-408B-AF3A-3CC9AE8A2975}/duplicate.txt'
Self test...
Loading state from /fpool/content/C.content...
Filtering...
Using 6160 MiB of memory for the file-system.
Initializing...
Hashing...
correct {B529D1FE-2533-408B-AF3A-3CC9AE8A2975}/duplicate.txt
100% completed, 13 MB accessed in 0:04
Everything OK
[fnode1]: PS C:\array\snapraid> ./snapraid check -a -f '/{B529D1FE-2533-408B-AF3A-3CC9AE8A2975}/duplicate.txt'
Self test...
Loading state from C:/C.content...
Filtering...
Using 6160 MiB of memory for the file-system.
Initializing...
Hashing...
100% completed, 13 MB accessed in 0:29
Everything OK
Did you observe it while it was in progress on the Windows machine?
I just noticed that both fix and check take an extremely long time to finish after the actual work is done, on Windows 10 build 19041 (aka 2004) using snapraid v11.4.
A tiny 2 KB file took 8 minutes to check (showing 100% progress the entire time).
Restoring the same file using fix also took 8 minutes but I could verify that it was perfectly restored in the first minute.
I'm using Windows Server 2019 (1809) and SR 11.5 on both Windows and Linux.
99% of the time on Windows there was 0% disk activity on the array and 15% CPU activity in Task Manager.
I have no idea what SR is doing during those 25 or so minutes of zero activity.
It just sits on "Hashing" with no activity showing at all until the last couple of minutes.
Initially I thought it had frozen. After seeing the 15% CPU activity, though, I let it run to see what would happen.
This is all I see for the first 25 or so minutes:
Self test...
Loading state from C:/C.content...
Filtering...
Using 6160 MiB of memory for the file-system.
Initializing...
Hashing...
Last edit: Night Rider 2020-11-13
Even though you have specified one (specific) file, the (only) mechanism for doing so is filtering (-f), which requires mucho directory scanning (at the very least). Possibly, the "disk activity" metric only covers file I/O.
Nothing jumps out at me regarding "Linux via SMB" finishing faster than "native Windows", though.
Last edit: UhClem 2020-11-13
Accessing the MFT, Volume Log or $Extend would still show under PerfMon.
I see no activity at all from the snapraid.exe process other than chewing up CPU cycles.
I have run another test using Process Monitor and will post the results shortly.
It clearly shows snapraid.exe accessing the content files then going dead.
Then accessing the filtered file, then going dead again.
Not sure what it is doing.
Also, doing a "snapraid diff" only takes 2 minutes.
Last edit: Night Rider 2020-11-14
After some testing and then some more testing on two different Windows machines...
I think there is a performance problem for fix and check related to the number of blocks in the array, presumably only in Windows.
On my Plex "server" with a tiny 15 GB array:
Checking and fixing a 1 GB file completes in a few seconds.
Absolutely no observable freeze before or after checking.
Tested first on SnapRAID 11.2 and then repeated in 11.5.
Changed blocksize from 256 to 2.
Repeated the test in 11.5: ~1 second freeze where 100% + Everything OK was expected to show.
On my file "server":
Small array with 6 TB total data, blocksize 2048, v11.2:
2 second delay before checking begins.
5 seconds freeze at 95% then instantly 100% + Everything OK.
Also on the file "server":
200 TB array, blocksize 512
Snapraid diff: ~15-20 seconds total
Begin sync: 1.5 minutes total time including saving state and everything else.
Check a tiny file: 8 minutes total freeze.
Check all files in a folder: 8 minutes total freeze.
Fix a folder with 5 files: 8 minutes total freeze.
Everything check or fix-related: 8 minutes total freeze.
No difference between 11.4 and 11.5 (did not test 11.2).
Typically there is a smaller delay before the progress counter starts, and then a long freeze exactly at the moment when 100% + Everything OK is expected to show. During the freeze, CPU usage is more or less constant at around 33-35%.
Interestingly, when I tried fixing the folder with 5 files, this happened:
Checking...
Freeze for a while
3 files completely restored (~200 MB total)
Several minutes freeze.
2 remaining files restored (0 bytes in size)
Possibly short freeze (I did not pay attention)
Total time: 8 min.
Last edit: Leifi Plomeros 2020-11-13
I'm sorry, but you have so many changing variables that I'm not sure what you are proving here.
Freeze occurs before, between, or after files are being checked/fixed (last example).
The frozen time is constant regardless of whether 1 or many files are checked (all examples from the 200 TB array).
The frozen time is affected by the number of blocks (first example: from not visible to visible when decreasing the blocksize = increased number of blocks).
It basically looks like this is what happens:
Block 0 to 20493 = nothing to do; short freeze while evaluating.
Block 20494 = match found; do check or fix.
Block 20495 to 483983 = nothing to do; long freeze while evaluating.
Done, Everything OK.
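Those figures can be sanity-checked against the array sizes reported earlier in this thread. This is a back-of-the-envelope calculation; it assumes the reported sizes are binary units and that blocksize is given in KiB, as in snapraid.conf.

```python
# Rough consistency check: does the frozen time scale with the total
# number of SnapRAID blocks? Figures are taken from the reports above;
# blocksize is assumed to be in KiB, as in snapraid.conf.
KIB = 1024
TIB = 1024 ** 4

def blocks(data_bytes, blocksize_kib):
    # Approximate block count for a given amount of array data.
    return data_bytes // (blocksize_kib * KIB)

small = blocks(6 * TIB, 2048)    # 6 TB array, 2048 KiB blocks
large = blocks(200 * TIB, 512)   # 200 TB array, 512 KiB blocks

print(small)           # 3145728
print(large)           # 419430400
print(large / small)   # ~133x more blocks...
print((8 * 60) / 5)    # ...vs ~96x longer freeze (8 min vs 5 s)
```

The two ratios land in the same ballpark, which is consistent with the freeze time being roughly proportional to the block count.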
Here is the output from procmon, I've truncated some parts for sanity.
You can clearly see where it stops accessing the content file and freezes up doing who knows what.
I'm willing to entertain the possibility that somehow it's my Windows installation that is the problem.
I will be booting this machine from a vanilla Windows 20H2 image and will retest.
I wouldn't see the same problem if it were limited to your installation.
I also just noticed this detail in your first post:
"Using 6160 MiB of memory for the file-system."
And it gives you 25 minutes where "nothing" happens.
For me that number is 1981 MiB and gives me 8 minutes where "nothing" happens.
The frozen time correlates almost perfectly with the size of the file system, even between our most likely very different setups.
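A quick check of that correlation, using the two figures quoted in this thread (6160 MiB with a ~25-minute freeze, and 1981 MiB with an ~8-minute freeze):

```python
# Reported "Using N MiB of memory for the file-system" vs. frozen minutes:
# Night Rider: 6160 MiB -> ~25 min; Leifi: 1981 MiB -> ~8 min.
mem_ratio = 6160 / 1981
time_ratio = 25 / 8
print(round(mem_ratio, 2))   # 3.11
print(round(time_ratio, 2))  # 3.12
```

The two ratios match to within a percent, supporting the claim that the freeze scales with the in-memory file-system state.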
Here's the deal ...
For fix & check, there is a very small amount of "housekeeping" overhead for each/every data block (only excluding those blocks outside of any -S/-B range). With "real" activity (massively) reduced by filtering, that (cumulative) overhead becomes very "visible".
[Note that during those "frozen" periods, SnapRAID is solidly CPU-bound. Your, e.g., 15% suggests you have a 6- or 8-core processor.]
If Andrea is interested, a slight re-arrangement of code reduces this overhead by 25-30%.
"A penny here, and a penny there ... pretty soon you're talking about real money."
Last edit: UhClem 2020-11-15
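The scaling argument above can be sketched as follows. This is a hypothetical cost model, not SnapRAID's actual code: the function name and the cost constants are invented purely to illustrate why a fixed per-block cost dominates once filtering removes almost all real work.

```python
# Hypothetical cost model, NOT SnapRAID's real code: the function and the
# cost constants are invented to illustrate the scaling argument only.
def run_filtered_check(total_blocks, matching_blocks,
                       housekeeping_cost=1, work_cost=1000):
    # Housekeeping is paid for EVERY block, even ones the filter excludes;
    # real work (read + hash) is paid only for blocks that match.
    overhead = total_blocks * housekeeping_cost
    real_work = matching_blocks * work_cost
    return overhead + real_work

# Checking one tiny file in a huge array: almost all cost is overhead.
cost = run_filtered_check(total_blocks=419_430_400, matching_blocks=26)
print(cost)  # 419456400 -- dominated by the per-block housekeeping term
```

Under this model the total time is essentially proportional to the block count regardless of how little is filtered in, which also shows why excluding blocks via a -S/-B range (the one case UhClem notes is skipped) would sidestep the overhead.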
Nice.
Any theory regarding 4 minutes in Linux vs 29 minutes in Windows?
The only thing I could think of would be much larger block size in the Linux array.
Wouldn't it be possible to get rid of the overhead by "pre-filtering" in order to determine a relevant -S/-B range? Or is the overhead basically an unavoidable part of the filter process?
Exact same array, same settings; the setting cannot be changed on an existing array.
Nobody has been able to explain why Windows takes 8 times longer to do anything.
Then explain why this "overhead" takes 8 times longer on Windows, please; exact same array.
YES! And it's easy, efficient, and effective! (1) Cheap, (2) Fast, (3) Good... get all 3!
I hope Andrea agrees. (It's obvious what to do & where.)
Re: 4 min vs 29 min ... ??? The "frozen" time thing is OS-agnostic.
Last edit: UhClem 2020-11-15
So after extensive testing in a controlled environment with the fewest possible variables:
Windows 8
Windows 8.1
Windows 10 (1809)
Windows 10 (20H2)
Windows Server 2019
Snapraid versions 11.2 - 11.5 on each.
~20 individual tests.
I've concluded this software is useless on Windows. It takes 8 times longer to do anything on Windows than it does on Linux.
Thank you all for your replies but nobody has been able to explain why this discrepancy exists.
I'm not interested in people's "opinions"; I'm more interested in factual proof backed up by actual tests and log data.
Maybe @amadvance (Andrea) can explain it. I will await his reply when he has time.
As for now, I will be transferring the array back to a Linux server. It makes absolutely no sense using Windows with this software, unless you get paid by the hour.
Did you really test anything other than checking a specific file?
If you have the same problem with sync, scrub, unfiltered check/fix, then there is obviously a local problem, most likely not limited to snapraid.
I would absolutely agree if it were true. However, I find the performance on Windows to be very good. Please review:
Checking a single file took ~2 seconds.
Checking the entire array took 23 seconds.
I doubt it would be much faster in Linux.
You can believe what you want to believe. I trust nobody, and would much rather do my own testing and benchmarking. And I have clearly come to the conclusion that SnapRAID on Windows is useless compared to Linux, for me. Believe me or not. If you are happy and can live with the degraded performance, then kudos to you; I, on the other hand, cannot. As far as I'm concerned, this mystery has been solved, and I am quite happy and capable of moving back to Linux.
I've found my answer. It's most likely how SnapRAID was compiled.
https://www.youtube.com/watch?v=8e7IdHG5fhQ
Maybe Windows Defender (MS antivirus software) is checking snapraid.
Thanks, but it was being run on Server 2019 without Defender.