I want to start using SnapRAID and have drives of varying sizes and some at 100% full.
Questions:
1) What size parity drive should I get? Is 5 TB sufficient or do I need larger?
2) SD[B-D] are USB3 but the data hardly gets modified or added to, just read. Is SATA really still suggested?
3) With about 16 TB of data (zillions of files) and at near capacity, is this an issue?
4) Is there a YouTube video or something that shows setup on drives with existing data? I really don't want to "pioneer" on live data and then realize I misinterpreted an instruction which is now zapping terabytes of data
Thank you in advance
1:
With the default block size of 256 KiB you typically need a parity disk with 1 GiB extra space per ~8,000 files compared to the "fullest" data disk (in the worst case, with all files being ultra tiny, you may need 1 GiB extra space per ~4,000 files).
If you have 8,000 files on a data disk with a total file size of 5.00 TB then you need a parity disk that can hold 5.001 TB.
So in most scenarios it makes more sense to have a parity disk the same size as the largest data disk and just leave some empty space on the data disks, instead of adding a larger parity disk with 0.999 TB of unused space.
2: No idea. I use USB disks for parity without problems.
3: Yes, if you literally have zillions of data files you need a parity disk larger than universe... Otherwise see answer 1 :-)
4: No need to worry. SnapRAID only writes to data disks when you run the fix and touch commands. So building the array is read-only for the data disks.
Consider only adding a small folder as data disk before adding all data disks. That way you can experiment without building terabytes of parity.
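The rule of thumb from answer 1 can be turned into a quick estimate. This is only a sketch and not part of SnapRAID: the function name and the example file count are made up for illustration, and the 8,000 and 4,000 files-per-GiB figures are the ones quoted above for the default 256 KiB block size.

```python
GIB = 1024 ** 3


def parity_overhead_bytes(file_count, files_per_gib=8000):
    """Estimate the extra parity space needed beyond the fullest data disk.

    files_per_gib=8000 is the typical figure quoted above for the default
    256 KiB block size; pass 4000 for the worst case (all files tiny).
    """
    return file_count * GIB // files_per_gib


# Hypothetical example: the fullest data disk holds 400,000 files.
typical = parity_overhead_bytes(400_000)
worst = parity_overhead_bytes(400_000, files_per_gib=4000)
print(f"typical: {typical / GIB:.1f} GiB, worst case: {worst / GIB:.1f} GiB")
```

If the estimate is larger than the free space you can leave on the fullest data disk, that is the case where a bigger parity disk starts to make sense.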
Last edit: Leifi Plomeros 2017-01-12
What Leifi wrote. Adding to that, here is my 2 cents:
1) 5 TB should be enough. Use tune2fs to disable reserved blocks on the parity drive; that should give you enough extra space for SnapRAID overhead, assuming that you did not disable reserved blocks on your other disks. If you did, just make sure you leave a few GB free on your data disks.
2) I started with all my disks in USB3 cases. It worked fine, albeit slower than SATA. Not sure if this is a problem with USB3 or just my motherboard. The problem was that I couldn't spin them down at all, so I bought a proper computer case and have them in there now. Now only the disk I read from is spinning, which saves a bit of energy. My parity is still in a USB3 case.
3) I recommend using a disk smaller than the universe, otherwise energy costs will be really high. If your budget is unlimited, an infinite energy bill is no problem, obviously.
4) There is little chance of failure, even if something goes wrong. Worst case: you start writing parity to a data disk, run out of space, and have to delete the parity file manually. Creating the initial parity will stress your disks a bit, so make sure they have at least some cooling.
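To put a rough number on the tune2fs suggestion in point 1: ext2/3/4 filesystems normally reserve 5% of their blocks for root (treat that default as an assumption and check your own value with `tune2fs -l`), and `tune2fs -m 0` on the parity partition sets the reserve to zero. The Python sketch below only does the arithmetic; the function name is made up, and it ignores other filesystem overhead.

```python
TB = 10 ** 12
GIB = 1024 ** 3


def reserved_bytes(disk_bytes, reserved_percent=5.0):
    """Space held back by the ext filesystem reserved-block setting."""
    return int(disk_bytes * reserved_percent / 100)


# Assumed example: a 5 TB parity drive with the default 5% reserve.
freed = reserved_bytes(5 * TB)
print(f"freed on a 5 TB disk: {freed / 10**9:.0f} GB")

# Using the ~8,000 files per GiB rule of thumb mentioned earlier in the
# thread, estimate how many files on the fullest data disk that covers.
print(f"covers roughly {freed // GIB * 8000:,} files")
```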
This is a config that could work for you (read the comments in the example config, they explain everything):
# The parity disk is mounted at /parity
parity /parity/snapraid.parity
# And here are the precious data disks mounted
data data01 /drive1
data data02 /drive2
data data03 /drive3
data data04 /drive4
# I like to keep a few copies of the content file
content /parity/snapraid.content
content /drive1/.snapraid.content
content /drive2/.snapraid.content
content /drive3/.snapraid.content
content /drive4/.snapraid.content
# Stuff to ignore, starting from the disks mountpoint
exclude *.unrecoverable
exclude /tmp/
exclude /lost+found/
exclude /backups/
Basically you build SnapRAID, read its documentation and put the config above in /etc/snapraid.conf. Then you run "snapraid sync".
Thank you for the feedback
1) I ordered an 8 TB drive as it was not much more than a 5 TB - overkill won't hurt (right?)
2) Good to know USB is not a showstopper but a performance hit
3) This is a tiny server, not a data farm (lol). The electric bill for the entire house is $100-$250 depending on the season - so the power saving is not in my list of priorities although spinning down to extend life is a nice idea.
4) I have been reading and I get the idea/concept better of what is happening.
I do have another concern: my machine may be underpowered, and it also runs a MySQL database.
I am wondering, will the 1st sync take weeks to generate parity? If so, I can still access the info (it won't be updated).
If I do update or add, would the next sync be quicker or will it check every file for a change again?
*Manual doesn't (clearly) state this info
You can run snapraid -T to make snapraid test how fast your CPU is for different operations
C:\Snapraid>snapraid -T
snapraid v11.0 by Andrea Mazzoleni, http://www.snapraid.it
Compiler gcc 4.9.3
CPU GenuineIntel, family 6, model 60, flags sse2 ssse3 crc32 avx2
Memory is little-endian 64-bit
Support nanosecond timestamps with futimens()
Speed test using 8 data buffers of 262144 bytes, for a total of 2048 KiB.
Memory blocks have a displacement of 1792 bytes to improve cache performance.
The reported values are the aggregate bandwidth of all data blocks in MB/s,
not counting parity blocks.
Memory write speed using the C memset() function:
memset 22817
CRC used to check the content file integrity:
table 1367
intel 10401
Hash used to check the data blocks integrity:
            best murmur3 spooky2
    hash spooky2    5137   15181
RAID functions used for computing the parity with 'sync':
            best    int8   int32   int64    sse2   sse2e   ssse3  ssse3e    avx2   avx2e
    gen1    avx2           14568   27596   48078                           56788
    gen2    avx2            4089    6731   20865   23736                   33385
    genz   avx2e            2327    3355   11194   11977                           20673
    gen3   avx2e     814                                   10086   11449           19851
    gen4   avx2e     650                                    7633    8870           16126
    gen5   avx2e     544                                    6505    7004           13001
    gen6   avx2e     420                                    5074    5955           10442
RAID functions used for recovering with 'fix':
            best    int8   ssse3    avx2
    rec1    avx2    1216    3124    2912
    rec2    avx2     545    1405    1631
    rec3    avx2     113     702    1017
    rec4    avx2      72     459     710
    rec5    avx2      52     291     524
    rec6    avx2      40     224     376
If the 'best' expectations are wrong, please report it in the SnapRAID forum
In the above example it tells me that my CPU allows SnapRAID to sync (update parity) at 56788 MiB/s with single parity and at 10442 MiB/s with 6 parity levels, which is more than enough for me since my disks never exceed 1500 MiB/s combined.
Fix (recovery) is much slower and can "only" process 2912 MiB/s with a single level of parity and 376 MiB/s with 6 levels of parity.
When you sync, all data disks are read in parallel with the parity writes, so the time it takes is normally limited by the slowest/largest disk, unless you have another bottleneck.
If limited only by disk speed, you can expect it to take less than a day to complete, unless you have chosen an SMR disk for parity, in which case it could take up to 3 days.
After the first sync only the affected parts of parity are updated, which means that if you add or remove a single file you can expect sync to complete in seconds.
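The reasoning above (disks read in parallel, the largest disk sets the amount of work, the slowest link sets the pace) can be sketched as a back-of-the-envelope calculation. This is not something SnapRAID provides: the function and its parameters are made up for illustration, the 5 TB / 150 MB/s / 4 disk figures are assumed, and MB and MiB are treated loosely as the same unit.

```python
def estimate_first_sync_hours(largest_disk_tb, disk_mb_per_s,
                              cpu_aggregate_mb_per_s, data_disks):
    """Rough first-sync duration in hours.

    All data disks are read in parallel, so the largest disk determines
    how long the reads last. The speed test reports aggregate bandwidth
    across all data blocks, so the CPU budget is divided per disk.
    """
    cpu_per_disk = cpu_aggregate_mb_per_s / data_disks
    effective_mb_per_s = min(disk_mb_per_s, cpu_per_disk)
    seconds = largest_disk_tb * 1_000_000 / effective_mb_per_s
    return seconds / 3600


# Assumed example: largest disk 5 TB at a sustained 150 MB/s, 4 data disks,
# and the single-parity figure (56788) from the speed test output.
hours = estimate_first_sync_hours(5, 150, 56788, 4)
print(f"{hours:.1f} hours")
```

With numbers like these the CPU is nowhere near the bottleneck, which fits the "less than a day" estimate for non-SMR disks.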
Last edit: Leifi Plomeros 2017-01-14
I got my 8 TB external USB 3 drive and partitioned it to 6 parity and 2 gig live data (which I will start using for active stuff).
It took 18.5 hours to sync (create parity) 15.8 TB of data (235 MB/s)
I deleted a file and added a few small files and did a sync
I modified a large file (150 gig file), deleted a couple of files and added some pictures
It took 2 hours to sync the next time
I am happy
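As a sanity check on the first-sync figures reported above, the average throughput can be recomputed from the total size and duration. A small sketch, assuming decimal units (1 TB = 1,000,000 MB); the function name is made up. It gives about 237 MB/s, in line with the 235 MB/s quoted.

```python
def throughput_mb_per_s(total_tb, hours):
    """Average throughput in MB/s, using decimal units (1 TB = 1,000,000 MB)."""
    return total_tb * 1_000_000 / (hours * 3600)


# Figures from the post: 15.8 TB synced in 18.5 hours.
rate = throughput_mb_per_s(15.8, 18.5)
print(f"{rate:.0f} MB/s")
```

Note that this is the combined rate across all data disks, since they are read in parallel during sync.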