Hi,
My setup is 2x2TB parity and 2x2TB data disks (1.8T under df), with MergerFS sitting atop the two data disks. I currently have ~300G free on each of the two data disks. Earlier, though, I happened to run df -h to check how my overall disk usage looked and noticed both my parity disks sitting at 98% utilisation, cue panic stations. I recently copied ~350G onto d2 and didn't pay much attention to the parity disk usage (it was using 1.5T under df before the sync, so I figured I'd have a bit of headroom for a while).
Now I'm a little worried. I had plans to add an HBA card and upgrade one of the parity disks to 4TB at some point this year, then move the old parity disk to be a data disk. Now I'm jittery and need some advice on the best course of action to keep everything safe.
I've a couple of questions on how to proceed:
1) 98% disk usage scares me (anything above 95% scares me). 1.7T seems big for the parity file when neither of my data disks uses more than 1.5T. Does this seem right, and am I really <40G away from blowing the storage on the HDDs?
2) I've heard a snapraid sync -R may reduce the bloat in the parity file, but the data will be unprotected during this time. Given I have 2-disk redundancy, is there a safer way to rebuild the parity files without leaving data unprotected?
3) What's the best strategy for adding a 4TB drive? Originally I'd planned to buy the HBA card, then a couple of months later pick up a 4TB drive, copy the parity file over to the 4TB drive, format one of the 2TB parity disks and repurpose this as a data disk. What's my best bet here given I've got 2 disks at 98% usage and an ebay HBA card might take a while to arrive and I'm out of SATA ports (unprotected 1TB cheapo disk + OS disk using my other 2 SATA ports)?
Any help much appreciated!
what does "snapraid status" say?
This is the output I get from snapraid status:
Self test...
Loading state from /var/snapraid.content...
Using 893 MiB of memory for the file-system.
SnapRAID status report:
    Files  Fragmented     Excess  Wasted    Used    Free  Use  Name
                Files  Fragments      GB      GB      GB
   345262           5         47    72.8    1514     380  80%  d1
  1022322           3          4   238.6    1555     169  91%  d2
  1367584           8         51   311.4    3069     550  86%
[Scrub history chart: percentage of blocks (vertical axis, 0% to 22%) by days since the last scrub/sync (horizontal axis, from 63 days ago to 0)]
The oldest block was scrubbed 63 days ago, the median 34, the newest 0.
No sync is in progress.
The 48% of the array is not scrubbed.
You have 911964 files with zero sub-second timestamp.
Run the 'touch' command to set it to a not zero value.
No rehash is in progress or needed.
No error detected.
As you know, SnapRAID works on blocks. Each file is divided into blocks of 256KB, which are then used to calculate parity. If there are many small files, there may be a lot of waste. Just think of 1KB files, each of which is expanded to a full block.
In your case, about 73GB is wasted on d1 and almost 240GB is wasted on d2. That is why the parity file is much larger than the used space on each disk.
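To make the block arithmetic concrete, here is a rough Python sketch. It assumes the 256KB block size mentioned above means 262144 bytes, and it assumes the parity file has to cover roughly "Used + Wasted" on the fullest data disk. The function names are made up for illustration and are not part of SnapRAID.

```python
# Assumption: "256KB" means 256 * 1024 bytes per block.
BLOCK_SIZE = 256 * 1024


def blocks_needed(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Whole blocks occupied by a file of file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


def wasted_bytes(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Unused bytes in the last, partially filled block of a file."""
    return blocks_needed(file_size, block_size) * block_size - file_size


def parity_footprint_gb(used_gb: float, wasted_gb: float) -> float:
    """Rough estimate of the parity space one data disk requires."""
    return used_gb + wasted_gb


# A 1KB file still takes a whole block, so almost all of that block is waste.
print(blocks_needed(1024), wasted_bytes(1024))  # 1 261120

# "Used" and "Wasted" columns copied from the status report in this thread.
status = {"d1": (1514, 72.8), "d2": (1555, 238.6)}
for name, (used, wasted) in status.items():
    print(f"{name}: ~{parity_footprint_gb(used, wasted):.1f} GB")
# d1: ~1586.8 GB
# d2: ~1793.6 GB
```

On those numbers d2 is the disk that drives the parity size, which fits with the parity disks looking much fuller than either data disk.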
3) What's the best strategy for adding a 4TB drive? Originally I'd planned to buy the HBA card, then a couple of months later pick up a 4TB drive, copy the parity file over to the 4TB drive, format one of the 2TB parity disks and repurpose this as a data disk. What's my best bet here given I've got 2 disks at 98% usage and an ebay HBA card might take a while to arrive and I'm out of SATA ports (unprotected 1TB cheapo disk + OS disk using my other 2 SATA ports)?
Changing the setup to 3x2 TB data + 1x2 TB parity would be the easiest solution.
Remove 2-parity from config file.
Snapraid sync (instantly finished)
Reformat 2nd parity disk.
Add 2nd parity disk as data disk.
Snapraid sync (instantly finished)
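The config edits in the steps above could look roughly like the Python sketch below. It only transforms the text of a snapraid.conf and does not run snapraid or touch any disk. The mount points and the disk name d3 are hypothetical (only /var/snapraid.content comes from this thread), and the helper names are invented for illustration.

```python
def drop_second_parity(conf_text: str) -> str:
    """Comment out the 2-parity line so SnapRAID falls back to single parity."""
    kept = []
    for line in conf_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "2-parity":
            kept.append("# " + line)  # commented out rather than deleted
        else:
            kept.append(line)
    return "\n".join(kept) + "\n"


def add_data_disk(conf_text: str, name: str, mount_point: str) -> str:
    """Append a data disk entry for the reformatted former parity disk."""
    if not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + f"data {name} {mount_point}\n"


# Hypothetical starting config; the paths are assumptions, not the poster's.
EXAMPLE_CONF = """parity /mnt/parity1/snapraid.parity
2-parity /mnt/parity2/snapraid.2-parity
content /var/snapraid.content
data d1 /mnt/disk1/
data d2 /mnt/disk2/
"""

step1 = drop_second_parity(EXAMPLE_CONF)          # then: snapraid sync
step2 = add_data_disk(step1, "d3", "/mnt/disk3/")  # after reformatting, then: snapraid sync
print(step2)
```

Commenting the line out instead of deleting it keeps the old setting visible in case you want to go back to dual parity later.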
Other alternatives:
A) Add a USB3 disk (I use that for parity).
B) Replace the 1TB cheapo disk with a larger disk and use a subfolder on the larger replacement disk as a data disk in snapraid (you can borrow the SATA connection from one of the parity disks while transferring data from the old disk to the new disk).
Last edit: Leifi Plomeros 2020-07-06
Thanks folks, that's the problem then, thank you! I've a considerable amount of deduping to do on photos and perhaps it's time to rethink my photo storage in general, but assuming I reduce this drastically, is there a "safe" way to rebuild the parity and reduce the disk usage while still retaining some level of redundancy or do both parity disks need to be wiped/calculated at once?
i.e. if I did some housekeeping on the data disks, removed parity 2 from the config, ran sync -R to recompute the parity on parity 1, then added parity 2 back into the config and synced again, would that work, or is there an even more sensible option?
@Leifi Plomeros
Assuming I can buy myself some time/space with the plan above would my original strategy for adding the 4TB be reasonable?
Good plan on the USB3. I'm going to take a second backup of the important stuff before housekeeping anyway; I got a bit of tunnel vision there, 98% gave me a shock :)
Last edit: Alexander Orr 2020-07-06
Yes.
Make a backup of snapraid.content and snapraid.conf.
Remove 2nd parity from snapraid.conf (not the backup)
Run snapraid sync -R
If problems: Use backup of snapraid.content and snapraid.conf to recover data via 2nd parity disk.
If no problems: Reformat and add 2nd parity disk to new array with smaller parity.
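For the backup step, something along these lines would do. This is a minimal Python sketch: /var/snapraid.content is the path shown in the status output earlier in the thread, while /etc/snapraid.conf and the backup directory are assumptions, so adjust them to your system. The helper name is made up.

```python
import shutil
from pathlib import Path


def backup_files(paths, backup_dir):
    """Copy each file into backup_dir with a .bak suffix; return the new paths."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for path in paths:
        src = Path(path)
        dst = backup_dir / (src.name + ".bak")
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copies.append(dst)
    return copies


# Example call (not executed here; the conf path and target are assumptions):
# backup_files(["/var/snapraid.content", "/etc/snapraid.conf"],
#              "/root/snapraid-backup")
```

Keep the backup somewhere outside the array, so it is still available if the sync -R goes wrong and you need to recover via the 2nd parity disk.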
However, I don't see much reason to do this. Snapraid will not increase the size of the parity files as long as there are reusable blocks inside the parity file (previously used to protect now removed data).
Sync speed will not be noticeably improved, and scrub/fix times would be up to 12% faster, but only if you are able to maintain the smaller parity size.
From my perspective that is a pretty poor tradeoff for the added risk, effort involved and two times near full disk read/writes for all disks in the array.
Edit: Could you clarify the original plan? I'm not sure that I understand what it is...
Last edit: Leifi Plomeros 2020-07-06