I’m not sure if this is even a problem, just observations and intuition based on my experience
Set up my new config with 13 data drives (12x16TB, 1x18TB) and 1 parity drive (1x20TB). The server runs a 10-core i9-10900 with 24GB of RAM. All the drives are CMR (Seagate Exos, IronWolf Pro, or WD Red Pro).
The initial sync took almost a week, running at around 300MB/s (105 stripes/s) for almost the entire run. I have about 154TB used, spread quite evenly over the disks.
My understanding is that this number should be higher: 300MB/s across 13 disks is only about 23MB/s per disk, which is abysmally slow.
The 'wait' times after the initial sync and subsequent syncs always show parity1 with a bucketload of stars; I'm not sure if that indicates a problem.
SMART reports no problems on any of the drives. They are in an enclosure connected to a backplane and an LSI card, and hdparm reports read speeds of 230-300MB/s for each of the disks.
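For anyone wanting to repeat that per-disk check, a small loop over hdparm works; the /dev/sd? glob is an assumption, so widen it (or list devices explicitly) to match your enclosure:

```shell
#!/bin/sh
# Sketch: sequential-read benchmark for each whole disk.
# Assumption: disks appear as /dev/sda.. /dev/sdz; adjust the glob as needed.
for dev in /dev/sd?; do
  [ -b "$dev" ] || continue    # skip if the glob matched nothing real
  echo "== $dev =="
  hdparm -tT "$dev" || true    # -T cached reads, -t buffered (device) reads
done
```

A disk that benchmarks fine here but drags during a sync points away from a bad port/cable and toward something like write caching or controller settings.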
Coming from UnRAID, the initial parity sync there took about 24-36 hours (which from my understanding is doing pretty much the same thing - reading from all the disks and calculating parity).
I don't THINK this is normal (but maybe I'm wrong?).
Subsequent syncs run at around the same speed, 250-300MB/s, though I believe this can vary depending on where the new data is, etc.
Any ideas? Or is this completely normal? I ask because I see people here getting 1000MB/s, and I figure I should see at least that, since each drive can easily read at 100MB/s.
Here's my snapraid.conf:
# SnapRAID configuration file
# Content File(s)
content /mnt/snapraid-content/disk1/snapraid.content
content /mnt/snapraid-content/disk2/snapraid.content
content /mnt/snapraid-content/disk3/snapraid.content
# Parity Disk(s)
1-parity /mnt/parity1/snapraid.parity
# Data Disk(s)
data d1 /mnt/disk1
data d2 /mnt/disk2
data d3 /mnt/disk3
data d4 /mnt/disk4
data d5 /mnt/disk5
data d6 /mnt/disk6
data d7 /mnt/disk7
data d8 /mnt/disk8
data d9 /mnt/disk9
data d10 /mnt/disk10
data d11 /mnt/disk11
data d12 /mnt/disk12
data d13 /mnt/disk13
# Excluded files and directories
exclude *.unrecoverable
exclude /tmp/
exclude /lost+found/
exclude *.!sync
Finally, latest sync report (don't have the log for the initial one):
2023-01-04 03:14:39,573 [INFO  ] ============================================================
2023-01-04 03:14:39,573 [INFO  ] Run started
2023-01-04 03:14:39,573 [INFO  ] ============================================================
2023-01-04 03:14:39,574 [INFO  ] Running diff...
2023-01-04 03:14:43,259 [OUTPUT] Loading state from /mnt/snapraid-content/disk1/snapraid.content...
2023-01-04 03:15:36,631 [OUTPUT] Comparing...
2023-01-04 03:15:36,632 [OUTERR] WARNING! With 13 disks it's recommended to use two parity levels.
<SNIP new files added>
2023-01-04 03:15:49,905 [OUTERR] WARNING! UUID is unsupported for disks: 'd1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'd10', 'd11', 'd12', 'd13'. Not using inodes to detect move operations.
2023-01-04 03:15:49,905 [OUTPUT] 408072 equal
2023-01-04 03:15:49,906 [OUTPUT] 23 added
2023-01-04 03:15:49,906 [OUTPUT] 4 removed
2023-01-04 03:15:49,906 [OUTPUT] 0 updated
2023-01-04 03:15:49,906 [OUTPUT] 0 moved
2023-01-04 03:15:49,906 [OUTPUT] 0 copied
2023-01-04 03:15:49,906 [OUTPUT] 0 restored
2023-01-04 03:15:49,906 [OUTPUT] There are differences!
2023-01-04 03:15:51,766 [INFO  ] ************************************************************
2023-01-04 03:15:51,774 [INFO  ] Diff results: 23 added, 4 removed, 0 moved, 0 modified
2023-01-04 03:15:51,774 [INFO  ] Running sync...
2023-01-04 03:15:53,673 [OUTPUT] Self test...
2023-01-04 03:15:53,840 [OUTPUT] Loading state from /mnt/snapraid-content/disk1/snapraid.content...
2023-01-04 03:16:47,426 [OUTPUT] Scanning...
2023-01-04 03:16:47,426 [OUTERR] WARNING! With 13 disks it's recommended to use two parity levels.
2023-01-04 03:16:47,428 [OUTPUT] Scanned d13 in 0 seconds
2023-01-04 03:16:48,921 [OUTPUT] Scanned d12 in 1 seconds
2023-01-04 03:16:51,181 [OUTPUT] Scanned d7 in 3 seconds
2023-01-04 03:16:51,757 [OUTPUT] Scanned d6 in 4 seconds
2023-01-04 03:16:52,019 [OUTPUT] Scanned d4 in 4 seconds
2023-01-04 03:16:52,464 [OUTPUT] Scanned d8 in 5 seconds
2023-01-04 03:16:53,298 [OUTPUT] Scanned d5 in 5 seconds
2023-01-04 03:16:53,751 [OUTPUT] Scanned d10 in 6 seconds
2023-01-04 03:16:53,845 [OUTPUT] Scanned d3 in 6 seconds
2023-01-04 03:16:55,946 [OUTPUT] Scanned d11 in 8 seconds
2023-01-04 03:16:56,704 [OUTPUT] Scanned d9 in 9 seconds
2023-01-04 03:16:58,756 [OUTPUT] Scanned d1 in 11 seconds
2023-01-04 03:16:59,182 [OUTPUT] Scanned d2 in 11 seconds
2023-01-04 03:16:59,619 [OUTERR] WARNING! UUID is unsupported for disks: 'd1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'd10', 'd11', 'd12', 'd13'. Not using inodes to detect move operations.
2023-01-04 03:16:59,695 [OUTPUT] Using 10139 MiB of memory for the file-system.
2023-01-04 03:17:01,695 [OUTPUT] Initializing...
2023-01-04 03:17:01,695 [OUTPUT] Resizing...
2023-01-04 03:17:02,487 [OUTPUT] Saving state to /mnt/snapraid-content/disk1/snapraid.content...
2023-01-04 03:17:02,487 [OUTPUT] Saving state to /mnt/snapraid-content/disk2/snapraid.content...
2023-01-04 03:17:02,487 [OUTPUT] Saving state to /mnt/snapraid-content/disk3/snapraid.content...
2023-01-04 03:18:39,457 [OUTPUT] Verifying...
2023-01-04 03:19:25,777 [OUTPUT] Verified /mnt/snapraid-content/disk3/snapraid.content in 46 seconds
2023-01-04 03:19:33,004 [OUTPUT] Verified /mnt/snapraid-content/disk2/snapraid.content in 53 seconds
2023-01-04 03:19:34,958 [OUTPUT] Verified /mnt/snapraid-content/disk1/snapraid.content in 55 seconds
2023-01-04 03:19:35,593 [OUTPUT] Using 224 MiB of memory for 64 cached blocks.
2023-01-04 03:19:35,599 [OUTPUT] Selecting...
2023-01-04 03:19:47,457 [OUTPUT] Syncing...
2023-01-04 03:55:27,952 [OUTPUT]
2023-01-04 03:55:27,966 [OUTPUT]     d1  0% |
2023-01-04 03:55:27,966 [OUTPUT]     d2  0% |
2023-01-04 03:55:27,966 [OUTPUT]     d3  0% |
2023-01-04 03:55:27,966 [OUTPUT]     d4  0% |
2023-01-04 03:55:27,967 [OUTPUT]     d5  0% |
2023-01-04 03:55:27,967 [OUTPUT]     d6  0% |
2023-01-04 03:55:27,967 [OUTPUT]     d7  0% |
2023-01-04 03:55:27,967 [OUTPUT]     d8  0% |
2023-01-04 03:55:27,967 [OUTPUT]     d9  0% |
2023-01-04 03:55:27,967 [OUTPUT]    d10  0% |
2023-01-04 03:55:27,967 [OUTPUT]    d11  0% |
2023-01-04 03:55:27,967 [OUTPUT]    d12  0% |
2023-01-04 03:55:27,967 [OUTPUT]    d13  0% |
2023-01-04 03:55:27,967 [OUTPUT] parity 89% | ******************************************************
2023-01-04 03:55:27,967 [OUTPUT]   raid  2% | *
2023-01-04 03:55:27,968 [OUTPUT]   hash  6% | ***
2023-01-04 03:55:27,968 [OUTPUT]  sched  0% |
2023-01-04 03:55:27,968 [OUTPUT]   misc  0% |
2023-01-04 03:55:27,968 [OUTPUT]            |______________________________________________________________
2023-01-04 03:55:27,968 [OUTPUT]             wait time (total, less is better)
2023-01-04 03:55:27,968 [OUTPUT]
2023-01-04 03:55:27,968 [OUTPUT] Everything OK
2023-01-04 03:55:36,828 [OUTPUT] Saving state to /mnt/snapraid-content/disk1/snapraid.content...
2023-01-04 03:55:36,828 [OUTPUT] Saving state to /mnt/snapraid-content/disk2/snapraid.content...
2023-01-04 03:55:36,829 [OUTPUT] Saving state to /mnt/snapraid-content/disk3/snapraid.content...
2023-01-04 03:57:18,751 [OUTPUT] Verifying...
2023-01-04 03:58:06,724 [OUTPUT] Verified /mnt/snapraid-content/disk3/snapraid.content in 47 seconds
2023-01-04 03:58:13,944 [OUTPUT] Verified /mnt/snapraid-content/disk1/snapraid.content in 55 seconds
2023-01-04 03:58:23,438 [OUTPUT] Verified /mnt/snapraid-content/disk2/snapraid.content in 64 seconds
2023-01-04 03:58:55,547 [INFO  ] ************************************************************
2023-01-04 03:58:55,549 [INFO  ] Running cleanup...
2023-01-04 03:58:56,498 [INFO  ] ************************************************************
2023-01-04 03:58:56,500 [INFO  ] All done
2023-01-04 03:58:58,293 [INFO  ] Run finished successfully
That 300MB/s seems slow. I'm running 19 drives with 3 parity on an AMD 5 with 40GB RAM, and I'm getting 1730MB/s. Your RAM may be low, so you're hitting swap.
I'd really recommend running at least a second parity drive. Restoring a drive is very disk-intensive, with all drives running at 100% for days, and if a drive is close to dying, a restore can be enough to push it over the edge. Then you'd have two drives gone with only a single parity. Just a thought.
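For what it's worth, the config change for a second parity level is a single line plus a sync; the mount point below is hypothetical, and it should live on its own dedicated drive:

```
# snapraid.conf -- second parity level (path is an assumption)
2-parity /mnt/parity2/snapraid.parity
```

SnapRAID computes the new parity level on the next sync.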
I'm glad I'm not going insane. Unraid did calculate parity MUCH faster (about a day, which lines up with the 1000MB/s+ that I feel I should be seeing).
And yes, I have a 2nd parity drive in the mail, being delivered next week :)
Really trying to figure out how to troubleshoot this problem.
Check your swap file and memory usage when you sync. I'm guessing that's where the problem is.
I don't think it's a swap issue; swap is turned off (that was the first thing I did).
Memory shows about 9.9GiB free during the sync process.
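(Side note for anyone following along: both claims are easy to double-check while a sync is running, using standard Linux tools.)

```shell
# Lists active swap devices; no output means swap really is off.
swapon --show
# RAM/swap usage snapshot; run while the snapraid sync is active.
free -h
# Sample every 5s (3 samples): non-zero si/so columns mean swapping.
vmstat 5 3
```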
I'm running a sync right now and snapraid is using 21GB of memory. How are the drives connected? Have you done a speedtest on each individual drive to see if there is a bad port, cable, or socket?
Ran
hdparm -tT
on each of the disks and they each report 230-300MB/s reads, so I doubt it's a bad port/cable/socket. My snapraid uses about 10GB of RAM with the 13-disk configuration using the default hash/block size.
How are your drives connected? JBOD/RAID card? Motherboard SATA? PCI SATA cards? After you receive your new parity drive, I would try this: disconnect your original parity drive and rename the content files. Edit your config file and comment out all drives except for two or three. Add the new parity drive as a single parity and run a sync. See what the speed is. If it's still low, comment out those drives and pick three more. If it's quicker, add another drive and do another sync. Keep doing that until you hit a bottleneck.
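The bisection is just commenting data lines in and out of snapraid.conf between runs, something like the sketch below (drive names taken from the config posted above; which disks to keep per pass is arbitrary):

```
# Test pass 1: parity + three data disks only
1-parity /mnt/parity1/snapraid.parity
data d1 /mnt/disk1
data d2 /mnt/disk2
data d3 /mnt/disk3
#data d4 /mnt/disk4
#data d5 /mnt/disk5
# ...d6-d13 commented out the same way
```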
Last edit: David 2023-01-04
Sorry, missed that part.
Case: Supermicro SuperChassis 847BE1C4-R1K23LPB
Backplanes: BPN-SAS3-846EL1 & BPN-SAS3-826EL1
HBA: Supermicro AOC-S3008L-L8e (basically an LSI-9300-8i, flashed with IT firmware)
Each backplane is connected to the HBA via a single channel (one backplane on each of the HBA's two channels). Single-channel vs dual-channel per backplane reduces total throughput by about 20-30%, but the card's theoretical max is 4800MB/s IIRC, and we're nowhere near that.
I will definitely try your suggestion (good one!) and report back (could take a while to do these tests).
As mentioned, I moved from Unraid with the exact same config (minus the new parity drive), and a parity build there would take under 2 days (I don't remember exactly; it's been a while since my last rebuild).
Appreciate the help!
That's pretty much the same hardware I have although I am using Windows.
I have a pretty much identical setup, although I threw a different mb/cpu in there. Running snapraid on ubuntu 22.04. Same SAS3 backplanes. 20+2 disks.
I had upgraded from a Dell H310 (LSI 9211-8i) SAS2 HBA to the supermicro card, and saw my sync speeds go in the toilet, just like yours.
I stumbled across the disk write-cache setting, and noticed all the disks under the new SAS3 HBA had it disabled (hdparm -W).
Swapping back to the H310 card, hdparm shows all drives have write caching ENABLED.
Sure enough, force enabling write cache on my drives while on the SAS3 HBA returns my sync speeds to normal.
I'm not sure why Ubuntu treats the drives on the SAS3 HBA differently. Maybe it thinks they're hot-swappable (I suppose they actually are in this chassis) and disables write caching? I never had any issues like this with the SAS2 H310 connected to the SAS3 backplanes.
I wondered if there are settings in the LSI BIOS to play with, but I'm having trouble getting into it.
Last edit: Derek Ciesielski 2023-02-27
(If you're certain that your parity HDD is CMR, then) it's possible/likely that the drive's Write Cache is disabled. Use hdparm (if SATA) or sdparm (SAS) to check (and hopefully fix).
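Concretely, the check looks something like this; the /dev/sd? glob is an assumption, hdparm's -W flag covers SATA and sdparm's WCE (Write Cache Enable) bit covers SAS. Note the setting is volatile on many drives, so it may need reapplying after a power cycle (e.g. via hdparm.conf or a udev rule):

```shell
#!/bin/sh
# Report write-cache state for every disk (SATA via hdparm, SAS via sdparm).
# Assumption: disks appear as /dev/sd?; adjust the glob for your system.
for dev in /dev/sd?; do
  [ -b "$dev" ] || continue
  echo "== $dev =="
  hdparm -W "$dev" 2>/dev/null || sdparm --get=WCE "$dev" || true
done
# To enable, per drive:
#   hdparm -W1 /dev/sdX          # SATA
#   sdparm --set=WCE /dev/sdX    # SAS
```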
I think you may be onto something! Yes, write cache is disabled on all disks (including parity)! I will test and report back.
Any news, @jrarseneau?
Edit: duplicate
Last edit: JR A. 2023-01-05