
Possible sync speed issue, input requested

Forum: Help
Started by JR A. on 2023-01-04 · Last post 2023-03-12
  • JR A.

    JR A. - 2023-01-04

    Hello,

    I’m not sure if this is even a problem; these are just observations and intuition based on my experience.

    I set up my new config with 13 data drives (12x16TB, 1x18TB) and 1 parity drive (1x20TB). The server runs a 10-core i9-10900 with 24GB of RAM. All the drives are CMR (Seagate Exos, Ironwolf Pros or WD Red Pros).

    The initial sync took almost a week, running at around 300MB/s (105 stripes/s) for nearly the entire run. I have about 154TB used, spread quite evenly across the disks.

    My understanding is that this number should be higher: 300MB/s across 13 data disks works out to roughly 23MB/s per disk, which is abysmally slow.

    The ‘wait’ times reported after the initial sync and subsequent syncs always show parity1 with a bucket load of stars; I’m not sure if that is indicative of a problem.

    SMART reports no problems on any of the drives. They are in an enclosure connected to a backplane and an LSI card, and hdparm reports read speeds of 230-300MB/s for each of the disks.
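
    For anyone wanting to reproduce that check, something like this covers every disk (the device names below are just examples; use lsblk to find yours):

    for dev in /dev/sd{b..n}; do
        echo "== $dev =="
        sudo hdparm -tT "$dev"   # -T: cached reads, -t: buffered sequential reads from the disk
    done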

    Coming from UnRAID, the initial parity sync there took about 24-36 hours (which from my understanding is doing pretty much the same thing - reading from all the disks and calculating parity).

    I don’t THINK this is normal (but maybe I'm wrong?)

    Subsequent syncs are around the same speed, 250-300MB/s, but I believe this can vary depending on where the new data is, etc.

    Any ideas? Or is this completely normal? I ask because I see some people here getting 1000MB/s, and I figure I should see at least that, since each drive can easily read at 100MB/s.

    Thanks.

    Here is the output of snapraid -T

    snapraid v12.2 by Andrea Mazzoleni, http://www.snapraid.it                                                                                           
    Compiler gcc 10.2.1 20210110                                                                                                                         
    CPU GenuineIntel, family 6, model 165, flags sse2 ssse3 crc32 avx2                                                                                   
    Memory is little-endian 64-bit                                                                                                                       
    Support nanosecond timestamps with futimens()                                                                                                        
    
    Speed test using 8 data buffers of 262144 bytes, for a total of 2048 KiB.                                                                            
    Memory blocks have a displacement of 1792 bytes to improve cache performance.                                                                        
    The reported values are the aggregate bandwidth of all data blocks in MB/s,                                                                          
    not counting parity blocks.                                                                                                                          
    
    Memory write speed using the C memset() function:                                                                                                    
      memset   57413                                                                                                                                     
    
    CRC used to check the content file integrity:                                                                                                        
       table    1718                                                                                                                                     
       intel   12359                                                                                                                                     
    
    Hash used to check the data blocks integrity:                                                                                                        
                best murmur3 spooky2   metro                                                                                                             
        hash spooky2    6165   17864   20569                                                                                                             
    
    RAID functions used for computing the parity with 'sync':                                                                                            
                best    int8   int32   int64    sse2   sse2e   ssse3  ssse3e    avx2   avx2e                                                             
        gen1    avx2           19172   33782   58862                           68078                                                                     
        gen2    avx2            5194    9616   25351   26055                   42645                                                                     
        genz   avx2e            2882    5224   14165   13237                           24592                                                             
        gen3   avx2e    1273                                   12830   13709           25955                                                             
        gen4   avx2e     948                                    9676   10758           21233                                                             
        gen5   avx2e     768                                    7852    8681           17011                                                             
        gen6   avx2e     600                                    6643    7393           14554                                                             
    
    RAID functions used for recovering with 'fix':                                                                                                       
                best    int8   ssse3    avx2                                                                                                             
        rec1    avx2    1593    3902    4265                                                                                                             
        rec2    avx2     691    1739    2218                                                                                                             
        rec3    avx2     165     911    1382                                                                                                             
        rec4    avx2     108     604     955                                                                                                             
        rec5    avx2      71     416     697                                                                                                             
        rec6    avx2      55     312     534                                                                                                             
    
    If the 'best' expectations are wrong, please report it in the SnapRAID forum
    

    And my snapraid.conf:

    # SnapRAID configuration file
    
    # Content File(s)
    content /mnt/snapraid-content/disk1/snapraid.content
    content /mnt/snapraid-content/disk2/snapraid.content
    content /mnt/snapraid-content/disk3/snapraid.content
    
    # Parity Disk(s)
    1-parity /mnt/parity1/snapraid.parity
    
    # Data Disk(s)
    data d1 /mnt/disk1
    data d2 /mnt/disk2
    data d3 /mnt/disk3
    data d4 /mnt/disk4
    data d5 /mnt/disk5
    data d6 /mnt/disk6
    data d7 /mnt/disk7
    data d8 /mnt/disk8
    data d9 /mnt/disk9
    data d10 /mnt/disk10
    data d11 /mnt/disk11
    data d12 /mnt/disk12
    data d13 /mnt/disk13
    
    # Excluded files and directories
    exclude *.unrecoverable
    exclude /tmp/
    exclude /lost+found/
    exclude *.!sync
    

    Finally, latest sync report (don't have the log for the initial one):

    2023-01-04 03:14:39,573 [INFO  ] ============================================================
    2023-01-04 03:14:39,573 [INFO  ] Run started
    2023-01-04 03:14:39,573 [INFO  ] ============================================================
    2023-01-04 03:14:39,574 [INFO  ] Running diff...
    2023-01-04 03:14:43,259 [OUTPUT] Loading state from /mnt/snapraid-content/disk1/snapraid.content...
    2023-01-04 03:15:36,631 [OUTPUT] Comparing...
    2023-01-04 03:15:36,632 [OUTERR] WARNING! With 13 disks it's recommended to use two parity levels.
    <SNIP new files added> 
    2023-01-04 03:15:49,905 [OUTERR] WARNING! UUID is unsupported for disks: 'd1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'd10', 'd11', 'd12', 'd13'. Not using inodes to detect move operations.
    2023-01-04 03:15:49,905 [OUTPUT]   408072 equal
    2023-01-04 03:15:49,906 [OUTPUT]       23 added
    2023-01-04 03:15:49,906 [OUTPUT]        4 removed
    2023-01-04 03:15:49,906 [OUTPUT]        0 updated
    2023-01-04 03:15:49,906 [OUTPUT]        0 moved
    2023-01-04 03:15:49,906 [OUTPUT]        0 copied
    2023-01-04 03:15:49,906 [OUTPUT]        0 restored
    2023-01-04 03:15:49,906 [OUTPUT] There are differences!
    2023-01-04 03:15:51,766 [INFO  ] ************************************************************
    2023-01-04 03:15:51,774 [INFO  ] Diff results: 23 added,  4 removed,  0 moved,  0 modified
    2023-01-04 03:15:51,774 [INFO  ] Running sync...
    2023-01-04 03:15:53,673 [OUTPUT] Self test...
    2023-01-04 03:15:53,840 [OUTPUT] Loading state from /mnt/snapraid-content/disk1/snapraid.content...
    2023-01-04 03:16:47,426 [OUTPUT] Scanning...
    2023-01-04 03:16:47,426 [OUTERR] WARNING! With 13 disks it's recommended to use two parity levels.
    2023-01-04 03:16:47,428 [OUTPUT] Scanned d13 in 0 seconds
    2023-01-04 03:16:48,921 [OUTPUT] Scanned d12 in 1 seconds
    2023-01-04 03:16:51,181 [OUTPUT] Scanned d7 in 3 seconds
    2023-01-04 03:16:51,757 [OUTPUT] Scanned d6 in 4 seconds
    2023-01-04 03:16:52,019 [OUTPUT] Scanned d4 in 4 seconds
    2023-01-04 03:16:52,464 [OUTPUT] Scanned d8 in 5 seconds
    2023-01-04 03:16:53,298 [OUTPUT] Scanned d5 in 5 seconds
    2023-01-04 03:16:53,751 [OUTPUT] Scanned d10 in 6 seconds
    2023-01-04 03:16:53,845 [OUTPUT] Scanned d3 in 6 seconds
    2023-01-04 03:16:55,946 [OUTPUT] Scanned d11 in 8 seconds
    2023-01-04 03:16:56,704 [OUTPUT] Scanned d9 in 9 seconds
    2023-01-04 03:16:58,756 [OUTPUT] Scanned d1 in 11 seconds
    2023-01-04 03:16:59,182 [OUTPUT] Scanned d2 in 11 seconds
    2023-01-04 03:16:59,619 [OUTERR] WARNING! UUID is unsupported for disks: 'd1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'd10', 'd11', 'd12', 'd13'. Not using inodes to detect move operations.
    2023-01-04 03:16:59,695 [OUTPUT] Using 10139 MiB of memory for the file-system.
    2023-01-04 03:17:01,695 [OUTPUT] Initializing...
    2023-01-04 03:17:01,695 [OUTPUT] Resizing...
    2023-01-04 03:17:02,487 [OUTPUT] Saving state to /mnt/snapraid-content/disk1/snapraid.content...
    2023-01-04 03:17:02,487 [OUTPUT] Saving state to /mnt/snapraid-content/disk2/snapraid.content...
    2023-01-04 03:17:02,487 [OUTPUT] Saving state to /mnt/snapraid-content/disk3/snapraid.content...
    2023-01-04 03:18:39,457 [OUTPUT] Verifying...
    2023-01-04 03:19:25,777 [OUTPUT] Verified /mnt/snapraid-content/disk3/snapraid.content in 46 seconds
    2023-01-04 03:19:33,004 [OUTPUT] Verified /mnt/snapraid-content/disk2/snapraid.content in 53 seconds
    2023-01-04 03:19:34,958 [OUTPUT] Verified /mnt/snapraid-content/disk1/snapraid.content in 55 seconds
    2023-01-04 03:19:35,593 [OUTPUT] Using 224 MiB of memory for 64 cached blocks.
    2023-01-04 03:19:35,599 [OUTPUT] Selecting...
    2023-01-04 03:19:47,457 [OUTPUT] Syncing...
    2023-01-04 03:55:27,952 [OUTPUT] 
    2023-01-04 03:55:27,966 [OUTPUT]      d1  0% |
    2023-01-04 03:55:27,966 [OUTPUT]      d2  0% |
    2023-01-04 03:55:27,966 [OUTPUT]      d3  0% |
    2023-01-04 03:55:27,966 [OUTPUT]      d4  0% |
    2023-01-04 03:55:27,967 [OUTPUT]      d5  0% |
    2023-01-04 03:55:27,967 [OUTPUT]      d6  0% |
    2023-01-04 03:55:27,967 [OUTPUT]      d7  0% |
    2023-01-04 03:55:27,967 [OUTPUT]      d8  0% |
    2023-01-04 03:55:27,967 [OUTPUT]      d9  0% |
    2023-01-04 03:55:27,967 [OUTPUT]     d10  0% |
    2023-01-04 03:55:27,967 [OUTPUT]     d11  0% |
    2023-01-04 03:55:27,967 [OUTPUT]     d12  0% |
    2023-01-04 03:55:27,967 [OUTPUT]     d13  0% |
    2023-01-04 03:55:27,967 [OUTPUT]  parity 89% | ******************************************************
    2023-01-04 03:55:27,967 [OUTPUT]    raid  2% | *
    2023-01-04 03:55:27,968 [OUTPUT]    hash  6% | ***
    2023-01-04 03:55:27,968 [OUTPUT]   sched  0% |
    2023-01-04 03:55:27,968 [OUTPUT]    misc  0% |
    2023-01-04 03:55:27,968 [OUTPUT]             |______________________________________________________________
    2023-01-04 03:55:27,968 [OUTPUT]                            wait time (total, less is better)
    2023-01-04 03:55:27,968 [OUTPUT] 
    2023-01-04 03:55:27,968 [OUTPUT] Everything OK
    2023-01-04 03:55:36,828 [OUTPUT] Saving state to /mnt/snapraid-content/disk1/snapraid.content...
    2023-01-04 03:55:36,828 [OUTPUT] Saving state to /mnt/snapraid-content/disk2/snapraid.content...
    2023-01-04 03:55:36,829 [OUTPUT] Saving state to /mnt/snapraid-content/disk3/snapraid.content...
    2023-01-04 03:57:18,751 [OUTPUT] Verifying...
    2023-01-04 03:58:06,724 [OUTPUT] Verified /mnt/snapraid-content/disk3/snapraid.content in 47 seconds
    2023-01-04 03:58:13,944 [OUTPUT] Verified /mnt/snapraid-content/disk1/snapraid.content in 55 seconds
    2023-01-04 03:58:23,438 [OUTPUT] Verified /mnt/snapraid-content/disk2/snapraid.content in 64 seconds
    2023-01-04 03:58:55,547 [INFO  ] ************************************************************
    2023-01-04 03:58:55,549 [INFO  ] Running cleanup...
    2023-01-04 03:58:56,498 [INFO  ] ************************************************************
    2023-01-04 03:58:56,500 [INFO  ] All done
    2023-01-04 03:58:58,293 [INFO  ] Run finished successfully
    
     
  • David

    David - 2023-01-04

    That 300MB/s seems slow. I'm running 19 drives with 3 parity on an AMD 5 with 40GB RAM, and I'm getting 1730MB/s. Your RAM may be low, so you're hitting swap.

    I'd really recommend running at least a second parity drive. Restoring a drive is very disk-intensive, with all drives running at 100% for days, and if a drive is close to dying, doing a restore can be enough to push it over the edge. Then you'd have two drives gone with only a single parity. Just a thought.
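
    For reference, a second parity level is just a parity file on the new drive plus one extra line in snapraid.conf, something like this (the mount point below is only an example):

    2-parity /mnt/parity2/snapraid.parity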

     
    • JR A.

      JR A. - 2023-01-04

      I'm glad I'm not going insane. Unraid did calculate parity MUCH faster (about a day, which lines up with the 1000MB/s+ that I feel I should be seeing).

      And yes, I have a 2nd parity drive in the mail, being delivered next week :)

      Really trying to figure out how to troubleshoot this problem.

       
      • David

        David - 2023-01-04

        Check your swap file and memory usage when you sync. I'm guessing there's the problem.
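
        For example (standard Linux commands, nothing SnapRAID-specific):

        swapon --show    # no output means no swap device is active
        free -h          # watch "available" and the swap line while the sync runs
        vmstat 5         # the si/so columns show swap-in/swap-out activity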

         
        • JR A.

          JR A. - 2023-01-04

          Don't think it's a swap issue, swap is turned off (first thing I did was turn it off)

          Memory shows about 9.9GiB free during the sync process.

           
          • David

            David - 2023-01-04

            I'm running a sync right now and snapraid is using 21GB of memory. How are the drives connected? Have you done a speedtest on each individual drive to see if there is a bad port, cable, or socket?

             
            • JR A.

              JR A. - 2023-01-04

              Ran hdparm -tT on each of the disks and they each report 230-300MB/s reads, so I doubt it's a bad port/cable/socket.

              My SnapRAID uses about 10GB of RAM with the 13-disk configuration and the default hash/block size.

               
  • David

    David - 2023-01-04

    How are your drives connected? JBOD RAID card? Motherboard SATA? PCI SATA cards? After you receive your new parity drive, I would try this: disconnect your original parity drive and rename the content files. Edit your config file and comment out all drives except for two or three. Add the new parity drive as a single parity and run a sync, and see what the speed is. If it's still low, comment out those drives and pick three more. If it's quicker, add another drive and do another sync. Keep doing that until you hit a bottleneck.
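
    As a rough sketch (paths mirror the existing config; the new parity mount point and the fresh content filename are just examples), the cut-down snapraid.conf for the first test could look like:

    # temporary config for bisecting the bottleneck
    content /mnt/snapraid-content/disk1/snapraid.test.content   # fresh content file so the original stays untouched
    1-parity /mnt/parity2/snapraid.parity                       # new parity drive, example mount point
    data d1 /mnt/disk1
    data d2 /mnt/disk2
    data d3 /mnt/disk3
    # remaining data disks commented out for this run
    #data d4 /mnt/disk4
    #data d5 /mnt/disk5
    # ...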

     

    Last edit: David 2023-01-04
    • JR A.

      JR A. - 2023-01-05

      Sorry, I missed that part.

      Case: Supermicro SuperChassis 847BE1C4-R1K23LPB
      Backplanes: BPN-SAS3-846EL1 & BPN-SAS3-826EL1
      HBA: Supermicro AOC-S3008L-L8e (basically an LSI-9300-8i, flashed with IT firmware)

      Each backplane is connected to the HBA via a single channel (one backplane on one channel of the HBA, the other backplane on the second channel). Single channel vs dual channel per backplane reduces total throughput by about 20-30%, but the theoretical max throughput for this card is 4800MB/s IIRC, and we are nowhere near that.

      I will definitely try your suggestion (good one!) and report back (could take a while to do these tests).

      As mentioned, I moved from Unraid with the exact same config (minus the new parity drive), and a parity build there would take under 2 days (I don't remember exactly what it was, because it has been a while since my last rebuild).

      Appreciate the help!

       
      • David

        David - 2023-01-05

        That's pretty much the same hardware I have although I am using Windows.

         
      • Derek Ciesielski

        I have a pretty much identical setup, although I threw a different motherboard/CPU in there. Running SnapRAID on Ubuntu 22.04. Same SAS3 backplanes. 20+2 disks.

        I had upgraded from a Dell H310 (LSI 9211-8i) SAS2 HBA to the supermicro card, and saw my sync speeds go in the toilet, just like yours.

        I stumbled across the disk write-cache setting, and noticed all the disks under the new SAS3 HBA had it disabled (hdparm -W).

        Swapping back to the H310 card, hdparm shows all drives have write caching ENABLED.

        Sure enough, force-enabling write cache on my drives while on the SAS3 HBA returned my sync speeds to normal.

        I'm not sure why Ubuntu treats the drives on the SAS3 HBA differently. Maybe it thinks they are hot-swappable (I suppose they actually are with this chassis) and disables write caching? I never had any issues like this with the SAS2 H310 connected to the same SAS3 backplanes.

        I wondered if there are some settings in the LSI BIOS to play with, but I'm having trouble getting into it.

         

        Last edit: Derek Ciesielski 2023-02-27
  • UhClem

    UhClem - 2023-01-05

    (If you're certain that your parity HDD is CMR, then) it's possible/likely that the drive's Write Cache is disabled. Use hdparm (if SATA) or sdparm (SAS) to check (and hopefully fix).
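
    For example (the device name is a placeholder; repeat per drive, and note the setting may need to be re-applied after a reboot):

    # SATA drives: query and enable the on-drive write cache
    sudo hdparm -W /dev/sdX      # reports "write-caching = 0 (off)" if disabled
    sudo hdparm -W1 /dev/sdX     # turn write cache on
    # SAS drives: same idea via the WCE bit in the Caching mode page
    sudo sdparm --get=WCE /dev/sdX
    sudo sdparm --set=WCE /dev/sdX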

     
    • JR A.

      JR A. - 2023-01-05

      I think you may be onto something! Yes, write cache is disabled on all disks (including parity)! I will test and report back.

       
      • kyle--

        kyle-- - 2023-03-12

        Any news @jrarseneau ?

         
    • JR A.

      JR A. - 2023-01-05

      Edit: duplicate

       

      Last edit: JR A. 2023-01-05
