
Ubuntu server crashing/rebooting during sync

Help
2016-06-26
2016-07-10
  • Brendan Wilson

    Brendan Wilson - 2016-06-26

    I'm running Ubuntu 14.04 LTS and snapraid v10. When I try to sync, it starts up with no issues and gets to about 30% before the server crashes and reboots. I'm not sure how long it's been doing this, because it just reboots and everything starts up again as if nothing is wrong.

    My setup is 2 parity and 4 data all 4tb drives.

    Snapraid status shows

    SnapRAID status report:
    
       Files Fragmented Excess  Wasted  Used    Free  Use Name
                Files  Fragments  GB      GB      GB
       43849     272     426    35.8    1632    2268  41% d1
      393969     266     368   175.8    3565     255  93% d2
       27970     600     930    92.4    3499     406  89% d3
        6238     348     840    91.8    3412     496  87% d4
     --------------------------------------------------------------------------
      472026    1486    2564   395.8   12109    3426  78%
    
    
     20%|                o
        |     oo         *
        |     **         *
        |     **         *
        |     **         *
        |     **         *
        |     **         *
     10%|     **         *
        |     **         *
        |o    **       o *
        |*    **       * *
        |* o  **       * *                   *                                o
        |* *  **       * *                   *                                o
        |* *  ** o     * *                   *                  *             o
      0%|*o*_o**_*oo_oo*_*___________________*__________________*_____________o
        52                    days ago of the last scrub/sync                 0
    
    The oldest block was scrubbed 52 days ago, the median 47, the newest 0.
    
    WARNING! The array is NOT fully synced.
    You have a sync in progress at 88%.
    The 9% of the array is not scrubbed.
    No file has a zero sub-second timestamp.
    No rehash is in progress or needed.
    No error detected.
    

    I've checked htop and snapraid isn't running a sync, so I'm not sure what "in progress" refers to.

    I've done a memtest and 8 passes showed no errors. I've monitored CPU and memory stats while the sync runs and nothing out of the ordinary appears there. syslog doesn't flag any errors. I'm using the zackreed.me script in crontab every morning. I stopped getting emails, but I just assumed the emails were broken again rather than that the whole machine was rebooting.

    Is there anything else I can look for? I just turned on the autosave feature at 250GB to see if I can actually get a full sync done across reboots but this is obviously not ideal.

    Thanks in advance.
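
As a rough illustration of the autosave setting mentioned above: SnapRAID reads an `autosave` option (a size in GB) from its configuration file. The sketch below only reports what is currently configured. The function name `autosave_setting` and the path `/etc/snapraid.conf` are assumptions for illustration, not something taken from this thread.

```shell
# Hypothetical helper: print the autosave value (in GB) from a snapraid.conf.
autosave_setting() {
    # $1: path to the config file (commonly /etc/snapraid.conf; an assumption)
    # Matches only an uncommented "autosave N" line; prints nothing if unset.
    awk '$1 == "autosave" { print $2 }' "$1"
}

# Example (commented out so the block has no side effects):
# autosave_setting /etc/snapraid.conf
```

With `autosave 250` in the config file, the helper would print `250`; no output suggests the option is commented out or missing.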

     
  • Steve Miles

    Steve Miles - 2016-06-27

    Hi

    'You have a sync in progress at 88%' means a sync was started but not completed, not that one is running right now. It's confusing, yes; it got me at first too.

    Run a snapraid sync with the following

    sudo snapraid -v -l /root/snaplog.log sync

    If it reboots, open the log file and post the point where the reboot happens here. This may give Andrea Mazzoleni the information he needs to understand the problem.

    I take it all your disks are healthy? I've never faced this issue, and I also use Ubuntu.
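
A minimal sketch of reading the end of that log once the machine is back up. The helper name `show_log_tail` and the default of 20 lines are assumptions; the log path is the one used in the command above.

```shell
# Hypothetical helper: show the last lines written to a log before the crash.
show_log_tail() {
    # $1: log file, $2: number of lines to show (defaults to 20)
    tail -n "${2:-20}" "$1"
}

# Example (commented out; the file is root-owned in the command above):
# sudo tail -n 20 /root/snaplog.log
```

The last entries should roughly indicate what the sync was doing when the machine went down, though a hard reset can lose the final lines that were still buffered.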

     
  • Andrea Mazzoleni

    Hi Brendan,

    When the system reboots or crashes, it's clearly a hardware issue. SnapRAID never reboots the system.

    Try checking your syslog for any hint of the possible issue.
    Also check the HD cabling and ensure that your power supply can sustain all the HDs spinning.

    Ciao,
    Andrea
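
One way to look for such hints is to search the system logs for hardware-related messages. The sketch below is only an illustration: the function name `scan_hw_errors` is made up, and the pattern list is an assumption covering a few common kernel messages, not an exhaustive set.

```shell
# Hypothetical helper: list log lines that look hardware-related.
scan_hw_errors() {
    # $1: log file to scan, e.g. /var/log/syslog or /var/log/kern.log
    # -i: case-insensitive, -n: show line numbers, -E: extended regex
    grep -inE 'machine check|mce:|hardware error|i/o error|ata[0-9]+.*(error|failed)' "$1"
}

# Example (commented out; log paths are the usual Ubuntu locations):
# scan_hw_errors /var/log/syslog
# scan_hw_errors /var/log/kern.log
```

Rotated logs such as `/var/log/syslog.1` are worth checking as well, since the entries just before a reboot may already have been rotated out of the current file.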

     
  • Brendan Wilson

    Brendan Wilson - 2016-07-07

    Thanks guys. I am currently investigating a Hardware MCE error I found in syslog and believe it to be a failing HDD.

     
    • rubylaser

      rubylaser - 2016-07-07

      I would definitely check the RAM and stress test the rest of the system (CPU temps, PSU is working, clean/test all system fans, etc.). It's unlikely a failing disk would cause a reboot and then come back up clean after a restart.

       
      • Brendan Wilson

        Brendan Wilson - 2016-07-10

        I've since run a memtest in both slots and found no errors. I ran cpuburn and saw no increase in temperatures and experienced no instabilities. The attached error message is the reason I started to suspect a HDD issue. When I looked into that one I found this, which was actually solved by changing the PSU, so I checked the BIOS to see if my 12V rail was showing low voltages, but it all appeared normal. (I don't have a spare PSU, so I'm trying out other possible causes before committing to this fix.)

        At this point I wanted to check all components for physical issues so I pulled everything out and reassembled it all. There was nothing out of order that I could see.

        That forum also mentioned a bad cable or port, so I replaced the cable and swapped the ports around, but the error above kept following the same drive, so I ruled that out. I've just dumped all the files from that drive to a spare and am running a sync, and so far so good. It used to always crash at the same percentage, and with the spare drive it's gotten much further. I'll update here once I know anything more.

        UPDATE: Replacing the problematic HDD appears to have resolved my issue. I just performed a full sync with no crashes. Thanks for taking time to help me fix this everyone.
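
For anyone hitting a similar problem, a drive's SMART attributes can offer a second opinion on a suspect disk. This is a sketch and not something used in this thread: it assumes smartmontools is installed, the function name `smart_suspect_attrs` is made up, and `/dev/sdX` is a placeholder for the actual device.

```shell
# Hypothetical filter: read `smartctl -A` output on stdin and print the name
# and raw value of attributes 5 (Reallocated_Sector_Ct),
# 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable)
# when the raw value is non-zero.
smart_suspect_attrs() {
    # Column 1 is the attribute ID, column 2 the name, column 10 the raw value.
    awk '$1 ~ /^(5|197|198)$/ && $10 + 0 > 0 { print $2, $10 }'
}

# Example (commented out; /dev/sdX is a placeholder device name):
# sudo smartctl -A /dev/sdX | smart_suspect_attrs
```

No output means those three counters are zero, which does not prove the drive is healthy. Non-zero values are usually a reason to look more closely at that disk.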

         

        Last edit: Brendan Wilson 2016-07-10
