I'm running Ubuntu 14.04 LTS and SnapRAID v10. When I run a sync, it starts without issue and gets to about 30% before the machine crashes and reboots. I'm not sure how long it's been doing this, because it just reboots and everything starts up again as if nothing is wrong.
My setup is 2 parity and 4 data drives, all 4 TB.
snapraid status shows:
SnapRAID status report:
Files   Fragmented files  Excess fragments  Wasted (GB)  Used (GB)  Free (GB)  Use  Name
43849   272               426               35.8         1632       2268       41%  d1
393969  266               368               175.8        3565       255        93%  d2
27970   600               930               92.4         3499       406        89%  d3
6238    348               840               91.8         3412       496        87%  d4
----------------------------------------------------------------------------------------
472026  1486              2564              395.8        12109      3426       78%
[ASCII histogram omitted: y-axis 0% to 20%, x-axis days since the last scrub/sync, from 52 days ago to 0.]
The oldest block was scrubbed 52 days ago, the median 47, the newest 0.
WARNING! The array is NOT fully synced.
You have a sync in progress at 88%.
The 9% of the array is not scrubbed.
No file has a zero sub-second timestamp.
No rehash is in progress or needed.
No error detected.
I've checked htop and snapraid isn't running a sync, so I'm not sure what "in progress" refers to.
I've run memtest, and 8 passes showed no errors. I've monitored CPU and memory stats during a sync and nothing out of the ordinary shows up. syslog doesn't flag any errors. I run the zackreed.me script from crontab every morning. I stopped getting its emails, but I assumed the emails were broken again rather than the whole machine rebooting.
Is there anything else I can look for? I just turned on the autosave feature at 250 GB to see if I can get a full sync done across reboots, but this is obviously not ideal.
Thanks in advance.
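For anyone following along, the autosave setting mentioned above is a single line in the SnapRAID configuration file. This is only a sketch: the file path is an assumption (it is the usual default on Linux), and the value simply mirrors the 250 GB from the post.

```
# /etc/snapraid.conf (path assumed; use wherever your config lives)
# Save the sync state after every 250 GB processed, so an interrupted
# sync can resume from the last saved point instead of starting over.
autosave 250
```

A smaller value saves state more often, so less work is lost per crash.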
Hi,
"You have a sync in progress at 88%" means a sync was started but not completed, not that one is running right now. Confusing, yes; it got me at first.
Run a sync with verbose output and a log file:
sudo snapraid -v -l /root/snaplog.log sync
If it reboots, open the log file and post the point where the reboot happens. This may give Andrea Mazzoleni the information he needs to understand the problem.
I take it all your disks are healthy? I've never faced this issue and I also use Ubuntu.
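Once the machine is back up, the last lines of that log are the part worth posting. A minimal sketch of how to pull them out, assuming the log path from the command above; the helper name `show_last_log_lines` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the last lines SnapRAID logged before the crash.
# $1 = log file, $2 = number of lines to show (defaults to 20).
show_last_log_lines() {
    log_file="$1"
    count="${2:-20}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    tail -n "$count" "$log_file"
}

# Example (the log lives under /root, so run the script as root):
#   show_last_log_lines /root/snaplog.log 40
```

After a hard reboot the final few lines may never have been flushed to disk, so treat the end of the log as "at least this far" rather than the exact crash point.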
Hi Brendan,
When the system reboots or crashes, it's clearly a hardware issue. SnapRAID never reboots the system.
Check your syslog for any hint of the possible issue.
Also check the HD cabling and make sure your power supply can sustain all the HDs spinning.
Ciao,
Andrea
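One way to act on the syslog suggestion is to filter the log for lines that often accompany hardware faults. This is a rough sketch, not an official or exhaustive check: the helper name `find_hw_errors` and the pattern list are assumptions, and `/var/log/syslog` is the usual location on Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: list log lines that commonly accompany hardware faults.
# The pattern list below is an assumption, not a complete set:
#   - machine check / hardware error messages from the kernel
#   - ATA exceptions or errors on a port (e.g. "ata3.00: exception ...")
#   - block-layer I/O errors
# $1 = log file to scan. Prints matching lines prefixed with line numbers.
find_hw_errors() {
    grep -n -i -E 'machine check|hardware error|ata[0-9.]+: (exception|error)|i/o error' "$1"
}

# Example:
#   find_hw_errors /var/log/syslog
#   find_hw_errors /var/log/syslog.1   # rotated log, in case the crash predates rotation
```

An empty result does not prove the hardware is fine, since a sudden reset can happen before the kernel manages to log anything.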
Thanks guys. I am currently investigating a Hardware MCE error I found in syslog and believe it to be a failing HDD.
I would definitely check the RAM and stress test the rest of the system (CPU temps, PSU is working, clean/test all system fans, etc.). It's unlikely a failing disk would cause a reboot and then come back up clean after a restart.
I've since run memtest in both slots and found no errors. I ran cpuburn and saw no increase in temperatures and no instability. The attached error message is the reason I started to suspect an HDD issue. When I looked into that error, I found this, which was actually solved by changing the PSU, so I checked the BIOS to see if my 12V rail was showing low voltages, but it all appeared normal. (I don't have a spare PSU, so I'm trying out other possible causes before committing to this fix.)
At this point I wanted to check all components for physical issues, so I pulled everything out and reassembled it all. There was nothing out of order that I could see.
That forum also mentioned a bad cable or port, so I replaced the cable and swapped the ports around, but the error above kept following the same drive, so I ruled that out. I've just dumped all the files from that drive to a spare and am running a sync, and so far so good. It used to always crash at the same percentage, and with the spare drive it's gotten much further. I'll update here once I know anything more.
UPDATE: Replacing the problematic HDD appears to have resolved my issue. I just performed a full sync with no crashes. Thanks, everyone, for taking the time to help me fix this.
Last edit: Brendan Wilson 2016-07-10