From: Jeremy J. <jb...@fo...> - 2006-04-29 17:40:51
Johan Braeken wrote:
> In the output below you can clearly see these tests did fail and there are
> several errors.
> My question is why the "overall-health self-assessment test result" is OK when
> there seem to be problems with the disk?

The usual response to this is that smartctl is not responsible for interpreting the data from your disk. It is simply giving you the data that your disk reports, which in this case claims that the self-assessment test passed.

> === START OF INFORMATION SECTION ===
> Device Model: Maxtor 6Y080L0

From experience (approx 10 failed disks out of a batch of 30), I wouldn't use Maxtor disks for important situations. If you have to get them, get the Heavy Duty models, and make sure they've got a 5 year warranty on them. I tend to rely on Seagates these days, but it's too early to know how the failure rate compares.

If you want to make use of smartmontools, you need to interpret the data for your own use. The following should be flagged up by any script running on the output.

> Self-test execution status: ( 118) The previous self-test completed having
> the read element of the test failed.

Read failures during self-tests aren't good.

> 5 Reallocated_Sector_Ct 0x0033 253 241 063 Pre-fail Always - 4

This could indicate that there may be problems lurking. Maxtor have replaced disks for me in the past when they showed any reallocated sectors.

> 196 Reallocated_Event_Count 0x0008 135 135 000 Old_age Offline - 118

Ditto - this is quite a high number of reallocations that have been done. Best if it's still at zero...

> 198 Offline_Uncorrectable 0x0008 252 126 000 Old_age Offline - 1

Bad - you've probably lost data here.

> ATA Error Count: 141 (device log contains only the most recent five errors)

Events are generally bad news, but check in case they are benign ones (CRC fails caused by dodgy cabling etc).
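As a rough illustration of flagging the attributes quoted above, here is a minimal Python sketch. It is not the script I actually run. The function name `flag_attributes`, the `WATCHED` table and the rule "any non-zero raw value gets flagged" are assumptions made for the example, and it assumes the usual ten-column attribute table with the raw value in the last column.

```python
# Attribute IDs singled out in this message. Treating any non-zero raw
# value as worth flagging is an assumption, not a smartmontools rule.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    196: "Reallocated_Event_Count",
    198: "Offline_Uncorrectable",
}


def flag_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes that are non-zero."""
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows look like:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[fields[1]] = int(raw)
    return flagged


# The three attribute rows quoted in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   253   241   063    Pre-fail  Always       -       4
196 Reallocated_Event_Count 0x0008   135   135   000    Old_age   Offline      -       118
198 Offline_Uncorrectable   0x0008   252   126   000    Old_age   Offline      -       1
"""

if __name__ == "__main__":
    for name, raw in flag_attributes(SAMPLE).items():
        print("FLAG: %s raw value is %d" % (name, raw))
```

Which attributes to watch, and at what raw value to start worrying, is a judgment call per drive model; the table is only a starting point.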
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Short offline Completed: read failure 60% 20861 34209730

Read failures are not good. Be wary of any disk that has one of these.

So, ultimately you would be better off getting Nagios to check the output of a script that pretty much fails if the return code from smartctl is non-zero (which it will be if there are any errors in the event or self-test log).

I use something like [1] that mails me from a cronjob, but something similar could be done to interface with Nagios or any other checking software.

Best wishes,

Jeremy

[1] http://jeremy.publication.org.uk/checkdisks (python script, quite old, but should be obvious what's going on)
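For anyone wanting a starting point, the return-code check described above could be sketched roughly like this in Python. This is not the script at [1]. The bit meanings follow the exit-status section of the smartctl man page; the names `decode_status`, `nagios_state` and `check_disk`, and the policy of mapping every non-zero status to CRITICAL (or UNKNOWN when smartctl could not parse its arguments or open the device), are assumptions made for the example.

```python
import subprocess

# smartctl exit status is a bitmask; meanings as given in the smartctl man page.
SMARTCTL_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART command failed or a checksum error was found",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def decode_status(status):
    """List the human-readable meaning of every bit set in the exit status."""
    return [msg for bit, msg in sorted(SMARTCTL_BITS.items()) if status & (1 << bit)]


def nagios_state(status):
    """Map a smartctl exit status to a Nagios state (example policy only)."""
    if status == 0:
        return OK
    if status & 0b11:
        # smartctl could not run properly or could not open the device.
        return UNKNOWN
    return CRITICAL


def check_disk(device):
    """Run 'smartctl -a' on a device and return (nagios_state, problems)."""
    proc = subprocess.run(
        ["smartctl", "-a", device],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return nagios_state(proc.returncode), decode_status(proc.returncode)
```

A wrapper would call `check_disk()` with the device path, print one status line built from the returned problems, and exit with the returned state so that Nagios (or a cronjob that mails on non-zero exit) picks it up. A disk like the one in this thread would be expected to set at least the error-log and self-test-log bits.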