Re: [smartmontools-support]HD failure not detected.


Johan Braeken wrote:
> In the output below you can clearly see these tests did fail and there are 
> several errors.
> My question is why the "overall-health self-assessment test result" is OK when 
> there seem to be problems with the disk?

The usual response to this is that smartctl is not responsible for 
interpreting the data from your disk. It simply gives you the data that 
your disk reports, and in this case the disk itself claims that the 
self-assessment test passed.

> === START OF INFORMATION SECTION ===
> Device Model:     Maxtor 6Y080L0

From experience (approx 10 failed disks out of a batch of 30), I 
wouldn't use Maxtor disks for anything important. If you have to get 
them, get the Heavy Duty models, and make sure they've got a 5 year 
warranty on them. I tend to rely on Seagates these days, but it's too 
early to know how the failure rate compares.

If you want to make use of smartmontools, you need to interpret the data 
for your own use. The following should be flagged up by any script 
running on the output.

> Self-test execution status:      ( 118) The previous self-test completed having
>                                         the read element of the test failed.

Read failures during self-tests aren't good.

>   5 Reallocated_Sector_Ct   0x0033   253   241   063    Pre-fail  Always       -       4

This could indicate that there are problems lurking. In the past Maxtor 
have replaced disks for me that had any reallocated sectors at all.

> 196 Reallocated_Event_Count 0x0008   135   135   000    Old_age   Offline      -       118

Ditto - this is quite a high number of reallocations that have been 
done. Best if it's still at zero...

> 198 Offline_Uncorrectable   0x0008   252   126   000    Old_age   Offline      -       1

Bad - you've probably lost data here.
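
As a rough illustration of flagging the three attributes above, here is a 
minimal Python sketch. It assumes the attribute table has the ten-column 
layout shown in the quoted output, and it treats any non-zero raw value 
for IDs 5, 196 and 198 as suspicious. That rule, and the names 
WATCHED_IDS, ATTR_ROW and flag_attributes, are my own choices for the 
example, not something smartmontools provides.

```python
import re

# Attribute IDs singled out above. Treating any non-zero raw value as a
# problem is an assumption made for this sketch, not a smartmontools rule.
WATCHED_IDS = {5, 196, 198}

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, then the raw value (only its leading digits are captured).
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def flag_attributes(report):
    """Return (id, name, raw) for watched attributes whose raw value is > 0."""
    hits = []
    for line in report.splitlines():
        match = ATTR_ROW.match(line)
        if not match:
            continue  # not an attribute row
        attr_id, name, raw = int(match.group(1)), match.group(2), int(match.group(3))
        if attr_id in WATCHED_IDS and raw > 0:
            hits.append((attr_id, name, raw))
    return hits


# The three rows quoted in this message.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   253   241   063    Pre-fail  Always       -       4
196 Reallocated_Event_Count 0x0008   135   135   000    Old_age   Offline      -       118
198 Offline_Uncorrectable   0x0008   252   126   000    Old_age   Offline      -       1
"""

for attr_id, name, raw in flag_attributes(SAMPLE):
    print(f"WARNING: attribute {attr_id} ({name}) has raw value {raw}")
```

Fed the full text that smartctl prints, the function skips every line 
that is not an attribute row, so it should not need the table cut out 
first.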

> ATA Error Count: 141 (device log contains only the most recent five errors)

Events are generally bad news, but check in case they are benign ones 
(CRC fails caused by dodgy cabling etc).

> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed: read failure       60%     20861         34209730

Read failures are not good. Be wary of any disk that has one of these.
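
A script can pick failed self-tests out of the log in much the same way. 
The sketch below assumes each log row looks like the one quoted above 
(test number, description, status, percentage remaining, lifetime hours, 
LBA) and treats any status containing the word "failure" as a failed 
test. That substring check is a simplification of mine, and TEST_ROW and 
failed_self_tests are names made up for the example.

```python
import re

# A log row: "# <n>", description, status, "NN%" remaining, lifetime
# hours, LBA of first error ("-" when there was none). Description and
# status are assumed to be separated by at least two spaces.
TEST_ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def failed_self_tests(log):
    """Return (num, description, status, hours, lba) for failed self-tests."""
    failures = []
    for line in log.splitlines():
        match = TEST_ROW.match(line)
        if not match:
            continue  # header line or unrelated text
        num, desc, status, _remaining, hours, lba = match.groups()
        # Assumption: every failing status contains the word "failure".
        if "failure" in status.lower():
            failures.append((int(num), desc, status, int(hours), lba))
    return failures


# The header and row quoted in this message.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       60%     20861         34209730
"""

for num, desc, status, hours, lba in failed_self_tests(SAMPLE_LOG):
    print(f"self-test #{num} ({desc}) at {hours}h: {status}, first bad LBA {lba}")
```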

So, ultimately you would be better off getting Nagios to check the 
output of a script that pretty much fails if the return code from 
smartctl is non-zero (which it will be if there are any errors in the 
event or self-test log). I use something like [1] that mails me from a 
cronjob, but something similar could be done to interface with Nagios or 
any other checking software.
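
To make the return-code idea concrete: the smartctl exit status is a 
bitmask, and the man page documents what each of the eight bits means. 
The sketch below decodes it and maps it onto the usual Nagios plugin 
exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). The mapping is my own 
assumption in the "anything non-zero fails" spirit described above, and 
decode, nagios_state, main and the /dev/hda default are illustrative 
choices only. This is not the script at [1].

```python
import subprocess
import sys

# Meaning of each exit-status bit, paraphrased from the smartctl man page.
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART command failed or a checksum error was found",
    3: "SMART status reports DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "self-test log contains records of errors",
}


def decode(status):
    """List the documented reasons encoded in a smartctl exit status."""
    return [text for bit, text in EXIT_BITS.items() if status & (1 << bit)]


def nagios_state(status):
    """Map a smartctl exit status to (nagios_code, label). Assumed policy."""
    if status == 0:
        return 0, "OK"
    if status & 0b11:  # bits 0-1: smartctl could not even query the disk
        return 3, "UNKNOWN"
    return 2, "CRITICAL"  # any other bit is treated as a failure


def main(argv):
    device = argv[1] if len(argv) > 1 else "/dev/hda"  # hypothetical default
    try:
        proc = subprocess.run(
            ["smartctl", "-a", device],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError as exc:  # e.g. smartctl is not installed
        print(f"SMART UNKNOWN - {exc}")
        return 3
    code, label = nagios_state(proc.returncode)
    detail = "; ".join(decode(proc.returncode)) or "no problems reported"
    print(f"SMART {label} - {device}: {detail}")
    return code
```

A wrapper would finish with sys.exit(main(sys.argv)). For example, a 
status of 192 has bits 6 and 7 set, so decode(192) names the error log 
and the self-test log, and nagios_state(192) comes out as CRITICAL.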

Best wishes,
Jeremy

[1] http://jeremy.publication.org.uk/checkdisks (python script, quite 
old, but should be obvious what's going on)
