Re: [Smartmontools-support]smartmontools failure reporting problem


Hi Volker,

> Smartmontools has some problems reporting the failed status of this
> disk.

Please remember that Smartmontools is only *reporting* what the disk has
decided.  It's not making these judgements itself.

> The disk isn't making it easy by the looks of it, but
> smartmontools needs to IMHO pick up on failed selftests.
> The disk is throwing read-errors left right and center, filesystem is
> corrupted, and some data is lost while other data is still retrievable,
> and that s**t-disk is still reporting
>   SMART overall-health self-assessment test result: PASSED
> Yeaaaa... right. I don't think so.

In fact this is "correct" in the following sense:  The firmware is
reporting that there is nothing intrinsically wrong with the disk.  E.g. the
servo system is not failing, the motor drive is not failing, etc.

What is going wrong is that your disk has a set of bad sectors:

>   5 Reallocated_Sector_Ct   0x0033   212   212   063    Pre-fail     -       105

which is very common even on a good, normal disk.  However, there are also
(either 1 or 16) sectors:

> 197 Current_Pending_Sector  0x0008   237   237   000    Old_age      -       16
> 198 Offline_Uncorrectable   0x0008   252   252   000    Old_age      -       1

which cannot be read.  In other words, the disk has lost the data on
these 1 or 16 sectors.  However there is nothing "wrong" with the disk --
or at least the firmware thinks not.
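
To make the distinction concrete, here is a rough Python sketch (not part of
smartmontools) that reads the three attribute rows quoted above.  The column
layout is assumed to be exactly the one shown in this report; other smartctl
versions may print additional columns.  The helper names are made up for
illustration.

```python
import re

# Attribute rows copied from the quoted smartctl report.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   212   212   063    Pre-fail     -       105
197 Current_Pending_Sector  0x0008   237   237   000    Old_age      -       16
198 Offline_Uncorrectable   0x0008   252   252   000    Old_age      -       1
"""

# ID, name, flag, value, worst, threshold, type, when-failed, raw value.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+\S+\s+(?P<raw>\d+)"
)


def parse_attributes(text):
    """Return {attribute id: fields} for every row that matches ROW."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group("id"))] = {
                "name": m.group("name"),
                "value": int(m.group("value")),
                "thresh": int(m.group("thresh")),
                "raw": int(m.group("raw")),
            }
    return attrs


def unreadable_sectors(attrs):
    """Raw counts of attributes 197 and 198, if present."""
    return {attrs[i]["name"]: attrs[i]["raw"] for i in (197, 198) if i in attrs}


attrs = parse_attributes(SAMPLE)
# Every normalized value is still above its threshold, so the firmware
# has no reason to report a failure...
above_threshold = all(a["value"] > a["thresh"] for a in attrs.values())
# ...even though the raw counts show sectors that could not be read.
print(above_threshold, unreadable_sectors(attrs))
```

The point is that the PASSED verdict depends on the normalized values versus
the thresholds, while the lost sectors only show up in the raw counts.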

> SMART Self-test log, version number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short off-line      Completed: read failure       60%      2104         0x00007d94
> # 2  Extended off-line   Completed: read failure       40%      2104         0x00007d94
> # 3  Extended off-line   Completed: read failure       40%      2103         0x00007d94
> # 4  Short off-line      Completed: read failure       60%      2103         0x00007d94
> # 5  Short off-line      Completed: read failure       60%      2103         0x00007d94

This LBA 0x00007d94 is where the data has been lost.
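
If you want to pull that address out of the log automatically, something
like the following Python sketch would do.  It assumes the self-test log
layout quoted above (LBA as the last hex field on a "read failure" line);
the function name is hypothetical.

```python
import re

# Self-test log entries copied from the quoted report.
LOG = """\
# 1  Short off-line      Completed: read failure       60%      2104         0x00007d94
# 2  Extended off-line   Completed: read failure       40%      2104         0x00007d94
# 3  Extended off-line   Completed: read failure       40%      2103         0x00007d94
# 4  Short off-line      Completed: read failure       60%      2103         0x00007d94
# 5  Short off-line      Completed: read failure       60%      2103         0x00007d94
"""

# The LBA of the first error is the trailing hex number on a failed entry.
FAILED = re.compile(r"read failure.*\s(0x[0-9a-fA-F]+)\s*$")


def failing_lbas(text):
    """Return the distinct failing LBAs as sorted integers."""
    found = set()
    for line in text.splitlines():
        m = FAILED.search(line)
        if m:
            found.add(int(m.group(1), 16))
    return sorted(found)


for lba in failing_lbas(LOG):
    print(hex(lba), lba)
```

All five tests stop at the same address, which is 32148 in decimal.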

> Ok, let's see whether smartd would actually ring the alarm bells:

It will, if you run a self-test while smartd is running in the background.  
When the self-test finds the error, you'll get a report.

> # running smartctl -t short here, which fails after 60% with read error
> #
> Sep 19 20:21:37 Rescue smartd[746]: Signal USR1 - checking devices now rather than in 929 seconds. 
> Sep 19 20:21:37 Rescue smartd[746]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 251 to 250 
> Sep 19 20:21:37 Rescue smartd[746]: Device: /dev/hda, SMART Usage Attribute: 209 Unknown_Attribute changed from 193 to 192 
> Sep 19 20:21:37 Rescue smartd[746]: Device: /dev/hda, Self-Test Log error count increased from 5 to 6 
> 
> Not really. smartd doesn't tell me that this disk is essentially already
> dead. Especially, it should pick up on

In fact the disk is not "dead", in the sense that if it can be told to
reallocate the bad sectors, it should work OK again.

Please remember that this is not *my* choice of logic -- it's only what
the disk firmware has decided to do.

You should try running the Maxtor MaxSafe utility -- it may be able to
repair the disk.  You should also be able to use some file system recovery
tools to determine what file(s) live at the LBA above.
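
The first step in that search is usually arithmetic: converting the disk
LBA into a block number inside the file system.  A minimal Python sketch
follows.  The partition start (sector 63) and the 4096-byte block size are
purely hypothetical values, not taken from your report; read the real ones
from your partition table and file system before relying on the result.
The 512-byte sector size is also an assumption.

```python
SECTOR_BYTES = 512  # assumed sector size


def lba_to_fs_block(lba, partition_start_lba, fs_block_bytes):
    """Map an absolute disk LBA to a block number within a partition.

    partition_start_lba: first sector of the partition.
    fs_block_bytes: file system block size in bytes.
    """
    if lba < partition_start_lba:
        raise ValueError("LBA lies before the start of this partition")
    offset_bytes = (lba - partition_start_lba) * SECTOR_BYTES
    return offset_bytes // fs_block_bytes


bad_lba = 0x00007d94           # from the self-test log
hypothetical_start = 63        # assumption: partition begins at sector 63
hypothetical_block = 4096      # assumption: 4 KiB file system blocks

print(lba_to_fs_block(bad_lba, hypothetical_start, hypothetical_block))
```

With the block number in hand, the file system's own debugging tools can
usually tell you which file, if any, owns that block.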

>   Self-test execution status:    ( 118)	The previous self-test completed having
> 					the read element of the test failed.
> 
> This leaves me with a question: smartd doesn't run any self-tests. Am I
> supposed to set up cron jobs for that? It would be more sensible for
> smartd to take care of it.

Correct -- smartd does NOT run self-tests.  I will probably add an option
to it to run short/long self-tests at regular intervals.
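
Until then, a small wrapper started from cron is one workable stopgap.  The
Python sketch below only builds and launches the same "smartctl -t" command
you ran by hand; the function names and the injectable runner argument are
made up for illustration, and the device name is whatever applies on your
machine.

```python
import subprocess


def self_test_command(device, kind="short"):
    """Build the smartctl command line that starts a self-test."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    return ["smartctl", "-t", kind, device]


def start_self_test(device, kind="short", runner=subprocess.run):
    """Start the self-test; 'runner' can be replaced for dry runs."""
    return runner(self_test_command(device, kind))


# Dry run: print the command instead of executing it.
start_self_test("/dev/hda", "short", runner=print)
```

Scheduling such a script, say, once a week while smartd runs in the
background means the self-test log error count gets checked on smartd's
next polling cycle.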

Cheers,
	Bruce
