From: Tim G. <tj...@so...> - 2012-08-31 20:51:29
Attachments:
smartctl.txt
Hi,

I got an e-mail from smartd telling me that one of my drives has some issues. I logged into the server and ran:

smartctl -a /dev/ada30

and got a ton of output, including 5 "error" sections. But I can't tell how severe these errors are. I'm including the output below. Can someone please help me interpret this output?

--
Tim Gustafson
tj...@so...
831-459-5354
Baskin Engineering, Room 313A
From: Gabriele P. <gp...@di...> - 2012-08-31 22:46:37
Tim,

On 08/31/2012 10:43 PM, Tim Gustafson wrote:
> smartctl -a /dev/ada30
>
> and got a ton of output, including 5 "error" sections. But, I can't
> tell how severe these errors are. I'm including the output below.
> Can someone please help me interpret this output?

The health check result is "passed" and the short test completed without error, so don't worry ~

Some more hints:
http://sourceforge.net/apps/trac/smartmontools/wiki/Howto_ReadSmartctlReports_ATA_new

HTH
Gabriele
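The overall-health verdict Gabriele points at can also be checked from a script. A minimal sketch, assuming the health section has been saved to a file first; the sample text below is illustrative, not Tim's actual report:

```shell
# On the real machine you would save the section with (requires root):
#   smartctl -H /dev/ada30 > /tmp/health.txt
# Here an illustrative sample stands in for a live disk.
cat > /tmp/health.txt <<'EOF'
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
EOF

# Extract just the verdict word (PASSED or FAILED!) for scripting.
awk -F': ' '/overall-health/ {print $2}' /tmp/health.txt
```

A monitoring job could alert whenever the printed verdict is anything other than PASSED.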
From: Tim G. <tj...@so...> - 2012-08-31 22:52:42
> health check result is "passed" and
> short test completed without error,
> so don't worry ~
>
> Some more hints..
>
> http://sourceforge.net/apps/trac/smartmontools/wiki/Howto_ReadSmartctlReports_ATA_new

Thanks! I appreciate your taking the time, and I'll read up on that document.

--
Tim Gustafson
tj...@so...
831-459-5354
Baskin Engineering, Room 313A
From: Dan L. <da...@ob...> - 2012-09-01 07:20:20
On 09/01/12 00:52, Tim Gustafson wrote:
>> health check result is "passed" and
>> short test completed without error,
>> so don't worry ~

Well, I will offer a somewhat different interpretation.

Attribute 5 shows that there have been three unreadable sectors in the past (already resolved by reallocation). And they are not "manufacturing errors": the latest UNC sector was found at lifetime 5353 hours.

I can compare it with my own FreeBSD box with a Hitachi disk (although a different model), which has no reallocated sectors and no recorded errors at lifetime 13672 hours.

Tim's disk encountered several problems in the past, as recorded in the error log and attributes. I'm not saying it is "time to panic" in any way, but unless they were caused by a known event (a fall, a brownout or so), he should take a dim view of his disk.

Of course, interpretation of the output depends not only on the values shown, but also on the overall life experience and "paranoia level" of the administrator.

In short: the facts are the same, but I'm not as credulous as Gabriele in interpreting them. You need to make your own decision, Tim ...

Dan
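The attribute Dan is reading can be pulled out of a saved attribute table mechanically. A sketch with an illustrative two-row slice of `smartctl -A` output whose raw values mirror what Dan describes (three reallocated sectors), not Tim's exact report:

```shell
# Illustrative slice of a `smartctl -A /dev/ada30` attribute table.
cat > /tmp/attrs.txt <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       3
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3
EOF

# The RAW_VALUE (last column) of attribute 5 is the count Dan refers to;
# a growing raw value over time is the real warning sign.
awk '$1 == 5 {print $NF}' /tmp/attrs.txt
```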
From: Gabriele P. <gp...@di...> - 2012-09-01 12:06:34
On 09/01/2012 08:43 AM, Dan Lukes wrote:
> On 09/01/12 00:52, Tim Gustafson:
>>> health check result is "passed" and
>>> short test completed without error,
>>> so don't worry ~
> Well, I will offer somewhat interpretation.

thanks!

> Attribute [5] show that there has been three unreadable sectors in the
> past (already solved by relocation). And they are not "manufacturing
> errors" - latest UNC sector has been found at lifetime 5353h
>
> unless they has been caused by known event (fall, brownout or
> so) he should take a dim view of it's disc.

About getting a deeper view of the disk's condition: as you have only run a short test, I recommend starting a long test to check the whole disk.

"Auto Offline Data Collection is Enabled", but "Offline data collection activity was suspended by an interrupting command from host".

It announces a very long "Total time to complete Offline data collection" of 37566 seconds ~ 10.5 hours, which can also take longer if the disk is in heavy use.

Is it possible to unmount the disk to check it in captive mode?

> In short - facts are same, but I'm not as credulous as Gabrielle during
> it's interpretation.

It was intended as the first entry of a discussion. Thanks for picking it up! :-)

> You need to make your own decision, Tim ...

I would like to see some showcases about exploring a disk's condition in the wiki. That would surely be helpful for many other smartmontools users. Tell us here about your next steps and results if you like:
https://sourceforge.net/apps/trac/smartmontools/wiki/Howto_ReadSmartctlReports_ATA_542.1

You need to be logged in at SourceForge to edit the page.

All the best and cheers!
Gabriele
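Gabriele's ~10.5-hour figure comes straight from the reported 37566 seconds; the conversion can be sketched as:

```shell
# Convert the "Total time to complete Offline data collection"
# figure from the smartctl report into hours and minutes.
secs=37566
printf '%d hours, %d minutes\n' "$((secs / 3600))" "$((secs % 3600 / 60))"
```

On a busy disk the long self-test can run well past this estimate, since it yields to regular I/O.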
From: Dan L. <da...@ob...> - 2012-09-01 15:40:05
On 09/01/12 13:54, Gabriele Pohl wrote:
> to start a long test to check the whole disk.
> It announces a very long "Total time to complete
> Offline data collection" of 37566 seconds ~ 10.5 hours,
> which can also be last longer if the disk is in heavy use.
>
> Is it possible to umount the disk to check in captive mode?

But not on FreeBSD. It has no concept of "unlimited time to wait for a result". The device driver requires that the device respond to a command in time, and the timeout is not configurable from the application level. A test in captive mode results in a timeout and detach of the device from the system. An attempt to reattach may trigger a reset of the device, so the test will be interrupted.

As there is no big difference between the duration of a test in captive mode and a test in standard mode (on an idle disk), the "timeout problem" is not a big issue; just don't use captive mode on FreeBSD.

> I would like to see some show cases about exploring
> the disks condition in the wiki. That would be for sure
> helpful for many other smartmontools users.

I have some doubts. Analysis of data from one disk model has very limited applicability to another disk model, even worse across different vendors. Moreover, one-shot data has very limited usability at all; it is the progression over a longer time that is substantial.

> Tell here about your next steps and results if you like
> https://sourceforge.net/apps/trac/smartmontools/wiki/Howto_ReadSmartctlReports_ATA_542.1
>
> You need to be logged in at sourceforge to edit the page.

Unfortunately, my English is not good enough for a public page.

> SMART overall-health state
> ..missing an explanation..

It is the disk's own interpretation of its health state. The most common algorithm seen is: if no cooked value of a "pre-failure warning" type attribute is below its threshold, then the overall state is "PASS". But the true algorithm is vendor specific. It's just the vendor's opinion about the overall health state of the disk.

Dan
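Since Dan's advice boils down to "run the long test in standard (non-captive) mode", one follow-up is letting smartd schedule it periodically. A hypothetical smartd.conf entry for the device in this thread (the schedule and file location, typically /usr/local/etc/smartd.conf on FreeBSD, are assumptions for illustration):

```
# Monitor all SMART attributes (-a) and start a long self-test every
# Saturday between 03:00 and 04:00.  The -s regex fields are
# Type/Month/DayOfMonth/DayOfWeek/Hour; "L" selects the long test.
/dev/ada30 -a -s L/../../6/03
```

The test runs in the background, so the pool stays online; it just competes with regular I/O and may take longer than the quoted estimate.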
From: Tim G. <tj...@so...> - 2012-09-04 16:06:55
> As there is no big difference in duration of test in captive mode and
> test in standard mode (on idle disk) the "timeout problem" is not big
> issue - just don't use captive mode on FreeBSD.

The disk is not idle, and bringing it out of service is a bit problematic because it's a member drive of a 45-disk zpool, so I'd have to bring the whole zpool off-line to test it in any sort of offline or idle state.

It sounds like you're saying that I can run the test while the drive is "hot", but the test might take longer. I'm OK with that. But I'm not clear on what test you want me to run. Would it be possible for you to send me the command line you'd like to see the output of?

For what it's worth, this zpool ran a "zpool scrub" this past weekend without any errors, so I'm fairly confident that this may have been a one-time thing. I don't know when the failure started being reported because I didn't configure the box originally, and it wasn't until the other day that I reconfigured it to e-mail me notifications. Before that, it was only reporting bits to /var/log/messages, and I only have a week or so worth of those lying around at any given time.

The box has been on-line for about a year, but we're just getting around to storing data on it now, so it sat mostly idle for 9 months or so.

--
Tim Gustafson
tj...@so...
831-459-5354
Baskin Engineering, Room 313A
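The e-mail notification Tim mentions setting up is a one-line smartd.conf directive. A sketch, with a placeholder address (Tim's actual configuration is not shown in the thread):

```
# Monitor all attributes (-a) and mail warnings to the given address
# (-m).  "-M test" sends a single test message at smartd startup, so
# you can confirm delivery works before a real failure happens.
/dev/ada30 -a -m admin@example.com -M test
```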
From: Dan L. <da...@ob...> - 2012-09-05 15:51:52
Tim Gustafson wrote:
> It sounds like you're saying that I can run
> the test while the drive is "hot", but the test might take longer.

Exactly. Also, a running test can slightly affect the speed of the running system.

> But, I'm not clear on what test you want me to run.

-t long

Dan
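Once `smartctl -t long /dev/ada30` has been started and has finished (progress is visible via `smartctl -l selftest /dev/ada30`), the result appears in the self-test log. A sketch of checking a saved copy of that log; the two entries below are illustrative, not Tim's real log:

```shell
# Illustrative `smartctl -l selftest` output saved to a file.
cat > /tmp/selftest.txt <<'EOF'
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      5400         -
# 2  Short offline       Completed without error       00%      5353         -
EOF

# Count entries that completed cleanly; a "Completed: read failure"
# line (with an LBA in the last column) would warrant concern.
grep -c 'Completed without error' /tmp/selftest.txt
```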