From: Franc Z. <fz...@in...> - 2010-04-21 22:58:50
The POH attribute value has increased by 146 hours for the Hitachi and Samsung drives, and by 145 hours for the WD. This suggests that the WD is keeping time correctly, so I have no explanation for the discrepancy in the lifetime totals. :-(

You say that "I don't understand the POH counter and why you calculated it to be around 4½ years, whereas it should be only < 6 months". In fact, you misunderstand me. The POH counter is an age-related attribute. Its normalised value begins at 100 and is decremented as the drive grows older. When the value hits the threshold, the drive is considered to have reached the end of its rated life. In my calculations I determined, based on current trends, how many hours would be required for the POH attribute to reach the threshold. In other words, the results of my calculations represent the rated lifetime of the drives, not their current age. I presented these calculations merely to check whether the SMART data, as interpreted by us, produced sensible lifetime predictions, which they do.

A drive will unload its heads onto a loading ramp after a certain period of idle time. This could be the result of an ATA command from the OS (after a power management timeout), a command from a bridge chip in an external enclosure, or the drive's own internal APM setting. The following document describes the technology:

Ramp Load/Unload Technology in Hard Disk Drives:
http://www.hitachigst.com/tech/techlib.nsf/techdocs/9076679E3EE4003E86256FAB005825FB/$file/LoadUnload_white_paper_FINAL.pdf

The rated load/unload cycle counts for the Hitachi and WD drives appear to be 1 million and 600K, respectively.
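The extrapolation I described can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the starting value of 100, the threshold, and the decrement rate are assumptions taken from the discussion above, not values from any vendor specification:

```python
# Extrapolate a drive's rated lifetime from the trend of its normalised
# Power_On_Hours (POH) attribute. All inputs are illustrative assumptions.

def predicted_rated_lifetime_hours(start_value, current_value,
                                   threshold, hours_elapsed):
    """Linearly extrapolate the hours needed for the normalised POH
    value to decay from start_value down to threshold."""
    decayed = start_value - current_value       # points lost so far
    if decayed <= 0:
        raise ValueError("no decay observed yet; cannot extrapolate")
    hours_per_point = hours_elapsed / decayed   # current trend
    return (start_value - threshold) * hours_per_point

# Hypothetical example: normalised POH dropped from 100 to 99 over
# 146 power-on hours, with a failure threshold of 0.
print(predicted_rated_lifetime_hours(100, 99, 0, 146))  # 14600.0
```

Note that the result is the total rated lifetime in hours, not the drive's current age, which is the distinction I was making above.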
According to Samsung's datasheet, its drive is also rated for 600K "controlled ramp load/unload cycles":
http://www.samsung.com/global/business/hdd/pr/brochures/downloads/spinpoint_m7.pdf

I'm not [yet] a Linux user, but Tim Small has identified the above problem and pointed you to a solution:
https://ata.wiki.kernel.org/index.php/Known_issues#Drives_which_perform_frequent_head_unloads_under_Linux

As for your question regarding attribute names, not all HDD manufacturers use the same attribute number for the same attribute type. See the following article for an explanation:
http://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes

For example, see attributes 09, 200, and 240. The meaning of each attribute, and the way in which the raw and normalised values are calculated, vary between manufacturers, and often between models from the same manufacturer. That is, there is no real standard for SMART reporting.

I'm not certain, but your Seagate Momentus 5400.6 ST9500325AS drive appears to have "failed" because of a single Reported_Uncorrectable error. If this sector is consistently bad, then it should eventually be reallocated. Once again the Load_Cycle_Count (706769) has exceeded the rated value (600K), but this is not the reason for the failure.

Both the Raw_Read_Error_Rate and Seek_Error_Rate attributes are good. However, they appear counterintuitive. For example, your seek error rate is actually 0 errors in 0x015cc6dc seeks, i.e. a perfect score. The normalised value is calculated as follows:

normalised SER = -10 log(total lifetime errors / total lifetime seeks)

If errors = 0, then let errors = 1. So, using Google's calculator ...
http://www.google.com/search?q=-10+x+log%281+%2F+0x015cc6dc%29
... we have ...

SER = -10 x log(1 / 0x015cc6dc) = 73.6

IIRC, a new drive begins life with an SER value of 253. At this time the SER data are statistically insignificant.
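The seek error rate arithmetic above can be reproduced directly. This is a sketch of the formula exactly as given, including the convention that zero lifetime errors are treated as one:

```python
import math

def normalised_ser(errors, seeks):
    """Normalised Seek_Error_Rate per the formula above:
    -10 * log10(total lifetime errors / total lifetime seeks),
    treating 0 errors as 1 to avoid log(0)."""
    if errors == 0:
        errors = 1
    return -10 * math.log10(errors / seeks)

# 0 errors in 0x015cc6dc (22,857,436) lifetime seeks:
print(round(normalised_ser(0, 0x015CC6DC), 1))  # 73.6
```

As a cross-check, 0 errors in exactly 1 million seeks gives -10 * log10(1 / 10^6) = 60, which matches the worst-case value mentioned below.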
For example, what can you say about a drive that fails its first seek, or, for that matter, succeeds on its first seek? It is only when the drive has recorded 1 million seeks that the SER becomes meaningful. A normalised value of 60 (the worst case) corresponds to 0 errors in 1 million seeks.

As for the RRER, I have been informed by a data recovery specialist that, at least for some Seagate drives, the relationship is as follows:

Raw Error Rate = 10 * log10(NumberOfSectorsTransferredToOrFromHost * 512 * 8 / (number of sectors requiring retries))

where the factor of 512 * 8 converts sectors to bits. The attribute value is only computed when the "transferred bits" count is in the range 10^10 to 10^12 bits. I have attempted to test the above relationship against the threshold values of several Seagate models, but the numbers don't make sense.

Another reason for the "failure" of your Seagate drive may be that the NAS timed out while waiting for the drive to recover from a read error. RAID controllers will drop a drive from the array for similar reasons. Many RAIDs allow a maximum of 7 seconds for a drive to complete a read/write request, but some drives can take up to 2 minutes retrying a bad sector. To prevent a drive from being dropped from the array, some models incorporate ERC (Error Recovery Control, an ATA standard, used by Seagate), TLER (Time Limited Error Recovery, Western Digital), or CCTL (Command Completion Time Limit, Samsung). These features set time limits within which the drive must complete a read or write command.

-Franc