From: Franc Z. <fz...@in...> - 2010-04-21 22:58:50
The POH attribute value has increased by 146 hours for the Hitachi and Samsung drives, and by 145 hours for the WD. This suggests that the WD is keeping time correctly, so I have no explanation for the discrepancy in the lifetime totals. :-(

You say that "I don't understand the POH counter and why you calculated it to be around 4½ years, whereas it should be only < 6 months". In fact, you misunderstand me. The POH counter is an age-related attribute. Its normalised value begins at 100 and is decremented as the drive grows older. When the value hits the threshold, the drive is considered to have reached the end of its rated life. In my calculations I determined, based on current trends, how many hours would be required for the POH attribute to reach the threshold. In other words, the results of my calculations represent the rated lifetime of the drives, not their current age. I presented these calculations merely to check whether the SMART data, as interpreted by us, produced sensible lifetime predictions, which they do.

A drive will unload its heads onto a loading ramp after a certain period of idle time. This could be the result of an ATA command from the OS (after a power management timeout), a command from a bridge chip in an external enclosure, or the drive's own internal APM setting. The following document describes the technology:

Ramp Load/Unload Technology in Hard Disk Drives:
http://www.hitachigst.com/tech/techlib.nsf/techdocs/9076679E3EE4003E86256FAB005825FB/$file/LoadUnload_white_paper_FINAL.pdf

The rated load/unload cycle counts for the Hitachi and WD drives appear to be 1 million and 600K, respectively.
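The extrapolation I described can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the starting value of 100, the threshold, and the decrement rate are assumptions taken from the discussion above, not values from any vendor specification:

```python
# Extrapolate a drive's rated lifetime from the trend of its normalised
# Power_On_Hours (POH) attribute. All inputs are illustrative assumptions.

def predicted_rated_lifetime_hours(start_value, current_value,
                                   threshold, hours_elapsed):
    """Linearly extrapolate the hours needed for the normalised POH
    value to decay from start_value down to threshold."""
    decayed = start_value - current_value       # points lost so far
    if decayed <= 0:
        raise ValueError("no decay observed yet; cannot extrapolate")
    hours_per_point = hours_elapsed / decayed   # current trend
    return (start_value - threshold) * hours_per_point

# Hypothetical example: normalised POH dropped from 100 to 99 over
# 146 power-on hours, with a failure threshold of 0.
print(predicted_rated_lifetime_hours(100, 99, 0, 146))  # 14600.0
```

Note that the result is the total rated lifetime in hours, not the drive's current age, which is the distinction I was making above.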
According to Samsung's datasheet, its drive is also rated for 600K "controlled ramp load/unload cycles":
http://www.samsung.com/global/business/hdd/pr/brochures/downloads/spinpoint_m7.pdf

I'm not [yet] a Linux user, but Tim Small has identified the above problem and pointed you to a solution:
https://ata.wiki.kernel.org/index.php/Known_issues#Drives_which_perform_frequent_head_unloads_under_Linux

As for your question regarding attribute names, not all HDD manufacturers use the same attribute number for the same attribute type. See the following article for an explanation:
http://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes

For example, see attributes 09, 200, and 240. The meaning of each attribute, and the way in which the raw and normalised values are calculated, vary between manufacturers, and often between models from the same manufacturer. That is, there is no real standard for SMART reporting.

I'm not certain, but your Seagate Momentus 5400.6 ST9500325AS drive appears to have "failed" because of a single Reported_Uncorrectable error. If this sector is consistently bad, then it should eventually be reallocated. Once again the Load_Cycle_Count (706769) has exceeded the rated value (600K), but this is not the reason for the failure.

Both the Raw_Read_Error_Rate and Seek_Error_Rate attributes are good. However, they appear counterintuitive. For example, your seek error rate is actually 0 errors in 0x015cc6dc seeks, i.e. a perfect score. The normalised value is calculated as follows:

normalised SER = -10 log(total lifetime errors / total lifetime seeks)

If errors = 0, then let errors = 1. So, using Google's calculator ...
http://www.google.com/search?q=-10+x+log%281+%2F+0x015cc6dc%29
... we have ...

SER = -10 x log(1 / 0x015cc6dc) = 73.6

IIRC, a new drive begins life with an SER value of 253. At this time the SER data are statistically insignificant.
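The seek error rate arithmetic above can be reproduced directly. This is a sketch of the formula exactly as given, including the convention that zero lifetime errors are treated as one:

```python
import math

def normalised_ser(errors, seeks):
    """Normalised Seek_Error_Rate per the formula above:
    -10 * log10(total lifetime errors / total lifetime seeks),
    treating 0 errors as 1 to avoid log(0)."""
    if errors == 0:
        errors = 1
    return -10 * math.log10(errors / seeks)

# 0 errors in 0x015cc6dc (22,857,436) lifetime seeks:
print(round(normalised_ser(0, 0x015CC6DC), 1))  # 73.6
```

As a cross-check, 0 errors in exactly 1 million seeks gives -10 * log10(1 / 10^6) = 60, which matches the worst-case value mentioned below.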
For example, what can you say about a drive that fails its first seek, or, for that matter, succeeds on its first seek? It is only when the drive has recorded 1 million seeks that the SER becomes meaningful. A normalised value of 60 (the worst case) corresponds to 0 errors in 1 million seeks.

As for the RRER, I have been informed by a data recovery specialist that, at least for some Seagate drives, the relationship is as follows:

Raw Error Rate = 10 * log10(NumberOfSectorsTransferredToOrFromHost * 512 * 8 / (number of sectors requiring retries))

where the factor of 512 * 8 converts sectors to bits. The attribute value is only computed when the "transferred bits" count is in the range 10^10 to 10^12 bits. I have attempted to test the above relationship against the threshold values of several Seagate models, but the numbers don't make sense.

Another reason for the "failure" of your Seagate drive may be that the NAS timed out while waiting for the drive to recover from a read error. RAID controllers will drop a drive from the array for similar reasons. Many RAIDs allow a maximum of 7 seconds for a drive to complete a read/write request, but some drives can take up to 2 minutes retrying a bad sector. To prevent a drive from being dropped from the array, some models incorporate ERC (Error Recovery Control, an ATA standard, used by Seagate), TLER (Time Limited Error Recovery, Western Digital), or CCTL (Command Completion Time Limit, Samsung). These features set time limits within which the drive must complete a read or write command.

-Franc