From: Christopher W. <sma...@th...> - 2003-02-22 07:50:21
smartctl version 5.1-4 Copyright (C) 2002 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST340016A
Serial Number:    3HS0408L
Firmware Version: 3.05
ATA Version is:   5
ATA Standard is:  Unrecognized. Minor revision code: 0x00
Local Time is:    Sat Feb 22 01:29:20 2003 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Off-line data collection status: (0x02) Offline data collection activity
                                        completed without error.
Self-test execution status:      (  48) A fatal error or unknown test error
                                        occurred while the device was executing
                                        its self-test routine and the device
                                        was unable to complete the self-test
                                        routine.
Total time to complete off-line
data collection:                 ( 422) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Automatic timer ON/OFF support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  31) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   068   056   034    Pre-fail     -       139469779
  3 Spin_Up_Time            0x0003   072   070   000    Pre-fail     -       0
  4 Start_Stop_Count        0x0032   092   092   020    Old_age      -       8333
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail     -       0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail     -       146630776
  9 Power_On_Hours          0x0032   093   093   000    Old_age      -       6544
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail     -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age      -       22
194 Temperature_Celsius     0x0022   045   056   000    Old_age      -       45
195 Hardware_ECC_Recovered  0x001a   068   056   000    Old_age      -       139469779
197 Current_Pending_Sector  0x0012   100   100   000    Old_age      -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age      -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age      -       0
202 Unknown_Attribute       0x0032   100   253   000    Old_age      -       0

Error SMART Error Log Read failed: Input/output error
Smartctl: SMART Errorlog Read Failed
Error SMART Error Self-Test Log Read failed: Input/output error
Smartctl: SMART Self Test Log Read Failed
From: Bruce A. <ba...@gr...> - 2003-02-22 12:49:52
Hi Christopher,

> I just started using the Linux smartmon tools on several of my systems, and
> I have a few questions.
>
> First, when I run smartctl -a on a Seagate ST340016A drive, it dumps the
> attributes, e.g.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   068   056   034    Pre-fail     -       138071364
>   3 Spin_Up_Time            0x0003   072   070   000    Pre-fail     -       0
>   4 Start_Stop_Count        0x0032   092   092   020    Old_age      -       8333
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail     -       0
> ...
>
> Can someone explain what several columns are? What are the flags and what
> do they mean?

The flags are "proprietary" or "vendor-specific", meaning that they don't
have a fixed meaning given by the ATA/SMART specs. However, from historical
usage, the least significant bit of the flag indicates Pre-fail (versus
Usage) attributes, as indicated in the TYPE column. IBM does document a
couple of the other bits, but as far as I know they are the only
manufacturer to do so. Please read the smartctl man page for an explanation
of the difference between Pre-fail and Usage attributes.

> What is the difference between VALUE and RAW_VALUE?

Again, please read the smartctl man page.

> What does it mean when the threshold is zero?

This means that the attribute can never fail, since the attribute fails if
its value is less than or equal to the threshold, and the minimum attribute
value is 1.

> Second, expanding on the above question, on one of my drives, a MAXTOR
> 6L040J2, I see:
>
> SMART Attributes Data Structure revision number: 11
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     WHEN_FAILED RAW_VALUE
> 194 Temperature_Celsius     0x0022   083   079   042    Old_age      -       44
>
> Is this telling me the temperature is 83 degrees Celsius? I doubt that
> very much considering it's only 33 degrees right next to that hard
> drive. And what does it mean that the raw value is 44 and the flags are
> 22? I'd have to guess maybe that this was not actually the temperature
> attribute?!?!

It probably IS the temperature attribute. Again, as explained in the
smartctl man page, 44 Celsius is the drive temperature (internally, as
reported by the drive.) The normalised value of this quantity is 83. This
value has (at some point in the past) had value 79, which means that the
drive has in the past been hotter than it is now. In order to "fail" the
usage threshold the normalized value has to drop to 42 or lower.

> Next, on several of the drives, I get this in the headers:
>
> Seagate ST340016A: ATA Standard is: Unrecognized. Minor revision code: 0x00
> WDC WD1000BB-75CHE0: ATA Standard is: Unrecognized. Minor revision code: 0x00
>
> What does this mean?

It means that Seagate/WDC is not completely obeying the ATA spec. The ATA
spec for ATA-5 (which is what the Seagate drive is) actually consists of a
dozen different revision levels. The manufacturer can put a non-zero
number in the minor revision code, which indicates (indirectly) which of
these different revisions is the one that the drive obeys. But Seagate
hasn't bothered to do this.

> Next, on the Seagate ST340016A drive, I've run the DOS Seagate SMART tools
> and run the long and short tests, but when I try to use smartmon to do so,
> it says the test is being run, then when I later query it, it says
>
> Self-test execution status: ( 48) A fatal error or unknown test error
>                                   occurred while the device was executing
>                                   its self-test routine and the device
>                                   was unable to complete the self-test
>                                   routine.
>
> and at the end of the report, it says
>
> Error SMART Error Log Read failed: Input/output error
> Smartctl: SMART Errorlog Read Failed
> Error SMART Error Self-Test Log Read failed: Input/output error
> Smartctl: SMART Self Test Log Read Failed
>
> Since the Seagate tools can run and log the tests, I'm wondering why I
> can't from Linux.

That's interesting. Could you please post the complete output of the
self-test log as reported by the Seagate DOS tool, and tell us about your
kernel version and build? Also, are there any IO error messages in
/var/log/messages when the error messages above from Smartctl are printed?
If so, please post those as well.

This is quite strange, and at first glance I'd conclude that something is
either wrong with the disk or with the rest of the IO chain (cabling,
motherboard IO controller).

> Next, if I enable the automatic testing and tracking features, do I have to
> do so every time I reboot? In the man page it mentions -s on may not be
> remembered.

What it says is: "In principle the S.M.A.R.T. feature settings are
preserved over power-cycling, but it doesn't hurt to be sure." This is a
true statement! You don't need -s on, but it also does no harm.

> I notice that one can add -S on to the /etc/smartd file, but that
> feature is supposed to survive power cycles. There is no way to
> specify -s on in the /etc/smartd file, although that feature may not
> survive power cycles. Or am I missing something?

First, I assume you mean /etc/smartd.conf. Second, smartd automatically
enables SMART (equivalent of -s on) on the device, "just in case" SMART
has been disabled.

> Will adding -S on to smartd interfere with the other options there?

No, adding -S to smartd.conf won't interfere with the other options.

> I've attached the output of 'smartctl -a /dev/hd' for each of the drives,
> if it matters.

Thanks, this was useful for me. I would be concerned about smartmontools'
inability to print the Seagate self-test log. I haven't seen this before
or had it reported. Unless there is something strange about your kernel,
it's not a good sign...

Cheers,
Bruce
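Bruce's two rules above (bit 0 of FLAG marks a Pre-fail attribute; an attribute fails when VALUE <= THRESH, so a threshold of 0 can never be crossed) can be made concrete. The following is a minimal Python sketch, not smartmontools code: it assumes the column layout of the smartctl 5.1-4 attribute table quoted above, ignores all flag bits except bit 0 (the others are vendor-specific), and the names `Attribute`, `parse_row`, `attribute_type`, `fails_now` and `can_ever_fail` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """One row of the 'Vendor Specific SMART Attributes' table."""
    attr_id: int
    name: str
    flag: int    # vendor-specific bit field, printed in hex
    value: int   # normalized value, computed by the drive firmware
    worst: int   # worst normalized value recorded by the firmware
    thresh: int  # failure threshold, read from the drive
    raw: str     # raw value; its meaning is vendor-specific


def parse_row(line: str) -> Attribute:
    # Assumed column order (as printed by smartctl 5.1-4):
    # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE WHEN_FAILED RAW_VALUE
    f = line.split()
    return Attribute(
        attr_id=int(f[0]),
        name=f[1],
        flag=int(f[2], 16),
        value=int(f[3]),
        worst=int(f[4]),
        thresh=int(f[5]),
        raw=" ".join(f[8:]),
    )


def attribute_type(flag: int) -> str:
    # By historical convention only bit 0 is widely agreed on:
    # set means Pre-fail, clear means a usage (Old_age) attribute.
    return "Pre-fail" if flag & 0x1 else "Old_age"


def fails_now(a: Attribute) -> bool:
    # An attribute fails when its normalized value is <= its threshold.
    return a.value <= a.thresh


def can_ever_fail(a: Attribute) -> bool:
    # Normalized values are at least 1, so a threshold of 0 never trips.
    return a.thresh > 0


if __name__ == "__main__":
    for row in (
        "  3 Spin_Up_Time            0x0003   072   070   000    Pre-fail  -  0",
        "194 Temperature_Celsius     0x0022   083   079   042    Old_age   -  44",
    ):
        a = parse_row(row)
        print(a.name, attribute_type(a.flag), fails_now(a), can_ever_fail(a))
```

Run as a script, it classifies Spin_Up_Time as a Pre-fail attribute that cannot fail (threshold 0) and Temperature_Celsius as an Old_age attribute that is not currently failing, matching the TYPE column smartctl prints.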
From: Christopher W. <sma...@th...> - 2003-02-22 17:38:41
At 06:49 AM 2/22/2003, Bruce Allen wrote [edited response]:
> > Can someone explain what several columns are? What are the flags and what
> > do they mean?
>
>The flags are "proprietary" or "vendor-specific", meaning that they don't
>have a fixed meaning given by the ATA/SMART specs. However, from
>historical usage, the least significant bit of the flag indicates
>Pre-fail (versus Usage) attributes, as indicated in the TYPE column. IBM
>does document a couple of the other bits, but as far as I know they are
>the only manufacturer to do so.
>
> > What is the difference between VALUE and RAW_VALUE?
>
>Again, please read the smartctl man page.

I had looked for this information in the man page before and not found it,
so I re-read the whole thing in detail and finally found a brief mention
under the discussion of the -A option, where I had not expected to find it.
Might I suggest breaking this out into its own section? What I did find
says:

    Each Attribute has a 'Raw' value, printed under the
    heading 'RAW_VALUE', and a 'Normalized' value
    printed under the heading 'VALUE'. ...
    Each vendor uses their own magic to convert this
    Raw value to a Normalized value in the range from 1
    to 254.

    Note that the conversion from 'Raw' value to a
    quantity with physical units is not specified by
    the S.M.A.R.T. standard.

Expounding on this, the raw value is the value actually reported by the
disk, which smartmon then interprets via some unspecified logic to come up
with a "Normalized" (how?) result that it displays as the actual VALUE, and
it has to guess at this because each manufacturer uses a different method
to do the conversion since it's not part of the SMART spec? Or does the
manufacturer do the conversion and smartmon just reports it by reading it
from the drive? What about the THRESH and WORST values? Does smartmon
produce them or read them from the disk?

> > What does it mean when the threshold is zero?
>
>This means that the attribute can never fail, since the attribute fails if
>its value is less than or equal to the threshold, and the minimum
>attribute value is 1.

So those attributes are only for "tracking purposes" or informational
purposes? What about something like

  3 Spin_Up_Time 0x0003 072 070 000 Pre-fail - 0

I assume that VALUE/WORST are the only attribute values that have a
minimum of 1, since the THRESH is obviously zero here, as is the raw
value. So how do I interpret this pre-fail attribute with a 0 threshold?

> > 194 Temperature_Celsius 0x0022 083 079 042 Old_age - 44
> >
> > Is this telling me the temperature is 83 degrees Celsius? I doubt that ...
>
>It probably IS the temperature attribute. Again, as explained in the
>smartctl man page, 44 Celsius is the drive temperature (internally, as
>reported by the drive.) The normalised value of this quantity is 83.
>This value has (at some point in the past) had value 79, which means that
>the drive has in the past been hotter than it is now. In order to "fail"
>the usage threshold the normalized value has to drop to 42 or lower.

OK, thank you for the explanation. As I understand it, then, the normalized
values are merely an algorithmic representation of the raw data, and the
raw data itself may have several meanings depending on the manufacturer.
These algorithmic values have no real-world representation (as the raw
value might, but doesn't always), and are merely an abstract measure of how
close that tracked drive attribute is to an "incorrectly operating" failure
level. The value/worst/thresh can only be compared to each other, and only
for a given attribute, and the values are derived through a "magical
manufacturer conversion" of the raw value. This, then, answers my question
above: smartmon does not calculate these values, but merely reads them
from the drive.

I don't really see that explanation in the man page, and it could probably
use additional discussion on that topic to clarify.

> > Next, on several of the drives, I get this in the headers:
> >
> > Seagate ST340016A: ATA Standard is: Unrecognized. Minor revision code: 0x00
> > WDC WD1000BB-75CHE0: ATA Standard is: Unrecognized. Minor revision code: 0x00
> >
> > What does this mean?
>
>It means that Seagate/WDC is not completely obeying the ATA spec. The ATA
>spec for ATA-5 (which is what the Seagate drive is) actually consists of a
>dozen different revision levels. The manufacturer can put a non-zero
>number in the minor revision code, which indicates (indirectly) which of
>these different revisions is the one that the drive obeys. But Seagate
>hasn't bothered to do this.

So this isn't an error/problem in the drive, but merely an unsupported
feature. Maybe there's a better way to display that so it gets across that
this is just not being used by the drive, as compared to being a possible
error?

> > Since the Seagate tools can run and log the tests, I'm wondering why I
> > can't from Linux.
>
>That's interesting. Could you please post the complete output of the
>self-test log as reported by the Seagate DOS tool, and tell us about your
>kernel version and build?

I'll do so in a separate message, as there's a lot of data there.

>Second, smartd automatically enables SMART (equivalent of -s on) on the
>device, "just in case" SMART has been disabled.

Perfect! Thanks.

-W
From: Bruce A. <ba...@gr...> - 2003-02-23 02:59:25
Hi Chris,

On Sat, 22 Feb 2003, Christopher Wolf wrote:

> At 06:49 AM 2/22/2003, Bruce Allen wrote [edited response]:
> > > Can someone explain what several columns are? What are the flags and what
> > > do they mean?
> >
> >The flags are "proprietary" or "vendor-specific", meaning that they don't
> >have a fixed meaning given by the ATA/SMART specs. However, from
> >historical usage, the least significant bit of the flag indicates
> >Pre-fail (versus Usage) attributes, as indicated in the TYPE column. IBM
> >does document a couple of the other bits, but as far as I know they are
> >the only manufacturer to do so.
> >
> > > What is the difference between VALUE and RAW_VALUE?
> >
> >Again, please read the smartctl man page.
>
> I had looked for this information in the man page before and not found
> it, so I re-read the whole thing in detail and finally found a brief
> mention under the discussion of the -A option, where I had not
> expected to find it. Might I suggest breaking this out into its own
> section?

I really don't see another place that's appropriate for it. But thanks for
the suggestion.

> What I did find says:
>
>     Each Attribute has a 'Raw' value, printed under the
>     heading 'RAW_VALUE', and a 'Normalized' value
>     printed under the heading 'VALUE'. ...
>     Each vendor uses their own magic to convert this
>     Raw value to a Normalized value in the range from 1
>     to 254.
>
>     Note that the conversion from 'Raw' value to a
>     quantity with physical units is not specified by
>     the S.M.A.R.T. standard.
>
> Expounding on this, the raw value is the value actually reported by
> the disk, which smartmon then interprets via some unspecified logic to
> come up with a "Normalized" (how?) result that it displays as the
> actual VALUE, and it has to guess at this because each manufacturer
> uses a different method to do the conversion since it's not part of the
> SMART spec?

No, smartctl does not do the conversion. It's done by the disk's firmware.

> Or does the manufacturer do the conversion and smartmon just reports
> it by reading it from the drive?

Correct!

> What about the THRESH and WORST values? Does smartmon produce them or
> read them from the disk?

Smartctl does not produce these values. It simply reads them from the
disk. I'll add a sentence to the man page stating this explicitly.

> > > What does it mean when the threshold is zero?
> >
> >This means that the attribute can never fail, since the attribute fails if
> >its value is less than or equal to the threshold, and the minimum
> >attribute value is 1.
>
> So those attributes are only for "tracking purposes" or informational
> purposes?

Correct.

> What about something like
>
>   3 Spin_Up_Time 0x0003 072 070 000 Pre-fail - 0

This looks odd. Spin_Up_Time is one of the best monitors of a disk's
health. Failing bearings or a failing motor can cause the spin-up time to
increase dramatically. So the reported threshold of "0" looks very odd, as
does the raw value of "0". The only time that I have ever seen such strange
values is for disks that have just had SMART enabled for the first time,
and which have not yet been power-cycled enough to gather sufficient
statistical information to report a spin-up time. But if this disk has
been power-cycled even a handful of times, it should report both a
spin-up time and a non-zero threshold.

> I assume that VALUE/WORST are the only attribute values that have a
> minimum of 1, since the THRESH is obviously zero here, as is the raw
> value. So how do I interpret this pre-fail attribute with a 0
> threshold?

It either means that something is pretty messed up with the disk, or
alternatively that SMART has been enabled too recently to gather spin-up
time information.

> > > 194 Temperature_Celsius 0x0022 083 079 042 Old_age - 44
> > >
> > > Is this telling me the temperature is 83 degrees Celsius? I doubt that ...
> >
> >It probably IS the temperature attribute. Again, as explained in the
> >smartctl man page, 44 Celsius is the drive temperature (internally, as
> >reported by the drive.) The normalised value of this quantity is 83.
> >This value has (at some point in the past) had value 79, which means that
> >the drive has in the past been hotter than it is now. In order to "fail"
> >the usage threshold the normalized value has to drop to 42 or lower.
>
> OK, thank you for the explanation.

You're welcome!

> As I understand it, then, the normalized values are merely an
> algorithmic representation of the raw data, and the raw data itself may
> have several meanings depending on the manufacturer. These algorithmic
> values have no real-world representation (as the raw value might, but
> doesn't always), and are merely an abstract measure of how close that
> tracked drive attribute is to an "incorrectly operating" failure level.
> The value/worst/thresh can only be compared to each other, and only for
> a given attribute, and the values are derived through a "magical
> manufacturer conversion" of the raw value. This, then, answers my
> question above: smartmon does not calculate these values, but merely
> reads them from the drive.
>
> I don't really see that explanation in the man page, and it could probably
> use additional discussion on that topic to clarify.

OK, I'll add a few sentences of additional explanation.

> > > Next, on several of the drives, I get this in the headers:
> > >
> > > Seagate ST340016A: ATA Standard is: Unrecognized. Minor revision code: 0x00
> > > WDC WD1000BB-75CHE0: ATA Standard is: Unrecognized. Minor revision code: 0x00
> > >
> > > What does this mean?
> >
> >It means that Seagate/WDC is not completely obeying the ATA spec. The ATA
> >spec for ATA-5 (which is what the Seagate drive is) actually consists of a
> >dozen different revision levels. The manufacturer can put a non-zero
> >number in the minor revision code, which indicates (indirectly) which of
> >these different revisions is the one that the drive obeys. But Seagate
> >hasn't bothered to do this.
>
> So this isn't an error/problem in the drive, but merely an unsupported
> feature. Maybe there's a better way to display that so it gets across that
> this is just not being used by the drive, as compared to being a possible
> error?

I'll have to take another look at the ATA-5 spec. I think in fact that it
is an "error" in the sense that the manufacturer has not implemented the
spec correctly, by not specifying the ATA-5 revision level.

> > > Since the Seagate tools can run and log the tests, I'm wondering why I
> > > can't from Linux.
> >
> >That's interesting. Could you please post the complete output of the
> >self-test log as reported by the Seagate DOS tool, and tell us about your
> >kernel version and build?
>
> I'll do so in a separate message, as there's a lot of data there.

OK, thank you. I am looking forward to comparing it.

Bruce
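Bruce's reading of the temperature attribute only ever compares the three normalized numbers of one attribute; the raw value plays no part, and, as Christopher notes, the numbers mean nothing across attributes. Below is a hypothetical Python helper capturing that reading. The names `Assessment` and `assess`, and the idea of reporting a numeric "margin", are illustrative assumptions, not anything smartctl prints. It also assumes, as Bruce does for this drive, that WORST below VALUE means the attribute was once closer to its threshold; vendor firmware is free to behave differently.

```python
from typing import NamedTuple


class Assessment(NamedTuple):
    margin: int             # VALUE - THRESH; larger means further from failing
    was_worse_before: bool  # WORST below VALUE: closer to THRESH at some point
    failing_now: bool       # VALUE <= THRESH (cannot happen when THRESH is 0)
    failed_in_past: bool    # WORST <= THRESH


def assess(value: int, worst: int, thresh: int) -> Assessment:
    # All three inputs are normalized numbers read from the drive for ONE
    # attribute; they are not comparable with other attributes' numbers.
    return Assessment(
        margin=value - thresh,
        was_worse_before=worst < value,
        failing_now=value <= thresh,
        failed_in_past=worst <= thresh,
    )
```

For the Maxtor temperature row, `assess(83, 79, 42)` gives a margin of 41 and `was_worse_before=True` (the drive has been hotter), with no failure now or in the past. For Spin_Up_Time, `assess(72, 70, 0)` can never report a failure, since a zero threshold cannot be crossed by a normalized value of 1 or more.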
From: Christopher W. <sma...@th...> - 2003-02-23 06:07:47
At 08:59 PM 2/22/2003, Bruce Allen wrote:
> > I had looked for this information in the man page before and not found
> > it, so I re-read the whole thing in detail and finally found a brief
> > mention under the discussion of the -A option, where I had not
> > expected to find it. Might I suggest breaking this out into its own
> > section?
>
>I really don't see another place that's appropriate for it. But thanks
>for the suggestion.

As important as attributes are, they could certainly be discussed in their
own top-level section of the smartctl manual page!

> > 3 Spin_Up_Time 0x0003 072 070 000 Pre-fail - 0
>
>This looks odd. Spin_Up_Time is one of the best monitors of a disk's
>health. Failing bearings or a failing motor can cause the spin-up time to
>increase dramatically. So the reported threshold of "0" looks very odd,
>as does the raw value of "0". The only time that I have ever seen such
>strange values is for disks that have just had SMART enabled for the first
>time, and which have not yet been power-cycled enough to gather sufficient
>statistical information to report a spin-up time. But if this disk has
>been power-cycled even a handful of times, it should report both a spin-up
>time and a non-zero threshold.

Well, this is the system that had the funny output that I'll be detailing
in another message, so maybe it's related to that. I'll wait until later
to discuss this again.

> > So this isn't an error/problem in the drive, but merely an unsupported
> > feature. Maybe there's a better way to display that so it gets across
> > that this is just not being used by the drive, as compared to being a
> > possible error?
>
>I'll have to take another look at the ATA-5 spec. I think in fact that it
>is an "error" in the sense that the manufacturer has not implemented the
>spec correctly, by not specifying the ATA-5 revision level.

As the purpose of running smartmon tools is to detect drive failures that
could result in data loss, and since this is really a non-fatal failure
that has existed from the moment of manufacture, I suggest it be labeled
in such a way as to clearly note that this is not a sign of drive failure
*or corruption*, even if it is a violation of the spec.

-W
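Christopher's suggestion amounts to a display rule. A Python sketch of it follows, under a loudly stated assumption: that 0x0000 (and, by the same reading, 0xFFFF) are the values a drive uses to say it does not report a minor revision. Whether the ATA-5 spec actually treats 0x00 that way is exactly what Bruce says he needs to re-check, so treat this as illustrative only. `NOT_REPORTED`, `describe_minor_revision` and the message wording are hypothetical and are not smartctl's actual output.

```python
# ASSUMPTION (verify against the ATA-5 IDENTIFY DEVICE minor-version table):
# these codes mean "minor revision not reported" rather than a defect.
NOT_REPORTED = {0x0000, 0xFFFF}


def describe_minor_revision(code: int) -> str:
    """Hypothetical wording for the 'Minor revision code' header line."""
    if code in NOT_REPORTED:
        # Make it explicit that this is not evidence of failure or corruption.
        return "not reported by drive (not a sign of drive failure or corruption)"
    # Any other code would be looked up in the spec's minor-version table,
    # which this sketch does not reproduce.
    return f"minor revision code 0x{code:02x} (see the ATA spec's minor version table)"
```

With this rule, both the Seagate and WDC headers (code 0x00) would read as "not reported" instead of "Unrecognized", which is the distinction Christopher is asking for.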