Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
---|---|---|---|---|---|---|---|---|---|---|---|---
2002 | | | | | | | | | | 50 | 161 | 84
2003 | 84 | 103 | 54 | 63 | 44 | 45 | 44 | 55 | 15 | 99 | 101 | 104
2004 | 76 | 98 | 99 | 130 | 107 | 79 | 94 | 164 | 115 | 125 | 160 | 84
2005 | 72 | 85 | 55 | 109 | 64 | 33 | 71 | 77 | 84 | 102 | 106 | 51
2006 | 47 | 58 | 60 | 106 | 73 | 65 | 109 | 103 | 73 | 57 | 94 | 62
2007 | 61 | 67 | 90 | 90 | 77 | 82 | 75 | 74 | 63 | 70 | 60 | 59
2008 | 68 | 113 | 128 | 89 | 57 | 88 | 74 | 43 | 77 | 106 | 99 | 82
2009 | 126 | 49 | 47 | 26 | 38 | 75 | 61 | 45 | 105 | 77 | 46 | 47
2010 | 58 | 88 | 54 | 78 | 30 | 40 | 46 | 36 | 30 | 29 | 80 | 52
2011 | 30 | 27 | 25 | 77 | 24 | 45 | 34 | 24 | 65 | 55 | 72 | 19
2012 | 58 | 44 | 90 | 11 | 27 | 32 | 61 | 32 | 39 | 45 | 50 | 21
2013 | 44 | 26 | 37 | 46 | 24 | 44 | 15 | 16 | 20 | 36 | 36 | 41
2014 | 21 | 9 | 14 | 16 | 32 | 50 | 71 | 47 | 17 | 9 | 40 | 42
2015 | 11 | 25 | 22 | 21 | 6 | 3 | 7 | 42 | 28 | 33 | 5 | 7
2016 | 12 | 18 | 19 | 31 | 27 | 23 | 12 | 33 | 5 | 28 | 19 | 8
2017 | 52 | 36 | 12 | 17 | 8 | 12 | 3 | | 2 | | |
2018 | | | 1 | | | | | 2 | | | |
2021 | | | 1 | | | | | | | | |
From: Bruce A. <ba...@gr...> - 2003-04-02 18:18:06
In addition to reading the smartmontools manual, you can download the ATA specification from the web (a link is given on the smartmontools web page) and read (or skim) the section on SMART. Being the specification itself, this is in fact the "last word" on the subject.

Cheers,
Bruce

On Wed, 2 Apr 2003, Fabrizio Di Meo wrote:

> Thanks Bruce...I read and printed the manual page but I've to re-read
> them. (double and double check is a must to me) Maybe I've to "study"
> the SMART technology to know the meaning of some terms...just like the
> one I quoted previously (-S autosave...I don't really understand what
> means "Autosaving" and what saves in this case :o) ). Well however I
> wish to thank you for your patience and your excellent work. Fabrizio
>
> Bruce Allen <ba...@gr...> wrote:
> > thank you for having replied in a such short time. Well I thought
> > smartd run some tests in fact I couldn't explain the existence of
> > smartctl except than reading attributes.
>
> It's also for running self-tests, and examining the self-test logs and ATA
> error logs.
>
> Although your main interest is in smartd, you should carefully read the
> smartctl manual page as well.
>
> > As you told I've to run
> > "smartctl -t long /dev/hda" by crontab...but what should be the most
> > right scheduling...I mean 4 hours or more?
>
> I'd suggest about once per week.
>
> > Doesn't this generate some
> > problems (conflicts) with the self test which is made by the hard
> > disk itself? (or the hard disk doesn't perform any kind of self
> > test?).
>
> Please read the smartctl manual page completely. It has a description of
> the different types of testing.
>
> > and smartctl updates the
> > attributes
>
> smartctl does not update the attributes. As it says in the manual page:
> smartctl does not calculate any of these values, it merely reports them
> from the S.M.A.R.T. data on the disk.
>
> > 2. smartd will read the updated attributes and will notify
> > what changed is. Another question....(sorry :o) ) I can't understand
> > when I should use the directive -S:
> >
> > > -S VALUE
> > > Enables or disables Attribute Autosave when smartd
> > > starts up and has no further effect. The valid
> > > arguments to this Directive are on and off.
> >
> > This is also a command line parameter we can find on using smartctl so
> > this seems I can use smartd instead of smartctl, but we have told that
> > these programs make different things...so I feel a little confused on
> > finding duplicated command line parameter/directive on both smartctl
> > and smartd.... :o( when using the one and the other?
>
> In fact you can use either smartctl or smartd to turn on the Attribute
> auto-save. It's there as a convenience in smartd.
>
> Cheers,
> Bruce
From: Bruce A. <ba...@gr...> - 2003-04-02 18:15:17
Hi Francis,

I'm clueless about what is needed to fix this, but I'll try to find out if it's possible.

Cheers,
Bruce

On Wed, 2 Apr 2003, Francis Reader wrote:

> We have tried it on both a FREECOM USB2 drive and a drive based upon the
> Cypress CY4611 chipset.
> Both report that the drives are not SMART capable.
>
> To quote from the design notes:
> "If the device is an IDE device, the ATAPI
> commands received over USB will be translated into
> IDE task file commands."
> However the usb->ide bridge claims to be ATAPI5/6 compliant.
>
> If I force -d scsi then it reports "Device does not support S.M.A.R.T."
> and if I force -d ata then it reports "Error ATA GET HD Identity Failed:
> Invalid argument"
>
> Any ideas where the problem is? The drive is being reported as a "WDC
> WD20 OEB-OOCPFO"
>
> Fran
>
> -----Original Message-----
> From: Bruce Allen [mailto:ba...@gr...]
> Sent: 02 April 2003 16:22
> To: Francis Reader
> Cc: sma...@li...
> Subject: Re: [Smartmontools-support]USB support?
>
> Hi Francis,
>
> > Has anyone tried any USB -> IDE devices as of yet?
> > Any success?
>
> I haven't tried it. Perhaps you could try and tell us if it works?
>
> > What is the difference in the SMART requests needed for a SCSI drive,
> > compared to those for an IDE drive?
>
> Well, the requests are described in two different documents, written by
> two different technical committees. There is not much coordination
> between them. In fact in the SCSI world, these commands are not even
> called "SMART".
>
> You can find both documents under the list of REFERENCES on the
> smartmontools home page.
>
> Cheers,
> Bruce
>
> _______________________________________________
> Smartmontools-support mailing list
> Sma...@li...
> https://lists.sourceforge.net/lists/listinfo/smartmontools-support
From: Francis R. <FR...@am...> - 2003-04-02 16:38:41
We have tried it on both a FREECOM USB2 drive and a drive based upon the Cypress CY4611 chipset. Both report that the drives are not SMART capable.

To quote from the design notes:

"If the device is an IDE device, the ATAPI commands received over USB will be translated into IDE task file commands."

However the usb->ide bridge claims to be ATAPI5/6 compliant.

If I force -d scsi then it reports "Device does not support S.M.A.R.T." and if I force -d ata then it reports "Error ATA GET HD Identity Failed: Invalid argument".

Any ideas where the problem is? The drive is being reported as a "WDC WD20 OEB-OOCPFO".

Fran

-----Original Message-----
From: Bruce Allen [mailto:ba...@gr...]
Sent: 02 April 2003 16:22
To: Francis Reader
Cc: sma...@li...
Subject: Re: [Smartmontools-support]USB support?

Hi Francis,

> Has anyone tried any USB -> IDE devices as of yet?
> Any success?

I haven't tried it. Perhaps you could try and tell us if it works?

> What is the difference in the SMART requests needed for a SCSI drive,
> compared to those for an IDE drive?

Well, the requests are described in two different documents, written by two different technical committees. There is not much coordination between them. In fact in the SCSI world, these commands are not even called "SMART".

You can find both documents under the list of REFERENCES on the smartmontools home page.

Cheers,
Bruce
From: <fab...@ya...> - 2003-04-02 16:15:08
Thanks Bruce...I read and printed the manual page but I've to re-read them. (double and double check is a must to me) Maybe I've to "study" the SMART technology to know the meaning of some terms...just like the one I quoted previously (-S autosave...I don't really understand what means "Autosaving" and what saves in this case :o) ). Well however I wish to thank you for your patience and your excellent work.

Fabrizio

Bruce Allen <ba...@gr...> wrote:

> thank you for having replied in a such short time. Well I thought
> smartd run some tests in fact I couldn't explain the existence of
> smartctl except than reading attributes.

It's also for running self-tests, and examining the self-test logs and ATA error logs.

Although your main interest is in smartd, you should carefully read the smartctl manual page as well.

> As you told I've to run
> "smartctl -t long /dev/hda" by crontab...but what should be the most
> right scheduling...I mean 4 hours or more?

I'd suggest about once per week.

> Doesn't this generate some
> problems (conflicts) with the self test which is made by the hard
> disk itself? (or the hard disk doesn't perform any kind of self
> test?).

Please read the smartctl manual page completely. It has a description of the different types of testing.

> and smartctl updates the
> attributes

smartctl does not update the attributes. As it says in the manual page: smartctl does not calculate any of these values, it merely reports them from the S.M.A.R.T. data on the disk.

> 2. smartd will read the updated attributes and will notify
> what changed is. Another question....(sorry :o) ) I can't understand
> when I should use the directive -S:
>
> > -S VALUE
> > Enables or disables Attribute Autosave when smartd
> > starts up and has no further effect. The valid
> > arguments to this Directive are on and off.
>
> This is also a command line parameter we can find on using smartctl so
> this seems I can use smartd instead of smartctl, but we have told that
> these programs make different things...so I feel a little confused on
> finding duplicated command line parameter/directive on both smartctl
> and smartd.... :o( when using the one and the other?

In fact you can use either smartctl or smartd to turn on the Attribute auto-save. It's there as a convenience in smartd.

Cheers,
Bruce
From: Bruce A. <ba...@gr...> - 2003-04-02 15:22:02
Hi Francis,

> Has anyone tried any USB -> IDE devices as of yet?
> Any success?

I haven't tried it. Perhaps you could try and tell us if it works?

> What is the difference in the SMART requests needed for a SCSI drive,
> compared to those for an IDE drive?

Well, the requests are described in two different documents, written by two different technical committees. There is not much coordination between them. In fact in the SCSI world, these commands are not even called "SMART".

You can find both documents under the list of REFERENCES on the smartmontools home page.

Cheers,
Bruce
From: Bruce A. <ba...@gr...> - 2003-04-02 15:17:35
> thank you for having replied in a such short time. Well I thought
> smartd run some tests in fact I couldn't explain the existence of
> smartctl except than reading attributes.

It's also for running self-tests, and examining the self-test logs and ATA error logs.

Although your main interest is in smartd, you should carefully read the smartctl manual page as well.

> As you told I've to run
> "smartctl -t long /dev/hda" by crontab...but what should be the most
> right scheduling...I mean 4 hours or more?

I'd suggest about once per week.

> Doesn't this generate some
> problems (conflicts) with the self test which is made by the hard
> disk itself? (or the hard disk doesn't perform any kind of self
> test?).

Please read the smartctl manual page completely. It has a description of the different types of testing.

> and smartctl updates the
> attributes

smartctl does not update the attributes. As it says in the manual page: smartctl does not calculate any of these values, it merely reports them from the S.M.A.R.T. data on the disk.

> 2. smartd will read the updated attributes and will notify
> what changed is. Another question....(sorry :o) ) I can't understand
> when I should use the directive -S:
>
> > -S VALUE
> > Enables or disables Attribute Autosave when smartd
> > starts up and has no further effect. The valid
> > arguments to this Directive are on and off.
>
> This is also a command line parameter we can find on using smartctl so
> this seems I can use smartd instead of smartctl, but we have told that
> these programs make different things...so I feel a little confused on
> finding duplicated command line parameter/directive on both smartctl
> and smartd.... :o( when using the one and the other?

In fact you can use either smartctl or smartd to turn on the Attribute auto-save. It's there as a convenience in smartd.

Cheers,
Bruce
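Bruce's suggestion (run `smartctl -t long /dev/hda` from cron about once a week) could be wired up roughly as follows. This is a minimal Python sketch, not part of smartmontools: the script name, its install path and the crontab schedule are hypothetical, and it assumes smartctl is on the PATH and the script runs with enough privileges to talk to the disk.

```python
#!/usr/bin/env python3
"""Start a long SMART self-test on one disk; intended to be run from cron.

Hypothetical crontab entry (every Sunday at 03:00), assuming the script was
saved as /usr/local/sbin/smart_long_test.py and made executable:

    0 3 * * 0  /usr/local/sbin/smart_long_test.py /dev/hda
"""
import subprocess
import sys
from typing import List


def selftest_command(device: str, test: str = "long") -> List[str]:
    """Build the smartctl command line, e.g. smartctl -t long /dev/hda."""
    if test not in ("short", "long"):
        raise ValueError(f"unsupported self-test type: {test!r}")
    return ["smartctl", "-t", test, device]


def start_selftest(device: str, test: str = "long") -> int:
    """Ask the drive to start the test and return smartctl's exit status.

    smartctl returns once the drive has accepted the command; the drive runs
    the test itself, and the outcome appears later in the self-test log,
    which smartd watches when the '-l selftest' directive is given.
    """
    return subprocess.run(selftest_command(device, test)).returncode


# Only act when exactly one device argument is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(start_selftest(sys.argv[1]))
```

Keeping the command construction in its own function makes it easy to check what would be run without touching a real disk.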
From: <fab...@ya...> - 2003-04-02 14:11:56
Hi Bruce,

thank you for having replied in a such short time. Well I thought smartd run some tests in fact I couldn't explain the existence of smartctl except than reading attributes. As you told I've to run "smartctl -t long /dev/hda" by crontab...but what should be the most right scheduling...I mean 4 hours or more? Doesn't this generate some problems (conflicts) with the self test which is made by the hard disk itself? (or the hard disk doesn't perform any kind of self test?).

Let me see if I understood the flow of operation:

1. I've to run smartctl in order to query the hard disk and smartctl updates the attributes
2. smartd will read the updated attributes and will notify what changed is.

Another question....(sorry :o) ) I can't understand when I should use the directive -S:

-S VALUE
Enables or disables Attribute Autosave when smartd starts up and has no further effect. The valid arguments to this Directive are on and off.

This is also a command line parameter we can find on using smartctl so this seems I can use smartd instead of smartctl, but we have told that these programs make different things...so I feel a little confused on finding duplicated command line parameter/directive on both smartctl and smartd.... :o( when using the one and the other?

Thank you,
Fabrizio

Bruce Allen <ba...@gr...> wrote:

Hi Fabrizio,

> I've just installed smartd and configured smartd.conf as reported below:
>
> /dev/hda -H -o on -f -l error -l selftest -m fab...@ya... -t \
> -R 1 -R 3 -R 5 -R 7 -R 11 -R 13 -M daily

This looks OK.

> executing the command smartctl -c /dev/hda I get this message:

You should probably use smartctl -a /dev/hda to get a complete picture.

> I run the self test manually and next I abort it, but seems no more
> tests have been executed by smartd.....because the message reported
> just a few rows above...is this right?

smartd does not run self-tests. It only monitors the self-test log for signs of self-tests that showed errors. If you want to run self-tests on a regular basis I suggest that you schedule a cron job to do this, using smartctl -t long /dev/hda for example.

> My questions are:
>
> 1. how can I check out what smartd is doing (because no message is
> written in /var/log/messages every 30 minutes - this is the default
> run-time right?)

smartd only writes messages into /var/log/messages when a disk Attribute has changed. Until that happens, there are no messages.

> 2. is the configuration directives right as I wrote it in smartd.conf?

It looks fine. You might want to add -M test to get a test mail on startup to be sure that is working OK.

Cheers,
Bruce
From: Bruce A. <ba...@gr...> - 2003-04-02 13:12:27
Hi Fabrizio,

> I've just installed smartd and configured smartd.conf as reported below:
>
> /dev/hda -H -o on -f -l error -l selftest -m fab...@ya... -t \
> -R 1 -R 3 -R 5 -R 7 -R 11 -R 13 -M daily

This looks OK.

> executing the command smartctl -c /dev/hda I get this message:

You should probably use smartctl -a /dev/hda to get a complete picture.

> I run the self test manually and next I abort it, but seems no more
> tests have been executed by smartd.....because the message reported
> just a few rows above...is this right?

smartd does not run self-tests. It only monitors the self-test log for signs of self-tests that showed errors. If you want to run self-tests on a regular basis I suggest that you schedule a cron job to do this, using smartctl -t long /dev/hda for example.

> My questions are:
>
> 1. how can I check out what smartd is doing (because no message is
> written in /var/log/messages every 30 minutes - this is the default
> run-time right?)

smartd only writes messages into /var/log/messages when a disk Attribute has changed. Until that happens, there are no messages.

> 2. is the configuration directives right as I wrote it in smartd.conf?

It looks fine. You might want to add -M test to get a test mail on startup to be sure that is working OK.

Cheers,
Bruce
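smartd does its own checking internally. As a rough illustration of what "monitoring the self-test log for signs of self-tests that showed errors" involves, the Python sketch below scans the text printed by `smartctl -l selftest` and flags suspicious entries. It assumes the column layout shown by smartctl 5.1 in this archive (Num, Test_Description, Status, Remaining, LifeTime(hours), LBA_of_first_error); the keyword check is a heuristic, and the failing row used in the check below is invented for illustration.

```python
import re
from typing import List, NamedTuple

# One row of the self-test log as printed by smartctl 5.1, e.g.
#   "# 1  Short off-line       Completed       00%       115         -"
# Assumed columns: Num, Test_Description + Status, Remaining,
# LifeTime(hours), LBA_of_first_error.
ROW = re.compile(r"^#\s*(\d+)\s+(.*?)\s+(\d+)%\s+(\d+)\s+(\S+)$")


class SelfTestEntry(NamedTuple):
    num: int
    text: str             # description and status, e.g. "Short off-line Completed"
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: str  # "-" when no failing LBA was recorded


def parse_selftest_log(output: str) -> List[SelfTestEntry]:
    """Return the log rows; the header and any other lines are skipped."""
    entries = []
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if m:
            num, text, remaining, hours, lba = m.groups()
            entries.append(SelfTestEntry(int(num), " ".join(text.split()),
                                         int(remaining), int(hours), lba))
    return entries


def suspicious(entries: List[SelfTestEntry]) -> List[SelfTestEntry]:
    """Heuristic: a failing LBA was recorded, or the status mentions failure/error."""
    keywords = ("failure", "error")
    return [e for e in entries
            if e.first_error_lba != "-"
            or any(k in e.text.lower() for k in keywords)]
```

An aborted test (like the one Fabrizio stopped by hand) is deliberately not flagged, since aborting is not a sign of a disk problem.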
From: Francis R. <FR...@am...> - 2003-04-02 11:02:04
Has anyone tried any USB -> IDE devices as of yet? Any success?

What is the difference in the SMART requests needed for a SCSI drive, compared to those for an IDE drive?

Francis Reader
Software Manager
Amino Communications Ltd
Tel +44 (0)1954 784500
Fax +44 (0)1954 784501

The information in this e-mail and any attached files is likely to be confidential, may be legally privileged, and is intended solely for the addressee. Any copying, dissemination, distribution or use of this message or the information included is strictly prohibited unless authorised by us. Any views or opinions presented are solely those of the author and do not necessarily represent those of Amino Communications Limited or any of its affiliates. If you have received this message in error please notify us immediately and delete this message. Although we have taken steps to ensure that this e-mail and attachments are free from any virus, we would advise you to check that they are indeed virus free. We do not, to the extent permitted by law, accept any liability (whether in contract, negligence or otherwise) for any virus infection (and/or external compromise of security and/or breach of confidentiality) in relation to transmissions sent by e-mail.

Amino Communications Limited
Registered No: 3490180, England
From: <fab...@ya...> - 2003-04-02 07:22:31
Hi,

I've just installed smartd and configured smartd.conf as reported below:

/dev/hda -H -o on -f -l error -l selftest -m fab...@ya... -t \
-R 1 -R 3 -R 5 -R 7 -R 11 -R 13 -M daily

executing the command smartctl -c /dev/hda I get this message:

----------------
Off-line data collection status: (0x02) Offline data collection activity
                                        completed without error.
Self-test execution status:      (  16) The self-test routine was aborted by
                                        the host.
Total time to complete off-line
data collection:                 (  51) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Automatic timer ON/OFF support.
                                        Suspend Offline collection upon
                                        new command.
                                        Offline surface scan supported.
                                        Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  35) minutes.
-------------

I run the self test manually and next I abort it, but seems no more tests have been executed by smartd.....because the message reported just a few rows above...is this right?

My questions are:

1. how can I check out what smartd is doing (because no message is written in /var/log/messages every 30 minutes - this is the default run-time right?)
2. is the configuration directives right as I wrote it in smartd.conf?

Thank you :o)
Fabrizio

P.s. this is just the 1st time I run smartmontools....
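The numbers in parentheses in the `smartctl -c` output are raw bytes from the drive's SMART data, and each text line corresponds to a bit or field in them. The Python sketch below decodes the values Fabrizio got (0x1b, 0x0003 and a self-test status of 16). The bit assignments follow the labels smartctl 5.x prints; treat them as an assumption and check the ATA specification for the authoritative definitions. A status of 16 = 0x10 has 1 in its upper nibble ("aborted by the host") and 0 in its lower nibble (the part of the test remaining, in tens of percent).

```python
from typing import List, Tuple

# Bit meanings below mirror the labels smartctl prints; they are an
# assumption here, so check the ATA specification before relying on them.


def decode_offline_capabilities(value: int) -> List[str]:
    """'Offline data collection capabilities' byte, e.g. 0x1b."""
    lines = []
    if value & 0x01:
        lines.append("SMART execute Offline immediate.")
    if value & 0x02:
        lines.append("Automatic timer ON/OFF support.")
    # Bit 2 chooses what a new command does to offline data collection.
    if value & 0x04:
        lines.append("Abort Offline collection upon new command.")
    else:
        lines.append("Suspend Offline collection upon new command.")
    if value & 0x08:
        lines.append("Offline surface scan supported.")
    if value & 0x10:
        lines.append("Self-test supported.")
    return lines


def decode_smart_capabilities(value: int) -> List[str]:
    """'SMART capabilities' word, e.g. 0x0003."""
    lines = []
    if value & 0x0001:
        lines.append("Saves SMART data before entering power-saving mode.")
    if value & 0x0002:
        lines.append("Supports SMART auto save timer.")
    return lines


# Upper nibble of the self-test execution status byte (a subset of the codes).
SELFTEST_STATUS = {
    0: "completed without error (or no self-test has ever been run)",
    1: "aborted by the host",
    2: "interrupted by the host with a hardware or software reset",
    15: "in progress",
}


def decode_selftest_status(value: int) -> Tuple[str, int]:
    """Return (status text, percent of the test still remaining)."""
    status = SELFTEST_STATUS.get(value >> 4, f"other status code {value >> 4}")
    remaining_pct = (value & 0x0F) * 10
    return status, remaining_pct


if __name__ == "__main__":
    for line in decode_offline_capabilities(0x1b):
        print(line)
    print(decode_smart_capabilities(0x0003))
    print(decode_selftest_status(16))  # the "( 16)" in the output above
```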
From: Steve W. <sw...@ar...> - 2003-04-01 21:48:25
> Hmmm. This might be one of those drives that only starts recording
> power-on hours after SMART is enabled. Is it possible that SMART was only
> enabled 248 hours ago?

I would say the possibility is highly unlikely. When I initially started porting smartmontools over, SMART was already enabled on this drive. Prior to this, I can't think of any incident where I would've enabled it myself, so I would have to assume it came this way from the "factory". The only possibility I can think of where I would've enabled SMART (provided it came disabled from the factory) would have been running manufacturer diagnostic software on it, but I'm very certain that I have not done that to this drive as it's never given me any reason to.

> Anyway, please see if you can figure out if this attribute raw value is
> counting time in hours, or something else...

Will do.

Cheers,
Steve
From: Steve W. <sw...@ar...> - 2003-04-01 21:41:19
Hi Bruce,

> OK, thanks for trying. It doesn't look like your code is doing anything
> "wrong".

heh. I hear that so rarely, it's quite refreshing. :)

> I don't understand what Attribute 9 is for this disk. It's made by
> Quantum, right? Does the raw value of Attribute 9 change with time? [You
> can monitor it with the -R 9 Directive in /etc/smartd.conf.]

Yes, it is made by Quantum. It is Quantum Part Number QML20000LC-A. The drive label also lists 'LC22AT LC22A3M1 REV01-A A010F'. The value is changing over time: when I first sent the message it was at 210, and presently it is at 248. I don't think that value is reflected in hours, as I don't think I've had the drive powered up for 38 hours between them. I will try out smartd after futzing with 2.1.24 on x86 today for the Error & Self-Test log purposes (that is my goal for the afternoon).

Cheers,
Steve
From: Bruce A. <ba...@gr...> - 2003-04-01 21:34:51
Hi Steve,

> > I don't understand what Attribute 9 is for this disk. It's made by
> > Quantum, right? Does the raw value of Attribute 9 change with time? [You
> > can monitor it with the -R 9 Directive in /etc/smartd.conf.]
>
> Yes it is made by Quantum. It is Quantum Part Number QML20000LC-A.
> Also lists 'LC22AT LC22A3M1 REV01-A A010F' on the drive label. The
> value is changing over time: when I first sent the message it was at
> 210, and presently it is at 248. I don't think that value is reflected
> in hours, as I don't think I've had the drive powered up for 38 hours
> between them. I will try out smartd after futzing with 2.1.24 on x86
> today for the Error & Self-Test log purposes (that is my goal for the
> afternoon).

Hmmm. This might be one of those drives that only starts recording power-on hours after SMART is enabled. Is it possible that SMART was only enabled 248 hours ago?

Anyway, please see if you can figure out if this attribute raw value is counting time in hours, or something else...

Cheers,
Bruce
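One way to answer Bruce's question is to read the Attribute 9 raw value twice (for example with `smartctl -a`, or via the `-R 9` directive) and compare how much it grew with the powered-on time that elapsed. A minimal Python sketch, assuming the drive stayed powered on between the two readings; the candidate units and the timestamps in the demo are hypothetical, and only the raw values 210 and 248 come from this thread.

```python
import math
from datetime import datetime

# Counts per powered-on hour for a few plausible units. These candidates are
# illustrative assumptions, not taken from any drive documentation.
CANDIDATE_UNITS = {
    "hours": 1.0,
    "minutes": 60.0,
    "half-minutes": 120.0,
    "seconds": 3600.0,
}


def counts_per_hour(t0: datetime, raw0: int, t1: datetime, raw1: int) -> float:
    """How fast the raw value advanced between two readings.

    Only meaningful if the drive stayed powered on between t0 and t1.
    """
    elapsed_h = (t1 - t0).total_seconds() / 3600.0
    if elapsed_h <= 0:
        raise ValueError("the second reading must be later than the first")
    return (raw1 - raw0) / elapsed_h


def closest_unit(rate: float) -> str:
    """Candidate unit nearest to the observed rate on a logarithmic scale."""
    if rate <= 0:
        raise ValueError("the counter did not advance; no unit can be inferred")
    return min(CANDIDATE_UNITS,
               key=lambda unit: abs(math.log(rate / CANDIDATE_UNITS[unit])))


if __name__ == "__main__":
    # Hypothetical timestamps: what the numbers would mean if the readings
    # 210 and 248 had been taken 38 minutes apart.
    rate = counts_per_hour(datetime(2003, 4, 1, 12, 0), 210,
                           datetime(2003, 4, 1, 12, 38), 248)
    print(f"{rate:.1f} counts/hour, closest unit: {closest_unit(rate)}")
```

Comparing on a log scale treats "twice as fast" and "half as fast" as equally far off, which suits units that differ by large factors.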
From: Bruce A. <ba...@gr...> - 2003-04-01 21:23:42
> >>> [Is the large number of power cycles in 210 hours right??]
> >> This drive was in use 24/7 for 2 years, if that helps shed some light
> >> on the numbers.
> > It does. It looks as if something may be wrong with the raw value of
> > Attribute 9. This may be due to the endian change. For fun, try using
> > the
> > -v N,raw8
> > option to smartctl, to see the individual byte.
> > -v N,raw16
> > may also be useful.
>
> Using smartctl on PowerPC, the following values are:
> Normal output:
>   9 Power_On_Hours          0x0012   100   100   001    Old_age   -           248
> With raw16:
>   9 Power_On_Hours          0x0012   100   100   001    Old_age   -           0 0 248
> With raw8:
>   9 Power_On_Hours          0x0012   100   100   001    Old_age   -           0 0 0 0 0 248
>
> Using smartctl on x86, the following values are:
> Normal output:
>   9 Power_On_Hours          0x0012   100   100   001    Old_age   -           248
> With raw16:
>   9 Power_On_Hours          0x0012   100   100   001    Old_age   -           0 0 248
> With raw8:
>   9 Power_On_Hours          0x0012   100   100   001    Old_age   -           0 0 0 0 0 248
>
> Everything looks identical there.

OK, thanks for trying. It doesn't look like your code is doing anything "wrong".

I don't understand what Attribute 9 is for this disk. It's made by Quantum, right? Does the raw value of Attribute 9 change with time? [You can monitor it with the -R 9 Directive in /etc/smartd.conf.]

Cheers,
Bruce
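Bruce's endian question comes down to how the multi-byte fields of each attribute entry are read. The Python sketch below assumes the conventional 12-byte entry layout (ID, 2-byte flags, current value, worst value, 6-byte raw value, reserved byte) with little-endian fields as transferred from the drive; decoding with an explicit byte order gives the same answer on PowerPC and x86. The entry bytes are reconstructed to match Steve's Attribute 9 line, not captured from his drive.

```python
import struct
from typing import NamedTuple

# Assumed layout of one 12-byte SMART attribute entry:
# ID (1 byte), flags (2), current value (1), worst value (1),
# raw value (6), reserved (1). "<" = little-endian, no padding.
ENTRY = struct.Struct("<BHBB6sB")


class Attribute(NamedTuple):
    attr_id: int
    flags: int
    current: int
    worst: int
    raw: int


def parse_entry(entry: bytes) -> Attribute:
    """Decode one entry with an explicit byte order, so the host CPU does not matter."""
    attr_id, flags, current, worst, raw_bytes, _reserved = ENTRY.unpack(entry)
    return Attribute(attr_id, flags, current, worst,
                     int.from_bytes(raw_bytes, "little"))


if __name__ == "__main__":
    # Bytes reconstructed (not captured) to match Steve's line:
    # ID 9, flags 0x0012, value 100, worst 100, raw 248.
    entry = bytes([0x09, 0x12, 0x00, 100, 100, 0xF8, 0, 0, 0, 0, 0, 0])
    print(parse_entry(entry))
    # An endian bug would read the same raw bytes most-significant-first:
    print(int.from_bytes(entry[5:11], "big"))  # 272678883688448, not 248
```

If a port had the byte order wrong, the raw value would have come out as an enormous number like the one above rather than 248, which is consistent with Bruce's conclusion that Steve's code is behaving correctly.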
From: Steve W. <sw...@ar...> - 2003-04-01 21:17:46
Hi Bruce,

>>> [Is the large number of power cycles in 210 hours right??]
>> This drive was in use 24/7 for 2 years, if that helps shed some light
>> on the numbers.
> It does. It looks as if something may be wrong with the raw value of
> Attribute 9. This may be due to the endian change. For fun, try using
> the
> -v N,raw8
> option to smartctl, to see the individual byte.
> -v N,raw16
> may also be useful.

Using smartctl on PowerPC, the following values are:

Normal output:
  9 Power_On_Hours          0x0012   100   100   001    Old_age   -           248
With raw16:
  9 Power_On_Hours          0x0012   100   100   001    Old_age   -           0 0 248
With raw8:
  9 Power_On_Hours          0x0012   100   100   001    Old_age   -           0 0 0 0 0 248

Using smartctl on x86, the following values are:

Normal output:
  9 Power_On_Hours          0x0012   100   100   001    Old_age   -           248
With raw16:
  9 Power_On_Hours          0x0012   100   100   001    Old_age   -           0 0 248
With raw8:
  9 Power_On_Hours          0x0012   100   100   001    Old_age   -           0 0 0 0 0 248

Everything looks identical there.

Cheers,
Steve
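The `-v N,raw16` and `-v N,raw8` views split the 48-bit raw value into three 16-bit words or six bytes, most significant first, which is why Steve's value 248 prints as `0 0 248` and `0 0 0 0 0 248`. Below is a small Python sketch of that split (an illustration, not smartctl's code). Applied to the Spin_Up_Time raw value 17193697490 reported for an IBM IC35L040AVER07 in this archive, it yields the words 4, 211 and 210, which suggests that large number is several small fields packed together rather than a single count.

```python
from typing import List

RAW_BITS = 48  # the raw value occupies 6 bytes of the attribute entry


def raw8(raw: int) -> List[int]:
    """Six bytes of the raw value, most significant first (the raw8 layout)."""
    if not 0 <= raw < 1 << RAW_BITS:
        raise ValueError("raw value must fit in 48 bits")
    return list(raw.to_bytes(6, "big"))


def raw16(raw: int) -> List[int]:
    """Three 16-bit words of the raw value, most significant first (the raw16 layout)."""
    if not 0 <= raw < 1 << RAW_BITS:
        raise ValueError("raw value must fit in 48 bits")
    return [(raw >> shift) & 0xFFFF for shift in (32, 16, 0)]


if __name__ == "__main__":
    print(raw16(248), raw8(248))   # [0, 0, 248] [0, 0, 0, 0, 0, 248]
    print(raw16(17193697490))      # [4, 211, 210]
```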
From: Bruce A. <ba...@gr...> - 2003-04-01 04:34:40
Hi Ralf,

> I got a firmware directly from IBM. But the IBM Firmware Installer
> told me that no update is needed.

OK. Thanks for trying.

> So the problem is still the same and another strange value with the
> same disk is printed out. The temperature is rising from 39 °C up to
> 49 °C under heavy activity.
> The max. temperature of the other, older IBM hard disks I have is
> only 37 °C. What is wrong?

I'm really not sure. Could you please save the output of

smartctl -a -v N,raw16 /dev/hd*

to a file and send it to the mailing list as an attachment, please? That might help me to understand better what's going on. [Note: you'll need version 5.1-9 or later of the package to support this -v option.]

I'm sorry you are having so much trouble.

Cheers,
Bruce

>
> Ralf
>
>
> On Friday, 21 February 2003 19:41, Bruce Allen wrote:
> > Hi Ralf,
> >
> > OK, I think I see what's happening. I think that you may have an IBM disk
> > that has defective SMART firmware.
> >
> > Go to this page:
> >
> > http://www.geocities.com/dtla_update/
> >
> > and follow one of the "Related links" for the 60GXP disk. This points to
> > revised IBM firmware that should fix the problem.
> >
> > [You will find a link to this page on the smartmontools web page under the
> > FAQs].
> >
> > Please let me know if this helps!
> >
> > Cheers,
> > Bruce
> >
> > On Fri, 21 Feb 2003, Ralf Panse wrote:
> > > Hi Bruce
> > >
> > > Here the output from my strange disk with the low Power_On value. It's
> > > not a laptop disk and the pc is turned off every evening.
> > >
> > >
> > > smartctl version 5.1-4 Copyright (C) 2002 Bruce Allen
> > > Home page is http://smartmontools.sourceforge.net/
> > >
> > > === START OF INFORMATION SECTION ===
> > > Device Model:     IC35L040AVER07-0
> > > Serial Number:    SXPTX393636
> > > Firmware Version: ER4OA46A
> > > ATA Version is:   5
> > > ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
> > > Local Time is:    Fri Feb 21 17:01:37 2003 CET
> > > SMART support is: Available - device has SMART capability.
> > > SMART support is: Enabled
> > >
> > > === START OF READ SMART DATA SECTION ===
> > > SMART overall-health self-assessment test result: PASSED
> > >
> > > General SMART Values:
> > > Off-line data collection status: (0x00) Offline data collection activity
> > >                                         was never started.
> > > Self-test execution status:      (   0) The previous self-test routine
> > >                                         completed without error or no
> > >                                         self-test has ever been run.
> > > Total time to complete off-line
> > > data collection:                 (1383) seconds.
> > > Offline data collection
> > > capabilities:                    (0x1b) SMART execute Offline immediate.
> > >                                         Automatic timer ON/OFF support.
> > >                                         Suspend Offline collection upon
> > >                                         new command.
> > >                                         Offline surface scan supported.
> > >                                         Self-test supported.
> > > SMART capabilities:            (0x0003) Saves SMART data before entering
> > >                                         power-saving mode.
> > >                                         Supports SMART auto save timer.
> > > Error logging capability:        (0x01) Error logging supported.
> > > Short self-test routine
> > > recommended polling time:        (   1) minutes.
> > > Extended self-test routine
> > > recommended polling time:        (  23) minutes.
> > >
> > > SMART Attributes Data Structure revision number: 16
> > > Vendor Specific SMART Attributes with Thresholds:
> > > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      WHEN_FAILED RAW_VALUE
> > >   1 Raw_Read_Error_Rate     0x000b   095   095   060    Pre-fail  -           131084
> > >   2 Throughput_Performance  0x0005   100   100   050    Pre-fail  -           0
> > >   3 Spin_Up_Time            0x0007   104   104   024    Pre-fail  -           17193697490
> > >   4 Start_Stop_Count        0x0012   100   100   000    Old_age   -           197
> > >   5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  -           0
> > >   7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  -           0
> > >   8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  -           0
> > >   9 Power_On_Hours          0x0012   100   100   000    Old_age   -           117
> > >  10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  -           0
> > >  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   -           197
> > > 192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   -           197
> > > 193 Load_Cycle_Count        0x0012   100   100   050    Old_age   -           197
> > > 194 Temperature_Celsius     0x0002   141   141   000    Old_age   -           39 (Lifetime Min/Max 19/47)
> > > 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   -           0
> > > 197 Current_Pending_Sector  0x0022   100   100   000    Old_age   -           0
> > > 198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   -           0
> > > 199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   -           0
> > >
> > > SMART Error Log Version: 1
> > > No Errors Logged
> > >
> > > SMART Self-test log, version number 1
> > > Num  Test_Description    Status          Remaining  LifeTime(hours)  LBA_of_first_error
> > > # 1  Short off-line      Completed             00%              115  -
> > > # 2  Short off-line      Completed             00%              104  -
> > > # 3  Short off-line      Completed             00%              104  -
> > > # 4  Short off-line      Completed             00%              104  -
> > > # 5  Short off-line      Completed             00%              103  -
> > > # 6  Short off-line      Completed             00%              103  -
> > > # 7  Short off-line      Completed             00%              103  -
> > > # 8  Short off-line      Completed             00%              102  -
> > > # 9  Extended off-line   Completed             00%              101  -
> > > #10  Short off-line      Completed             00%               99  -
> > > #11  Short off-line      Completed             00%               99  -
> > > #12  Short off-line      Completed             00%               99  -
> > > #13  Short off-line      Completed             00%               99  -
> > > #14  Short off-line      Completed             00%               99  -
> > > #15  Short off-line      Completed             00%               99  -
> > > #16  Short off-line      Completed             00%               99  -
> > > #17  Short off-line      Completed             00%               97  -
> > > #18  Short off-line      Completed             00%               97  -
> > > #19  Short off-line      Completed             00%               97  -
> > > #20  Short off-line      Completed             00%               97  -
> > > #21  Short off-line      Completed             00%               97  -
> > >
> > >
> > >
> > > Thanks
> > > Ralf
> > >
> > > On Friday, 21 February 2003 16:58, you wrote:
> > > > Hi Ralf,
> > > >
> > > > (My comments are inserted below)
> > > >
> > > > On Fri, 21 Feb 2003, Ralf Panse wrote:
> > > > > Hi Bruce !
> > > > >
> > > > > Here the output from smartctl -a /dev/hdd:
> > > > >
> > > > > smartctl version 5.1-4 Copyright (C) 2002 Bruce Allen
> > > > > Home page is http://smartmontools.sourceforge.net/
> > > > >
> > > > > === START OF INFORMATION SECTION ===
> > > > > Device Model:     IBM-DTLA-307045
> > > > > Serial Number:    YMDYMHA0675
> > > > > Firmware Version: TX6OA50C
> > > > > ATA Version is:   5
> > > > > ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
> > > > > Local Time is:    Fri Feb 21 16:09:14 2003 CET
> > > > > SMART support is: Available - device has SMART capability.
> > > > > SMART support is: Enabled
> > > > >
> > > > > === START OF READ SMART DATA SECTION ===
> > > > > SMART overall-health self-assessment test result: PASSED
> > > > >
> > > > > General SMART Values:
> > > > > Off-line data collection status: (0x02) Offline data collection
> > > > >                                         activity completed without error.
> > > > > Self-test execution status:      (   0) The previous self-test routine
> > > > >                                         completed without error or no
> > > > >                                         self-test has ever been run.
> > > > > Total time to complete off-line
> > > > > data collection:                 (2294) seconds.
> > > > > Offline data collection
> > > > > capabilities:                    (0x1b) SMART execute Offline immediate.
> > > > >                                         Automatic timer ON/OFF support.
> > > > >                                         Suspend Offline collection upon new
> > > > >                                         command.
> > > > >                                         Offline surface scan supported.
> > > > >                                         Self-test supported.
> > > > > SMART capabilities:            (0x0003) Saves SMART data before entering
> > > > >                                         power-saving mode.
> > > > >                                         Supports SMART auto save timer.
> > > > > Error logging capability:        (0x01) Error logging supported.
> > > > > Short self-test routine
> > > > > recommended polling time:        (   2) minutes.
> > > > > Extended self-test routine
> > > > > recommended polling time:        (  28) minutes.
> > > > >
> > > > > SMART Attributes Data Structure revision number: 16
> > > > > Vendor Specific SMART Attributes with Thresholds:
> > > > > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     WHEN_FAILED RAW_VALUE
> > > > >   1 Raw_Read_Error_Rate     0x000b   100   100   060    Pre-fail -           1
> > > > >   2 Throughput_Performance  0x0005   132   132   050    Pre-fail -           340
> > > > >   3 Spin_Up_Time            0x0007   094   094   024    Pre-fail -           25789530422
> > > > >   4 Start_Stop_Count        0x0012   100   100   000    Old_age  -           52
> > > > >   5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail -           0
> > > > >   7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail -           0
> > > > >   8 Seek_Time_Performance   0x0005   130   130   020    Pre-fail -           34
> > > > >   9 Power_On_Hours          0x0012   100   100   000    Old_age  -           119
> > > > >  10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail -           0
> > > > >  12 Power_Cycle_Count       0x0032   100   100   000    Old_age  -           52
> > > > > 192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age  -           52
> > > > > 193 Load_Cycle_Count        0x0012   100   100   050    Old_age  -           52
> > > > > 194 Temperature_Celsius     0x0002   171   171   000    Old_age  -           32
> > > > > 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age  -           0
> > > > > 197 Current_Pending_Sector  0x0022   100   100   000    Old_age  -           0
> > > > > 198 Offline_Uncorrectable   0x0008   100   100   000    Old_age  -           0
> > > > > 199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age  -           0
> > > > >
> > > > > SMART Error Log Version: 1
> > > > > ATA Error Count: 2
> > > > >         DCR = Device Control Register
> > > > >         FR  = Features Register
> > > > >         SC  = Sector Count Register
> > > > >         SN  = Sector Number Register
> > > > >         CL  = Cylinder Low Register
> > > > >         CH  = Cylinder High Register
> > > > >         D/H = Device/Head Register
> > > > >         CR  = Content written to Command Register
> > > > >         ER  = Error register
> > > > >         STA = Status register
> > > > > Timestamp is seconds since the previous disk power-on.
> > > > > Note: timestamp "wraps" after 2^32 msec = 49.710 days.
> > > > >
> > > > > Error 2 occurred at disk power-on lifetime: 2 hours
> > > > > When the command that caused the error occurred, the device was
> > > > > active or idle.
> > > > > After command completion occurred, registers were:
> > > > > ER:10 SC:00 SN:4c CL:18 CH:f2 D/H:b2 ST:51
> > > > > Sequence of commands leading to the command that caused the error
> > > > > were:
> > > > > DCR  FR  SC  SN  CL  CH  D/H  CR  Timestamp
> > > > >  02  40  00  00  00  f2   f2  82   7302.400
> > > > >  02  40  00  00  c0  f1   f2  82   7302.000
> > > > >  02  40  00  00  80  f1   f2  82   7301.700
> > > > >  02  40  00  00  40  f1   f2  82   7301.300
> > > > >  02  40  00  00  00  f1   f2  82   7300.900
> > > > >
> > > > > Error 1 occurred at disk power-on lifetime: 1 hours
> > > > > When the command that caused the error occurred, the device was
> > > > > active or idle.
> > > > > After command completion occurred, registers were:
> > > > > ER:10 SC:00 SN:00 CL:00 CH:00 D/H:b0 ST:51
> > > > > Sequence of commands leading to the command that caused the error
> > > > > were:
> > > > > DCR  FR  SC  SN  CL  CH  D/H  CR  Timestamp
> > > > >  02  11  00  00  00  00   b0  f7   5132.900
> > > > >  02  00  00  00  00  00   b0  f3   5132.900
> > > > >  02  00  00  00  00  00   b0  ec   5132.900
> > > > >  02  00  40  00  c3  00   f0  40   5132.800
> > > > >  02  00  40  c0  c2  00   f0  40   5132.800
> > > > >
> > > > > SMART Self-test log, version number 1
> > > > > Num  Test_Description    Status       Remaining  LifeTime(hours)  LBA_of_first_error
> > > > > # 1  Short off-line      Completed    00%        117              -
> > > > > # 2  Short off-line      Completed    00%        92               -
> > > > > # 3  Short off-line      Completed    00%        89               -
> > > > > # 4  Extended off-line   Completed    00%        69               -
> > > > > # 5  Short off-line      Completed    00%        2                -
> > > > > # 6  Short off-line      Completed    00%        0                -
> > > > >
> > > > >
> > > > > So my hard disk must be OK!?
> > > >
> > > > The disk looks fine. The two entries in the ATA error log were at 1
> > > > hour 25 minutes after the disk was first powered up, and 2 hours and 2
> > > > minutes after the disk was first powered up. [Perhaps you were playing
> > > > with hdparm?] The disk is now 119 hours old and hasn't shown any
> > > > further errors. The time to worry is if the ATA error log starts
> > > > showing hundreds or thousands of errors in the very recent past.
> > > >
> > > > And the self-tests all completed OK.
> > > >
> > > > > There is another strange value in the SMART attributes. With another
> > > > > disk (IBM), smartctl returns a smaller value for Start_Stop_Count than
> > > > > for Power_On_Hours.
> > > >
> > > > In itself, that's OK. Start_Stop_Count is the number of times that the
> > > > disk has spun up. For a machine that is turned on and off once per
> > > > month, and has been running for a year, this would be 12. But
> > > > Power_On_Hours would be one year: 8760.
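Bruce's arithmetic can be turned into a quick plausibility check. The sketch below is a hypothetical helper, not part of smartctl: it divides the raw Power_On_Hours value by the raw Start_Stop_Count and flags a disk whose average run per spin-up is shorter than its owner expects. The one-hour default is an assumption taken from Ralf's remark that his disk "runs more than one hour per power-on".

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, the figure used in Bruce's example


def hours_per_spinup(power_on_hours, start_stop_count):
    """Average powered-on hours per spin-up, from the two raw SMART values."""
    # max(..., 1) guards against a brand-new disk that reports zero spin-ups.
    return power_on_hours / max(start_stop_count, 1)


def looks_suspicious(power_on_hours, start_stop_count, min_hours_per_start=1.0):
    """Hypothetical heuristic: True if the average run is shorter than expected.

    A low ratio is a hint, not proof of a fault: a disk that sleeps or is
    spun down often also accumulates Start_Stop_Count.
    """
    return hours_per_spinup(power_on_hours, start_stop_count) < min_hours_per_start


# Bruce's example: a machine switched on and off once a month for a year.
print(hours_per_spinup(HOURS_PER_YEAR, 12))   # 730.0
# Ralf's two disks, using the raw values from their smartctl -a output.
print(round(hours_per_spinup(117, 197), 2))   # 0.59
print(looks_suspicious(117, 197))             # True
print(looks_suspicious(119, 52))              # False
```

The ratio only says the counters disagree with how the machine is used; it cannot tell a firmware bug from a disk that really sleeps a lot.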
> > > >
> > > > Note that Start_Stop_Count can also change when the disk sleeps.
> > > >
> > > > > Now these results:
> > > > >   4 Start_Stop_Count        0x0012   100   100   000    Old_age  -           197
> > > > >   9 Power_On_Hours          0x0012   100   100   000    Old_age  -           116
> > > > >  12 Power_Cycle_Count       0x0032   100   100   000    Old_age  -           197
> > > > > 192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age  -           197
> > > > > 193 Load_Cycle_Count        0x0012   100   100   050    Old_age  -           197
> > > >
> > > > look very strange. Is this a laptop disk? Please post the output of
> > > > smartctl -a for this disk.
> > > >
> > > > > My disk runs for more than one hour per power-on. Can you explain
> > > > > this value to me?
> > > >
> > > > Is the disk sleeping (either a laptop disk or a desktop machine that
> > > > suspends a lot)?
> > > >
> > > > > Thanks a lot!
> > > >
> > > > You're welcome!
> > > >
> > > > Bruce
> > > >
> > > > > Ralf
> > > > >
> > > > > On Friday, 21 February 2003 05:39, Bruce Allen wrote:
> > > > > > Hi Ralf,
> > > > > >
> > > > > > On Thu, 20 Feb 2003, Ralf Panse wrote:
> > > > > > > Hi all!
> > > > > > > What does this output mean (output from ./smartctl -c /dev/hdd)?
> > > > > > >
> > > > > > > === START OF READ SMART DATA SECTION ===
> > > > > > > SMART Error Log Version: 1
> > > > > > > ATA Error Count: 2
> > > > > > > ...
> > > > > > > Error 2 occurred at disk power-on lifetime: 2 hours
> > > > > > > When the command that caused the error occurred, the device was
> > > > > > > active or idle.
> > > > > > > After command completion occurred, registers were:
> > > > > > > ER:10 SC:00 SN:4c CL:18 CH:f2 D/H:b2 ST:51
> > > > > > > Sequence of commands leading to the command that caused the error
> > > > > > > were:
> > > > > > > DCR  FR  SC  SN  CL  CH  D/H  CR  Timestamp
> > > > > > >  02  40  00  00  00  f2   f2  82   7302.400
> > > > > > >  02  40  00  00  c0  f1   f2  82   7302.000
> > > > > > >  02  40  00  00  80  f1   f2  82   7301.700
> > > > > > >  02  40  00  00  40  f1   f2  82   7301.300
> > > > > > >  02  40  00  00  00  f1   f2  82   7300.900
> > > > > > >
> > > > > > > Error 1 occurred at disk power-on lifetime: 1 hours
> > > > > > > ...
> > > > > >
> > > > > > This is the ATA error log. The types of errors that it indicates
> > > > > > are described in this document:
> > > > > > http://www.t13.org/project/d1321r1c.pdf (please see section
> > > > > > 8.41.6.8.2.4, Device Error Count, which starts on page 204).
> > > > > >
> > > > > > The lines listed are the five commands leading up to the command
> > > > > > that caused the error. The different columns refer to different
> > > > > > ATA registers.
> > > > > >
> > > > > > > What is wrong with my hard disk?
> > > > > >
> > > > > > Probably nothing. The last of these errors occurred when the disk
> > > > > > was just a few hours old (7300 seconds after it was first turned
> > > > > > on). This may have been due to strange or incorrect hdparm (DMA
> > > > > > mode, etc.) settings, a loose cable, or something else wrong with
> > > > > > the disk.
> > > > > >
> > > > > > Assuming that the disk is more than a few hours old, it's not been
> > > > > > exhibiting the errors recently. If you want some more reassurance,
> > > > > > please post the output of smartctl -a.
> > > > > >
> > > > > > You might also want to run some extended self-tests and examine the
> > > > > > self-test log. You can do both of these things with smartctl.
> > > > > >
> > > > > > > The health-status command of smartctl (./smartctl -H /dev/hdd) says
> > > > > > > ...
> > > > > > >
> > > > > > > === START OF READ SMART DATA SECTION ===
> > > > > > > SMART overall-health self-assessment test result: PASSED
> > > > > > >
> > > > > > > And all S.M.A.R.T. attributes are OK.
> > > > > > >
> > > > > > > Thanks!
> > > > > >
> > > > > > You're welcome!
> > > > > >
> > > > > > Bruce
> > > > >
> > > > > -------------------------------------------------------
> > > > > This SF.net email is sponsored by: SlickEdit Inc. Develop an edge.
> > > > > The most comprehensive and flexible code editor you can use.
> > > > > Code faster. C/C++, C#, Java, HTML, XML, many more. FREE 30-Day
> > > > > Trial. www.slickedit.com/sourceforge
> > > > > _______________________________________________
> > > > > Smartmontools-support mailing list
> > > > > Sma...@li...
> > > > > https://lists.sourceforge.net/lists/listinfo/smartmontools-support
> > >
> > > --
> > > Ralf Panse
> > > Kirchhoff-Institut für Physik
> > >
> > > Tel: 06221 54 9811
>
> --
> Ralf Panse
> Kirchhoff-Institut für Physik
> Technische Informatik
>
> Tel: +49 6221 54 9811         Im Neuenheimer Feld 227
>                               D-69120 Heidelberg, Germany
> e-mail: pa...@ki...
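The register values and timestamps in the error log quoted above can be decoded by hand. The sketch below is illustrative only: the bit names follow the conventional ATA Status and Error register layouts (for example, 0x10 in the Error register is IDNF, "ID not found"), which should be checked against the ATA/ATAPI-5 document Bruce cites, and the helper names are hypothetical. It also turns the timestamps (seconds since the previous power-on) into hours and minutes, which is roughly how 5132.9 s and 7302.4 s become the "1 hour 25 minutes" and "2 hours and 2 minutes" in Bruce's reply, and it reproduces the 49.710-day wrap.

```python
# Bit names for the ATA Status and Error registers (assumed classic ATA
# layout; DSC, CORR and IDX are obsolete in later standards). Bits without
# a name here are reported as "bitN".
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT"}


def decode_bits(value, names):
    """List the names of the bits set in an 8-bit register, high bit first."""
    return [names.get(1 << bit, f"bit{bit}")
            for bit in range(7, -1, -1) if value & (1 << bit)]


def format_uptime(seconds):
    """Turn an error-log timestamp (seconds since power-on) into h/min/s."""
    tenths = round(seconds * 10)          # integer math avoids float noise
    hours, rest = divmod(tenths, 36000)   # 3600 s per hour, in tenths
    minutes, rest = divmod(rest, 600)     # 60 s per minute, in tenths
    return f"{hours} h {minutes:02d} min {rest // 10}.{rest % 10} s"


# The timestamp counter holds 2^32 milliseconds before it wraps:
WRAP_DAYS = 2**32 / (1000 * 60 * 60 * 24)

print(decode_bits(0x51, STATUS_BITS))  # ['DRDY', 'DSC', 'ERR']   (ST:51)
print(decode_bits(0x10, ERROR_BITS))   # ['IDNF']                 (ER:10)
print(format_uptime(5132.9))           # 1 h 25 min 32.9 s   (Error 1)
print(format_uptime(7302.4))           # 2 h 01 min 42.4 s   (Error 2)
print(f"{WRAP_DAYS:.3f} days")         # 49.710 days
```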
From: Bruce A. <ba...@gr...> - 2003-04-01 04:23:42
Hi Steve,

> >> Total time to complete off-line
> >> data collection:                 (   2) seconds.
> > This looks too short (but might be right, I suppose, if you printed this
> > output just as some data collection was finishing??)
>
> I've run smartctl (what seems like) a hundred times on this drive, and
> it's always that value. Also when I ran it on x86, it was the same
> value as well.

OK. I'm not sure, but I have a memory that on some disks that I've seen
(IBM) this time is the time to check a single cylinder. Anyway, if it's
the same on x86 it's probably right.

> >>   4 Start_Stop_Count    0x0032   100   100   008    Old_age  -   444
> >>   9 Power_On_Hours      0x0012   100   100   001    Old_age  -   210
> >>  12 Power_Cycle_Count   0x0032   100   100   008    Old_age  -   417
> > Do these numbers look reasonable? You should be able to use hdparm -y to
> > spin down and spin up the disk while the system is running and see the
> > start stop count increment while the power cycle count stays fixed.
> > [Is the large number of power cycles in 210 hours right??]
>
> This drive was in use 24/7 for 2 years, if that helps shed some light
> on the numbers.

It does. It looks as if something may be wrong with the raw value of
Attribute 9. This may be due to the endian change. For fun, try using the
-v N,raw8 option to smartctl to see the individual bytes. -v N,raw16 may
also be useful.

> to say. Using 'hdparm -y', the Start_Stop_Count value increments while

Good! That sounds right.

> > At this point, my only good answer is "kernel developers mailing list".
> > The point being that obviously the other SMART calls that return
> > 512-byte structures succeeded. The fact that this one failed (and
> > especially if the UDMA error count is correlated) might be a sign of a
> > kernel driver bug. I wouldn't be surprised, since I don't think that
> > there is any other standard Linux code that uses this ioctl().
>
> That's what I feared (kernel issue). As a sanity test, I'll run
> smartmontools on x86 using the same kernel (2.1.24), and if time
> permits will also test on a PowerMac running PPC/Linux with 2.1.24
> (this will require me finding a spare drive to install on, but I need
> to do some benchmarking with PPC/Linux for another app so it's not a
> huge inconvenience). I'll report the status back here, and if the
> error occurs across all three, I'll take this issue up on a kernel dev
> list.

This sounds like a good plan.

> This will hopefully isolate any errors specific to the hardware
> I'm currently using. If it does end up being something specific to the
> hardware, I will try to take it up with the manufacturer (who most
> likely will not be responsive to this issue).

I doubt it's specific to the hardware. Quantum has been pretty involved in
SMART for quite some time, and I suspect their implementation in firmware
is pretty solid.

> > [PS: I remember the first time I saw a Lisa. I was a postdoc, visiting a
> > friend in Austin TX around 1984-5. He very proudly showed me his Lisa,
> > one of the first ones out, on which he had blown at least 5 grand.]
>
> That's very neat. Not to drift terribly off topic, but do you mind if
> I ask what he used it for (provided you still remember)?

Code development. He was a graduate student working on gravitational
physics.

> It's so very rare to encounter somebody that either owned one or had a
> friend that owned one, I'm always interested in hearing its original
> intended use. And FWIW, I was probably equally as proud as your friend
> when I finally got a Lisa in 2000. But that's just due to the
> Apple-geek blood that pumps through my veins.

One of my colleagues gets his jollies by running dusty versions of BSD on
his PDP-11. No kidding.

Bruce
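Bruce's raw8/raw16 suggestion is about how the six raw bytes of an attribute are assembled into one number. The sketch below assumes the usual layout in which the raw field is stored least-significant byte first; it is not smartctl's code, the function names are hypothetical, and smartctl's own raw8/raw16 formatting may differ. Splitting a value into 16-bit words shows why some raw values in this thread look enormous (Ralf's Spin_Up_Time of 17193697490 is just the words 4, 211 and 210 packed together), and assembling the bytes in the wrong order, as a missing byte swap on a big-endian host would, turns a small count like 210 into a huge number.

```python
def raw48(raw_bytes):
    """Assemble the 6-byte raw field, assuming least-significant byte first."""
    if len(raw_bytes) != 6:
        raise ValueError("a SMART raw value is 6 bytes long")
    return int.from_bytes(raw_bytes, "little")


def raw16(value):
    """Split a 48-bit raw value into three 16-bit words, most significant first."""
    return [(value >> shift) & 0xFFFF for shift in (32, 16, 0)]


def raw8(value):
    """Split a 48-bit raw value into six bytes, most significant first."""
    return [(value >> shift) & 0xFF for shift in range(40, -1, -8)]


# A raw count of 210 as it would sit in the attribute table.
stored = (210).to_bytes(6, "little")
print(raw48(stored))                    # 210
# The same bytes assembled in the wrong (big-endian) order:
print(int.from_bytes(stored, "big"))    # 230897441832960, i.e. 210 << 40
# Ralf's Spin_Up_Time raw value, viewed as words and as bytes.
print(raw16(17193697490))               # [4, 211, 210]
print(raw8(17193697490))                # [0, 4, 0, 211, 0, 210]
```

Comparing the word and byte views against the assembled number is one way to spot a byte-order slip; what each word means is vendor-specific and is not implied here.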
From: Bruce A. <ba...@gr...> - 2003-04-01 04:03:32
Hi Christiaan,

I'm sorry to hear about these problems. Doug Gilbert is the smartmontools
developer who knows the most about the SCSI code. He has been doing some
development and testing with the 2.5 kernel series, and should be able to
help you. So I'll let him respond.

Cheers,
Bruce

On Tue, 1 Apr 2003, Christiaan Willemsen wrote:

> Hi,
>
> I just looked at smartmontools. I used it to read some SMART values from
> my new SCSI disks (Maxtor Atlas 10k4) on an Adaptec AIC-7902. It seemed to
> work very well: I could read the drive temperature and some other data.
> But the problem is that the program seems to generate errors on the SCSI
> channels, so that the transfer rate of the disks goes down.
>
> Normally this is the picture:
>
> cat /proc/scsi/aic79xx/0
> Adaptec AIC79xx driver version: 1.3.0
> aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs
>
> Serial EEPROM:
> 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8
> 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8
> 0x09f4 0x0146 0x2807 0x0010 0xffff 0xffff 0xffff 0xffff
> 0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0x0400 0xb3c7
>
> Channel A Target 0 Negotiation Settings
>         User: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
> ...
> Channel A Target 5 Negotiation Settings
>         User: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
>         Goal: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
>         Curr: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
>         Transmission Errors 0
>         Channel A Target 5 Lun 0 Settings
>                 Commands Queued 5405
>                 Commands Active 0
>                 Command Openings 32
>                 Max Tagged Openings 32
>                 Device Queue Frozen Count 0
> Channel A Target 6 Negotiation Settings
>         User: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
>         Goal: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
>         Curr: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
>         Transmission Errors 0
>         Channel A Target 6 Lun 0 Settings
>                 Commands Queued 4991
>                 Commands Active 0
>                 Command Openings 32
>                 Max Tagged Openings 32
>                 Device Queue Frozen Count 0
> ...
>
> But when I run smartctl once, the transmission error count goes up and
> the transfer rate drops, even down to 9 MB/s! The system also stalls and
> appears to have damaged files. After a reboot, these files seem to be OK
> again...
>
> I'm running Gentoo Linux, kernel 2.5.66-SMP (might have something to do
> with that), aic-79xx version 1.3.0 (standard kernel version).
>
> Hope this will help you determine the problem,
>
> Greetz,
>
> Christiaan
From: Christiaan W. <chr...@fw...> - 2003-03-31 22:39:07
Hi,

I just looked at smartmontools. I used it to read some SMART values from my
new SCSI disks (Maxtor Atlas 10k4) on an Adaptec AIC-7902. It seemed to work
very well: I could read the drive temperature and some other data. But the
problem is that the program seems to generate errors on the SCSI channels,
so that the transfer rate of the disks goes down.

Normally this is the picture:

cat /proc/scsi/aic79xx/0
Adaptec AIC79xx driver version: 1.3.0
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs

Serial EEPROM:
0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8
0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8
0x09f4 0x0146 0x2807 0x0010 0xffff 0xffff 0xffff 0xffff
0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0x0400 0xb3c7

Channel A Target 0 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
...
Channel A Target 5 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
        Goal: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
        Curr: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
        Transmission Errors 0
        Channel A Target 5 Lun 0 Settings
                Commands Queued 5405
                Commands Active 0
                Command Openings 32
                Max Tagged Openings 32
                Device Queue Frozen Count 0
Channel A Target 6 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
        Goal: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
        Curr: 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
        Transmission Errors 0
        Channel A Target 6 Lun 0 Settings
                Commands Queued 4991
                Commands Active 0
                Command Openings 32
                Max Tagged Openings 32
                Device Queue Frozen Count 0
...

But when I run smartctl once, the transmission error count goes up and the
transfer rate drops, even down to 9 MB/s! The system also stalls and
appears to have damaged files. After a reboot, these files seem to be OK
again...

I'm running Gentoo Linux, kernel 2.5.66-SMP (might have something to do
with that), aic-79xx version 1.3.0 (standard kernel version).

Hope this will help you determine the problem,

Greetz,

Christiaan
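To check whether a single smartctl run is what drives the error count up, one could snapshot the "Transmission Errors" counters before and after the run. The sketch below is hypothetical: it assumes the text layout of /proc/scsi/aic79xx/0 exactly as pasted above (driver 1.3.0), and other driver versions may format the file differently.

```python
import re

# Patterns matching the layout shown above (assumed, not a documented format).
TARGET_RE = re.compile(r"Channel (\w+) Target (\d+) Negotiation Settings")
ERRORS_RE = re.compile(r"Transmission Errors (\d+)")


def transmission_errors(proc_text):
    """Map (channel, target) -> Transmission Errors count from the /proc text."""
    counts = {}
    current = None
    for line in proc_text.splitlines():
        target = TARGET_RE.search(line)
        if target:
            current = (target.group(1), int(target.group(2)))
            continue
        errors = ERRORS_RE.search(line)
        if errors and current is not None:
            counts[current] = int(errors.group(1))
    return counts


def increased(before, after):
    """Targets whose error counter grew between two snapshots, with the delta."""
    return {t: n - before.get(t, 0) for t, n in after.items()
            if n > before.get(t, 0)}
```

On the affected machine one would read the file once before and once after running smartctl, parse both texts with `transmission_errors()`, and pass the two results to `increased()`; an empty result means the counters did not move.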
From: Steve W. <sw...@ar...> - 2003-03-31 21:40:39
Hi Bruce,

> I'm VERY excited to hear this. I have been wondering if someone would try
> to get smartmontools working on big-endian (I only have x86 little endian
> boxes).

Glad to hear this work will (hopefully) be beneficial.

> Is the kernel version number you gave above accurate (2.1.24?). Are you
> using similar/identical kernel versions in your comparison testing?

Yes, it is 2.1.24. On the x86 side, I've used a very similar kernel (from
the 2.2 series; I don't recall the exact version, as I have a few 2.2
kernels unpacked/built at the moment). The HDIO_DRIVE_CMD ioctl is
identical between the 2.2.x and 2.1.24 kernels I've used, and the
ide_do_drive_cmd() function looks similar, but I haven't done an in-depth
comparison to note any specific differences. I am working on building a
2.1.24 kernel on x86 right now to test with, and will report back once that
is complete.

>> Local Time is: Sat Mar 29 21:14:45 2003 localtime
> Interesting timezone. Is this right? See utility.c for the relevant bits
> of code referencing tzname[]. Also man tzset.

On this particular system, it is correct. Presently I'm running this on a
very scaled-down Linux system (the entire filesystem is only a few megs). I
would assume that since the system lacks zoneinfo, libc is defaulting to
the string 'localtime' for the timezone (as evidenced by GNU C Library
v1.96, file time/tzfile.h). If I set the TZ environment variable, it does
properly affect the output:

$ TZ="MST" /mnt/smartctl -a /dev/hdb
..snip..
Local Time is: Mon Mar 31 20:05:38 2003 MST
..snip..

So it's certainly no error of smartmontools. Just another wonderful
artifact of this slightly demented box.

>> Total time to complete off-line
>> data collection:                 (   2) seconds.
> This looks too short (but might be right, I suppose, if you printed this
> output just as some data collection was finishing??)

I've run smartctl (what seems like) a hundred times on this drive, and it's
always that value. Also when I ran it on x86, it was the same value as
well.

>>   4 Start_Stop_Count    0x0032   100   100   008    Old_age  -   444
>>   9 Power_On_Hours      0x0012   100   100   001    Old_age  -   210
>>  12 Power_Cycle_Count   0x0032   100   100   008    Old_age  -   417
> Do these numbers look reasonable? You should be able to use hdparm -y to
> spin down and spin up the disk while the system is running and see the
> start stop count increment while the power cycle count stays fixed.
> [Is the large number of power cycles in 210 hours right??]

This drive was in use 24/7 for 2 years, if that helps shed some light on
the numbers. The Power_Cycle_Count seems high to me, but it's hard to say.
Using 'hdparm -y', the Start_Stop_Count value increments while the other
two values stay static, as you had indicated.

>> 199 UDMA_CRC_Error_Count    0x001a   196   196   000    Old_age  -   10
> Does this count increment each time you get an IDE error like the one
> above?

No, it does not increment.

> At this point, my only good answer is "kernel developers mailing list".
> The point being that obviously the other SMART calls that return 512-byte
> structures succeeded. The fact that this one failed (and especially if the
> UDMA error count is correlated) might be a sign of a kernel driver bug. I
> wouldn't be surprised, since I don't think that there is any other
> standard Linux code that uses this ioctl().

That's what I feared (kernel issue). As a sanity test, I'll run
smartmontools on x86 using the same kernel (2.1.24), and if time permits
will also test on a PowerMac running PPC/Linux with 2.1.24 (this will
require me finding a spare drive to install on, but I need to do some
benchmarking with PPC/Linux for another app so it's not a huge
inconvenience). I'll report the status back here, and if the error occurs
across all three, I'll take this issue up on a kernel dev list. This will
hopefully isolate any errors specific to the hardware I'm currently using.
If it does end up being something specific to the hardware, I will try to
take it up with the manufacturer (who most likely will not be responsive
to this issue).

> Steve, would you like to join the group of smartmontools developers so
> that you can integrate your changes into the body of the code? If so, let
> me know if you have a sourceforge user name, and if you are familiar with
> CVS. If not, I'll help you get started.

I'll need to create an account. I'll do so shortly, and send you the
details in a separate message. I am familiar with CVS, and while it's not
my favorite, I should be able to handle it. I do appreciate the offer of
help, and of course may have to take you up on it.

> [PS: I remember the first time I saw a Lisa. I was a postdoc, visiting a
> friend in Austin TX around 1984-5. He very proudly showed me his Lisa, one
> of the first ones out, on which he had blown at least 5 grand.]

That's very neat. Not to drift terribly off topic, but do you mind if I ask
what he used it for (provided you still remember)? It's so very rare to
encounter somebody that either owned one or had a friend that owned one,
I'm always interested in hearing its original intended use. And FWIW, I was
probably equally as proud as your friend when I finally got a Lisa in 2000.
But that's just due to the Apple-geek blood that pumps through my veins.

Cheers,
Steve
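Steve's TZ experiment can be reproduced without smartctl. Python's `time.tzset()` re-reads the TZ environment variable through the C library's `tzset()`, the same call Bruce points to, and `time.tzname` then holds the abbreviations that end up in a line such as "Local Time is: ... MST". A minimal sketch, Unix-only; it uses the POSIX form "MST7" (zone name plus offset) rather than Steve's bare "MST", whose interpretation without zoneinfo files is up to the C library. The function name is hypothetical.

```python
import os
import time


def zone_label(tz=None):
    """Format the current local time, ending with the zone abbreviation.

    If tz is given, it is exported as TZ and tzset() is called so that the C
    library re-parses it (Unix only). The format is close to, but not
    guaranteed identical with, smartctl's 'Local Time is:' line.
    """
    if tz is not None:
        os.environ["TZ"] = tz
        time.tzset()
    return time.strftime("%a %b %d %H:%M:%S %Y %Z", time.localtime())


print(zone_label("MST7"))   # current time, ending in "MST"
print(time.tzname[0])       # MST
print(time.timezone)        # 25200 (seconds west of UTC, i.e. 7 hours)
```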
From: Bruce A. <ba...@gr...> - 2003-03-31 14:48:16
Ralf,

I have a request.

> > But on some IBM hard disks the temperature never changes its value,
> > although they are the same model (Device Model: IBM-DTLA-307045,
> > Firmware Version: TX6OA50C). I sent you a picture which shows the
> > temperature against time.
>
> A number of these model IBM disks had defective SMART firmware. See
> http://smartmontools.sourceforge.net/#FAQ. Could you please check if the
> disks have different firmware (smartctl -i will tell you)? If so, IBM
> provides a utility that you can use to upgrade the firmware to the latest
> version.

The latest release (5.1-9) of smartmontools should warn you about the
possible need to upgrade the disk firmware on these IBM disks. Could you
please confirm whether this works for you, and perhaps send a copy of the
warning message that gets output? I wrote this code but haven't been able
to test it.

Cheers,
Bruce
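Bruce's question ("do the disks have different firmware?") is easy to script across several machines. The sketch below is hypothetical and only assumes the "Device Model:" and "Firmware Version:" lines that `smartctl -i` prints in the outputs quoted in this thread; it groups disks by model and reports any model seen with more than one firmware string.

```python
def identity(smartctl_i_text):
    """Pick the identity fields out of `smartctl -i` text output."""
    wanted = ("Device Model", "Serial Number", "Firmware Version")
    fields = {}
    for line in smartctl_i_text.splitlines():
        # Split on the first colon only, so "Local Time is: 16:09:14" is harmless.
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            fields[key.strip()] = value.strip()
    return fields


def mixed_firmware(outputs):
    """Return {model: set of firmware versions} for models seen with more than one."""
    seen = {}
    for text in outputs:
        info = identity(text)
        model = info.get("Device Model")
        if model is not None:
            seen.setdefault(model, set()).add(info.get("Firmware Version"))
    return {model: versions for model, versions in seen.items() if len(versions) > 1}
```

For Ralf's disks, which all report IBM-DTLA-307045 with firmware TX6OA50C, `mixed_firmware()` would return an empty dict, so a firmware difference alone would not explain the stuck temperature.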
From: Bruce A. <ba...@gr...> - 2003-03-31 14:15:18
Hi Ralf,

> I sent you the picture because I think it does not belong on the
> smartmontools mailing list.

Thank you. In fact I'd be happy to see this posted on the mailing list; if
you could resend it there, it might be helpful to others.

> I have tested smartmontools on several hard disks. I logged the
> temperature every 5 minutes. First I measured the temperature while the
> hard disk was idle, and then under heavy disk activity. This caused
> heating, and the temperature rose.

All clear!

> But on some IBM hard disks the temperature never changes its value,
> although they are the same model (Device Model: IBM-DTLA-307045,
> Firmware Version: TX6OA50C). I sent you a picture which shows the
> temperature against time.

A number of these model IBM disks had defective SMART firmware. See
http://smartmontools.sourceforge.net/#FAQ. Could you please check if the
disks have different firmware (smartctl -i will tell you)? If so, IBM
provides a utility that you can use to upgrade the firmware to the latest
version.

If not, please write back to the mailing list and I'll try and think of
something else!

Cheers,
Bruce
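Ralf's symptom, a Temperature_Celsius value that never moves while other disks of the same model heat up under load, can be flagged from a log like the one he kept (one reading every 5 minutes). A minimal sketch: the one-hour window (12 readings), the helper names and the sample logs are assumptions for illustration, and a flat reading only suggests a sensor or firmware problem rather than proving one.

```python
SAMPLES_PER_HOUR = 60 // 5  # one reading every 5 minutes, as in Ralf's log


def temperature_is_flat(readings, min_samples=SAMPLES_PER_HOUR):
    """True if at least min_samples readings were logged and all are identical."""
    return len(readings) >= min_samples and len(set(readings)) == 1


def temperature_span(readings):
    """Difference between the highest and lowest logged temperature (deg C)."""
    return max(readings) - min(readings)


# Hypothetical one-hour logs: one disk heating under load, one stuck value.
heating = [39, 39, 40, 42, 44, 45, 46, 47, 48, 49, 49, 49]
stuck = [32] * 12
print(temperature_is_flat(heating), temperature_span(heating))  # False 10
print(temperature_is_flat(stuck), temperature_span(stuck))      # True 0
```

Requiring a minimum number of readings avoids flagging a disk simply because only a few samples were taken while it was idle.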
From: Ralf P. <pa...@ki...> - 2003-03-31 11:09:48
Hi Bruce,

I got the firmware directly from IBM, but the IBM Firmware Installer told
me that no update is needed.

So the problem is still the same, and another strange value is printed for
the same disk: the temperature rises from 39 °C up to 49 °C under heavy
activity. The maximum temperature of the other, older IBM hard disks I have
is only 37 °C.

What is wrong?

Ralf

On Friday, 21 February 2003 19:41, Bruce Allen wrote:
> Hi Ralf,
>
> OK, I think I see what's happening. I think that you may have an IBM disk
> that has defective SMART firmware.
>
> Go to this page:
>
> http://www.geocities.com/dtla_update/
>
> and follow one of the "Related links" for the 60GXP disk. This points to
> revised IBM firmware that should fix the problem.
>
> [You will find a link to this page on the smartmontools web page under the
> FAQs.]
>
> Please let me know if this helps!
>
> Cheers,
>         Bruce
>
> On Fri, 21 Feb 2003, Ralf Panse wrote:
> > Hi Bruce
> >
> > Here is the output from my strange disk with the low Power_On value. It's
> > not a laptop disk, and the PC is turned off every evening.
> >
> >
> > smartctl version 5.1-4 Copyright (C) 2002 Bruce Allen
> > Home page is http://smartmontools.sourceforge.net/
> >
> > === START OF INFORMATION SECTION ===
> > Device Model:     IC35L040AVER07-0
> > Serial Number:    SXPTX393636
> > Firmware Version: ER4OA46A
> > ATA Version is:   5
> > ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
> > Local Time is:    Fri Feb 21 17:01:37 2003 CET
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > General SMART Values:
> > Off-line data collection status: (0x00) Offline data collection activity
> >                                         was never started.
> > Self-test execution status:      (   0) The previous self-test routine completed
> >                                         without error or no self-test has ever
> >                                         been run.
> > Total time to complete off-line
> > data collection:                 (1383) seconds.
> > Offline data collection
> > capabilities:                    (0x1b) SMART execute Offline immediate.
> >                                         Automatic timer ON/OFF support.
> >                                         Suspend Offline collection upon
> >                                         new command.
> >                                         Offline surface scan supported.
> >                                         Self-test supported.
> > SMART capabilities:            (0x0003) Saves SMART data before entering
> >                                         power-saving mode.
> >                                         Supports SMART auto save timer.
> > Error logging capability:        (0x01) Error logging supported.
> > Short self-test routine
> > recommended polling time:        (   1) minutes.
> > Extended self-test routine
> > recommended polling time:        (  23) minutes.
> >
> > SMART Attributes Data Structure revision number: 16
> > Vendor Specific SMART Attributes with Thresholds:
> > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     WHEN_FAILED RAW_VALUE
> >   1 Raw_Read_Error_Rate     0x000b   095   095   060    Pre-fail -           131084
> >   2 Throughput_Performance  0x0005   100   100   050    Pre-fail -           0
> >   3 Spin_Up_Time            0x0007   104   104   024    Pre-fail -           17193697490
> >   4 Start_Stop_Count        0x0012   100   100   000    Old_age  -           197
> >   5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail -           0
> >   7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail -           0
> >   8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail -           0
> >   9 Power_On_Hours          0x0012   100   100   000    Old_age  -           117
> >  10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail -           0
> >  12 Power_Cycle_Count       0x0032   100   100   000    Old_age  -           197
> > 192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age  -           197
> > 193 Load_Cycle_Count        0x0012   100   100   050    Old_age  -           197
> > 194 Temperature_Celsius     0x0002   141   141   000    Old_age  -           39 (Lifetime Min/Max 19/47)
> > 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age  -           0
> > 197 Current_Pending_Sector  0x0022   100   100   000    Old_age  -           0
> > 198 Offline_Uncorrectable   0x0008   100   100   000    Old_age  -           0
> > 199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age  -           0
> >
> > SMART Error Log Version: 1
> > No Errors Logged
> >
> > SMART Self-test log, version number 1
> > Num  Test_Description    Status       Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Short off-line      Completed    00%        115              -
> > # 2  Short off-line      Completed    00%        104              -
> > # 3  Short off-line      Completed    00%        104              -
> > # 4  Short off-line      Completed    00%        104              -
> > # 5  Short off-line      Completed    00%        103              -
> > # 6  Short off-line      Completed    00%        103              -
> > # 7  Short off-line      Completed    00%        103              -
> > # 8  Short off-line      Completed    00%        102              -
> > # 9  Extended off-line   Completed    00%        101              -
> > #10  Short off-line      Completed    00%        99               -
> > #11  Short off-line      Completed    00%        99               -
> > #12  Short off-line      Completed    00%        99               -
> > #13  Short off-line      Completed    00%        99               -
> > #14  Short off-line      Completed    00%        99               -
> > #15  Short off-line      Completed    00%        99               -
> > #16  Short off-line      Completed    00%        99               -
> > #17  Short off-line      Completed    00%        97               -
> > #18  Short off-line      Completed    00%        97               -
> > #19  Short off-line      Completed    00%        97               -
> > #20  Short off-line      Completed    00%        97               -
> > #21  Short off-line      Completed    00%        97               -
> >
> >
> >
> > Thanks
> > Ralf
> >
> > On Friday, 21 February 2003 16:58, you wrote:
> > > Hi Ralf,
> > >
> > > (My comments are inserted below)
> > >
> > > On Fri, 21 Feb 2003, Ralf Panse wrote:
> > > > Hi Bruce!
> > > >
> > > > Here is the output from smartctl -a /dev/hdd:
> > > >
> > > > smartctl version 5.1-4 Copyright (C) 2002 Bruce Allen
> > > > Home page is http://smartmontools.sourceforge.net/
> > > >
> > > > === START OF INFORMATION SECTION ===
> > > > Device Model:     IBM-DTLA-307045
> > > > Serial Number:    YMDYMHA0675
> > > > Firmware Version: TX6OA50C
> > > > ATA Version is:   5
> > > > ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
> > > > Local Time is:    Fri Feb 21 16:09:14 2003 CET
> > > > SMART support is: Available - device has SMART capability.
> > > > SMART support is: Enabled
> > > >
> > > > === START OF READ SMART DATA SECTION ===
> > > > SMART overall-health self-assessment test result: PASSED
> > > >
> > > > General SMART Values:
> > > > Off-line data collection status: (0x02) Offline data collection activity
> > > >                                         completed without error.
> > > > Self-test execution status:      (   0) The previous self-test routine completed
> > > >                                         without error or no self-test has ever
> > > >                                         been run.
> > > > Total time to complete off-line
> > > > data collection:                 (2294) seconds.
> > > > Offline data collection
> > > > capabilities:                    (0x1b) SMART execute Offline immediate.
> > > >                                         Automatic timer ON/OFF support.
> > > >                                         Suspend Offline collection upon
> > > >                                         new command.
> > > >                                         Offline surface scan supported.
> > > >                                         Self-test supported.
> > > > SMART capabilities:            (0x0003) Saves SMART data before entering
> > > >                                         power-saving mode.
> > > >                                         Supports SMART auto save timer.
> > > > Error logging capability:        (0x01) Error logging supported.
> > > > Short self-test routine
> > > > recommended polling time:        (   2) minutes.
> > > > Extended self-test routine
> > > > recommended polling time:        (  28) minutes.
> > > >
> > > > SMART Attributes Data Structure revision number: 16
> > > > Vendor Specific SMART Attributes with Thresholds:
> > > > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      WHEN_FAILED RAW_VALUE
> > > >   1 Raw_Read_Error_Rate     0x000b   100   100   060    Pre-fail  -           1
> > > >   2 Throughput_Performance  0x0005   132   132   050    Pre-fail  -           340
> > > >   3 Spin_Up_Time            0x0007   094   094   024    Pre-fail  -           25789530422
> > > >   4 Start_Stop_Count        0x0012   100   100   000    Old_age   -           52
> > > >   5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  -           0
> > > >   7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  -           0
> > > >   8 Seek_Time_Performance   0x0005   130   130   020    Pre-fail  -           34
> > > >   9 Power_On_Hours          0x0012   100   100   000    Old_age   -           119
> > > >  10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  -           0
> > > >  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   -           52
> > > > 192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   -           52
> > > > 193 Load_Cycle_Count        0x0012   100   100   050    Old_age   -           52
> > > > 194 Temperature_Celsius     0x0002   171   171   000    Old_age   -           32
> > > > 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   -           0
> > > > 197 Current_Pending_Sector  0x0022   100   100   000    Old_age   -           0
> > > > 198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   -           0
> > > > 199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   -           0
> > > >
> > > > SMART Error Log Version: 1
> > > > ATA Error Count: 2
> > > >   DCR = Device Control Register
> > > >   FR  = Features Register
> > > >   SC  = Sector Count Register
> > > >   SN  = Sector Number Register
> > > >   CL  = Cylinder Low Register
> > > >   CH  = Cylinder High Register
> > > >   D/H = Device/Head Register
> > > >   CR  = Content written to Command Register
> > > >   ER  = Error register
> > > >   STA = Status register
> > > > Timestamp is seconds since the previous disk power-on.
> > > > Note: timestamp "wraps" after 2^32 msec = 49.710 days.
> > > >
> > > > Error 2 occurred at disk power-on lifetime: 2 hours
> > > >   When the command that caused the error occurred, the device was
> > > >   active or idle.
> > > >   After command completion occurred, registers were:
> > > >   ER:10 SC:00 SN:4c CL:18 CH:f2 D/H:b2 ST:51
> > > >   Sequence of commands leading to the command that caused the error were:
> > > >   DCR  FR  SC  SN  CL  CH  D/H  CR  Timestamp
> > > >    02  40  00  00  00  f2   f2  82   7302.400
> > > >    02  40  00  00  c0  f1   f2  82   7302.000
> > > >    02  40  00  00  80  f1   f2  82   7301.700
> > > >    02  40  00  00  40  f1   f2  82   7301.300
> > > >    02  40  00  00  00  f1   f2  82   7300.900
> > > >
> > > > Error 1 occurred at disk power-on lifetime: 1 hours
> > > >   When the command that caused the error occurred, the device was
> > > >   active or idle.
> > > >   After command completion occurred, registers were:
> > > >   ER:10 SC:00 SN:00 CL:00 CH:00 D/H:b0 ST:51
> > > >   Sequence of commands leading to the command that caused the error were:
> > > >   DCR  FR  SC  SN  CL  CH  D/H  CR  Timestamp
> > > >    02  11  00  00  00  00   b0  f7   5132.900
> > > >    02  00  00  00  00  00   b0  f3   5132.900
> > > >    02  00  00  00  00  00   b0  ec   5132.900
> > > >    02  00  40  00  c3  00   f0  40   5132.800
> > > >    02  00  40  c0  c2  00   f0  40   5132.800
> > > >
> > > > SMART Self-test log, version number 1
> > > > Num  Test_Description    Status      Remaining  LifeTime(hours)  LBA_of_first_error
> > > > # 1  Short off-line      Completed   00%        117              -
> > > > # 2  Short off-line      Completed   00%         92              -
> > > > # 3  Short off-line      Completed   00%         89              -
> > > > # 4  Extended off-line   Completed   00%         69              -
> > > > # 5  Short off-line      Completed   00%          2              -
> > > > # 6  Short off-line      Completed   00%          0              -
> > > >
> > > >
> > > > So, my hard disk must be OK!?
> > >
> > > The disk looks fine. The two entries in the ATA error log were at 1
> > > hour 25 minutes after the disk was first powered up, and 2 hours and 2
> > > minutes after the disk was first powered up.
> > > [Perhaps you were playing with hdparm?] The disk is now 119 hours old
> > > and hasn't shown any further errors. The time to worry is if the ATA
> > > error log starts showing hundreds or thousands of errors in the very
> > > recent past.
> > >
> > > And the self-tests all completed OK.
> > >
> > > > There is another strange value in the SMART attributes. With another
> > > > disk (IBM), smartctl returns a smaller value for Start_Stop_Count
> > > > than for Power_On_Hours.
> > >
> > > In itself, that's OK. Start_Stop_Count is the number of times that the
> > > disk has spun up. For a machine that is turned on and off once per
> > > month, and has been running for a year, this would be 12. But
> > > Power_On_Hours would be one year: 8760.
> > >
> > > Note that Start_Stop_Count can also change when the disk sleeps.
> > >
> > > > Now these results:
> > > >   4 Start_Stop_Count        0x0012   100   100   000    Old_age   -           197
> > > >   9 Power_On_Hours          0x0012   100   100   000    Old_age   -           116
> > > >  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   -           197
> > > > 192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   -           197
> > > > 193 Load_Cycle_Count        0x0012   100   100   050    Old_age   -           197
> > >
> > > look very strange. Is this a laptop disk? Please post the output of
> > > smartctl -a for this disk.
> > >
> > > > My disk runs for more than one hour per power-on. Can you explain
> > > > this value to me?
> > >
> > > Is the disk sleeping (either a laptop disk or a desktop machine that
> > > suspends a lot)?
> > >
> > > > Thanks a lot!
> > >
> > > You're welcome!
> > >
> > > Bruce
> > >
> > > > Ralf
> > > >
> > > > On Friday, 21 February 2003 05:39, Bruce Allen wrote:
> > > > > Hi Ralf,
> > > > >
> > > > > On Thu, 20 Feb 2003, Ralf Panse wrote:
> > > > > > Hi all!
> > > > > > What does this output mean (output from ./smartctl -c /dev/hdd)?
> > > > > >
> > > > > > === START OF READ SMART DATA SECTION ===
> > > > > > SMART Error Log Version: 1
> > > > > > ATA Error Count: 2
> > > > > > ...
> > > > > > Error 2 occurred at disk power-on lifetime: 2 hours
> > > > > >   When the command that caused the error occurred, the device was
> > > > > >   active or idle.
> > > > > >   After command completion occurred, registers were:
> > > > > >   ER:10 SC:00 SN:4c CL:18 CH:f2 D/H:b2 ST:51
> > > > > >   Sequence of commands leading to the command that caused the error were:
> > > > > >   DCR  FR  SC  SN  CL  CH  D/H  CR  Timestamp
> > > > > >    02  40  00  00  00  f2   f2  82   7302.400
> > > > > >    02  40  00  00  c0  f1   f2  82   7302.000
> > > > > >    02  40  00  00  80  f1   f2  82   7301.700
> > > > > >    02  40  00  00  40  f1   f2  82   7301.300
> > > > > >    02  40  00  00  00  f1   f2  82   7300.900
> > > > > >
> > > > > > Error 1 occurred at disk power-on lifetime: 1 hours
> > > > > > ...
> > > > >
> > > > > This is the ATA error log. The types of errors that it indicates
> > > > > are described in this document:
> > > > > http://www.t13.org/project/d1321r1c.pdf; please see section
> > > > > 8.41.6.8.2.4 (Device Error Count), which starts on page 204.
> > > > >
> > > > > The output lists the five commands leading up to the command that
> > > > > caused the error. The different columns refer to different ATA
> > > > > registers.
> > > > >
> > > > > > What is wrong with my hard disk?
> > > > >
> > > > > Probably nothing. The last of these errors occurred when the disk
> > > > > was just a few hours old (7300 seconds after it was first turned
> > > > > on). This may have been due to strange or incorrect hdparm (DMA
> > > > > mode, etc.) settings, a loose cable, or something else wrong with
> > > > > the disk.
> > > > >
> > > > > Assuming that the disk is more than a few hours old, it's not been
> > > > > exhibiting the errors recently.
> > > > > If you want some more reassurance, post the output of smartctl -a,
> > > > > please.
> > > > >
> > > > > You might also want to run some extended self-tests and examine the
> > > > > self-test log. You can do both of these things with smartctl.
> > > > >
> > > > > > The health-status command of smartctl (./smartctl -H /dev/hdd)
> > > > > > says
> > > > > > ...
> > > > > >
> > > > > > === START OF READ SMART DATA SECTION ===
> > > > > > SMART overall-health self-assessment test result: PASSED
> > > > > >
> > > > > > And all S.M.A.R.T. attributes are OK.
> > > > > >
> > > > > >
> > > > > > Thanks!
> > > > >
> > > > > You're welcome!
> > > > >
> > > > > Bruce
> > > >
> > > > -------------------------------------------------------
> > > > This SF.net email is sponsored by: SlickEdit Inc. Develop an edge.
> > > > The most comprehensive and flexible code editor you can use.
> > > > Code faster. C/C++, C#, Java, HTML, XML, many more. FREE 30-Day
> > > > Trial. www.slickedit.com/sourceforge
> > > > _______________________________________________
> > > > Smartmontools-support mailing list
> > > > Sma...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/smartmontools-support
> >
> > --
> > Ralf Panse
> > Kirchhoff-Institut für Physik
> >
> > Tel: 06221 54 9811

--
Ralf Panse
Kirchhoff-Institut für Physik
Technische Informatik
Tel: +49 6221 54 9811
Im Neuenheimer Feld 227
D-69120 Heidelberg, Germany
e-mail: pa...@ki...
From: Bruce A. <ba...@gr...> - 2003-03-30 03:38:29
Hi Steve,

> I'm using smartmontools-5.1-9 on PowerPC running Linux 2.1.24
> (smartmontools required a few changes to deal with endianness issues,
> which I'll gladly post the patches after getting this issue fixed).

I'm VERY excited to hear this. I have been wondering if someone would try
to get smartmontools working on big-endian machines (I only have x86
little-endian boxes). It would be wonderful to get the code to work
correctly on both, and also eventually to make sure it's 64-bit clean. It
sounds as if you are already halfway there.

> Everything is working fine up until it tries to read the
> Error&Self-Test logs, which causes an I/O error. In /var/log/kernel,
> these two lines appear:
>
> Mar 29 21:14:46 (none) kernel: hdb: drive_cmd: status=0x51 { DriveReady
> SeekComplete Error }
> Mar 29 21:14:46 (none) kernel: hdb: drive_cmd: error=0x10 {
> SectorIdNotFound }, secCnt=6, LBAsect=12734249
>
> On Linux/x86 this works without issue on this exact drive, so I know
> the drive is not the culprit.

Is the kernel version number you gave above (2.1.24) accurate? Are you
using similar or identical kernel versions in your comparison testing?

> I don't see how the changes I made to accommodate big endian would be
> the cause, as they just byte-swap the data before sending it back, and
> this error is happening at the ioctl in ataReadErrorLog() and
> ataReadSelfTestLog() prior to where I would byte-swap the data. i.e.
> in atacmds.c, my modifications focus mainly on these lines:
>
> memcpy(data,buf+HDIO_DRIVE_CMD_HDR_SIZE,ATA_SMART_SEC_SIZE);

Let's see... I think what you are doing makes sense. After all, before
doing the ioctl you have, for example:

unsigned char buf[HDIO_DRIVE_CMD_HDR_SIZE+ATA_SMART_SEC_SIZE] =
  {WIN_SMART, 0x06, SMART_READ_LOG_SECTOR, 1,};

and since these first four quantities are ALL single bytes, no
byte-swapping is needed. Then, as you say, just byte-swap the returned
structure. So it sounds to me like you are doing "the right thing".
> I've checked the homepage and scanned through the support archives, and
> have not seen this same issue. Admittedly I didn't read every message
> in the archive, so I certainly apologize if I overlooked where this had
> been covered before.

No apology necessary. It's not been covered before. The only person I
know who has worked on a big-endian architecture is Peter Cassidy, who is
doing a Darwin port. But he's using the Darwin native SMART commands, not
a straight Linux kernel, so I think the byte swapping is done for him
already. But let's see if Peter has anything to add. I am copying the
developers list (see mail header) so he should get this too.

>
> And for what it's worth, here is the output from smartctl:

And for what it's worth, here are my comments (:-;)

> ATA Version is:   5
> ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
> Local Time is:    Sat Mar 29 21:14:45 2003 localtime

Interesting timezone. Is this right? See utility.c for the relevant bits
of code referencing tzname[]. Also see man tzset.

> Total time to complete off-line
> data collection:                 (   2) seconds.

This looks too short (but might be right, I suppose, if you printed this
output just as some data collection was finishing?).

>   4 Start_Stop_Count        0x0032   100   100   008    Old_age   -           444
>   9 Power_On_Hours          0x0012   100   100   001    Old_age   -           210
>  12 Power_Cycle_Count       0x0032   100   100   008    Old_age   -           417

Do these numbers look reasonable? You should be able to use hdparm -y to
spin down and spin up the disk while the system is running and see the
start/stop count increment while the power cycle count stays fixed. [Is
the large number of power cycles in 210 hours right?]

> 199 UDMA_CRC_Error_Count    0x001a   196   196   000    Old_age   -           10

Does this count increment each time you get an IDE error like the one
above?
> Error SMART Error Log Read failed: Input/output error
> Smartctl: SMART Errorlog Read Failed
> Error SMART Error Self-Test Log Read failed: Input/output error
> Smartctl: SMART Self Test Log Read Failed

OK -- just as you described. At this point, my only good answer is "kernel
developers mailing list". The point being that the other SMART calls that
return 512-byte structures obviously succeeded. The fact that this one
failed (and especially if the UDMA error count is correlated) might be a
sign of a kernel driver bug. I wouldn't be surprised, since I don't think
there is any other standard Linux code that uses this ioctl().

Steve, would you like to join the group of smartmontools developers so
that you can integrate your changes into the body of the code? If so, let
me know if you have a SourceForge user name, and if you are familiar with
CVS. If not, I'll help you get started.

Cheers,
Bruce

[PS: I remember the first time I saw a Lisa. I was a postdoc, visiting a
friend in Austin TX around 1984-5. He very proudly showed me his Lisa, one
of the first ones out, on which he had blown at least 5 grand.]
From: Steve W. <sw...@ar...> - 2003-03-29 21:32:10
I'm using smartmontools-5.1-9 on PowerPC running Linux 2.1.24
(smartmontools required a few changes to deal with endianness issues,
which I'll gladly post the patches after getting this issue fixed).

Everything is working fine up until it tries to read the Error&Self-Test
logs, which causes an I/O error. In /var/log/kernel, these two lines
appear:

Mar 29 21:14:46 (none) kernel: hdb: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Mar 29 21:14:46 (none) kernel: hdb: drive_cmd: error=0x10 { SectorIdNotFound }, secCnt=6, LBAsect=12734249

On Linux/x86 this works without issue on this exact drive, so I know the
drive is not the culprit.

I don't see how the changes I made to accommodate big endian would be the
cause, as they just byte-swap the data before sending it back, and this
error is happening at the ioctl in ataReadErrorLog() and
ataReadSelfTestLog() prior to where I would byte-swap the data. i.e. in
atacmds.c, my modifications focus mainly on these lines:

memcpy(data,buf+HDIO_DRIVE_CMD_HDR_SIZE,ATA_SMART_SEC_SIZE);

I've checked the homepage and scanned through the support archives, and
have not seen this same issue. Admittedly I didn't read every message in
the archive, so I certainly apologize if I overlooked where this had been
covered before.

And for what it's worth, here is the output from smartctl:

$ /mnt/smartctl -a /dev/hdb
smartctl version 5.1-9 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     QUANTUM FIREBALLlct15 22
Serial Number:    313019116552
Firmware Version: A01.0F00
ATA Version is:   5
ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
Local Time is:    Sat Mar 29 21:14:45 2003 localtime
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Off-line data collection status: (0x00) Offline data collection activity
                                        was never started.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete off-line
data collection:                 (   2) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Automatic timer ON/OFF support.
                                        Suspend Offline collection upon
                                        new command.
                                        Offline surface scan supported.
                                        Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  23) minutes.

SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0029   100   253   020    Old_age   -           0
  3 Spin_Up_Time            0x0027   073   070   020    Old_age   -           3476
  4 Start_Stop_Count        0x0032   100   100   008    Old_age   -           444
  5 Reallocated_Sector_Ct   0x0033   100   100   020    Old_age   -           0
  7 Seek_Error_Rate         0x000b   100   100   023    Old_age   -           0
  9 Power_On_Hours          0x0012   100   100   001    Old_age   -           210
 10 Spin_Retry_Count        0x0026   100   100   000    Old_age   -           0
 11 Calibration_Retry_Count 0x0013   100   100   020    Old_age   -           0
 12 Power_Cycle_Count       0x0032   100   100   008    Old_age   -           417
 13 Read_Soft_Error_Rate    0x000b   100   100   023    Old_age   -           0
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   -           2061
196 Reallocated_Event_Count 0x0010   100   253   020    Old_age   -           0
197 Current_Pending_Sector  0x0032   100   100   020    Old_age   -           0
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   -           0
199 UDMA_CRC_Error_Count    0x001a   196   196   000    Old_age   -           10

Error SMART Error Log Read failed: Input/output error
Smartctl: SMART Errorlog Read Failed
Error SMART Error Self-Test Log Read failed: Input/output error
Smartctl: SMART Self Test Log Read Failed
$

Cheers,
Steve