From: <0...@pe...> - 2002-10-29 01:20:20
|
Bruce Allen wrote:

>> Hi. With smartd from smartsuite 2.1 I got the following each
>> 30 minutes:
>>
>> Device: /dev/hda, S.M.A.R.T. Attribute: 1 Changed -66
>>
>> With smartd from smartmontools 5.0-10 I get:
>>
>> Device: /dev/hda, S.M.A.R.T. Attribute: 255 Changed from 100 to 166

> I've changed the message so instead of showing the amount of
> the change, it shows initial and final values.

BTW, with 5.0-16 I always get:

Device: /dev/hda, SMART Attribute: 255 Unknown_Attribute changed from 100 to 165

With 5.0-11 I always got:

Device: /dev/hda, SMART Attribute: 255 Unknown_Attribute changed from 100 to 11

>> Device Model: MAXTOR 6L060J3
>> Serial Number: 663200252994
>> Firmware Version: A93.0500
>> ATA Version is: 5
>> ATA Standard is: ATA/ATAPI-5 T13 1321D revision 1
>> SMART support is: Enabled
>>
>> What I find strange is that I don't see Attribute 1 or 255
>> with smartctl -a /dev/hda.

> There is probably a bug in smartd, where it incorrectly
> identifies these as attributes. I'll have a look.

I thought Attribute 1 was "Raw_Read_Error_Rate".

http://marc.theaimsgroup.com/?l=linux-kernel&m=102971113124014&w=2

The person appears to have the same model, but with a different
size: 6L080J4 = 80Gb, while mine is 6L060J3 = 60Gb.

And he has:

( 1)Raw Read Error Rate...

which I never had with smartsuite or smartmontools.

> If you look at the ATA/ATAPI-5 T13 1321D revision 1 spec (see
> link in REFERENCES on smartmontools web page) you'll see that
> there is a field in the self-test log for revision number
> which is supposed to be "1". On your drive, it's not. If
> you can take the time to look through the later ATA specs
> (same link above) and find the ATA version where this
> self-test log revision number is defined, I'll have a look
> and see if any changes need to happen in the code.

OK. I'll install Acrobat or something and go through.

> have you tried doing some self tests with -S or -X?

I ran smartctl -S /dev/hda (before I always ran with -O, -c,
-a).

smartctl -L /dev/hda reports:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log, version number 3
Warning - structure revision number does not match spec!
Num Test_Description Status    Remaining LifeTime(hours) LBA_of_first_error
# 1 Off-line         Completed 00%       31520           0x327e0008

BTW, I suggested smartmontools to Patrick J. Volkerding and
it's now part of Slackware -current, which should become 9.0.

Mon Oct 21 17:51:04 PDT 2002
a/smartmontools-5.0_10-i386-1.tgz: Replaces smartsuite package.

--
0@pervalidus.{net, {dyndns.}org} |
From: Bruce A. <ba...@gr...> - 2002-10-29 09:56:10
|
> BTW, with 5.0-16 I always get:
>
> Device: /dev/hda, SMART Attribute: 255 Unknown_Attribute changed from 100 to 165
> With 5.0-11 I always got:
>
> Device: /dev/hda, SMART Attribute: 255 Unknown_Attribute changed from 100 to 11

This is a bit odd; I don't understand it yet. Please send me the output of
smartd -VXB for the version 11 and version 16 code that you are using
(depending on whether you built it from CVS or from a release, it may
contain slightly different module versions). This way I can compare the
exact routines that are producing this.

> >> Device Model: MAXTOR 6L060J3
> >> Serial Number: 663200252994
> >> Firmware Version: A93.0500
> >> ATA Version is: 5
> >> ATA Standard is: ATA/ATAPI-5 T13 1321D revision 1
> >> SMART support is: Enabled
> >>
> >> What I find strange is that I don't see Attribute 1 or 255
> >> with smartctl -a /dev/hda.
>
> > There is probably a bug in smartd, where it incorrectly
> > identifies these as attributes. I'll have a look.
>
> I thought Attribute 1 was "Raw_Read_Error_Rate".
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=102971113124014&w=2
>
> The person appears to have the same model, but with a different
> size: 6L080J4 = 80Gb, while mine is 6L060J3 = 60Gb.
>
> And he has:
>
> ( 1)Raw Read Error Rate...
>
> which I never had with smartsuite or smartmontools.

Interesting. This attribute ought to exist in your drive. Could you post
the complete output of smartctl -a please?

I am not so concerned that attribute 1 doesn't show up. I'm more concerned
that smartctl and smartd disagree about the possible existence of attribute
255. Let's try and sort that out first. [By the way, do you know how to
use a debugger, either ddd or gdb? This would make it very easy to track
down what's going wrong -- that is to say why smartd and smartctl do not
agree about the existence of attribute 255.]

> > If you look at the ATA/ATAPI-5 T13 1321D revision 1 spec (see
> > link in REFERENCES on smartmontools web page) you'll see that
> > there is a field in the self-test log for revision number
> > which is supposed to be "1". On your drive, it's not. If
> > you can take the time to look through the later ATA specs
> > (same link above) and find the ATA version where this
> > self-test log revision number is defined, I'll have a look
> > and see if any changes need to happen in the code.
>
> OK. I'll install Acrobat or something and go through.

I just had a quick look at the specs. They say "self-test log revision
number = 1" right through to the most recent ATA/ATAPI-7 specification!
The fact that you are finding version #3 is strange. But please have a
look at the specs. If you can find me a definition of self-test log rev
#3 I'd be happy to put it in the code!

> > have you tried doing some self tests with -S or -X?
>
> I ran smartctl -S /dev/hda (before I always ran with -O, -c,
> -a).
>
> smartctl -L /dev/hda reports:
>
> === START OF READ SMART DATA SECTION ===
> SMART Self-test log, version number 3
> Warning - structure revision number does not match spec!
> Num Test_Description Status    Remaining LifeTime(hours) LBA_of_first_error
> # 1 Off-line         Completed 00%       31520           0x327e0008

I've never seen this before -- a self test that completed (no reported
error) but has the address of the first error not equal to 0 or
0xffffffff. I just looked in the specs and they say:

// T13/1321D revision 1c: (Data structure Rev #1)
// The failing LBA shall be the LBA of the uncorrectable sector
// that caused the test to fail. If the device encountered more
// than one uncorrectable sector during the test, this field
// shall indicate the LBA of the first uncorrectable sector
// encountered. If the test passed or the test failed for some
// reason other than an uncorrectable sector, the value of this
// field is undefined.

So I think that this is a bug in smartctl. I want to check a couple of
other refs, but I think that since the test completed without reporting
errors, the LBA is undefined and shouldn't be printed.

> BTW, I suggested smartmontools to Patrick J. Volkerding and
> it's now part of Slackware -current, which should become 9.0.
>
> Mon Oct 21 17:51:04 PDT 2002
> a/smartmontools-5.0_10-i386-1.tgz: Replaces smartsuite package.

Thanks! Hopefully they will keep up as we find and fix the things that
are broken. |
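The rule in the spec text quoted above (the failing LBA is meaningful only when the test failed on an uncorrectable sector) can be sketched roughly as follows. This is a minimal illustration, not the smartctl source: the enum names and `format_failing_lba` are hypothetical, and the numeric status values are an assumption chosen for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical status codes for this sketch (assumed values, not
 * taken from smartmontools). */
enum selftest_status {
    ST_COMPLETED_OK = 0,  /* test passed */
    ST_FAILED_OTHER = 4,  /* failed, but not on an uncorrectable sector */
    ST_FAILED_READ  = 7   /* failed on an uncorrectable sector */
};

/* Write the text for the LBA_of_first_error column into buf.
 * The LBA is printed only when the test failed on a read error;
 * in every other case the field is undefined, so "-" is shown. */
static const char *format_failing_lba(enum selftest_status status,
                                      uint32_t lba,
                                      char *buf, size_t len)
{
    if (status == ST_FAILED_READ)
        snprintf(buf, len, "0x%08lx", (unsigned long)lba);
    else
        snprintf(buf, len, "-");
    return buf;
}
```

With this approach a completed test would show "-" in the last column instead of a leftover value such as 0x327e0008.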
From: Bruce A. <ba...@gr...> - 2002-10-29 10:03:15
|
> Please send me the output of smartd -VXB for the version 11 and version 16

I meant smartd -VX

Bruce |
From: <0...@pe...> - 2002-10-29 16:39:08
|
On Tue, 29 Oct 2002, Bruce Allen wrote:

> > Please send me the output of smartd -VXB for the version 11 and version 16
>
> I meant smartd -VX

5.0-16:

smartd version 5.0-16 - S.M.A.R.T. Daemon.
Home page is http://smartmontools.sourceforge.net/

smartd comes with ABSOLUTELY NO WARRANTY. This
is free software, and you are welcome to redistribute it
under the terms of the GNU General Public License Version 2.
See http://www.gnu.org for further details.

CVS version IDs of files used to build this code are:
Module: smartd.c revision: 1.31 date: 2002/10/25
  uses: atacmds.h revision: 1.18 date: 2002/10/24
  uses: scsicmds.h revision: 1.7 date: 2002/10/22
  uses: smartd.h revision: 1.8 date: 2002/10/25
Module: atacmds.c revision: 1.23 date: 2002/10/24
  uses: atacmds.h revision: 1.18 date: 2002/10/24
Module: scsicmds.c revision: 1.11 date: 2002/10/23
  uses: scsicmds.h revision: 1.7 date: 2002/10/22

5.0-11:

smartd version 5.0-11 - S.M.A.R.T. Daemon
Home page is http://smartmontools.sourceforge.net/

smartd comes with ABSOLUTELY NO WARRANTY. This
is free software, and you are welcome to redistribute it
under the terms of the GNU General Public License Version 2.
See http://www.gnu.org for further details.

CVS version IDs of files used to build this code are:
Module: smartd.c revision: 1.24 date: 2002/10/24
  uses: atacmds.h revision: 1.18 date: 2002/10/24
  uses: scsicmds.h revision: 1.7 date: 2002/10/22
  uses: smartd.h revision: 1.7 date: 2002/10/24
Module: atacmds.c revision: 1.22 date: 2002/10/24
  uses: atacmds.h revision: 1.18 date: 2002/10/24
Module: scsicmds.c revision: 1.11 date: 2002/10/23
  uses: scsicmds.h revision: 1.7 date: 2002/10/22

BTW, I'm using CVS, but with tagged releases:

RELEASE_5_0_11 and RELEASE_5_0_16

--
0@pervalidus.{net, {dyndns.}org} |
From: Bruce A. <ba...@gr...> - 2002-10-29 19:35:05
|
Hi Frédéric,

I went through the code pretty closely, comparing the file differences
between smartd 5.0-11 and 5.0-16. I can't find any cause for the
difference in the reporting on your attribute #255. The only files that
differ are:

> Module: smartd.c revision: 1.31 date: 2002/10/25
>   uses: smartd.h revision: 1.8 date: 2002/10/25
> Module: atacmds.c revision: 1.23 date: 2002/10/24
>
> Module: smartd.c revision: 1.24 date: 2002/10/24
>   uses: smartd.h revision: 1.7 date: 2002/10/24
> Module: atacmds.c revision: 1.22 date: 2002/10/24

In comparing these I did find one bug (now fixed). I'd left a "break;"
out of a switch statement so that Attribute 231 would be mistakenly
identified as "unrecognized" rather than as temperature. But it didn't
have any other effect. It's not responsible for what you are seeing.

> BTW, I'm using CVS, but with tagged releases:
>
> RELEASE_5_0_11 and RELEASE_5_0_16

Do you have an older 2.2 kernel? Strange compiler? Anything
non-standard? You just did "make"?

I'd be interested to know if the problems persist if you use one of the
binary distribution formats, eg the .rpm

Cheers,
Bruce |
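The kind of bug described above (a "break;" left out of a switch, so attribute 231 falls through to the unrecognized case) looks roughly like this. It is a sketch only: the function names and the attribute name strings are hypothetical and not copied from smartd.c.

```c
#include <stdio.h>

/* Buggy variant: the case for 231 has no break, so execution falls
 * through into default and the name is overwritten. */
static const char *attr_name_buggy(int id)
{
    const char *name;
    switch (id) {
    case 1:
        name = "Raw_Read_Error_Rate";
        break;
    case 231:
        name = "Temperature";        /* missing break here */
    default:
        name = "Unknown_Attribute";
        break;
    }
    return name;
}

/* Fixed variant: each case ends with break. */
static const char *attr_name_fixed(int id)
{
    const char *name;
    switch (id) {
    case 1:
        name = "Raw_Read_Error_Rate";
        break;
    case 231:
        name = "Temperature";
        break;
    default:
        name = "Unknown_Attribute";
        break;
    }
    return name;
}
```

Only the attribute whose case lacks the break is affected, which matches the remark that the bug had no other effect.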
From: <0...@pe...> - 2002-10-29 20:58:19
|
On Tue, 29 Oct 2002, Bruce Allen wrote:

> > BTW, I'm using CVS, but with tagged releases:
> >
> > RELEASE_5_0_11 and RELEASE_5_0_16
>
> Do you have an older 2.2 kernel? Strange compiler? Anything
> non-standard? You just did "make"?

2.4.19, GCC 3.2, glibc 2.2.5. Nothing non-standard. The only
change was the addition of -s to the Makefile.

I think I found the problem. Apparently my GCC 3.2 or something
miscompiles it. I recompiled 5.0-16 with it but removed -s:

Oct 29 17:14:07 pervalidus smartd: smartd version 5.0-16 - S.M.A.R.T. Daemon.
Oct 29 17:14:07 pervalidus smartd: Home page is http://smartmontools.sourceforge.net/
Oct 29 17:14:07 pervalidus smartd: Using configuration file /etc/smartd.conf
Oct 29 17:14:07 pervalidus smartd: Opening device /dev/hda
Oct 29 17:14:07 pervalidus smartd: /dev/hda Found and is SMART capable. Adding to "monitor" list.
Oct 29 17:14:07 pervalidus smartd: Started monitoring 1 ATA and 0 SCSI devices
Oct 29 17:14:07 pervalidus smartd: Device: /dev/hda, SMART Attribute: 255 Unknown_Attribute changed from 100 to 165

Then I recompiled with GCC 2.95.4 CVS:

Oct 29 17:16:06 pervalidus smartd: smartd version 5.0-16 - S.M.A.R.T. Daemon.
Oct 29 17:16:06 pervalidus smartd: Home page is http://smartmontools.sourceforge.net/
Oct 29 17:16:06 pervalidus smartd: Using configuration file /etc/smartd.conf
Oct 29 17:16:06 pervalidus smartd: Opening device /dev/hda
Oct 29 17:16:06 pervalidus smartd: /dev/hda Found and is SMART capable. Adding to "monitor" list.
Oct 29 17:16:06 pervalidus smartd: Started monitoring 1 ATA and 0 SCSI devices

And smartctl -a /dev/hda now reports (it's a diff against my
old run with the same version compiled with GCC 3.2):

 SMART overall-health self-assessment test result: PASSED
-See vendor-specific Attribute list for marginal Attributes.

-SMART Attributes Data Structure revision number: 22816
+SMART Attributes Data Structure revision number: 11

- 5 Reallocated_Sector_Ct 0x2008 117 005 020 Old_age In_the_past 847118427

+ 1 Raw_Read_Error_Rate 0x0029 100 253 020 Pre-fail - 0

-SMART Error Log Version: 3
+SMART Error Log Version: 1

-SMART Self-test log, version number 3
+SMART Self-test log, version number 1
-Warning - structure revision number does not match spec!

 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
-# 1 Off-line Completed 00% 31520 0x327e0008
+# 1 Short off-line Completed 00% 4924

> I'd be interested to know if the problems persist if you use
> one of the binary distribution formats, eg the .rpm

I tried your 5.0-16 rpm before recompiling them (yes, I
renamed smartd):

Oct 29 17:11:03 pervalidus smartd-16.bin: smartd version 5.0-16 - S.M.A.R.T. Daemon.
Oct 29 17:11:03 pervalidus smartd-16.bin: Home page is http://smartmontools.sourceforge.net/
Oct 29 17:11:03 pervalidus smartd-16.bin: Using configuration file /etc/smartd.conf
Oct 29 17:11:03 pervalidus smartd-16.bin: Opening device /dev/hda
Oct 29 17:11:03 pervalidus smartd-16.bin: /dev/hda Found and is SMART capable. Adding to "monitor" list.
Oct 29 17:11:03 pervalidus smartd-16.bin: Started monitoring 1 ATA and 0 SCSI devices

Your binaries: GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.3 2.96-112)

I also tried the Slackware 5.0-10 smartd, which was compiled
with GCC 3.2, and it runs fine.

--
0@pervalidus.{net, {dyndns.}org} |
From: Bruce A. <ba...@gr...> - 2002-10-30 00:33:02
|
Hi Frédéric,

I'm glad you found the problem with your build!

On Tue, 29 Oct 2002, Frédéric L. W. Meunier wrote:

> On Tue, 29 Oct 2002, Bruce Allen wrote:
>
> > > BTW, I'm using CVS, but with tagged releases:
> > >
> > > RELEASE_5_0_11 and RELEASE_5_0_16
> >
> > Do you have an older 2.2 kernel? Strange compiler? Anything
> > non-standard? You just did "make"?
>
> 2.4.19, GCC 3.2, glibc 2.2.5. Nothing non-standard. The only
> change was the addition of -s to the Makefile.

Which just strips the symbol table, right? Though since I don't have -g,
there may not have been a symbol table to start with.

> I think I found the problem. Apparently my GCC 3.2 or something
> miscompiles it. I recompiled 5.0-16 with it but removed -s:

> And smartctl -a /dev/hda now reports (it's a diff against my
> old run with the same version compiled with GCC 3.2):
>
>  SMART overall-health self-assessment test result: PASSED
> -See vendor-specific Attribute list for marginal Attributes.
>
> -SMART Attributes Data Structure revision number: 22816
> +SMART Attributes Data Structure revision number: 11
>
> - 5 Reallocated_Sector_Ct 0x2008 117 005 020 Old_age In_the_past 847118427
>
> + 1 Raw_Read_Error_Rate 0x0029 100 253 020 Pre-fail - 0
>
> -SMART Error Log Version: 3
> +SMART Error Log Version: 1
>
> -SMART Self-test log, version number 3
> +SMART Self-test log, version number 1
> -Warning - structure revision number does not match spec!
>
>  Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> -# 1 Off-line Completed 00% 31520 0x327e0008
> +# 1 Short off-line Completed 00% 4924

Very nice!

I wonder if this is my fault -- I am thinking of one thing in particular:
a place in the code where byte alignment might be causing problems....
hmm.

First, could you try fixing the following include file that is perhaps
still broken on your machine (the patch was sent to the kernel tree on
Oct 10th)? Then recompile yet again?

--- linux/include/linux/hdreg.h.orig Thu Oct 10 08:40:22 2002
+++ linux/include/linux/hdreg.h Mon Oct 21 17:40:47 2002
@@ -626,9 +626,9 @@
 * 12
 * 11:0
 */
- unsigned short words161_175[14];/* Reserved for CFA */
- unsigned short words176_205[31];/* Current Media Serial Number */
- unsigned short words206_254[48];/* reserved words 206-254 */
+ unsigned short words161_175[15];/* Reserved for CFA */
+ unsigned short words176_205[30];/* Current Media Serial Number */
+ unsigned short words206_254[49];/* reserved words 206-254 */
 unsigned short integrity_word; /* (word 255)
 * 15:8 Checksum
 * 7:0 Signature

> I tried your 5.0-16 rpm before recompiling them (yes, I
> renamed smartd):
>
> Oct 29 17:11:03 pervalidus smartd-16.bin: smartd version 5.0-16 - S.M.A.R.T. Daemon.
> Oct 29 17:11:03 pervalidus smartd-16.bin: Home page is http://smartmontools.sourceforge.net/
> Oct 29 17:11:03 pervalidus smartd-16.bin: Using configuration file /etc/smartd.conf
> Oct 29 17:11:03 pervalidus smartd-16.bin: Opening device /dev/hda
> Oct 29 17:11:03 pervalidus smartd-16.bin: /dev/hda Found and is SMART capable. Adding to "monitor" list.
> Oct 29 17:11:03 pervalidus smartd-16.bin: Started monitoring 1 ATA and 0 SCSI devices
>
> Your binaries: GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.3 2.96-112)
>
> I also tried the Slackware 5.0-10 smartd, which was compiled
> with GCC 3.2, and it runs fine.

OK, I will sleep a bit better tonight...

Bruce |
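The array sizes in the hdreg.h patch above can be checked by counting words: an inclusive range of words first..last holds last - first + 1 entries. The sketch below models only the tail of the structure, starting at word 161; the struct and function names are hypothetical, and it assumes the members are laid out without padding (true for consecutive arrays of the same type).

```c
#include <stddef.h>

/* Tail of the identify structure before the patch (sizes 14/31/48). */
struct id_tail_old {
    unsigned short words161_175[14];
    unsigned short words176_205[31];
    unsigned short words206_254[48];
    unsigned short integrity_word;
};

/* Tail of the identify structure after the patch (sizes 15/30/49). */
struct id_tail_new {
    unsigned short words161_175[15];
    unsigned short words176_205[30];
    unsigned short words206_254[49];
    unsigned short integrity_word;
};

/* Word index at which integrity_word lands, given that the first
 * member of the modelled tail is word 161. */
static size_t integrity_index_old(void)
{
    return 161 + offsetof(struct id_tail_old, integrity_word)
                 / sizeof(unsigned short);
}

static size_t integrity_index_new(void)
{
    return 161 + offsetof(struct id_tail_new, integrity_word)
                 / sizeof(unsigned short);
}
```

The old sizes add up to 93 words, so integrity_word ends up at word 254; the patched sizes add up to 94 and put it at word 255, as the comment in the header says.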
From: <0...@pe...> - 2002-10-30 01:14:33
|
On Tue, 29 Oct 2002, Bruce Allen wrote:

> > 2.4.19, GCC 3.2, glibc 2.2.5. Nothing non-standard. The only
> > change was the addition of -s to the Makefile.
>
> Which just strips the symbol table, right? Though since I
> don't have -g, there may not have been a symbol table to
> start with.

I think -s does the same as strip without arguments. It strips
everything. It isn't causing the miscompilations.

> I wonder if this is my fault -- I am thinking of one thing in
> particular: a place in the code where byte alignment might be
> causing problems.... hmm.

And I wonder if this is my GCC 3.2's fault, since the one from
Slackware didn't miscompile their 5.0-10.

> First, could you try fixing the following include file that
> is perhaps still broken on your machine (the patch was sent
> to the kernel tree on Oct 10th)? Then recompile yet again?

If you meant recompiling smartmontools (and not the kernel)
after patching, it didn't change anything in the smartctl -a
output for the GCC 3.2 binaries with default options.

--
0@pervalidus.{net, {dyndns.}org} |
From: Bruce A. <ba...@gr...> - 2002-10-30 02:37:50
|
> And I wonder if this is my GCC 3.2's fault, since the one from
> Slackware didn't miscompile their 5.0-10.
>
> > First, could you try fixing the following include file that
> > is perhaps still broken on your machine (the patch was sent
> > to the kernel tree on Oct 10th)? Then recompile yet again?
>
> If you meant recompiling smartmontools (and not the kernel)
> after patching, it didn't change anything in the smartctl -a
> output for the GCC 3.2 binaries with default options.

OK, I've got another idea closer to home. Let me look a bit closer at the
print formatting...

Bruce |
From: <0...@pe...> - 2002-10-29 16:54:56
|
On Tue, 29 Oct 2002, Bruce Allen wrote:

> > ( 1)Raw Read Error Rate...
> >
> > which I never had with smartsuite or smartmontools.
>
> Interesting. This attribute ought to exist in your drive.
> Could you post the complete output of smartctl -a please?

=== START OF INFORMATION SECTION ===
Device Model: MAXTOR 6L060J3
Serial Number: 663200252994
Firmware Version: A93.0500
ATA Version is: 5
ATA Standard is: ATA/ATAPI-5 T13 1321D revision 1
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Off-line data collection status: (0x02) Offline data collection activity
                                        completed without error.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete off-line
data collection:                 (  35) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Automatic timer ON/OFF support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.

SMART Attributes Data Structure revision number: 22816
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x2008 117   005   020    Old_age  In_the_past 847118427
  3 Spin_Up_Time            0x0027 066   066   020    Pre-fail -           4287
  4 Start_Stop_Count        0x0032 100   100   008    Old_age  -           24
  5 Reallocated_Sector_Ct   0x0033 100   100   020    Pre-fail -           0
  7 Seek_Error_Rate         0x000b 100   100   023    Pre-fail -           0
  9 Power_On_Hours          0x0012 093   093   001    Old_age  -           4925
 10 Spin_Retry_Count        0x0026 100   100   000    Old_age  -           0
 11 Calibration_Retry_Count 0x0013 100   100   020    Pre-fail -           0
 12 Power_Cycle_Count       0x0032 100   100   008    Old_age  -           22
 13 Read_Soft_Error_Rate    0x000b 100   100   023    Pre-fail -           0
194 Temperature_Centigrade  0x0022 079   074   042    Old_age  -           56
195 Hardware_ECC_Recovered  0x001a 100   020   000    Old_age  -           499492
196 Reallocated_Event_Count 0x0010 100   100   020    Old_age  -           0
197 Current_Pending_Sector  0x0032 100   100   020    Old_age  -           0
198 Offline_Uncorrectable   0x0010 100   100   000    Old_age  -           0
199 UDMA_CRC_Error_Count    0x001a 200   200   000    Old_age  -           0

SMART Error Log Version: 3
No Errors Logged

SMART Self-test log, version number 3
Warning - structure revision number does not match spec!
Num Test_Description Status    Remaining LifeTime(hours) LBA_of_first_error
# 1 Off-line         Completed 00%       31520           0x327e0008

I'm a bit worried about Hardware_ECC_Recovered. It increases
~100.000 each time I run smartctl -O, -c, -a. I don't know if
the cable, IDE controller, driver, or temperature (the drive is
running at ~55°C) could cause it.

> I am not so concerned that attribute 1 doesn't show up. I'm
> more concerned that smartctl and smartd disagree about the
> possible existence of attribute 255. Let's try and sort that
> out first. [By the way, do you know how to use a debugger,
> either ddd or gdb?

I used gdb a few times, mainly with core files (where...).

--
0@pervalidus.{net, {dyndns.}org} |
From: Bruce A. <ba...@gr...> - 2002-10-29 23:48:22
|
Hi Frédéric,

Thanks for this note. I think if you look closely at your output, you'll
see the "missing" attribute #1. Let me intersperse some comments first...

> SMART Attributes Data Structure revision number: 22816

This is VERY odd. Especially since for a lot of these Maxtor disks the
rev number is around 10 or so... this huge value makes me suspicious that
something's been corrupted at the start of the Device Attributes Data
Structure. It's odd...

> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     WHEN_FAILED RAW_VALUE
>   5 Reallocated_Sector_Ct   0x2008 117   005   020    Old_age  In_the_past 847118427
>   3 Spin_Up_Time            0x0027 066   066   020    Pre-fail -           4287
>   4 Start_Stop_Count        0x0032 100   100   008    Old_age  -           24
>   5 Reallocated_Sector_Ct   0x0033 100   100   020    Pre-fail -           0

AHA! The attributes are stored in a structure that can contain up to
thirty of them. Normally, the manufacturers store them in order of
increasing attribute ID. But if you look above you'll see the "missing"
attribute 1 (Raw Read Error Rate). It's incorrectly labeled "5".

I don't know yet if this is because the data on your disk's SMART data
sector is corrupted, or if there's something wrong with smartmontools.
What does smartsuite report for the vendor attribute structure? Can you
compare them for me?

The attribute numbers are stored in two different places. I have now
added code to the latest smartmontools release (5.0-22) to print a
warning message if they don't agree. Could you please try that out and
again send the output from smartctl -a?

> 195 Hardware_ECC_Recovered  0x001a 100   020   000    Old_age  -           499492
> 196 Reallocated_Event_Count 0x0010 100   100   020    Old_age  -           0
> 197 Current_Pending_Sector  0x0032 100   100   020    Old_age  -           0
> 198 Offline_Uncorrectable   0x0010 100   100   000    Old_age  -           0
> 199 UDMA_CRC_Error_Count    0x001a 200   200   000    Old_age  -           0
>
> I'm a bit worried about Hardware_ECC_Recovered. It increases
> ~100.000 each time I run smartctl -O, -c, -a. I don't know if
> the cable, IDE controller, driver, or temperature (the drive is
> running at ~55°C) could cause it.

Something to keep in mind is that the quantity reported in the final
column is my (our!) attempt to interpret a 6-byte "vendor specific" field.
I have a lot of Maxtor disks and have seen much larger numbers in this
field. It's probably two or three small numbers which record different
things, all stuck together in the six bytes.

> > I am not so concerned that attribute 1 doesn't show up. I'm
> > more concerned that smartctl and smartd disagree about the
> > possible existence of attribute 255. Let's try and sort that
> > out first. [By the way, do you know how to use a debugger,
> > either ddd or gdb?
>
> I used gdb a few times, mainly with core files (where...).

How about first trying the latest release? I added some code especially
for you, to check for consistent attribute numbering in both smartctl and
smartd. So please try both. I also found and fixed a programming blunder
on my part in smartd (a varargs function with an extra argument that
should not have been there). This might have led to my over-writing
something on the stack. It's worth a shot. If this doesn't find & fix
the problem, I'll send you some instructions on how to run the code under
a debugger.

Cheers,
Bruce |
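Two ideas from the message above can be sketched in a few lines: comparing the attribute IDs that are stored in two places, and reading the 6-byte vendor-specific field either as one number or as three 16-bit words. This is an illustration under stated assumptions, not the smartmontools code: all function names are hypothetical, and the little-endian byte order of the raw field is assumed here.

```c
#include <stdint.h>

#define NUM_ATTRS 30  /* the structure holds up to thirty attributes */

/* Return the index of the first slot whose ID differs between the
 * data page and the threshold page, or -1 if all slots agree. */
static int first_id_mismatch(const uint8_t data_ids[NUM_ATTRS],
                             const uint8_t thresh_ids[NUM_ATTRS])
{
    for (int i = 0; i < NUM_ATTRS; i++)
        if (data_ids[i] != thresh_ids[i])
            return i;
    return -1;
}

/* Combine the 6 raw bytes (assumed little-endian) into one value. */
static uint64_t raw48(const uint8_t raw[6])
{
    uint64_t v = 0;
    for (int i = 5; i >= 0; i--)
        v = (v << 8) | raw[i];
    return v;
}

/* Read the n-th 16-bit word (n = 0, 1, 2) of the raw field, for
 * drives that pack several small counters into the six bytes. */
static uint16_t raw_word(const uint8_t raw[6], int n)
{
    return (uint16_t)(raw[2 * n] | (raw[2 * n + 1] << 8));
}
```

For the table quoted above, the first slot would be reported as inconsistent (ID 5 on the data page against ID 1 on the threshold page). Whether a given raw value is one counter or several is vendor-specific, so both readings are offered.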
From: <0...@pe...> - 2002-10-30 00:34:14
|
On Tue, 29 Oct 2002, Bruce Allen wrote:

> How about first trying the latest release? I added some code
> especially for you, to check for consistent attribute
> numbering in both smartctl and smartd. So please try both. I
> also found and fixed a programming blunder on my part in
> smartd (a varargs function with an extra argument that should
> not have been there). This might have led to my over-writing
> something on the stack. It's worth a shot. If this doesn't
> find & fix the problem, I'll send you some instructions on
> how to run the code under a debugger.

I tried, but as you can see I reported the miscompilations,
which don't occur with -Os. I don't know what's causing them,
but I think I'd have run into more trouble if the compiler were
that bad, as I compiled XFree86 and Mozilla with -O3
-march=athlon -mcpu=athlon, and everything else uses -O2.

With 5.0-22 (mis)compiled with -O2, smartd prints:

Oct 29 21:10:05 pervalidus smartd: smartd version 5.0-22 - S.M.A.R.T. Daemon.
Oct 29 21:10:05 pervalidus smartd: Home page is http://smartmontools.sourceforge.net/
Oct 29 21:10:05 pervalidus smartd: Using configuration file /etc/smartd.conf
Oct 29 21:10:05 pervalidus smartd: Device: /dev/hda, opened
Oct 29 21:10:05 pervalidus smartd: Device: /dev/hda, is SMART capable. Adding to "monitor" list.
Oct 29 21:10:05 pervalidus smartd: Started monitoring 1 ATA and 0 SCSI devices

I have /dev/hda -A -a in /etc/smartd.conf.

So, apparently 255 is gone. Or does -a hide it?

smartctl still shows the miscompilations:

+See vendor-specific Attribute list for marginal Attributes.

SMART Attributes Data Structure revision number: 24416

Yes, this number changes when miscompiled.

  5 Reallocated_Sector_Ct 0x0008 133 005 020 Old_age In_the_past 847118427
  5 Reallocated_Sector_Ct <== Data Page      | WARNING: PREVIOUS ATTRIBUTE HAS TWO
  1 Raw_Read_Error_Rate   <== Threshold Page | INCONSISTENT IDENTITIES IN THE DATA

SMART Error Log Version: 3
No Errors Logged

SMART Self-test log, version number 3
Warning - structure revision number does not match spec!
Num Test_Description Status    Remaining LifeTime(hours) LBA_of_first_error
# 1 Off-line         Completed 00%       35584

Compiled with -Os:

-See vendor-specific Attribute list for marginal Attributes.

SMART Attributes Data Structure revision number: 11

  1 Raw_Read_Error_Rate   0x0029 100 253 020 Pre-fail - 0
  3 Spin_Up_Time          0x0027 066 066 020 Pre-fail - 4287
  4 Start_Stop_Count      0x0032 100 100 008 Old_age  - 24
  5 Reallocated_Sector_Ct 0x0033 100 100 020 Pre-fail - 0
...

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log, version number 1
Num Test_Description Status    Remaining LifeTime(hours) LBA_of_first_error
# 1 Short off-line   Completed 00%       4924

Definitely a miscompilation.

--
0@pervalidus.{net, {dyndns.}org} |
From: Bruce A. <ba...@gr...> - 2002-10-30 02:26:19
|
> I tried, but as you can see I reported the miscompilations,
> which don't occur with -Os. I don't know what's causing them,
> but I think I'd have run into more trouble if the compiler were
> that bad, as I compiled XFree86 and Mozilla with -O3
> -march=athlon -mcpu=athlon, and everything else uses -O2.
>
> With 5.0-22 (mis)compiled with -O2, smartd prints:
>
> Oct 29 21:10:05 pervalidus smartd: smartd version 5.0-22 - S.M.A.R.T. Daemon.
> Oct 29 21:10:05 pervalidus smartd: Home page is http://smartmontools.sourceforge.net/
> Oct 29 21:10:05 pervalidus smartd: Using configuration file /etc/smartd.conf
> Oct 29 21:10:05 pervalidus smartd: Device: /dev/hda, opened
> Oct 29 21:10:05 pervalidus smartd: Device: /dev/hda, is SMART capable. Adding to "monitor" list.
> Oct 29 21:10:05 pervalidus smartd: Started monitoring 1 ATA and 0 SCSI devices
>
> I have /dev/hda -A -a in /etc/smartd.conf.
>
> So, apparently 255 is gone. Or does -a hide it?

It's not "-a" hiding it. The "-a" directive to smartd shows all. The
"255" was probably due to stack misalignment; look at the difference
between smartd.c v1.44 and v1.45, around line 549 of v1.45. But I'm not
sure.

I suggest that for the moment we regard smartd as fixed or
non-problematic, and concentrate on tracking down the problem in smartctl.

> smartctl still shows the miscompilations:
>
> +See vendor-specific Attribute list for marginal Attributes.
>
> SMART Attributes Data Structure revision number: 24416
>
> Yes, this number changes when miscompiled.
>
>   5 Reallocated_Sector_Ct 0x0008 133 005 020 Old_age In_the_past 847118427
>   5 Reallocated_Sector_Ct <== Data Page      | WARNING: PREVIOUS ATTRIBUTE HAS TWO
>   1 Raw_Read_Error_Rate   <== Threshold Page | INCONSISTENT IDENTITIES IN THE DATA
>
> SMART Error Log Version: 3
> No Errors Logged
>
> SMART Self-test log, version number 3
> Warning - structure revision number does not match spec!
> Num Test_Description Status    Remaining LifeTime(hours) LBA_of_first_error
> # 1 Off-line         Completed 00%       35584
>
> Compiled with -Os:
>
> -See vendor-specific Attribute list for marginal Attributes.
>
> SMART Attributes Data Structure revision number: 11
>
>   1 Raw_Read_Error_Rate   0x0029 100 253 020 Pre-fail - 0
>   3 Spin_Up_Time          0x0027 066 066 020 Pre-fail - 4287
>   4 Start_Stop_Count      0x0032 100 100 008 Old_age  - 24
>   5 Reallocated_Sector_Ct 0x0033 100 100 020 Pre-fail - 0
> ...
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log, version number 1
> Num Test_Description Status    Remaining LifeTime(hours) LBA_of_first_error
> # 1 Short off-line   Completed 00%       4924
>
> Definitely a miscompilation.

I agree. Could you email me gcc 3.2 precompiled code with -Os and -O2
(ie one working, one broken binary) for smartctl? I'd like to try them on
a P3 and P4 box and on another athlon box.

Also, does smartctl work correctly when compiled with -g?

Cheers,
Bruce |