From: Frederik H. <fh...@pa...> - 2004-08-04 14:00:31
|
Hi,

Today I noticed that every half hour I get this warning in my /var/log/messages:

Aug 4 15:09:49 localhost smartd[4806]: Device: /dev/hda, 1 Offline uncorrectable sectors

This has been going on for a few weeks already.

I manually ran a short and a long self-test, and these are the results:

# smartctl -l selftest /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error         00%       4881         -
# 2  Short offline       Completed without error         00%       4880         -
# 3  Short offline       Interrupted (host reset)        90%       4879         -

Neither smartctl -Hc nor smartctl -A shows any failures.

With smartctl -a I see these errors:

Error 128 occurred at disk power-on lifetime: 4830 hours (201 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 1f 5a 5e e2  Error: ICRC, ABRT at LBA = 0x025e5a1f = 39737887

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 28 f8 59 5e e2 00      00:00:35.157  READ DMA

Error 127 occurred at disk power-on lifetime: 4351 hours (181 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 9f 50 61 e2  Error: ICRC, ABRT at LBA = 0x0261509f = 39932063

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 80 20 50 61 e2 00      00:01:58.484  READ DMA

Error 126 occurred at disk power-on lifetime: 2960 hours (123 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 15 9f 38 e4  Error: UNC 1 sectors at LBA = 0x04389f15 = 70819605

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 01 15 9f 38 e4 00      02:40:23.479  READ DMA

Error 125 occurred at disk power-on lifetime: 2960 hours (123 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 59 01 15 9f 38 e4  Error: UNC 1 sectors at LBA = 0x04389f15 = 70819605

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 02 14 9f 38 e4 00      02:40:18.350  READ DMA

Error 124 occurred at disk power-on lifetime: 2960 hours (123 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 15 9f 38 e4  Error: UNC 1 sectors at LBA = 0x04389f15 = 70819605

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 01 15 9f 38 e4 00      02:40:13.379  READ DMA

Should I be worried about all this?

Frederik
|
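For readers following the error log above, here is a minimal sketch of how the register dump maps to the "Error: ... at LBA = ..." summary that smartctl prints. It assumes 28-bit LBA addressing (the low nibble of DH holds the top four address bits) and the standard ATA error-register bit positions; the function names are made up for this illustration and are not part of smartmontools.

```python
# Illustrative helpers only; these names are hypothetical, not smartmontools APIs.

# Bit masks for the ATA error register (ER), using the names smartctl prints.
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error during a DMA transfer
    0x40: "UNC",   # uncorrectable data error
    0x10: "IDNF",  # requested sector ID not found
    0x04: "ABRT",  # command aborted
}


def decode_error_register(er):
    """Return the names of the flags set in the error register, high bit first."""
    return [name for mask, name in ERROR_BITS.items() if er & mask]


def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the SN, CL, CH and DH register values."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def lifetime(hours):
    """Split power-on hours into (days, remaining hours)."""
    return divmod(hours, 24)


# Error 128 from the log: ER=84, SN=1f, CL=5a, CH=5e, DH=e2, at 4830 hours.
print(decode_error_register(0x84))       # ['ICRC', 'ABRT']
print(lba28(0x1F, 0x5A, 0x5E, 0xE2))     # 39737887
print(lifetime(4830))                    # (201, 6)
```

Read this way, errors 127 and 128 are interface CRC errors on the transfer, while errors 124 to 126 are uncorrectable reads of the single sector at LBA 70819605.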
From: Bruce A. <ba...@gr...> - 2004-08-04 16:02:25
|
Hi Frederik,

> Today I noticed that each half hour, I have these warnings
> Aug 4 15:09:49 localhost smartd[4806]: Device: /dev/hda, 1 Offline
> uncorrectable sectors
>
> in my /var/log/messages each 30 minutes, this for already a few weeks.
>
> I manually did a short and a long self-test, and these are the results:
>
> # smartctl -l selftest /dev/hda
> smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed without error         00%       4881         -
> # 2  Short offline       Completed without error         00%       4880         -
> # 3  Short offline       Interrupted (host reset)        90%       4879         -

The fact that the extended offline test completed without errors is a good sign.

> smartctl -Hc and smartctl -A neither show any failures.
>
> With smartctl -a I see these errors:
>
> Error 128 occurred at disk power-on lifetime: 4830 hours (201 days + 6 hours)
>   When the command that caused the error occurred, the device was active or idle.
>
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   84 51 00 1f 5a 5e e2  Error: ICRC, ABRT at LBA = 0x025e5a1f = 39737887
>
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   c8 00 28 f8 59 5e e2 00      00:00:35.157  READ DMA
>
> Error 127 occurred at disk power-on lifetime: 4351 hours (181 days + 7 hours)
>   When the command that caused the error occurred, the device was active or idle.
>
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   84 51 00 9f 50 61 e2  Error: ICRC, ABRT at LBA = 0x0261509f = 39932063
>
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   c8 00 80 20 50 61 e2 00      00:01:58.484  READ DMA
>
> Error 126 occurred at disk power-on lifetime: 2960 hours (123 days + 8 hours)
>   When the command that caused the error occurred, the device was active or idle.
>
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 01 15 9f 38 e4  Error: UNC 1 sectors at LBA = 0x04389f15 = 70819605
>
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   c8 00 01 15 9f 38 e4 00      02:40:23.479  READ DMA
>
> Error 125 occurred at disk power-on lifetime: 2960 hours (123 days + 8 hours)
>   When the command that caused the error occurred, the device was active or idle.
>
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 59 01 15 9f 38 e4  Error: UNC 1 sectors at LBA = 0x04389f15 = 70819605
>
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   c8 00 02 14 9f 38 e4 00      02:40:18.350  READ DMA
>
> Error 124 occurred at disk power-on lifetime: 2960 hours (123 days + 8 hours)
>   When the command that caused the error occurred, the device was active or idle.
>
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 01 15 9f 38 e4  Error: UNC 1 sectors at LBA = 0x04389f15 = 70819605
>
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   c8 00 01 15 9f 38 e4 00      02:40:13.379  READ DMA
>
> Should I be worried about all this?

I don't think that you need to worry. But I don't understand why the
offline uncorrectable sector count is still 1 on your disk, since the
extended self-test passes. Perhaps another subscriber to this list can
offer some insight. What's the brand/model/firmware of the disk?

Here's one random thought: try seeing if
  smartctl -t offline /dev/hd?
updates the SMART Attribute values and clears the offline uncorrectable
count to zero.

Cheers,
Bruce
|
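One way to check whether the offline test changed anything is to compare the raw Offline_Uncorrectable value in the output of smartctl -A before and after running smartctl -t offline. The sketch below is only an illustration: it assumes the ten-column attribute table with the raw value in the last column, the helper name is hypothetical, and the sample text is made up for the example rather than taken from Frederik's disk.

```python
def offline_uncorrectable(attr_output):
    """Return the raw Offline_Uncorrectable count from `smartctl -A` text.

    Returns None if the attribute is not listed. Assumes the raw value is a
    plain integer in the tenth column, which may not hold for every drive.
    """
    for line in attr_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Offline_Uncorrectable":
            return int(fields[9])
    return None


# Made-up sample in the assumed layout; not real output from this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   024    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
"""

print(offline_uncorrectable(SAMPLE))
```

Saving the output of smartctl -A to a file before and after the offline test and passing each to this function would show whether the count dropped to zero.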
From: Frederik H. <fh...@pa...> - 2004-08-04 20:08:46
|
On Wednesday 04 August 2004 18:02, Bruce Allen wrote:
> But I don't understand why the
> offline uncorrectable sector count is still 1 on your disk, since the
> extended self-test passes. Perhaps another subscriber to this list can
> offer some insight. What's the brand/model/firmware of the disk?

Device Model: FUJITSU MHS2040AT D
Firmware Version: 3003

It's a laptop hard drive, in a Compaq EVO N1020v.

Some 6 months ago I had a problem on the disk with one or a few unreadable
sectors (there were the classic driveready seekcomplete errors in the kernel
logs). I solved it then by making a copy of the data, reformatting the
partition, and restoring the data. Since then I have never had any problems.

> Here's one random thought: try seeing if
> smartctl -t offline /dev/hd?
> updates the SMART Attribute values and clears the offline uncorrectable
> count to zero.

That does not seem to help.

It's strange, because lately my machine sometimes hangs when setting the
hdparm parameters at boot time. And at this moment, I see that my HD LED
stays on all the time, although the disk is not reading. I was using a 2.6.7
kernel with lots of patches from the mm series; I have now switched to a more
standard 2.6.8rc, so maybe this will solve these problems.

Or maybe it could be a problem with the IDE cable or something like that? That
would be rather annoying, because it's a laptop, and I don't know how I would
solve that.

Anyway, a back-up of my vital data never hurts, and it seems this is the right
moment to think about that :-)

Thank you for the help, and for this great utility. I did not really know much
about S.M.A.R.T. and smartmontools, but today I have read a bit of
documentation about it, and I'm really impressed. Thanks!

Frederik
|
From: Bruce A. <ba...@gr...> - 2004-08-05 11:57:26
|
Frederik,

I suggest that you try running the Fujitsu 'disk repair' utility from a DOS
boot floppy, and see if this results in the offline uncorrectable sector
count being cleared to zero.

It may be that even though these sectors are readable, they need to be
re-written in order to force them off the list. The disk repair utility
will probably do this.

[Do a back-up first to keep my conscience clear, please.]

Cheers,
Bruce

On Wed, 4 Aug 2004, Frederik Himpe wrote:

> On Wednesday 04 August 2004 18:02, Bruce Allen wrote:
> > But I don't understand why the
> > offline uncorrectable sector count is still 1 on your disk, since the
> > extended self-test passes. Perhaps another subscriber to this list can
> > offer some insight. What's the brand/model/firmware of the disk?
>
> Device Model: FUJITSU MHS2040AT D
> Firmware Version: 3003
>
> It's a laptop hard drive, in a Compaq EVO N1020v.
>
> Some 6 months ago I had some problem on the disk with one or a few unreadable
> sectors (there were the classic driveready seekcomplete errors in kernel
> logs). I solved it then by making a copy of the data, reformatting the
> partition, and restoring the data. Since then I never had any problems.
>
> > Here's one random thought: try seeing if
> > smartctl -t offline /dev/hd?
> > updates the SMART Attribute values and clears the offline uncorrectable
> > count to zero.
>
> That does not seem to help.
>
> It's strange, because lately I'm having sometimes trouble that my machine
> hangs when setting the hdparm parameters at boot time. And at this moment, I
> see that my HD led stays on all the time, although it is not reading. I was
> using a kernel 2.6.7 with lots of patches from mm-series, I have switched to
> a more standard 2.6.8rc now, maybe this will solve these problems.
>
> Or maybe it could be a problem with the IDE cable or something like that? That
> would be rather annoying, because it's a laptop, and I don't know how I would
> have to solve that.
>
> Anyway, a back-up of my vital data never hurts, it seems this is the right
> moment to think about that :-)
>
> Thank you for the help, and for this great utility. I did not really know much
> about S.M.A.R.T. and smartmontools, but today I have read a bit of
> documentation about it, and I'm really impressed. Thanks!
>
> Frederik
|