From: <pho...@ti...> - 2002-11-10 08:47:25
Attachments:
Quantum.txt
Hello!

I'm new to this mailing list, but I hope you can help me solve this problem.

My system has now frozen 5 or 6 times with the HD LED on: I can't do anything that involves disk access, and there are no messages from the kernel. I have messages redirected to tty12, so, since switching consoles is not a disk-based activity, I can still read the last messages.

I have a 30GB Quantum Fireball AS, and every time there is a freeze I see that the "UDMA CRC Error Count" "RAW_VALUE" increases by 1 and "VALUE" decreases by 1 (I can see this on the next reboot).

After the last freeze I got:

199 UDMA_CRC_Error_Count 0x001a 191 191 000 Old_age - 9

Before it was:

199 UDMA_CRC_Error_Count 0x001a 192 192 000 Old_age - 8

And so on...

I changed the cable and tried a new motherboard: I got the same result, so this seems to be a hard-disk-related problem.

Attached to this email is the SMART status of the drive.

Now, two questions:

1) What does "UDMA_CRC_Error_Count" mean?

2) Is my loved HD dying? :'(

Thanks for your help!

--
--Tony
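The change between the two attribute lines quoted above can be checked mechanically. The following is a minimal Python sketch, not a smartmontools tool: the helper name `parse_attribute` and the dictionary keys are hypothetical, and the column order (ID, name, flag, VALUE, WORST, THRESH, type, when-failed, RAW_VALUE) is an assumption read off the two lines in the message.

```python
def parse_attribute(line: str) -> dict:
    """Split one whitespace-separated SMART attribute line into named fields.

    The column order is an assumption based on the two lines quoted in the
    message; other smartctl versions may print additional columns.
    """
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flag": fields[2],
        "value": int(fields[3]),      # normalized VALUE
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],
        "when_failed": fields[7],
        "raw_value": int(fields[8]),  # RAW_VALUE
    }


# The two snapshots reported in the message, before and after the last freeze.
before = parse_attribute("199 UDMA_CRC_Error_Count 0x001a 192 192 000 Old_age - 8")
after = parse_attribute("199 UDMA_CRC_Error_Count 0x001a 191 191 000 Old_age - 9")

raw_delta = after["raw_value"] - before["raw_value"]
value_delta = after["value"] - before["value"]
print(f"{after['name']}: RAW_VALUE {raw_delta:+d}, VALUE {value_delta:+d}")
# prints: UDMA_CRC_Error_Count: RAW_VALUE +1, VALUE -1
```

Saving one such line after every reboot and comparing successive snapshots this way would show whether each freeze really corresponds to exactly one new CRC error.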
From: Bruce A. <ba...@gr...> - 2002-11-10 18:56:31
Hi Tony,

> My system has now frozen 5 or 6 times with the HD LED on: I can't do
> anything that involves disk access, and there are no messages from the
> kernel. I have messages redirected to tty12, so, since switching
> consoles is not a disk-based activity, I can still read the last messages.
>
> I have a 30GB Quantum Fireball AS, and every time there is a freeze I see that
> the "UDMA CRC Error Count" "RAW_VALUE" increases by 1 and "VALUE" decreases by 1
> (I can see this on the next reboot).
>
> After the last freeze I got:
>
> 199 UDMA_CRC_Error_Count 0x001a 191 191 000 Old_age - 9
>
> Before it was:
>
> 199 UDMA_CRC_Error_Count 0x001a 192 192 000 Old_age - 8
>
> And so on...
>
> I changed the cable and tried a new motherboard: I got the same result,
> so this seems to be a hard-disk-related problem.
>
> Attached to this email is the SMART status of the drive.
>
> Now, two questions:
>
> 1) What does "UDMA_CRC_Error_Count" mean?

I am far from an expert on this subject. My understanding is that data transferred to or from the disk in UDMA mode is accompanied by a CRC (Cyclic Redundancy Code), which is a sort of checksum for detecting errors in the data transmission. A CRC error can have at least three causes -- a bad I/O chipset, a bad cable, or a bad disk. Since you've replaced the MB, I'd say the next thing to try is the IDE/ATA cable. Make sure you use an 80-wire cable, not a 40-wire one.

> 2) Is my loved HD dying? :'(

It might be. Another thing to try is to get a recent release of the hdparm utility and try changing the I/O mode settings of the disk. Perhaps there is an incompatibility between the settings of your system and the capabilities of the disk.

By the way, are the values reported by the disk for power cycles and lifetime accurate? 1000 hours is only 6 weeks, so it appears to be a pretty young disk; on the other hand, it seems to have been power cycled about once per hour. Do you use your machine for an hour at a time, then power it down? If so, keep in mind that power cycling a hard disk causes a lot of "wear and tear".

Good luck!

Bruce Allen
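The checksum idea described above can be illustrated with a minimal Python sketch. It uses the polynomial x^16 + x^12 + x^5 + 1, which, as far as I know, is the generator polynomial the ATA specification gives for Ultra DMA; the initial value and bit ordering below follow the generic CRC-16/CCITT-FALSE convention, which is an assumption made for illustration and not the exact on-the-wire procedure. The function name `crc16_ccitt` is hypothetical.

```python
def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021 (x^16 + x^12 + x^5 + 1).

    The init value and MSB-first bit order are the generic CCITT-FALSE
    convention, assumed here for illustration only.
    """
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# The sender computes a CRC over the data burst it transmits.
sent = bytes(range(64))
sent_crc = crc16_ccitt(sent)

# Simulate a single bit flipped in transit (noisy cable, bad chipset, bad disk).
received = bytearray(sent)
received[10] ^= 0x04
received_crc = crc16_ccitt(bytes(received))

# The receiver's CRC no longer matches, so the transfer is flagged as an error.
print("CRC mismatch detected:", sent_crc != received_crc)
```

A mismatch only says that the data was damaged somewhere between the two ends; it cannot say which component caused the damage, which is why the chipset, cable, and disk all remain suspects.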
From: <pho...@ti...> - 2002-11-11 09:11:30
Attachments:
via
hdparm.txt
On Sunday 10 November 2002 11:43, Bruce Allen wrote:
> > Now, two questions:
> >
> > 1) What does "UDMA_CRC_Error_Count" mean?
>
> I am far from an expert on this subject. My understanding is that data
> transferred to or from the disk in UDMA mode is accompanied by a CRC
> (Cyclic Redundancy Code), which is a sort of checksum for detecting errors
> in the data transmission. A CRC error can have at least three causes -- a
> bad I/O chipset, a bad cable, or a bad disk. Since you've replaced the MB, I'd say

So there seems to be no way to know the precise meaning of "UDMA_CRC_Error_Count"... ;/

> the next thing to try is the IDE/ATA cable. Make sure you use an 80-wire
> cable, not a 40-wire one.

I changed both the motherboard and the IDE cable (yes, it has 80 wires).

Do you feel you can exclude any software-related cause? Maybe some strange bug in the IDE driver (maybe triggered by some exotic condition... ;) )

> > 2) Is my loved HD dying? :'(
>
> It might be. Another thing to try is to get a recent release of the hdparm
> utility and try changing the I/O mode settings of the disk. Perhaps there
> is an incompatibility between the settings of your system and the
> capabilities of the disk.

I haven't changed my hardware/software configuration, at least not in the last 3 months. The HD started failing without apparent cause.

However, the south bridge of my motherboard is a VIA 686B. I attached the /proc/ide/via file and the hdparm output to this email in the hope you can tell me more...

> By the way, are the values reported by the disk for power cycles and
> lifetime accurate? 1000 hours is only 6 weeks, so it appears to be a

This is strange. I didn't notice this before. The HD is about 1.5 years old and I use my workstation for about 10 hours a day (power cycling it twice a day, or a bit more).

> pretty young disk; on the other hand, it seems to have been power cycled
> about once per hour. Do you use your machine for an hour at a time, then

This is definitely not true... Maybe the electronic board on the drive is faulty?

However, I think I have read somewhere that the "Power_On_Hours" counter is not that accurate on some drives... Maybe *so* inaccurate?!?! =;/

Thanks a lot for your help.

--Tony
From: Bruce A. <ba...@gr...> - 2002-11-11 09:33:26
Hi Tony,

> > > 1) What does "UDMA_CRC_Error_Count" mean?
> >
> > I am far from an expert on this subject. My understanding is that data
> > transferred to or from the disk in UDMA mode is accompanied by a CRC
> > (Cyclic Redundancy Code), which is a sort of checksum for detecting errors
> > in the data transmission. A CRC error can have at least three causes -- a
> > bad I/O chipset, a bad cable, or a bad disk. Since you've replaced the MB, I'd say
>
> So there seems to be no way to know the precise meaning of
> "UDMA_CRC_Error_Count"... ;/

There is. You have to read the ATA/ATAPI specs (see the REFERENCES section of the smartmontools home page). Use the "search" feature of acroread to look for "CRC". You'll find a section called Ultra DMA CRC rules. But I'm not sure it'll help you figure out what is going on.

> > the next thing to try is the IDE/ATA cable. Make sure you use an 80-wire
> > cable, not a 40-wire one.
>
> I changed both the motherboard and the IDE cable (yes, it has 80 wires).

OK, that's too bad. By the way, is there another device on the same IDE cable? That might also be responsible.

> Do you feel you can exclude any software-related cause? Maybe some strange bug in
> the IDE driver (maybe triggered by some exotic condition... ;) )

Sadly, I am simply not expert enough to answer this. I'm sorry -- I just don't know.

> > > 2) Is my loved HD dying? :'(
> >
> > It might be. Another thing to try is to get a recent release of the hdparm
> > utility and try changing the I/O mode settings of the disk. Perhaps there
> > is an incompatibility between the settings of your system and the
> > capabilities of the disk.
>
> I haven't changed my hardware/software configuration, at least not in the last
> 3 months. The HD started failing without apparent cause.

It still might be worth trying to use hdparm to change some of the parameters -- but perhaps not. You are the best judge of this.

> However, the south bridge of my motherboard is a VIA 686B. I attached the
> /proc/ide/via file and the hdparm output to this email in the hope you can
> tell me more...

I'm afraid you've reached the limits of my knowledge/ignorance.

> > By the way, are the values reported by the disk for power cycles and
> > lifetime accurate? 1000 hours is only 6 weeks, so it appears to be a
>
> This is strange. I didn't notice this before. The HD is about 1.5 years old
> and I use my workstation for about 10 hours a day (power cycling it twice a
> day, or a bit more).

So this is odd -- though many manufacturers do not use the Attributes to store "the normal thing". The Attribute values are all device specific, according to the spec. Try keeping an eye on the attribute and seeing how it changes from one day to the next.

By the way, power cycling your machine twice a day is probably not a good idea. It causes a lot of wear and tear on the power supply and disk. My advice is to leave it switched on (but turn off the monitor). The cost of electricity is less than the cost of fixing the system.

> > pretty young disk; on the other hand, it seems to have been power cycled
> > about once per hour. Do you use your machine for an hour at a time, then
>
> This is definitely not true... Maybe the electronic board on the drive is
> faulty?
>
> However, I think I have read somewhere that the "Power_On_Hours" counter is
> not that accurate on some drives... Maybe *so* inaccurate?!?! =;/

Indeed, the attribute is Vendor/Device specific -- so only Quantum really knows what it stores. *Most* vendors seem to use it for power-on hours -- but Hitachi, for example, uses it for power-on minutes.

Personally, if I were you, I'd buy another drive and transfer your data to it sometime soon.

Cheers,
Bruce
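The mismatch discussed above can be made concrete with a little arithmetic. This is a rough Python sketch using only the approximate figures mentioned in the thread (a drive about 1.5 years old, about 10 hours of use and 2 power cycles per day, and a reported counter of roughly 1000); the exact attribute values are in the attachment and are not reproduced here, and all variable names are hypothetical.

```python
# Approximate figures taken from the messages in this thread.
DRIVE_AGE_YEARS = 1.5
HOURS_USED_PER_DAY = 10
POWER_CYCLES_PER_DAY = 2
REPORTED_COUNTER = 1000  # rough "power on" reading mentioned by Bruce

days_in_service = DRIVE_AGE_YEARS * 365

# What the counters should read if they counted hours and cycles as described.
expected_hours = days_in_service * HOURS_USED_PER_DAY
expected_cycles = days_in_service * POWER_CYCLES_PER_DAY

# How long 1000 hours of continuous operation would be, in weeks.
weeks_if_hours = REPORTED_COUNTER / 24 / 7

# The same reading interpreted as minutes instead of hours.
hours_if_minutes = REPORTED_COUNTER / 60

print(f"expected power-on hours: {expected_hours:.0f}")
print(f"expected power cycles:   {expected_cycles:.0f}")
print(f"1000 hours continuous:   {weeks_if_hours:.1f} weeks")
print(f"1000 minutes:            {hours_if_minutes:.1f} hours")
```

With these figures, the described usage predicts several thousand power-on hours and roughly a thousand power cycles. A power-cycle count near 1000 would therefore be plausible, while a power-on reading near 1000 fits neither the hours nor the minutes interpretation, which is consistent with the counter being vendor specific.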
From: <pho...@ti...> - 2002-11-11 18:36:38
On Monday 11 November 2002 10:33, Bruce Allen wrote:
> > > the next thing to try is the IDE/ATA cable. Make sure you use an
> > > 80-wire cable, not a 40-wire one.
> >
> > I changed both the motherboard and the IDE cable (yes, it has 80 wires).
>
> OK, that's too bad. By the way, is there another device on the same IDE
> cable? That might also be responsible.

No, it is the only one. There was a slave disk, but removing it made no difference to the first HD... ;(

> > > > 2) Is my loved HD dying? :'(
> > >
> > > It might be. Another thing to try is to get a recent release of the
> > > hdparm utility and try changing the I/O mode settings of the disk.
> > > Perhaps there is an incompatibility between the settings of your system
> > > and the capabilities of the disk.
> >
> > I haven't changed my hardware/software configuration, at least not in the
> > last 3 months. The HD started failing without apparent cause.
>
> It still might be worth trying to use hdparm to change some of the
> parameters -- but perhaps not. You are the best judge of this.

O.K., I'll try something...

> Personally, if I were you, I'd buy another drive and transfer your data to
> it sometime soon.

O.K., I think I'll follow your advice...

Thanks a lot for your help! :)

--
--Tony