[smartmontools-support]194 Temperature_Celsius FAILING_NOW From: KELEMEN Peter - 2006-01-13 14:14:37
```
Hi folks,

After a massive power cut, we observe several tens of disks having
their 194 Temperature_Celsius in FAILING_NOW status (where the RAW
value is 28, pretty normal). Do I have to interpret this as the
disk being incapable of measuring its own temperature anymore?

Peter

--
.+'''+. .+'''+. .+'''+. .+'''+. .+''
Kelemen Péter / \ / \ Peter.Kelemen@...
.+' `+...+' `+...+' `+...+' `+...+'
```

Re: [smartmontools-support]194 Temperature_Celsius FAILING_NOW From: Bruce Allen - 2006-01-13 14:58:59
```
Hey Peter,

It was nice meeting you this past summer. By the way, I've been testing a
bunch of Areca/3ware cards during the past months. Sometime we should
talk about it.

I have some bad news for you. The literal interpretation is as follows.
Temperature_Celsius is what is called a 'usage' or 'old age' Attribute.
When failing (which means that the normalized value is <= the threshold
value) this means that the disk has 'exceeded its design lifetime or
design parameters' and can no longer be expected to function normally.

In plainer language: the disks got hot enough that (the engineers who
designed them think that) they should now go into the trash can.

If these disks are behind RAID controllers, I personally wouldn't worry
much. If the disks are used for non-redundant storage, I'd either replace
the drives or at the very least consult the disk manufacturer to ask their
opinion.

Cheers,
	Bruce

On Fri, 13 Jan 2006, KELEMEN Peter wrote:

> Hi folks,
>
> After a massive power cut, we observe several tens of disks having
> their 194 Temperature_Celsius in FAILING_NOW status (where RAW
> value is 28, pretty normal). Do I have to interpret these as the
> disk is incapable of measuring it's own temperature anymore?
>
> Peter
```
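The rule Bruce describes (an Attribute is failing when its normalized value is <= its threshold) can be sketched as below. This is only an illustration, not smartmontools code: the `SmartAttribute` class, the `when_failed` helper, and the sample normalized/worst/threshold numbers are hypothetical, and the `In_the_past` branch is modeled on how smartctl's WHEN_FAILED column is commonly described. Special cases, such as attributes that have no threshold, are ignored.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One row of a SMART attribute table (hypothetical container)."""
    attr_id: int
    name: str
    value: int      # normalized current value
    worst: int      # lowest normalized value recorded so far
    threshold: int  # vendor-defined failure threshold
    raw: int        # raw value, e.g. degrees Celsius for attribute 194


def when_failed(attr: SmartAttribute) -> str:
    """Classify an attribute by comparing normalized values to the threshold."""
    if attr.value <= attr.threshold:
        return "FAILING_NOW"
    if attr.worst <= attr.threshold:
        return "In_the_past"
    return "-"


# Made-up normalized numbers; only the raw value 28 comes from the thread.
temp = SmartAttribute(194, "Temperature_Celsius",
                      value=40, worst=40, threshold=45, raw=28)
print(temp.name, when_failed(temp))
```

Note that the comparison uses only the normalized value and the threshold, so a raw reading that looks normal (28 here) does not by itself rule out a failing status.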
Re: [smartmontools-support]194 Temperature_Celsius FAILING_NOW From: KELEMEN Peter - 2006-01-13 16:53:27
```
* Bruce Allen (ballen@...) [20060113 08:58]:

Hi Bruce,

> It was nice meeting you this past summer. By the way I've been
> testing a bunch of Areca/3ware cards during the past months.
> Sometime we should talk about it.

Same here. We're just getting Areca arrays rolling, so I should have
some more exposure to them in the following months.

> I have some bad news for you. The literal interpretation is
> as follows. Temperature_Celsius is what is called a 'usage'
> or 'old age' Attribute. When failing (which means that the
> normalized value is <= the threshold value) this means that the
> disk has 'exceeded its design lifetime or design parameters' and
> can no longer be expected to function normally.

Thanks for the explanation. We'll contact the vendor; however, it
was very useful to have this confirmed because we have other signs
(PCI parity errors, NMIs) that point the investigation in the
direction of overheating. Additionally, the average age of those
disks is 22260 hours (about 2.54 years).

Peter

--
.+'''+. .+'''+. .+'''+. .+'''+. .+''
Kelemen Péter / \ / \ Peter.Kelemen@...
.+' `+...+' `+...+' `+...+' `+...+'
```
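As a quick check of the age figure Peter quotes, the conversion from power-on hours to years can be sketched as follows. The helper name `power_on_years` is hypothetical, and a 365-day year is assumed; the result comes out at roughly two and a half years.

```python
HOURS_PER_YEAR = 24 * 365  # assumes a 365-day year, leap days ignored


def power_on_years(hours: float) -> float:
    """Convert a power-on hours figure to years."""
    return hours / HOURS_PER_YEAR


# 22260 hours is the average disk age reported in the thread.
print(f"{power_on_years(22260):.2f} years")
```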