From: Bruce A. <ba...@gr...> - 2006-09-15 18:11:11
Tony, thanks. I would appreciate it if you can copy the mailing list --
this may be of interest to other users also.

Bruce

On Fri, 15 Sep 2006, Tony Ladd wrote:

> Bruce
>
> I have made some progress -- thanks to 3ware and to you, but no thanks to
> WDC. I asked for a description of the vendor-specific attributes and they
> refused. Nice. But I think I understand a little better, nevertheless. It
> seems that the spin up time is encoded via three 16-bit integers (base 10).
> In that format the lowest (I think) 16 bits encode the spin up time; the
> next 16 hold something which smartmontools reports as Average but which may
> not be. In most cases it's zero but for some drives it is 7 and
> occasionally 8. It would be useful to know what this means. The upper 16
> bits are always zero (it seems). Also the spin up time does not seem to be
> active for the first few spin ups. I have 5 start ups on one drive and it's
> still not reporting a spin up time.
>
> I tried using a separate 300W PS to power a single drive with a long spin up
> time. It made no difference to the SMART log so in the end I don't think
> power is the issue. If I move the drive to a different box the "Average"
> integer gets reset to zero and I get a normal spin up. I put one such drive
> back on the RAID array with the normal PS. It now seems happy, reporting a
> normal spinup time with the 2nd integer still on 0. It reports a failed spin
> up in the past but is OK (<4 s) now. I am going to try the same process with
> the other 2 "slow" drives.
>
> My best guess is that we had a couple of bad power connectors which caused
> some failed or near failed spin ups. For some reason the SMART log is not
> forgetting about this -- I suspect this is signified by the second integer.
> When you change hardware it does reset the spinup time so then it seems OK.
>
> Not sure if this makes any sense -- but I have learnt something about
> smartmontools.
>
> Tony
>
> -----Original Message-----
> From: Bruce Allen [mailto:ba...@gr...]
> Sent: Friday, September 15, 2006 1:09 PM
> To: Tony Ladd
> Cc: Smartmontools Mailing List
> Subject: RE: [smartmontools-support] Reading the logs
>
>
> 3ware has reasonably good technical support. I suggest that you contact
> them about this. Ask if your drive model is on their 'certified' list for
> the controller, and if not, ask if they could get a disk and run it
> through their certification problem. Tell them the details of what you
> are observing.
>
> Cheers,
> Bruce
>
>
> On Wed, 13 Sep 2006, Tony Ladd wrote:
>
>> Bruce
>>
>> I have the controller configured as you suggested -- staggered spin up
>> and forced reallocation when verify finds an error. I tried the failed
>> drive on a plain SATA controller and it came up in about 4s which
>> seems normal -- as opposed to 54 secs on the RAID array. It also passed
>> diagnostics. I then tried 4 drives on the RAID array to reduce the
>> power load -- but it did not help. So I will try the oscilloscope next.
>>
>> Tony
>>
>> -----Original Message-----
>> From: Bruce Allen [mailto:ba...@gr...]
>> Sent: Sunday, September 10, 2006 2:47 AM
>> To: Tony Ladd
>> Subject: RE: [smartmontools-support] Reading the logs
>>
>>
>> One more note -- be sure to set up the 3ware controller so that it
>> staggers the drive spin ups. This will reduce the power supply
>> loading.
>>
>> On Sat, 9 Sep 2006, Tony Ladd wrote:
>>
>>> Bruce
>>>
>>> Thanks for the quick and detailed reply. It's very helpful. My first
>>> thing will be to spin the disks up on a desktop box and see if the
>>> spin up times change. If so it may be power, as you suggest -- then I
>>> have to debug that. I did measure the voltage across a spare power
>>> connector during spin up. It dropped from about 12.15 to 11.95 V (or
>>> something close to that). That did not seem too bad to me. I can also
>>> pull the power from 1/2 the drives and see what that does. We don't
>>> have any data on these drives right now so that makes us more
>>> flexible.
>>>
>>> Tony
>>>
>>> -----Original Message-----
>>> From: Bruce Allen [mailto:ba...@gr...]
>>> Sent: Saturday, September 09, 2006 6:07 PM
>>> To: Tony Ladd
>>> Cc: sma...@li...
>>> Subject: Re: [smartmontools-support] Reading the logs
>>>
>>>
>>> On Sat, 9 Sep 2006, Tony Ladd wrote:
>>>
>>>> Sorry to ask what are probably elementary questions. I have a RAID
>>>> array with a 3ware 9550-SX controller and 8 brand new WD2500YS
>>>> drives. I have since replaced the drive marked FAILING_NOW but I am
>>>> concerned about the long spin up time on several of the other
>>>> drives.
>>>
>>> As long as the normalized value of the attribute is greater than the
>>> threshold, you should not be (too) concerned. Of course if the
>>> normalized value is decreasing with time and approaching the
>>> threshold, you should be sure to have a spare drive on hand!
>>>
>>>> 1) How do the RAW values convert to normalized values? I realize it
>>>> counts backwards but the scale makes no sense to me. For example,
>>>> why does 34091 convert to 131 and 47798 convert to 153? Also the
>>>> scale seems very nonlinear. And I don't believe the drive was taking
>>>> 54 secs to spin up.
>>>
>>> The only people who know this are the ones at Western Digital. You
>>> can try asking them. Try to get a 'Field Application Engineer' to
>>> talk with you.
>>>
>>>> 2) What does (Average 7) etc. signify?
>>>
>>> I think you can probably ignore this. Some IBM/Hitachi disks were
>>> using the high-order bits of the raw value to store the average
>>> spin-up time. I wasn't aware of other vendors using the high bits.
>>> Apparently now WD is using them, but obviously not to store the
>>> average spin-up time.
>>>
>>>> 3) Are the spinup times recorded every time the system reboots?
>>>
>>> I expect they are recorded every time that the disk spins up. I am
>>> not sure if a warm reboot will power cycle the disk or cause it to
>>> spin up. Looking at the other SMART attribute values will probably
>>> show that there is a counter you can use to track the number of
>>> spin-ups.
>>>
>>>> If I got a faster spin up on another box would it overwrite the
>>>> SMART log value?
>>>
>>> Yes
>>>
>>>> 4) Can all these drives be crap? (OK OK I know what the answer is).
>>>> But I wondered if the 3ware card could have something to do with
>>>> this. It is supposed to use a staggered spin up -- can that affect
>>>> the timings in smartmontools?
>>>
>>> PROVIDED that the drives are getting enough raw power, this can't be
>>> the fault of the controller.
>>>
>>>> 5) I also wonder about the power supply. It's an old S'micro chassis
>>>> with redundant 2 X 300W power supplies. I have 10 drives (+CD &
>>>> floppy) but a dual P3 m'board. I have measured the power consumption
>>>> at the outlet during start up and maximal running (cpus and disks
>>>> all working) and it's about 260W with 1 power supply (a bit marginal
>>>> I know) but about 185W in redundant mode. It's not a cheap upgrade to
>>>> 400W redundant power so I would like to resolve this rather than just
>>>> guess at the problem.
>>>
>>> Hmmm, I think that this could be responsible. Normally disks draw
>>> about twice their nominal operating current while spinning up. You
>>> could measure the 5v and 12v current waveforms while the drives spin
>>> up (use a chart recorder or storage oscilloscope or A/D converter +
>>> LabVIEW) and see if the drive is getting as much current as the specs
>>> call for.
>>>
>>> Good luck!
>>>
>>> Cheers,
>>> Bruce
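
For readers following the thread, the layout Tony describes in his latest message (a 48-bit raw value read as three 16-bit words, with the lowest word holding the spin-up figure and the next word being what smartmontools prints as "Average") can be sketched roughly as below. This is only an illustration under assumptions: the function name `split_raw_value` is made up for this example, the word order is Tony's guess and has not been confirmed by WD, the units of the low word are unknown, and the sample raw value simply combines two numbers mentioned in the thread (34091 and 7).

```python
from typing import Tuple


def split_raw_value(raw: int) -> Tuple[int, int, int]:
    """Split a 48-bit SMART raw value into three 16-bit words, low word first.

    Hypothetical helper for illustration; the meaning of each word on
    WD drives is a guess taken from the thread, not a documented format.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("expected a 48-bit unsigned raw value")
    low = raw & 0xFFFF            # guessed: most recent spin-up figure
    mid = (raw >> 16) & 0xFFFF    # guessed: what smartctl labels "Average"
    high = (raw >> 32) & 0xFFFF   # reported in the thread as always zero
    return low, mid, high


# Illustrative value only: low word 34091, middle word 7, high word 0.
raw = (7 << 16) | 34091
print(split_raw_value(raw))
```

Splitting the value this way makes it easy to watch the middle word separately from the low word across power cycles, which is the comparison Tony was making by hand when moving drives between boxes.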