Ticket #82 (closed enhancement: wontfix)
danger of high Load_Cycle_Count and WD 'Intelli-park' self-destruction "feature"
Reported by: virtuousfox | Owned by: somebody
recently i was quite unpleasantly introduced to the issue of drives destroying themselves through endless, damaging attempts to "save some energy", best described in these links:
long story... longer
i'm a "proud" owner of 5 WD drives, 3 of which had this "feature" enabled from the factory, and those happen to be the most recently acquired ones:
1) WDC WD15EARS-00Z5B1 [80.00A80] acquired approximately in
winter 2010 with currently
and zero bad sectors
2) WDC WD10EADS-00M2B0 [01.00A01] acquired
~autumn 2009 with now
and 1 uncorrectable, 1 pending sectors
3) WDC WD10EADS-65L5B1 [01.01A01] acquired
~spring 2009 with now
and 1 uncorrectable, 1 pending sectors
4) WDC WD3000JS-00PDB0 [21.00M21] acquired
sometime 2006-2007 with now
688 Power_Cycle_Count (no LCC counter)
and zero bad sectors but 20 reallocations
5) WDC WD2500AAJS-00VTA0 [01.01B01] acquired
sometime ~2008 with now
350 Load_Cycle_Count (same as Power_Cycle_Count)
and 6 uncorrectable, 6 pending sectors.
as you can see, the newest drives have a ridiculous number of LCCs, but i wasn't paying any attention to them until about 1-2 months ago, when drives 2 and 3 (same model) began to stop answering the kernel and it started resetting them very often (always at times of low but non-zero r/w activity, like using torrents, watching low-bitrate videos or answering hddtemp/smartd queries) [ http://pastebin.ca/1873324 ].
yesterday it escalated into bad sectors on both of them, so i started digging and dug up the links at the top of the ticket.
the program from link №4-2 (version 1.00) showed that the "Intelli-Park feature" was:
enabled on drives 1,2,3 and set to the default of 8 seconds
but disabled on drive 4
and didn't exist on drive 5.
instead of letting me disable it, version 1.00 of the utility set the 'idle timer' to its minimum of 6 seconds on all 4 drives (there is no way to select one drive at a time), so on the second try i had to set all four to the maximum of 25.5 seconds.
then i used version 1.05 from link №5, and it said that drives 1,2,3 are "newest drives" whose timer can be set from 30 to 300 seconds or properly disabled, but it gave 'busy' errors on drives 4,5.
so i issued the 'disable' command; it reported that the 'idle timer' was disabled on drives 1,2,3 but hung completely on drive 4, and i had to hard-reset DOS along with it.
before i tried any of the programs i watched LCC via smartctl for a while, and it was growing by roughly 1 every 10-60 seconds, which is not good at all.
after using the programs it increases only once per complete shutdown/startup (same as Power_Cycle_Count).
i haven't had any resetting issues since then either (but that was just yesterday, so we'll see).
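a rough sketch of how the LCC growth rate described above could be measured instead of eyeballed. it assumes the usual column layout printed by `smartctl -A`; the sample text and both function names (`raw_value`, `cycles_per_hour`) are made up for illustration, and the numbers in the sample are not readings from the drives listed above.

```python
# Sample text in the column layout printed by `smartctl -A`.
# The values are invented for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1200
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       350
193 Load_Cycle_Count        0x0032   180   180   000    Old_age   Always       -       60000
"""


def raw_value(smart_text, name):
    """Return the RAW_VALUE of the named SMART attribute, or None if absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID followed by the attribute name;
        # the raw value is the tenth whitespace-separated column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == name:
            return int(fields[9])
    return None


def cycles_per_hour(count_before, count_after, seconds_between):
    """Convert two counter readings taken seconds_between apart into a rate."""
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    return (count_after - count_before) * 3600.0 / seconds_between


if __name__ == "__main__":
    print(raw_value(SAMPLE, "Load_Cycle_Count"))
    # One increment every 10 s and every 60 s, the range observed above.
    print(cycles_per_hour(0, 1, 10), cycles_per_hour(0, 1, 60))
```

by this arithmetic, one increment every 10-60 seconds works out to somewhere between 60 and 360 load cycles per hour.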
strange thing: drives 2 and 3 failed identically and most of the time they were reset by the kernel simultaneously, but drive 3 has newer firmware and a normal number of LCCs.
my guess is that WD did the same thing with it as some people think they did earlier with the EACS drive series, for which they acknowledged the issue at first and allegedly "fixed" it with newer revisions/firmware.
after all - we know how they "fixed" the issue of unaligned partitions (and you already know what "nice" idea caused that) on the EARS series 4K-block drives.
all in all
i'm writing all this here because this issue can't be ignored by people whose drives aren't badly damaged yet - they need to know about it and prevent it.
after reading the links at the top i do not think that WD is going to notify anyone:
in link №2 they blame "Linux":
"Some utilities, operating systems, and applications, such as some implementations of Linux, for example, are not optimized for low power storage devices and can cause our drives to wake up at a higher rate than normal."
but not only do Windows(tm) users suffer from it too (and their kernel is not capable of resetting the drive without dying), but 2 of WD's 3 suggested fixes are useless (even without logging there is no way a system can go without any r/w activity for more than a minute, and most of those drives are not capable of APM to begin with).
the only effective way of preventing damage that i see is to alert the user at once about a high LCC increase over some short interval via SMART monitoring software such as smartmontools/smartd, and hope that they are able to get hold of the 'wdidle3' program or at least tune the kernel (dirty_writeback_centisecs, dirty_expire_centisecs, etc.) so that it issues writes at least once every 7 seconds or so.
being able to tune the timer settings via something other than an obscure and glitchy DOS program would be nice too, but help from WD on that can't be expected and reverse-engineering is unlikely.
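to make the two ideas above concrete (alerting on a fast LCC increase, and tuning kernel writeback to stay under the idle timer), here is a minimal sketch. the function names, the `max_per_hour` threshold of 30 and the 1-second margin are all arbitrary assumptions, not anything smartd implements; only the two sysctl names are real kernel settings. whether periodic flushes actually keep the heads loaded depends on there being dirty data to write, so treat the second function as the proposed workaround in numbers, not a verified fix.

```python
def lcc_alert(prev_count, curr_count, interval_s, max_per_hour=30.0):
    """Return a warning string if Load_Cycle_Count rises too fast, else None.

    max_per_hour is a hypothetical threshold chosen for illustration.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rate = (curr_count - prev_count) * 3600.0 / interval_s
    if rate > max_per_hour:
        return "Load_Cycle_Count rising at %.0f/hour (limit %.0f)" % (
            rate, max_per_hour)
    return None


def writeback_settings(idle_timer_s, margin_s=1.0):
    """Suggest vm.dirty_*_centisecs values that flush before the idle timer.

    The kernel settings are expressed in hundredths of a second.
    """
    interval_cs = int((idle_timer_s - margin_s) * 100)
    if interval_cs <= 0:
        raise ValueError("idle timer too short for the chosen margin")
    return {
        "vm.dirty_writeback_centisecs": interval_cs,
        "vm.dirty_expire_centisecs": interval_cs,
    }


if __name__ == "__main__":
    # 6 load cycles in one minute: well above the illustrative limit.
    print(lcc_alert(1000, 1006, 60))
    # Default 8-second idle timer, flush roughly every 7 seconds.
    for key, value in writeback_settings(8).items():
        print("%s = %d" % (key, value))
```

with the default 8-second idle timer and a 1-second margin this gives 700 centiseconds, i.e. the "once per 7 seconds" figure mentioned above.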