On Tue, 17 Jun 2003, Christopher Wolf wrote:
> Thanks for responding. I still don't understand the overall
> "picture". I've embedded some questions/comments below.
> At 05:18 AM 6/17/2003 -0500, Bruce Allen wrote:
> >Hi Chris,
> > > Can anyone explain these results?
> >Short version: your drive is going bad. Back up your data and get the
> >drive replaced.
> I'm confused as to why the internal diagnostics don't think so.
The internal diagnostics DO think so. However, the drive does not yet have
a failing SMART status, which would indicate a predicted lifetime of < 24 hours.
> > > Off-line data collection status: (0x02) Offline data collection activity
> > > completed without error.
The key word here is 'offline'. The drive's offline tests are still
running without error. But the self-tests are failing. Please read the
smartctl man page for an explanation of the difference.
> >use -o on to enable automatic offline data collection (be sure to use a
> >recent version, there was a bug in the code for older versions).
> is the 5.1-4 version I'm using OK?
Note -- use 5.1-11 or better yet 5.1-14.
> > > 5 Reallocated_Sector_Ct 0x0033 100 100 020 Pre-fail - 2
> >Note the two reallocated sectors. Typically these maxtor drives can
> >reallocate up to a few hundred sectors. But at least two sectors on the
> >disk are bad and were eliminated from the "usable sectors" list.
> Why does it keep giving a read error on sector 0x0007d6e6? Why doesn't it
> relocate it if it has free sectors to do so?
I don't know -- I can only conjecture that the read error may not be due
to a bad sector.
> > > 194 Temperature_Celsius 0x0022 082 079 042 Old_age - 47
> >Wow -- this is VERY high. I suggest you try and get some fan to blow air
> >on this disk. It's awfully hot...
> Its "running idle" temp is around 42 degrees C, so this is a minor
> increase while it's doing something. Even with the case off and a fan
> blowing directly on it, 42 is about as low as I can go. I have a physical
> temp monitor that verifies this measurement as real.
Hmm, what's the ambient room temperature? I have a bunch (well,
hundreds) of maxtor drives in a 21 C room and they all run at 23-27 C.
> But the drive can see this from its sensor and I'd assume if it was a
> problem it'd be reflected in the Temp value/worse/threshold series, no?
Sure -- the drive is not failing. Just remember that if it does get down
to threshold value or below, that means predicted failure in < 24 hours.
Speaking from extensive experience and industry studies, each 5 C temp
increase doubles the failure rate.
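
To make the threshold rule concrete, here is a minimal Python sketch that
applies it to the two Attribute lines quoted in this thread. It assumes the
columns are ID, name, flag, value, worst, threshold, type, when-failed and
raw, in that order; the names SmartAttribute, parse_attribute, is_failing
and margin are hypothetical helpers for illustration, not part of smartctl.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    """One row of the Attribute table (hypothetical container)."""
    attr_id: int
    name: str
    flag: str
    value: int       # normalized value
    worst: int       # lowest normalized value seen so far
    threshold: int   # failure threshold set by the manufacturer
    attr_type: str
    when_failed: str
    raw: str


def parse_attribute(line: str) -> SmartAttribute:
    # Assumed column order:
    # ID NAME FLAG VALUE WORST THRESH TYPE WHEN_FAILED RAW
    fields = line.split(None, 8)
    if len(fields) != 9:
        raise ValueError(f"unexpected attribute line: {line!r}")
    return SmartAttribute(
        int(fields[0]), fields[1], fields[2],
        int(fields[3]), int(fields[4]), int(fields[5]),
        fields[6], fields[7], fields[8],
    )


def is_failing(attr: SmartAttribute) -> bool:
    # The rule stated above: value at or below threshold means failing.
    return attr.value <= attr.threshold


def margin(attr: SmartAttribute) -> int:
    # How far the normalized value still is from the threshold.
    return attr.value - attr.threshold


temp = parse_attribute("194 Temperature_Celsius 0x0022 082 079 042 Old_age - 47")
realloc = parse_attribute("5 Reallocated_Sector_Ct 0x0033 100 100 020 Pre-fail - 2")

for attr in (temp, realloc):
    state = "FAILING" if is_failing(attr) else "ok"
    print(f"{attr.name}: value={attr.value} thresh={attr.threshold} "
          f"margin={margin(attr)} raw={attr.raw} -> {state}")
```

Note that the comparison uses the normalized value column, not the raw
value: a raw temperature of 47 and a raw count of 2 reallocated sectors do
not by themselves trip the threshold.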
> > > # 1 Extended off-line Completed: read failure 90% 3068 0x0007d6e6
> > > # 2 Short off-line Completed: read failure 50% 2901 0x0007d6e6
> > > # 3 Short off-line Completed: read failure 50% 2735 0x0007d6e6
> > > # 4 Extended off-line Completed: read failure 90% 2568 0x0007d6e6
> >The disk is having real problems. If it were mine, I would replace it
> >without much delay. The fact that it is failing the self-tests in the
> >same place, and has been failing them for a couple of months, should not
> >give you a false sense of reassurance.
> So why isn't the drive reallocating these bad sectors? Why is it always at
> the same address? I assume it's not reallocating them because there are 13
> failures but the reallocation attribute only shows 2 reallocations.
I don't know. Apparently the read problem can not be solved by
> Why do all the read attributes show raw values of zero even though the
> tests keep failing on read errors?
I don't know. It may be that the read Attributes show error RATES rather
than total error numbers. So unless you are trying to read data from the
bad LBA, the error rate remains zero.
> >Bottom line: you need a new disk. It looks like it's less than a year old
> >(much less than 8000 hours usage) so Maxtor's warranty should cover it.
> I'll probably need to provide them some sort of proof, no? Guess I'll need
> to find and try their diagnostics and see what those say.
If you just tell them that the drive is failing its SMART short &
extended self-test at the same address each time, that should be enough
for them to replace it. That, after all, is what the self-tests are for.
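
If it helps to summarize the evidence, the following is a minimal Python
sketch that checks whether every failed self-test stopped at the same
address. It uses only the four log entries quoted in this thread. The
regular expression assumes the layout shown in those lines (test number,
test type, status, percent remaining, power-on hours, failing LBA), and the
names SelfTest, parse_selftest_log and failing_lbas are hypothetical, not
part of smartctl.

```python
import re
from collections import Counter
from typing import List, NamedTuple


class SelfTest(NamedTuple):
    """One self-test log entry (hypothetical container)."""
    num: int
    kind: str
    status: str
    remaining_pct: int
    hours: int   # power-on hours when the test ran
    lba: int     # address of the first error


# Assumed layout, taken from the log lines quoted in this thread.
SELFTEST_RE = re.compile(
    r"#\s*(?P<num>\d+)\s+(?P<kind>Short|Extended) off-line\s+"
    r"(?P<status>.+?)\s+(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+(?P<lba>0x[0-9a-fA-F]+)"
)

LOG = """\
# 1 Extended off-line Completed: read failure 90% 3068 0x0007d6e6
# 2 Short off-line Completed: read failure 50% 2901 0x0007d6e6
# 3 Short off-line Completed: read failure 50% 2735 0x0007d6e6
# 4 Extended off-line Completed: read failure 90% 2568 0x0007d6e6
"""


def parse_selftest_log(text: str) -> List[SelfTest]:
    entries = []
    for line in text.splitlines():
        m = SELFTEST_RE.search(line)
        if m is None:
            continue  # skip anything that does not look like a log entry
        entries.append(SelfTest(
            int(m["num"]), m["kind"], m["status"],
            int(m["remaining"]), int(m["hours"]), int(m["lba"], 16),
        ))
    return entries


def failing_lbas(entries: List[SelfTest]) -> Counter:
    # Count how often each address appears among the failed tests.
    return Counter(e.lba for e in entries if "failure" in e.status)


entries = parse_selftest_log(LOG)
counts = failing_lbas(entries)
span_hours = entries[0].hours - entries[-1].hours

for lba, n in counts.items():
    print(f"LBA {lba:#010x}: {n} failed self-tests")
print(f"power-on hours between oldest and newest quoted entry: {span_hours}")
```

A single address accounting for all of the failures, across both short and
extended tests, is the kind of summary the warranty people are likely to
want to see.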