Hi Dewi,
> I've looked into this some more, with many thanks to Bruce Allen for
> helping me with the samsung error. (yeah, I know, I should have read
> the man pages, it says it very clearly in there!)
Glad you noticed!
> Seems the 'smartcheck' email I was getting came from a program called
> "cpanel" installed on my shiny new server (available from
> http://cpanel.net and widely used on many servers). It is using the
> 5-year-old 2.1 smartsuite package.
>
> I'm suggesting to them that they upgrade their software to your latest
> version, so you don't get calls like this again :)
That would be nice!
> Whether they do or not, hopefully now you'll at least know what you're
> dealing with if you get any future calls about "smartcheck".
Yup -- it's smartsuite.
> The suggestion/solution that I have given my sysadmins to resolve my
> problem is:
>
> 1) touch the file /var/cpanel/disablesmartcheck - this will
> permanently disable "smartcheck" (deleting the script will merely stop
> it until the next upgrade of the software is installed).
>
> 2) ensure that an at-least-vaguely up to date version of smartmontools
> is installed.
>
> 3) modify /etc/smartd.conf so that all -m arguments have valid
> addresses.
>
> 4) add "-F samsung2" to the lines for any drives that need it.
>
> On a semi-unrelated note, could smartmontools be made to autodetect
> bad-firmware drives by default?
It already does. However, you were running an out-of-date version of
smartmontools (5.21, as I recall, probably from Fedora?), so it didn't
have your drive in its database. More recent versions do.
> Otherwise any script that calls smartctl needs to check the type
> itself, and will need to be modified every time a new bad-firmware
> drive comes out. This isn't terribly friendly to scripters.
That's why we have a database. Unfortunately, when a new drive comes
out, we need to update it. And users need to update THEIR versions as
well.
> Also, if it autodetects the type, it could also notice when endian changes
> would prevent warnings, and report them as a possible firmware problem.
> Rather than the somewhat esoteric "Warning: ATA error count 256 inconsistent
> with error log pointer 5", you might instead have:
> "Warning: ATA error count '256' appears to be a byte-swapped '1' - this
> drive type is not known to have this firmware bug. Use '-F samsung2' to work
> around it, and email smtools@... with the output of 'smartctl -blah'
> so that it can be autodetected in the next version."
Sadly, this doesn't work. The inconsistency between the error count and
the error log pointer occurs for reasons other than endian swaps. It can
also occur when a vendor doesn't obey the ATA specs for circular buffer
offsets. So the warning you propose would also appear for drives where
the problem can't and shouldn't be corrected with -F samsung or
-F samsung2.
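To make the ambiguity concrete, here is a minimal sketch of the kind of
byte-swap check you describe. It is not smartmontools code: the function
names and the 255 threshold are made up for illustration, and it assumes
the error count is a 16-bit value.

```python
def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit unsigned value."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("expected a 16-bit unsigned value")
    return ((value & 0xFF) << 8) | (value >> 8)


def looks_byte_swapped(error_count: int, plausible_max: int = 255) -> bool:
    """Hypothetical heuristic: the raw count is large, but its
    byte-swapped form is small. The threshold is an assumption."""
    return error_count > plausible_max and swap16(error_count) <= plausible_max


if __name__ == "__main__":
    print(swap16(256))              # 256 is 0x0100, which swaps to 0x0001
    print(looks_byte_swapped(256))  # the heuristic fires here
    print(looks_byte_swapped(5))    # a small count does not trigger it
```

The check fires for any count whose low byte is zero (256, 512, ...), so
a drive that really had logged 256 errors would trigger it too. The
arithmetic alone can't tell a firmware byte-order bug from any other
cause of the inconsistency.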
Cheers,
Bruce