Thank you for your answers! I'm new to this area and have no experience.
However, I tested the config file options and command line arguments of
smartd and smartctl before I installed them on these computers. In the last
two weeks I didn't find any negative or unexpected behavior. But I
appreciate any experienced user's advice. So:
If I understand you correctly: you agree with me that the SMART
logging and the operation of smartd don't cause any harm to the disks,
don't you?
Could you clarify why it is a very bad idea to run self tests periodically? It
seems to me a good method to discover errors, and its use is included in the
example config files also.
I use my own script with the -M directive. It sends me an e-mail and sends a
passive check result to Nagios. Does this solution have any disadvantage
compared to your syslog-based method?
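For readers unfamiliar with this kind of setup, here is a minimal Python sketch of what a script run via -M exec might do. It is not the poster's actual script. It assumes smartd passes SMARTD_DEVICE, SMARTD_FAILTYPE and SMARTD_MESSAGE in the script's environment, and that Nagios accepts passive results through its external command file in the PROCESS_SERVICE_CHECK_RESULT format. The host name, service name and command file path are hypothetical.

```python
import time

# Hypothetical values; adjust to the local Nagios setup.
NAGIOS_HOST = "fileserver1"
NAGIOS_SERVICE = "SMART"
NAGIOS_CMD_FILE = "/var/spool/nagios/cmd/nagios.cmd"  # assumed path

CRITICAL = 2  # Nagios return code for CRITICAL


def build_passive_result(env, now=None):
    """Format one Nagios external command line from smartd's environment."""
    timestamp = int(time.time() if now is None else now)
    device = env.get("SMARTD_DEVICE", "unknown device")
    failtype = env.get("SMARTD_FAILTYPE", "unknown")
    message = env.get("SMARTD_MESSAGE", "no message")
    # Semicolons and newlines would break the external command format.
    output = f"{device}: {failtype}: {message}"
    output = output.replace(";", ",").replace("\n", " ")
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{NAGIOS_HOST};{NAGIOS_SERVICE};{CRITICAL};{output}")


def submit(line, cmd_file=NAGIOS_CMD_FILE):
    """Write the command to the Nagios command file (normally a named pipe)."""
    with open(cmd_file, "w") as pipe:
        pipe.write(line + "\n")
```

The script's entry point would then call submit(build_passive_result(os.environ)), after importing os. Sending the e-mail is left out to keep the sketch short.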
Thanks in advance!
From: Eric Praetzel [mailto:praetzel@...]
Sent: Thursday, March 31, 2005 3:36 PM
To: Horvath Tamas
Subject: Re: [smartmontools-support]Question about log wrapping
Yes, the logs wrap - it's built into the hard drive.
0) Using the self test feature (short or long) is a very very bad way to
test for drive failure. I use the option to report all changes. If I
see any error or "CRIT" message then the drive is 95% on its way to
failure - but likely days or weeks from data corruption. Typically the
drives are still under warranty and we can replace the drive or rebuild
the RAID array well before there is any sort of failure. Given that it
takes up to 2 days to rebuild our RAID array - knowing about the failures
in advance of data corruption is important. I.e. you can set up a "hot
spare" on the RAID array - and just set it as a replacement for the failing
drive so that the rebuilding happens in the background with all drives working.
1) You should syslog all servers to a central machine. Then I use simple
scripts to rip through the logs and search for particular messages (errors,
signs of attacks, fan and CPU temperature monitoring, etc.). That
script sends out error messages if thresholds are reached...
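As an illustration of the kind of script described above, here is a minimal Python sketch. It is not one of the actual scripts in use. The patterns, the threshold, and the assumption that the host name is the fourth whitespace-separated field (the traditional "Mon DD HH:MM:SS host ..." syslog layout) are all illustrative.

```python
import re
from collections import Counter

# Hypothetical patterns; tune them to the messages your systems emit.
PATTERNS = {
    "smart": re.compile(r"smartd.*(CRIT|error|FAILED)", re.IGNORECASE),
    "auth": re.compile(r"authentication failure", re.IGNORECASE),
}
THRESHOLD = 2  # assumed: alert once a host logs this many matches


def scan(lines):
    """Count matching lines per (host, category)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # not a traditional syslog line
        host = fields[3]
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(host, category)] += 1
    return counts


def alerts(counts, threshold=THRESHOLD):
    """Return sorted alert strings for counts at or above the threshold."""
    return sorted(
        f"{host}: {n} {category} message(s)"
        for (host, category), n in counts.items()
        if n >= threshold
    )
```

Feeding it the central log file, e.g. alerts(scan(open(path))), gives a list of strings that can be mailed out or passed to whatever notification mechanism is already in place.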
I maintain about 15 servers and 250 PCs - the only way to manage things
is to syslog everything to a central server for easy data collection and
analysis. This is trivial on Linux, painful on Solaris and fairly easy
From: Sebastian Vuorinen <sebastian.vuorinen@he...> - 2005-04-01 10:48:19
On Thu, 2005-03-31 at 18:40 +0200, Horvath Tamas wrote:
> If I understand you correctly: you agree with me that the SMART
> logging and the operation of smartd don't cause any harm to the disks,
> don't you?
It should not, unless there is a bug in the drive firmware.
Smartmontools just calls features of the drive firmware and assumes them
to be safe. In the end it is the drive manufacturer's responsibility.
Personally I haven't lost a drive in a way that I would attribute to
SMART. The drives that I did lose were already showing signs of trouble
> Could you clarify why it is a very bad idea to run self tests periodically? It
> seems to me a good method to discover errors, and its use is included in the
> example config files also.
I think you misunderstood. It's a good idea to run selftests, but it's
also a good idea to track changes in the SMART-attributes. They can
sometimes tell you a drive is having trouble way before it would show up
in a selftest.
A drive that starts to accumulate bad sectors at a rate of 4 a day is
probably going to fail. However, a drive usually has around 200 spare sectors.
This problem will only show up in the selftest after the drive runs out
of spare sectors. In this case selftest would register a problem in 50
days or so assuming the drive didn't have any bad sectors before.
Wouldn't you want to use those 50 days of advance warning?
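The arithmetic behind that estimate can be written out as a small Python sketch. The figures of 200 spare sectors and 4 new bad sectors a day are the rough numbers from the example above, not properties of any particular drive, and the function name is made up for illustration.

```python
import math


def days_until_spares_exhausted(spare_sectors, reallocated_so_far, new_bad_per_day):
    """Estimate days until the spare pool runs out at a constant failure rate."""
    if new_bad_per_day <= 0:
        return math.inf  # no growth, so the pool is never exhausted
    remaining = max(spare_sectors - reallocated_so_far, 0)
    return remaining / new_bad_per_day


print(days_until_spares_exhausted(200, 0, 4))  # 50.0
```

A drive that has already used part of its spare pool gives correspondingly less warning, which is one more reason to watch the attribute values rather than wait for a selftest to fail.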