From: Kaden N. <ka...@em...> - 2004-04-14 23:21:33
Hi all,

Which test(s) would you recommend running on servers (Win32 and Linux), and how often? This is what I figure:

Daily
smartctl -t short /dev/hdX
then
smartctl -l selftest /dev/hdX
and email me if the output of the command is anything other than "Completed without error".

smartctl -a /dev/hdX
and email me the difference between yesterday's output and today's.

Monthly
smartctl -a /dev/hdX
and email me the whole output of the command.

Do I only need to run long self-tests when I suspect a drive is dying, or should I run them daily, weekly or monthly?

Does anyone already have a plan for this that they would like to share? Any help would be awesome.

Thanks
Kaden
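The daily part of this plan could be scripted roughly as follows. This is a minimal Python sketch, not part of smartmontools: the function names are hypothetical, the log layout is assumed from typical `smartctl -l selftest` output ("# 1" being the newest entry), and sending the email is left out. It also assumes the short test has already finished before the log is read; that usually takes a few minutes, and `smartctl -c` shows the drive's recommended polling time.

```python
import difflib


def latest_selftest_ok(selftest_output: str) -> bool:
    """Return True if the newest self-test log entry says "Completed without error".

    Assumes entries are numbered "# 1", "# 2", ... with "# 1" the most recent,
    as in typical `smartctl -l selftest` output.
    """
    for line in selftest_output.splitlines():
        if line.startswith("# 1 "):
            return "Completed without error" in line
    return False  # no entries found: treat as something worth an email


def daily_diff(yesterday: str, today: str) -> str:
    """Unified diff between two saved `smartctl -a` outputs ("" if identical)."""
    return "".join(
        difflib.unified_diff(
            yesterday.splitlines(keepends=True),
            today.splitlines(keepends=True),
            fromfile="yesterday",
            tofile="today",
        )
    )
```

Because `smartctl -a` also reports values that change routinely (power-on hours, temperature), the daily diff will rarely be empty; filtering such lines out would be a further refinement.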
From: Bruce A. <ba...@gr...> - 2004-04-15 01:05:44
Hi Kaden,

On "my" machines I do a short self-test daily and a long self-test weekly.

Please note that instead of writing a script to run smartctl as you are planning, you can run a daemon (smartd) that does these things for you.

A good reason to run long self-tests is to search for unreadable (uncorrectable = UNC) sectors. On important systems it might make sense to do such searches more often than weekly, though I also suspect that the stress of doing a full read scan (long self-test) of the disk TOO often (i.e., daily) might have a negative impact on the disk's ultimate lifetime and reliability.

As you can see, this is a compromise between choices that are hard to quantify. Other people may arrive at very different answers.

Cheers,
Bruce
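A short-daily, long-weekly schedule like this can be given to smartd through the -s directive in smartd.conf, which (as I understand the smartd.conf documentation) matches a regular expression against a string of the form T/MM/DD/d/HH: test type, month, day of month, day of week (1 = Monday, ..., 7 = Sunday) and hour. The Python sketch below only mimics that matching so a schedule can be checked offline; it is not smartd's code, and the device name, mail address, hours, and the full-string matching rule are assumptions.

```python
import re
from datetime import datetime

# Hypothetical schedule in the style of smartd.conf's -s directive, e.g. a line like
#   /dev/hda -a -s (S/../.././02|L/../../6/03) -m root
# Short test every day at 02:00, long test every Saturday (day 6) at 03:00.
SCHEDULE = r"(S/../.././02|L/../../6/03)"


def scheduled_at(test_type: str, when: datetime, schedule: str = SCHEDULE) -> bool:
    """Return True if `test_type` ("S" short, "L" long) is scheduled at `when`.

    Builds the string T/MM/DD/d/HH (d = 1 for Monday ... 7 for Sunday)
    and requires the whole string to match the schedule regex.
    """
    stamp = f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"
    return re.fullmatch(schedule, stamp) is not None
```

For example, Saturday 17 April 2004 at 03:00 gives the string "L/04/17/6/03", which matches the long-test alternative.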
From: Robert P. <b0...@gm...> - 2004-04-16 16:22:07
I'm also doing a short self-test daily and a long one weekly on my "home machines".

What I'm missing a bit: I'm running smartd on my laptop (which is not up 24/7), too. How could I tell it easily "if no short self-test was done today, do it now; if no long one was done in the last 7 weeks, do that now"?
From: Bruce A. <ba...@gr...> - 2004-04-16 20:06:55
Hi Robert,

> I'm also doing a short selftest daily and a long weekly on my "home
> machines".

OK.

> What I'm missing a bit: I'm running smartd on my laptop (which is not
> 24/7 up), too. How could I tell it easily "if no short-selftest was
> done today, do it now; if no long was done in the last 7 weeks, do
> that now"?

I actually have the same problem with my laptop. There's no way for smartd to do this. I thought about allowing such flexibility when I added the scheduled self-test features, but I eventually abandoned the idea. Part of the reason is that a large number of disks from several vendors have faulty firmware, so they don't keep accurate timestamps in their self-test logs, or don't have accurate timestamps in Attribute 9. This makes it hard to see when the last self-test was run.

It might be possible to modify smartd so that it checks between polling intervals to see if it 'missed some' polling because a laptop was asleep, and then checks whether any self-tests had been missed. But at the moment I don't have time to modify the code to do this. If anyone wants to volunteer, I'd be happy to suggest a way to do it. And it might not be 'the right thing', because for laptops that sleep a lot (like mine) it would imply frequent self-tests.

The bottom line is that you should set up the smartd self-test schedule on your laptop in a way that will ensure that (on average) you get a long self-test about once a week. But if you want to be certain that these are not missed, you need to use smartctl -t to run them by hand.

Cheers,
Bruce
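Outside smartd, the catch-up rule Robert asks for is easy to state if the timestamps come from the host's clock rather than from the drive, which sidesteps the firmware timestamp problem described above. A minimal Python sketch, assuming some hypothetical wrapper records when it last started each test type; the function and its parameters are illustrative, and the one-week default for the long test follows the "about once a week" target.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def overdue_tests(
    now: datetime,
    last_short: Optional[datetime],
    last_long: Optional[datetime],
    long_every: timedelta = timedelta(days=7),
) -> List[str]:
    """Return the self-test types ("short", "long") that should be run now.

    A short test is due if none was started today (by the host clock);
    a long test is due if none was started within `long_every`.
    The timestamps are recorded by the caller from the host clock, so they
    do not depend on the drive's self-test log timestamps or Attribute 9.
    """
    due: List[str] = []
    if last_short is None or last_short.date() < now.date():
        due.append("short")
    if last_long is None or now - last_long >= long_every:
        due.append("long")
    return due
```

The wrapper would then run `smartctl -t short` or `smartctl -t long` for each returned entry, one at a time and waiting for each to finish, and record the new timestamp.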
From: Robert P. <b0...@gm...> - 2004-04-17 10:14:44
Bruce Allen wrote:
> I actually have the same problem with my laptop. There's no way for
> smartd to do this.

Hi!

I looked around a little and found out that my distribution (gentoo-linux, but it seems they copied it from SuSE) uses a cron-based script for that. "/usr/sbin/run-crons" is called once an hour and looks for scripts in /etc/cron.[hourly|daily|weekly|monthly] to call. It only executes them if they haven't run within their period (e.g. the last 24 hours for the daily ones, and so on); this is tracked with a "lastrun" file. So I wrote two simple shell scripts in these directories that run long and short self-tests using smartctl.

The run-crons script is here:
http://gentoo.kems.net/gentoo-x86-portage/sys-apps/cronbase/files/run-crons

If one of these scheduled smartctl self-tests finds an error now, will smartd email me?

Thanks
Robert
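The "lastrun" mechanism described here boils down to comparing a stamp file's modification time with the job's period. Here is a minimal Python sketch of that idea; it is not the actual run-crons script (which is a shell script), and the function name, stamp path and command in the usage note are hypothetical.

```python
import subprocess
import time
from pathlib import Path
from typing import Sequence


def run_if_stale(stamp_path: str, period_s: float, cmd: Sequence[str]) -> bool:
    """Run `cmd` unless `stamp_path` was modified less than `period_s` seconds ago.

    The stamp file's mtime records the last successful run, like a "lastrun" file.
    Returns True if the command was run.
    """
    stamp = Path(stamp_path)
    if stamp.exists() and time.time() - stamp.stat().st_mtime < period_s:
        return False  # ran recently enough; nothing to do
    subprocess.run(list(cmd), check=True)  # raises CalledProcessError on failure
    stamp.touch()  # create the stamp, or refresh its mtime, only after success
    return True
```

An hourly cron job could then call, for example, `run_if_stale("/var/lib/smartcheck/short.stamp", 24 * 3600, ["smartctl", "-t", "short", "/dev/hda"])`. Note that `smartctl -t` returns as soon as the drive has accepted the test, and that smartctl's exit status is a bit mask, so `check=True` may need relaxing in practice.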
From: Bruce A. <ba...@gr...> - 2004-04-18 06:25:57
Hi Robert,

> If that scheduled selftest with smartctl finds some error now, will smartd
> email me?

Yes, smartd will email you any time the count of failed self-tests goes up, or the timestamp of the most recent failed self-test increases. So you can have self-tests run by an external cron script as you are doing.

Cheers,
Bruce
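The notification rule above can be written down as a comparison between two readings of the self-test log. This is a simplified Python sketch of that rule, not smartd's source: log entries are given as (status, lifetime-hours) pairs, newest first, and the test for what counts as a "failed" status is an assumption.

```python
from typing import List, Optional, Tuple

# One self-test log entry: (status text, lifetime hours when it ran); newest first.
Entry = Tuple[str, int]


def is_failure(status: str) -> bool:
    # Simplified assumption: statuses like "Completed: read failure" or
    # "Fatal or unknown error" mark a failed test; "Completed without error",
    # "Aborted by host" or "Interrupted (host reset)" do not.
    return status.startswith("Completed:") or status.startswith("Fatal")


def failure_summary(log: List[Entry]) -> Tuple[int, Optional[int]]:
    """Number of failed entries and the lifetime hours of the newest failure."""
    failures = [hours for status, hours in log if is_failure(status)]
    return len(failures), (failures[0] if failures else None)


def should_notify(old_log: List[Entry], new_log: List[Entry]) -> bool:
    """True if the failure count rose or the newest failure is more recent."""
    old_count, old_newest = failure_summary(old_log)
    new_count, new_newest = failure_summary(new_log)
    if new_count > old_count:
        return True
    return new_newest is not None and (old_newest is None or new_newest > old_newest)
```

Tracking the newest failure's timestamp as well as the count lets a new failure be noticed even when the count does not rise, for example if an older failure has dropped out of the drive's fixed-size log (my reading; the thread does not say why both are checked).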
From: Andreas P. <b0...@gm...> - 2004-04-22 11:55:49
I configured smartd on my server so that it runs a short self-test every day at 1am. When I look at the self-test log and subtract the lifetime values at which consecutive tests were executed, I see that they were done every 22 hours of disk lifetime (and because the server runs 24/7, that should be "real" time). It's a Samsung SV1604N drive; is this a bug in the firmware again?
From: Bruce A. <ba...@gr...> - 2004-04-23 03:11:45
> It's a samsung sv1604n drive, is this
> a bug in the firmware again?

Sort of. The quantity that I call 'half-minutes' is actually in units of 32 seconds, not 30 seconds. This is probably because the firmware finds it easier to divide by 32 than by 30. So when an entire day (24 hours) has gone by, the counter has incremented 86400/32 = 2700 times. Smartmontools interprets this the same way that the device's own error and self-test logs do: as 2700 x 30 (not 32) seconds. This is exactly 22.50 hours, as you are seeing.

I could fix smartmontools to interpret this correctly as units of 32 seconds, but then Attribute 9 would not agree with the self-test log.

Conclusion: firmware bug. Sigh.

Bruce
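The arithmetic above can be checked in a few lines. A minimal Python sketch, assuming (as described for this drive) a counter that really ticks every 32 seconds but is read as 30-second units; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def reported_hours(real_seconds: float, tick_s: float = 32.0, assumed_tick_s: float = 30.0) -> float:
    """Hours shown by a "half-minute" counter after `real_seconds` of power-on time,
    when it actually ticks every `tick_s` seconds but is read as `assumed_tick_s`-second units."""
    ticks = real_seconds // tick_s  # 86400 // 32 = 2700 ticks per day
    return ticks * assumed_tick_s / 3600  # 2700 * 30 s = 81000 s = 22.5 h


def corrected_hours(reported: float, tick_s: float = 32.0, assumed_tick_s: float = 30.0) -> float:
    """Undo the misreading by rescaling with the true tick length."""
    return reported * tick_s / assumed_tick_s
```

So `reported_hours(SECONDS_PER_DAY)` gives 22.5 and `corrected_hours(22.5)` gives 24.0; applying such a correction in smartmontools is exactly what would make Attribute 9 disagree with the drive's self-test log.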