From: Bruce A. <ba...@gr...> - 2003-10-04 12:39:29
Hi Jeffrey,

> I've been looking at using smartmontools on a few clusters
> especially since I read Bruce's comments that he uses it in his
> clusters. I've been trying smartmontools on a few different hard
> drives (some old, some new) to get a feel for what is being reported.
> Now, I'm ready to start diving into parsing smartmontools output for
> useful information to watch for drive problems and drive failures.

What we do on our cluster is:

(1) Run smartd, with this config file:

# First and second ATA/IDE hard disk. Monitor all attributes
/dev/hda -S on -o on -a -I 194 -m me...@my...dress
/dev/hdc -S on -o on -a -I 194 -m me...@my...dress

(2) Run self-tests once per week from a cron script:

#! /bin/bash
# Once per week, run extended self-tests on the disks; see man smartctl
# for further details of how this works.

if [ -e /proc/ide/hda ] ; then
  /usr/sbin/smartctl -t long /dev/hda > /dev/null 2> /dev/null && \
  /usr/bin/logger -t rundiskselftests "Starting long self-test on /dev/hda" || \
  /usr/bin/logger -t rundiskselftests "FAILED starting long self-test on /dev/hda"
fi

if [ -e /proc/ide/hdc ] ; then
  /usr/sbin/smartctl -t long /dev/hdc > /dev/null 2> /dev/null && \
  /usr/bin/logger -t rundiskselftests "Starting long self-test on /dev/hdc" || \
  /usr/bin/logger -t rundiskselftests "FAILED starting long self-test on /dev/hdc"
fi

(3) We have a separate script running that periodically uses
smartctl -A and grep/awk to monitor the temperatures of the disks,
and sends an email if they are too high.

> However, let me ask, does anyone have any scripts for parsing the
> output of smartmontools for a cluster setting?

I'd suggest using smartd for this.

On another cluster that I've heard about, they run smartd, and then
run a SHORT self-test about once every hour or so.

Cheers,
Bruce
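
For illustration, a temperature check along the lines of step (3) above might look like the sketch below. This is not the script actually run on the cluster described in this message. The 40 C limit, the "root" recipient, and the helper names extract_temp and too_hot are all assumptions made for the example. It also assumes that attribute 194 reports the temperature and that its raw value is the 10th column of "smartctl -A" output, which can differ between drives and smartmontools versions.

```shell
#!/bin/bash
# Sketch: warn by email when a disk's SMART temperature is too high.
# Assumptions (not from the original post): the 40 C limit, the
# recipient "root", and that attribute 194 carries the temperature in
# the RAW_VALUE column (10th field) of "smartctl -A" output.

LIMIT=${LIMIT:-40}        # hypothetical threshold, degrees Celsius
MAILTO=${MAILTO:-root}    # hypothetical recipient

# Read "smartctl -A" output on stdin and print the raw value of
# attribute 194; prints nothing if the attribute is absent.
extract_temp() {
    awk '$1 == 194 { print $10; exit }'
}

# Succeed if temperature $1 is strictly above limit $2.
# An empty or non-numeric reading counts as "not too hot".
too_hot() {
    [ -n "$1" ] && [ "$1" -gt "$2" ] 2> /dev/null
}

# Check each device named on the command line, e.g. /dev/hda /dev/hdc.
for dev in "$@" ; do
    temp=$(/usr/sbin/smartctl -A "$dev" 2> /dev/null | extract_temp)
    if too_hot "$temp" "$LIMIT" ; then
        echo "$dev is at $temp C (limit $LIMIT C)" | \
            mail -s "Disk temperature warning on $(hostname)" "$MAILTO"
    fi
done
```

Run from cron with the devices as arguments (for example /dev/hda /dev/hdc). A drive that does not report attribute 194 is silently skipped, so it is worth checking the "smartctl -A" output by hand on each drive model first.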