From: Bruce A. <ba...@gr...> - 2007-08-24 11:49:38
Hi Michael,

This sounds to me like it might be a bug in the smartmontools CCISS Linux
interface or in smartd. To help me track this down, could you see if by
using smartctl (rather than smartd) you can run selftests on the other
disks? If you can, then the bug is probably in smartd. If you can't then
the bug is probably in the smartmontools CCISS Linux interface code.

Cheers,
Bruce

On Wed, 22 Aug 2007, Michael Mansour wrote:

> Hi,
>
> I'm using Scientific Linux 5.0 which comes pre-packaged with:
>
> # smartctl -V
> smartctl version 5.36 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> smartctl comes with ABSOLUTELY NO WARRANTY. This
> is free software, and you are welcome to redistribute it
> under the terms of the GNU General Public License Version 2.
> See http://www.gnu.org for further details.
>
> CVS version IDs of files used to build this code are:
> Module: atacmdnames.c revision: 1.13 date: 2006/04/12
>   uses: atacmdnames.h revision: 1.5 date: 2006/04/12
> Module: atacmds.c revision: 1.168 date: 2006/04/12
>   uses: atacmds.h revision: 1.81 date: 2006/04/12
>   uses: configure.in revision: 1.113 date: 2005/11/27
>   uses: extern.h revision: 1.41 date: 2006/04/12
>   uses: int64.h revision: 1.13 date: 2006/04/12
>   uses: utility.h revision: 1.43 date: 2006/04/12
> Module: ataprint.c revision: 1.164 date: 2006/04/12
>   uses: atacmdnames.h revision: 1.5 date: 2006/04/12
>   uses: atacmds.h revision: 1.81 date: 2006/04/12
>   uses: ataprint.h revision: 1.28 date: 2006/04/12
>   uses: configure.in revision: 1.113 date: 2005/11/27
>   uses: extern.h revision: 1.41 date: 2006/04/12
>   uses: int64.h revision: 1.13 date: 2006/04/12
>   uses: knowndrives.h revision: 1.16 date: 2006/04/05
>   uses: smartctl.h revision: 1.23 date: 2006/04/12
>   uses: utility.h revision: 1.43 date: 2006/04/12
> Module: knowndrives.c revision: 1.139 date: 2006/04/05
>   uses: atacmds.h revision: 1.81 date: 2006/04/12
>   uses: ataprint.h revision: 1.28 date: 2006/04/12
>   uses: configure.in revision: 1.113 date: 2005/11/27
>   uses: extern.h revision: 1.41 date: 2006/04/12
>   uses: int64.h revision: 1.13 date: 2006/04/12
>   uses: knowndrives.h revision: 1.16 date: 2006/04/05
>   uses: utility.h revision: 1.43 date: 2006/04/12
> Module: os_linux.c revision: 1.82 date: 2006/04/12
>   uses: atacmds.h revision: 1.81 date: 2006/04/12
>   uses: configure.in revision: 1.113 date: 2005/11/27
>   uses: int64.h revision: 1.13 date: 2006/04/12
>   uses: os_linux.h revision: 1.24 date: 2006/04/12
>   uses: scsicmds.h revision: 1.57 date: 2006/04/12
>   uses: utility.h revision: 1.43 date: 2006/04/12
> Module: scsicmds.c revision: 1.85 date: 2006/04/12
>   uses: configure.in revision: 1.113 date: 2005/11/27
>   uses: extern.h revision: 1.41 date: 2006/04/12
>   uses: int64.h revision: 1.13 date: 2006/04/12
>   uses: scsicmds.h revision: 1.57 date: 2006/04/12
>   uses: utility.h revision: 1.43 date: 2006/04/12
> Module: scsiprint.c revision: 1.107 date: 2006/04/12
>   uses: configure.in revision: 1.113 date: 2005/11/27
>   uses: extern.h revision: 1.41 date: 2006/04/12
>   uses: int64.h revision: 1.13 date: 2006/04/12
>   uses: scsicmds.h revision: 1.57 date: 2006/04/12
>   uses: scsiprint.h revision: 1.20 date: 2006/04/12
>   uses: smartctl.h revision: 1.23 date: 2006/04/12
>   uses: utility.h revision: 1.43 date: 2006/04/12
> Module: smartctl.c revision: 1.143 date: 2006/04/12
>   uses: atacmds.h revision: 1.81 date: 2006/04/12
>   uses: ataprint.h revision: 1.28 date: 2006/04/12
>   uses: configure.in revision: 1.113 date: 2005/11/27
>   uses: extern.h revision: 1.41 date: 2006/04/12
>   uses: int64.h revision: 1.13 date: 2006/04/12
>   uses: knowndrives.h revision: 1.16 date: 2006/04/05
>   uses: scsicmds.h revision: 1.57 date: 2006/04/12
>   uses: scsiprint.h revision: 1.20 date: 2006/04/12
>   uses: smartctl.h revision: 1.23 date: 2006/04/12
>   uses: utility.h revision: 1.43 date: 2006/04/12
> Module: utility.c revision: 1.61 date: 2006/04/12
>   uses: configure.in revision: 1.113 date: 2005/11/27
>   uses: int64.h revision: 1.13 date: 2006/04/12
>   uses: utility.h revision: 1.43 date: 2006/04/12
>
> smartmontools release 5.36 dated 2006/04/12 at 17:39:01 UTC
> smartmontools build host: i686-redhat-linux-gnu
> smartmontools build configured: 2007/03/27 08:24:02 UTC
> smartctl compile dated Mar 27 2007 at 04:24:14
> smartmontools configure arguments: '--build=i686-redhat-linux-gnu'
> '--host=i686-redhat-linux-gnu' '--target=i386-redhat-linux-gnu'
> '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin'
> '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share'
> '--includedir=/usr/include' '--libdir=/usr/lib' '--libexecdir=/usr/libexec'
> '--localstatedir=/var' '--sharedstatedir=/usr/com' '--mandir=/usr/share/man'
> '--infodir=/usr/share/info' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386
> -mtune=generic -fasynchronous-unwind-tables'
> 'build_alias=i686-redhat-linux-gnu' 'host_alias=i686-redhat-linux-gnu'
> 'target_alias=i386-redhat-linux-gnu'
>
> I have a HP Proliant DL380 G3 server with six 146Gb U320 drives.
>
> All drives are detected correctly when using smartctl.
>
> I use the following in smartd.conf to schedule the short and long tests:
>
> /dev/cciss/c0d0 -d cciss,0 -a -o on -S on -s (S/../../(1|2|3|4|5|6)/05|L/../../7/05) -m my@emailaddress
> /dev/cciss/c0d0 -d cciss,1 -a -o on -S on -s (S/../../(1|2|3|4|5|6)/06|L/../../7/06) -m my@emailaddress
> /dev/cciss/c0d0 -d cciss,2 -a -o on -S on -s (S/../../(1|2|3|4|5|6)/07|L/../../7/07) -m my@emailaddress
> /dev/cciss/c0d0 -d cciss,3 -a -o on -S on -s (S/../../(1|2|3|4|5|6)/08|L/../../7/08) -m my@emailaddress
> /dev/cciss/c0d0 -d cciss,4 -a -o on -S on -s (S/../../(1|2|3|4|5|6)/09|L/../../7/09) -m my@emailaddress
> /dev/cciss/c0d0 -d cciss,5 -a -o on -S on -s (S/../../(1|2|3|4|5|6)/10|L/../../7/10) -m my@emailaddress
>
> What happens is that when the short tests are kicked off, only disk 5 gets the
> test request; none of the others actually get tested, i.e.:
>
> disk 0 tests get done on disk 5
> disk 1 tests get done on disk 5
> disk 2 tests get done on disk 5
> disk 3 tests get done on disk 5
> ... all scheduled disk tests get done on disk 5
>
> I can manually run:
>
> smartctl -d cciss,0 -t short /dev/cciss/c0d0
>
> and disk 0 does get the short test; it's only when smartd tries to kick off
> the test automatically that disk 5 gets done.
>
> My messages log file shows:
>
> Aug 19 08:09:10 server smartd[32017]: Device: /dev/cciss/c0d0 [cciss_disk_03], starting scheduled Long Self-Test.
> Aug 19 09:09:10 server smartd[32017]: Device: /dev/cciss/c0d0 [cciss_disk_04], skip since Self-Test already in progress.
> Aug 19 10:09:10 server smartd[32017]: Device: /dev/cciss/c0d0 [cciss_disk_05], skip since Self-Test already in progress.
>
> and looking at the selftest logs on each disk, disk 5 shows all the tests done
> for all the other disks.
>
> I use this same smartd.conf scheduled config on other HP Proliant servers
> without issues, yet this is the only server of this type I have which gives
> this problem.
>
> Any ideas what the problem here could be?
>
> Thanks.
>
> Michael.
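
For reference, the check requested in the reply (starting self-tests on each disk through smartctl rather than smartd, then seeing which disk actually logged them) could be scripted roughly as below. This is only a sketch under stated assumptions: the device node /dev/cciss/c0d0 and the six disk numbers come from Michael's report, the function names are made up for illustration, and `-l selftest` (the smartctl option that prints a drive's self-test log) is not mentioned in the thread. By default it only prints the commands; nothing is sent to the drives unless `execute=True` is passed.

```python
import subprocess

DEVICE = "/dev/cciss/c0d0"  # controller device node, taken from the report
DISKS = range(6)            # cciss,0 .. cciss,5, as listed in the smartd.conf


def selftest_cmd(disk, device=DEVICE):
    """smartctl command that starts a short self-test on one physical disk."""
    return ["smartctl", "-d", f"cciss,{disk}", "-t", "short", device]


def selftest_log_cmd(disk, device=DEVICE):
    """smartctl command that prints the self-test log of one physical disk."""
    return ["smartctl", "-d", f"cciss,{disk}", "-l", "selftest", device]


def commands(phase):
    """Return the command list for every disk: phase is 'start' or 'log'."""
    if phase == "start":
        build = selftest_cmd
    elif phase == "log":
        build = selftest_log_cmd
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    return [build(disk) for disk in DISKS]


def run_phase(phase, execute=False):
    """Print each command; run it as well only when execute is True."""
    for cmd in commands(phase):
        print(" ".join(cmd))
        if execute:
            # Needs root and a real cciss controller; errors are not fatal here.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Dry run: show what would be sent to each disk.
    run_phase("start")
```

One way to use it: run the "start" phase with `execute=True`, wait for the short tests to finish, then run the "log" phase and compare the logs. If each disk shows its own new entry, the CCISS interface is addressing the disks correctly and smartd is the more likely culprit; if everything lands on disk 5 again, the interface code is.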