From: Jon H. <jd_...@ya...> - 2007-11-02 08:56:17
|
I have discovered what I think is an issue with the delay smartd allows before running a scheduled check on a SPUN DOWN SATA DRIVE.

The log shows a message telling me that the drive isn't capable barely 5 seconds after smartd tried to start the test - not enough time for the drive to spin up! I have confirmed this by tricking smartd and making sure the drive is up. Works fine.

Also, I had previously seen problems with the 30-minute checks, whereby my log fills with 'ata soft reset errors' when smartd tries to 'check' a spun-down disk. Stopping it from checking while the drive is sleeping 'fixed' this.

Any comments?

I have posted on Gentoo about this also...

http://forums.gentoo.org/viewtopic-p-4437926.html?sid=e8a8fdb5f0d9c8c847407559e9716713

-----------------------
N: Jon Hardcastle
E: Jon@eHardcastle.com
'The writing is on the wall...'
-----------------------
|
From: Bruce A. <ba...@gr...> - 2007-11-02 10:32:45
|
Jon: thanks for the report.

Tejun: what are your thoughts about this? Should the fix be at the kernel level or at the application level? Did your recent libata changes already address this?

Cheers,
Bruce

On Fri, 2 Nov 2007, Jon Hardcastle wrote:
> I have discovered what i think is an issue with the
> delay smartd allows before running a scheduled check
> on a SPUN DOWN SATA DRIVE.
>
> I get a message telling me that the drive isn't
> capable in the log barely 5 secs after it has tried to
> do the test - not enough time for the drive to spin
> up! I have confirmed this by tricking smartd and
> making sure the drive is up. Works fine.
>
> Also I had previously seen problems with the 30min
> checks where by my log is filled with 'ata soft reset
> errors' when it tries to 'check' a spun down disk.
> Stopping it checking if the drive is sleeping 'fixed'
> this.
>
> Any comments?
>
> I have posted on Gentoo about this also...
>
> http://forums.gentoo.org/viewtopic-p-4437926.html?sid=e8a8fdb5f0d9c8c847407559e9716713
>
> -----------------------
> N: Jon Hardcastle
> E: Jon@eHardcastle.com
> 'The writing is on the wall...'
> -----------------------
|
From: Jon H. <jd_...@ya...> - 2007-11-02 10:44:15
|
Hey, no probs. I was worried smartmontools might have been left to drift and was no longer maintained. That would have been a tragedy for such a fantastic suite!

Also note that I found another possible bug, which I mentioned in my post on Gentoo: on my IDE drives I can see the progress of a test I kick off, but on the SATA ones I can't, although I think the list is updated on completion.

Today at 3pm my drives are scheduled to check again. No tricks this time; I have simply not configured them to spin down. I have a feeling it will work.

Cheers.

--- Bruce Allen <ba...@gr...> wrote:
> Jon: thanks for the report.
>
> Tejun: what are your thoughts about this? Should be fix be at the kernel
> level or at the application level? Did you recent libata changes already
> address this?
>
> Cheers,
> Bruce
>
> On Fri, 2 Nov 2007, Jon Hardcastle wrote:
>
> > I have discovered what i think is an issue with the
> > delay smartd allows before running a scheduled check
> > on a SPUN DOWN SATA DRIVE.
> >
> > I get a message telling me that the drive isn't
> > capable in the log barely 5 secs after it has tried to
> > do the test - not enough time for the drive to spin
> > up! I have confirmed this by tricking smartd and
> > making sure the drive is up. Works fine.
> >
> > Also I had previously seen problems with the 30min
> > checks where by my log is filled with 'ata soft reset
> > errors' when it tries to 'check' a spun down disk.
> > Stopping it checking if the drive is sleeping 'fixed'
> > this.
> >
> > Any comments?
> >
> > I have posted on Gentoo about this also...
> >
> > http://forums.gentoo.org/viewtopic-p-4437926.html?sid=e8a8fdb5f0d9c8c847407559e9716713
> >
> > -----------------------
> > N: Jon Hardcastle
> > E: Jon@eHardcastle.com
> > 'The writing is on the wall...'
> > -----------------------

-----------------------
N: Jon Hardcastle
E: Jon@eHardcastle.com
'The writing is on the wall...'
-----------------------
|
From: Tejun H. <ht...@gm...> - 2007-11-02 10:55:04
|
Bruce Allen wrote:
>> I have discovered what i think is an issue with the
>> delay smartd allows before running a scheduled check
>> on a SPUN DOWN SATA DRIVE.
>>
>> I get a message telling me that the drive isn't
>> capable in the log barely 5 secs after it has tried to
>> do the test - not enough time for the drive to spin
>> up! I have confirmed this by tricking smartd and
>> making sure the drive is up. Works fine.
>>
>> Also I had previously seen problems with the 30min
>> checks where by my log is filled with 'ata soft reset
>> errors' when it tries to 'check' a spun down disk.
>> Stopping it checking if the drive is sleeping 'fixed'
>> this.
>>
>> Any comments?
>>
>> I have posted on Gentoo about this also...
>>
>> http://forums.gentoo.org/viewtopic-p-4437926.html?sid=e8a8fdb5f0d9c8c847407559e9716713
>
> Tejun: what are your thoughts about this? Should be fix be at the
> kernel level or at the application level? Did you recent libata changes
> already address this?

I can't tell from the report itself. What exactly is the problem? It isn't clear why smartd thinks the drive can't do SMART. I don't think spinning down has anything to do with it. If a command issued to a drive requires spin-up, the drive spins up and completes the command. If the timeout is set too short, the timeout triggers, the error is logged and smartd is notified as such. I don't think this was the case, though.

I think the first thing to do is to find out what actually happened.

Thanks.

-- 
tejun
|
From: Jon H. <jd_...@ya...> - 2007-11-02 11:44:17
|
I am at work at the moment, so I can't provide the precise .conf settings and log errors, but the gist of it is as follows.

I have my 5 drives (2 IDE, 3 SATA) configured to run a short test on Tue/Wed/Thu/Sat/Sun and a long test on Fri/Mon, at 3pm. When smartd tries to do this, the 3 SATA drives fail to run the test and I am notified of that via email. Please note that at this stage the 3 SATA drives are RAIDed and used for data; the 2 IDE drives are also RAIDed but hold root. Come 3pm, the data drives will almost certainly have spun down.

Looking at the logs, there are literally seconds (as in 2~5) between when smartd kicks in on the SATA drives and when it decides the drive isn't capable. I can tell you now that it takes at least 10~15 seconds for the drive to spin up, probably longer. I also know that when I run the test manually using smartctl it seems to work, and that when the scheduled test runs while the drives HAVEN'T spun down it also works.

I get the SATA soft reset error that I have seen reported on these forums and elsewhere on the internet; it seems to be caused by the drive spinning up and not responding quickly enough (or something).

I will know for sure at 3pm BST (it is currently 11:40), as the server is configured to run the tests as usual, with me doing nothing to 'engineer' it. All I have done is stop hdparm from setting a sleep time.

I'm sorry for being a shade vague, but I am new to this :)

--- Tejun Heo <ht...@gm...> wrote:
> Bruce Allen wrote:
> >> I have discovered what i think is an issue with the
> >> delay smartd allows before running a scheduled check
> >> on a SPUN DOWN SATA DRIVE.
> >>
> >> I get a message telling me that the drive isn't
> >> capable in the log barely 5 secs after it has tried to
> >> do the test - not enough time for the drive to spin
> >> up! I have confirmed this by tricking smartd and
> >> making sure the drive is up. Works fine.
> >>
> >> Also I had previously seen problems with the 30min
> >> checks where by my log is filled with 'ata soft reset
> >> errors' when it tries to 'check' a spun down disk.
> >> Stopping it checking if the drive is sleeping 'fixed'
> >> this.
> >>
> >> Any comments?
> >>
> >> I have posted on Gentoo about this also...
> >>
> >> http://forums.gentoo.org/viewtopic-p-4437926.html?sid=e8a8fdb5f0d9c8c847407559e9716713
> >
> > Tejun: what are your thoughts about this? Should be fix be at the
> > kernel level or at the application level? Did you recent libata changes
> > already address this?
>
> I can't tell from the report itself. What exactly is the problem? It
> isn't clear why smartd thinks the drive can't do smart. I don't think
> spinning down has anything to do with it. If a command issued to a
> drive requires spin up, the drive spins up and completes the command.
> If timeout is set too short, timeout triggers, the error is logged and
> smartd is notified as such. I don't think this was the case tho.
>
> I think the first thing to do is to find out what actually happened.
>
> Thanks.
>
> --
> tejun

-----------------------
N: Jon Hardcastle
E: Jon@eHardcastle.com
'The writing is on the wall...'
-----------------------
|
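For readers following along: Jon's actual smartd.conf is not shown in the thread, so the following is only a sketch of how the schedule he describes might be expressed. It prints one candidate smartd.conf line. The device name /dev/sdX and the helper name smartd_conf_line are placeholders, not anything from the thread. The -s directive takes a regular expression matched against T/MM/DD/d/HH, where d is the weekday (1 = Monday through 7 = Sunday), and -n standby asks smartd to skip its periodic check while the drive is in standby or sleep. How -n interacts with scheduled self-tests depends on the smartd version, so check smartd.conf(5) before relying on it.

```shell
#!/bin/sh
# Sketch only: prints a candidate smartd.conf entry for the schedule described
# above. DEV and smartd_conf_line are hypothetical names, not from the thread.
DEV="${DEV:-/dev/sdX}"   # placeholder device; substitute a real drive

# Short test (S) on Tue/Wed/Thu/Sat/Sun and long test (L) on Mon/Fri, hour 15.
# Field layout is T/MM/DD/d/HH with d = weekday, 1 = Monday ... 7 = Sunday.
SCHEDULE='(S/../../[23467]/15|L/../../[15]/15)'

smartd_conf_line() {
    # -a          monitor all SMART attributes and logs
    # -n standby  skip the periodic check if the drive is in standby or sleep
    # -s REGEXP   self-test schedule
    printf '%s -a -n standby -s %s\n' "$DEV" "$SCHEDULE"
}

smartd_conf_line
```

The output is meant to be reviewed by hand and adapted, not appended blindly to /etc/smartd.conf.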
From: Tejun H. <ht...@gm...> - 2007-11-02 12:29:34
|
Hello, Jon. Please don't top-post.

Jon Hardcastle wrote:
> Looking at the logs there is seconds(as in 2~5)
> literally between when the smartd kicks in on the sata
> and when it decides the drive isn't capable. I can
> tell you now it takes at least 10~15 seconds for the
> drive to spin up.. probably longer. I also know that
> when i run the test manually using smartctl it seems
> to work, and that when the scheduled test is run when
> the drives HAVENT spun down it also works.

Okay, please do the following.

1. Post /var/log/boot.msg or the dmesg output captured right after boot.
2. Post the result of "lspci -nnv".
3. Post the result of "hdparm -I /dev/sdX", where sdX is one of the problematic drives.
4. Run "hdparm -y /dev/sdX" to spin it down, then issue a SMART short test manually. If it fails the same way, post what smartd/smartctl says and the result of dmesg after the failure.

Thanks.

-- 
tejun
|
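The four steps above can be collected into a small script along these lines. This is a sketch, not something posted in the thread: DEV, DRY_RUN, run and collect_diagnostics are illustrative names, and "smartctl -t short" is assumed to be the manual short test that step 4 refers to. By default it only prints the commands; running them for real needs root and will spin the drive down.

```shell
#!/bin/sh
# Sketch of the diagnostic steps requested above. All names here (DEV, DRY_RUN,
# run, collect_diagnostics) are illustrative and not taken from the thread.
DEV="${DEV:-/dev/sdX}"    # placeholder: one of the problematic SATA drives
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command, 0 = execute it (needs root)

run() {
    # Echo the command, then execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

collect_diagnostics() {
    run dmesg                      # step 1: kernel log (ideally captured right after boot)
    run lspci -nnv                 # step 2: controller details
    run hdparm -I "$DEV"           # step 3: drive identification data
    run hdparm -y "$DEV"           # step 4: put the drive into standby (spin down)
    run smartctl -t short "$DEV"   # step 4: start a short self-test manually
    run dmesg                      # step 4: kernel log after the attempt
}

collect_diagnostics
```

To execute the commands rather than print them, something like "DRY_RUN=0 DEV=/dev/sdb sh diag.sh > report.txt 2>&1" would capture everything in one file for posting (diag.sh and report.txt are again just example names).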