From: Bruce A. <ba...@gr...> - 2007-07-09 00:33:16
|
Mark, David, Doug, Tejun, Alan, Jeff, LKML,

I'm afraid that there may be some problem with SMART + libata in the 2.6.22 kernel. An hour ago I discovered that I missed a month of correspondence (some LKML, some private) about this problem which Alan, Tejun, Jeff, Mark and others copied to me -- it was automatically shoved into one of my mailboxes by my mail client. Sorry about that. So I am trying to catch up to see if there is some real problem or not.

Here is a typical bug report that worries me:
http://article.gmane.org/gmane.linux.utilities.smartmontools/4712

Here is another similar report:
http://thread.gmane.org/gmane.linux.utilities.smartmontools/4713

And another report:
http://www.mail-archive.com/deb...@li.../msg358354.html

From some of the earlier threads that I missed (below) I have the impression that the problem may be a very simple one, namely that starting with 2.6.22 one needs to run a command to enable SMART when a box is first booted -- the kernel no longer does this as part of the init/setup of the disks. But that is NOT consistent with the first two reports above, which show 'SMART ENABLED'.

Here are some of the earlier threads that I completely missed:

http://www.ussg.iu.edu/hypermail/linux/kernel/0706.1/0849.html
http://www.mail-archive.com/lin...@vg.../msg164863.html

Before I go off half-cocked, could anyone shed some light on this? Is there a real problem here or just something dumb?

Cheers,
Bruce
|
From: Bruce A. <ba...@gr...> - 2007-07-09 01:14:23
|
Here is another similar report:

http://article.gmane.org/gmane.linux.utilities.smartmontools/4704/match=diamondmax

Again, this indicates that SMART is enabled. But it's not clear what the kernel version here is. The report indicates that the problem started with an FC7 kernel upgrade.

Bruce

On Sun, 8 Jul 2007, Bruce Allen wrote:

> Mark, David, Doug, Tejun, Alan, Jeff, LKML,
>
> I'm afraid that there may be some problem with SMART + libata in the 2.6.22
> kernel. An hour ago I discovered that I missed a month of correspondence
> (some LKML, some private) about this problem which Alan, Tejun, Jeff, Mark
> and others copied to me -- it was automatically shoved into one of my
> mailboxes by my mail client. Sorry about that. So I am trying to catch up
> to see if there is some real problem or not.
>
> Here is a typical bug report that worries me:
> http://article.gmane.org/gmane.linux.utilities.smartmontools/4712
>
> Here is another similar report:
> http://thread.gmane.org/gmane.linux.utilities.smartmontools/4713
>
> And another report:
> http://www.mail-archive.com/deb...@li.../msg358354.html
>
> From some of the earlier threads that I missed (below) I have the impression
> that the problem may be a very simple one, namely that starting with 2.6.22
> one needs to run a command to enable SMART when a box is first booted -- the
> kernel no longer does this as part of the init/setup of the disks. But that
> is NOT consistent with the first two reports above, which show 'SMART
> ENABLED'.
>
> Here are some of the earlier threads that I completely missed:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0706.1/0849.html
> http://www.mail-archive.com/lin...@vg.../msg164863.html
>
> Before I go off half-cocked, could anyone shed some light on this? Is there
> a real problem here or just something dumb?
>
> Cheers,
> Bruce
>
|
From: Jeff G. <je...@ga...> - 2007-07-09 02:10:02
|
On the base point, libata has never enabled SMART on its own. That's always up to the BIOS, etc.

It's possible that the recent addition of ACPI support will cause disks to be in different modes than previously expected. ACPI supplies ATA taskfiles to be pushed to the disk, and who knows what's in there...

Jeff
|
From: David G. <da...@dg...> - 2007-07-09 11:55:14
|
Hi Bruce

> From some of the earlier threads that I missed (below) I have the
> impression that the problem may be a very simple one, namely that
> starting with 2.6.22 one needs to run a command to enable SMART when a
> box is first booted -- the kernel no longer does this as part of the
> init/setup of the disks. But that is NOT consistent with the first two
> reports above, which show 'SMART ENABLED'.
>
> Here are some of the earlier threads that I completely missed:
>
> http://www.mail-archive.com/lin...@vg.../msg164863.html

This is mine and although it's a 'real' problem, it is something that's easy to hack around by having the suspend script turn on SMART after the system is resumed. (Of course I can't use resume until a skge wol bug is fixed, so I won't see/test this unless asked to.)

The smart init scripts run '-s on' when the system boots anyway for my system - this problem only occurs for me during suspend/resume. Maybe smartd should detect that, as Alan says.

Please let me know if there's anything else you need.

David
|
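The workaround David describes could look roughly like the following. This is only a minimal sketch: it assumes smartctl is on the PATH, the device names in DEVICES are hypothetical, and the helper names are made up for illustration. The only smartctl option used is the '-s on' mentioned above; how the script gets called after resume depends on the distribution and is not shown.

```python
#!/usr/bin/env python3
"""Re-enable SMART after resume: a sketch, not a tested suspend hook."""
import subprocess
import sys

# Hypothetical device list; adjust for the machine in question.
DEVICES = ["/dev/sda", "/dev/sdb"]


def enable_smart_command(device):
    """Return the argv that turns SMART on for one device ('-s on')."""
    return ["smartctl", "-s", "on", device]


def reenable_all(devices, run=subprocess.call):
    """Run 'smartctl -s on' per device; return the devices that failed.

    'run' is injectable so the logic can be exercised without real disks.
    """
    failed = []
    for dev in devices:
        if run(enable_smart_command(dev)) != 0:
            failed.append(dev)
    return failed


if __name__ == "__main__":
    bad = reenable_all(DEVICES)
    if bad:
        sys.exit("could not enable SMART on: " + ", ".join(bad))
```

Keeping the command construction separate from the call makes it easy to log or dry-run what the hook would do before wiring it into a resume script.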
From: Bruce A. <ba...@gr...> - 2007-07-09 17:35:43
|
On Sun, 8 Jul 2007, Jeff Garzik wrote:

Jeff, thanks for the quick feedback.

> On the base point, libata has never enabled SMART on its own. That's
> always up to the BIOS, etc.

OK, clear.

> It's possible that the recent addition of ACPI support will cause disks
> to be in different modes than previously expected. ACPI supplies ATA
> taskfiles to be pushed to the disk, and who knows what's in there...

Is there a simple way I can have affected users test this? Is there a kernel boot flag or sysctl setting or something else they can use to disable the ACPI stuff to see if the problem then goes away?

Cheers,
Bruce
|
From: Jeff G. <je...@ga...> - 2007-07-09 17:53:18
|
Bruce Allen wrote:
> On Sun, 8 Jul 2007, Jeff Garzik wrote:
>
> Jeff, thanks for the quick feedback.
>
>> On the base point, libata has never enabled SMART on its own. That's
>> always up to the BIOS, etc.
>
> OK, clear.
>
>> It's possible that the recent addition of ACPI support will cause
>> disks to be in different modes than previously expected. ACPI
>> supplies ATA taskfiles to be pushed to the disk, and who knows what's
>> in there...
>
> Is there a simple way I can have affected users test this? Is there a
> kernel boot flag or sysctl setting or something else they can use to
> disable the ACPI stuff to see if the problem then goes away?

The 'noacpi' module option.

Jeff
|
From: Bruce A. <ba...@gr...> - 2007-07-09 18:02:54
|
Hi Jeff,

>>> It's possible that the recent addition of ACPI support will cause disks to
>>> be in different modes than previously expected. ACPI supplies ATA
>>> taskfiles to be pushed to the disk, and who knows what's in there...
>>
>> Is there a simple way I can have affected users test this? Is there a
>> kernel boot flag or sysctl setting or something else they can use to
>> disable the ACPI stuff to see if the problem then goes away?
>
> The 'noacpi' module option.

OK, thanks.

Klaus, Jan: could you please see if your problem with 2.6.22 goes away with noacpi passed as a flag to libata?

Jeff: I will add the noacpi test suggestion to the Debian bug report here
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=428975
to try to ensure that Klaus sees it.

Cheers,
Bruce
|
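For anyone running this test: a module option is normally passed either as an "options" line in a modprobe configuration file (when the driver is a loadable module) or as module.parameter=value on the kernel command line. The sketch below only formats those two strings for libata's noacpi option. The helper names are made up for illustration, and whether a particular kernel build accepts the option has to be checked on that kernel, for example with modinfo.

```python
def modprobe_options_line(module, **params):
    """Format an 'options' line for a modprobe configuration file."""
    settings = " ".join(f"{name}={value}" for name, value in sorted(params.items()))
    return f"options {module} {settings}"


def kernel_cmdline_params(module, **params):
    """Format the same settings as module.param=value boot parameters."""
    return " ".join(f"{module}.{name}={value}" for name, value in sorted(params.items()))


if __name__ == "__main__":
    # Line to place in a modprobe configuration file.
    print(modprobe_options_line("libata", noacpi=1))
    # Fragment to append to the kernel command line.
    print(kernel_cmdline_params("libata", noacpi=1))
```

Which of the two forms takes effect depends on how libata is built and loaded on the affected machine, so it is worth confirming from the boot log that the setting was not ignored.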
From: Bruce A. <ba...@gr...> - 2007-07-09 17:56:22
|
Hi David,

>> http://www.mail-archive.com/lin...@vg.../msg164863.html

> This is mine and although it's a 'real' problem, it is something that's easy
> to hack around by having the suspend script turn on smart after it is
> resumed. (Of course I can't use resume until a skge wol bug is fixed so I
> won't see/test this unless asked to.)
>
> The smart init scripts run '-s on' when the system boots anyway for my system
> - this problem only occurs for me during suspend/resume. Maybe smartd should
> detect that as Alan says.

OK, that should be easy to do. So let's forget about the 'SMART disabled' issue. This is easy to fix in multiple ways and is not an LKML issue.

David: can you reproduce the more serious problem
http://article.gmane.org/gmane.linux.utilities.smartmontools/4712
reported by Jan Dvorak?

Jeff: this is the problem that really has me concerned.

Jan: what happens if you replace '-d ata' with '-d sat'? This option should be available in the 5.37 release of smartmontools that you are using, unless the Suse package maintainer is playing games with the version numbers.

Unfortunately I don't think this will fix the problem, as the bug report by Klaus Fuerstberger
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=428975
is using '-d sat'.

Jeff: the fact that both links given above report the same bug in two different settings, and the fact that the bug goes away when reverting from 2.6.22 to 2.6.21, still have me concerned.

Cheers,
Bruce
|
From: David G. <da...@dg...> - 2007-07-09 18:00:55
|
Bruce Allen wrote:
> Hi David,
>
>>> http://www.mail-archive.com/lin...@vg.../msg164863.html
>
>> This is mine and although it's a 'real' problem, it is something
>> that's easy to hack around by having the suspend script turn on smart
>> after it is resumed. (Of course I can't use resume until a skge wol
>> bug is fixed so I won't see/test this unless asked to.)
>>
>> The smart init scripts run '-s on' when the system boots anyway for my
>> system - this problem only occurs for me during suspend/resume. Maybe
>> smartd should detect that as Alan says.
>
> OK, that should be easy to do. So let's forget about the 'SMART
> disabled' issue. This is easy to fix in multiple ways and is not a LKML
> issue.

Sure.

> David: can you reproduce the more serious problem
> http://article.gmane.org/gmane.linux.utilities.smartmontools/4712
> reported by Jan Dvorak?

Sorry, I haven't seen that problem.

David
|
From: Jeff G. <je...@ga...> - 2007-07-09 18:43:23
|
Bruce Allen wrote:
> http://article.gmane.org/gmane.linux.utilities.smartmontools/4712
> reported by Jan Dvorak?

Relevant lspci and dmesg output would be useful... that gives enhanced error diagnostics.

Jeff
|
From: Adam S. <sma...@ad...> - 2007-07-09 23:36:12
|
On Sun, Jul 08, 2007 at 08:14:10PM -0500, Bruce Allen wrote:
> On Sun, 8 Jul 2007, Bruce Allen wrote:
> > I'm afraid that there may be some problem with SMART + libata in the 2.6.22
> > kernel. An hour ago I discovered that I missed a month of correspondence
> > (some LKML, some private) about this problem which Alan, Tejun, Jeff, Mark
> > and others copied to me -- it was automatically shoved into one of my
> > mailboxes by my mail client. Sorry about that. So I am trying to catch up
> > to see if there is some real problem or not.
> >
> > Here is a typical bug report that worries me:
> > http://article.gmane.org/gmane.linux.utilities.smartmontools/4712
> >
> > Here is another similar report:
> > http://thread.gmane.org/gmane.linux.utilities.smartmontools/4713
> >
> > And another report:
> > http://www.mail-archive.com/deb...@li.../msg358354.html
> >
> > From some of the earlier threads that I missed (below) I have the impression
> > that the problem may be a very simple one, namely that starting with 2.6.22
> > one needs to run a command to enable SMART when a box is first booted -- the
> > kernel no longer does this as part of the init/setup of the disks. But that
> > is NOT consistent with the first two reports above, which show 'SMART
> > ENABLED'.

[snipped]

> Here is another similar report:
>
> http://article.gmane.org/gmane.linux.utilities.smartmontools/4704/match=diamondmax
>
> Again, this indicates that SMART is enabled. But it's not clear what the
> kernel version here is. The report indicates that the problem started
> with an FC7 kernel upgrade

That was me, and the kernel in question is 2.6.21-1.3194.fc7. I tried Jeff's noacpi suggestion, and here is the outcome.

I am sure it comes as no surprise that his patch to support the boot-time parameter libata.noacpi is not included in this kernel:

  Kernel command line: ro root=/dev/vg0/fc-root rhgb selinux=0 nodmraid libata.noacpi=1
  Unknown boot option `libata.noacpi=1': ignoring

However, the module option is there:

  # modinfo libata
  filename:       /lib/modules/2.6.21-1.3194.fc7/kernel/drivers/ata/libata.ko
  version:        2.20
  license:        GPL
  description:    Library module for ATA devices
  author:         Jeff Garzik
  srcversion:     44DAFFD701701A15EB2D574
  depends:        scsi_mod
  vermagic:       2.6.21-1.3194.fc7 SMP mod_unload 686 4KSTACKS
  parm:           atapi_enabled:Enable discovery of ATAPI devices (0=off, 1=on) (int)
  parm:           atapi_dmadir:Enable ATAPI DMADIR bridge support (0=off, 1=on) (int)
  parm:           fua:FUA support (0=off, 1=on) (int)
  parm:           ignore_hpa:Ignore HPA (0=keep BIOS setting 1=ignore it) (int)
  parm:           ata_probe_timeout:Set ATA probing timeout (seconds) (int)
  parm:           noacpi:Disables the use of ACPI in suspend/resume when set (int)

And when used via:

  # cat /etc/modprobe.d/libata
  options libata noacpi=1

I still see the same problem:

  smartctl version 5.37 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
  Home page is http://smartmontools.sourceforge.net/

  === START OF INFORMATION SECTION ===
  Model Family:     Maxtor DiamondMax 10 family (ATA/133 and SATA/150)
  Device Model:     Maxtor 6L250S0
  Serial Number:    L50A1B8H
  Firmware Version: BANC1G10
  User Capacity:    251,000,193,024 bytes
  Device is:        In smartctl database [for details use: -P show]
  ATA Version is:   7
  ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
  Local Time is:    Mon Jul 9 23:39:25 2007 BST
  SMART support is: Available - device has SMART capability.
  SMART support is: Enabled

  Error SMART Status command failed
  Please get assistance from http://smartmontools.sourceforge.net/
  Register values returned from SMART Status command are:
  CMD=0x50
  FR =0x00
  NS =0x00
  SC =0x00
  CL =0xc2
  CH =0x00
  SEL=0x00
  A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

Regarding Kai's very recent analysis elsewhere in this thread:

> sata_nv has been changed between 2.6.21 and 2.6.22-rc1 and this
> particular smartctl problem may or may not be specific to CK804.

I should note that this particular machine is indeed using that chipset:

  00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
  00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)

HTH,
Adam
|
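The register dump in Adam's report can be read against the ATA definition of the SMART RETURN STATUS command, which reports its result in the cylinder low / cylinder high (LBA mid / high) registers: 0x4f/0xc2 when no threshold has been exceeded, 0xf4/0x2c when one has. The sketch below parses "NAME=0x.." pairs in the form smartctl printed them and classifies the CL/CH pair. The function names are illustrative and this is not smartctl's own code.

```python
import re

# Signature values defined for ATA SMART RETURN STATUS,
# given as (cylinder low, cylinder high) pairs.
STATUS_OK = (0x4F, 0xC2)
STATUS_FAILING = (0xF4, 0x2C)


def parse_registers(text):
    """Parse 'CMD=0x50 FR =0x00 ...' into a dict of register name -> int."""
    pairs = re.findall(r"([A-Z]+)\s*=\s*0x([0-9a-fA-F]+)", text)
    return {name: int(value, 16) for name, value in pairs}


def classify_smart_status(regs):
    """Classify the CL/CH pair returned by SMART RETURN STATUS."""
    pair = (regs["CL"], regs["CH"])
    if pair == STATUS_OK:
        return "passed"
    if pair == STATUS_FAILING:
        return "threshold exceeded"
    return "unexpected values"


if __name__ == "__main__":
    dump = "CMD=0x50 FR =0x00 NS =0x00 SC =0x00 CL =0xc2 CH =0x00 SEL=0x00"
    print(classify_smart_status(parse_registers(dump)))
```

For the values reported above (CL=0xc2, CH=0x00) the pair matches neither signature, which fits the "SMART Status command failed" message: the command may well have completed, but the registers read back do not contain a valid answer.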
From: Mark L. <lk...@rt...> - 2007-07-10 18:23:04
|
Adam Spiers wrote:
> On Sun, Jul 08, 2007 at 08:14:10PM -0500, Bruce Allen wrote:
>> On Sun, 8 Jul 2007, Bruce Allen wrote:
>>..
>> http://article.gmane.org/gmane.linux.utilities.smartmontools/4704/match=diamondmax
>>
>> Again, this indicates that SMART is enabled. But it's not clear what the
>> kernel version here is. The report indicates that the problem started
>> with an FC7 kernel upgrade

So it's a bug in the sata_nv.c port driver.

In particular, I see this in the bug report:

>Jun 30 10:23:42 atlantic kernel: ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1501000 status 0x1540 next cpb count 0x0 next cpb idx 0x0

That looks a bit strange, because the driver goes to some effort to prevent these kinds of commands from ever being issued "in ADMA mode", precisely because there's no way to do a tf_read in that mode.

Mmm.. buggy somewhere in there.
|