From: Hans-Peter J. <hp...@ur...> - 2013-03-19 17:59:02
Dear Stanislav, dear smartmontools developers,

Today I upgraded one of my servers to smartmontools 6.1. This server happens to drive an Areca SAS controller, specifically:

<5>[ 7.206669] Areca RAID Controller6: F/W V1.46 2009-01-06 & Model ARC-1680
<6>[ 7.270516] scsi6 : Areca SAS Host Adapter RAID Controller( RAID6 capable)
<6>[ 7.270516] Driver Version 1.20.00.15 2010/08/05

The problem is that the system freezes immediately after running (updating) smartd: rpm stopped at 89%, and the system was *locked*. I can still switch consoles, but even SysRq S and U no longer complete, although B worked. :-(

Reverting to version 6.0 resolved it for me.

I'm using the simplest possible /etc/smartd.conf, since that has worked reasonably well:

DEVICESCAN -d removable -m root

While the userspace is rather old (openSUSE 11.1/i586), it runs a reasonably current kernel:

<5>[ 0.000000] Linux version 3.8.3-3-pae (geeko@buildhost) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP Fri Mar 15 08:16:33 UTC 2013 (1ca6928)

All packages are built on OBS; smartmontools is here:
https://build.opensuse.org/package/show?package=smartmontools&project=home%3Afrispete%3Atools

I would seriously consider reverting the 6.0 -> 6.1 transition in the openSUSE repos until this issue is fixed.

Let me know what additional information I can provide to help resolve the issue.

Thanks,
Pete
From: Alex S. <ml...@os...> - 2013-03-20 10:39:05
On 03/19/2013 06:51 PM, Hans-Peter Jansen wrote:
> Thing is, the system freezes immediately after running (updating) smartd. rpm
> stopped at 89%, and the system was *locked*. While I can switch consoles, even
> Sys-Req S and U doesn't finish anymore, while B worked. :-(

Sounds scary. Please run some tests:

1) Test smartd built from a fresh svn checkout to make sure that local patches are not affecting the result.
2) Test whether the smartctl --scan and --scan-open commands work.
3) Try running smartd -d -r ioctl and send the resulting output.

I would also recommend making sure that the RAID firmware is up to date, because this could be caused by buggy firmware.
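Because the machine locks up hard, console output from these tests can be lost. The following is a hypothetical Python helper, not part of smartmontools, that runs the three suggested commands and writes their output to a log file, calling fsync after every line so that as much as possible reaches the disk before a freeze. The helper names, the log file name and the 120-second timeout (needed because smartd -d stays in the foreground) are assumptions for illustration only.

```python
import os
import subprocess
import threading

# Diagnostic commands suggested in the thread. smartd -d stays in the
# foreground, so run_and_log() kills it after a timeout (an assumption).
DIAG_COMMANDS = [
    ["smartctl", "--scan"],
    ["smartctl", "--scan-open"],
    ["smartd", "-d", "-r", "ioctl"],
]


def run_and_log(cmd, log, timeout=120.0):
    """Run cmd, copying each output line to log and forcing it to disk.

    Returns the exit status, or None if the program was not found.
    """
    log.write("### " + " ".join(cmd) + "\n")
    log.flush()
    os.fsync(log.fileno())
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
    except FileNotFoundError:
        log.write("(command not found)\n")
        return None
    # Kill the process if it is still running after `timeout` seconds;
    # this closes its stdout and ends the loop below.
    timer = threading.Timer(timeout, proc.kill)
    timer.start()
    try:
        for line in proc.stdout:
            log.write(line)
            log.flush()
            os.fsync(log.fileno())  # persist output promptly in case of a lockup
    finally:
        timer.cancel()
        proc.stdout.close()
    return proc.wait()


def run_all(path="smart-diag.log"):
    """Run every diagnostic command in order, appending to one log file."""
    with open(path, "a") as log:
        return [run_and_log(cmd, log) for cmd in DIAG_COMMANDS]
```

Run run_all() as root, ideally from rescue media so that no production file systems are mounted. Forcing each line to disk is slow, but it means the last lines written before a hang are the most useful hint about which ioctl triggered it.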
From: Stanislav B. <sb...@su...> - 2013-03-20 15:05:00
Alex Samorukov wrote:
> Also i would recommend to make sure that RAID firmware is up to date,
> because this could be caused by wrong firmware.

Well, thinking about it, smartmontools itself should never be able to crash the system completely. Only a kernel bug can, or a firmware bug that is not handled in the kernel.

S.M.A.R.T. over Areca is new in 6.1. Version 6.0 did not contain it, so the crash could not happen there.

--
Best Regards / S pozdravem,

Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.    e-mail: sb...@su...
Lihovarská 1060/12    tel: +49 911 7405384547
190 00 Praha 9    fax: +420 284 028 951
Czech Republic    http://www.suse.cz/
From: Hans-Peter J. <hp...@ur...> - 2013-03-20 16:00:20
Dear Alex,

On Wednesday, 20 March 2013 11:38:49 Alex Samorukov wrote:
> Sounds scary. Please do some tests:
>
> 1) Test on smartd from fresh svn to make sure that local patches are not
> affecting the result.
> 2) Test if smartctl --scan and --scan-open commands works.
> 3) try to run it with smartd -d -r ioctl and provide resulted output.

Will do that soon (hopefully I can run these tests in single-user mode).

> Also i would recommend to make sure that RAID firmware is up to date,
> because this could be caused by wrong firmware.

Honestly, I don't buy this argument. This behavior points to a serious *kernel* problem in this area: a system application, even one running as root, should never be able to lock up the system completely, no matter which firmware version the controller is running. But such issues must be analysed and fixed, of course.

This mail was intended both to warn users about the problem and to gather some hints on how to proceed.

Thanks for sharing the tests,
Pete
From: Alex S. <ml...@os...> - 2013-03-20 16:22:40
On 03/20/2013 04:59 PM, Hans-Peter Jansen wrote:
> Honestly, I don't buy this argument. This behavior points to a serious
> *kernel* problem in this area, since a system app, even running as root,
> should never ever be able to firmly lock the system, no matter what firmware
> version it is running. But such issues must be analysed and fixed, of course..

That is not the case. With root privileges you can lock up or crash the system in thousands of different ways. Unfortunately, buggy firmware can cause such issues as well: for example, if a device stops responding after receiving some legacy command, there is not much you can do at the software level. smartctl already contains some workarounds for such buggy devices, but to understand what's going on we need to gather more information.

In 6.1 some Areca-related code was changed, but as far as I can see you are not using -d areca in your configuration, so this seems to be a very different issue.

BTW, you may want to run these tests from a bootable USB stick or CD, so that the file systems are not mounted and the risk of file system damage is minimized.
From: Christian F. <Chr...@t-...> - 2013-03-21 22:20:21
Alex Samorukov wrote:
> BTW you may want to run this tests from bootable USB/CD to avoid
> mounting of the file systems and to minimize risk of the FS damage.

BTW, the current 20130321 snapshot of the ALT Linux Rescue CD already contains smartmontools 6.1 and may be useful for testing:
http://nightly.altlinux.org/sisyphus/flavours/rescue/

Thanks,
Christian
From: Alex S. <ml...@os...> - 2013-03-20 16:25:45
On 03/20/2013 04:04 PM, Stanislav Brabec wrote:
> S.M.A.R.T. over Areca is new in 6.1. Version 6.0 did not contain it, so
> the crash could not happen.

SMART over Areca is not new in 6.1, although it was extended. I don't see any indication that -d areca was in use, so this is most likely unrelated. SCSI support was also extended, so in theory some SCSI commands sent to a device exported by the Areca controller could be causing this. The "-r ioctl" argument should help to find the problem.
From: Christian F. <Chr...@t-...> - 2013-03-20 19:03:56
Alex Samorukov wrote:
> SMART over areca is not new in 6.1 (but extended). I don`t see any
> references that -d areca was in use, so most likely it is unrelated.
> Also SCSI support was extended, so in theory it is possible that some
> SCSI commands to areca exported device causing this. Argument "-r ioctl"
> should help to find the problem.

According to the original post, the problem occurs with DEVICESCAN in smartd.conf. On Linux, there is no '-d areca,' auto-detection, and DEVICESCAN does not scan /dev/sg*, which is required for Areca access. The Areca-related code is therefore likely not involved in this problem unless '-d areca' is used in smartd.conf.

If the problem is actually Areca-related, it might occur when smartd scans the device node (/dev/sdX or /dev/disk/disk*) of a *logical* drive emulated by the controller.

Some smartd 6.1 enhancements that might trigger driver bugs or similar:
- smartd now reads SCSI VPD page 0x80 (Unit Serial Number) if it is reported as supported in VPD page 0x00; see SCSIDeviceScan().
- If the -A (--attributelog) option is specified, smartd now reads various SCSI error counter log pages; see SCSICheckDevice().
- On Linux, DEVICESCAN now scans for drives behind MegaRAID controllers; see linux_smart_interface::get_dev_megasas().

Thanks,
Christian
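To make the first two enhancements concrete, here is a Python sketch of the SCSI commands involved. It is built from the SPC command layouts, not from smartd's source: a 6-byte INQUIRY with the EVPD bit set to fetch a VPD page, a check of the page 0x00 list for 0x80 before requesting the serial number, and a 10-byte LOG SENSE for an error counter page. The function names, the 252-byte allocation length and the choice of the write/read/verify counter pages (0x02, 0x03, 0x05) are illustrative assumptions. The sketch only builds and parses byte strings; it sends nothing to a device.

```python
import struct

INQUIRY = 0x12
LOG_SENSE = 0x4D
VPD_SUPPORTED_PAGES = 0x00
VPD_UNIT_SERIAL_NUMBER = 0x80
# Standard error counter log pages: write, read, verify (per SPC/SBC).
ERROR_COUNTER_PAGES = (0x02, 0x03, 0x05)


def inquiry_evpd_cdb(page_code, alloc_len=252):
    """Build a 6-byte INQUIRY CDB with EVPD=1 requesting one VPD page.

    Bytes 3-4 hold the allocation length (big-endian). Keeping it below
    256 leaves byte 3 zero, which older SPC-2 devices treat as reserved.
    """
    return struct.pack(">BBBHB", INQUIRY, 0x01, page_code, alloc_len, 0)


def supported_vpd_pages(resp):
    """Return the page codes listed in a VPD page 0x00 response."""
    if len(resp) < 4 or resp[1] != VPD_SUPPORTED_PAGES:
        return set()
    page_len = struct.unpack_from(">H", resp, 2)[0]
    return set(resp[4:4 + page_len])


def should_read_serial(page0_resp):
    """Only ask for page 0x80 if the device advertises it in page 0x00."""
    return VPD_UNIT_SERIAL_NUMBER in supported_vpd_pages(page0_resp)


def log_sense_cdb(page_code, alloc_len=252, pc=1):
    """Build a 10-byte LOG SENSE CDB; pc=1 selects current cumulative values."""
    return struct.pack(
        ">BBBBBHHB",
        LOG_SENSE,
        0x00,                            # SP=0: do not save parameters
        (pc << 6) | (page_code & 0x3F),  # page control + page code
        0x00,                            # subpage code
        0x00,                            # reserved
        0x0000,                          # parameter pointer
        alloc_len,                       # allocation length (bytes 7-8)
        0x00,                            # control
    )
```

For example, should_read_serial(bytes([0x00, 0x00, 0x00, 0x03, 0x00, 0x80, 0x83])) is true because the page list contains 0x80. On a controller that exports logical drives, each of these is an extra command its firmware must answer, which is why they are plausible suspects when narrowing down the hang with -r ioctl.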