From: Tomáš S. <tsm...@re...> - 2008-04-07 07:34:26
|
Hello,

I have a bugreport that I don't know what to do with: the user has Samsung HD161HJ disks connected to the Promise PDC40718 (SATA 300 TX4, rev 02) controller and smartd seems to lock up the machine completely.

I know there are some problems with Promise controllers reported. Is this one of the known issues? Or is this a problem of the disks? (I have neither of those to test anything myself.)

The bugreport:
https://bugzilla.redhat.com/show_bug.cgi?id=436314

Thanks in advance for any response.

--
Tomáš Smetana
Base OS Software Engineer, Red Hat
RH IRC: #brno #devel #base-os; Freenode IRC: #fedora-devel |
From: Bruce A. <ba...@gr...> - 2008-04-07 08:24:49
|
Hi Tomáš,

I think we need to get someone who works on the Promise controller/driver to help with this. But I have no idea who to approach. Any ideas?

Tejun, any suggestions about this? I don't even know if the controller works through libata or has its own interface to the kernel.

Cheers,
Bruce

On Mon, 7 Apr 2008, Tomáš Smetana wrote:

> Hello,
> I have a bugreport that I don't know what to do with: the user has Samsung
> HD161HJ disks connected to the Promise PDC40718 (SATA 300 TX4, rev 02)
> controller and smartd seems to lock up the machine completely.
>
> I know there are some problems with Promise controllers reported. Is this one
> of the known issues? Or is this a problem of the disks? (I have neither of
> those to test anything myself.)
>
> The bugreport:
> https://bugzilla.redhat.com/show_bug.cgi?id=436314
>
> Thanks in advance for any response. |
From: Tejun H. <ht...@gm...> - 2008-04-07 11:39:27
|
Bruce Allen wrote:
> Hi Tomáš,
>
> I think we need to get someone who works on the Promise
> controller/driver to help with this. But I have no idea who to
> approach. Any ideas?
>
> Tejun, any suggestions about this? I don't even know if the controller
> works through libata or has its own interface to the kernel.

Mar 25 18:11:24 bofferding-pcfe smartd[9845]: Device: /dev/sda, starting scheduled Offline Immediate Test.
Mar 25 18:11:24 bofferding-pcfe smartd[9845]: Device: /dev/sdb, starting scheduled Offline Immediate Test.
Mar 25 18:11:24 bofferding-pcfe smartd[9845]: Device: /dev/sdc, starting scheduled Offline Immediate Test.

smartd is starting offline immediate test while the disk is still mounted. So the disk is basically offline and libata reacts that way. Am I missing something?

--
tejun |
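For anyone triaging a similar report, the smartd lines quoted above can be pulled out of a syslog mechanically. The following is a minimal Python sketch, not part of smartmontools: the regular expression is an assumption derived only from the three sample lines in this thread, and the names `SMARTD_TEST_LINE` and `scheduled_tests` are hypothetical.

```python
import re

# Assumed layout, inferred only from the three log lines quoted in this
# thread: "... smartd[PID]: Device: DEV, starting scheduled NAME Test."
SMARTD_TEST_LINE = re.compile(
    r"smartd\[(?P<pid>\d+)\]: Device: (?P<device>\S+), "
    r"starting scheduled (?P<test>.+?) Test\."
)


def scheduled_tests(log_lines):
    """Return a (device, test name) pair for each scheduled-test line."""
    found = []
    for line in log_lines:
        match = SMARTD_TEST_LINE.search(line)
        if match:
            found.append((match.group("device"), match.group("test")))
    return found


# The three lines from the bug report, unwrapped.
SAMPLE = [
    "Mar 25 18:11:24 bofferding-pcfe smartd[9845]: Device: /dev/sda, "
    "starting scheduled Offline Immediate Test.",
    "Mar 25 18:11:24 bofferding-pcfe smartd[9845]: Device: /dev/sdb, "
    "starting scheduled Offline Immediate Test.",
    "Mar 25 18:11:24 bofferding-pcfe smartd[9845]: Device: /dev/sdc, "
    "starting scheduled Offline Immediate Test.",
]

if __name__ == "__main__":
    for device, test in scheduled_tests(SAMPLE):
        print(device, test)
```

Seeing the same test start on all three disks in the same second is the detail worth noticing here: whatever the root cause, every disk behind the controller entered the routine at once.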
From: Tomáš S. <tsm...@re...> - 2008-04-07 12:28:00
|
On Mon, 07 Apr 2008 20:39:17 +0900 Tejun Heo <ht...@gm...> wrote:

> Mar 25 18:11:24 bofferding-pcfe smartd[9845]: Device: /dev/sda, starting
> scheduled Offline Immediate Test.
> Mar 25 18:11:24 bofferding-pcfe smartd[9845]: Device: /dev/sdb, starting
> scheduled Offline Immediate Test.
> Mar 25 18:11:24 bofferding-pcfe smartd[9845]: Device: /dev/sdc, starting
> scheduled Offline Immediate Test.
>
> smartd is starting offline immediate test while the disk is still
> mounted. So the disk is basically offline and libata reacts that way.
> Am I missing something?

Thank you. This makes sense. I had actually seen those lines, but I thought the offline test shouldn't have any effect on the controller (the manual page says "This command can be given during normal system operation.", which I apparently misunderstood).

--
Tomáš Smetana
Base OS Software Engineer, Red Hat
RH IRC: #brno #devel #base-os; Freenode IRC: #fedora-devel |
From: Bruce A. <ba...@gr...> - 2008-04-07 13:39:06
|
Hi Tejun,

>> I think we need to get someone who works on the Promise controller/driver
>> to help with this. But I have no idea who to approach. Any ideas?
>>
>> Tejun, any suggestions about this? I don't even know if the controller
>> works through libata or has its own interface to the kernel.
>
> Mar 25 18:11:24 bofferding-pcfe smartd[9845]: Device: /dev/sda, starting
> scheduled Offline Immediate Test.
> Mar 25 18:11:24 bofferding-pcfe smartd[9845]: Device: /dev/sdb, starting
> scheduled Offline Immediate Test.
> Mar 25 18:11:24 bofferding-pcfe smartd[9845]: Device: /dev/sdc, starting
> scheduled Offline Immediate Test.
>
> smartd is starting offline immediate test while the disk is still
> mounted. So the disk is basically offline and libata reacts that way. Am
> I missing something?

This is not what offline means: I think you are confusing 'offline' with 'captive'. In the ATA specs, there are three types of tests listed:

- online
- offline
- captive

From ATA 5 specs:

6.14.2
On-line data collection
Collection of SMART data in an on-line mode shall have no impact on device performance. The SMART data that is collected or the methods by which data is collected in this mode may be different than those in the off-line data collection mode for any particular device and may vary from one device to another.

6.14.3
Off-line data collection
The device shall use off-line mode for data collection and self-test routines that have an impact on performance if the device is required to respond to commands from the host while performing that data collection. This impact on performance may vary from device to device. The data that is collected or the methods by which the data is collected in this mode may be different than those in the on-line data collection mode for any particular device and may vary from one device to another.

Here is from 8.41.4.8.1 section C:
If the device is in the process of performing the subcommand routine and is interrupted by any new command from the host except a SLEEP, SMART DISABLE OPERATIONS, SMART EXECUTE OFF-LINE IMMEDIATE, or STANDBY IMMEDIATE command, the device shall suspend or abort the subcommand routine and service the host within two seconds after receipt of the new command. After servicing the interrupting command from the host the device may immediately re-initiate or resume the subcommand routine without any additional commands from the host (see 8.41.5.8.4).

The only mode that should block the device is Captive. Here is what the specs say:

8.41.4.8.2
Captive mode
When executing a self-test in captive mode, the device sets BSY to one and executes the self-test routine after receipt of the command. At the end of the routine the device places the results of this routine in the Self-test execution status byte and executes command completion. If an error occurs while a device is performing the routine the device may discontinue its testing, place the results of this routine in the Self-test execution status byte, and complete the command.

I can try and dig the corresponding info out of the SATA specs, but thought that this would be enough to clarify it.

Cheers,
Bruce |
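The off-line versus captive distinction described above can be written down as a small lookup table. This is only an illustrative Python sketch: the numeric subcommand values for SMART EXECUTE OFF-LINE IMMEDIATE are not given anywhere in this thread and are recalled from the ATA-5 subcommand table, so they should be checked against the specification before being relied on. The names `Mode`, `SUBCOMMANDS`, and `host_may_be_blocked` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    OFFLINE = "off-line"
    CAPTIVE = "captive"


# ASSUMPTION: subcommand values recalled from the ATA-5 table for
# SMART EXECUTE OFF-LINE IMMEDIATE; they are not quoted in this thread.
SUBCOMMANDS = {
    0: ("off-line routine", Mode.OFFLINE),
    1: ("short self-test", Mode.OFFLINE),
    2: ("extended self-test", Mode.OFFLINE),
    129: ("short self-test", Mode.CAPTIVE),
    130: ("extended self-test", Mode.CAPTIVE),
}

# From the quoted 8.41.4.8.1 text: in off-line mode the device shall
# service an interrupting host command within two seconds.
OFFLINE_SERVICE_LIMIT_SECONDS = 2


def host_may_be_blocked(subcommand):
    """True if the spec lets the device hold BSY for the whole routine."""
    _name, mode = SUBCOMMANDS[subcommand]
    return mode is Mode.CAPTIVE


if __name__ == "__main__":
    for value, (name, mode) in sorted(SUBCOMMANDS.items()):
        print(value, name, mode.value, host_may_be_blocked(value))
```

Under this reading, the "Offline Immediate Test" that smartd scheduled corresponds to an off-line mode routine, so a drive that stops answering the host for longer than two seconds is not behaving as the quoted text requires.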
From: Tejun H. <ht...@gm...> - 2008-04-07 14:10:24
|
Hello, Bruce.

Bruce Allen wrote:
> This is not what offline means: I think you are confusing 'offline' with
> 'captive'. In the ATA specs, there are three types of tests listed:
>
> - online
> - offline
> - captive

Ah.. right. I got confused.

> From ATA 5 specs:
>
> 6.14.2
> On-line data collection
> Collection of SMART data in an on-line mode shall have no impact on
> device performance. The SMART data that is collected or the methods by
> which data is collected in this mode may be different than those in the
> off-line data collection mode for any particular device and may vary
> from one device to another.
>
> 6.14.3
> Off-line data collection
> The device shall use off-line mode for data collection and self-test
> routines that have an impact on performance if the device is required to
> respond to commands from the host while performing that data collection.
> This impact on performance may vary from device to device. The data
> that is collected or the methods by which the data is collected in this
> mode may be different than those in the on-line data collection mode for
> any particular device and may vary from one device to another.
>
> Here is from 8.41.4.8.1 section C:
> If the device is in the process of performing the subcommand routine and
> is interrupted by any new command from the host except a SLEEP, SMART
> DISABLE OPERATIONS, SMART EXECUTE OFF-LINE IMMEDIATE, or STANDBY
> IMMEDIATE command, the device shall suspend or abort the subcommand
> routine and service the host within two seconds after receipt of the new
> command. After servicing the interrupting command from the host the
> device may immediately re-initiate or resume the subcommand routine
> without any additional commands from the host (see 8.41.5.8.4).
>
>
> The only mode that should block the device is Captive. Here is what the
> specs say:
>
> 8.41.4.8.2
> Captive mode
> When executing a self-test in captive mode, the device sets BSY to one
> and executes the self-test routine after receipt of the command. At the
> end of the routine the device places the results of this routine in the
> Self-test execution status byte and executes command completion. If an
> error occurs while a device is performing the routine the device may
> discontinue its testing, place the results of this routine in the
> Self-test execution status byte, and complete the command.
>
> I can try and dig the corresponding info out of the SATA specs, but
> thought that this would be enough to clarify it.

I think the above should be enough. This doesn't really look like it has anything to do with the host controller. It looks like the drive just doesn't respond to anything once it begins offline testing. Tomáš, is it possible for you to hook up the drive to a different controller and see whether you can reproduce the problem there?

Thanks.

--
tejun |
From: Tomáš S. <tsm...@re...> - 2008-04-07 15:43:30
|
On Mon, 07 Apr 2008 22:51:14 +0900 Tejun Heo <ht...@gm...> wrote:

> I think the above should be enough. This doesn't really look like it
> has anything to do with the host controller. It looks like the drive
> just doesn't respond to anything once it begins offline testing. Tomáš,
> is it possible for you to hook up the drive to a different controller
> and see whether you can reproduce the problem there?

Unfortunately no. I brought the problem here because I don't have the particular hardware to run the tests myself. Based on your previous mail, I suggested that the reporter turn off the automatic offline testing, which may show something. I will wait for the result and then possibly ask him to do some more experiments.

Thanks and regards.

--
Tomáš Smetana
Base OS Software Engineer, Red Hat
RH IRC: #brno #devel #base-os; Freenode IRC: #fedora-devel |