From: Mark N. <ma...@ti...> - 2004-04-16 17:17:12
|
On 15 Apr 2004 at 10:35, David K. wrote:

> Have others had issues similar to this when the disk subsystem is
> under stress?? I would really hate to disable smart while running
> the disk tests of ctcs. Any thoughts??

Yes, I get this error every once in a while on a RedHat 7.3 system, when the drive is being backed up to tape. But not every time, and usually only once or twice during an hour-long backup. It was discussed (without resolution) on this list in a thread starting on March 9, with the title "timeout waiting for DMA".

--
Mark W. Nienberg, SE
Tipping Mar + associates
1906 Shattuck Ave, Berkeley, CA 94704
visit our website at http://www.tippingmar.com
|
From: David K. <dab...@ex...> - 2004-04-21 17:15:24
|
--- On Thu 04/15, Bruce Allen < ba...@gr... > wrote:

>From: Bruce Allen [mailto: ba...@gr...]
>To: dab...@ex...
>Cc: B.Z...@el..., sma...@li...
>Date: Thu, 15 Apr 2004 09:52:40 -0500 (CDT)
>Subject: Re: [smartmontools-support]dma_timer_expiry while running self tests

>I think that this *may* be a kernel driver bug. Are you using a stock RH
>kernel (currently 2.4.20-30.9) or something else?

I am running the stock RH kernel 2.4.20.30.9bigmem.

>Cheers,
> Bruce

>PS: I assume that your self-tests are NOT being done in 'captive' mode.
>If they are, that can busy out the drive and will probably cause an IDE
>bus reset like you are seeing.

I have seen this if I use the -s option to schedule them in smartd.conf or if I call the tests from a script like this:

/usr/sbin/smartctl -t short $drive

At this point, I'm just disabling smart self tests while ctcs is running. It's not optimal, but it avoids having to explain spurious errors to manufacturing technicians :^)

If there is something I can do to help get this really fixed, please let me know.

Thank you.

David
|
From: Bruce A. <ba...@gr...> - 2004-04-21 18:44:46
|
Hi David,

> >Subject: Re: [smartmontools-support]dma_timer_expiry while running self tests
>
> >I think that this *may* be a kernel driver bug. Are you using a stock RH
> >kernel (currently 2.4.20-30.9) or something else?
>
> I am running the stock RH kernel 2.4.20.30.9bigmem.
>
> >PS: I assume that your self-tests are NOT being done in 'captive' mode.
> >If they are, that can busy out the drive and will probably cause an IDE
> >bus reset like you are seeing.
>
> I have seen this if I use -s option to schedule them in smartd.conf or
> if I call the tests from a script like this:
> /usr/sbin/smartctl -t short $drive
>
> At this point, I'm just disabling smart self tests while ctcs is
> running. It's not optimal, but it avoids having to explain spurious
> errors to manufacturing technicians :^)
>
> If there is something I can do to help get this really fixed, please
> let me know.

At the moment, my best guess is that this is a kernel bug that shows up when the system is under heavy load. So strictly speaking I don't think it's related to SMART -- I think perhaps similar time-outs would be observed if the disk were just under heavy load.

Anyway, I'll let you know if I learn more.

Bruce
|
From: Bruce A. <ba...@gr...> - 2004-04-16 19:53:56
|
> On 15 Apr 2004 at 10:35, David K. wrote:
> > Have others had issues similar to this when the disk subsystem is
> > under stress?? I would really hate to disable smart while running
> > the disk tests of ctcs. Any thoughts??
>
> Yes, I get this error every once in a while on a RedHat 7.3 system, when the drive is
> being backed up to tape. But not every time. And usually only once or twice during
> an hour long backup. It was discussed (without resolution)on this list in a thread
> starting on March 9, with the title "timeout waiting for DMA".

Mark, thanks for the reminder -- I forgot about that discussion.

When you see the errors, do they appear in any way correlated with smartmontools? For example, is smartd accessing the drive? (Note: if smartd is running with default command-line options, it will check exactly on 30-min intervals. So you can take the timestamp of the 'nearest' syslog entry from smartd and determine whether it was doing a check at the time when you saw the timeout error.)

Cheers,
Bruce
|
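The timestamp check suggested above can be sketched in a few lines of Python. This is only an illustration, not part of smartmontools: the helper name seconds_from_nearest_check is made up here, the 1800-second interval is the 30-minute default mentioned in the message, and the two sample timestamps are taken from the log segment posted elsewhere in this thread. It assumes that checks recur at exact multiples of the interval from any one known smartd log entry.

```python
from datetime import datetime

# Assumption: smartd polls every 30 minutes (its default, per the message above).
CHECK_INTERVAL = 1800  # seconds


def seconds_from_nearest_check(reference, event, interval=CHECK_INTERVAL):
    """Return the distance in seconds from `event` to the closest expected check.

    `reference` is the time of any known smartd syslog entry; checks are
    assumed to recur at exact multiples of `interval` from that time.
    """
    offset = (event - reference).total_seconds() % interval
    return min(offset, interval - offset)


# Sample values from the log segment posted in this thread.
reference = datetime(2004, 4, 7, 16, 10, 33)   # a smartd entry
dma_error = datetime(2004, 4, 7, 22, 21, 50)   # a dma_timer_expiry entry

gap = seconds_from_nearest_check(reference, dma_error)
print(f"DMA timeout was {gap:.0f} s away from the nearest expected smartd check")
```

A gap of a few seconds would suggest that smartd was polling the drive when the timeout occurred; a gap of many minutes would suggest it was not.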
From: Mark N. <ma...@ti...> - 2004-04-16 21:04:06
|
Here is a log segment that I think disproves my notion that smartmontools was somehow causing this problem. The 30 minute increments don't seem to occur at the same time as the DMA errors.

Apr 7 16:10:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 2 Throughput_Performance changed from 100 to 146
Apr 7 16:10:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 142 to 100
Apr 7 16:40:34 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 100 to 142

Apr 7 22:21:50 gingham kernel: hda: dma_timer_expiry: dma status == 0x20
Apr 7 22:21:50 gingham kernel: hda: timeout waiting for DMA
Apr 7 22:21:50 gingham kernel: hda: timeout waiting for DMA
Apr 7 22:21:50 gingham kernel: hda: (__ide_dma_test_irq) called while not waiting
Apr 7 22:21:50 gingham kernel: hda: status timeout: status=0xd0 { Busy }
Apr 7 22:21:50 gingham kernel:
Apr 7 22:21:50 gingham kernel: hda: drive not ready for command
Apr 7 22:21:50 gingham kernel: ide0: reset: success

Apr 7 22:24:33 gingham kernel: hda: dma_timer_expiry: dma status == 0x21
Apr 7 22:24:33 gingham kernel: hda: error waiting for DMA
Apr 7 22:24:33 gingham kernel: hda: dma timeout retry: status=0xd0 { Busy }
Apr 7 22:24:33 gingham kernel:
Apr 7 22:24:33 gingham kernel: hda: DMA disabled
Apr 7 22:24:33 gingham kernel: ide0: reset: success

Apr 7 22:40:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
Apr 7 23:10:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 98
Apr 8 00:10:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 98 to 99
Apr 8 01:40:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 100
Apr 8 02:10:34 gingham smartd[790]: Device: /dev/hda, starting scheduled Short Self-Test.

On 16 Apr 2004 at 14:53, Bruce Allen wrote:

> When you see the errors, does it appear in any way correlated with
> smartmontools? For example is smartd accessing the drive? (Note: if
> smartd is running with default command-line options, it will check
> exactly on 30-min intervals. So you can take the timestamp of the
> 'nearest' syslog entry from smartd and determine whether it was doing
> a check at the time when you saw the timeout error.
>
> Cheers,
> Bruce

--
Mark W. Nienberg, SE
Tipping Mar + associates
1906 Shattuck Ave, Berkeley, CA 94704
visit our website at http://www.tippingmar.com
|
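For longer logs, the comparison made by eye above could be automated roughly as follows. This is a sketch under stated assumptions, not a smartmontools feature: the names LOG_YEAR, LINE_RE, parse and nearest_gap are invented for illustration, the year is assumed to be 2004 because syslog omits it, and month abbreviations are assumed to be English. The sample lines are copied from the log segment above. Note that smartd appears to log only when an attribute changes, so the nearest logged entry is not necessarily the nearest check.

```python
import re
from datetime import datetime

# Assumption: syslog lines carry no year; 2004 is taken from the message dates.
LOG_YEAR = 2004

# Matches "Apr 7 22:21:50 <host> smartd[...]" or "... kernel:" at line start.
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+(smartd|kernel)\b")


def parse(lines):
    """Return (smartd_times, dma_times) extracted from syslog-style lines."""
    smartd_times, dma_times = [], []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = " ".join(match.group(1).split())  # normalise padding spaces
        when = datetime.strptime(f"{LOG_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
        if match.group(2) == "smartd":
            smartd_times.append(when)
        elif "dma_timer_expiry" in line:
            dma_times.append(when)
    return smartd_times, dma_times


def nearest_gap(event, checks):
    """Seconds between `event` and the closest timestamp in `checks`."""
    return min(abs((event - check).total_seconds()) for check in checks)


SAMPLE = [
    "Apr 7 16:40:34 gingham smartd[790]: Device: /dev/hda, SMART Prefailure "
    "Attribute: 8 Seek_Time_Performance changed from 100 to 142",
    "Apr 7 22:21:50 gingham kernel: hda: dma_timer_expiry: dma status == 0x20",
    "Apr 7 22:24:33 gingham kernel: hda: dma_timer_expiry: dma status == 0x21",
    "Apr 7 22:40:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure "
    "Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99",
]

smartd_times, dma_times = parse(SAMPLE)
gaps = [nearest_gap(event, smartd_times) for event in dma_times]
for event, gap in zip(dma_times, gaps):
    print(f"{event:%b %d %H:%M:%S}: {gap:.0f} s from nearest smartd entry")
```

In this sample both DMA timeouts fall many minutes away from the closest smartd entry, which is consistent with the conclusion drawn from the log.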
From: Bruce A. <ba...@gr...> - 2004-04-16 21:51:13
|
Hi Mark,

Thanks. I agree that it appears at first glance as if smartd/smartmontools is NOT the culprit here.

Cheers,
Bruce

On Fri, 16 Apr 2004, Mark Nienberg wrote:

> Here is a log segment that I think disproves my notion that smartmontools was
> somehow causing this problem. The 30 minute increments don't seem to occur at
> the same time as the DMA errors.
>
> Apr 7 16:10:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 2 Throughput_Performance changed from 100 to 146
> Apr 7 16:10:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 142 to 100
> Apr 7 16:40:34 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 100 to 142
>
> Apr 7 22:21:50 gingham kernel: hda: dma_timer_expiry: dma status == 0x20
> Apr 7 22:21:50 gingham kernel: hda: timeout waiting for DMA
> Apr 7 22:21:50 gingham kernel: hda: timeout waiting for DMA
> Apr 7 22:21:50 gingham kernel: hda: (__ide_dma_test_irq) called while not waiting
> Apr 7 22:21:50 gingham kernel: hda: status timeout: status=0xd0 { Busy }
> Apr 7 22:21:50 gingham kernel:
> Apr 7 22:21:50 gingham kernel: hda: drive not ready for command
> Apr 7 22:21:50 gingham kernel: ide0: reset: success
>
> Apr 7 22:24:33 gingham kernel: hda: dma_timer_expiry: dma status == 0x21
> Apr 7 22:24:33 gingham kernel: hda: error waiting for DMA
> Apr 7 22:24:33 gingham kernel: hda: dma timeout retry: status=0xd0 { Busy }
> Apr 7 22:24:33 gingham kernel:
> Apr 7 22:24:33 gingham kernel: hda: DMA disabled
> Apr 7 22:24:33 gingham kernel: ide0: reset: success
>
> Apr 7 22:40:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 100 to 99
> Apr 7 23:10:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 98
> Apr 8 00:10:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 98 to 99
> Apr 8 01:40:33 gingham smartd[790]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 99 to 100
> Apr 8 02:10:34 gingham smartd[790]: Device: /dev/hda, starting scheduled Short Self-Test.
>
> On 16 Apr 2004 at 14:53, Bruce Allen wrote:
> > When you see the errors, does it appear in any way correlated with
> > smartmontools? For example is smartd accessing the drive? (Note: if
> > smartd is running with default command-line options, it will check
> > exactly on 30-min intervals. So you can take the timestamp of the
> > 'nearest' syslog entry from smartd and determine whether it was doing
> > a check at the time when you saw the timeout error.
> >
> > Cheers,
> > Bruce
>
> --
> Mark W. Nienberg, SE
> Tipping Mar + associates
> 1906 Shattuck Ave, Berkeley, CA 94704
> visit our website at http://www.tippingmar.com
|