From: Linda W. <lk...@tl...> - 2008-11-13 21:22:15
|
FYI -- ever since I switched to using SATA, I've not had a stable kernel. Sys uptime went from near infinite (excluding planned takedowns) to consistently less than a week.

I'd been using the Promise 300 TX4 (PDC40718, rev 02) with 1-2 Seagate drives. Finally, an explicit problem with that controller under Linux -- it timed out a drive returning from suspend during 'SMART' operations -- got a suggestion from the community (Tnx, Tejun Heo) to try a _cheaper_ but better-featured Silicon Image controller (SiI 3124 SATA).

Not only did it NOT have the SMART problem (that would hang the drive or machine), but my random hangs seem to have gone away. My main server has been up nearly 21 days now on 2.6.27-3 SMP (vanilla-i386). I'd had problems with kernels going back to 2.6.24 or so, when I had first tried adding SATA to the system.

So Tnx again to Tejun -- and NOTE: the card or driver (or both) for the Promise 300 TX4 isn't stable for production use -- and has a repeatable problem of timing out some drives before they can spin up from standby (just the drive -- not the computer). The error logically removes the drive from the system until the next boot (unplugging and replugging the SATA cable on the drive would hang the machine within 5 seconds of replugging the cable). Not an instant hang, as might indicate a HW upset from plugging in the cable, but a couple-second delay after plug-in before the keyboard would lock up -- pointing toward the software trying to re-add+initialize the drive.

Needless to say, I'm only using the SiI controller now, and things are stable. |
From: Mikael P. <mi...@it...> - 2008-11-16 11:08:22
|
Tejun Heo writes:
> > and NOTE: the card or driver (or both) for the Promise 300 TX4 isn't
> > stable for production use -- and has a repeatable problem of timing out
> > some drives before it can spin-up from standby (just the drive -- not the
> > computer). The error logically removes the drive from the system until
> > the next boot (unplugging, and replugging in the SATA cable on the drive
> > would hang the machine within 5 seconds of replugging in the cable). Not
> > an instant, hang as might indicated a HW upset plugging in cable, but a
> > couple second delay after plugin -- before keyboard would lock up --
> > pointing toward the software trying to re-add+initialize the drive.
>
> Some promise controllers seem to suffer transmission problems when
> combined with certain drives, which often show up as timeouts. The
> hardreset of sata_promise wasn't as robust as it should have been and
> in some cases it wasn't able to recover a link after error condition
> causing the system to lose drive after such events. The hardreset
> problem was fixed recently by Mikael Pettersson. Can you please try
> 2.6.28-rc5 and see whether sata_promise still loses drives after
> failures?
>
> Mikael, I think the hardreset fix is worthy including into -stable.
> It should be safe for -stable too, right?

The hardreset fix was included in 2.6.27.5. I wanted it in 2.6.26-stable
too, but that branch seems to have been closed now. |
From: Brad C. <br...@wa...> - 2008-11-16 18:08:40
|
Mikael Pettersson wrote:
> Tejun Heo writes:
> > > and NOTE: the card or driver (or both) for the Promise 300 TX4 isn't
> > > stable for production use -- and has a repeatable problem of timing out
> > > some drives before it can spin-up from standby (just the drive -- not the
> > > computer). The error logically removes the drive from the system until
> > > the next boot (unplugging, and replugging in the SATA cable on the drive
> > > would hang the machine within 5 seconds of replugging in the cable). Not
> > > an instant, hang as might indicated a HW upset plugging in cable, but a
> > > couple second delay after plugin -- before keyboard would lock up --
> > > pointing toward the software trying to re-add+initialize the drive.
> >
> > Some promise controllers seem to suffer transmission problems when
> > combined with certain drives, which often show up as timeouts. The
> > hardreset of sata_promise wasn't as robust as it should have been and
> > in some cases it wasn't able to recover a link after error condition
> > causing the system to lose drive after such events. The hardreset
> > problem was fixed recently by Mikael Pettersson. Can you please try
> > 2.6.28-rc5 and see whether sata_promise still loses drives after
> > failures?
> >
> > Mikael, I think the hardreset fix is worthy including into -stable.
> > It should be safe for -stable too, right?
>
> The hardreset fix was included in 2.6.27.5. I wanted it in 2.6.26-stable
> too, but that branch seems to have been closed now.

Is that likely to do anything for the old SATA150-TX4? I have 2 of them in a machine and I've been dropping drives under write load recently, but that was on a 2.6.27.4 kernel. A reboot is required to pick up the drives again (unless the kernel panics and it reboots itself, which it's also been doing).

Brad
--
Dolphins are so intelligent that within a few weeks they can train Americans to stand at the edge of the pool and throw them fish. |