Redeeman wrote:
On Wed, 2008-12-10 at 19:11 -0500, Bill Davidsen wrote:
  
Justin Piszcz wrote:
    
Point of thread: two problems, described in detail below. One, NCQ in 
Linux when used in a RAID configuration. Two, something in how Linux 
interacts with the drives causes lots of errors, yet when I run the WD 
tools on the disks, they do not show any errors.

If anyone has, or would like me to run, any debugging/patches/etc. on 
this system, feel free to suggest or send me things to try out.  After 
I put the VR's in a test system, I left NCQ enabled and made a 10-disk 
raid5 to see how fast I could get it to fail. I ran bonnie++, shown 
below, as a disk benchmark/stress test:

For the next test I will repeat this one but with NCQ disabled; having 
NCQ enabled makes it fail very easily.  Then I want to re-run the test 
with RAID6.
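
For the NCQ-disabled rerun, a minimal sketch of one common way to turn NCQ off per disk: on libata-driven SATA disks, writing 1 to the disk's queue_depth file in sysfs limits it to one outstanding command. The function name disable_ncq and the SYSFS_ROOT variable are assumptions added for illustration (SYSFS_ROOT only exists so the function can be tried against a scratch directory), and the disk names in the example are placeholders, not the members of the array in this thread.

```shell
#!/bin/sh
# Sketch: set the queue depth of each named disk to 1, which turns NCQ off
# for libata-driven SATA disks. SYSFS_ROOT is an assumption added so the
# function can be pointed at a scratch directory; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

disable_ncq() {
    for disk in "$@"; do
        qd="$SYSFS_ROOT/block/$disk/device/queue_depth"
        if [ -w "$qd" ]; then
            # A depth of 1 means at most one command in flight, i.e. no NCQ.
            echo 1 > "$qd"
            echo "$disk: queue_depth now $(cat "$qd")"
        else
            echo "$disk: $qd not writable, skipped" >&2
        fi
    done
}

# Example (as root; disk names are placeholders):
#   disable_ncq sdb sdc sdd
```

The setting does not survive a reboot, so it would have to be reapplied before each test run.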

bonnie++ -d /r1/test -s 1000G -m p63 -n 16:100000:16:64

$ df -h
/dev/md3              2.5T  5.5M  2.5T   1% /r1

And the results?  Two disk "failures" according to md/Linux within a 
few hours as shown below:

Note, the NCQ-related errors are what I talk about all of the time: if 
you use NCQ and Linux in a RAID environment with WD drives, well, good 
luck.

Two disks failed out of the RAID5 and I currently cannot even 'see' 
one of the drives with smartctl; I will reboot the host and check sde 
again.

After a reboot, it comes up and has no errors, which really makes one 
wonder where/what the bug(s) are. There are two I can see:
1. NCQ issue on at least WD drives in Linux in SW md/RAID
2. Velociraptor/other disks reporting all kinds of sector errors etc., 
but when you use the WD 11.x disk tools program and run all of its 
tests, it says the disks have no problems whatsoever!  The SMART 
statistics do confirm this.  Currently, TLER is on for all disks, for 
the duration of these tests.
      
Just a few comments on this: I have several RAID arrays built on Seagate 
using NCQ, and have yet to have a problem. I have NCQ on with my WD drives, 
non-RAID, and haven't had an issue with them either. The WDs run a lot 
cooler than the SG, but they are probably getting less use, as well. If 
the WD are still on sale after the holiday I may grab a few more and run 
RAID; by then I will have some small sense of trusting them.
    
Velociraptors, or which WD?
  

Calls itself "WDC WD10EACS-00D" in /sys if that helps. I could dig out the packing slip if it matters. Runs nicely so far, and if the SMART temperature probe is correct, very cool:
/dev/sda: ST3750640AS: 43 C
/dev/sdb: WDC WD10EACS-00D6B1: 31 C
/dev/sdc: ST3750640AS: 44 C
/dev/sdd: ST3750640AS: 46 C
I don't totally trust the temps; there is a LOT of 18C air going into that box, because it has a lot coming out the back and side.
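
One way to cross-check temperatures like those listed above is to read them straight from the SMART attribute table. This is a rough sketch, assuming smartmontools is installed and that the drive reports its temperature as attribute 194 (Temperature_Celsius), which many but not all drives do. parse_temp is a hypothetical helper name, and the device glob in the example is a placeholder.

```shell
#!/bin/sh
# Sketch: pull the raw value of SMART attribute 194 (Temperature_Celsius)
# out of `smartctl -A` output read from stdin. In that table the attribute
# ID is the first column and the raw value starts at the tenth.
parse_temp() {
    awk '$1 == 194 { print $10; exit }'
}

# Example (as root; needs smartmontools, device names are placeholders):
#   for d in /dev/sd[a-d]; do
#       printf '%s: %s C\n' "$d" "$(smartctl -A "$d" | parse_temp)"
#   done
```

If a drive does not expose attribute 194, the helper prints nothing, so an empty value means "not reported" rather than a reading.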

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark