From: David M. <ma...@ca...> - 2006-07-31 17:20:24
We have 4 Maxtor ATLAS10K5_147SCA Version JNZH disks on a Sun Ultra320 dual channel SCSI controller. All of them log tons of read: (Errors Corrected by ECC, fast) events. The current range on these disks is 118K to 1.3M such events. Lifetime is around 5700 hours. I've never seen any type of disk do this, and in fact none of the other disks on this system (Fujitsu FC disks) or on any of my other systems list anything but zero in this column. If it was just one disk I'd think it was failing, but since all 4 of them do it I'm wondering if this isn't just how these disks work normally.

One of these disks did recently have two consecutive bad blocks appear, but that may have been due to the OS not parking the heads properly on a full power shutdown, since these appeared immediately after a power off for service. Anyway, except for that one disk's two bad blocks, and these corrected read events, there have been no other errors. They pass the short and long offline tests.

Should I worry about these many, many corrected read events?

After my signature is a smartctl -a log for one of the disks that didn't have any bad blocks.

Thanks,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech

# smartctl -a /dev/rdsk/c4t1d0s0
smartctl version 5.36 [sparc-sun-solaris2.8] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: MAXTOR ATLAS10K5_147SCA Version: JNZH
Serial number: D40R9MYK
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Mon Jul 31 10:17:07 2006 PDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature: 30 C
Manufactured in week 05 of year
Current start stop count: 1074003968 times
Recommended maximum start stop count: 1124401151 times
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:     118729        0         0         0          0       2366.645           0
write:         0        0         0         0          0        521.228           0

Non-medium error count: 48

Last n error events log page

SMART Self-test log
Num  Test              Status      segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                   number   (hours)
# 1  Background short  Completed      -       5858          -       [-   -    -]
# 2  Background long   Completed      -       5836          -       [-   -    -]
# 3  Background long   Completed      -       5811          -       [-   -    -]
# 4  Background short  Completed      -       5809          -       [-   -    -]
# 5  Background long   Completed      -        308          -       [-   -    -]
# 6  Background short  Completed      -        308          -       [-   -    -]

Long (extended) Self Test duration: 2880 seconds [48.0 minutes]
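To put the raw counter in proportion, the figures in the "read:" row above can be normalized against the volume of data read and the power-on hours. The following is only a rough sketch: the numbers are copied from the report above, the function names are made up for illustration, and the self-test LifeTime of 5858 hours is assumed to be a fair stand-in for the disk's age. It says nothing about what rate a vendor considers normal.

```python
# Figures copied from the smartctl report above (disk c4t1d0s0).
ecc_fast_corrections = 118_729   # read: "Errors Corrected by ECC, fast"
gigabytes_read = 2366.645        # read: "Gigabytes processed [10^9 bytes]"
power_on_hours = 5858            # LifeTime (hours) of the latest self-test (assumed ~ disk age)


def corrections_per_gb(corrections: int, gigabytes: float) -> float:
    """Average number of fast ECC corrections per 10^9 bytes processed."""
    return corrections / gigabytes


def bytes_per_correction(corrections: int, gigabytes: float) -> float:
    """Average number of bytes processed between two fast ECC corrections."""
    if corrections == 0:
        return float("inf")  # e.g. the "write:" row, which logged no corrections
    return gigabytes * 1e9 / corrections


per_gb = corrections_per_gb(ecc_fast_corrections, gigabytes_read)
spacing = bytes_per_correction(ecc_fast_corrections, gigabytes_read)
per_hour = ecc_fast_corrections / power_on_hours

print(f"{per_gb:.1f} fast ECC corrections per GB read")
print(f"one correction per {spacing / 1e6:.1f} MB read on average")
print(f"{per_hour:.1f} corrections per power-on hour")
```

Running the same arithmetic on each of the four disks would show whether the 118K to 1.3M spread tracks the amount of data read or differs from disk to disk.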
From: Bruce A. <ba...@gr...> - 2006-07-31 17:27:29
Hi David,

(I recognize you from the BEOWULF mailing list -- welcome!)

I don't know if this is cause for worry. I suspect not, but perhaps Doug knows better....

Cheers,
Bruce
From: Douglas G. <do...@to...> - 2006-08-02 21:27:58
Bruce Allen wrote:
> I don't know if this is cause for worry. I suspect not, but perhaps
> Doug knows better....
>
> On Mon, 31 Jul 2006, David Mathog wrote:
>> Current start stop count: 1074003968 times
>> Recommended maximum start stop count: 1124401151 times

Dave,
Maxtor screwed up their implementation of the start stop counter log page. The version of smartmontools checked into CVS has a fix (actually more like an accommodation) so the "Current start stop count:" and the "Recommended maximum start stop count:" come out as sensible numbers.

>> Elements in grown defect list: 0
>>
>> Error counter log:
>>            Errors Corrected by           Total   Correction     Gigabytes    Total
>>                ECC          rereads/    errors   algorithm      processed    uncorrected
>>            fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
>> read:     118729        0         0         0          0       2366.645           0

In my experience both Seagate and Maxtor SCSI disks increment the "ECC fast errors" count at an alarming rate. I have brand new Seagate SAS disks that bumped that count from day one.

>> write:         0        0         0         0          0        521.228           0
>>
>> Non-medium error count: 48
>>
>> Last n error events log page
>>
>> SMART Self-test log
>> Num  Test              Status      segment  LifeTime  LBA_first_err [SK ASC ASQ]
>>      Description                   number   (hours)
>> # 1  Background short  Completed      -       5858          -       [-   -    -]
>> # 2  Background long   Completed      -       5836          -       [-   -    -]
>> # 3  Background long   Completed      -       5811          -       [-   -    -]
>> # 4  Background short  Completed      -       5809          -       [-   -    -]
>> # 5  Background long   Completed      -        308          -       [-   -    -]
>> # 6  Background short  Completed      -        308          -       [-   -    -]
>>
>> Long (extended) Self Test duration: 2880 seconds [48.0 minutes]

This disk looks ok.

Doug Gilbert
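The two start stop counts quoted above are easier to recognize as implausible when printed in hexadecimal. The sketch below only reformats the two numbers from the report. It is not the accommodation made in the smartmontools CVS code, whose details are not given in this thread, and the helper name is made up for illustration.

```python
# The two start-stop values exactly as reported by smartctl 5.36 above.
counters = {
    "Current start stop count": 1074003968,
    "Recommended maximum start stop count": 1124401151,
}


def as_hex_bytes(value: int) -> str:
    """Render a 32-bit counter as four big-endian bytes, e.g. '00 00 04 d2'."""
    return value.to_bytes(4, "big").hex(" ")


for name, value in counters.items():
    # A disk with under 6000 power-on hours cannot plausibly have been
    # started and stopped about a billion times, so the raw value is suspect.
    print(f"{name}: {value} = 0x{value:08X} (bytes: {as_hex_bytes(value)})")
```

Seeing the individual bytes makes it straightforward to compare the values against the layout of the start stop cycle counter log page in the SCSI specification.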
From: Volker K. <lis...@pa...> - 2006-08-02 21:42:56
> In my experience both Seagate and Maxtor SCSI disks
> increment the "ECC fast errors" count at an alarming
> rate.

Is "ECC fast errors" specific to SCSI disks? I ask because three error values on at least one ST3120026A (Seagate IDE) are in the millions too, and it seems they're meant to be.

Volker

--
Volker Kuhlmann is list0570 with the domain in header
http://volker.dnsalias.net/ Please do not CC list postings to me.