Thread: [smartmontools-support] Fujitsu MHS2060AT in a Thinkpad a21p

[per the support page, I'm not subscribed to this email list. 
 Please CC me on your replies! Thx]

This is not a new story.  I've found other instances of my issue...

I have Fedora Core-4 on a laptop.  It has two hard drives, one of
which is a 60-GB Fujitsu.

After getting smartd running, it started reporting 4 Offline
uncorrectable sectors.  It reported this every half hour, when it woke
up.  Every day.  All day.

I ran, on numerous occasions, offline, short, and long tests, and all
passed.  They always pass.

See smartctl -a /dev/hdc at the end of this message.  You'll see
errors from a *long* time ago - and IIRC it was the drive connection
that was loose (it's been a while... :-)

I finally used the HOWTO at sourceforge. The file system on hdc6,
which contained the 'bad' sector(s), was reiserfs.  Just to go along
with the HOWTO, I laid down an ext3 file system in its place (I am
lucky in this instance, just old data on this partition, and a copy at
that).

While creating the ext3 file system, I used -cc to run badblocks &
automatically update the bad block list.

The test wrote & then read the various patterns (10101010, 01010101,
11111111...) and reported no errors to stdout/stderr (or in syslog).

Because of this, I didn't bother to play with tune2fs or the latest dd
(with oflag=direct).

Instead, I ran a 'long' test again.  No errors.

So, I killed smartd (the daemon) and started it from the command line
with debug options.  It starts to spew debug output, then, far too
quickly I think, it reports that there have been 4 offline
uncorrectable sectors and sends email.  This happens in under 5
seconds:

    1 root petey 100% /var/local/src/freeTTS > smartd -d -r ioctl
    smartd version 5.33 [i386-redhat-linux-gnu] Copyright (C) 2002-4
      Bruce Allen
    Home page is http://smartmontools.sourceforge.net/

    Opened configuration file /etc/smartd.conf
    Configuration file /etc/smartd.conf parsed.
    Device: /dev/hda, opened

    REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE
    REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE returned 0
    Device: /dev/hda, found in smartd database.

    REPORT-IOCTL: DeviceFD=3 Command=SMART ENABLE
    REPORT-IOCTL: DeviceFD=3 Command=SMART ENABLE returned 0

    REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK
    REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned 0

    REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE VALUES
    REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE VALUES returned 0

    REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE THRESHOLDS
    REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE THRESHOLDS returned 0
    Device: /dev/hda, is SMART capable. Adding to "monitor" list.
    Device: /dev/hdc, opened

    REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE
    REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE returned 0
    Device: /dev/hdc, found in smartd database.

    REPORT-IOCTL: DeviceFD=3 Command=SMART ENABLE
    REPORT-IOCTL: DeviceFD=3 Command=SMART ENABLE returned 0

    REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK
    REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned 0

    REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE VALUES
    REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE VALUES returned 0

    REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE THRESHOLDS
    REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE THRESHOLDS returned 0
    Device: /dev/hdc, is SMART capable. Adding to "monitor" list.
    Monitoring 2 ATA and 0 SCSI devices

    REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK
    REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned 0

    REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE VALUES
    REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE VALUES returned 0

    REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK
    REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned 0

    REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE VALUES
    REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE VALUES returned 0
    Device: /dev/hdc, 4 Offline uncorrectable sectors
    Sending warning via mail to ro...@pe...daceks.home ...
    Warning via mail to ro...@pe...daceks.home: successful

With the error being reported every 30 minutes, I can only guess that
perhaps smartd is not "remembering" what the error log on the drive
shows.  Every time it wakes up it's like the guy in "50 First Dates"
who introduces himself every 15 seconds... :-)

The trouble is, it's polluting my syslog and I'm getting emails.  And
I worry that one day it will report an actual error and I'll ignore
it.

I've searched the 'net for how to wipe or reset the SMART error log,
but it does not seem to be possible.  Or is it?

I would prefer to keep running smartd, but if I can't get it to shut
up about errors that occurred roughly 2,000 hours ago...or am I
completely missing the point here?

Can anyone help?

TIA

Bill Hudacek

================================================================================

smartctl version 5.33 [i386-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     FUJITSU MHS2060AT
Serial Number:    NL00T3213PC4
Firmware Version: 8004
User Capacity:    60,011,642,880 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 3a
Local Time is:    Tue Jan 17 01:09:22 2006 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 492) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  83) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME              FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate         0x000f   100   100    046 Pre-fail Always  -           85813
  2 Throughput_Performance      0x0005   100   100    030 Pre-fail Offline -           306
  3 Spin_Up_Time                0x0003   100   100    025 Pre-fail Always  -           25601
  4 Start_Stop_Count            0x0032   094   094    000 Old_age  Always  -           3618
  5 Reallocated_Sector_Ct       0x0033   100   100    024 Pre-fail Always  -           0
  7 Seek_Error_Rate             0x000f   100   089    047 Pre-fail Always  -           3768
  8 Seek_Time_Performance       0x0005   100   100    019 Pre-fail Offline -           0
  9 Power_On_Seconds            0x0032   001   001    000 Old_age  Always  -           21250h+20m+19s
 10 Spin_Retry_Count            0x0013   100   100    020 Pre-fail Always  -           0
 12 Power_Cycle_Count           0x0032   091   091    000 Old_age  Always  -           1363
192 Emergency_Retract_Cycle_Ct  0x0032   099   099    000 Old_age  Always  -           27
193 Load_Cycle_Count            0x0032   053   053    000 Old_age  Always  -           174310
194 Temperature_Celsius         0x0022   100   090    000 Old_age  Always  -           41 (Lifetime Min/Max 15/57)
195 Hardware_ECC_Recovered      0x001a   100   100    000 Old_age  Always  -           8973
196 Reallocated_Event_Count     0x0032   100   100    000 Old_age  Always  -           0
197 Current_Pending_Sector      0x0012   100   080    000 Old_age  Always  -           0
198 Off-line_Scan_UNC_Sector_Ct 0x0010   098   098    000 Old_age  Offline -           4
199 UDMA_CRC_Error_Count        0x003e   200   200    000 Old_age  Always  -           0
200 Write_Error_Count           0x000f   100   100    060 Pre-fail Always  -           28468
203 Run_Out_Cancel              0x0002   100   100    000 Old_age  Always  -           3728051929362
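
[Editor's sketch] The smartd message "4 Offline uncorrectable sectors" matches the raw value of attribute 198 in the table above, while attributes 5 and 197 read 0. As a rough illustration, and assuming (not confirmed by the log itself) that the warning is driven simply by that raw value being non-zero, the check could look like this in Python. The function names are hypothetical, and the sample rows are copied from the table with the wrapped lines rejoined.

```python
# Three rows copied from the attribute table above (wrapped lines rejoined).
SAMPLE = """\
  5 Reallocated_Sector_Ct       0x0033   100   100    024 Pre-fail Always  -           0
197 Current_Pending_Sector      0x0012   100   080    000 Old_age  Always  -           0
198 Off-line_Scan_UNC_Sector_Ct 0x0010   098   098    000 Old_age  Offline -           4
"""


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value_string)} for each table row."""
    attrs = {}
    for line in text.splitlines():
        # Ten columns; the last (RAW_VALUE) may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip headers and anything that is not a data row
        attrs[int(fields[0])] = (fields[1], fields[9].strip())
    return attrs


def nonzero_sector_counts(attrs, ids=(5, 197, 198)):
    """Return {name: count} for the sector-count attributes whose raw value is non-zero."""
    flagged = {}
    for attr_id in ids:
        if attr_id not in attrs:
            continue
        name, raw = attrs[attr_id]
        count = int(raw.split()[0])
        if count != 0:
            flagged[name] = count
    return flagged


attrs = parse_attributes(SAMPLE)
flagged = nonzero_sector_counts(attrs)
print(flagged)  # {'Off-line_Scan_UNC_Sector_Ct': 4}
```

Under that assumption, a stored count that never returns to zero would be enough to trigger the same message on every check, whether or not any new error has occurred.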

SMART Error Log Version: 1
ATA Error Count: 22 (device log contains only the most recent five errors)
    <snip>

Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
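
[Editor's sketch] The 49.710-day wrap is what you would get from an unsigned 32-bit millisecond counter. That counter width is an assumption inferred from the number itself, not something the output states. A short Python check of the arithmetic, plus a hypothetical formatter that mimics the DDd+hh:mm:SS.sss layout (without smartctl's column padding):

```python
# Assumption: Powered_Up_Time is an unsigned 32-bit count of milliseconds.
WRAP_MS = 2 ** 32
MS_PER_DAY = 24 * 60 * 60 * 1000

wrap_days = WRAP_MS / MS_PER_DAY
print(f"{wrap_days:.3f}")  # 49.710


def format_powered_up_time(ms):
    """Format milliseconds since power-on as [DDd+]hh:mm:SS.sss, wrapping at 2**32 ms."""
    ms %= WRAP_MS
    days, rem = divmod(ms, MS_PER_DAY)
    hours, rem = divmod(rem, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    prefix = f"{days}d+" if days else ""
    return f"{prefix}{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


print(format_powered_up_time(94_254_015))  # 1d+02:10:54.015
```

Because of the wrap, two timestamps in the error log can only be compared directly when they fall within the same power cycle and the same 49.7-day window.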

Error 22 occurred at disk power-on lifetime: 19400 hours (808 days + 8 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 59 0a 1d 44 f6 e0  Error: UNC 10 sectors at LBA = 0x00f6441d = 16139293

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 10 17 44 f6 e0 00   1d+02:10:54.015  READ DMA

Error 21 occurred at disk power-on lifetime: 19400 hours (808 days + 8 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 59 0a 1d 44 f6 e0  Error: UNC 10 sectors at LBA = 0x00f6441d = 16139293

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 10 17 44 f6 e0 00   1d+02:08:04.778  READ DMA

Error 20 occurred at disk power-on lifetime: 19376 hours (807 days + 8 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 59 02 1d 44 f6 e0  Error: UNC 2 sectors at LBA = 0x00f6441d = 16139293

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 17 44 f6 e0 00      01:41:24.273  READ DMA

Error 19 occurred at disk power-on lifetime: 19376 hours (807 days + 8 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 59 0a 1d 44 f6 e0  Error: UNC 10 sectors at LBA = 0x00f6441d = 16139293

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 10 17 44 f6 e0 00      01:34:38.528  READ DMA

Error 18 occurred at disk power-on lifetime: 19274 hours (803 days + 2 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 59 02 1d 44 f6 e0  Error: UNC 2 sectors at LBA = 0x00f6441d = 16139293

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 17 44 f6 e0 00   2d+02:26:08.609  READ DMA
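
[Editor's sketch] All five logged errors point at the same address. The LBA that smartctl prints can be rebuilt from the SN, CL, CH and DH register bytes using standard 28-bit LBA addressing. The Python below redoes that arithmetic for Error 22 and its READ DMA command. The function name is hypothetical, and reading the post-error SC register as "sectors left in the request" is only an observation that happens to fit these entries, not a documented rule.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA (low nibble of DH = bits 24-27)."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after the failed command (Error 22): ER ST SC SN CL CH DH
er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in "40 59 0a 1d 44 f6 e0".split())
lba = lba28_from_registers(sn, cl, ch, dh)
print(hex(lba), lba)  # 0xf6441d 16139293

# The READ DMA that triggered it: CR FR SC SN CL CH DH DC
cr, fr, cmd_sc, cmd_sn, cmd_cl, cmd_ch, cmd_dh, dc = (
    int(b, 16) for b in "c8 00 10 17 44 f6 e0 00".split()
)
start = lba28_from_registers(cmd_sn, cmd_cl, cmd_ch, cmd_dh)
offset = lba - start          # how far into the request the read failed
remaining = cmd_sc - offset   # sectors not yet transferred
print(start, offset, remaining)  # 16139287 6 10
```

The remaining count of 10 matches the "UNC 10 sectors" text. The same arithmetic on the 8-sector reads (Errors 20 and 18) gives 2, matching "UNC 2 sectors".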

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     21246         -
# 2  Extended offline    Completed without error       00%     21227         -
# 3  Conveyance offline  Completed without error       00%     20864         -
# 4  Extended offline    Completed without error       00%     20864         -
# 5  Extended offline    Completed: read failure       90%     19310         16139293
# 6  Extended offline    Completed: read failure       90%     19261         16139293
# 7  Conveyance offline  Completed without error       00%     11531         -
# 8  Extended offline    Completed without error       00%      7136         -
# 9  Short offline       Completed without error       00%      7134         -
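
[Editor's sketch] To put a number on how stale the failures are, the self-test log can be compared with the drive's current power-on time (21250 hours, from attribute 9). The Python below parses the first six rows of the log and counts the clean tests since the last read failure. The helper name and the column-splitting rule (two or more spaces) are assumptions that fit this particular output, not a general smartctl parser.

```python
import re

CURRENT_POWER_ON_HOURS = 21250  # raw value of attribute 9: "21250h+20m+19s"

# First six rows of the self-test log above (wrapped lines rejoined).
SELF_TEST_LOG = """\
# 1  Extended offline    Completed without error       00%     21246         -
# 2  Extended offline    Completed without error       00%     21227         -
# 3  Conveyance offline  Completed without error       00%     20864         -
# 4  Extended offline    Completed without error       00%     20864         -
# 5  Extended offline    Completed: read failure       90%     19310         16139293
# 6  Extended offline    Completed: read failure       90%     19261         16139293
"""


def parse_self_tests(text):
    """Split each row on runs of 2+ spaces into its six columns."""
    rows = []
    for line in text.splitlines():
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) != 6:
            continue  # not a data row
        _num, description, status, _remaining, hours, lba = parts
        rows.append({
            "description": description,
            "status": status,
            "hours": int(hours),
            "lba": None if lba == "-" else int(lba),
        })
    return rows


tests = parse_self_tests(SELF_TEST_LOG)
failures = [t for t in tests if t["lba"] is not None]
last_failure = max(t["hours"] for t in failures)
hours_since = CURRENT_POWER_ON_HOURS - last_failure
clean_since = [t for t in tests if t["hours"] > last_failure and t["lba"] is None]
print(hours_since, len(clean_since))  # 1940 4
```

By this reckoning the last failing self-test was 1940 power-on hours ago, with four clean self-tests logged since. Both failures report the same first-error LBA as the error log entries.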

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
