From: Mike <mg...@gm...> - 2014-08-12 15:36:33
Hi,

My m4 512GB CT512M4SSD1 suddenly failed to read. Below is the smartctl output, which does eventually return but takes at least a couple of minutes. I've tried 2 different USB caddies and also a SATA connection to a laptop running a live CD. The drive has exclusively been mounted read-only on a Linux system with an ext4 filesystem, and has had unexpected power loss routinely throughout its life. The values in the attribute table look OK to me, I think, but the serious errors at the bottom look bad. Any ideas greatly appreciated:

root@w530:/dev# smartctl --all /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.8.0-41-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     M4-CT512M4SSD1
*removed personal information*
LU WWN Device Id: 5 00a075 1091fcc21
Firmware Version: 070H
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Tue Aug 12 12:02:50 2014 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 2380) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  39) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   001    Old_age   Always       -       4439
 12 Power_Cycle_Count       0x0032   100   100   001    Old_age   Always       -       515
170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   100   100   001    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   001    Old_age   Always       -       0
173 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
174 Unknown_Attribute       0x0032   100   100   001    Old_age   Always       -       398
181 Program_Fail_Cnt_Total  0x0022   100   100   001    Old_age   Always       -       618478239843
183 Runtime_Bad_Block       0x0032   100   100   001    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   001    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   001    Old_age   Always       -       0
189 High_Fly_Writes         0x000e   100   100   001    Old_age   Always       -       215
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x003a   100   100   001    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   001    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   001    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   001    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   001    Old_age   Always       -       0
202 Data_Address_Mark_Errs  0x0018   100   100   001    Old_age   Offline      -       0
206 Flying_Height           0x000e   100   100   001    Old_age   Always       -       0

Read SMART Log Directory failed.

Error SMART Error Log Read failed: scsi error medium or hardware error (serious)
Smartctl: SMART Error Log Read Failed
Error SMART Error Self-Test Log Read failed: scsi error medium or hardware error (serious)
Smartctl: SMART Self Test Log Read Failed
Error SMART Read Selective Self-Test Log failed: scsi error medium or hardware error (serious)
Smartctl: SMART Selective Self Test Log Read Failed
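
A rough way to double-check the impression that the attribute table looks OK: smartctl is understood to flag an attribute as failing when its normalized VALUE is at or below THRESH, with a THRESH of 0 never tripping. The Python sketch below applies that rule to a few rows copied from the table; the function name and the choice of rows are illustrative only, and the comparison rule is an assumption about smartctl's behaviour rather than something stated in this thread.

```python
# Illustrative check of SMART attribute rows against their thresholds.
# Assumption: an attribute counts as failing when VALUE <= THRESH, and a
# THRESH of 0 never trips. The rows are copied from the smartctl 5.41 table.

ROWS = """\
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
184 End-to-End_Error        0x0033   100   100   050    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       0
"""


def failing_attributes(table_text):
    """Return names of attributes whose normalized value is at or below threshold."""
    failing = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip header lines, blank lines and anything malformed
        name = fields[1]
        value, thresh = int(fields[3]), int(fields[5])
        if thresh != 0 and value <= thresh:
            failing.append(name)
    return failing


print(failing_attributes(ROWS))  # prints [] for the rows above
```

An empty list agrees with the PASSED result, but as the warning in the output says, that result is based on the attribute check alone and says nothing about the failed log reads.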
From: <ro...@sp...> - 2014-08-12 17:02:51
Well, the first two things that jump out at me are the version of smartmontools and the drive database. 5.41 is quite old. Try the newest release, which IIRC is 6.3. You might also try updating the drive database. Re-run smartctl after that and see what happens.
From: Mike <mg...@gm...> - 2014-08-12 19:06:10
Sorry, I should have done that. The new output is first, followed by some output after running the short test.

It doesn't look good to me, although I cannot think what might have happened. I just took the drive out, connected it to a laptop in order to back it up, and it has been dead since.

root@w530:/home/user/Downloads/smartmontools-6.3# ./smartctl --all /dev/sdb
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.8.0-41-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron RealSSD m4/C400/P400
Device Model:     M4-CT512M4SSD1
Serial Number:    000000001249091FCC21
LU WWN Device Id: 5 00a075 1091fcc21
Firmware Version: 070H
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Tue Aug 12 19:56:30 2014 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status command failed: scsi error medium or hardware error (serious)
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  80) The previous self-test completed having
                                        the electrical element of the test
                                        failed.
Total time to complete Offline
data collection:                ( 2380) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  39) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   001    Old_age   Always       -       4439
 12 Power_Cycle_Count       0x0032   100   100   001    Old_age   Always       -       515
170 Grown_Failing_Block_Ct  0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   001    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   001    Old_age   Always       -       0
173 Wear_Leveling_Count     0x0033   100   100   010    Pre-fail  Always       -       0
174 Unexpect_Power_Loss_Ct  0x0032   100   100   001    Old_age   Always       -       398
181 Non4k_Aligned_Access    0x0022   100   100   001    Old_age   Always       -       144 45 99
183 SATA_Iface_Downshift    0x0032   100   100   001    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   001    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   001    Old_age   Always       -       0
189 Factory_Bad_Block_Ct    0x000e   100   100   001    Old_age   Always       -       215
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x003a   100   100   001    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   001    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   001    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   001    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   001    Old_age   Always       -       0
202 Perc_Rated_Life_Used    0x0018   100   100   001    Old_age   Offline      -       0
206 Write_Error_Rate        0x000e   100   100   001    Old_age   Always       -       0

Read SMART Log Directory failed: scsi error medium or hardware error (serious)

Read SMART Error Log failed: scsi error medium or hardware error (serious)

Read SMART Self-test Log failed: scsi error medium or hardware error (serious)

Read SMART Selective Self-test Log failed: scsi error medium or hardware error (serious)

root@w530:/home/user/Downloads/smartmontools-6.3# ./smartctl -t short /dev/sdb
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.8.0-41-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Tue Aug 12 20:00:51 2014

Use smartctl -X to abort test.

root@w530:/home/user/Downloads/smartmontools-6.3# ./smartctl -l error /dev/sdb
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.8.0-41-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
Read SMART Log Directory failed: scsi error medium or hardware error (serious)

Read SMART Error Log failed: scsi error medium or hardware error (serious)
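
A side note on attribute 181, whose raw value is printed as 618478239843 by smartctl 5.41 and as 144 45 99 by smartctl 6.3: the two are the same number if the 48-bit raw value is read as three 16-bit words. The short Python sketch below checks the arithmetic; the helper name is made up for illustration, and nothing here says what the three words mean.

```python
def raw48_to_words(raw):
    """Split a 48-bit SMART raw value into three 16-bit words, most significant first."""
    return [(raw >> shift) & 0xFFFF for shift in (32, 16, 0)]


# Raw value of attribute 181 as printed by smartctl 5.41.
print(raw48_to_words(618478239843))  # prints [144, 45, 99]
```

So the large number in the first run is a formatting difference between the two smartctl versions, not a sign that the attribute changed between runs.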
From: <ro...@sp...> - 2014-08-13 03:36:13
The key here is "Self-test execution status: ( 80) The previous self-test completed having the electrical element of the test failed." AFAIK, this is a hardware failure. Perhaps Christian Franke can help explain this.

I've encountered that type of failure before. In that case it was a platter drive, not an SSD, and I was able to use GNU ddrescue to rescue the user's data, although it took me almost a week.
From: Volker K. <lis...@pa...> - 2014-08-13 21:42:42
|
On Wed 13 Aug 2014 03:36:23 NZST +1200, Mike wrote:

> My m4 512Gb CT512M4SSD1 suddenly failed to read. Below is the smartctl
> output that returns eventually, taking at least a couple of minutes. I've
> tried 2 different USB caddys and also SATA connection to laptop running
> livecd. The drive has exclusively been mounted read-only on a Linux system
> with an ext4 filesystem, and has had unexpected power loss routinely
> throughout its life. The values in the attribute table look OK to me I
> think, but the serious errors at the bottom look bad. Any ideas greatly
> appreciated:

> Read SMART Log Directory failed.
>
> Error SMART Error Log Read failed: scsi error medium or hardware error
> (serious)
> Smartctl: SMART Error Log Read Failed
> Error SMART Error Self-Test Log Read failed: scsi error medium or hardware
> error (serious)
> Smartctl: SMART Self Test Log Read Failed
> Error SMART Read Selective Self-Test Log failed: scsi error medium or
> hardware error (serious)
> Smartctl: SMART Selective Self Test Log Read Failed

This looks like smartctl may not be able to communicate with the drive
properly, which is what you expect if the drive/computer interface
circuitry is damaged. However, these errors don't seem to exclude the
possibility of a serious platter failure - or rather the equivalent in
solid state.

Moving the disk around between enclosures etc can easily damage the
drive electronics if sufficient ESD protection procedures have not been
followed (those procedures must be sufficient regardless of your opinion
of them ;-) ). If the problem first occurred without having moved any
hardware around then the problem is with the drive. If you can reliably
exclude the problem being located at the computer, enclosure or
connections/cables then it's also the drive.

If the problem is with the drive then the drive is most certainly
finished. Copy off all the data you can asap and use your backups for
the rest.

It might be a good idea to treat SSDs the same as spinning versions. The
differences are in speed and behaviour after being dropped, not
necessarily in long-term reliability of data.

HTH,

Volker

--
Volker Kuhlmann
http://volker.top.geek.nz/ Please do not CC list postings to me.
|
From: Mike <mg...@gm...> - 2014-08-14 19:08:19
|
Hello again

I've put the drive back into the system it was used in, and used the
one other SATA port available. Running smartctl returns some strange
output, I've also included the kernel message in case it helps.

root@voyage:~/smartmontools-6.3# ./smartctl -T permissive -x /dev/sda
smartctl 6.3 2014-07-26 r3976 [i686-linux-3.4.4-voyage] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               /3:0:0:0
Product:
Compliance:           SPC-5
User Capacity:        600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
Read Cache is:        Unavailable
Writeback Cache is:   Unavailable

=== START OF READ SMART DATA SECTION ===
Log Sense failed, IE page [scsi response fails sanity test]
Error Counter logging not supported

scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
Device does not support Self Test logging
Device does not support Background scan results logging


[    3.899820] ata1: SATA max UDMA/133 abar m1024@0xffe40000 port 0xffe40100 irq 45
[    3.899954] ata2: DUMMY
[    3.900067] ata3: SATA max UDMA/133 abar m1024@0xffe40000 port 0xffe40200 irq 45
[    3.900193] ata4: DUMMY
[    4.205145] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    4.205279] ata1: SATA link down (SStatus 0 SControl 300)
[    4.205686] ata3.00: failed to enable AA (error_mask=0x1)
[    4.205772] ata3.00: ATA-9: M4-CT512M4SSD1, 070H, max UDMA/100
[    4.205853] ata3.00: 1000215216 sectors, multi 16: LBA48 NCQ (depth 31/32)
[    4.206476] ata3.00: failed to enable AA (error_mask=0x1)
[    4.206568] ata3.00: configured for UDMA/100 (device error ignored)
[    4.207143] scsi 3:0:0:0: Direct-Access ATA M4-CT512M4SSD1 070H PQ: 0 ANSI: 5
[    4.207977] sd 3:0:0:0: [sda] 1000215216 512-byte logical blocks: (512 GB/476 GiB)
[    4.208722] sd 3:0:0:0: [sda] Write Protect is off
[    4.208822] sd 3:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    4.208998] sd 3:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    4.569257] scsi 0:0:0:0: Direct-Access TS TS2GUFM-H 1100 PQ: 0 ANSI: 0 CCS
[    4.571000] sd 0:0:0:0: [sdb] 4014080 512-byte logical blocks: (2.05 GB/1.91 GiB)
[    4.572153] sd 0:0:0:0: [sdb] Write Protect is off
[    4.572260] sd 0:0:0:0: [sdb] Mode Sense: 43 00 00 00
[    4.573219] sd 0:0:0:0: [sdb] No Caching mode page present
[    4.573323] sd 0:0:0:0: [sdb] Assuming drive cache: write through
[    4.577728] sd 0:0:0:0: [sdb] No Caching mode page present
[    4.577837] sd 0:0:0:0: [sdb] Assuming drive cache: write through
[    4.579449] sdb: sdb1
[    4.583344] sd 0:0:0:0: [sdb] No Caching mode page present
[    4.583454] sd 0:0:0:0: [sdb] Assuming drive cache: write through
[    4.583561] sd 0:0:0:0: [sdb] Attached SCSI disk
[    4.587875] sd 3:0:0:0: Attached scsi generic sg0 type 0
[    4.588303] sd 0:0:0:0: Attached scsi generic sg1 type 0
[   34.720249] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[   34.720368] ata3.00: failed command: READ FPDMA QUEUED
[   34.720482] ata3.00: cmd 60/08:00:00:00:00/00:00:00:00:00/40 tag 0 ncq 4096 in
[   34.720489]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   34.720684] ata3.00: status: { DRDY }
[   34.720769] ata3: hard resetting link
[   40.074131] ata3: link is slow to respond, please be patient (ready=0)
[   44.766167] ata3: COMRESET failed (errno=-16)
[   44.766267] ata3: hard resetting link
[   50.120136] ata3: link is slow to respond, please be patient (ready=0)
[   54.812132] ata3: COMRESET failed (errno=-16)
[   54.812216] ata3: hard resetting link
[   60.166135] ata3: link is slow to respond, please be patient (ready=0)
[   89.848141] ata3: COMRESET failed (errno=-16)
[   89.848226] ata3: limiting SATA link speed to 1.5 Gbps
[   89.848305] ata3: hard resetting link
[   94.896060] ata3: COMRESET failed (errno=-16)
[   94.896149] ata3: reset failed, giving up
[   94.896225] ata3.00: disabled
[   94.896300] ata3.00: device reported invalid CHS sector 0
[   94.896402] ata3: EH complete
[   94.896535] sd 3:0:0:0: [sda] Unhandled error code
[   94.896613] sd 3:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
[   94.896726] sd 3:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 00 00 00 00 00 00 08 00
[   94.897243] end_request: I/O error, dev sda, sector 0
[   94.897323] Buffer I/O error on device sda, logical block 0
[   94.897517] sd 3:0:0:0: [sda] Unhandled error code
[   94.897599] sd 3:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
[   94.897712] sd 3:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 00 00 00 00 00 00 08 00
[   94.898216] end_request: I/O error, dev sda, sector 0
[   94.898293] Buffer I/O error on device sda, logical block 0
[   94.898490] sd 3:0:0:0: [sda] Unhandled error code
[   94.898568] sd 3:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
[   94.898681] sd 3:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 00 00 00 00 00 00 08 00
[   94.899194] end_request: I/O error, dev sda, sector 0
[   94.899271] Buffer I/O error on device sda, logical block 0
[   94.899383] ldm_validate_partition_table(): Disk read failed.
[   94.899542] sd 3:0:0:0: [sda] Unhandled error code
[   94.899620] sd 3:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
[   94.899732] sd 3:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 00 00 00 00 00 00 08 00
[   94.900237] end_request: I/O error, dev sda, sector 0
[   94.900313] Buffer I/O error on device sda, logical block 0
[   94.900506] sd 3:0:0:0: [sda] Unhandled error code
[   94.900584] sd 3:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
[   94.900697] sd 3:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 00 00 00 00 00 00 08 00
[   94.901210] end_request: I/O error, dev sda, sector 0
[   94.901287] Buffer I/O error on device sda, logical block 0
[   94.901399] sda: unable to read partition table
[   94.902193] sd 3:0:0:0: [sda] READ CAPACITY(16) failed
[   94.902281] sd 3:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
[   94.902398] sd 3:0:0:0: [sda] Sense not available.
[   94.902682] sd 3:0:0:0: [sda] READ CAPACITY failed
[   94.902762] sd 3:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
[   94.902877] sd 3:0:0:0: [sda] Sense not available.
[   94.903338] sd 3:0:0:0: [sda] Got wrong page
[   94.903419] sd 3:0:0:0: [sda] Assuming drive cache: write through
[   94.903501] sd 3:0:0:0: [sda] Attached SCSI disk

On 13 August 2014 22:27, Volker Kuhlmann <lis...@pa...> wrote:
> On Wed 13 Aug 2014 03:36:23 NZST +1200, Mike wrote:
>
>> My m4 512Gb CT512M4SSD1 suddenly failed to read. Below is the smartctl
>> output that returns eventually, taking at least a couple of minutes. I've
>> tried 2 different USB caddys and also SATA connection to laptop running
>> livecd. The drive has exclusively been mounted read-only on a Linux system
>> with an ext4 filesystem, and has had unexpected power loss routinely
>> throughout its life. The values in the attribute table look OK to me I
>> think, but the serious errors at the bottom look bad. Any ideas greatly
>> appreciated:
>
>> Read SMART Log Directory failed.
|
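A few of the numbers in the output above can be cross-checked by hand. The following is a minimal Python sketch, not smartmontools code, and the function names are made up for illustration. It confirms that the 1000215216 sectors the kernel logged at boot equal the 512,110,190,592 bytes smartctl reported in the first message, that the failing CDB is a READ(10) of eight blocks starting at LBA 0, and that the bogus "Logical block size" of 774843950 is just four bytes of ASCII text. The last point suggests, though it does not prove, that the response buffer held text rather than a valid capacity structure.

```python
# Illustrative cross-check of figures quoted in the kernel log and smartctl output.

def capacity_bytes(sectors, sector_size=512):
    """Total capacity in bytes for a given sector count."""
    return sectors * sector_size


def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5: logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8: transfer length
    }


size = capacity_bytes(1000215216)
print(size, "bytes,", size // 2**30, "GiB")
print(decode_read10("28 00 00 00 00 00 00 00 08 00"))
print((774843950).to_bytes(4, "big"))  # the "block size" viewed as raw bytes
```

Eight blocks of 512 bytes is the same 4096-byte read that appears as "ncq 4096 in" on the failed READ FPDMA QUEUED command.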
From: Mike <mg...@gm...> - 2014-08-16 08:59:39
|
Greetings

Following on from my previous message, where running smartctl with the
disk connected via SATA returned a 600 PB disk (it is actually a 512 GB
Crucial M4), I followed advice from the Crucial message boards and
left the disk idle overnight in order to trigger automatic garbage
collection. No change the next morning; below is some smartctl
output. I've tried ddrescue to no avail, reads just fail completely.

user@w530:~/Downloads/smartmontools-6.3$ sudo ./smartctl -x /dev/sdb
[sudo] password for user:
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.13.0-34-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron RealSSD m4/C400/P400
Device Model:     M4-CT512M4SSD1
Serial Number:    000000001249091FCC21
LU WWN Device Id: 5 00a075 1091fcc21
Firmware Version: 070H
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Sat Aug 16 09:52:11 2014 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Write SCT (Get) Feature Control Command failed: scsi error medium or hardware error (serious)
Wt Cache Reorder: Unknown (SCT Feature Control command failed)

=== START OF READ SMART DATA SECTION ===
SMART Status command failed: scsi error medium or hardware error (serious)
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 2380) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  39) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   100   050    -    0
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  9 Power_On_Hours          -O--CK   100   100   001    -    4439
 12 Power_Cycle_Count       -O--CK   100   100   001    -    515
170 Grown_Failing_Block_Ct  PO--CK   100   100   010    -    0
171 Program_Fail_Count      -O--CK   100   100   001    -    0
172 Erase_Fail_Count        -O--CK   100   100   001    -    0
173 Wear_Leveling_Count     PO--CK   100   100   010    -    0
174 Unexpect_Power_Loss_Ct  -O--CK   100   100   001    -    398
181 Non4k_Aligned_Access    -O---K   100   100   001    -    144 45 99
183 SATA_Iface_Downshift    -O--CK   100   100   001    -    0
184 End-to-End_Error        PO--CK   100   100   050    -    0
187 Reported_Uncorrect      -O--CK   100   100   001    -    0
188 Command_Timeout         -O--CK   100   100   001    -    0
189 Factory_Bad_Block_Ct    -OSR--   100   100   001    -    215
194 Temperature_Celsius     -O---K   100   100   000    -    0
195 Hardware_ECC_Recovered  -O-RCK   100   100   001    -    0
196 Reallocated_Event_Count -O--CK   100   100   001    -    0
197 Current_Pending_Sector  -O--CK   100   100   001    -    0
198 Offline_Uncorrectable   ----CK   100   100   001    -    0
199 UDMA_CRC_Error_Count    -O--CK   100   100   001    -    0
202 Perc_Rated_Life_Used    ---RC-   100   100   001    -    0
206 Write_Error_Rate        -OSR--   100   100   001    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

Read SMART Log Directory failed: scsi error medium or hardware error (serious)

General Purpose Log Directory Version 1
Address    Access  R/W   Size  Description
0x00       GPL     R/O      1  Log Directory
0x03       GPL     R/O  16383  Ext. Comprehensive SMART error log
0x04       GPL     R/O    255  Device Statistics log
0x07       GPL     R/O   3449  Extended self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters
0x80-0x9f  GPL     R/W     16  Host vendor specific log
0xa0       GPL     VS    2000  Device vendor specific log
0xa1-0xbf  GPL     VS       1  Device vendor specific log
0xc0       GPL     VS      80  Device vendor specific log
0xc1-0xdf  GPL     VS       1  Device vendor specific log
0xe0       GPL     R/W      1  SCT Command/Status
0xe1       GPL     R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log size 16383 not supported

Read SMART Error Log failed: scsi error medium or hardware error (serious)

SMART Extended Self-test Log size 3449 not supported

Read SMART Self-test Log failed: scsi error medium or hardware error (serious)

Read SMART Selective Self-test Log failed: scsi error medium or hardware error (serious)

SCT Status Version:                  3
SCT Version (vendor specific):       1 (0x0001)
SCT Support Level:                   0
Device State:                        Stand-by (1)
Current Temperature:                 0 Celsius
Power Cycle Max Temperature:         0 Celsius
Lifetime    Max Temperature:         0 Celsius
SCT Temperature History Version:     2
Temperature Sampling Period:         10 minutes
Temperature Logging Interval:        10 minutes
Min/Max recommended Temperature:     0/70 Celsius
Min/Max Temperature Limit:           -5/75 Celsius
Temperature History Size (Index):    478 (38)

Index    Estimated Time   Temperature Celsius
  39    2014-08-13 02:20     ?  -
 ...    ..(475 skipped).    ..  -
  37    2014-08-16 09:40     ?  -
  38    2014-08-16 09:50     0  -

Write SCT (Get) Error Recovery Control Command failed: scsi error medium or hardware error (serious)
SCT (Get) Error Recovery Control command failed

Device Statistics (GP Log 0x04)
Page Offset Size         Value  Description
  1  =====  =                =  == General Statistics (rev 2) ==
  1  0x008  4              515  Lifetime Power-On Resets
  1  0x010  4             4439  Power-on Hours
  1  0x018  6        933634265  Logical Sectors Written
  1  0x020  6          2708986  Number of Write Commands
  1  0x028  6        852431906  Logical Sectors Read
  1  0x030  6          6049196  Number of Read Commands
  4  =====  =                =  == General Errors Statistics (rev 1) ==
  4  0x008  4                0  Number of Reported Uncorrectable Errors
  4  0x010  4                0  Resets Between Cmd Acceptance and Completion
  5  =====  =                =  == Temperature Statistics (rev 1) ==
  5  0x008  1                0  Current Temperature
  5  0x010  1                0  Average Short Term Temperature
  5  0x018  1                0  Average Long Term Temperature
  5  0x020  1                0  Highest Temperature
  5  0x028  1                0  Lowest Temperature
  5  0x030  1                0  Highest Average Short Term Temperature
  5  0x038  1                0  Lowest Average Short Term Temperature
  5  0x040  1                0  Highest Average Long Term Temperature
  5  0x048  1                0  Lowest Average Long Term Temperature
  5  0x050  4                -  Time in Over-Temperature
  5  0x058  1               70  Specified Maximum Operating Temperature
  5  0x060  4                -  Time in Under-Temperature
  5  0x068  1                0  Specified Minimum Operating Temperature
  6  =====  =                =  == Transport Statistics (rev 1) ==
  6  0x008  4                0  Number of Hardware Resets
  6  0x010  4                0  Number of ASR Events
  6  0x018  4                0  Number of Interface CRC Errors
  7  =====  =                =  == Solid State Device Statistics (rev 1) ==
  7  0x008  1              21~  Percentage Used Endurance Indicator
                             |_ ~ normalized value

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            0  Command failed due to ICRC error
0x000a  4            0  Device-to-host register FISes sent due to a COMRESET

On 14 August 2014 20:08, Mike <mg...@gm...> wrote:
> Hello again
>
> I've put the drive back into the system it was used in, and used the
> one other SATA port available.
|
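A side note on attribute 181: the smartctl 5.41 output quoted earlier in the thread printed its raw value as 618478239843 under the name Program_Fail_Cnt_Total, because the drive was not in that version's database. smartctl 6.3 prints "144 45 99" under the name Non4k_Aligned_Access. These appear to be the same six raw bytes, shown once as a single 48-bit number and once as three 16-bit words. A minimal Python sketch of the conversion follows; the helper name is made up for illustration and this is not smartmontools code.

```python
def raw48_to_words(raw):
    """Split a 48-bit SMART raw value into three 16-bit words, most significant first."""
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)


def words_to_raw48(words):
    """Inverse of raw48_to_words: pack three 16-bit words into one integer."""
    high, mid, low = words
    return (high << 32) | (mid << 16) | low


print(raw48_to_words(618478239843))   # (144, 45, 99)
print(words_to_raw48((144, 45, 99)))  # 618478239843
```

So the attribute did not change between the two runs; only its name and display format did.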
From: Christian F. <Chr...@t-...> - 2014-08-16 19:07:10
Attachments:
large-ext-smart-logs.patch
|
Mike wrote:
> Greetings
>
> Following on from previous message, where running smartctl with the
> disk connected via SATA returns a 600PB disk (it is actually 512Gb
> Crucial M4), I followed advice from the Crucial message boards and
> left the disk idle over night in order to trigger automatic garbage
> collection. No change the next morning, below is some smartctl
> output. I've tried ddrescue to no avail, reads just fail completely.
> ...
> smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.13.0-34-generic] (local build)
> Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family:     Crucial/Micron RealSSD m4/C400/P400
> Device Model:     M4-CT512M4SSD1
> ...
> Read SMART Log Directory failed: scsi error medium or hardware error (serious)
>

This error message is probably due to a limitation or bug in the driver
or firmware of the controller in use (which one?). The device itself
should support the SMART Log Directory.

> General Purpose Log Directory Version 1
> Address    Access  R/W   Size  Description
> 0x00       GPL     R/O      1  Log Directory
> 0x03       GPL     R/O  16383  Ext. Comprehensive SMART error log
> 0x04       GPL     R/O    255  Device Statistics log
> 0x07       GPL     R/O   3449  Extended self-test log
> ..
> SMART Extended Comprehensive Error Log size 16383 not supported
>
> Read SMART Error Log failed: scsi error medium or hardware error (serious)
>
> SMART Extended Self-test Log size 3449 not supported
>
> Read SMART Self-test Log failed: scsi error medium or hardware error (serious)

The extended error/self-test logs may contain useful info.
Unfortunately smartctl does not print the extended logs due to a
historic size limitation. In the early days of these logs, sizes were
<= 8.

If possible, apply the attached patch and check whether the logs can
then be read.

Thanks,
Christian
|
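To put the sizes in the log directory into perspective: the Size column counts 512-byte sectors (the patched smartctl output later in the thread labels them "sectors"), so both extended logs are far larger than the handful of sectors Christian mentions. Below is a small Python sketch of the arithmetic. The `OLD_LIMIT_SECTORS` constant only illustrates the "<= 8" remark and is an assumption, not the actual check in smartctl's source.

```python
SECTOR_BYTES = 512
OLD_LIMIT_SECTORS = 8  # assumption, taken from the "sizes were <= 8" remark above


def describe_log(sectors):
    """Return (size in bytes, whether the log exceeds the assumed old limit)."""
    return sectors * SECTOR_BYTES, sectors > OLD_LIMIT_SECTORS


logs = {
    "Ext. Comprehensive SMART error log": 16383,
    "Extended self-test log": 3449,
}
for name, sectors in logs.items():
    size, too_big = describe_log(sectors)
    print(f"{name}: {sectors} sectors = {size} bytes, over old limit: {too_big}")
```

Logs of this size also mean far more data has to be transferred from the drive, which may be relevant when a drive is already slow to respond.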
From: Mike <mg...@gm...> - 2014-08-16 21:47:44
|
Great, thanks. I applied the patch and ran the same command. It
initially takes around 30 seconds to return any data at all, and it then
hung for about 1 minute after printing the GPL table. Here is the output:

user@w530:~/Downloads/smartmontools-6.3$ sudo ./smartctl -x /dev/sdb
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.13.0-34-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron RealSSD m4/C400/P400
Device Model:     M4-CT512M4SSD1
Serial Number:    000000001249091FCC21
LU WWN Device Id: 5 00a075 1091fcc21
Firmware Version: 070H
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Sat Aug 16 21:43:33 2014 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Write SCT (Get) Feature Control Command failed: scsi error medium or hardware error (serious)
Wt Cache Reorder: Unknown (SCT Feature Control command failed)

=== START OF READ SMART DATA SECTION ===
SMART Status command failed: scsi error medium or hardware error (serious)
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 2380) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  39) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   100   050    -    0
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  9 Power_On_Hours          -O--CK   100   100   001    -    4439
 12 Power_Cycle_Count       -O--CK   100   100   001    -    515
170 Grown_Failing_Block_Ct  PO--CK   100   100   010    -    0
171 Program_Fail_Count      -O--CK   100   100   001    -    0
172 Erase_Fail_Count        -O--CK   100   100   001    -    0
173 Wear_Leveling_Count     PO--CK   100   100   010    -    0
174 Unexpect_Power_Loss_Ct  -O--CK   100   100   001    -    398
181 Non4k_Aligned_Access    -O---K   100   100   001    -    144 45 99
183 SATA_Iface_Downshift    -O--CK   100   100   001    -    0
184 End-to-End_Error        PO--CK   100   100   050    -    0
187 Reported_Uncorrect      -O--CK   100   100   001    -    0
188 Command_Timeout         -O--CK   100   100   001    -    0
189 Factory_Bad_Block_Ct    -OSR--   100   100   001    -    215
194 Temperature_Celsius     -O---K   100   100   000    -    0
195 Hardware_ECC_Recovered  -O-RCK   100   100   001    -    0
196 Reallocated_Event_Count -O--CK   100   100   001    -    0
197 Current_Pending_Sector  -O--CK   100   100   001    -    0
198 Offline_Uncorrectable   ----CK   100   100   001    -    0
199 UDMA_CRC_Error_Count    -O--CK   100   100   001    -    0
202 Perc_Rated_Life_Used    ---RC-   100   100   001    -    0
206 Write_Error_Rate        -OSR--   100   100   001    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

Read SMART Log Directory failed: scsi error medium or hardware error (serious)

General Purpose Log Directory Version 1
Address    Access  R/W   Size  Description
0x00       GPL     R/O      1  Log Directory
0x03       GPL     R/O  16383  Ext. Comprehensive SMART error log
0x04       GPL     R/O    255  Device Statistics log
0x07       GPL     R/O   3449  Extended self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters
0x80-0x9f  GPL     R/W     16  Host vendor specific log
0xa0       GPL     VS    2000  Device vendor specific log
0xa1-0xbf  GPL     VS       1  Device vendor specific log
0xc0       GPL     VS      80  Device vendor specific log
0xc1-0xdf  GPL     VS       1  Device vendor specific log
0xe0       GPL     R/W      1  SCT Command/Status
0xe1       GPL     R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (16383 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (3449 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Read SMART Selective Self-test Log failed: scsi error medium or hardware error (serious)

SCT Status Version:                  3
SCT Version (vendor specific):       1 (0x0001)
SCT Support Level:                   0
Device State:                        Active (0)
Current Temperature:                 0 Celsius
Power Cycle Max Temperature:         0 Celsius
Lifetime    Max Temperature:         0 Celsius
SCT Temperature History Version:     2
Temperature Sampling Period:         10 minutes
Temperature Logging Interval:        10 minutes
Min/Max recommended Temperature:     0/70 Celsius
Min/Max Temperature Limit:           -5/75 Celsius
Temperature History Size (Index):    478 (37)

Index    Estimated Time   Temperature Celsius
  38    2014-08-13 14:10     ?  -
 ...    ..(476 skipped).    ..  -
  37    2014-08-16 21:40     ?
- Write SCT (Get) Error Recovery Control Command failed: scsi error medium or hardware error (serious) SCT (Get) Error Recovery Control command failed Device Statistics (GP Log 0x04) Page Offset Size Value Description 1 ===== = = == General Statistics (rev 2) == 1 0x008 4 515 Lifetime Power-On Resets 1 0x010 4 4439 Power-on Hours 1 0x018 6 933634265 Logical Sectors Written 1 0x020 6 2708986 Number of Write Commands 1 0x028 6 852431906 Logical Sectors Read 1 0x030 6 6049196 Number of Read Commands 4 ===== = = == General Errors Statistics (rev 1) == 4 0x008 4 0 Number of Reported Uncorrectable Errors 4 0x010 4 0 Resets Between Cmd Acceptance and Completion 5 ===== = = == Temperature Statistics (rev 1) == 5 0x008 1 0 Current Temperature 5 0x010 1 0 Average Short Term Temperature 5 0x018 1 0 Average Long Term Temperature 5 0x020 1 0 Highest Temperature 5 0x028 1 0 Lowest Temperature 5 0x030 1 0 Highest Average Short Term Temperature 5 0x038 1 0 Lowest Average Short Term Temperature 5 0x040 1 0 Highest Average Long Term Temperature 5 0x048 1 0 Lowest Average Long Term Temperature 5 0x050 4 - Time in Over-Temperature 5 0x058 1 70 Specified Maximum Operating Temperature 5 0x060 4 - Time in Under-Temperature 5 0x068 1 0 Specified Minimum Operating Temperature 6 ===== = = == Transport Statistics (rev 1) == 6 0x008 4 0 Number of Hardware Resets 6 0x010 4 0 Number of ASR Events 6 0x018 4 0 Number of Interface CRC Errors 7 ===== = = == Solid State Device Statistics (rev 1) == 7 0x008 1 21~ Percentage Used Endurance Indicator |_ ~ normalized value SATA Phy Event Counters (GP Log 0x11) ID Size Value Description 0x0001 4 0 Command failed due to ICRC error 0x000a 4 0 Device-to-host register FISes sent due to a COMRESET I then ran a -t short and a --test=long, and then issued the same command as above: user@w530:~/Downloads/smartmontools-6.3$ sudo ./smartctl -x /dev/sdb [sudo] password for user: smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.13.0-34-generic] (local build) Copyright 
(C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Crucial/Micron RealSSD m4/C400/P400 Device Model: M4-CT512M4SSD1 Serial Number: 000000001249091FCC21 LU WWN Device Id: 5 00a075 1091fcc21 Firmware Version: 070H User Capacity: 512,110,190,592 bytes [512 GB] Sector Size: 512 bytes logical/physical Rotation Rate: Solid State Device Form Factor: 2.5 inches Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6 SATA Version is: SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s) Local Time is: Sat Aug 16 22:44:54 2014 BST SMART support is: Available - device has SMART capability. SMART support is: Enabled AAM feature is: Unavailable APM level is: 254 (maximum performance) Rd look-ahead is: Enabled Write cache is: Enabled ATA Security is: Disabled, NOT FROZEN [SEC1] Write SCT (Get) Feature Control Command failed: scsi error medium or hardware error (serious) Wt Cache Reorder: Unknown (SCT Feature Control command failed) === START OF READ SMART DATA SECTION === SMART Status command failed: scsi error medium or hardware error (serious) SMART overall-health self-assessment test result: PASSED Warning: This result is based on an Attribute check. General SMART Values: Offline data collection status: (0x80) Offline data collection activity was never started. Auto Offline Data Collection: Enabled. Self-test execution status: ( 80) The previous self-test completed having the electrical element of the test failed. Total time to complete Offline data collection: ( 2380) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. 
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 39) minutes. Conveyance self-test routine recommended polling time: ( 3) minutes. SCT capabilities: (0x003d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE 1 Raw_Read_Error_Rate POSR-K 100 100 050 - 0 5 Reallocated_Sector_Ct PO--CK 100 100 010 - 0 9 Power_On_Hours -O--CK 100 100 001 - 4440 12 Power_Cycle_Count -O--CK 100 100 001 - 515 170 Grown_Failing_Block_Ct PO--CK 100 100 010 - 0 171 Program_Fail_Count -O--CK 100 100 001 - 0 172 Erase_Fail_Count -O--CK 100 100 001 - 0 173 Wear_Leveling_Count PO--CK 100 100 010 - 0 174 Unexpect_Power_Loss_Ct -O--CK 100 100 001 - 398 181 Non4k_Aligned_Access -O---K 100 100 001 - 144 45 99 183 SATA_Iface_Downshift -O--CK 100 100 001 - 0 184 End-to-End_Error PO--CK 100 100 050 - 0 187 Reported_Uncorrect -O--CK 100 100 001 - 0 188 Command_Timeout -O--CK 100 100 001 - 0 189 Factory_Bad_Block_Ct -OSR-- 100 100 001 - 215 194 Temperature_Celsius -O---K 100 100 000 - 0 195 Hardware_ECC_Recovered -O-RCK 100 100 001 - 0 196 Reallocated_Event_Count -O--CK 100 100 001 - 0 197 Current_Pending_Sector -O--CK 100 100 001 - 0 198 Offline_Uncorrectable ----CK 100 100 001 - 0 199 UDMA_CRC_Error_Count -O--CK 100 100 001 - 0 202 Perc_Rated_Life_Used ---RC- 100 100 001 - 0 206 Write_Error_Rate -OSR-- 100 100 001 - 0 ||||||_ K auto-keep |||||__ C event count ||||___ R error rate |||____ S speed/performance ||_____ O updated online |______ P prefailure warning Read SMART Log Directory failed: scsi error medium or hardware error (serious) General Purpose Log Directory Version 1 Address Access R/W Size Description 0x00 
GPL R/O 1 Log Directory 0x03 GPL R/O 16383 Ext. Comprehensive SMART error log 0x04 GPL R/O 255 Device Statistics log 0x07 GPL R/O 3449 Extended self-test log 0x10 GPL R/O 1 NCQ Command Error log 0x11 GPL R/O 1 SATA Phy Event Counters 0x80-0x9f GPL R/W 16 Host vendor specific log 0xa0 GPL VS 2000 Device vendor specific log 0xa1-0xbf GPL VS 1 Device vendor specific log 0xc0 GPL VS 80 Device vendor specific log 0xc1-0xdf GPL VS 1 Device vendor specific log 0xe0 GPL R/W 1 SCT Command/Status 0xe1 GPL R/W 1 SCT Data Transfer SMART Extended Comprehensive Error Log Version: 1 (16383 sectors) No Errors Logged SMART Extended Self-test Log Version: 1 (3449 sectors) Read SMART Selective Self-test Log failed: scsi error medium or hardware error (serious) SCT Status Version: 3 SCT Version (vendor specific): 1 (0x0001) SCT Support Level: 0 Device State: Stand-by (1) Current Temperature: 0 Celsius Power Cycle Max Temperature: 0 Celsius Lifetime Max Temperature: 0 Celsius SCT Temperature History Version: 2 Temperature Sampling Period: 10 minutes Temperature Logging Interval: 10 minutes Min/Max recommended Temperature: 0/70 Celsius Min/Max Temperature Limit: -5/75 Celsius Temperature History Size (Index): 478 (40) Index Estimated Time Temperature Celsius 41 2014-08-13 15:10 ? - ... ..(473 skipped). .. - 37 2014-08-16 22:10 ? 
- 38 2014-08-16 22:20 0 - 39 2014-08-16 22:30 0 - 40 2014-08-16 22:40 0 - Write SCT (Get) Error Recovery Control Command failed: scsi error medium or hardware error (serious) SCT (Get) Error Recovery Control command failed Device Statistics (GP Log 0x04) Page Offset Size Value Description 1 ===== = = == General Statistics (rev 2) == 1 0x008 4 515 Lifetime Power-On Resets 1 0x010 4 4440 Power-on Hours 1 0x018 6 933634265 Logical Sectors Written 1 0x020 6 2708986 Number of Write Commands 1 0x028 6 852431906 Logical Sectors Read 1 0x030 6 6049196 Number of Read Commands 4 ===== = = == General Errors Statistics (rev 1) == 4 0x008 4 0 Number of Reported Uncorrectable Errors 4 0x010 4 0 Resets Between Cmd Acceptance and Completion 5 ===== = = == Temperature Statistics (rev 1) == 5 0x008 1 0 Current Temperature 5 0x010 1 0 Average Short Term Temperature 5 0x018 1 0 Average Long Term Temperature 5 0x020 1 0 Highest Temperature 5 0x028 1 0 Lowest Temperature 5 0x030 1 0 Highest Average Short Term Temperature 5 0x038 1 0 Lowest Average Short Term Temperature 5 0x040 1 0 Highest Average Long Term Temperature 5 0x048 1 0 Lowest Average Long Term Temperature 5 0x050 4 - Time in Over-Temperature 5 0x058 1 70 Specified Maximum Operating Temperature 5 0x060 4 - Time in Under-Temperature 5 0x068 1 0 Specified Minimum Operating Temperature 6 ===== = = == Transport Statistics (rev 1) == 6 0x008 4 0 Number of Hardware Resets 6 0x010 4 0 Number of ASR Events 6 0x018 4 0 Number of Interface CRC Errors 7 ===== = = == Solid State Device Statistics (rev 1) == 7 0x008 1 21~ Percentage Used Endurance Indicator |_ ~ normalized value SATA Phy Event Counters (GP Log 0x11) ID Size Value Description 0x0001 4 0 Command failed due to ICRC error 0x000a 4 0 Device-to-host register FISes sent due to a COMRESET On 16 August 2014 20:06, Christian Franke <Chr...@t-...> wrote: > Mike wrote: >> >> Greetings >> >> Following on from previous message, where running smartctl with the >> disk connected via SATA 
returns a 600PB disk (it is actually 512Gb
>> Crucial M4), I followed advice from the Crucial message boards and
>> left the disk idle over night in order to trigger automatic garbage
>> collection. No change the next morning, below is some smartctl
>> output. I've tried ddrescue to no avail, reads just fail completely.
>> ...
>>
>> smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.13.0-34-generic] (local
>> build)
>> Copyright (C) 2002-14, Bruce Allen, Christian Franke,
>> www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Model Family: Crucial/Micron RealSSD m4/C400/P400
>> Device Model: M4-CT512M4SSD1
>> ...
>>
>> Read SMART Log Directory failed: scsi error medium or hardware error
>> (serious)
>>
>
> This error message is probably due to a limitation/bug in driver/firmware of
> the used (which?) controller. The device itself should support SMART Log
> Directory.
>
>
>> General Purpose Log Directory Version 1
>> Address Access R/W Size Description
>> 0x00 GPL R/O 1 Log Directory
>> 0x03 GPL R/O 16383 Ext. Comprehensive SMART error log
>> 0x04 GPL R/O 255 Device Statistics log
>> 0x07 GPL R/O 3449 Extended self-test log
>> ..
>>
>> SMART Extended Comprehensive Error Log size 16383 not supported
>>
>> Read SMART Error Log failed: scsi error medium or hardware error (serious)
>>
>> SMART Extended Self-test Log size 3449 not supported
>>
>> Read SMART Self-test Log failed: scsi error medium or hardware error
>> (serious)
>
>
> The extended error/self-test logs may contain useful info. Unfortunately
> smartctl does not print the extended logs due to a historic size limitation.
> In the early days of these logs, sizes were <= 8.
>
> If possible, apply the attached patch and try whether the logs could be read
> then.
>
> Thanks,
> Christian
>
|
From: Christian F. <Chr...@t-...> - 2014-08-18 19:05:48
|
Mike wrote:
> Great thanks. I've applied the patch and ran the same command, which
> initially takes around 30 seconds to return any data at all, it then
> hung for about 1 minute after printing the GPL table.

This is because, with the patch, these large logs are completely read into
memory. Smartctl should only read sectors needed but this requires some
rework.

> ...
> Self-test execution status: ( 80) The previous self-test completed having
> the electrical element of the test failed.

This suggests a serious problem with the drive.

> ...
>
> SMART Extended Comprehensive Error Log Version: 1 (16383 sectors)
> No Errors Logged
>
> SMART Extended Self-test Log Version: 1 (3449 sectors)

smartctl should print one entry here but doesn't. There is either a bug in
smartctl or the drive does not fill the self-test log as specified by the
standard.

Please provide output of "smartctl -l gplog,0x07,0-1 /dev/sdb" as
attachment(!).

Connecting to another SATA controller with better pass-through support may
help to read the old Self-test Log (don't use -x, use -a or -l selftest).

Thanks,
Christian
|
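For readers following the "Self-test execution status: ( 80)" line quoted above, here is a minimal sketch of how that number breaks down. In the ATA SMART data the status is a single byte: the upper four bits hold the result code of the last self-test and the lower four bits hold the remaining part of a running test in units of 10%. The function names below are hypothetical helpers written for this illustration, not part of smartctl, and the short result texts are paraphrased.

```shell
#!/bin/sh
# Decode the one-byte SMART "Self-test execution status" value.
# Upper four bits: result code of the most recent self-test.
# Lower four bits: remaining part of a running test, in units of 10%.
# These helper names are illustrative only; smartctl does not provide them.

selftest_result_code() {
    # $1: status byte as a decimal number, e.g. 80
    echo $(( ($1 >> 4) & 15 ))
}

selftest_percent_remaining() {
    echo $(( ($1 & 15) * 10 ))
}

selftest_result_text() {
    case "$(selftest_result_code "$1")" in
        0)  echo "completed without error, or never run" ;;
        1)  echo "aborted by host" ;;
        2)  echo "interrupted by host reset" ;;
        3)  echo "fatal or unknown error" ;;
        4)  echo "failed, failing element unknown" ;;
        5)  echo "electrical element failed" ;;
        6)  echo "servo/seek element failed" ;;
        7)  echo "read element failed" ;;
        8)  echo "suspected handling damage" ;;
        15) echo "self-test in progress" ;;
        *)  echo "reserved" ;;
    esac
}

# The value reported in this thread:
selftest_result_text 80
```

The value 80 is 0x50, so the result code is 5 and nothing remains of the test, which matches the "electrical element of the test failed" text that smartctl printed.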
From: Mike <mg...@gm...> - 2014-08-19 18:06:22
|
Hi

root@w530:/home/user/Downloads/smartmontools-6.3# ./smartctl -l gplog,0x70,0-1 /dev/sdb
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.13.0-34-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

General Purpose Log 0x70 does not exist (override with '-T permissive' option)

root@w530:/home/user/Downloads/smartmontools-6.3# ./smartctl -T permissive -l gplog,0x70,0-1 /dev/sdb
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.13.0-34-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

General Purpose Log 0x70 has only 1 sectors, output truncated
ATA_READ_LOG_EXT (addr=0x70:0x00, page=0, n=1) failed: scsi error medium or hardware error (serious)

The first command took a while to run again, during which the following appeared:

[ 241.613661] INFO: task smartctl:2566 blocked for more than 120 seconds.
[ 241.613665] Tainted: PF O 3.13.0-34-generic #60-Ubuntu
[ 241.613674] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 241.613675] smartctl D ffff88021ddd4440 0 2566 2547 0x00000000
[ 241.613689] ffff8800cffe7b30 0000000000000002 ffff88020d7e5fc0 ffff8800cffe7fd8
[ 241.613694] 0000000000014440 0000000000014440 ffff88020d7e5fc0 ffff8802116a1058
[ 241.613699] ffff8802116a105c ffff88020d7e5fc0 00000000ffffffff ffff8802116a1060
[ 241.613704] Call Trace:
[ 241.613717] [<ffffffff817206d9>] schedule_preempt_disabled+0x29/0x70
[ 241.613723] [<ffffffff81722545>] __mutex_lock_slowpath+0x135/0x1b0
[ 241.613729] [<ffffffff817225df>] mutex_lock+0x1f/0x2f
[ 241.613735] [<ffffffff811f6c83>] __blkdev_get+0x63/0x4c0
[ 241.613740] [<ffffffff811f72a5>] blkdev_get+0x1c5/0x340
[ 241.613745] [<ffffffff811f59fe>] ? bdget+0x3e/0x150
[ 241.613749] [<ffffffff811f74cb>] blkdev_open+0x5b/0x80
[ 241.613756] [<ffffffff811ba033>] do_dentry_open+0x233/0x2e0
[ 241.613760] [<ffffffff811f7470>] ? blkdev_get_by_dev+0x50/0x50
[ 241.613764] [<ffffffff811ba369>] vfs_open+0x49/0x50
[ 241.613770] [<ffffffff811c8f04>] do_last+0x554/0x1200
[ 241.613779] [<ffffffff81311c6b>] ? apparmor_file_alloc_security+0x5b/0x180
[ 241.613784] [<ffffffff811cc38b>] path_openat+0xbb/0x640
[ 241.613790] [<ffffffff811cd76a>] do_filp_open+0x3a/0x90
[ 241.613797] [<ffffffff811da527>] ? __alloc_fd+0xa7/0x130
[ 241.613802] [<ffffffff811bbe89>] do_sys_open+0x129/0x280
[ 241.613809] [<ffffffff81020d45>] ? syscall_trace_enter+0x145/0x250
[ 241.613814] [<ffffffff811bbffe>] SyS_open+0x1e/0x20
[ 241.613821] [<ffffffff8172c97f>] tracesys+0xe1/0xe6

On 18 August 2014 20:05, Christian Franke <Chr...@t-...> wrote:
> Mike wrote:
>>
>> Great thanks. I've applied the patch and ran the same command, which
>> initially takes around 30 seconds to return any data at all, it then
>> hung for about 1 minute after printing the GPL table.
>
>
> This is because, with the patch, these large logs are completely read into
> memory. Smartctl should only read sectors needed but this requires some
> rework.
>
>
>> ...
>>
>> Self-test execution status: ( 80) The previous self-test completed
>> having
>> the electrical element of the test failed.
>
>
> This suggests a serious problem with the drive.
>
>
>> ...
>>
>>
>> SMART Extended Comprehensive Error Log Version: 1 (16383 sectors)
>> No Errors Logged
>>
>> SMART Extended Self-test Log Version: 1 (3449 sectors)
>
>
> smartctl should print one entry here but doesn't. There is either a bug in
> smartctl or the drive does not fill the self-test log as specified by the
> standard.
>
> Please provide output of "smartctl -l gplog,0x07,0-1 /dev/sdb" as
> attachment(!).
>
> Connecting to another SATA controller with better pass-through support may
> help to read the old Self-test Log (don't use -x, use -a or -l selftest).
>
> Thanks,
> Christian
>
|
From: Christian F. <Chr...@t-...> - 2014-08-19 18:22:33
|
Mike wrote:
> Hi
>
> root@w530:/home/user/Downloads/smartmontools-6.3# ./smartctl -l
> gplog,0x70,0-1 /dev/sdb
> smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.13.0-34-generic] (local build)
> Copyright (C) 2002-14, Bruce Allen, Christian Franke,www.smartmontools.org
>
> General Purpose Log 0x70 does not exist (override with '-T permissive' option)

Please re-read my previous mail:

> Please provide output of "smartctl -l gplog,0x07,0-1 /dev/sdb" as
> attachment(!).

Thanks,
Christian
|
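The 0x70 versus 0x07 mix-up above can be caught before issuing the command by comparing the requested address with the General Purpose Log directory that the drive reported earlier in this thread. The following is only a sketch under that assumption: the table is copied by hand from the posted smartctl -x output, and GPLOG_DIR and gplog_sectors are hypothetical names made up for this example, not smartctl features.

```shell
#!/bin/sh
# GP Log directory entries copied from the smartctl -x output in this thread.
# Format per line: "first_address last_address size_in_sectors" (hex addresses).
GPLOG_DIR="0x00 0x00 1
0x03 0x03 16383
0x04 0x04 255
0x07 0x07 3449
0x10 0x10 1
0x11 0x11 1
0x80 0x9f 16
0xa0 0xa0 2000
0xa1 0xbf 1
0xc0 0xc0 80
0xc1 0xdf 1
0xe0 0xe0 1
0xe1 0xe1 1"

# Print the size in sectors of the log at address $1 (e.g. 0x07).
# Prints nothing and returns non-zero if the address is not in the directory.
gplog_sectors() {
    addr=$(( $1 ))
    echo "$GPLOG_DIR" | while read -r first last sectors; do
        if [ "$addr" -ge $(( $first )) ] && [ "$addr" -le $(( $last )) ]; then
            echo "$sectors"
            break
        fi
    done | grep .
}

# Example: only build the smartctl command if the address is listed.
if size=$(gplog_sectors 0x07); then
    echo "log 0x07 is listed ($size sectors): smartctl -l gplog,0x07,0-1 /dev/sdb"
fi
if ! gplog_sectors 0x70 >/dev/null; then
    echo "log 0x70 is not listed in the directory"
fi
```

With the directory from this thread, 0x07 resolves to the 3449-sector Extended self-test log, while 0x70 is not listed, which is consistent with the "does not exist" message smartctl printed.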
From: Mike <mg...@gm...> - 2014-08-19 18:33:43
Attachments:
gplog.txt
|
Sorry, been a long day. Output attached. Two more call traces appeared
too, do you want those?

Thanks very much.

On 19 August 2014 19:22, Christian Franke <Chr...@t-...> wrote:
> Mike wrote:
>>
>> Hi
>>
>> root@w530:/home/user/Downloads/smartmontools-6.3# ./smartctl -l
>> gplog,0x70,0-1 /dev/sdb
>> smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.13.0-34-generic] (local
>> build)
>> Copyright (C) 2002-14, Bruce Allen, Christian Franke,www.smartmontools.org
>>
>> General Purpose Log 0x70 does not exist (override with '-T permissive'
>> option)
>
>
> Please re-read my previous mail:
>
>
>> Please provide output of "smartctl -l gplog,0x07,0-1 /dev/sdb" as
>> attachment(!).
>
>
> Thanks,
> Christian
>
|