From: Janardhan M. <mja...@gm...> - 2014-06-20 20:42:59
Hi,

On systems with Micron RealSSDs, I noticed that the Airflow_Temperature_Cel values are not normalized but real values, causing the smartmontools smartctl health check to report failures.

The drive type is present in the smartmontools database. Is there a generalized way to run checks so that the reporting is consistent across all the disks, or do we need to intercept these kinds of divergences?

Any ideas on these values? Is there a way to identify with smartmontools whether a reported value is real or normalized?

From the doc, I read that as long as the value is higher than the threshold it's good (i.e. normalized value 100-<realvalue>), otherwise it's bad.

Janny

E.g.

#smartctl /dev/bus/0 -d sat+megaraid,9 -H
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.32-431.17.1.el6.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0023   019   033   069    Pre-fail  Always   FAILING_NOW 19 (Min/Max 17/33)

#
#./smartmontools-6.2/sbin/smartctl /dev/bus/0 -d sat+megaraid,9 -a
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.32-431.17.1.el6.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron RealSSD m4/C400/P400
Device Model:     MTFDDAK256MAR-1K1AA 90Y8644 90Y8647IBM
Serial Number:    033F8A54
LU WWN Device Id: 5 00a075 1033f8a54
Firmware Version: MA44
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun 21 02:06:14 2014 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 1190) seconds.
Offline data collection capabilities:    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine recommended polling time:      (   2) minutes.
Extended self-test routine recommended polling time:   (  19) minutes.
Conveyance self-test routine recommended polling time: (   3) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always   -           0
  5 Reallocated_Sector_Ct   0x0032   100   100   001    Old_age   Always   -           0
  9 Power_On_Hours          0x0032   100   100   001    Old_age   Always   -           14019
 12 Power_Cycle_Count       0x0032   100   100   001    Old_age   Always   -           18
170 Grown_Failing_Block_Ct  0x0033   100   100   010    Pre-fail  Always   -           0
171 Program_Fail_Count      0x0032   100   100   001    Old_age   Always   -           0
172 Erase_Fail_Count        0x0032   100   100   001    Old_age   Always   -           0
173 Wear_Leveling_Count     0x0033   099   099   000    Pre-fail  Always   -           56
174 Unexpect_Power_Loss_Ct  0x0032   100   100   001    Old_age   Always   -           15
181 Non4k_Aligned_Access    0x0022   100   100   001    Old_age   Always   -           0 0 0
183 SATA_Iface_Downshift    0x0032   100   100   001    Old_age   Always   -           0
184 End-to-End_Error        0x0033   100   100   050    Pre-fail  Always   -           0
187 Reported_Uncorrect      0x0032   100   100   001    Old_age   Always   -           0
188 Command_Timeout         0x0032   100   100   001    Old_age   Always   -           0
189 Factory_Bad_Block_Ct    0x000e   100   100   001    Old_age   Always   -           90
190 Airflow_Temperature_Cel 0x0023   019   033   069    Pre-fail  Always   FAILING_NOW 19 (Min/Max 17/33)
194 Temperature_Celsius     0x0022   019   033   000    Old_age   Always   -           19 (Min/Max 17/33)
195 Hardware_ECC_Recovered  0x003a   100   100   001    Old_age   Always   -           105
196 Reallocated_Event_Count 0x0032   100   100   001    Old_age   Always   -           0
197 Current_Pending_Sector  0x0032   100   100   001    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0030   100   100   001    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   100   100   001    Old_age   Always   -           2
202 Perc_Rated_Life_Used    0x0018   099   099   001    Old_age   Offline  -           1
206 Write_Error_Rate        0x000e   100   100   001    Old_age   Always   -           0
231 Temperature_Celsius     0x0033   099   099   010    Pre-fail  Always   -           0
225 Unknown_SSD_Attribute   0x0000   100   100   000    Old_age   Offline  -           25994335754
242 Total_LBAs_Read         0x0002   100   100   001    Old_age   Always   -           8736

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

#
From: Christian F. <Chr...@t-...> - 2014-06-21 11:18:06
Janardhan Molumuri wrote:
> On systems with Micron RealSSDs, I noticed that
> Airflow_Temperature_Cel values are not normalized, instead real values
> causing the smartmontools smartctl health check to report failures.
>
> The drive type is present in the smartmontools database. Is there a
> generalized way to run checks so that the reporting is consistent
> across all the disks [or] we need to intercept these kind of divergences.
>

ATA SMART attributes are vendor specific, so there is no way that works with all devices.

> Any ideas on these values? Is there way to identify if the reported
> value is real vs normalized with smartmontools?
>
> From the doc, I read that as long as the value is higher than the
> threshold its good (i.e. normalized value 100-<realvalue>) otherwise
> its bad.

Yes, that is existing practice. No, it has not been part of the ATA standards since ATA-4 (1998).

Newer drives, including Micron SSDs, support ATA Device Statistics (smartctl -l devstat, included in smartctl -x). This is part of recent standards and not vendor specific.

>
> #smartctl /dev/bus/0 -d sat+megaraid,9 -H
> smartctl 6.2 2013-07-26 r3841
> [x86_64-linux-2.6.32-431.17.1.el6.x86_64] (local build)
> Copyright (C) 2002-13, Bruce Allen, Christian Franke,
> www.smartmontools.org
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: FAILED!
> Drive failure expected in less than 24 hours. SAVE ALL DATA.
> Failed Attributes:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE
> UPDATED  WHEN_FAILED RAW_VALUE
> 190 Airflow_Temperature_Cel 0x0023   019   033   069    Pre-fail  Always
> FAILING_NOW 19 (Min/Max 17/33)
>

None of the specs or previous sample outputs from Micron RealSSDs include attribute 190.

Please provide the output of

smartctl -r ioctl,2 -i -H -d sat+megaraid,9 /dev/bus/0

(as an attachment). I want to make sure that the drive's SMART STATUS command actually returns FAILED. If so, this is probably a drive firmware bug.

Thanks,
Christian
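A minimal C sketch of the "VALUE must stay above THRESH" convention confirmed above, assuming a nonzero THRESH means "failing when VALUE <= THRESH" (FAILING_NOW) or "WORST <= THRESH" (In_the_past), and THRESH 0 means the attribute has no threshold and never fails. The checks use rows from the attribute table in the first message: attribute 190 (019/033/069) fails, while attribute 194 has the same VALUE and WORST but THRESH 000 and is not flagged. The names classify_attr and when_failed are hypothetical, not smartctl functions, and smartctl's real rules have special cases not covered here.

```c
#include <assert.h>

/* Hypothetical classification of one attribute row (not smartctl code). */
typedef enum {
    ATTR_OK,           /* VALUE and WORST both above THRESH  */
    ATTR_FAILED_PAST,  /* only WORST <= THRESH               */
    ATTR_FAILING_NOW,  /* current VALUE <= THRESH            */
    ATTR_NO_THRESHOLD  /* THRESH == 0: never counted as fail */
} attr_state;

/* VALUE, WORST and THRESH are the normalized one-byte fields. */
static attr_state classify_attr(unsigned value, unsigned worst, unsigned thresh)
{
    assert(value <= 0xff && worst <= 0xff && thresh <= 0xff);
    if (thresh == 0)
        return ATTR_NO_THRESHOLD;
    if (value <= thresh)
        return ATTR_FAILING_NOW;
    if (worst <= thresh)
        return ATTR_FAILED_PAST;
    return ATTR_OK;
}

/* Label as it would appear in the WHEN_FAILED column. */
static const char *when_failed(attr_state s)
{
    switch (s) {
    case ATTR_FAILING_NOW: return "FAILING_NOW";
    case ATTR_FAILED_PAST: return "In_the_past";
    default:               return "-";
    }
}
```

Note that equality already counts as failing, so a drive must keep VALUE strictly above THRESH, which matches the "higher than the threshold" wording in the question.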
From: Janardhan M. <mja...@gm...> - 2014-06-22 04:59:15
#./smartmontools-6.2/sbin/smartctl -r ioctl,2 -i -H -d sat+megaraid,9 /dev/bus/0
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.32-431.17.1.el6.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

Creating /dev/megaraid_sas_ioctl_node = 17
REPORT-IOCTL: Device=/dev/bus/0 Command=IDENTIFY DEVICE
Input: FR=...., SC=0x01, LL=...., LM=...., LH=...., DEV=...., CMD=0xec IN
[ata pass-through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00 ]
REPORT-IOCTL: Device=/dev/bus/0 Command=IDENTIFY DEVICE returned 0
===== [IDENTIFY DEVICE] DATA START (BASE-16) =====
000-015: 40 04 ff 3f 37 c8 10 00 00 00 00 00 3f 00 00 00 |@..?7.......?...|
016-031: 00 00 00 00 20 20 20 20 20 20 20 20 20 20 20 20 |....            |
032-047: 33 30 46 33 41 38 34 35 00 00 00 00 00 00 20 20 |30F3A845......  |
048-063: 20 20 41 4d 34 34 54 4d 44 46 41 44 32 4b 36 35 |  AM44TMDFAD2K65|
064-079: 41 4d 2d 52 4b 31 41 31 20 41 39 20 59 30 36 38 |AM-RK1A1 A9 Y068|
080-095: 34 34 39 20 59 30 36 38 37 34 42 49 20 4d 10 80 |449 Y06874BI M..|
096-111: 00 40 00 2f 01 40 00 00 00 00 07 00 ff 3f 10 00 |.@./.@.......?..|
112-127: 3f 00 10 fc fb 00 10 01 ff ff ff 0f 00 00 07 00 |?...............|
128-143: 03 00 78 00 78 00 78 00 78 00 00 40 00 00 00 00 |..x.x.x.x..@....|
144-159: 00 00 00 00 00 00 1f 00 0e 17 06 00 4c 00 44 00 |............L.D.|
160-175: f8 03 28 00 4b 74 09 7d 63 61 49 74 09 bc 63 61 |..(.Kt.}caIt..ca|
176-191: 3f 20 01 00 01 00 fe 00 fe ff 00 00 00 00 00 00 |? ..............|
192-207: 00 00 00 00 00 00 00 00 b0 32 cf 1d 00 00 00 00 |.........2......|
208-223: 00 00 08 00 00 40 00 00 0a 50 51 07 3f 03 54 8a |.....@...PQ.?.T.|
224-239: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1e 40 |...............@|
240-255: 1c 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.@..............|
256-271: 21 00 31 30 34 34 30 2e 2e 31 36 30 00 00 00 00 |!.10440..160....|
272-287: 00 00 39 37 35 35 20 20 20 20 41 34 37 4c 39 37 |..9755    A47L97|
288-303: 34 31 20 20 20 20 00 00 00 00 00 00 00 00 00 00 |41    ..........|
304-319: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
320-335: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
336-351: 03 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
352-367: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
368-383: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
384-399: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
400-415: 00 00 00 00 00 00 00 00 00 00 00 00 3d 00 00 00 |............=...|
416-431: 00 00 00 40 00 00 00 00 00 00 01 00 00 00 00 00 |...@............|
432-447: 00 00 01 00 00 00 00 00 00 00 00 00 3f 10 00 00 |............?...|
448-463: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
464-479: 00 00 00 00 01 00 ff 00 00 00 00 00 00 00 00 00 |................|
480-495: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
496-511: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a5 86 |................|
===== [IDENTIFY DEVICE] DATA END (512 Bytes) =====
=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron RealSSD m4/C400/P400
Device Model:     MTFDDAK256MAR-1K1AA 90Y8644 90Y8647IBM
Serial Number:    033F8A54
LU WWN Device Id: 5 00a075 1033f8a54
Firmware Version: MA44
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Jun 22 10:12:40 2014 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

REPORT-IOCTL: Device=/dev/bus/0 Command=SMART READ ATTRIBUTE VALUES
Input: FR=0xd0, SC=0x01, LL=...., LM=0x4f, LH=0xc2, DEV=...., CMD=0xb0 IN
[ata pass-through(16): 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 ]
[Duration: 0.006s]
REPORT-IOCTL: Device=/dev/bus/0 Command=SMART READ ATTRIBUTE VALUES returned 0
===== [SMART READ ATTRIBUTE VALUES] DATA START (BASE-16) =====
000-015: 10 00 01 2f 00 64 64 00 00 00 00 00 00 32 05 32 |.../.dd......2.2|
016-031: 00 64 64 00 00 00 00 00 00 01 09 32 00 64 64 e3 |.dd........2.dd.|
032-047: 36 00 00 00 00 01 0c 32 00 64 64 12 00 00 00 00 |6......2.dd.....|
048-063: 00 01 aa 33 00 64 64 00 00 00 00 00 00 0a ab 32 |...3.dd........2|
064-079: 00 64 64 00 00 00 00 00 00 01 ac 32 00 64 64 00 |.dd........2.dd.|
080-095: 00 00 00 00 00 01 ad 33 00 63 63 38 00 00 00 00 |.......3.cc8....|
096-111: 00 00 ae 32 00 64 64 0f 00 00 00 00 00 01 b5 22 |...2.dd........"|
112-127: 00 64 64 00 00 00 00 00 00 01 b7 32 00 64 64 00 |.dd........2.dd.|
128-143: 00 00 00 00 00 01 b8 33 00 64 64 00 00 00 00 00 |.......3.dd.....|
144-159: 00 32 bb 32 00 64 64 00 00 00 00 00 00 01 bc 32 |.2.2.dd........2|
160-175: 00 64 64 00 00 00 00 00 00 01 bd 0e 00 64 64 5a |.dd..........ddZ|
176-191: 00 00 00 00 00 01 be 23 00 13 21 13 00 11 00 21 |.......#..!....!|
192-207: 00 45 c2 22 00 13 21 13 00 11 00 21 00 00 c3 3a |.E."..!....!...:|
208-223: 00 64 64 69 00 00 00 00 00 01 c4 32 00 64 64 00 |.ddi.......2.dd.|
224-239: 00 00 00 00 00 01 c5 32 00 64 64 00 00 00 00 00 |.......2.dd.....|
240-255: 00 01 c6 30 00 64 64 00 00 00 00 00 00 01 c7 32 |...0.dd........2|
256-271: 00 64 64 02 00 00 00 00 00 01 ca 18 00 63 63 01 |.dd..........cc.|
272-287: 00 00 00 00 00 01 ce 0e 00 64 64 00 00 00 00 00 |.........dd.....|
288-303: 00 01 e7 33 00 63 63 00 00 00 00 00 00 0a e1 00 |...3.cc.........|
304-319: 00 64 64 16 31 7d 11 06 00 00 f2 02 00 64 64 20 |.dd.1}.......dd |
320-335: 22 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 |"...............|
336-351: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
352-367: 00 00 00 00 00 00 00 00 00 00 80 00 a6 04 00 7b |...............{|
368-383: 03 00 01 00 02 13 03 00 00 00 00 00 00 00 00 00 |................|
384-399: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
400-415: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
416-431: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
432-447: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
448-463: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
464-479: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
480-495: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
496-511: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c2 |................|
===== [SMART READ ATTRIBUTE VALUES] DATA END (512 Bytes) =====
REPORT-IOCTL: Device=/dev/bus/0 Command=SMART READ ATTRIBUTE THRESHOLDS
Input: FR=0xd1, SC=0x01, LL=0x01, LM=0x4f, LH=0xc2, DEV=...., CMD=0xb0 IN
[ata pass-through(16): 85 08 0e 00 d1 00 01 00 01 00 4f 00 c2 00 b0 00 ]
REPORT-IOCTL: Device=/dev/bus/0 Command=SMART READ ATTRIBUTE THRESHOLDS returned 0
===== [SMART READ ATTRIBUTE THRESHOLDS] DATA START (BASE-16) =====
000-015: 10 00 01 32 00 00 00 00 00 00 00 00 00 00 05 01 |...2............|
016-031: 00 00 00 00 00 00 00 00 00 00 09 01 00 00 00 00 |................|
032-047: 00 00 00 00 00 00 0c 01 00 00 00 00 00 00 00 00 |................|
048-063: 00 00 aa 0a 00 00 00 00 00 00 00 00 00 00 ab 01 |................|
064-079: 00 00 00 00 00 00 00 00 00 00 ac 01 00 00 00 00 |................|
080-095: 00 00 00 00 00 00 ad 00 00 00 00 00 00 00 00 00 |................|
096-111: 00 00 ae 01 00 00 00 00 00 00 00 00 00 00 b5 01 |................|
112-127: 00 00 00 00 00 00 00 00 00 00 b7 01 00 00 00 00 |................|
128-143: 00 00 00 00 00 00 b8 32 00 00 00 00 00 00 00 00 |.......2........|
144-159: 00 00 bb 01 00 00 00 00 00 00 00 00 00 00 bc 01 |................|
160-175: 00 00 00 00 00 00 00 00 00 00 bd 01 00 00 00 00 |................|
176-191: 00 00 00 00 00 00 be 45 00 00 00 00 00 00 00 00 |.......E........|
192-207: 00 00 c2 00 00 00 00 00 00 00 00 00 00 00 c3 01 |................|
208-223: 00 00 00 00 00 00 00 00 00 00 c4 01 00 00 00 00 |................|
224-239: 00 00 00 00 00 00 c5 01 00 00 00 00 00 00 00 00 |................|
240-255: 00 00 c6 01 00 00 00 00 00 00 00 00 00 00 c7 01 |................|
256-271: 00 00 00 00 00 00 00 00 00 00 ca 01 00 00 00 00 |................|
272-287: 00 00 00 00 00 00 ce 01 00 00 00 00 00 00 00 00 |................|
288-303: 00 00 e7 0a 00 00 00 00 00 00 00 00 00 00 e1 00 |................|
304-319: 00 00 00 00 00 00 00 00 00 00 f2 01 00 00 00 00 |................|
320-335: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
336-351: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
352-367: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
368-383: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
384-399: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
400-415: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
416-431: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
432-447: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
448-463: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
464-479: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
480-495: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
496-511: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a6 |................|
===== [SMART READ ATTRIBUTE THRESHOLDS] DATA END (512 Bytes) =====
=== START OF READ SMART DATA SECTION ===
REPORT-IOCTL: Device=/dev/bus/0 Command=SMART STATUS CHECK
Input: FR=0xda, SC=...., LL=...., LM=0x4f, LH=0xc2, DEV=...., CMD=0xb0
[ata pass-through(16): 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00 ]
sat_device::ata_pass_through: scsi_pass_through() failed, errno=38 [ATA return descriptor not supported by controller firmware]
REPORT-IOCTL: Device=/dev/bus/0 Command=SMART STATUS CHECK returned -1 errno=38 [ATA return descriptor not supported by controller firmware]
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0023   019   033   069    Pre-fail  Always   FAILING_NOW 19 (Min/Max 17/33)

#
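The hex dumps above show that the odd numbers for attribute 190 come straight from the drive. A minimal C sketch that decodes the relevant slots by hand, assuming the classic SMART data layout: a 2-byte revision number followed by 12-byte slots (ID, little-endian flags, VALUE, WORST, 6 raw bytes, 1 reserved byte), with the threshold sector using the same slot size and carrying the threshold in byte 1. Only the revision bytes and the slots for attributes 190 and 194 are copied, compacted into short arrays (values sector offsets 182-205, thresholds sector offsets 182-205); find_slot and decode_attr are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SMART_FIRST_SLOT = 2, SMART_SLOT_SIZE = 12 };

struct smart_attr {
    uint8_t  id;
    uint16_t flags;
    uint8_t  value, worst;
    uint8_t  raw[6];
};

/* Return the 12-byte slot whose first byte equals `id`, or NULL. */
static const uint8_t *find_slot(const uint8_t *sector, size_t len, uint8_t id)
{
    assert(sector != NULL);
    for (size_t off = SMART_FIRST_SLOT; off + SMART_SLOT_SIZE <= len; off += SMART_SLOT_SIZE)
        if (sector[off] == id)
            return sector + off;
    return NULL;
}

/* Split one attribute-value slot into its fields. */
static void decode_attr(const uint8_t *e, struct smart_attr *a)
{
    assert(e != NULL && a != NULL);
    a->id    = e[0];
    a->flags = (uint16_t)(e[1] | (e[2] << 8));  /* little-endian */
    a->value = e[3];
    a->worst = e[4];
    for (int i = 0; i < 6; i++)
        a->raw[i] = e[5 + i];
}

/* Compacted excerpt of "SMART READ ATTRIBUTE VALUES". */
static const uint8_t values_excerpt[] = {
    0x10, 0x00,                                                             /* revision 16 */
    0xbe, 0x23, 0x00, 0x13, 0x21, 0x13, 0x00, 0x11, 0x00, 0x21, 0x00, 0x45, /* attr 190 */
    0xc2, 0x22, 0x00, 0x13, 0x21, 0x13, 0x00, 0x11, 0x00, 0x21, 0x00, 0x00, /* attr 194 */
};

/* Compacted excerpt of "SMART READ ATTRIBUTE THRESHOLDS". */
static const uint8_t thresh_excerpt[] = {
    0x10, 0x00,
    0xbe, 0x45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  /* attr 190: THRESH 0x45 */
    0xc2, 0x00, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  /* attr 194: THRESH 0    */
};
```

Attribute 190 decodes to VALUE 0x13 (19), WORST 0x21 (33) and THRESH 0x45 (69); raw bytes 0, 2 and 4 hold 19, 17 and 33, the same "19 (Min/Max 17/33)" smartctl prints. The drive puts the Celsius temperature into the normalized VALUE field itself, so smartctl is not misreading the data.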
From: Christian F. <Chr...@t-...> - 2014-06-25 16:01:43
Janardhan Molumuri wrote:
> Thanks for your response, Christian.
>
> >> smartctl -r ioctl,2 -i -H -d sat+megaraid,9 /dev/bus/0
> Attaching the output. smart status check returned -1.
> ...
> REPORT-IOCTL: Device=/dev/bus/0 Command=SMART STATUS CHECK
> Input: FR=0xda, SC=...., LL=...., LM=0x4f, LH=0xc2, DEV=...., CMD=0xb0
> [ata pass-through(16): 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00 ]
> sat_device::ata_pass_through: scsi_pass_through() failed, errno=38 [ATA return descriptor not supported by controller firmware]
> REPORT-IOCTL: Device=/dev/bus/0 Command=SMART STATUS CHECK returned -1 errno=38 [ATA return descriptor not supported by controller firmware]
> SMART overall-health self-assessment test result: FAILED!
> Drive failure expected in less than 24 hours. SAVE ALL DATA.
> Failed Attributes:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> 190 Airflow_Temperature_Cel 0x0023   019   033   069    Pre-fail  Always   FAILING_NOW 19 (Min/Max 17/33)

We have 3 issues here:

- LSI MegaRAID firmware does not support the SAT ATA return descriptor, which means that the (boolean) result of the SMART RETURN STATUS command cannot be read from the drive.

- smartctl reports this fact only in debug output (I fixed this in r3920).

- The drive returns a bogus normalized VALUE and THRESHold for attribute 190.

If SMART RETURN STATUS does not work, smartctl checks the attribute table instead. Therefore smartctl reports a FAILED SMART test result which is possibly not identical to the actual status returned by the drive.

> >>None of the specs or previous sample outputs from Micron RealSSDs
> include attribute 190.
> What is the best way out in these cases to detect the health status of
> the disks? Is there a flag or something to detect the value type
> (normalized vs real)?

No, the value in the normalized VALUE field should always be normalized :-)

Could you possibly connect this drive to a motherboard controller or some other non-RAID controller and repeat the test?

Thanks,
Christian
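The fallback described above can be sketched in C. Per the ATA command set, SMART RETURN STATUS reports its result only in the LBA Mid/High registers (4Fh/C2h: no threshold exceeded; F4h/2Ch: threshold exceeded), which is why the pass-through CDB in the log (85 06 2c ...) sets CK_COND (bit 5 of byte 2, 0x2c) to request an ATA Return descriptor. When the controller cannot deliver that descriptor, the sketch falls back to the attribute table, where attribute 190 (VALUE 19 <= THRESH 69) alone turns the verdict into FAILED. All names are hypothetical, and treating unrecognized register values as "unknown" is an assumption; this is not smartctl's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { HEALTH_PASSED, HEALTH_FAILED, HEALTH_UNKNOWN } health;

/* Interpret LBA Mid/High after SMART RETURN STATUS. `have_regs` is false
 * when the ATA return descriptor is unavailable (errno=38 in the log). */
static health status_from_regs(bool have_regs, unsigned lba_mid, unsigned lba_high)
{
    if (!have_regs)
        return HEALTH_UNKNOWN;
    if (lba_mid == 0x4f && lba_high == 0xc2)
        return HEALTH_PASSED;   /* no threshold exceeded */
    if (lba_mid == 0xf4 && lba_high == 0x2c)
        return HEALTH_FAILED;   /* threshold exceeded */
    return HEALTH_UNKNOWN;      /* assumption: unexpected values are unknown */
}

struct attr_row { unsigned id, value, thresh; };

/* Fallback: any row with a nonzero THRESH and VALUE <= THRESH fails. */
static health status_from_attrs(const struct attr_row *rows, size_t n)
{
    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (rows[i].thresh != 0 && rows[i].value <= rows[i].thresh)
            return HEALTH_FAILED;
    return HEALTH_PASSED;
}

/* Prefer the drive's own verdict; scan attributes only if it is missing. */
static health overall_health(bool have_regs, unsigned lba_mid, unsigned lba_high,
                             const struct attr_row *rows, size_t n)
{
    health h = status_from_regs(have_regs, lba_mid, lba_high);
    return h != HEALTH_UNKNOWN ? h : status_from_attrs(rows, n);
}
```

With the registers available, a drive reporting 4Fh/C2h would pass even with the bogus attribute 190, which is why the result on the MegaRAID "is possibly not identical to the actual status returned by the drive".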
From: Janardhan M. <mja...@gm...> - 2014-07-17 20:14:46
The drivedb.h (http://www.smartmontools.org/browser/trunk/smartmontools/drivedb.h) doesn't mention attribute 190 (Airflow_Temperature_Cel). Is smartctl supposed to report only the attributes specified in drivedb.h, as listed below? That does not seem to be the case here. There is no easy way to ignore some attributes on the command line.

  { "Crucial/Micron RealSSD m4/C400/P400", // Marvell 9176, fixed firmware
    "C400-MTFDDA[ACK](064|128|256|512)MAM|"
    "M4-CT(064|128|256|512)M4SSD[123]|" // tested with M4-CT512M4SSD2/0309
    "MTFDDAK(064|128|256|512|050|100|200|400)MA[RN]-1[JKS]1AA.*", // tested with
      // MTFDDAK256MAR-1K1AA/MA52
    "030[9-Z]|03[1-Z].|0[4-Z]..|[1-Z]....*", // >= "0309"
    "",
  //"-v 1,raw48,Raw_Read_Error_Rate "
  //"-v 5,raw16(raw16),Reallocated_Sector_Ct "
  //"-v 9,raw24(raw8),Power_On_Hours "
  //"-v 12,raw48,Power_Cycle_Count "
    "-v 170,raw48,Grown_Failing_Block_Ct "
    "-v 171,raw48,Program_Fail_Count "
    "-v 172,raw48,Erase_Fail_Count "
    "-v 173,raw48,Wear_Leveling_Count "
    "-v 174,raw48,Unexpect_Power_Loss_Ct "
    "-v 181,raw16,Non4k_Aligned_Access "
    "-v 183,raw48,SATA_Iface_Downshift "
  //"-v 184,raw48,End-to-End_Error "
  //"-v 187,raw48,Reported_Uncorrect "
  //"-v 188,raw48,Command_Timeout "
    "-v 189,raw48,Factory_Bad_Block_Ct "
  //"-v 194,tempminmax,Temperature_Celsius "
  //"-v 195,raw48,Hardware_ECC_Recovered "
  //"-v 196,raw16(raw16),Reallocated_Event_Count "
  //"-v 197,raw48,Current_Pending_Sector "
  //"-v 198,raw48,Offline_Uncorrectable "
  //"-v 199,raw48,UDMA_CRC_Error_Count "
    "-v 202,raw48,Perc_Rated_Life_Used "
    "-v 206,raw48,Write_Error_Rate"
  },

thanks,
Janny
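The firmware field of the entry above is an extended regular expression that, per its comment, accepts versions >= "0309"; because the [1-Z] ranges include capital letters, it also accepts MA44 and MA52. A sketch using POSIX <regex.h>, assuming the whole firmware string must match (the pattern is wrapped in ^( )$ here) and that ranges such as [1-Z] are evaluated in the C locale. fw_matches is a hypothetical helper, not smartctl's own matching code.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Firmware pattern copied from the drivedb.h entry. */
static const char *fw_pattern = "030[9-Z]|03[1-Z].|0[4-Z]..|[1-Z]....*";

/* True if `fw` matches `pattern` as a whole string (assumed full match). */
static bool fw_matches(const char *pattern, const char *fw)
{
    assert(pattern != NULL && fw != NULL);
    char anchored[256];
    regex_t re;

    /* Anchor every alternative at both ends. */
    snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool ok = regexec(&re, fw, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

So the MA44 drive in this thread does select this entry, which matches the "Device is: In smartctl database" line in the output; attributes without a -v preset (such as 190) simply keep smartctl's default settings.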
From: Christian F. <Chr...@t-...> - 2014-07-18 16:51:45
Janardhan Molumuri wrote:
> The drivedb.h
> http://www.smartmontools.org/browser/trunk/smartmontools/drivedb.h
> doesn't mention the attribute (190 - Airflow_Temperature_Cel). Is it
> that the smartctl supposed to report only the below values as
> specified in the drivedb.h (seems like not the case here)?

The -v options change the default setting for an attribute. Default settings are documented in the first (dummy) entry of drivedb.h. See also the smartctl man page.

> There is no easy way to ignore some attributes on the command line.

smartctl always prints all attributes provided by the drive. Simply filter the smartctl output with a script, e.g.

# /usr/sbin/smartctl -A /dev/sdX | sed '/^194 Temp/d'

Thanks,
Christian
From: Janardhan M. <mja...@gm...> - 2014-07-18 20:15:32
I should have clarified: I meant that smartctl -H (--health) doesn't have an easy way to ignore some attributes, e.g. attribute 190 in this case.

thanks,
Janny
From: Christian F. <Chr...@t-...> - 2014-07-19 11:21:31
Janardhan Molumuri wrote:
> I should have clarified it, i meant smartctl -H ( --health) doesn't
> have an easy way to ignore some attributes. e.g. in this case 190.
>

There is no option specific to this rare use case (broken controller firmware, bogus drive temperature attribute). Normally -H does not check the attributes but simply prints the boolean SMART RETURN STATUS info from the drive.

Try the "-v 190,raw48:v" option as a workaround. It prints the VALUE field as RAW and should suppress the threshold check for attribute 190.
If this works, you could add a local drive database entry to /etc/smart_drivedb.h; see the -B option on the smartctl man page.

Thanks
Christian
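A local entry for that workaround might look like the following untested sketch. It assumes /etc/smart_drivedb.h accepts entries in the same five-field format as the drivedb.h entry quoted earlier in this thread (family, model regex, firmware regex, warning, presets). The family string and the narrow model/firmware patterns are illustrative choices; only the "-v 190,raw48:v" preset comes from the message above.

```c
/* /etc/smart_drivedb.h: hypothetical local entry, untested. */
  { "Crucial/Micron RealSSD m4/C400/P400 (local: bogus attribute 190)",
    "MTFDDAK256MAR-1K1AA.*",   // model string as printed by smartctl -i
    "MA44",                    // limit the workaround to the affected firmware
    "",                        // no warning message
    "-v 190,raw48:v"           // show VALUE as raw, skip the threshold check
  },
```

Restricting the firmware field to MA44 keeps the workaround from hiding a genuine attribute 190 on other firmware versions.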
From: Janardhan M. <mja...@gm...> - 2014-07-23 12:42:45
I found that the new disk firmware has fixed the SMART reporting, and now it reports normalized values :)

Thanks for your assistance here, Christian.

Janny
From: Christian F. <Chr...@t-...> - 2014-07-23 15:57:34
Janardhan Molumuri wrote:
> I found that the new disk firmware had fixed the SMART reporting and
> now it reports normalized values :)
>

New firmware version?

> thanks for your assistance here, Christian.
>

You're welcome.

Thanks,
Christian
From: Janardhan M. <mja...@gm...> - 2014-07-23 17:19:37
Version: MA52