From: Jay L. <jl...@sl...> - 2019-09-30 02:27:40
Dave,

Thank you. The Smartctl output is below, and it appears that I do not have any reallocated sectors.

Thank you in advance for any thoughts.

=== START OF INFORMATION SECTION ===
Device Model:     ST2000DM008-2FR102
Serial Number:    ZFL08677
LU WWN Device Id: 5 000c50 0b50ff26c
Firmware Version: 0001
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Sep 29 21:56:04 2019 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (    0) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 201) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x30a5) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   079   064   006    -    76784672
  3 Spin_Up_Time            PO----   098   098   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    33
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   073   060   045    -    18906494
  9 Power_On_Hours          -O--CK   096   096   000    -    3537 (26 240 0)
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    18
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    0
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   070   067   040    -    30 (Min/Max 28/32)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    138
193 Load_Cycle_Count        -O--CK   099   099   000    -    2301
194 Temperature_Celsius     -O---K   030   040   000    -    30 (0 19 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   079   064   000    -    76784672
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    3487 (125 231 0)
241 Total_LBAs_Written      ------   100   253   000    -    1813646568
242 Total_LBAs_Read         ------   100   253   000    -    26007428236
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error        00%             3432  -
# 2  Short offline       Completed without error        00%             3429  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Jay

On Sun, Sep 29, 2019 at 6:49 PM David Myer <dav...@pr...> wrote:

> Hi Jay,
>
> I encounter this occasionally and have usually found no issues with the
> disks according to SMART. Can you post your smartctl output? I was advised
> that if there are "reallocated sectors" in the smart summary, this could
> relate to the problem. Alternatively, if your disk(s) have compression
> enabled, this may be causing errors.
>
> One thing you could try is mark the disk for removal, reformat it when
> ready, then re-add it to the cluster.
>
> Cheers,
> Dave
>
>
> Sent with ProtonMail <https://protonmail.com> Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Sunday, September 29, 2019 3:33 PM, Jay Livens <jl...@sl...>
> wrote:
>
> Hi,
>
> I have a small MooseFS cluster running on four identical nodes.
> Everything was running smoothly until a week ago when one of the nodes
> started showing a value under "Last Error." The "Last Error" field updates
> every couple of days. The status is still shown as "Ok" for the drive.
>
> I have run scans on the hard drive on the "Last Error" node, and they
> passed without issues. I don't see any issues in the SMART data either.
>
> What exactly is going on and what exactly does a value in 'Last Error"
> tell me? Can someone advise on what else I should check on?
>
> Thank you,
>
> Jay
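
For anyone repeating the "no reallocated sectors" check from the top of this message on a saved smartctl report, here is a minimal Python sketch. It parses attribute rows in the column layout shown in the output above and reports any nonzero raw value for attributes 5, 187, 197, and 198. The choice of those four IDs, the regular expression, and the names parse_attributes and nonzero_watched are assumptions made for illustration; none of them come from smartctl or MooseFS.

```python
import re

# Attribute IDs often watched for media problems. This selection is an
# assumption for illustration, not something smartctl or MooseFS defines.
WATCHED_IDS = {5, 187, 197, 198}

# One attribute row in the layout shown in the output above:
# ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([A-Z-]{6})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Return {id: (name, raw_value)} for every attribute row found in text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(8)))
    return attrs


def nonzero_watched(attrs):
    """Return only the watched attributes whose raw value is not zero."""
    return {i: v for i, v in attrs.items() if i in WATCHED_IDS and v[1] != 0}


# A few rows copied from the report above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
"""

if __name__ == "__main__":
    flagged = nonzero_watched(parse_attributes(SAMPLE))
    print(flagged or "no reallocated, pending, or uncorrectable sectors")
```

An empty result only says these counters are zero. It does not explain the "Last Error" value on the chunkserver, which may have a cause that SMART does not record.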