From: Jay L. <jl...@sl...> - 2019-09-30 02:27:40

Dave,

Thank you. The smartctl output is below; it appears that I do not have any
reallocated sectors. Thanks in advance for any thoughts.

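The "no reallocated sectors" reading can also be checked mechanically. Below is a minimal Python sketch, assuming the attribute table has the column layout shown below (ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE). It reports any non-zero raw count for attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable). Treating those three as the sector-health counters is a common rule of thumb rather than anything MooseFS defines, and the function name `sector_problems` is hypothetical.

```python
import re

# Sector-health attributes (rule of thumb, assumed here):
# 5 = Reallocated_Sector_Ct, 197 = Current_Pending_Sector,
# 198 = Offline_Uncorrectable. Non-zero raw values usually mean the
# drive has remapped, or cannot read, some sectors.
SECTOR_ATTRS = {5, 197, 198}

# One attribute row: ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE.
# Anything after the first raw number (e.g. "(Min/Max 28/32)") is ignored.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)"
)


def sector_problems(smartctl_text: str) -> dict:
    """Return {attribute_name: raw_value} for sector counters above zero."""
    problems = {}
    for line in smartctl_text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, legend, log lines, etc.
        attr_id, name, raw = int(m.group(1)), m.group(2), int(m.group(8))
        if attr_id in SECTOR_ATTRS and raw > 0:
            problems[name] = raw
    return problems


if __name__ == "__main__":
    # Three rows copied from the output in this message.
    sample = """\
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
"""
    print(sector_problems(sample))  # {}
```

On a live system the text would come from something like `smartctl -x /dev/sdX` (the device name is a placeholder). An empty dict, as for the rows in this output, means none of those counters is above zero.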
=== START OF INFORMATION SECTION ===
Device Model: ST2000DM008-2FR102
Serial Number: ZFL08677
LU WWN Device Id: 5 000c50 0b50ff26c
Firmware Version: 0001
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Sep 29 21:56:04 2019 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 201) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x30a5) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   079   064   006    -    76784672
  3 Spin_Up_Time            PO----   098   098   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    33
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   073   060   045    -    18906494
  9 Power_On_Hours          -O--CK   096   096   000    -    3537 (26 240 0)
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    18
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    0
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   070   067   040    -    30 (Min/Max 28/32)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    138
193 Load_Cycle_Count        -O--CK   099   099   000    -    2301
194 Temperature_Celsius     -O---K   030   040   000    -    30 (0 19 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   079   064   000    -    76784672
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    3487 (125 231 0)
241 Total_LBAs_Written      ------   100   253   000    -    1813646568
242 Total_LBAs_Read         ------   100   253   000    -    26007428236
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%  3432             -
# 2  Short offline       Completed without error       00%  3429             -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

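The large raw values for Raw_Read_Error_Rate (76784672) and Seek_Error_Rate (18906494) often cause worry on Seagate drives. They are widely reported, though not documented by Seagate, to pack two numbers into the 48-bit raw field: an error count in the upper 16 bits and an operation count in the lower 32 bits. Assuming that interpretation holds for this ST2000DM008, the short Python sketch below decodes both values; the function name `decode_seagate_rate` is hypothetical.

```python
def decode_seagate_rate(raw: int) -> tuple:
    """Split a 48-bit raw value into (error_count, operation_count).

    ASSUMPTION: Seagate packs the error count into the upper 16 bits and
    the number of operations into the lower 32 bits. This is a community
    interpretation, not a documented Seagate format.
    """
    errors = raw >> 32              # upper 16 bits of the 48-bit field
    operations = raw & 0xFFFFFFFF   # lower 32 bits
    return errors, operations


# Raw values taken from the attribute table in this message.
for name, raw in [("Raw_Read_Error_Rate", 76784672),
                  ("Seek_Error_Rate", 18906494)]:
    errors, ops = decode_seagate_rate(raw)
    print(f"{name}: {errors} errors in {ops} operations")
```

Any raw value below 2**32 (4294967296) decodes to zero errors under this reading, so both prints report 0 errors, and the size of these numbers alone is not evidence of a failing disk.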
Jay
On Sun, Sep 29, 2019 at 6:49 PM David Myer <dav...@pr...> wrote:
> Hi Jay,
>
> I encounter this occasionally and have usually found no issues with the
> disks according to SMART. Can you post your smartctl output? I was advised
> that if there are "reallocated sectors" in the SMART summary, this could
> relate to the problem. Alternatively, if your disk(s) have compression
> enabled, this may be causing errors.
>
> One thing you could try is to mark the disk for removal, reformat it when
> it is ready, then re-add it to the cluster.
>
> Cheers,
> Dave
>
>
>
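Dave's first suggestion can be made concrete. As I understand MooseFS, each chunkserver lists its storage directories one per line in mfshdd.cfg (often under /etc/mfs/, depending on the installation), and prefixing a line with '*' marks that directory for removal, so its chunks are re-replicated onto other disks before it is taken out. The Python sketch below only edits the file's text; the function name and the /mnt/hd1, /mnt/hd2 paths are hypothetical, and the chunkserver still has to reload its configuration (assumed here to be `mfschunkserver reload`) before the change takes effect.

```python
def mark_for_removal(cfg_text: str, path: str) -> str:
    """Return mfshdd.cfg text with the entry for `path` prefixed by '*'.

    ASSUMPTION: in MooseFS a leading '*' on an mfshdd.cfg line marks that
    storage directory for removal. Comment lines and other entries are kept
    unchanged, as is anything after the path on the matching line.
    """
    out, found = [], False
    for line in cfg_text.splitlines():
        tokens = line.split()
        first = tokens[0] if tokens else ""
        if first == path:
            out.append("*" + line.strip())   # mark the disk for removal
            found = True
        elif first == "*" + path:
            out.append(line)                 # already marked; leave as-is
            found = True
        else:
            out.append(line)
    if not found:
        raise ValueError(f"{path!r} is not listed in this mfshdd.cfg")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical chunkserver config with two storage directories.
    cfg = "# chunkserver storage\n/mnt/hd1\n/mnt/hd2\n"
    print(mark_for_removal(cfg, "/mnt/hd2"), end="")
```

Re-adding the disk after reformatting is the reverse step: drop the '*' and reload the chunkserver again.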
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Sunday, September 29, 2019 3:33 PM, Jay Livens <jl...@sl...> wrote:
>
> Hi,
>
> I have a small MooseFS cluster running on four identical nodes.
> Everything was running smoothly until a week ago when one of the nodes
> started showing a value under "Last Error." The "Last Error" field updates
> every couple of days. The status is still shown as "Ok" for the drive.
>
> I have run scans on the hard drive on the "Last Error" node, and they
> passed without issues. I don't see any issues in the SMART data either.
>
> What exactly is going on, and what does a value in "Last Error" tell me?
> Can someone advise on what else I should check?
>
> Thank you,
>
> Jay