SnapRAID / Discussion / Help: WARNING! All the files previously present in disk ... are now missing or rewritten!

Hello,

I checked my email from my SnapRAID script last night and realized the sync didn't run and I'm greeted with this error.

WARNING! All the files previously present in disk 'd10' at dir '/mnt/data/HIT-MJ1323YNG1JYVC/'
are now missing or rewritten!

I didn't make any config file changes and the disk is still mounted and healthly and contains files. Any ideas to correct this?

Here's the counters at the end of a diff.

WARNING! All the files previously present in disk 'd10' at dir '/mnt/data/HIT-MJ1323YNG1JYVC/'
are now missing or rewritten!

   74633 equal
       0 moved
       0 copied
       0 restored
    5263 updated
     124 removed
     183 added
There are differences!

Here's the smartmontools output for anyone that's interested...

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.3.0-040300-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 5K3000
Device Model:     Hitachi HDS5C3030ALA630
Serial Number:    MJ1323YNG1JYVC
LU WWN Device Id: 5 000cca 228c0b40c
Firmware Version: MEAOA580
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5700 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Dec 31 07:44:38 2015 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (39068) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:    (   1) minutes.
Extended self-test routine
recommended polling time:    ( 651) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       108
  3 Spin_Up_Time            0x0007   126   126   024    Pre-fail  Always       -       552 (Average 551)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       673
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   135   135   020    Pre-fail  Offline      -       31
  9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       31856
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       76
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1635
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1635
194 Temperature_Celsius     0x0002   214   214   000    Old_age   Always       -       28 (Min/Max 18/51)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     31852         -
# 2  Short offline       Completed without error       00%     31830         -
# 3  Short offline       Completed without error       00%     31820         -
# 4  Short offline       Completed without error       00%     31780         -
# 5  Extended offline    Interrupted (host reset)      10%     31760         -
# 6  Short offline       Completed without error       00%     31708         -
# 7  Short offline       Completed without error       00%     31660         -
# 8  Short offline       Completed without error       00%     31637         -
# 9  Short offline       Completed without error       00%     31611         -
#10  Short offline       Completed without error       00%     31587         -
#11  Extended offline    Completed without error       00%     31577         -
#12  Short offline       Completed without error       00%     31563         -
#13  Short offline       Completed without error       00%     31543         -
#14  Short offline       Completed without error       00%     31525         -
#15  Short offline       Completed without error       00%     31491         -
#16  Short offline       Completed without error       00%     31467         -
#17  Short offline       Completed without error       00%     31443         -
#18  Extended offline    Completed without error       00%     31423         -
#19  Short offline       Completed without error       00%     31395         -
#20  Short offline       Completed without error       00%     31373         -
#21  Short offline       Completed without error       00%     31347         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Last edit: rubylaser 2015-12-31

Leifi Plomeros - 2015-12-31

To stop the script (and any other automated stuff concerning the disk) and make a backup of the content file (preferably the entire directory) is probably a good start.

The added and removed file mostly belong to other disks?
The file count on d10 is ~5.000?
Does the file count match snapraid status?
Can you try adding a single file to d10 and confirm that diff now shows one more file.

Have you investigated the files on d10 to confirm that the file names, timestamps and file sizes looks legit in general?

If the files look weird upon closer inspection, it is reasonable to assume something bad happend to the disk and that snapraid is correct to state all of them are modified. If not it seems reasonable to assume something bad has happend to the content file.

Are you using the betas with ability to rename disks? If yes, have you earlier renamed any disk? Is the diff result the same if you downgrade to latest non-beta?

As a last step I would try to fix a single file to a different disk and compare with original. Does snapraid even succeed in doing it or report an error?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- rubylaser - 2015-12-31
  
  Thanks for the ideas! Here are answers to your questions.
  
  I'm on SnapRAID 9.1 and not the newest betas supporting renaming disks (and, I have not renamed any disks).
  
  Yes, the added and removed files all belong to other disks (not d10).
  
  The filecount on d10 is less than 100.
  
  No the counts do not match, SnapRAID status shows a slightly lower number of files than the actual count for d10 (37 v. 61).
  
  SnapRAID status report: Files Fragmented Excess Wasted Used Free Use Name Files Fragments GB GB GB 13446 23 47 - 2881 67 97% d01 5463 14 37 - 3924 72 98% d02 12420 34 78 - 3914 72 98% d03 7170 4 9 - 2879 52 98% d04 9884 2 8 - 1922 46 97% d05 3055 43 53 - 2869 80 97% d06 24105 1 1 - 2736 210 92% d07 1013 67 92 - 1584 1411 52% d08 2077 65 84 - 2271 1727 56% d09 37 0 0 - 100 2897 3% d10 40 5 5 - 74 2925 2% d11 258 57 100 - 656 2341 21% d12 1052 37 48 - 685 3167 17% d13 ------------------------------------------------------ 80020 352 562 0.0 26502 15071 63%
  
  root@fileserver:/mnt/data/HIT-MJ1323YNG1JYVC# find . -type f | wc -l 61
  
  If I add a file, the count in diff does reflect the increase.
  
  touch testfile # snapraid diff WARNING! All the files previously present in disk 'd10' at dir '/mnt/data/HIT-MJ1323YNG1JYVC/' are now missing or rewritten! 74633 equal 0 moved 0 copied 0 restored 5263 updated 124 removed 184 added There are differences!
  
  All files appear to be named appropriately, have correct timestamps, and the media files are all playable.
  
  At this point, I don't want to do a fix until I know what the problem is as everything appears to be okay other than snapraid.
  
  Last edit: rubylaser 2015-12-31
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Leifi Plomeros - 2015-12-31

From the looks of it snapraid is correctly reading everything in the array, but is having a different expectation for more than 5.000 files, not isolated to d10.

If the files had been "touched" by some application you would have spotted that they were all modified between the last successful sync and the failed one. So I guess that can be ruled out.

If they had been moved from one disk to another by some kind of disk balancer they would have showed up as removed at the original location... Unless they were moved back and forth... Which should have resulted in a new time stamp or presented them as restored (at least a I think files only have inode changed are interpreted like that) I am out of logical explanations.

I still think the best option at this point is to do a limited fix to a different disk. Just make sure that you have backups of the content file. The fix operation should not change any parity so it should be completely safe to do.

After the fix you would either get an error message confirming that the content file is broken.
Or you would have files that you can compare to find out what is actually different.

That is probably as far as you can diagnose the problem without help from Andrea.

Last edit: Leifi Plomeros 2015-12-31

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- rubylaser - 2015-12-31
  
  Okay, I just did a fix on one file, and it completed without an issue. Here's what it looks like.
  
  root@fileserver:/mnt/data/SEA-Z7P0027C/documents/Test Disc/Bitrate# snapraid fix -f /documents/Test\ Disc/Bitrate/bird60.mkv Self test... Loading state from /var/snapraid/content... Searching disk d01... Searching disk d02... Searching disk d03... Searching disk d04... Searching disk d05... Searching disk d06... Searching disk d07... Searching disk d08... Searching disk d09... Searching disk d10... Searching disk d11... Searching disk d12... Searching disk d13... Filtering... Using 1751 MiB of memory. Initializing... Fixing... 100% completed, 1838 MB processed in 0:01 Everything OK
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Leifi Plomeros - 2015-12-31

Name, timestamp and size match?

Can you do a binary compare or create checksums for both files to confirm that they are identical? (according to google that is really easy to do in linux) :)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- rubylaser - 2015-12-31
  
  Sure, I could do an md5sum, but I would assume that a fix would have reported an error if it's hash didn't match. Here's an md5sum before and after a fix on the file, they are exactly the same before and after. Thanks for the continued ideas :)
  
  Before
  
  root@fileserver:/mnt/data/SEA-Z7P0027C/documents/Test Disc/Bitrate# md5sum bird50.mkv dd1f3013d3dbe2d5b9c000d23b490916 bird50.mkv
  
  Fix
  
  root@fileserver:/mnt/data/SEA-Z7P0027C/documents/Test Disc/Bitrate# snapraid fix -f /documents/Test\ Disc/Bitrate/bird50.mkv Self test... Loading state from /var/snapraid/content... Searching disk d01... Searching disk d02... Searching disk d03... Searching disk d04... Searching disk d05... Searching disk d06... Searching disk d07... Searching disk d08... Searching disk d09... Searching disk d10... Searching disk d11... Searching disk d12... Searching disk d13... Filtering... Using 1751 MiB of memory. Initializing... Fixing... 100% completed, 1548 MB processed in 0:01 Everything OK
  
  md5sum After
  
  root@fileserver:/mnt/data/SEA-Z7P0027C/documents/Test Disc/Bitrate# md5sum bird50.mkv dd1f3013d3dbe2d5b9c000d23b490916 bird50.mkv
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Leifi Plomeros - 2015-12-31

The good part is:

Nothing seems wrong with your files

Nothing seems wrong with the fixed versions either

You could chose between sync or fix to correct the issue

The bad part:

Snapraid thinks you have updated thousands of files since last sync.

When snapraid thinks a file is updated during sync it discards the hashes and creates new ones based on the updated file... Which introduces the possibility of undiscovered corruption in those files.

This could have happend on a daily basis for an unknown period of time. (The only reason you discovered it was because it affected all files on a nearly empty disk)

From a somewhat logical point of view:
It is difficult to imagine an undiscovered corruption of the content file on disk or in memory, only affecting time stamps, file size, file name or inode informationm and not triggering an error when snapraid reads the file.

Name and filesize can easily be ruled out since you were able to restore a file with correct name and file size.

Still these are, as far as I know, the only things snapraid looks at during diff. Which pretty much leaves only time stamps and inodes as possible mismatches.

If only inode was wrong I assume that snapraid diff would report the file as restored.

Which leaves only the timestamp... Is it really identical for both the original and the fixed file?

If it is then it is a complete mystery... Which you could probably make even more mysterious by doing this:

Move a file on d10 to a folder outside the array on d10.

Fix all missing files on d10 only (which would be only 1)

Run diff to see if the fixed file is again counted as updated or if you suddenly have 1 restored file.

If the file is reported as restored then obviously snapraid is working as designed and the only explanation left is that snapraid has correctly identified something different with the file attributes. (or that snapraid uses some mechanism to keep track of files it has actually fixed)

If the file is reported as updated, then everything points to some really strange bug in snapraid.

Another test you can do is to just wait a few days and see if the number of updated files increase. If that happens then you know for sure that something is in fact modifying the file attributes.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

I'll have to try the rest of these tests tomorrow, because a sync doesn't want to run without forcing it. Thanks again for your help!

dd1f3013d3dbe2d5b9c000d23b490916  bird50.mkv
root@fileserver:~# snapraid sync
Self test...
Loading state from /var/snapraid/content...
Scanning disk d01...
Scanning disk d02...
Scanning disk d03...
Scanning disk d04...
Scanning disk d05...
Scanning disk d06...
Scanning disk d07...
Scanning disk d08...
Scanning disk d09...
Scanning disk d10...
Scanning disk d11...
Scanning disk d12...
Scanning disk d13...
WARNING! All the files previously present in disk 'd10' at dir '/mnt/data/HIT-MJ1323YNG1JYVC/'
are now missing or rewritten!
If you want to 'sync' anyway, use 'snapraid --force-empty sync'

rubylaser - 2016-01-02

Is there a way to see specifically what files SnapRAID thinks are missing or rewritten? I'd like to complete a sync if possible.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Leifi Plomeros - 2016-01-02

Snapraid diff does that by default?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- rubylaser - 2016-01-02
  
  True, durr, I guess I was looking for more info than a basic diff I guess. At this point all the files seem okay, so I guess I'll just force-empty on the array.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Andrea Mazzoleni - 2016-01-02

Hi rubylaser,

It's indeed a strange condition. It seems that inodes or timestamp of your files changed for some reason.

Anyway, if fix reported no error, your files are OK. The fix command doesn't use timestamp/inodes, but only the file path. So, if no error is reported it means that all the hashes are matching. At this point you can safely run the --force-empty command.

But I'm still interested to understand what happened.

What filesystem are you using for your disks ?
Does your files have subsecond timestamp ?

You can check this with the stat command like:

$ stat snapraid.1 File: 'snapraid.1' ... Modify: 2016-01-02 16:59:54.686573281 +0100

In the "Modify:" line, the subsecond number is ".686573281".
In some case, it could be ".0", and then SnapRAID doesn't use timestamp to match files, but only inodes.

Ciao,
Andrea
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Jessie Taylor - 2016-01-02
  
  Maybe you could add an option to "diff" to be more verbose, and report why each file is different: inode, timestamp (subsecond timestamp), etc.
  
  If a problem like this comes up in the future, it might help to debug it.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- rubylaser - 2016-01-03
  
  Thanks Andrea, I'm running --force-empty now. I'm using ext4 for both data and parity disks, so it should be good go. I do get subsecond accuracy as well.
  
  ~~~~
  root@fileserver:/storage/documents/Test Disc/Bitrate# stat bird60.mkv
  File: ‘bird60.mkv’
  Size: 167019983 Blocks: 326216 IO Block: 4096 regular file
  Device: 1bh/27d Inode: 297552 Links: 1
  Access: (0777/-rwxrwxrwx) Uid: ( 1000/ zack) Gid: ( 1000/ zack)
  Access: 2015-12-31 14:58:52.686252107 -0500
  Modify: 2010-05-09 13:49:00.000000000 -0400
  Change: 2015-11-28 09:45:44.444903309 -0500
  Birth: -
  ~~~~
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

WARNING! All the files previously present in disk ... are now missing or...

A backup program for disk arrays

Forums

Help

WARNING! All the files previously present in disk ... are now missing or rewritten!