I'm using mhdfs over a set of volumes protected by snapraid 7.1 on Ubuntu.
I copied some data into a newly created directory in the mhdfs filesystem and then ran a snapraid sync. Somewhere between 75-78% of the sync process, as shown in the screen output, I get this message (some of the file and directory names have been changed to mask content):
Saving state to /mhds3/STR610MS3G3HHSd14/snapraid/snapraid_mhdfs3.content4...
Data change at file '/mhds3/STN607MS22SJZKd09/Server_Backup/xx_Laptop/xxx_20150417/xxx/Training/xxx/xx v3.10.xls' at position '3'
WARNING! Unexpected data modification of a file without parity!
This file was detected as a copy of another file with the same name, size,
and timestamp, but the file data isn't matching the assumed copy.
If this is a false positive, and the files are expected to be different,
you can 'sync' anyway using 'snapraid --force-nocopy sync'
This mhdfs volume, and consequently the snapraid volumes, probably does have many duplicates -- in this instance, these are repeated point-in-time copies of a 1 TB drive from a laptop.
I see no changes in the SMART data for any of the drives, and I see nothing in any of the system logs that would indicate a possible read/write issue on the drives.
Is this something that I should be as concerned about as I feel right now?
Is this something that I should be as concerned about as I feel right now?
Probably not... Especially if it's just an XLS file. Check it for validity. Compare it to other copies byte-for-byte in Beyond Compare if you feel it's really necessary.
Although the fact that it's an XLS may not be a coincidence. I had one file that SnapRAID frequently told me had an error. It was a backup that I still had the original of on another computer; comparing the two, there was a difference between them, but it seemed to be some useless Excel metadata that did not affect the contents. I never found out how or why that file kept getting modified when it hadn't been written, read, or even accessed in years.
Interesting. I wonder if the spaces in the file name are causing an issue?
There are files in the same directory whose names are identical up to the first space and differ only slightly after it.
I moved it out of the directory and the sync finished OK; I put it back, synced again, and got the same "data change at position 3" error.
There doesn't appear to be any functional error in the source file, and the byte size is the same from the source to this copy.
There are multiple copies of this directory in different date-stamp named folders.
I'm curious now why it's suddenly a problem.
Spaces, filenames, or paths have nothing to do with it (aside from being a quick-and-dirty method of initially locating potential duplicates).
The way I interpret that error message (I don't know if I'm 100% correct) is that during a previous sync it found that file to be identical to another (probably based on hashes), so it did not create a separate "parity entry" for it but instead made a reference pointer to the duplicate file which was already calculated. Now it has noticed that the file has changed even though it used to be identical. You have multiple copies of this file in dated backup folders; compare them byte for byte and see if there really is a difference between any two that should be identical.
Just out of curiosity, what program are you using to create these backups? Is this specific XLS file ever modified, opened, or accessed in any way, on either the original source OR on the backup?
I hate to be so secretive about the file, but it's an internal template for a process that my employer uses for product deployment. It will stay in Excel format until the inertia of the corporate bureaucracy realizes it needs to be coded into a program. Don't get me started on that soapbox.
That said, there are lots of different copies of this, some with identical names, possibly spread throughout these backups. The odds are high that there are identical copies, with identical names, in different directories.
I ran a snapraid check on the disk where the data sits and it reports no issues, but the sync continues to fail -- which I can understand, if the hash calculation can't arrive at a unique hash.
I guess I would ask why the location on disk isn't counted in the duplicate-file detection when conducting the sync?
MakOwner: In the log you posted, "If this is a false positive, and the files are expected to be different, you can 'sync' anyway using 'snapraid --force-nocopy sync'". Did you read up on this option?
Quaraxkad: "-N, --force-nocopy: Without this option SnapRAID assumes that files with same attributes, like name, size and timestamp are copies with the same data."
The "file-copy" function is not based on the hashes but is triggering the reuse of calculated hashes.
/X
In the scan.c file for 8.0 you can read:
/* if copy detection is enabled */
/* search for a file with the same name and stamp in all the disks */
/* if the nanosecond part of the time stamp is valid, search for name and stamp, otherwise for path and stamp */
/* if found, and it's a fully hashed file */
/* assume that the file is a copy, and reuse the hash */
Hash collisions like that are a pretty rare occurrence - see the odds for collision in dedupe at http://www.exdupe.com/collision.pdf
Of course I know nothing about the algorithm used in snapraid.
I'm going to try renaming the file and see if that makes any difference in the sync.
And a simple rename, replacing spaces with underscores, allowed the sync to complete without error.
This worries me somewhat.
I'm crap at math, but this seems really early in a data set to see hash collisions (if that's what this is).
Current snapraid status. Although I haven't actually done a file count to see if this is accurate or not.
~~~~~~~~~~~~~~~~~~~~
    Files  Fragmented  Excess     Wasted    Used   Free  Use  Name
               Files   Fragments     GiB     GiB    GiB
   194130          0          0       0.0   1323     51  96%  d01
    74485          0          0       0.0   1329     45  96%  d02
   278387         47        113       0.0   1376     20  98%  d03
   431636          0          0       0.0   1324     49  96%  d04
    91740          0          0       0.0    852     63  93%  d05
   210675          4         73       0.0   1316     57  95%  d06
   283175          0          0       0.0   1312     61  95%  d07
   103155          0          0       0.0   1285     89  93%  d08
    78841          0          0       0.0    439    477  48%  d09
        0          0          0       0.0      0      0   0%  d10
        0          0          0       0.0      0    915   0%  d11
        2          0          0       0.0      0    915   0%  d12
        2          0          0       0.0      0    915   0%  d13
        2          0          0       0.0      0    915   0%  d14
  1746230         51        186       0.0  10560   4579  70%
~~~~~~~~~~~~~~~~~~~~~~~
Edit: Spelling
Last edit: MakOwner 2015-04-20
Why does disk d10 show 0 free space when nothing is on the disk yet?
It's there, it's mounted, and it's a 1 TB disk, just like disks d11 - d14.
See my post here: https://sourceforge.net/p/snapraid/discussion/1677233/thread/6fb66a79/#bf5d
Cheers.
Ah, thank you!