Odd message from sync

MakOwner
2015-04-18
2015-05-02
  • MakOwner

    MakOwner - 2015-04-18

    I'm using mhdfs over a set of volumes protected by snapraid 7.1 on Ubuntu.

    I copied some data into a newly created directory in the mhdfs filesystem and then ran a snapraid sync. Somewhere between 75 - 78% of the way through the sync, as shown in the screen output, I get this message (some of the file and directory names have been changed to mask content):

    Saving state to /mhds3/STR610MS3G3HHSd14/snapraid/snapraid_mhdfs3.content4...
    Data change at file '/mhds3/STN607MS22SJZKd09/Server_Backup/xx_Laptop/xxx_20150417/xxx/Training/xxx/xx v3.10.xls' at position '3'
    WARNING! Unexpected data modification of a file without parity!
    This file was detected as a copy of another file with the same name, size,
    and timestamp, but the file data isn't matching the assumed copy.
    If this is a false positive, and the files are expected to be different,
    you can 'sync' anyway using 'snapraid --force-nocopy sync'

    This mhdfs volume, and consequently the snapraid volumes, probably do contain many duplicates -- in this instance, repeated point-in-time copies of a 1 TB drive from a laptop.

    I see no changes in the SMART data for any of the drives, and nothing in any of the system logs that would indicate a read/write problem on the drives.

    Is this something that I should be as concerned about as I feel right now?

     
  • Quaraxkad

    Quaraxkad - 2015-04-18

    Is this something that I should be as concerned about as I feel right now?

    Probably not... Especially if it's just an XLS file. Check it for validity. Compare it to other copies byte-for-byte in Beyond Compare if you feel it's really necessary.

    Although the fact that it's an XLS may not be a coincidence. I had one file that SnapRAID frequently told me had an error. It was a backup that I still had the original of on another computer; comparing the two, there was a difference between them, but it seemed to be some useless Excel metadata that did not affect the contents. I never found out how or why that file kept getting modified when it hadn't been written, read, or even accessed in years.

     
  • MakOwner

    MakOwner - 2015-04-18

    Interesting. I wonder if the spaces in the file name are causing an issue?
    There are files in the same directory whose names are identical up to the first space and differ only slightly after it.

    I moved it out of the directory, sync finished ok, I put it back and sync, and I get the same "data change at position 3" error.

    There doesn't appear to be any functional error in the source file, and the byte size is the same from the source to this copy.

    There are multiple copies of this directory in different date-stamp named folders.

    I'm curious now why it's suddenly a problem.

     
  • Quaraxkad

    Quaraxkad - 2015-04-19

    Spaces, filenames, or paths have nothing to do with it (aside from being a quick-and-dirty method of initially locating potential duplicates).

    The way I interpret that error message, I don't know if I'm 100% correct, but I think it's telling you that during a previous sync it found that file to be identical to another (probably based on hashes) so it did not create a separate "parity entry" for it but instead made a reference pointer to the duplicate file which was already calculated. Now it has noticed that the file has changed even though it used to be identical. You have multiple copies of this file in dated backup folders, compare them byte for byte and see if there really is a difference between any two that should be identical.

    Just out of curiosity, what program are you using to create these backups? Is this specific XLS file ever modified, opened, or accessed in any way, on either the original source OR on the backup?

     
  • MakOwner

    MakOwner - 2015-04-19

    I hate to be so secretive about the file, but it's an internal template for a process that my employer uses for product deployment. It's in Excel format until the inertia of the corporate bureaucracy realizes it needs to be coded into a program. Don't get me on that soapbox.

    That said, there are lots of different copies of this, some with identical names, possibly spread through these backups. The odds are high that there are identical copies with identical names in different directories.

    I ran a snapraid check on the disk where the data sits and it reports no issues, but the sync continues to fail -- which I can understand, if the hash calculation can't arrive at a unique hash.

    I guess I would ask: why isn't the location on disk counted in the duplicate-file detection when conducting the sync?

    Hash collisions like that are a pretty rare occurrence -- see the odds of collision in dedupe at http://www.exdupe.com/collision.pdf

    Of course I know nothing about the algorithm used in snapraid.

    I'm going to try renaming the file and see if that makes any difference in the sync.

     
    • xad

      xad - 2015-05-02

      MakOwner: In the log you posted, "If this is a false positive, and the files are expected to be different, you can 'sync' anyway using 'snapraid --force-nocopy sync'". Did you read up on this option?

      Quaraxkad: "-N, --force-nocopy: Without this option SnapRAID assumes that files with same attributes, like name, size and timestamp are copies with the same data."
      The "file-copy" function is not based on the hashes but is triggering the reuse of calculated hashes.

      /X

      In the scan.c file for 8.0 you can read:

      ~~~~~~~~~~~~~~~~~~~~
      /* if copy detection is enabled */
      /* search for a file with the same name and stamp in all the disks */
      /* if the nanosecond part of the time stamp is valid, search for name and stamp, otherwise for path and stamp */
      /* if found, and it's a fully hashed file */
      /* assume that the file is a copy, and reuse the hash */
      ~~~~~~~~~~~~~~~~~~~~

       
  • MakOwner

    MakOwner - 2015-04-19

    And a simple rename, replacing spaces with underscores allowed the sync to complete without error.

    This worries me somewhat.

    I'm crap at math, but this seems really early in a data set to see hash collisions (if that's what this is).

    Current snapraid status. Although I haven't actually done a file count to see if this is accurate or not.

    ~~~~~~~~~~~~~~~~~~~~
      Files  Fragmented  Excess     Wasted   Used   Free  Use  Name
                  Files  Fragments     GiB    GiB    GiB
     194130           0          0     0.0   1323     51  96%  d01
      74485           0          0     0.0   1329     45  96%  d02
     278387          47        113     0.0   1376     20  98%  d03
     431636           0          0     0.0   1324     49  96%  d04
      91740           0          0     0.0    852     63  93%  d05
     210675           4         73     0.0   1316     57  95%  d06
     283175           0          0     0.0   1312     61  95%  d07
     103155           0          0     0.0   1285     89  93%  d08
      78841           0          0     0.0    439    477  48%  d09
          0           0          0     0.0      0      0   0%  d10
          0           0          0     0.0      0    915   0%  d11
          2           0          0     0.0      0    915   0%  d12
          2           0          0     0.0      0    915   0%  d13
          2           0          0     0.0      0    915   0%  d14
    ------------------------------------------------------------
    1746230          51        186     0.0  10560   4579  70%
    ~~~~~~~~~~~~~~~~~~~~

    Edit: Spelling

     

    Last edit: MakOwner 2015-04-20
  • MakOwner

    MakOwner - 2015-04-20

    Why does disk d10 show 0 free space when nothing is on the disk yet?
    It's there, it's mounted, and it's a 1 TB disk, just like disks d11 - d14.

     
  • MQMan

    MQMan - 2015-04-20
     
    • MakOwner

      MakOwner - 2015-04-20

      Ah, thank you!

       
