
#592 Hash is different from the reference hash

areca-7.x
open
aventin
None
1
2020-06-16
2016-02-14
doe65
No

'Hash is different from the reference hash' seems to be a long-standing bug in Areca. I have run into it a few times and have only been able to work around it by redoing a full backup. Note that several other reports mention the same message. Since this is a nasty problem, I would love to see it solved. I tried to hunt the bug down, but without any luck. Below are my thoughts about it. Maybe a developer, or Aventin, will be able to solve it, or can give me a hand tracking it down.

These are the backups I have made with Areca. All use delta mode, zip64 and transaction points (listed from newest to oldest):
1) incremental backup, backup checked, 2016.02, ERROR invalid hash
2) full backup, backup checked, 2015.09, no file B within backup
3) full backup, backup checked, 2015.08, no file B within backup
(file B was deleted here)
4) incremental backup, backup NOT CHECKED, 2015.06, file B exists, no change (so not backed up)
5) full backup, backup checked, 2015.02, file B exists, backed up
6) incremental backup, backup checked, 2015.02, file B exists, no change (so not backed up)
7) incremental backup, backup checked, 2015.01, file B exists, no change (so not backed up)
8) full backup, backup checked, 2014.12, file B backed up

Now... the problem is that when I try to do the incremental backup (step 1), I receive this exception:

16-02-13 21:06 - WARNING - .IdeaIC14/config/plugins/extensions.xml was not properly recovered : its hash (74732428ec575989d2a1c78964e44330616faa55) is different from the reference hash (2ea02e398e2dd079437a738c078e988c57973f9c). Simulation flag set to ON - file : /tmp/tmp-root/chk0/.IdeaIC14/config/plugins/extensions.xml
16-02-13 21:06 - WARNING - .macromedia/Flash_Player/macromedia.com/support/flashplayer/sys/#static.oognet.pl/settings.sol was not properly recovered : its hash (5ab9236edeebcb8db2d53f6f15e098102f588789) is different from the reference hash (d0e26306d8ac0f3d90cf89b9128d54071a0dcb27). Simulation flag set to ON - file : /tmp/tmp-root/chk0/.macromedia/Flash_Player/macromedia.com/support/flashplayer/sys/#static.oognet.pl/settings.sol
16-02-13 21:06 - WARNING - file A was not properly recovered : its hash (5ab9236edeebcb8db2d53f6f15e098102f588789) is different from the reference hash (f98d0c48bedc44d37656d7c938beb5a091908fb7). Simulation flag set to ON - file : /tmp/tmp-root/chk0/file A
16-02-13 21:06 - WARNING - file B was not properly recovered : its hash (5ab9236edeebcb8db2d53f6f15e098102f588789) is different from the reference hash (14f40ee0aec297422c0087db64da8e0f670059cc). Simulation flag set to ON - file : /tmp/tmp-root/chk0/file B

1) This definitely looks like a bug: in the previous backup, step 2 (a full backup), 'file B' did not exist, so the backup in step 1 should be straightforward, as no delta increments need to be calculated.
2) It is interesting that the same hash (5ab9236edeebcb8db2d53f6f15e098102f588789) appears for 3 different files. Could it be some kind of hash collision?
3) I added some System.out calls to dump the HashSequence for 'file B'. This is what I get:

seq is : [HashSequence - 
Bucket8=[SimilarEntrySet - Element0=[HashSequenceEntry - Q=42980073 - F=20 - P=0 - S=1024]] - 
Bucket2526=[SimilarEntrySet - Element0=[HashSequenceEntry - Q=33175731 - F=20 - P=1 - S=1024]] - 
Bucket4680=[SimilarEntrySet - Element0=[HashSequenceEntry - Q=58965924 - F=20 - P=3 - S=1024]] - 
Bucket9358=[SimilarEntrySet - Element0=[HashSequenceEntry - Q=24076193 - F=20 - P=4 - S=170]] - 
Bucket9373=[SimilarEntrySet - Element0=[HashSequenceEntry - Q=64194271 - F=20 - P=2 - S=1024]]]

I can't say whether this is valid or not, but Bucket9358 has a very low 'S' value.
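
For context, here is a rough sketch of how an rsync-style delta engine typically builds such a bucket table, assuming 'Q' is a quick rolling checksum, 'P' the block position and 'S' the block size. The class, constants and checksum below are mine, purely for illustration, not Areca's actual code:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: an rsync-style block index where
// "quick hash" buckets point to (position, size) entries, loosely matching
// the Bucket/Q/P/S fields visible in the HashSequence dump above.
class DeltaBlockIndex {

    static final int BLOCK_SIZE = 1024;      // matches S=1024 in the dump
    static final int BUCKET_COUNT = 16384;   // assumption, not Areca's value

    record Entry(long quickHash, long position, int size) {}

    final Map<Integer, List<Entry>> buckets = new HashMap<>();

    void index(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] block = new byte[BLOCK_SIZE];
            long position = 0;
            int read;
            while ((read = in.readNBytes(block, 0, BLOCK_SIZE)) > 0) {
                long quick = quickHash(block, read);       // cheap rolling checksum
                int bucket = (int) (quick % BUCKET_COUNT);
                buckets.computeIfAbsent(bucket, b -> new ArrayList<>())
                       .add(new Entry(quick, position, read)); // last block may be short
                position++;
            }
        }
    }

    // Adler-like quick checksum; the real algorithm in Areca may differ.
    static long quickHash(byte[] data, int len) {
        long a = 1, b = 0;
        for (int i = 0; i < len; i++) {
            a = (a + (data[i] & 0xff)) % 65521;
            b = (b + a) % 65521;
        }
        return (b << 16) | a;
    }
}

If the dump follows a pattern like this, the low 'S' value in Bucket9358 would simply be the file's last, partial block and would be normal.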

4) I tried to reproduce the bug on the same file with this schema:
- create file B,
- full backup,
- incremental backup,
- delete file B,
- full backup,
- full backup,
- create file B,
- incremental backup.
This worked as expected, so I cannot reproduce the problem.

5) Please note that in step 4 I did not check the backup, so the error probably comes from that step. If I turned validation off for that step, it was quite possibly because the backup could not otherwise be completed due to the same hash validation error occurring back then; I can't say now, I don't remember. That, however, should not cause the problem: the file was deleted later, so even if something went wrong back then, it should not cause a faulty backup of the newly added file.

6) Looking at archive_dir/date_data/hash for 'file B', I found that it contains the correct value for that file, but it does not seem to be used by Areca. So what is it for? Is it the hash of the original file, dumped at backup time, so that even if the original file is changed after the backup (and its current hash differs from the one in that hash file), the verification stage can still be done correctly?

7) It seems that Areca extracts that invalid hash from archive_dir/date/file B, where the beginning of that file contains (see the parsing sketch below):
- a signature indicating whether this is a full file or a delta file,
- a hash value (?),
- the file name,
- the file content.
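
To make that concrete, here is a small parsing sketch of the layout as I read it: a signature, then a stored hash, then the name, then the content. The field order is taken from the list above, but the sizes and encodings are my assumptions, not the documented Areca format:

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical reader for the entry layout described in point 7.
// Field sizes and encodings are assumptions for illustration only.
class ArchiveEntrySketch {
    byte signature;      // assumed: marks "full file" vs "delta"
    byte[] storedHash;   // assumed: 20 bytes, i.e. a SHA-1 digest
    String fileName;
    byte[] content;

    static ArchiveEntrySketch read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        ArchiveEntrySketch e = new ArchiveEntrySketch();
        e.signature = in.readByte();
        e.storedHash = in.readNBytes(20);
        e.fileName = in.readUTF();        // assumed length-prefixed name
        e.content = in.readAllBytes();    // remainder of the entry
        return e;
    }
}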

8) Maybe I don't understand the file format from point 7? Could someone say a few words about the format of the file described there? It seems a little strange to me. I would expect validation to look like this (see the sketch after the second list below):
- restore the whole file,
- compute the hash of the restored file,
- compute the hash of the original file,
- check that the two hashes are equal.

But it seems that validation actually looks like this:
- get the file in 'archive_dir/date/file',
- skip the signature (new, delta?),
- take the hash value stored there,
- compute the hash of the original file,
- check that the two hashes are equal.
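
A minimal sketch of the validation flow I would expect (the first list above), assuming SHA-1 as the hash algorithm; the class and method names are illustrative only, not Areca's code:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Sketch of the "restore, then compare hashes" flow described in point 8.
class VerifySketch {

    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return HexFormat.of().formatHex(md.digest());
    }

    // The reference hash should be the one captured at backup time
    // (e.g. the value kept in archive_dir/date_data/hash), not a hash
    // recomputed from the live source file.
    static boolean verify(Path restoredCopy, String referenceHashFromBackup)
            throws IOException, NoSuchAlgorithmException {
        return sha1Hex(restoredCopy).equalsIgnoreCase(referenceHashFromBackup);
    }
}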

9) Looking at the code, I found many loops that iterate with an integer index. I believe these should be replaced with iteration over the collections themselves, to protect against exceeding the maximum integer value; see the example below.
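
Illustrative only (not code taken from Areca):

import java.util.List;

class IterationExample {
    // Index-based form: the int counter is what can overflow
    // if such a loop is ever applied to a very large sequence.
    static long totalLengthIndexed(List<String> entries) {
        long total = 0;
        for (int i = 0; i < entries.size(); i++) {
            total += entries.get(i).length();
        }
        return total;
    }

    // Same logic expressed without an integer index.
    static long totalLengthIterated(List<String> entries) {
        long total = 0;
        for (String entry : entries) {
            total += entry.length();
        }
        return total;
    }
}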

10) There are also a lot of StringBuffer usages that can safely be replaced with the more efficient StringBuilder, for example as shown below. I can prepare a fix for this if Aventin is OK with that.
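
Where the buffer never leaves a single thread, the change is mechanical, for example:

class BufferExample {
    static String joinOld(String[] parts) {
        StringBuffer sb = new StringBuffer();   // synchronized on every append
        for (String p : parts) {
            sb.append(p).append('/');
        }
        return sb.toString();
    }

    static String joinNew(String[] parts) {
        StringBuilder sb = new StringBuilder(); // same API, no synchronization overhead
        for (String p : parts) {
            sb.append(p).append('/');
        }
        return sb.toString();
    }
}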

Discussion

  • aventin

    aventin - 2016-02-24

    First of all, thanks for this very comprehensive bug report.

    This bug is indeed a real nightmare to investigate, because I have never managed to reproduce it, although it has been reported by many users. All I was able to do was add diagnostic information so that one of the users could one day report more detailed context information (which you did :) )

    Before getting into the details, there is a misunderstanding that needs to be clarified: the "hash codes" we are talking about have nothing to do with the data stored for delta backup. The two features are completely independent (archive verification must also be available for non-delta setups), so AFAIK the problem does not lie in the "HashSequence" objects, but rather in the verification feature itself.

    About your point 8: there are two use cases for "archive verification":
    1) The user has performed a recovery and wants to check that the recovered files are identical to the original ones. In that case, Areca simply recovers the files, then reads them, computes a hash code from their content and compares it to the hash code that was stored in the "hash" file (i.e. the hash code of the original file).
    2) The user has performed a backup and wants to check the archive. We could handle this case exactly like use case 1: recover all files, compute their hash codes and check them against the original ones. This approach (which was implemented in early versions of Areca) would be inefficient, because it would use a lot of storage space, while all we want is to compute the recovered hash codes.
    On the other hand, we really want to execute the exact same code as in a recovery (that's the only way to make sure that Areca will be able to recover the archive), so we don't want to write dedicated code that computes hash codes on the fly and bypasses the recovery code in Areca.
    So the approach that was chosen was to add a proxy class that intercepts the file writing operations and writes the files' hash codes instead of their full content (see the "ContentHashFileSystemDriver" class).
    So the steps of backup verification are: first, activate the proxy so the files are not really written to disk; then perform a 'classical' archive recovery; and then read the recovered files to retrieve their hash codes and check them against the original ones (stored in the "hash" file). This ensures that little storage space is used for verification (because we only store hash codes) and that we use the exact same code that will be used for recovery (except for the 'proxy' part, of course).
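
    A minimal sketch of that interception idea, assuming SHA-1 and illustrative names (this is not the actual ContentHashFileSystemDriver code):

    import java.io.FilterOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.HexFormat;

    // Sketch only: an output stream that discards the real content and,
    // on close, writes just the hash of everything that was "recovered".
    class HashOnlyOutputStream extends FilterOutputStream {
        private final MessageDigest digest;

        HashOnlyOutputStream(OutputStream hashSink, MessageDigest digest) {
            super(hashSink);
            this.digest = digest;
        }

        @Override
        public void write(int b) {
            digest.update((byte) b);            // hash instead of storing the byte
        }

        @Override
        public void write(byte[] b, int off, int len) {
            digest.update(b, off, len);         // hash instead of storing the block
        }

        @Override
        public void close() throws IOException {
            // Only the hex digest reaches the disk, so verification needs almost
            // no storage while still exercising the recovery code path.
            out.write(HexFormat.of().formatHex(digest.digest())
                     .getBytes(StandardCharsets.US_ASCII));
            super.close();
        }
    }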

    To get back to the bug: the interesting point in your post is that "settings.sol", "file A" and "file B" share the same recovered hash code. The only explanation I can see is that their content is identical (for instance, if file A and file B are copies of settings.sol)... is that the case?
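
    One quick way to check would be to compare the files directly, for instance (the paths below are placeholders, to be replaced with the real locations):

    import java.nio.file.Files;
    import java.nio.file.Path;

    class CompareFiles {
        public static void main(String[] args) throws Exception {
            Path settings = Path.of("settings.sol");
            Path fileA = Path.of("file A");
            Path fileB = Path.of("file B");
            // Files.mismatch returns -1 when the two files are byte-identical.
            System.out.println("file A == settings.sol ? " + (Files.mismatch(fileA, settings) == -1));
            System.out.println("file B == settings.sol ? " + (Files.mismatch(fileB, settings) == -1));
        }
    }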

     
  • aventin

    aventin - 2016-02-24

    Another thing: when you say "Looking at archive_dir/date_data/hash for 'file B' I found there's correct value", what value did you find exactly? According to the log you posted, it should be "14f40ee0aec297422c0087db64da8e0f670059cc".

     
  • HULLIN Pascal

    HULLIN Pascal - 2016-05-31

    I am continuing this thread since I have seen the same phenomenon on a Windows 2008 R2 machine. I use the VSS plugin.
    As far as I understand, aventin has had difficulties tracking the problem down since he could not reproduce it. How lucky he is ;-)
    On my side, I face this problem on a regular basis, but I haven't been able to trigger it on demand when it is not already present. I'm not a Java developer, but if it could help to track this bug down, I propose to run a 'debug' version on my server and forward the results to aventin for investigation.
    Currently, last night's target backup returned this error to me... So if I do nothing, the next backup will show the same problem...

    Waiting for your feedback, and have a nice day.

     
  • HULLIN Pascal

    HULLIN Pascal - 2016-06-01

    Hi,

    As expected, I had the same hash error during the backup last night. That's a good point for the debugging process. Since I want to dig further into this problem, I did the following today:
    First, I tried to reproduce the problem on a target that only contains the directory with the set of files in error. Here it is a single directory, but note that all the files in error are shortcut files. The problem is NOT related to shortcut files, because I have already had it on more regular files.

    1 - I duplicated the target and changed only the source path to that directory, then ran the archive process. Bad luck: it completed without error, but it did a full backup and not an incremental backup. That's normal, but it is a difference. I ran many archives on this reduced target -> no error.
    I restored these files to another place and manually compared the checksums of the original files and the restored files. These shortcut files are identical. I used the cvif.exe -md5 command to produce the hashes. There is one difference: the original name of the shortcut is in French but the restored shortcut name is in English! Explorer doesn't translate on the fly; it is an 'Explorer' issue, since a 'dir' command gives the English file name in both cases.
    -> This means that there is a difference between the full and the reduced target (or between full and incremental archives) that makes the result somewhat random. So it is not very useful.

    2 - I ran the full target archive, but this time I disabled the final check so that I would keep an archive file. It ran OK.
    3 - I ran an archive check to make sure I would get the hash error... and I got all the same hash errors as last night. But this time it did not delete the archive.
    4 - I restored only the directory with the files in error to another location. Of course, I got warning messages about these files.
    5 - I compared the restored files against the original files. Some of them are smaller than the originals and Windows does not recognise them as shortcuts (completely broken). All the files with hash warnings in the log are completely broken. The others have the same size, are recognised by Windows as valid shortcuts and have the same md5 hash.
    -> There is data corruption either during the archive OR during the restore process, and the 'hash' check nicely detects this corruption. Do you have a good idea for determining whether the problem occurs during backup or during restore? In other words, how can I rebuild the file from the archive without using Areca? (See also the sketch below for one check I can already do.)
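
    One check that needs no knowledge of Areca's internals is to hash the restored copy with SHA-1 (the warning hashes are 40 hex characters, so SHA-1 appears to be the algorithm used) and see whether it matches the "recovered" value or the "reference" value from the log line. A small sketch, with a placeholder path:

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.HexFormat;

    // Hash a restored file so it can be compared with the two values
    // printed in Areca's "was not properly recovered" warning.
    class RestoredFileHashCheck {
        public static void main(String[] args) throws Exception {
            Path restored = Path.of("restored/file.lnk");   // placeholder path
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            try (InputStream in = Files.newInputStream(restored)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    sha1.update(buf, 0, n);
                }
            }
            System.out.println("restored file hash : " + HexFormat.of().formatHex(sha1.digest()));
            // If this matches the "recovered" hash from the log, the wrong content
            // already comes out of the restore; if it matches the "reference" hash,
            // the real restore is fine and only the verification step is suspect.
        }
    }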

    As I said in my last post, I can run a troubleshooting version of Areca to pinpoint the bug. I have a configuration where this problem happens, so it is a chance to get rid of this long-standing bug. ;-)

     
  • A Person

    A Person - 2020-04-17

    Is there any news about this recovery verification issue? It makes this software completely unusable, given how important the verifications are, and it's scary that someone may realise it too late, after having become dependent on this backup tool.

    My error also mentions the well-known hash 5ab9236edeebcb8db2d53f6f15e098102f588789, and it happened in a use case where I had an .exe file, then made another backup with the file deleted, and, after some number of intermediate backups, put the same file back and attempted to create a new backup. By the way, this seems to happen specifically with delta && incremental backups.

     
  • SystemDotExit

    SystemDotExit - 2020-06-16

    I don't think Aventin supports this any more, and no one else has picked up the responsibility.

    Unless he responds soon to the many unanswered support requests, it may be time to start planning a migration to another backup tool that is still supported.

    A shame, really, as Areca has all the features and functions many of us require!

     
