#1608 If multiple files have the same hash (are the same) just reference it

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2023-07-18
Created: 2023-07-18
Creator: Jens
Private: No

Hi,

If a file, for example "abc.txt", has the CRC32 6EA3B990 and I have thousands of copies of it and compress them, 7-Zip should first check whether other files have the same hash instead of compressing each one. It would then compress only the first file and store just a reference to it for the others.

1.) Advantage 1: It would save space, because sometimes exactly the same file sits in another subfolder, and a reference takes less room than a second compressed copy.

2.) Advantage 2: Probably faster extraction, because instead of extracting the same data again, the extractor could simply copy the file it has already extracted.

3.) Advantage 3: Files that have the same hash do not need to be compressed again, only referenced, so compression would be much faster overall.
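
The idea above could be sketched roughly like this in Python. It is only an illustration of the request, not how 7-Zip works internally. The function names `crc32_of_file` and `plan_archive` are made up for this example. The file size is added to the key as an extra assumption, because CRC32 is only 32 bits long and two different files can share the same value; a real implementation would probably also compare the bytes before treating files as identical.

```python
import os
import zlib


def crc32_of_file(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as an 8-digit uppercase hex string."""
    crc = 0
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"


def plan_archive(root):
    """Split the files under root into files to compress and references.

    Returns (to_compress, references). references maps each duplicate
    path to the first path seen with the same size and CRC32.
    Hypothetical helper for illustration only.
    """
    first_seen = {}    # (size, crc32) -> first path with that key
    to_compress = []   # files that would actually be compressed
    references = {}    # duplicate path -> path it would point to
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            key = (os.path.getsize(path), crc32_of_file(path))
            if key in first_seen:
                references[path] = first_seen[key]
            else:
                first_seen[key] = path
                to_compress.append(path)
    return to_compress, references
```

With a thousand copies of one file, `to_compress` would hold a single path and `references` the other 999, which is where the savings in Advantage 1 and Advantage 3 would come from.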

You can test this yourself. Create a folder and paste in any file, then make thousands of copies of that file. Now you have a folder containing the same file thousands of times. You would normally expect 7-Zip or other compression software to compress only the first occurrence of the file and then reference the rest, saving both time and space.
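
For anyone who wants to reproduce the test without copying files by hand, a small Python script along these lines should do it. The function name `make_copies` and the naming scheme for the copies are just assumptions for this sketch.

```python
import os
import shutil


def make_copies(source_file, target_dir, count=1000):
    """Fill target_dir with `count` identical copies of source_file.

    Hypothetical helper for illustration; returns the created paths.
    """
    os.makedirs(target_dir, exist_ok=True)
    stem, ext = os.path.splitext(os.path.basename(source_file))
    paths = []
    for i in range(count):
        # e.g. abc_0000.txt, abc_0001.txt, ...
        dest = os.path.join(target_dir, f"{stem}_{i:04d}{ext}")
        shutil.copyfile(source_file, dest)
        paths.append(dest)
    return paths
```

After running it, compress the target folder as usual and compare the archive size and the compression time with those for a single copy of the file.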

Best regards

1 Attachments
