Duplicate filename error without duplicate files
Hello
I am using p7zip version 16.02 (locale=fr_FR.UTF-8,Utf16=on,HugeFiles=on,64 bits,24 CPUs Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (206D7),ASM,AES-NI)
to compress thousands of text files into a single 7z file.
When I run /usr/bin/7za u -m0=lzma2 -mx6 -ms=off /data/2017_09/2017-09-28.7z /data/nas/cpm3/2017-09-28/
I often get this error:
ERROR:
Duplicate filename on disk:
2017-09-28/05b1cad0-f853-428e-b805-80467ab1cbaa
But the filenames are UUIDs, so there is no way they can be duplicated!
How can I fix or avoid this behavior?
Regards,
Victor
Maybe some lower/upper case problem with the same files on the NAS?
Hi Igor, and thanks a lot for your help.
Filenames are lowercase, and if I retry two or three times it finally works.
In addition, it's not always the same duplicate filename.
So I am leaning more towards a multi-threading concurrency issue, because there are 2 or 3 threads running when I look at the process with the htop tool.
I'm going to add the -mmt=off switch and see what happens.
(In this case, where only one folder is compressed, it would be simpler to ignore these errors, but I don't think there is a switch for that.)
Last edit: Victor Da 2017-09-28
Same issue with the -mmt=off switch, so it is not a multi-threading concurrency issue.
And there is no other file in the directory:
ls -1 /data/2017-09-28/ | grep -i 00decd82-da46-4da4-84e7-1e0a8a1b95ed
00decd82-da46-4da4-84e7-1e0a8a1b95ed
Any idea ?
What does "nas/cpm3" mean?
1) Try to copy that folder to some local folder.
2) Try the "hash" command with a log:
it must show all names, including duplicates.
Hello Igor,
Sorry for reopening this topic, but I think the issue is not due to NFS storage.
I've just run the 7za h command on a Linux folder which contains about 120,000 files. I am using p7zip 16.02-10.el6 on RHEL 6.
There was no duplicate filename, but 120K files with UUID names (128 bits) may result in hash collisions with p7zip's hashing system. I can see p7zip uses an 8-character uppercase alphanumeric hash, resulting in a hash size of less than 64 bits.
Here is a duplicate:
You can see they have different sizes, so two different files have the same p7zip hash.
What do you think ?
Regards,
Victor d'Agostino
Hi,
This issue is NFS-related and looks like https://bugzilla.redhat.com/show_bug.cgi?id=739222
I unmounted and remounted the NFS share, and the 7zip operation succeeded after some I/O wait because the cache memory was lost.
Last edit: Victor Da 2017-09-29
Same here, see the screenshot. It's not possible to make the archive. The author can try to compress a Windows user home folder in C:\Users\xxx: this is the best test for various weird issues.
[screenshot attached]
Last edit: Petr Fischer 2018-01-30
Hello,
In Petr Fischer's example, it is obviously just a lowercase/uppercase issue on a Windows system.
Using the 16.02-10.el6 version with SAN storage (not NFS storage anymore), I am still getting this error:
ERROR:
Duplicate filename on disk:
2018-07-19/00aa16cb-1851-467e-b479-464761f6b77d
2018-07-19/00aa16cb-1851-467e-b479-464761f6b77d
when compressing a folder with about 50k files.
Since this issue does not seem so easy to fix:
1) It would be useful to have an option to ignore this error and use the first file: p7zip -skip-duplicate, for example.
2) The error text should be written on the same line as "ERROR:" (for log parsing).
Here is the full output :
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=fr_FR.UTF-8,Utf16=on,HugeFiles=on,64 bits,24 CPUs Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (206D7),ASM,AES-NI)
Open archive: /data/copiemail3/nas/cpm3/2018_07/2018-07-19.7z
Path = /data/copiemail3/nas/cpm3/2018_07/2018-07-19.7z
Type = 7z
Physical Size = 4076799713
Headers Size = 1217864
Method = LZMA2:25
Solid = -
Blocks = 41301
Scanning the drive:
1 folder, 117400 files, 33664959992 bytes (32 GiB)
Updating archive: /data/copiemail3/nas/cpm3/2018_07/2018-07-19.7z
ERROR:
Duplicate filename on disk:
2018-07-19/00aa16cb-1851-467e-b479-464761f6b77d
2018-07-19/00aa16cb-1851-467e-b479-464761f6b77d
I think my company could pay for a fix.
Regards
Victor
Call the commands:
I just want to see the items that can match that bad file:
Hello Igor,
Please find attached the two log files for these commands:
Thanks for your time
Victor
I suppose it works like this:
p7zip requests the list of files from the NAS.
The system returns some name name1 in some iteration. But then it can return the same name1 again in the next iteration.
Maybe you add some files to that folder during the operation.
It can be a file system driver problem or a problem in p7zip's code.
I don't know.
Last edit: Igor Pavlov 2018-07-23
Yes! A lot of new files are added to the folder during the operation, but the original files are not modified during the operation.
Should I only use p7zip on a static folder?
p7zip uses the readdir function. I don't know how it behaves when the directory is changed during enumeration.
Probably a static folder can solve that problem.
The readdir function should work, but maybe not with NFS.
Is it safe for me to compile the 7za binary, commenting out the lines
?
As I said before, my filenames are UUIDs, and there cannot be duplicated filenames.
It's not good to change that code.
7-Zip will think that there are 2 copies of that file.
Ok
Maybe it is possible to just print the error and skip the file when (CompareFileNames(s1, s2) == 0) instead?
Last edit: Victor Da 2018-07-24
Hi Igor,
I'm not a C++ expert, but I tried
but the dirItems object is const, so I can't remove the duplicate file.
What would you do?
Last edit: Victor Da 2018-07-24
It's better to solve that problem at an earlier stage, when the files are enumerated. But such changes in the code require debugging and testing. Right now I'm not ready to suggest any solution.
I don't work on p7zip things.
Ok
You're right, it should be checked at an earlier stage.
I can't fix the NFS issues with the readdir function.
I will try to check for duplicates in FileFind.cpp.
Hello,
Please find attached the FileFind.cpp I use in production to ignore duplicated files, and the resulting output.
I know it just fixes the symptoms of the issue and not the cause. I hope that updating from RHEL 6 to RHEL 7 one day will fix the readdir-with-NFS issue.
Victor
I think there really are multiple copies of the same filename. At least for me, when I saw this error in 7-Zip and Eclipse on Windows 10 Pro, I think it was caused by Ubuntu writing to a shared NTFS drive that I had mounted in both Ubuntu and Windows. I had called a tkinter window in Python 3 and accidentally told it to write pixels outside the size of the window, and it kept crashing the Spyder IDE. When I went back into Windows, I noticed there were two of each Java class in Eclipse, and one of the files being written in Linux had something from a different OS process written into it instead, so I restored the Eclipse workspace directory from backup and that fixed it. Similarly, 7-Zip says there are two of some files.