I had to extract .7z archives with a lot of files (>200'000) in one directory. This took about 90 minutes on my machine. While it was extracting, I put all those files into a .tar on another machine, compressed that tar with 7za, downloaded it, and extracted it in <15 minutes.
I tried to change the "overwrite mode" to always overwrite/skip so there wouldn't be a check anymore, but that had no effect on extraction time.
So I checked the source to see where the extraction is slow. The part that slows extraction down considerably after ~10'000 files is here:
The function GetStream is called for every file in the archive to be extracted.
// ----- Is file (not split) -----
NFind::CFileInfo fileInfo;
if (fileInfo.Find(fullProcessedPath))
switch (_overwriteMode)
To me this looks like a check whether the file already exists, so that the user can be asked whether to overwrite it, skip it, or whatever.
I changed the slow file exists code to this:
AString aFullProcessedPath = UnicodeStringToMultiByte(fullProcessedPath, CP_ACP);
const char * cpFullProcessedPath = (const char *)aFullProcessedPath;
bool bfileInfoFindFullProcessedPath = ( access( cpFullProcessedPath, F_OK ) != -1 );
This cut the 90-minute extraction time down to <3 minutes.
I just copied the string conversions from other parts of the source, as I have no clue how those are handled. The important part is using the "access" function to check whether the file exists instead of "fileInfo.Find".
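For readers who want to try the idea outside of the p7zip tree, here is a minimal standalone sketch of a POSIX existence check. The helper names PathExists and PathExistsNoFollow are made up for this example and are not part of p7zip; the lstat() variant is an addition that is not in the patch above, shown only because access() follows symlinks.

```cpp
#include <string>
#include <sys/stat.h>  // lstat
#include <unistd.h>    // access, F_OK

// Hypothetical helper (not a p7zip function): true if something exists
// at `path`. access() with F_OK only tests for existence; it follows
// symlinks, so a dangling symlink is reported as "does not exist".
bool PathExists(const std::string &path)
{
  return ::access(path.c_str(), F_OK) == 0;
}

// Hypothetical variant: lstat() does not follow symlinks, so a dangling
// symlink still counts as an existing directory entry.
bool PathExistsNoFollow(const std::string &path)
{
  struct stat st;
  return ::lstat(path.c_str(), &st) == 0;
}
```

Which of the two fits an extractor better depends on how it is supposed to treat an existing symlink at the target path; that is a design decision this sketch does not settle.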
My question is: "Is this very expensive way of checking if the file exists necessary?"
What version of Windows?
What filesystem in output folder (NTFS / FAT)?
Is it local computer volume or network folder?
Do you have any antivirus software?
I think I didn't express myself very well lol.
I did this on a Linux machine with ext4 file systems. The slowness is in the way FindFirst is done in p7zip.
On Windows the difference is very small. I ran a test with 200'000 files in a directory on Windows:
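To illustrate why a FindFirst-style lookup can get so slow in a big directory, here is a sketch that contrasts two ways of answering "does this name exist?". It assumes the slow path walks the directory entries until it finds a match; that is a guess at the general shape, not a copy of the p7zip code. ExistsByScan and ExistsDirect are hypothetical names.

```cpp
#include <dirent.h>    // opendir, readdir, closedir
#include <string>
#include <sys/stat.h>  // lstat

// Assumed shape of a scan-based lookup: read directory entries one by
// one until the name matches. Each call is O(n) in the number of
// entries, so checking every extracted file costs O(n^2) overall.
bool ExistsByScan(const std::string &dir, const std::string &name)
{
  DIR *d = ::opendir(dir.c_str());
  if (!d)
    return false;
  bool found = false;
  while (dirent *e = ::readdir(d))
  {
    if (name == e->d_name)
    {
      found = true;
      break;
    }
  }
  ::closedir(d);
  return found;
}

// Direct lookup: ask the kernel about this one path and let the file
// system use its own directory index.
bool ExistsDirect(const std::string &dir, const std::string &name)
{
  struct stat st;
  return ::lstat((dir + "/" + name).c_str(), &st) == 0;
}
```

Both functions give the same answer for a plain name; only the cost per call differs as the directory grows.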
10'000 calls to FindFirstFile took ~240ms
10'000 calls to PathFileExists took ~190ms.
10'000 calls to GetFileAttributes took ~190ms
So on Windows using FindFirstFile doesn't slow down archive extraction significantly.
I want to fix this problem in p7zip, but I don't want to make FindFirst faster if I can simply check for the existence of the file WAY faster without any effort.
My problem is that I don't know whether the pattern checking of FindFirstFile is necessary in that context, or whether those paths never contain wildcards (* or ?).
The path doesn't contain wildcards there.
The p7zip developers should probably try to optimize it.
Maybe it's simpler to optimize the code of
As I remember, CFileInfo::Find() doesn't need the wildcard check for the other calls in 7-Zip either.
You can try to change Find() code.
And you should probably post a message on the p7zip forum about that problem.
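One possible shape for such a change to Find(), as a sketch only: take a fast path with a single stat() call when the path has no wildcard, and leave the pattern search for the rare case that needs it. Everything here (HasWildcard, FindResult, FastFind) is hypothetical and not taken from the 7-Zip or p7zip sources.

```cpp
#include <string>
#include <sys/stat.h>  // stat

// Hypothetical helper: true if the path contains '*' or '?'.
// Note that on Linux both characters are legal in file names, so a
// real fix would have to decide how to treat such names.
bool HasWildcard(const std::string &path)
{
  return path.find_first_of("*?") != std::string::npos;
}

enum class FindResult { Found, NotFound, NeedsPatternSearch };

// Hypothetical fast path: one stat() call for plain paths; the caller
// falls back to the existing pattern search only when asked to.
FindResult FastFind(const std::string &path, struct stat &st)
{
  if (HasWildcard(path))
    return FindResult::NeedsPatternSearch;
  return ::stat(path.c_str(), &st) == 0 ? FindResult::Found
                                        : FindResult::NotFound;
}
```

A side benefit of using stat() rather than access() here is that the caller also gets the size and timestamps, which the overwrite logic may need anyway.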