7-Zip / Discussion / Open Discussion: Inexplicably slow extraction of a compressed WIM archive

Little Vulpix - 2019-01-08

Hello!

I'm facing a strange issue: extracting a WIM archive takes inexplicably long. I would accept that if I saw high CPU utilization or, more likely, high disk utilization, but neither is the case. Disk utilization hovers around 0-25% for both source and target, and the CPU is almost idle.

When using wimlib's "apply" to extract the image, target disk I/O is the bottleneck. But with 7-Zip, I'm not sure what is. I turned off my antivirus program temporarily, and I also observed the behavior via Process Monitor; what I can see is that 7-Zip sort of "pauses" before it processes a batch of data, then pauses again.

OS: W10 x64 1703 latest updates
7Zip: 18.06
CPU: AMD TR 1950x
Source device: Regular disk (~150-180MB/s seq read)
Target device: A different regular disk (same speeds)

Archive is approx. 300GB, much of which consists of identical files (which are stored once and referenced, thanks to the WIM store).
The WIM archive was created with wimlib using LZMS compression.

If you need some other info, please let me know.

Igor Pavlov - 2019-01-08

call the command:

7z t a.wim -scrc -bt > log.txt

Little Vulpix - 2019-01-08

Running it now. I have 128GB RAM so the entire source file is currently cached, eliminating source disk I/O from the equation completely. I'll paste the results when done.

Igor Pavlov - 2019-01-08

Then run it again,
so that the second run reads from the file cache.

Last edit: Igor Pavlov 2019-01-08

7-Zip 18.06 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-12-30

Scanning the drive for archives:
1 file, 67433897358 bytes (63 GiB)

Testing archive: Current_repacked_128M_16t.wim
--
Path = Current_repacked_128M_16t.wim
Type = wim
Physical Size = 67433897358
Size = 4294967296
Packed Size = 155715333994
Method = LZMS:17
Cluster Size = 131072
Created = 2018-09-29 23:01:45
Modified = 2018-12-27 15:45:24
Comment = 
{
<WIM><IMAGE INDEX="1"><NAME>Mine</NAME><DIRCOUNT>49476</DIRCOUNT><FILECOUNT>484097</FILECOUNT><TOTALBYTES>324105800591</TOTALBYTES><HARDLINKBYTES>0</HARDLINKBYTES><CREATIONTIME><HIGHPART>0x01D45840</HIGHPART><LOWPART>0x0308148D</LOWPART></CREATIONTIME><LASTMODIFICATIONTIME><HIGHPART>0x01D49DF2</HIGHPART><LOWPART>0xCD1DB8A7</LOWPART></LASTMODIFICATIONTIME></IMAGE><TOTALBYTES>67433896540</TOTALBYTES></WIM>

}
Version = 0.14
Multivolume = -
Volume = 1
Volumes = 1
Images = 1

Everything is Ok

Folders: 49475
Files: 484097
Alternate Streams: 2968
Alternate Streams Size: 77362
Size:       324105723229
Compressed: 67433897358

CRC32  for data:              D675BD58
CRC32  for data and names:    7137D151
CRC32  for streams and names: 1BB972D4


Kernel  Time =   108.062 =    1%               20651774 MCycles
User    Time =  5330.468 =   97%
Process Time =  5438.531 =   99%    Virtual  Memory =    383 MB
Global  Time =  5441.285 =  100%    Physical Memory =    387 MB

So it took 5441 seconds (slightly over an hour and a half). Extracting via wimapply took about 30 minutes, including writing the data out to a new disk, and at all times the target disk was ~100% utilized.
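To put the two timings side by side, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes both tools process the full unpacked size reported in the log, and it treats "about 30 minutes" as exactly 1800 seconds.

```python
# Figures copied from the 7-Zip log above and from the timings quoted in this post.
UNPACKED_BYTES = 324_105_723_229   # "Size" reported by 7z t
SEVENZIP_SECONDS = 5441.285        # "Global Time" reported by 7z t
WIMAPPLY_SECONDS = 30 * 60         # "about 30 minutes" (an approximation)


def mb_per_s(n_bytes, seconds):
    """Average throughput in decimal megabytes per second."""
    return n_bytes / seconds / 1e6


sevenzip_rate = mb_per_s(UNPACKED_BYTES, SEVENZIP_SECONDS)
wimapply_rate = mb_per_s(UNPACKED_BYTES, WIMAPPLY_SECONDS)

print(f"7-Zip test : {sevenzip_rate:6.1f} MB/s")
print(f"wimapply   : {wimapply_rate:6.1f} MB/s")
print(f"ratio      : {wimapply_rate / sevenzip_rate:4.1f}x")
```

Under those assumptions the 7-Zip test averages roughly 60 MB/s of unpacked data, while wimapply averages roughly 180 MB/s, which is in line with the sequential speed of the target disk.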

Here are the statistics. The file was already cached because I had unpacked it via wimlib when 7-Zip was so slow (verified via RAMMap; reading the file also produced no disk I/O), so source disk I/O was not a factor.

Reading the file from bash (WSL) to confirm:

Current$ ls -ltrh Current_repacked_128M_16t.wim
-rwxrwxrwx 1 root root 63G Dec 27 17:18 Current_repacked_128M_16t.wim

Current$ time cat  Current_repacked_128M_16t.wim > /dev/null

real    0m20.011s
user    0m0.078s
sys     0m19.828s
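As a quick sanity check on the numbers above, the read rate implied by the `cat` run can be worked out from the archive size in the 7-Zip log and the "real" time. This is just arithmetic on the figures already shown, not a new measurement.

```python
ARCHIVE_BYTES = 67_433_897_358   # archive size from the 7-Zip log
CAT_SECONDS = 20.011             # "real" time of the cat run
SEVENZIP_SECONDS = 5441.285      # "Global Time" of the 7z t run

read_rate_gb_s = ARCHIVE_BYTES / CAT_SECONDS / 1e9
share_of_run = CAT_SECONDS / SEVENZIP_SECONDS

print(f"cached read: {read_rate_gb_s:.2f} GB/s")
print(f"share of the 7-Zip run: {share_of_run:.2%}")
```

That is well over 3 GB/s, far beyond the 150-180 MB/s of the source disk, so the file is being served from RAM, and reading it accounts for well under 1% of the 7-Zip run time.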

Igor Pavlov - 2019-01-09

I don't remember the details of 7-Zip's wim code now.
There are different cases: solid/nonsolid.
Maybe 7-Zip unpacks each copy of duplicate files.
And other programs can use buffers or copy the data from the filesystem.
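To illustrate the difference this hypothesis would make, here is a toy model. It is not 7-Zip's or wimlib's actual code: `zlib` stands in for LZMS, the file names and contents are made up, and the functions `extract_naive` and `extract_cached` are hypothetical. The only WIM-specific fact used is that identical file contents are stored once and identified by their SHA-1 hash.

```python
import hashlib
import zlib


def extract_naive(entries, blobs):
    """Decompress the stored blob for every file entry, duplicates included."""
    decodes = 0
    out = {}
    for path, digest in entries:
        out[path] = zlib.decompress(blobs[digest])
        decodes += 1
    return out, decodes


def extract_cached(entries, blobs):
    """Decompress each unique blob once and reuse the result for duplicates."""
    decodes = 0
    cache = {}
    out = {}
    for path, digest in entries:
        if digest not in cache:
            cache[digest] = zlib.decompress(blobs[digest])
            decodes += 1
        out[path] = cache[digest]
    return out, decodes


# Hypothetical image: three paths, two of which share identical content.
contents = {
    "a.txt": b"hello" * 1000,
    "b.txt": b"hello" * 1000,
    "c.txt": b"world" * 1000,
}
blobs = {}      # SHA-1 digest -> compressed data, stored once per unique content
entries = []    # (path, digest) pairs, one per file
for path, data in contents.items():
    digest = hashlib.sha1(data).hexdigest()
    blobs.setdefault(digest, zlib.compress(data))
    entries.append((path, digest))

naive_out, naive_decodes = extract_naive(entries, blobs)
cached_out, cached_decodes = extract_cached(entries, blobs)
print("naive decodes :", naive_decodes)
print("cached decodes:", cached_decodes)
```

Both functions produce the same files, but the naive one performs one decompression per file entry, while the cached one performs one per unique blob. With many duplicates the gap in CPU time could be large. Whether this is what actually happens in 7-Zip is not confirmed in this thread.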

Little Vulpix - 2019-01-09

This is a solid archive, yes. Using LZMS compression, 128M chunks. Is there something I can do or should I use the other program to unpack my WIMs instead? If you need some more troubleshooting info I can provide it.

Thanks!

Igor Pavlov - 2019-01-09

Actually I don't know why there were pauses.
"Test" command shows 97% of one CPU thread load. So no pauses there.
Now it's too difficult to think about that wim code. I don't remember the details.
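The single-thread observation can be read straight off the timing block in the log. The sketch below parses those four lines (copied verbatim from the posted output) and derives the CPU share and an average clock rate. The clock estimate assumes the MCycles figure counts cycles consumed by the 7z process, which is an assumption, not something stated in the log.

```python
import re

# Timing block copied from the posted log (output of the -bt switch).
LOG_TAIL = """\
Kernel  Time =   108.062 =    1%               20651774 MCycles
User    Time =  5330.468 =   97%
Process Time =  5438.531 =   99%    Virtual  Memory =    383 MB
Global  Time =  5441.285 =  100%    Physical Memory =    387 MB
"""


def parse_times(text):
    """Return a dict such as {'Kernel': 108.062, ...} with times in seconds."""
    pattern = re.compile(r"^(\w+)\s+Time\s*=\s*([\d.]+)", re.MULTILINE)
    return {name: float(seconds) for name, seconds in pattern.findall(text)}


times = parse_times(LOG_TAIL)
mcycles = int(re.search(r"(\d+)\s+MCycles", LOG_TAIL).group(1))

# Fraction of wall-clock time during which the process was on a CPU.
cpu_share = times["Process"] / times["Global"]
# Average clock rate, assuming MCycles covers the process CPU time.
avg_ghz = mcycles * 1e6 / times["Process"] / 1e9

print(f"process/global time: {cpu_share:.1%}")
print(f"average clock      : {avg_ghz:.2f} GHz")
```

Process time is almost equal to wall-clock time, and the implied clock rate is about 3.8 GHz, which fits one fully loaded core rather than a process that sits idle. That is consistent with the "test" run being bound by single-threaded decompression, and it does not explain the pauses seen during extraction.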
