Inexplicably slow extraction of a compressed WIM archive

2019-01-08
2019-01-09
  • Little Vulpix

    Little Vulpix - 2019-01-08

    Hello!

    I'm facing a strange issue whereby extracting a WIM archive takes inexplicably long. I would be okay with that if I saw high CPU utilization, or more likely high disk utilization, but neither is the case. Disk utilization hovers around 0-25% for both source and target, and the CPU is almost idle.

    When using wimlib's "apply" to extract the image, target disk I/O is the bottleneck. But with 7-Zip, I'm not sure what is. I turned off my antivirus program temporarily, and I also observed the behavior via Process Monitor; what I can see is that 7-Zip sort of "pauses" before it processes a batch of data, then pauses again.

    OS: W10 x64 1703 latest updates
    7Zip: 18.06
    CPU: AMD TR 1950x
    Source device: Regular disk (~150-180MB/s seq read)
    Target device: A different regular disk (same speeds)

    The archive is approx. 300GB, much of which is identical files (stored once and referenced, thanks to the WIM store).
    The WIM archive was created through wimlib with LZMS compression.

    If you need some other info, please let me know.

     
  • Igor Pavlov

    Igor Pavlov - 2019-01-08

    call the command:

    7z t a.wim -scrc -bt > log.txt
    
     
  • Little Vulpix

    Little Vulpix - 2019-01-08

    Running it now. I have 128GB RAM so the entire source file is currently cached, eliminating source disk I/O from the equation completely. I'll paste the results when done.

     
  • Igor Pavlov

    Igor Pavlov - 2019-01-08

    Then run it again,
    so that the second run reads from the file cache.

     

    Last edit: Igor Pavlov 2019-01-08
  • Little Vulpix

    Little Vulpix - 2019-01-08
    7-Zip 18.06 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-12-30
    
    Scanning the drive for archives:
    1 file, 67433897358 bytes (63 GiB)
    
    Testing archive: Current_repacked_128M_16t.wim
    --
    Path = Current_repacked_128M_16t.wim
    Type = wim
    Physical Size = 67433897358
    Size = 4294967296
    Packed Size = 155715333994
    Method = LZMS:17
    Cluster Size = 131072
    Created = 2018-09-29 23:01:45
    Modified = 2018-12-27 15:45:24
    Comment = 
    {
    <WIM><IMAGE INDEX="1"><NAME>Mine</NAME><DIRCOUNT>49476</DIRCOUNT><FILECOUNT>484097</FILECOUNT><TOTALBYTES>324105800591</TOTALBYTES><HARDLINKBYTES>0</HARDLINKBYTES><CREATIONTIME><HIGHPART>0x01D45840</HIGHPART><LOWPART>0x0308148D</LOWPART></CREATIONTIME><LASTMODIFICATIONTIME><HIGHPART>0x01D49DF2</HIGHPART><LOWPART>0xCD1DB8A7</LOWPART></LASTMODIFICATIONTIME></IMAGE><TOTALBYTES>67433896540</TOTALBYTES></WIM>
    
    }
    Version = 0.14
    Multivolume = -
    Volume = 1
    Volumes = 1
    Images = 1
    
    Everything is Ok
    
    Folders: 49475
    Files: 484097
    Alternate Streams: 2968
    Alternate Streams Size: 77362
    Size:       324105723229
    Compressed: 67433897358
    
    CRC32  for data:              D675BD58
    CRC32  for data and names:    7137D151
    CRC32  for streams and names: 1BB972D4
    
    
    Kernel  Time =   108.062 =    1%               20651774 MCycles
    User    Time =  5330.468 =   97%
    Process Time =  5438.531 =   99%    Virtual  Memory =    383 MB
    Global  Time =  5441.285 =  100%    Physical Memory =    387 MB
    

    So it took 5441 seconds (slightly over an hour and a half). Extracting via wimapply took about 30 minutes, including writing the data out to a new disk, and at all times the target disk was ~100% utilized.
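
    To put the two timings on the same scale, the sketch below converts them into average throughput over the unpacked data. The byte count and the 7-Zip time are copied from the log above; the wimlib time is only the approximate "about 30 minutes" figure, so that result is a rough estimate. The function and constant names are made up for this illustration.

```python
# Figures copied from the "7z t" log and the post above.
UNPACKED_BYTES = 324_105_723_229   # "Size" line of the 7-Zip log
SEVENZIP_SECONDS = 5441.285        # "Global Time" line of the 7-Zip log
WIMLIB_SECONDS = 30 * 60           # "about 30 minutes" (approximate, includes writing to disk)


def throughput_mb_per_s(num_bytes: int, seconds: float) -> float:
    """Return the average rate in decimal megabytes (10**6 bytes) per second."""
    return num_bytes / seconds / 1e6


if __name__ == "__main__":
    sevenzip = throughput_mb_per_s(UNPACKED_BYTES, SEVENZIP_SECONDS)
    wimlib = throughput_mb_per_s(UNPACKED_BYTES, WIMLIB_SECONDS)
    print(f"7-Zip test:   {sevenzip:6.1f} MB/s of unpacked data")
    print(f"wimlib apply: {wimlib:6.1f} MB/s of unpacked data (estimate)")
    print(f"ratio:        {wimlib / sevenzip:.1f}x")
```

    This works out to roughly 60 MB/s for the 7-Zip test run against roughly 180 MB/s for wimlib, assuming wimlib really did finish in about 30 minutes.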

    Here are the statistics. The file was already cached because I had unpacked it via wimlib when 7-Zip proved so slow (verified via RAMMap; reading the file also produced no disk I/O), so source disk I/O was not a factor.

    Reading the file from bash (WSL) to confirm:

    Current$ ls -ltrh Current_repacked_128M_16t.wim
    -rwxrwxrwx 1 root root 63G Dec 27 17:18 Current_repacked_128M_16t.wim
    
    Current$ time cat  Current_repacked_128M_16t.wim > /dev/null
    
    real    0m20.011s
    user    0m0.078s
    sys     0m19.828s
    
     
  • Igor Pavlov

    Igor Pavlov - 2019-01-09

    I don't remember the details of 7-Zip's WIM code now.
    There are different cases: solid/nonsolid.
    Maybe 7-Zip unpacks each copy of duplicate files.
    And other programs may use buffers or copy the data from the filesystem.
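
    The difference between those two strategies can be shown with a toy model. This is only an illustration of the idea, not 7-Zip's or wimlib's actual code: `zlib` stands in for LZMS, and the dictionaries `blobs` (blob id to compressed data) and `files` (file name to blob id) are hypothetical stand-ins for a WIM's deduplicated store.

```python
import zlib


def extract_naive(blobs, files):
    """Decompress the referenced blob again for every file that points at it."""
    out, decompressions = {}, 0
    for name, blob_id in files.items():
        out[name] = zlib.decompress(blobs[blob_id])
        decompressions += 1
    return out, decompressions


def extract_cached(blobs, files):
    """Decompress each unique blob once and reuse the bytes for duplicates."""
    out, cache, decompressions = {}, {}, 0
    for name, blob_id in files.items():
        if blob_id not in cache:
            cache[blob_id] = zlib.decompress(blobs[blob_id])
            decompressions += 1
        out[name] = cache[blob_id]
    return out, decompressions


# Toy archive: two unique blobs, four files, three of them identical.
blobs = {"a": zlib.compress(b"x" * 1000), "b": zlib.compress(b"y" * 1000)}
files = {"f1": "a", "f2": "a", "f3": "a", "f4": "b"}

if __name__ == "__main__":
    naive_out, naive_n = extract_naive(blobs, files)
    cached_out, cached_n = extract_cached(blobs, files)
    print("same output:", naive_out == cached_out)
    print("decompressions, naive vs cached:", naive_n, "vs", cached_n)
```

    Both functions produce the same files; only the number of decompression calls differs. Whether this is what actually slows down 7-Zip here is not confirmed anywhere in this thread.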

     
  • Little Vulpix

    Little Vulpix - 2019-01-09

    This is a solid archive, yes. Using LZMS compression, 128M chunks. Is there something I can do or should I use the other program to unpack my WIMs instead? If you need some more troubleshooting info I can provide it.

    Thanks!

     
  • Igor Pavlov

    Igor Pavlov - 2019-01-09

    Actually I don't know why there were pauses.
    The "Test" command shows 97% load on one CPU thread. So no pauses there.
    Now it's too difficult to think about that WIM code. I don't remember the details.

     
