Metadata-Only WIM file?

2013-12-17
2014-01-08
  • Hello Eric,

    Thanks for the great work on wimlib. It's nice to have some real options when it comes to Microsoft's tech :)

    I noticed while perusing Microsoft's WIM file documentation (I was reading a web page, but I think that this is it) that there appears to be a marker in the XML data for a "METADATA_ONLY" WIM file. I was unable to get much info about them through googling, but my best guess is that those files are only created by the WDS capture tools which create a tiny (tens of MBs) WIM file and a large resource-only RWM companion file.

    The reason I bring it up is that WIM files are particularly useful due to their ability to capture the nuances (as you know!) of NTFS stream data, whereas the majority of file copy utilities don't concern themselves with the significance of concepts like ACLs, alternate data streams, hard links, reparse points, symlinks, and so on.

    I'm wondering if it's possible to create a metadata-only WIM file such that I can apply it to a mount point and have it "fix" this missing information: insert missing junctions, apply hard links where indicated, symlinks, permissions, and so on. In this way, I could get the raw files and folders onto the volume by any method (and the idea I'm building on is one where I populate the files and folders with BitTorrent Sync!) and then "fill in the blanks" with a much smaller WIM file.

    Can wimlib, and wimlib-imagex, help me in this regard?

    Cheers,
    Andrew Bobulsky

     
  • Eric Biggers
    2013-12-17

    Hi,

    This feature isn't currently supported, and wimlib never sets the METADATA_ONLY flag. I think the main difficulty would be that one might expect alternate data streams and reparse points / symbolic links to be extracted from the metadata WIM, but both of those are really "data" and not "metadata"; in particular, NTFS alternate data streams may be of arbitrary size. Nevertheless, it might still make sense to stuff these data in a WIM and call it metadata only, since I'm not sure there is otherwise much use for a "metadata only" WIM.

     
    Last edit: Eric Biggers 2013-12-17
    • Well... :P

      Do you have any idea whether or not such a WIM can be built? That is, a WIM that contains everything other than the files' primary data streams, like ACLs, reparse points, hard links, alternate data streams, and the like... that could be applied to an existing directory structure to fill in that info?

      It seemed like a reasonable way to go about that task... am I perhaps barking up the wrong tree? :P

      -Andrew

       
  • Eric Biggers
    2013-12-17

    The format does allow that such a file be created, even without setting the METADATA_ONLY flag (which may be inappropriate if alternate streams or reparse points are present anyway). However, the ability to capture and apply such a file in the way you're suggesting has not been implemented. I will consider whether this can be implemented without too many changes.

     
    • I look forward to your findings!

      ...So you're saying that such a thing is within the capabilities of the WIM format, right? :)

       
  • Eric Biggers
    2013-12-19

    Yes, this is within the capabilities of the WIM format. A WIM need not contain all the streams (identified by SHA1 message digest) referenced by the metadata. This is the case with split and delta WIMs, for example. But for your suggestion, the code needs to be changed to add an extraction mode where default streams are intentionally not extracted and can validly be missing, and a capture mode where default streams are not included in the resulting WIM file. (There may be a couple of nuances I haven't thought of yet, such as exactly how hard links should be dealt with.)
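A minimal sketch of the two proposed modes, assuming a toy in-memory model (`ToyWim`, `capture`, and `extract` are hypothetical names, not wimlib's API). Metadata always references every stream by SHA-1 digest; the capture mode simply leaves default-stream blobs out of the blob table, and the extraction mode skips default streams instead of failing on them. Hard links, security descriptors, and reparse points are deliberately left out.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


@dataclass
class ToyWim:
    # Hypothetical in-memory model, not wimlib's data structures.
    # "Metadata": path -> {stream name: SHA-1 digest}; "" names the default stream.
    metadata: dict[str, dict[str, bytes]] = field(default_factory=dict)
    # Blob table: SHA-1 digest -> stream contents actually stored in this WIM.
    blobs: dict[bytes, bytes] = field(default_factory=dict)


def capture(tree: dict[str, dict[str, bytes]], omit_default: bool) -> ToyWim:
    """tree maps path -> {stream name: contents}."""
    wim = ToyWim()
    for path, streams in tree.items():
        wim.metadata[path] = {}
        for name, data in streams.items():
            digest = sha1(data)
            wim.metadata[path][name] = digest  # always referenced by the metadata
            if not (omit_default and name == ""):
                wim.blobs[digest] = data       # stored unless it is an omitted default stream
    return wim


def extract(wim: ToyWim, skip_default: bool) -> dict[str, dict[str, bytes]]:
    """Return path -> {stream name: contents} to write to the target."""
    out: dict[str, dict[str, bytes]] = {}
    for path, streams in wim.metadata.items():
        out[path] = {}
        for name, digest in streams.items():
            if skip_default and name == "":
                continue  # assumed to already exist on the target volume
            if digest not in wim.blobs:
                raise KeyError(f"{path}: stream {name or '(default)'} is missing")
            out[path][name] = wim.blobs[digest]
    return out
```

With `omit_default=False` and `skip_default=False` this degenerates to an ordinary standalone capture and apply, which is why the metadata side needs no changes at all.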

     
  • maxpat78
    2013-12-20

    As far as I can guess, the "metadata-only" flag should be set only in spanned SWM sets where one or more SWM units contain only (WIM) metadata, while the file contents are stored in subsequent units. A WIM without any file contents (neither the main unnamed $DATA stream nor named NTFS streams) would be nonsense...

     
    • Following my initial post, I finally understand the point of the flag... I was thinking that METADATA_ONLY was referring to FS metadata, but it seems to actually mean WIM metadata... :P

      I am, however, excited about Eric's findings :)

       
  • Eric Biggers
    2013-12-21

    Well, they're basically the same thing. A WIM metadata resource (of which one exists for each image in the WIM) contains the directory structure for the corresponding image, along with information such as file modification times, inode numbers (hard links), file attribute flags, security descriptors, etc. Just not the actual file streams (default file contents, named data streams, reparse point data) themselves, which are identified by SHA1 message digests and located in separate resources.
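To make the split concrete, here is a simplified sketch of how one file divides into a metadata entry and separately stored streams. `DentrySketch` and `split_file` are hypothetical and the field names are illustrative, not the real on-disk layout; the point is only that the entry carries digests, never stream contents.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DentrySketch:
    # Hypothetical, simplified stand-in for one directory entry in a WIM
    # metadata resource; field names are illustrative, not the on-disk layout.
    name: str
    attributes: int        # file attribute flags, e.g. 0x20 (FILE_ATTRIBUTE_ARCHIVE)
    last_write_time: int   # timestamp
    hard_link_group: int   # entries sharing this value are hard links to the same file
    security_id: int       # index of a security descriptor kept in the metadata resource
    stream_digests: dict   # stream name ("" = default contents) -> SHA-1 hex digest


def split_file(name, attributes, last_write_time, hard_link_group, security_id, streams):
    """Split one file into its metadata entry and the blobs holding its streams.

    streams maps stream name -> contents (default contents, named data streams, ...).
    Returns (entry, blobs), where blobs maps SHA-1 hex digest -> contents.
    """
    digests, blobs = {}, {}
    for stream_name, data in streams.items():
        digest = hashlib.sha1(data).hexdigest()
        digests[stream_name] = digest
        blobs[digest] = data  # identical contents collapse to a single blob
    entry = DentrySketch(name, attributes, last_write_time,
                         hard_link_group, security_id, digests)
    return entry, blobs
```

Because blobs are keyed by digest, two streams with identical contents are stored once, while each metadata entry still names both.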

    maxpat78, the format is flexible enough that setups other than purely stand-alone and split WIMs are possible. For example, as of wimlib v1.5.0 you can create a delta WIM which only contains metadata and the referenced streams not already included in some other "base" WIM. This setup is in fact fully compatible with ImageX, as you simply need to reference the base WIM using the /ref option when applying an image from the delta WIM. Likewise, the setup suggested by Andrew is also allowed by the format, although extracting such WIM files would require an extraction mode that does not treat missing resources as an error, which I can implement but ImageX/WIMGAPI may not have. In any case, someone creating WIM files that are not standalone needs to keep track of which files they have.
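A toy sketch of what goes into such a delta WIM, assuming blob tables are plain Python collections keyed by SHA-1 hex digest. `build_delta` is a hypothetical helper illustrating the idea, not wimlib's implementation: the new image's metadata is kept in full, but only blobs the base WIM lacks are stored.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def build_delta(image_streams, base_digests):
    """Return the blob table for a delta WIM.

    image_streams: contents of every stream referenced by the new image.
    base_digests:  SHA-1 digests of blobs already stored in the base WIM.
    Only blobs absent from the base are stored; the rest must come from the
    base WIM at apply time (e.g. via ImageX's /ref option).
    """
    delta = {}
    for data in image_streams:
        digest = sha1_hex(data)
        if digest not in base_digests:
            delta[digest] = data
    return delta


base = {sha1_hex(b"kernel"), sha1_hex(b"driver")}
delta = build_delta([b"kernel", b"driver", b"patch"], base)
print(list(delta.values()))  # [b'patch']
```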

     
  • maxpat78
    2013-12-22

    Sure, it is possible, but you observed yourself that a WIM without file resources would be considered an "error" from the MS spec's point of view (and probably from the point of view of 7-Zip's author, whose tool is used by many Windows users).

    Instead, it would be very interesting to extend the format with strong encryption of resources and different (=better) compression algorithms/bigger chunk sizes... obviously, burning bridges with the official tools. ;)

     
    Last edit: maxpat78 2013-12-22
  • Eric Biggers
    2013-12-22

    The point is that even using WIMGAPI, you can call WIMSetReferenceFile() to reference file resources from an external WIM and thereby not include any of those in the resulting WIM file; this would allow creating a "delta" WIM, for example. A WIM without file resources is therefore still valid, but requires external WIM files to be referenced to apply it. The extension (assuming there's no way to do it in WIMGAPI) discussed here would be to allow an extraction to proceed even with missing resources, which may be a worthwhile capability to have anyway.
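A sketch of the lookup this implies, under stated assumptions: `gather_blobs` is a hypothetical helper, `refs` plays the role of WIMs registered with WIMSetReferenceFile() or passed with /ref, and `allow_missing` models the proposed extension rather than anything WIMGAPI is known to offer. Digests are plain strings here for brevity.

```python
def gather_blobs(needed, primary, refs, allow_missing=False):
    """Collect the blobs an image needs from its own WIM and referenced WIMs.

    needed:  digests referenced by the image's metadata resource
    primary: blob table (digest -> data) of the WIM being applied
    refs:    blob tables of referenced WIMs, searched in order
    Returns (found, missing).
    """
    found, missing = {}, []
    for digest in needed:
        for table in (primary, *refs):
            if digest in table:
                found[digest] = table[digest]
                break
        else:
            # No table had it: the usual behavior is to fail outright.
            if not allow_missing:
                raise KeyError(f"resource {digest} not found in any WIM")
            # Proposed extension: record it and let extraction continue.
            missing.append(digest)
    return found, missing
```

A WIM with no file resources at all is then just the case where `primary` is empty and every digest is satisfied by `refs`.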

    Coincidentally, right now I'm working on support for LZMS-compressed WIMs and the corresponding format extension where multiple streams may be packed into a single resource and compressed together. This was an extension of the format by MS themselves as of Windows 8 and you can create them using WIMGAPI or with Dism using "/compress recovery". Such WIMs seem to use a default chunk size of 131072, but multiple streams packed into a single resource have their chunk size and compression algorithm stored separately and the MS implementation seems to use a chunk size of 67108864. (Not very good for random access, but it makes for small files!)
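The random-access cost of the two chunk sizes mentioned above can be worked out directly. This sketch assumes each chunk is compressed independently, so reading any byte means decompressing the whole chunk containing it; the 1 GiB resource and the 4 KiB read at offset 10,000,000 are arbitrary illustrative numbers.

```python
def num_chunks(size: int, chunk_size: int) -> int:
    """Number of chunks a resource of `size` bytes is split into."""
    return -(-size // chunk_size)  # ceiling division


def decompress_cost(offset: int, length: int, chunk_size: int) -> int:
    """Upper bound on bytes decompressed to read [offset, offset + length).

    Assumes independently compressed chunks (the last chunk may be shorter,
    hence "upper bound").
    """
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return (last - first + 1) * chunk_size


GiB = 1 << 30
for chunk_size in (131072, 67108864):
    print(chunk_size, num_chunks(GiB, chunk_size), decompress_cost(10_000_000, 4096, chunk_size))
# 131072 8192 131072
# 67108864 16 67108864
```

Reading 4 KiB costs one 128 KiB chunk in the first case but a full 64 MiB chunk in the second, which is the random-access penalty traded for the smaller file.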

     
    • maxpat78
      2014-01-08

      Thanks, today I played a bit with DISM 6.3 from the Windows 8.1 ADK and its /Export-Image command (it seems that /compress:recovery can't be specified for normal capturing, only for image recompression), but I didn't investigate the new WIM format deeply... perhaps I'll try some tests for compatibility between wimlib-imagex 1.6 and DISM!

       
  • Eric Biggers
    2014-01-08

    I'm not sure you'll have much luck with Dism. As far as I know it can export in the new WIM format but cannot capture or apply such files. However, you can capture and apply such files with WIMGAPI itself, which is how I did compatibility testing. You have to pass the undocumented flag 0x20000000 in dwFlagsAndAttributes and use WIM_COMPRESSION_LZMS (3). I don't know if MS will be documenting this in the future, but at least solid archives are a nice feature for wimlib to have. A larger compression chunk size allows a larger LZ77 dictionary and makes the compression ratio more competitive with LZMA (xz).
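The effect of chunk size on compression ratio can be demonstrated with Python's standard `lzma` module. This is a stand-in: it is LZMA (xz), not LZMS, and the sizes are scaled down and arbitrary. The data is built so that its only redundancy lies 64 KiB apart, so chunks smaller than that cannot exploit it.

```python
import lzma
import random


def chunked_compressed_size(data: bytes, chunk_size: int) -> int:
    """Total size when each chunk is compressed on its own (no shared history)."""
    return sum(
        len(lzma.compress(data[i:i + chunk_size]))
        for i in range(0, len(data), chunk_size)
    )


# 64 KiB of pseudo-random bytes repeated 8 times: the only redundancy sits at
# a distance of 64 KiB, so the match finder must see across that distance.
block = random.Random(0).randbytes(64 * 1024)
data = block * 8

small = chunked_compressed_size(data, 32 * 1024)  # chunks too small to see repeats
large = chunked_compressed_size(data, len(data))   # one chunk spans all repeats
print(f"32 KiB chunks: {small} bytes; one 512 KiB chunk: {large} bytes")
```

The small-chunk total stays near the 512 KiB input size, while the single large chunk shrinks to roughly one copy of the block, mirroring why solid resources with big chunks compress so well.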