Reduced wimlib on Win32 with GCC 4.7

maxpat78
2013-03-06
2013-05-02
(Page 1 of 2)
  • maxpat78

    maxpat78 - 2013-03-06

    Please note that a reduced version of wimlib, containing the compression code only, can easily be compiled under Windows with GCC 4.7 (MinGW-w64). Here are the guidelines:

    1) declare the appropriate functions as __declspec(dllexport) in the source code
    2) create an empty CONFIG.H and #define typeof __typeof__
    3) type:
    gcc -mdll -mwin32 -std=c99 -march=native -O2 -flto -finline-functions -funswitch-loops -I. -Wl,-s -Wl,-o,wimlib.dll lzx-compress.c compress.c lzx-common.c lz77.c lzx-decompress.c decompress.c xpress-compress.c xpress-decompress.c

    TIP: -O3 generates bad code (in the LZX decompressor)

    This works well with my ImagePyX script at https://github.com/maxpat78/ImagePyX

     
    Last edit: maxpat78 2013-03-06
  • synchronicity

    synchronicity - 2013-03-09

    That's an interesting use case, and I'm glad you found wimlib's compression and
    decompression code useful! I'm wondering if there is any potential for avoiding
    code duplication, though, since you seem to have also implemented a lot of the
    same logic that I have (just in Python instead of C). I am also considering
    making a Windows version of wimlib and 'imagex'...

    I was also wondering if the problem you experienced with -O3 was really a
    compiler bug, or rather a bug in my code. One potential issue is that the LZ77
    matching code will read up to 8 bytes off the end of the array of uncompressed
    data for performance reasons. When it's called from the C code this is not an
    issue because this will be valid stack memory, but a Python string will be
    allocated on the heap, probably with no extra bytes at the end because strings
    in Python are immutable. To be 100% safe you can either modify lz77.c to use
    safe operations only (see lines 151-163), or make sure that the string of
    uncompressed data has at least 8 extra characters. I will also add additional
    comments that document this behavior. Another possibility is that I could add
    these functions to wimlib's official API and remove this weird calling
    requirement, but this would be a performance hit and I don't plan to do this.
    Anyway, chances are the -O3 issue was actually something else entirely.
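The second option (keeping at least 8 extra bytes after the uncompressed data) can be sketched from the Python side like this; the helper name `padded_input` is mine, not part of wimlib:

```python
import ctypes

def padded_input(data: bytes, slack: int = 8) -> ctypes.Array:
    # Allocate `slack` extra zeroed bytes past the logical end of the data,
    # so the LZ77 matcher may safely over-read up to 8 bytes.
    return ctypes.create_string_buffer(data, len(data) + slack)

buf = padded_input(b"uncompressed chunk data")
# Pass `buf` to the compressor, but still pass the *logical* length
# (len of the original data), not the allocated size.
```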

     
  • maxpat78

    maxpat78 - 2013-03-26

    I suspect a GCC bug; this is not the first time -O3 has produced weird behaviour with compression/crypto code. However, I've tried your wimlib-imagex Win32 tool with great interest, and it is certainly a good candidate to outperform my ImagePyX script, with its many limits, and ImageX itself. At the moment, I see two areas where ImagePyX clearly does a better job: 1) it is faster: compressing my Windows 7 INF folder (about 104 MiB) with (your) XPRESS takes only 9 seconds against the 30 required by your tool; 2) it works in most cases without Admin rights: this is crucial, since I need an archiver more than another admin tool like ImageX.

     
  • synchronicity

    synchronicity - 2013-03-26

    Hi, thanks for the comments! It's not clear to me why ImagePyX would be that much faster than wimlib-imagex, especially if they're both using the same compression code anyway. ImagePyX doesn't get away with reading each file only 1 time, right? (That would require compressing files that might be thrown away as duplicates.) Did you remember to test each one multiple times to neutralize any effects of the OS's buffer cache? Either way, I'll compare the two myself when I have a chance and try to identify any performance problems.

    A couple days ago I actually modified wimlib to, by default, only print warnings (not fail) if security descriptors cannot be captured due to insufficient privileges, so this should fix the problem where admin rights are required. I will post a new ZIP file as soon as I have time to test it a bit more.

    Thanks!

     
    Last edit: synchronicity 2013-03-26
  • maxpat78

    maxpat78 - 2013-03-27

    Hi, and thank you. I can't explain it either: it should be as fast as ImageX or ImagePyX, in fact! According to your documentation, it seems that it is under Linux, right? Yes, I repeat the same operation 3 times to get the input data cached and achieve the maximum possible speed. No, I can't read the input only once, but I've tried to improve performance and get the most benefit from caching this way: the SHA-1 is calculated file by file just before compressing it (if new) or discarding it, so a small file tends to stay cached when compression starts. And yes, the error seems related to the lack of Admin rights when opening the root dir to capture the SD... however, I noticed you ask Windows for one more privilege than my script does: in fact, ImagePyX is able to retrieve its required SD without admin rights! Are you sure GROUP_SECURITY_INFORMATION has to be retrieved?

     
  • maxpat78

    maxpat78 - 2013-03-27

    P.S. Another trick to improve speed in ImagePyX is processing 2 or more files in parallel (instead of parallelizing block compression): this seems to be the behaviour of ImageX itself - obviously, this way you waste space on a couple of temp files before concatenating them to the main WIM!

     
  • synchronicity

    synchronicity - 2013-03-27

    Hi,

    I tested the speed of wimlib-imagex and ImagePyX capturing about 250MB of files with both LZX and XPRESS compression, and in both cases wimlib-imagex seemed to be about 10% faster. I can think of two reasons why it might have been slower for you: (1) printing to the console in Windows can be very slow, and wimlib-imagex updates its progress on the console very frequently (I have changed this in v1.3.2), (2) the contents of the operating system's buffer cache could have been different between the two timings.

    As you suggested, I think that the advantage that `ImagePyX.py --capture' has is that, although like wimlib it reads every file up to 2 times, in wimlib the 2 reads are done in separate library calls, while in ImagePyX, the file is read for the second time immediately after the first. But, this advantage may be negated by the fact that you're having the compressor threads write the data twice: once to a temporary file, then to the WIM later. This is fine as long as the "SpooledTemporaryFile" stays in memory, but once it spills out to disk then there would be a lot of time being wasted.

    I am not convinced that compressing whole files is faster than compressing individual blocks, although this might be the case.

    I actually have in mind a way that should be faster than either of our implementations: some sort of map of stream sizes could be built (perhaps also hashing the first 4096 bytes of each stream). Then we would know that a file with a unique stream size or "preliminary" hash need not be fully SHA1-summed before writing it compressed to the output WIM file.
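The size-map idea can be sketched as follows (pure illustration, not wimlib code): group candidate streams by size, and only streams whose size collides with another's need an up-front SHA-1 before being written out.

```python
from collections import defaultdict

def streams_needing_prehash(stream_sizes):
    """stream_sizes: mapping of stream name -> size in bytes.
    Returns the names whose size is shared with at least one other stream;
    only these could possibly be duplicates, so only these must be fully
    SHA-1 summed before committing compressed data to the output WIM.
    Unique-size streams can be hashed on the fly while being written."""
    by_size = defaultdict(list)
    for name, size in stream_sizes.items():
        by_size[size].append(name)
    return {n for names in by_size.values() if len(names) > 1 for n in names}
```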

    Inodes with multiple links could also be directly linked into the dentry tree while reading virtually nothing, although that would only help significantly if you have a lot of hard-linked files.

    I was under the impression that the group information is an important part of the security descriptor, and therefore needs to be captured. Actually, I was having more trouble capturing the SACL (SACL_SECURITY_INFORMATION), since that requires a special privilege (SE_SECURITY_NAME), which by default only the Administrator can request. In the new v1.3.2 code I have just released, wimlib will initially attempt to capture the SACL, then fall back to everything but the SACL in the case of insufficient privileges, and finally fall back to capturing no security descriptor at all. The exception is if the --strict-acls flag is specified, which indicates that the security descriptor MUST be captured exactly as is.
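The fallback chain described here can be sketched like this; `get_security` is a stand-in for a GetFileSecurity() wrapper (only the flag values are real Win32 constants, from winnt.h), so this is an illustration of the logic rather than wimlib's actual code.

```python
import warnings

# Win32 SECURITY_INFORMATION flag values (winnt.h).
OWNER_SECURITY_INFORMATION = 0x1
GROUP_SECURITY_INFORMATION = 0x2
DACL_SECURITY_INFORMATION  = 0x4
SACL_SECURITY_INFORMATION  = 0x8

def capture_sd(path, get_security, strict_acls=False):
    """Try to capture the full security descriptor including the SACL;
    on insufficient privileges retry without the SACL; finally give up
    on the descriptor entirely. With strict_acls, the first error
    propagates instead (descriptor MUST be captured exactly)."""
    everything = (OWNER_SECURITY_INFORMATION | GROUP_SECURITY_INFORMATION |
                  DACL_SECURITY_INFORMATION | SACL_SECURITY_INFORMATION)
    try:
        return get_security(path, everything)
    except PermissionError:
        if strict_acls:
            raise
        warnings.warn(f"{path}: omitting SACL (insufficient privileges)")
    try:
        return get_security(path, everything & ~SACL_SECURITY_INFORMATION)
    except PermissionError:
        warnings.warn(f"{path}: omitting security descriptor entirely")
        return None
```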

     
  • maxpat78

    maxpat78 - 2013-03-27

    I noticed the console updating seems slow; I'll try minimizing it. Obviously, you're right about the SpooledTemporaryFile, but it seemed the fastest and most balanced solution in my scenarios. About taking a partial, preliminary SHA-1: I've thought about it myself but discarded the idea: the first 4K can be identical yet later chunks different... you can't be sure before reaching the end of the file. On SDs I'm not sure... my tests showed that the SDs captured by ImagePyX were identical to those captured by ImageX, but I couldn't test all cases.

     
  • maxpat78

    maxpat78 - 2013-03-27

    Ok, v1.3.2 takes 9 seconds, too!

     
  • synchronicity

    synchronicity - 2013-03-27

    Yes, a partial SHA1 may not be realistic; another issue is that this partial SHA would have to be taken on existing resources in the WIM, in the case of appending, which would waste time. But, we still have all the stream lengths essentially for free, so it certainly would be possible to delay the SHA1-summing of any streams with a unique length, which I think could make a big difference.

    I think the SACLs are important, as some security descriptors in the install.wim's and boot.wim's for Windows 7 and Windows 8 include SACLs. And I don't think that the SACLs are included in the security descriptor returned from GetFileSecurity() unless you specify SACL_SECURITY_INFORMATION; otherwise, the API wouldn't make any sense. But, reading the SACLs always requires admin rights, so the solution that makes the most sense to me is to skip reading the SACLs if we have no permission to, but warn the user. I know that some people might want it to Just Work(TM) and not have any warning messages printed, but I feel it is important to at least print a warning message if the program is for any reason unable to capture all information in the directory tree.

     
  • synchronicity

    synchronicity - 2013-03-27

    Also, a somewhat unrelated note: I simplified the way that lzx_compress() and xpress_compress() are called in wimlib, so they only take 3 arguments now, and the return value is the compressed length, or 0 if compression was not performed. I attached the patch to ImagePyX.

    I also made it so that the compression functions are exported if EXPORT_COMPRESSION_FUNCTIONS is defined at compilation time, although now I'm thinking it would be most convenient to just unconditionally export them. If I did this, I'd rename the functions to wimlib_lzx_compress() and wimlib_xpress_compress() to be consistent with the other library functions.
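The new calling convention (a 0 return meaning the data could not be compressed below its original size) implies callers must be ready to store chunks uncompressed. A sketch of that logic, with a stand-in compressor rather than the real wimlib export:

```python
def store_chunk(chunk, compress):
    """compress(chunk) mimics the simplified wimlib convention: it returns
    the compressed bytes, or an empty result when compression would not
    shrink the chunk (corresponding to a 0 return value from
    lzx_compress()/xpress_compress()). Returns (data_to_store, was_compressed)."""
    out = compress(chunk)
    if out and len(out) < len(chunk):
        return out, True
    return chunk, False   # store raw; the WIM chunk table flags this case

# Stand-in compressor, for illustration only: deduplicates byte values.
def fake_compress(data):
    squeezed = bytes(sorted(set(data)))
    return squeezed if len(squeezed) < len(data) else b""
```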

     
  • synchronicity

    synchronicity - 2013-03-27

    Oh sorry, I forgot about the decompression functions. They would become wimlib_lzx_decompress() and wimlib_xpress_decompress() if they were to be exported by default.

    I also somehow was able to use ImagePyX to apply an LZX-compressed WIM without having exported either of the decompression functions from wimlib.dll, but no files were extracted, so I think that's something you need to fix (fail properly with an error instead).

     
  • maxpat78

    maxpat78 - 2013-03-28

    Thanks for the hints and the patch (although I won't recompile without substantial improvements in the DLL). The behaviour with no exports is very strange: Python should raise an exception if it can't resolve the imports!

    Moreover, I tried one of my favourite tests: WIMming a fresh XP installation (1.29 GB, 10,325 files, 691 folders):

                  wimlib-imagex     ImagePyX
    capture       6:46 [2:52]       5:25 [3:06]
                  7:47 [3:09]   w/ 3 threads
                  6:42 [3:04]   w/ 3 threads, 2nd execution
                  7:16 [2:43]   w/ 1 thread, 3rd execution
                  6:17 [2:51]   w/ 2 threads, 4th execution
    apply         2:43 [0:51]       8:27 [6:39]

    (process time in square brackets)

    Timings were taken with Microsoft TIMEIT tool.

    ImagePyX is slow at applying the image, since it actually works with a single thread (and needs some other kinds of optimization).
    The capture process is faster, however, because I process 3 files at a time with 3 threads, instead of 2.
    Would wimlib-imagex be faster if it combined your block parallelism with my file parallelism, applying the latter only to small inputs (i.e., smaller than 10 MiB or so)? Why does ImagePyX seem faster?
    [Note: my unreleased chunk-parallel version of the Python codecs probably has something wrong in its MT synchronization; that's why it gave unexpectedly bad performance...]

     
  • synchronicity

    synchronicity - 2013-03-28

    Hi,

    Well, another weird thing about the DLLs I noticed is that ImagePyX
    automatically threw an exception if the MSCompress DLL was not present,
    apparently because the MSCompress classes were unconditionally imported
    and contained static initializers that unconditionally loaded the DLL
    functions. So I had commented out the MSCompress classes, as they weren't
    even used anyway.

    wimlib only uses 1 thread when applying images. Multiple threads are unlikely
    to help significantly because decompression is much faster than compression, and
    the extraction will be IO-bound.

    I took a look at your extraction code, and the only thing I was able to think of
    that could significantly improve the performance is that you do not extract
    streams sequentially (i.e. sorted by position in the WIM). wimlib-imagex does
    this by default, so the WIM file will in general be read sequentially, which
    could improve performance, especially on Windows, which may not handle caching
    as well as Linux.
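The sequential-read strategy mentioned above amounts to sorting the extraction work list by each stream's resource offset inside the WIM (the `offset` field name here is illustrative):

```python
def extraction_order(streams):
    """Order stream records so the WIM file is read front-to-back.
    Each record is assumed to expose the byte offset of its compressed
    resource within the WIM; extracting in this order keeps reads
    sequential instead of seeking back and forth across the archive."""
    return sorted(streams, key=lambda s: s["offset"])
```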

    ImagePyX probably was faster when capturing an image due to the point brought up
    earlier where it writes files immediately after checksumming them, meaning they
    are more likely to still be cached by the operating system. I do not think that
    file vs. chunk parallelism can account for the difference.

    A couple other things I noticed while looking at your extraction code:

    • WIMArchive.py: touch(): hFile is leaked if SetFileTime() fails. Similar
      problem in IsHardlinkedFile(). And GetReparsePointData().

    • SSWIMD.py: extract_test(): You always calculate the total bytes to extract
      as the total size of the streams in the lookup table, but not all streams are
      necessarily extracted (especially when extracting only 1 image of many).
      Also, some streams may actually need to be extracted multiple times.

    • SSWIMD.py: extract_test(): Some calls to kernel32.dll functions don't
      appear to be conditional on sys.platform.

    • SSWIMD.py: extract_test(): You appear to always hard link identical files
      with a nonzero hard link group ID, but such files need not all be part of
      the same hard link group.

    • SSWIMD.py: extract_test(): You're using the 32-bit 'dwReparseReserved' first
      as the hard link group ID, but I think that the hard link group ID includes the
      next 32 bits as well.

    • SSWIMD.py: extract_test(): You're sometimes using CreateSymbolicLinkW()
      instead of setting the reparse data directly, apparently to make it so that
      extracted symlinks are valid when the image is not extracted to the root directory.
      It's not clear to me this is the best behavior,
      as it could be that someone wants to apply an image, then capture it and have
      it be exactly the same as the original. Any idea what the behavior of
      Microsoft's program is, and also whether the WIM_HDR_FLAG_RP_FIX flag set in
      the WIM header is related?

     
  • maxpat78

    maxpat78 - 2013-03-29

    Hi, and thank you very much for your valuable suggestions.

    In fact, the extraction and Win32 API related code is the most recent and least tested: I had to redesign the extraction/test part, and I was myself considering the impact of resource order (choosing an OrderedDictionary instead of a plain dictionary was the first step in this direction), but I was also coding a generic CodecMT class to use in both the compressors and decompressors. Perhaps that's not worthwhile, if you say MT gives no substantial improvement while extracting.

    At this point, I wonder whether the big slowness in applying an image comes from the pattern of API calls (perhaps times and perms should be applied in one pass over the directory tree, to reduce seek time to the NTFS metadata).

    I tried to use only the APIs which work without Admin rights (in fact, applying the reparse data directly does require such rights).

    I think FLAG_HEADER_RP_FIX means that when an absolute reparse target lies inside the image root, and FLAG_HEADER_RP_FIX is set in the WIM header (it is by default), the target string is fixed up by making it a direct descendant of the root, which is conventionally represented by the source drive letter.
    So, if we are capturing an image from the root folder "X:\Real\Path", the target
    \??\X:\Real\Path\Subdir1
    becomes
    \??\X:\Subdir1
    Otherwise it must be left unchanged.
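That rule can be sketched as a small path rewrite (the function name is mine, and a real implementation would operate on the reparse data buffer rather than plain strings):

```python
def rp_fix(target: str, capture_root: str) -> str:
    """Sketch of the FLAG_HEADER_RP_FIX rewrite described above: if an
    absolute reparse target (in \\??\\ NT form) lies inside the capture
    root, strip the intermediate path components so the target becomes a
    direct child of the drive root; otherwise leave it unchanged."""
    prefix = "\\??\\" + capture_root.rstrip("\\")
    if target.startswith(prefix + "\\"):
        drive = capture_root[:2]          # e.g. "X:"
        rest = target[len(prefix):]       # e.g. "\\Subdir1"
        return "\\??\\" + drive + rest
    return target
```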

    Here are my observations and interpretations of the DIRENTRY members at 0x54:0x60, which I think are better represented by the following anonymous union:

    union {
            struct {
                    DWORD dwUnused;
                    DWORD dwReparseTag;
                    USHORT usUnknown;
                    USHORT usIsRelative;
            } SymLink;
            struct {
                    DWORD dwUnused;
                    DWORD dwFileIndexLow;
                    DWORD dwFileIndexHigh;
            } HardLink;
    };

    liHardLink is probably a DWORD. The implementation suggests that dwReparseTag, dwReparseReserved and liHardLink together occupy 12 bytes, not 16. Moreover, the implementation suggests: 1) dwReparseTag is always zero; 2) if the entry represents a hard-linked file, the two DWORDs dwReparseReserved and dwHardLink contain the 64-bit file index retrieved with GetFileInformationByHandle; 3) if it is a symbolic link (or junction), dwReparseReserved contains the reparse point type code (0xA0000003, IO_REPARSE_TAG_MOUNT_POINT, or 0xA000000C, IO_REPARSE_TAG_SYMLINK), while dwHardLink is 0x10000 if the link is relative.
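This interpretation can be expressed as a struct.unpack over those 12 bytes; the decoding below follows my reading of the discussion (field meanings are observations, not a documented format):

```python
import struct

def parse_link_fields(buf: bytes, is_reparse_point: bool):
    """Decode the 12 bytes at offset 0x54 of a WIM DIRENTRY per the
    interpretation above. For reparse points, the middle DWORD is the
    reparse tag and the high DWORD carries flags (0x10000 appearing to
    mean "relative link"); otherwise the upper 8 bytes are the 64-bit
    hard link group ID / file index."""
    assert len(buf) == 12
    if is_reparse_point:
        _unused, tag, flags = struct.unpack("<3I", buf)
        return {"reparse_tag": tag, "relative": bool(flags & 0x10000)}
    _unused, ino = struct.unpack("<IQ", buf)
    return {"hard_link_group": ino}
```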

     
  • synchronicity

    synchronicity - 2013-04-02

    Hi,

    Thanks for the tip about the WIM_HDR_FLAG_RP_FIX. I will think about how to make wimlib handle reparse points better, including using CreateSymbolicLink() when appropriate if that really requires fewer privileges than setting the reparse data directly.

    It may help a little bit to apply times and perms in only one pass. In wimlib, I am applying timestamps in a separate depth-first traversal to avoid any issues with the extraction itself modifying timestamps of directories after they have been set. I'm sure this could be optimized slightly to remove this extra pass, although I'm not sure it would help a lot. I generally would expect extracting the actual file data to take most of the time.
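The separate timestamp pass can be sketched as a bottom-up walk, so a directory's own times are set only after extraction of its children can no longer modify them (the `times` mapping and its layout are illustrative):

```python
import os

def apply_directory_times(root, times):
    """Separate bottom-up pass that restores directory timestamps after
    all files have been extracted, so creating children can no longer
    bump a parent's mtime. `times` maps relative directory path ->
    (atime, mtime) in seconds since the epoch."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        rel = os.path.relpath(dirpath, root)
        if rel in times:
            os.utime(dirpath, times[rel])
```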

    It looks like your interpretation of those weird fields in the WIM dentry is essentially the same as mine. I have:

    if (inode->i_attributes & FILE_ATTRIBUTE_REPARSE_POINT) {
            p += 4;
            p = get_u32(p, &inode->i_reparse_tag);
            p += 4;
    } else {
            p = get_u32(p, &inode->i_reparse_tag);
            p = get_u64(p, &inode->i_ino);
    }
    

    So for reparse points, I just use the middle DWORD as the reparse point tag and ignore the other fields (one of which you observed may indicate whether the symlink is relative or not--- although I thought you can also determine that from the reparse data itself). For non-reparse points, I use the first 32 bits as the reparse tag (which is actually irrelevant and I really should just be ignoring it--- I guess it's always set to 0), and the next 64 bits as the hard link group ID, file index, or inode number (whatever you want to call it). I may have just been confused by your Python code because it looks like you access some fields with names that are not the same as the actual data stored in the field.

     
  • maxpat78

    maxpat78 - 2013-04-03

    Yes I did, but the confusion originates with MS, who give names to the fields but then assign contents that contradict them.
    With NTFS as the source FS, those 64 bits are the unique MFT record number; they call it a "group ID" since other Windows filesystems probably (could) have something similar (UDF? exFAT?).

    ImagePyX seems to waste about 10-15% of total execution time applying times, perms and SDs, but this is not the cause of its big slowness.

    However, I'm substantially rewriting the code. My MT "plurichunk" codec actually works, but, even though it seems correctly implemented, it tends to be slower at capturing an image than both the old "plurifile" version and wimlib-imagex: on almost every repetition of the same test, some strange disk activity slows the operation down, and I don't know why yet (I reach the old, faster execution time as an exception rather than the rule). The decompression part is becoming faster, but is still far from what I expected.

     
  • maxpat78

    maxpat78 - 2013-04-03

    Hi again,

    here is an analysis (exhaustive, I hope :D) of the different behaviours with hard/symbolic links and junctions, in both ImageX and wimlib-imagex (Win32 edition).

    "source_dir.txt" represents the original directory tree captured (displayed with "DIR /S T")

    "sample_links_imagex.wim" is generated by ImageX capturing the "T" tree
    "sample_links_wimlib.wim" is generated by your wimlib-imagex capturing the same tree (with Admin rights)

    "imagex_imagex_restored_dir.txt" represents how ImageX restores "sample_links_imagex.wim" (the result we want)
    "imagex_wimlib_restored_dir.txt" represents how ImageX restores "sample_links_wimlib.wim"

    "wimlib_wimlib_restored_dir.txt" represents how wimlib-imagex restores "sample_links_wimlib.wim"
    "wimlib_imagex_restored_dir.txt" represents how wimlib-imagex restores "sample_links_imagex.wim"

    I haven't yet tested ImageX's behaviour with FLAG_HEADER_RP_FIX removed, but at this point you should have enough material to fix your tool.

    Note: letters ja/sa/jr/sr in file names mean junction/symboliclink absolute/relative, according to the way they were created with Windows MKLINK tool.

     
  • synchronicity

    synchronicity - 2013-04-03

    Hi,

    Thanks for the tests regarding reparse points. Note that I would caution against always treating Microsoft's software as the desired behavior, since there are bugs in it (for example, I found that it does not always capture alternate data streams correctly). However, I would tentatively agree that fixing up of absolute symbolic links should be done by default, unless overridden using an option. I will look into this when I have time. I also would like to write some automated tests for the Windows build. (The UNIX build already has some automated tests.)

     
  • maxpat78

    maxpat78 - 2013-04-04

    Hehehe... but in this case your tool can't handle reparse points well: neither ImageX nor wimlib-imagex itself can restore them properly (at least on Windows)!

    In fact, one undesired behaviour in MS software (= Win7, Win8) is that I now have to delete a WIM before opening it for writing (and/or truncating it to zero), or the disk access problem mentioned earlier will slow down my write operations - now the MT codec is as fast as expected.

    Please send me some tests to carry out in Windows, if you can.

     
  • maxpat78

    maxpat78 - 2013-04-04

    (I mean, no tool can actually restore properly the reparse data generated by wimlib-imagex on Windows)

     
  • synchronicity

    synchronicity - 2013-04-04

    I think that this kind of thing is very much dependent on your definition of restoring them "properly". wimlib, as currently written, will capture and restore reparse points exactly as they are, using the documented APIs for doing so. There are many different kinds of reparse points and Microsoft can add new ones at any time, so 3rd party programs cannot be expected to understand all of them. However, we've established that symbolic links and junction points are two specific types of reparse points (I still cannot understand why there are 2 different types of symbolic links) that should be treated specially, and I will take this into account when I have time to work on the library a bit more. I guess Windows is how it is...

    I am unsure as to the cause of the disk access problem you were experiencing.

    Again, thanks for all the help!

     
    Last edit: synchronicity 2013-04-04
  • maxpat78

    maxpat78 - 2013-04-04

    Ah, yes, I've just caught another problem in the new (1.3.2) source code: I guess the decompression functions should also be preceded by this:

    #ifdef EXPORT_COMPRESSION_FUNCTIONS
    WIMLIBAPI
    #endif

    Also, I noticed some INF files aren't compressed properly with LZX (the compressor returns zero bytes).

    How does my ImagePyX behave with reparse stuff under Linux? Can it restore reparse points "properly"? (I mean, making them point to something useful and to what the user wants them to point to.) I never tested it in that environment... Also, I was wondering whether wimlib can be compiled as a native Android executable with their NDK.

     
    Last edit: maxpat78 2013-04-04
  • synchronicity

    synchronicity - 2013-04-04

    Hi,

    Yes, I'm aware that I forgot to export the decompression functions with EXPORT_COMPRESSION_FUNCTIONS. In the next release I will just export the compression and decompression functions by default, and they will be prefixed with wimlib_, and documented in the HTML documentation. Also, as I mentioned in an earlier post, the calling convention of the compression functions was simplified. A 0 return value now indicates that the input could not be compressed to less than the original size. So you will need to change your code (and I had provided a patch earlier), although if you don't it will unfortunately not be obvious due to the fact that Python cannot use the C headers to check the function signature when calling into the DLL.

    It's not clear to me how you expect ImagePyX to work on non-Windows systems, considering that in multiple places you are unconditionally calling kernel32 functions, and even in places with conditional calls there are typically no calls to equivalent UNIX functions such as link() and symlink(). Am I really looking at the latest version of your code? It looks like the last commit was on February 18.

    I don't know if wimlib can be compiled as a native executable with the Android NDK. I'm sure it would be possible to get it working somehow, although it might be necessary to replace some functions that are not available in Bionic, Android's simplified C library. I'm not sure why anyone would want to do this, though. If you're just looking for a Linux system to test things on, install it on an older computer, use VirtualBox with a virtual machine, or install it alongside Windows.

     
  • maxpat78

    maxpat78 - 2013-04-05

    Ok, I hadn't taken into account that zero now means "no useful chunk compression" (in fact, the whole file seemed to compress well).

    No, I'm working on newer code; I'll probably release next week if the SSWIMMD module is finished over the weekend. (ctypes in Python seems quite buggy: for example, a LoadLibrary call fails, while accessing the DLL directly works fine...)

    Sooner or later I'll have to reinstall Debian 6 or Ubuntu in some VM and test there (I had them on an external USB HDD before).

     