
Can I apply an image read from stdin?

reni
2013-07-11
2013-09-15
  • reni

    reni - 2013-07-11

    I have an ntfsclone imaging solution, but I'm looking into ways to switch to the WIM format so I can edit the images more easily.

    One thing I do with ntfsclone now is wget the image from a web server and pipe it to ntfsclone, which writes it to disk.

    Is this possible with wimlib?

     
  • synchronicity

    synchronicity - 2013-07-11

    Not currently. The WIM format is designed for random access, not sequential access. Despite this, it is theoretically possible to create a specially arranged WIM file that can be read sequentially. But when I tried doing so at one point, Microsoft's software was confused and refused to read the file.

     
  • Chris Bradshaw

    Chris Bradshaw - 2013-08-06

    Hi....

    I am in the same position....using ntfsclone+wget like reni above for a pc imaging solution. You mention that you tried specially arranging a WIM file so it could be read sequentially but M$ software couldn't read it.

    Would it be possible to have a command line option in wimlib-imagex to allow you to do this anyway (with an appropriate warning that M$ software would not work)?

    I don't need the M$ software to work, but being able to use wimlib-imagex as a drop in replacement for ntfsclone would be nice as I currently have ntfsclone etc. scripted to use stdin/stdout with wget for downloading/uploading PC images.

    Just thought I'd ask....

    Chris.

     
  • synchronicity

    synchronicity - 2013-08-06

    Hi,

    I'm considering this, and I've re-implemented the code to write a "pipe-friendly" WIM file. (Again, confirmed that Microsoft's software treats it as invalid, but wimlib can apply images from the file [not a pipe] with no problem.) The harder part, however, is making the extraction code work properly when the WIM file descriptor is a pipe, which I'm still working on. This may require refactoring some of the extraction code to take into account this new use case in a logical way.

     
  • synchronicity

    synchronicity - 2013-08-09

    Hi,

    I finally have "pipable" WIMs working to some extent. Basically you can capture a WIM with the --pipable option to wimlib-imagex capture, then apply images from it when read on standard input with wimlib-imagex apply. So this will probably go out when I make a v1.4.3 release. One caveat, though: Chris mentioned something about uploading/downloading images, so I assume he also wanted the ability to use wimlib-imagex capture to capture a pipable WIM and write it directly to standard output. Unfortunately this isn't yet possible since the WIM format wasn't designed for this. For example, to make it so the WIM can be read from a pipe, the stream lookup table needs to go before the streams, but to make it so the WIM can be written to a pipe, the stream lookup table needs to go after the streams. (Also, the last thing you need to do when writing a WIM is to overwrite the header, at the very beginning of the file.) Of course if we're already going incompatible with M$ then I can change it however I want, although I don't want to make it too ugly. I guess it depends on whether the main expected use case is to image each individual computer as opposed to applying a single image to many computers.
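
The capture-then-apply workflow described in this post can be sketched roughly as below. This is a minimal sketch that assumes a wimlib release which already includes the --pipable option; the function names, URL, WIM path and device are hypothetical placeholders, not anything taken from the thread.

```shell
#!/bin/sh
# Sketch only: every path, URL and device name passed in is a placeholder.

# Capture an NTFS volume into a pipable WIM stored in a regular file.
capture_pipable() {
    source_volume=$1   # e.g. /dev/sda1
    wim_file=$2        # e.g. /srv/images/lab.wim
    wimlib-imagex capture "$source_volume" "$wim_file" --pipable
}

# Download the pipable WIM and apply image 1 while the data is arriving.
# The "-" argument tells wimlib-imagex apply to read the WIM from stdin.
apply_from_url() {
    image_url=$1       # e.g. http://imageserver.example/lab.wim
    target_volume=$2   # e.g. /dev/sda1
    wget -q -O - "$image_url" | wimlib-imagex apply - 1 "$target_volume"
}
```

A call such as `apply_from_url http://imageserver.example/lab.wim /dev/sda1` would then stand in for the existing wget-to-ntfsclone pipeline.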

     
    • Chris Bradshaw

      Chris Bradshaw - 2013-08-12

      Hi...

      Thanx for this. The way we upload is using a named pipe....so for uploading, we would output the captured disk image to a named pipe, which sftp reads and uploads to our imaging server. When we download, we currently take stdout from wget and pipe it to ntfsclone's stdin, but we could use a named pipe for this too.

      Would your changes to wimlib-imagex work using a named pipe in this way?

      Also, if I split the captured wim, would I be able to download the split.wims.* via wget and pipe (either using | or a named pipe) to wimlib-imagex apply? Or if a wildcard would not work, if I were to have wget download the split.wim files in the correct order and pipe them (again either | or named pipe) to wimlib-imagex apply would that work?

      Thanx in advance for your help.

      Chris.

       
  • synchronicity

    synchronicity - 2013-08-12

    Hi,

    I've been working on implementing support for a different "pipable" WIM format that allows capturing an image directly to a non-seekable file, such as an unnamed or named pipe, as well as applying the same image when the same data is received via another non-seekable file. So to answer your first question, yes, the upload/download of images will work in the way you stated.
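
The named-pipe upload described above might look roughly like this. It is a sketch under two assumptions: that the installed wimlib-imagex accepts "-" as the WIM file name to write a pipable WIM to standard output, and that the consumer command (sftp in Chris's setup, any command here) reads the image from its standard input. The function name and FIFO path are hypothetical.

```shell
#!/bin/sh
# Sketch only: capture a volume straight into a named pipe while another
# process drains it. All arguments are placeholders chosen by the caller.
capture_via_fifo() {
    source_volume=$1   # e.g. /dev/sda1
    fifo_path=$2       # e.g. /tmp/upload.fifo (must not exist yet)
    shift 2            # remaining arguments: the consumer command line
    mkfifo "$fifo_path" || return 1

    # Start the consumer in the background; opening the FIFO for reading
    # blocks until the capture below opens it for writing.
    "$@" < "$fifo_path" &
    consumer_pid=$!

    # Assumption: "-" as WIMFILE writes a pipable WIM to standard output.
    wimlib-imagex capture "$source_volume" - > "$fifo_path"
    capture_status=$?

    wait "$consumer_pid"
    consumer_status=$?
    rm -f "$fifo_path"
    [ "$capture_status" -eq 0 ] && [ "$consumer_status" -eq 0 ]
}
```

Checking both exit statuses matters here, because a capture that succeeds while the upload fails would otherwise look like a good image.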

    This "pipable" WIM format does a bit more than just re-arrange components of the file, so it still will be incompatible with MS's software; this is necessary because it's impossible to support this use case with a compatible format.

    I haven't yet implemented support for creating, joining, or applying split, pipable WIMs. It's not entirely clear to me what the advantage of this would be.

     
  • synchronicity

    synchronicity - 2013-08-12

    Well, regardless, I decided to refactor the code to create split WIMs anyway, so now it can create pipable split WIMs without much extra trouble. So you'll be able to use pipable split WIMs, if you really want to. One quirk is that the first part must be sent over the pipe first, but the remaining parts can be sent over the pipe in any order.
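
The ordering rule for split pipable WIMs can be sketched as a small wrapper: the first part is always sent first, and the remaining parts follow in whatever order the caller supplies. The function name and file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: stream the parts of a split pipable WIM into wimlib-imagex apply.
apply_split_parts() {
    target_volume=$1   # e.g. /dev/sda1
    first_part=$2      # the part that must reach the pipe first
    shift 2            # remaining arguments: the other parts, in any order
    cat "$first_part" "$@" | wimlib-imagex apply - 1 "$target_volume"
}
```

For example, `apply_split_parts /dev/sda1 test.swm test3.swm test2.swm` sends test.swm first and the other two parts in the order given.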

     
  • Chris Bradshaw

    Chris Bradshaw - 2013-08-15

    Hi....

    Wow....thanx so much for the quick turnaround on this. I have been trying the new features using the latest code from git and they work well. One thing regarding split pipable WIMs though. Based on what you say in your last post above, and also on some tests here, if I understand correctly, no matter which split WIM file I apply, I always have to apply the first split file first and then my file of choice.

    So if say I had test.swm, test2.swm and test3.swm and I first applied only test.swm + test2.swm with imagex apply, and then wanted to apply test3.swm by running a separate imagex apply command, I would have to apply test.swm + test3.swm. Is this correct?

    If so, would it be possible to have a '--no-overwrite' or '--ignore-existing-files' option for imagex apply? When I run the second imagex apply with test.swm + test3.swm, it complains that files applied from test.swm in the first imagex apply already exist, and it bombs out.

    The reason I would like to be able to do this is so that when I reload a room full of PCs (which we do regularly) sometimes one or two will fail. With ntfsclone, I have no choice but to let the scripts we use restart the reload from the beginning, even if it failed 59 GB into a 60 GB download....with wimlib-imagex I could have the code only restart on the split WIM file which failed, so if it failed on number 19 out of 20, I could just restart imagex apply using test.swm + test19.swm and continue from there....much more efficient.

    Hope this makes sense.....and thanx again for all your help.

    Chris.

     
  • synchronicity

    synchronicity - 2013-08-15

    Hi,

    The WIM format is single-instancing, so split WIM parts do not contain "files"
    per se. Instead they contain "streams", each of which can be represented in
    multiple unnamed data streams (default file contents), named data streams, or
    reparse data. So "applying" a split WIM part really means something along the
    lines of extracting a subset of the streams contained in the original WIM.
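
    A loose way to see the effect of single-instancing is to count how many files a directory holds versus how many distinct contents those files have. The sketch below does that by SHA-1 digest. It only approximates the idea: it looks at default file contents, ignores named data streams and reparse data, and assumes GNU sha1sum is available. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: compare the number of files with the number of distinct
# file contents (by SHA-1) under a directory. File names containing
# newlines would skew the file count.
count_files_and_streams() {
    dir=$1
    file_count=$(find "$dir" -type f | wc -l)
    stream_count=$(find "$dir" -type f -exec sha1sum {} + \
        | cut -d ' ' -f 1 | sort -u | wc -l)
    # Arithmetic expansion strips any padding that wc adds.
    echo "files=$((file_count)) distinct_streams=$((stream_count))"
}
```

    When the two numbers differ, several files share one set of contents, which is why a split part cannot be thought of as holding a particular group of files.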

    I believe I could implement something similar to what you're requesting at least
    in terms of restarting an apply of 20 split WIM parts on the 19th, provided that
    the 1st, 19th, and 20th parts are sent over the pipe on the restarted command.
    However, one needs to consider the case where the parts used for the resumed
    extraction do not include all the remaining streams that actually need to be
    extracted, which could happen accidentally and result in an incomplete
    extraction. Since not even all stream information (such as sizes) will be
    available in such cases, I might just have to go with checksumming the extracted
    files (or perhaps a random sample thereof, although that will not be 100%
    reliable) as a sanity check.

    Note: The first split WIM part is always needed because it contains the image
    metadata, and I don't want to duplicate the image metadata in each part. One
    might hope that the metadata could be split up among the parts, but again due to
    the single-instancing and potentially multiple streams per file this is
    non-trivial, and it would require even more changes from the original WIM format
    which I think would be getting too far out of the scope of the wimlib project.

     
  • Chris Bradshaw

    Chris Bradshaw - 2013-08-25

    Hi....

    Sorry for delay in replying....I did kind of suspect that the WIM format was probably not ideally suited to being used in this way, but I was hoping it wasn't too dissimilar to a TAR file.

    If I'm honest, I'd have to admit I'm not sure many people (apart from me ;-) would benefit from a 'restart on the last split wimfile' feature, so I can understand if this is a low or even non-existent priority....

    One option which perhaps might be of more use to a wider audience if it existed and it was possible to implement without too much coding gymnastics would be an option for wimapply which would sync the contents of the NTFS filesystem with the specified WIM file....i.e:

    • Only overwrite existing files/streams on NTFS filesystem if they are different from files/streams of same name in WIM file.
    • Apply files/streams from WIM file which don't already exist on NTFS filesystem.
    • Delete files/streams from NTFS filesystem which are not present in WIM file.

    ....it would be kind of like an 'rsync the hard drive with the WIM file' option.

    Certainly for a PC imaging system like mine and reni's this could greatly speed up applying an image, and would also speed up a retried wimapply where the first attempt fails for some reason. And it could be even faster again if the new --update-of and/or --delta-from options could somehow be used during wimapply and not just when capturing an image.

    Anyway, just throwing some ideas out there....not sure if any of this is do-able....

    Thanx for your time and help.

    Chris.

     
  • synchronicity

    synchronicity - 2013-08-25

    Hi,

    I did end up adding a --resume option to wimapply, but I have left it undocumented (for now) since its behavior is currently not very good. You have to be careful to specify all needed split WIM parts, or else the apply may be incomplete (with no warning).

    Speeding up the application of an image by not extracting files already present is an interesting idea. The hard part is ensuring that an already-present file is indeed the same as the one to be extracted. Knowing for certain requires reading all the file's data and metadata, which largely defeats the desired optimization. The alternative, which would parallel the technique used in the --update-of option to wimcapture and wimappend, would be to only compare timestamps and other metadata such as stream sizes. In my opinion this is slightly less safe in the extraction scenario, since in theory, a malicious actor could prevent arbitrary files from being extracted by manipulating timestamps and file sizes, as opposed to the capture scenario, where the same would merely result in older versions of files being backed up. rsync itself likely suffers from the same problem, however.
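
The metadata-only comparison discussed here, and its weakness, can be illustrated with a small sketch. It is not how wimlib implements --update-of; it simply treats two files as identical when size and modification time match, without reading their contents. It assumes GNU stat (the -c option), and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: succeed when two regular files have the same size and the
# same modification time (whole seconds). Contents are never compared.
same_by_metadata() {
    existing=$1
    candidate=$2
    [ -f "$existing" ] && [ -f "$candidate" ] || return 1
    # Assumption: GNU stat, where %s is the size in bytes and %Y is the
    # mtime in seconds since the epoch.
    [ "$(stat -c '%s %Y' "$existing")" = "$(stat -c '%s %Y' "$candidate")" ]
}
```

Two files of equal length whose timestamps have been aligned (for instance with `touch -r`) pass this check even if their contents differ, which is the false positive the post warns about.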

    What would be nice would be to mount the WIM image and the NTFS volume, then literally use rsync to do the extraction. Unfortunately for various reasons that won't work entirely correctly.

     
  • Chris Bradshaw

    Chris Bradshaw - 2013-08-25

    Hi....

    Interesting about the --resume option....I'd be happy to test it out (even if it's not fully working)....if I run wimapply from 1.5.0BETA with no args will it show me the command format?

    Originally, our imaging system actually used tar for the image format, but back then the Windows filesystem was able to fit within the maximum allowable size of a FAT32 partition. Using the 'star' command I could capture an image into a split tar archive (each file ~500 MB) and during apply if there was a failure I could restart with the last split file to fail, and not overwrite anything already on the drive (in case the one that failed applied some but not all of its contents).

    Once we moved to NTFS and ntfsclone, I lost this ability and I have been searching ever since for an image format which would allow me to do the same things as I used to do with FAT32/tar....hence my interest in these options in wimlib.

    As regards comparing timestamps and file sizes, in my case it might actually be sufficient, but I do see the limitations alright. However, when we used FAT32/tar with the 'star' command, we just used its --keep-old-files option, which 'keeps existing files rather than restoring them' (from the star manpage). That actually sounds much less sophisticated than comparing timestamps and file sizes.

    Another option and possibly a faster, more efficient one would be to just have a --no-overwrite (or --keep-existing) option for wimapply (like with star) with no comparing of timestamps or file sizes. In this case, I would begin (as we currently do and also did with FAT32/tar/star) by wiping the hard drive and applying an image from scratch. If it fails half way thru, I could restart the apply either from the beginning or from the last applied file (if --resume is viable) but specify not to overwrite any existing files. This would mean the apply should skip forward to where it failed and continue from there.

    Thanx again for your help and interest in all of this.

    Chris.

     
  • synchronicity

    synchronicity - 2013-08-25

    Hi,

    Since --resume is (currently) undocumented it does not appear in the help output. However, the current behavior is that by passing it to wimapply when applying an image from a pipe, the initial step where all files and directories are created is skipped (the assumption being this phase already completed successfully), then any streams available are applied as-is (with no check for missing streams, the assumption being that any missing streams were in previous parts that were already processed successfully). But as mentioned I feel that without a good way to detect missing streams this option isn't quite up to spec.

    A simplified --no-overwrite or --keep-existing option wouldn't quite work. In the case of a failed or aborted extraction, it's possible for a file to be partially but not fully extracted. Additional work must be done to fully extract such files, even though they already exist. Also, for several reasons related to sequential extraction of streams, wimlib will actually create all files very early in the extraction process but only fill in the data later. This is a big difference from tar, as tar has no central directory that contains information about all files in the archive, nor does tar store duplicate streams only one time, nor does tar support multiple data streams per file.

    By the way: Unfortunately NTFS-3g doesn't do any journaling, so if your computer crashes, powers off, or catches on fire while applying a WIM image to an NTFS volume, in theory the resulting filesystem could be in pretty much any state and it might be a good idea to start from a clean filesystem anyway.

     
  • Chris Bradshaw

    Chris Bradshaw - 2013-08-25

    Hi....

    Ah well....shame it's so complicated (and un-tar like), but it was worth checking out. I'll certainly give the --resume option a shot and see how it goes. Even that without a sync or a keep-existing option could help a lot to speed things up in the event of a midway failure.

    Thanx again for the help.

    Chris.

     
  • Chris Bradshaw

    Chris Bradshaw - 2013-09-09

    Hi....

    Just to let you know I have tried using the --resume option in v1.5.0 but it doesn't seem to do anything....when I use it, I get a message to say that $Recycle_Bin already exists, and the wimapply just bombs out.

    Maybe you have disabled the option?

    Or perhaps I am using it incorrectly?

    My WIM file is split and pipable and the command I used to test a resume was:

    cat 1st_split_file.swm failed_split_file.swm failed_split_file+1.swm .... etc. etc. | wimapply - 1 /dev/sda1 --resume

    HTH.

    Chris.

     
  • synchronicity

    synchronicity - 2013-09-09

    Hi,

    I looked into this, and there are two oversights in how I implemented the "resume" behavior that affect the NTFS-3g extraction backend. However, even if I were to fix them, this doesn't change the other unresolved issues I brought up above, including there being no verification that all the streams were in fact extracted and that the filesystem state is unknown if it was not properly unmounted. So you won't be able to use this undocumented option for now, sorry. Ultimately a proper implementation will need some sort of checkpointing built into it, which will make it significantly more complicated.

     
  • Chris Bradshaw

    Chris Bradshaw - 2013-09-13

    Hi....

    Thanx for the reply. Just another idea regarding the possibility of a --resume option....

    Would the following be possible, again without too much coding gymnastics or a big drop in performance?

    Could wimapply write a log in which each file/stream it had successfully applied to the harddrive is logged, and which a subsequent wimapply with the --resume option could use as a list of files/streams that it should skip over and not try to apply, and in that way it would only start writing to the hard drive on the next file which would have been written if the previous wimapply had not failed?

    Might not be feasible, but just thought I'd throw it out there anyway....

    HTH.

    Chris.

     
  • synchronicity

    synchronicity - 2013-09-13

    That's basically what I was considering, regarding checkpointing. Unfortunately it's harder than you might think due to the fact that modern OSes don't guarantee the order in which writes reach the hard disk. Working around this might require syncing the log file and NTFS device at regular intervals, which would decrease performance.
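
The log-and-sync idea can be sketched in isolation as below. This is not wimlib code and says nothing about how wimapply would implement checkpointing: `process_item` is a hypothetical callback that the caller must define, standing in for "extract one stream", and the item names are placeholders. The `sync` after every item is the performance cost mentioned above.

```shell
#!/bin/sh
# Sketch only: a resumable loop driven by a checkpoint log.

# Record one finished item, then flush pending writes before returning, so
# the log never claims more than has been handed to the disk.
record_done() {
    log_file=$1
    item_id=$2
    printf '%s\n' "$item_id" >> "$log_file"
    sync   # flushes all filesystems; this is the per-item slowdown
}

# Succeed if an earlier run already recorded this item.
already_done() {
    log_file=$1
    item_id=$2
    [ -f "$log_file" ] && grep -Fxq -- "$item_id" "$log_file"
}

# Process items in order, skipping those already logged. Stops at the
# first failure so that a later run can pick up from there.
resume_run() {
    log_file=$1
    shift
    for item_id in "$@"; do
        if already_done "$log_file" "$item_id"; then
            continue
        fi
        process_item "$item_id" || return 1   # hypothetical callback
        record_done "$log_file" "$item_id"
    done
}
```

Syncing less often (every N items, say) would reduce the cost, at the price of redoing up to N items after a crash.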

    Perhaps the problem could be simplified if it could be assumed that the "failing" wimapply has a chance to terminate cleanly, e.g. with a handler for SIGINT (Ctrl-C)? Under what circumstances would it be "failing"?

     
  • Chris Bradshaw

    Chris Bradshaw - 2013-09-15

    Hi....

    I can't be very specific about why it would be failing. Broadly, it usually would fail due to a timeout of some kind, usually in circumstances where there would be a significant load on the server or the LAN or both (eg: if we were re-imaging more than ~100 PCs at the same time).

    Our infrastructure is good....powerful Linux FTP server with 10GbE connection, and GbE to the desktop. Because this tends to happen under load (and usually at night when no one is around) it's always been very much easier to work around the problem by restarting a timed out/failed imaging session (preferably from where it left off) than to try and diagnose the root cause of the problem.

    Not sure if this helps.

    Chris.

     
