#2002 Unpacking tarballs with symlinks does not always work


Certain tarballs produced on non-MSYS systems such as Linux include symlinks to other files within the tarball. If the table of contents for the tarball shows the symlink before the file it points to, then attempting to unpack the tarball on MSYS generates the following error:

bash.exe-3.1$ tar tvf symlink.tar
lrwxrwxrwx wine/wine 0 2013-07-03 17:41 bar -> foo
-rw-r--r-- wine/wine 0 2013-07-03 17:41 foo

bash.exe-3.1$ # Essential: make sure no local copies of foo bar
bash.exe-3.1$ rm -f foo bar
bash.exe-3.1$ tar hxf symlink.tar
tar: bar: Cannot create symlink to `foo': No such file or directory
tar: Exiting with failure status due to previous errors
bash.exe-3.1$ ls -l foo bar
ls: bar: No such file or directory
-rw-r--r-- 1 wine 544 0 Jul 3 17:41 foo
bash.exe-3.1$ tar hxf symlink.tar
bash.exe-3.1$ ls -l foo bar
-rw-r--r-- 1 wine 544 0 Jul 3 17:41 bar
-rw-r--r-- 1 wine 544 0 Jul 3 17:41 foo

My guess about what happens during unpacking is that the contents of the tarball are processed in table-of-contents order, so processing the symlink creates a temporary dangling symlink for bar, which is resolved (on Linux) when foo is unpacked later. On MSYS this sequence of events does not work, because symlinks are just copies there: the first unpack generates the error message above but then goes on to unpack foo. The second unpack attempt works because foo already exists on the filesystem when the symlink is unpacked.

So double unpack is the workaround for now if you want to reliably unpack Linux tarballs that contain symlinks using MSYS tar.exe.

One possible fix for this bug might be to take two passes over the table of contents of the tarball when unpacking: the first pass dealing strictly with files, and the second dealing with symlinks that point to those files. Alternatively, you could wait (possibly a very long time) until MSYS allows dangling symlinks.

System details. I use the latest (updated) MinGW/MSYS that is installed with mingw-get-inst-20120426.exe but then downgraded afterwards to msys-core-bin=1.0.17-1 as recommended at http://sourceforge.net/p/mingw/bugs/1950/. All the above results were produced on Wine-1.6-rc4, but a friend with access to MSYS on Microsoft Windows confirms the same issue, and the bug should be easy for the MSYS developers to reproduce since I have attached a compressed version of the symlink.tar tarball that demonstrates the bug.

1 Attachments


  • Keith Marshall

    Keith Marshall - 2013-07-05

    I'm not actually convinced that this should be classified as a bug -- our own project rules for creating Windows compatible tarballs explicitly ban the use of symbolic links. Even if you do consider it a bug, I suspect it may be classified as a "won't fix".

    You aren't entirely correct, in your assumption that tarballs are unpacked in TOC order, for there is no TOC in a tarball -- it is simply a sequential stream of alternating header and data blocks, one header block for each file, immediately followed by the content of that file, spanning as many blocks as it requires, before the header for the next file is encountered.

    You are correct in your assumption that, on Linux, a symlink will be unpacked as dangling, if it is unpacked before the file to which it refers, and subsequently becomes valid when the file is unpacked later.

    The reason this doesn't work in MSYS is that MSYS doesn't support symlinks. It was developed in an era when Windows itself didn't support them, and even today Windows support for them is flaky (symlink creation is a privileged operation). Hence the dependency on copies rather than symlinks in MSYS; but of course the content must already have been extracted before the copy can be created.

    This is a known limitation of Windows. The workaround we recommend is to specify "tar -chf ..." when creating tarballs on any unixy system for extraction on Windows -- this forces tar to read through the symlink, and embed a physical copy of the file content in the tarball, in its place.
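For projects that build their tarballs from a script, the effect of `tar -chf` can be sketched with Python's standard `tarfile` module, whose `dereference=True` setting stores the content a symlink points to rather than the link itself. The function name `make_portable_tarball` and its arguments are illustrative assumptions, not part of any MinGW or MSYS tooling.

```python
import os
import tarfile


def make_portable_tarball(archive_path, source_dir):
    """Archive source_dir, storing file content in place of symlinks."""
    top = os.path.basename(os.path.normpath(source_dir))
    # dereference=True plays the role of tar's -h option: links are
    # followed and the content of their targets is archived instead.
    with tarfile.open(archive_path, "w", dereference=True) as tar:
        tar.add(source_dir, arcname=top)
```

The resulting archive is larger than one containing links, since every linked file is stored in full, but it can be unpacked in a single pass on systems without symlink support.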

    BTW, hard links may also cause problems, when unpacking tarballs on Windows; MSYS tar handles them correctly, but some other Windows archive extraction tools do not.

  • Keith Marshall

    Keith Marshall - 2013-07-05
    • status: unread --> assigned
    • assigned_to: Cesar Strauss
    • Type: Bug --> Feature
    • Category: Unknown --> Known_Feature
  • Alan W. Irwin

    Alan W. Irwin - 2013-07-05

    Thanks for your correction concerning the fact that a TOC is generated rather than being part of the tarball. Nevertheless, I don't think that makes a practical difference, since (I assume) the TOC that is generated for a tarball follows the order of the header blocks as you have described them. So if you generate the TOC using the tv options, you can predict whether this error will occur from whether symlinks appear in the generated TOC before the files they point to.
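The check described above (does a symlink appear in the listing before the file it points to?) can be automated. The following is a minimal sketch using Python's standard `tarfile` module; the function name `symlinks_before_targets` is hypothetical. It also reports symlinks whose target is absent from the archive altogether, since those cannot be resolved during extraction either.

```python
import posixpath
import tarfile


def symlinks_before_targets(archive_path):
    """Return names of symlinks listed before the member they point to."""
    seen = set()
    early = []
    with tarfile.open(archive_path) as tar:
        for member in tar:  # members come back in archive order
            if member.issym():
                # Link targets are relative to the link's own directory.
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(member.name), member.linkname)
                )
                if target not in seen:
                    early.append(member.name)
            seen.add(posixpath.normpath(member.name))
    return early
```

For an archive laid out like the symlink.tar listing in the original report (bar -> foo, followed by foo), this would flag bar.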

    I was also well aware that MSYS ln creates copies.

    I also see your point concerning asking those who generate tarballs for MSYS to always use the h option. But I ran into this issue with a third-party tarball I had no control over, because developers are sometimes unaware of this issue with MSYS tar. So at minimum I hope this bug report adds to the publicity concerning this issue.

    My use case is a project I am putting together called "build_projects", which does exactly what its title says on different platforms including MinGW/MSYS. So unpacking all the different source code tarballs for the projects that will be built is an accident waiting to happen with MSYS tar because of this issue. Of course, I can work around it by unpacking each tarball twice when the build occurs on MinGW/MSYS platforms, but I obviously hope you can deal with this issue more fundamentally by changing the order in which symlinks and files are unpacked from a tarball by MSYS tar.exe. I suggest this issue might be straightforwardly fixed by collecting symlink information (rather than immediately unpacking the symlinks) while the tarball files are unpacked as normal, then unpacking that collected symlink information at the end. But I have no knowledge of the tar code, so the MSYS developer concerned with that will have to figure out for himself whether that method of reordering of when symlinks are unpacked is easy to implement or not.
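The deferral idea suggested above can be sketched outside of tar itself. This is not how MSYS tar or any other tar implementation works; it is an illustration in Python's standard `tarfile` module, with the hypothetical name `extract_with_symlinks_as_copies`. It assumes symlinks point at regular files inside the archive (directory targets, absolute targets, and targets outside the extraction directory are reported as unresolved), and it does no other validation of untrusted archives.

```python
import os
import posixpath
import shutil
import tarfile


def extract_with_symlinks_as_copies(archive_path, dest_dir):
    """Extract archive_path into dest_dir, copying targets in place of symlinks.

    Returns the names of symlinks that could not be replaced by a copy.
    """
    deferred = []
    with tarfile.open(archive_path) as tar:
        for member in tar:
            if member.issym():
                # Remember the link; create nothing on disk yet.
                deferred.append((member.name, member.linkname))
            else:
                tar.extract(member, dest_dir)

    # Repeat until no progress, so links to links resolve as well.
    pending = deferred
    while pending:
        remaining = []
        for name, linkname in pending:
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(name), linkname)
            )
            src = os.path.join(dest_dir, *target.split("/"))
            dst = os.path.join(dest_dir, *name.split("/"))
            escapes = posixpath.isabs(target) or target.split("/")[0] == ".."
            if not escapes and os.path.isfile(src):
                os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
                shutil.copy2(src, dst)
            else:
                remaining.append((name, linkname))
        if len(remaining) == len(pending):
            break  # nothing resolved this round; give up on the rest
        pending = remaining
    return [name for name, _ in pending]
```

The cost of this approach is the list of deferred links held in memory until the end of the archive, which is the departure from strictly sequential processing that the rest of this thread debates.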

    Last edit: Alan W. Irwin 2013-07-05
  • Keith Marshall

    Keith Marshall - 2013-07-06

    "Thanks for your correction concerning the fact that a TOC is generated rather than being part of the tarball."

    I think you rather missed the point: there is no TOC, neither present within the tarball, nor generated by tar at any time. tar is designed to operate on tape archives, a medium which is designed for strictly sequential access. A tarball is simply an image of such an archive. Extraction proceeds in a single pass, as follows:

    1. Read a header block.

    2. Check if it represents end of archive; if so, exit.

    3. If still going, interpret it, extract and commit its associated data.

    4. Wipe the slate clean; go back to (1).

    Note that, when tar goes back to (1), all knowledge of what has gone before is discarded.
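The loop described above can be illustrated with a short reader that walks the header blocks of an uncompressed archive in stream order. This is a simplified sketch, not tar's actual code: it assumes the basic ustar header layout (name at offset 0, size as octal text at offset 124, type flag at offset 156, link name at offset 157) and ignores GNU and pax extensions such as long names. The name `walk_headers` is hypothetical.

```python
BLOCK = 512  # tar headers and data are stored in 512-byte blocks


def walk_headers(stream):
    """Yield (name, typeflag, linkname, size) per header, in stream order."""
    while True:
        header = stream.read(BLOCK)  # step 1: read a header block
        if len(header) < BLOCK or header == bytes(BLOCK):
            return  # step 2: an all-zero block marks the end of the archive
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        size_text = header[124:136].strip(b"\0 ").decode("ascii")
        size = int(size_text, 8) if size_text else 0
        typeflag = header[156:157]  # b"0" regular file, b"2" symlink
        linkname = header[157:257].split(b"\0", 1)[0].decode("utf-8", "replace")
        yield name, typeflag, linkname, size  # step 3: interpret the header
        # Consume the data blocks, padded to a whole number of blocks,
        # then loop back for the next header (step 4).
        stream.read(((size + BLOCK - 1) // BLOCK) * BLOCK)
```

A symlink entry carries its target only as text in the header and has no data blocks, which is why a reader working this way cannot know, when it meets the link, whether the target will appear later in the stream.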

    "the MSYS developer concerned with [tar] will have to figure out for himself whether that method of reordering of when symlinks are unpacked is easy to implement or not."

    I am not he; if I were, this would be a definite "won't fix". This is not an MSYS bug; there are other systems, besides MSYS, which do not support symlinks, (including some Unix systems). Consequently, any tarball which embeds symlinks is not portable; it isn't the responsibility of the MSYS tar maintainer to support such non-portable features. I seriously doubt that he will want to maintain local patches which will make his tar behave in a distinctly and diametrically different manner from every other implementation on the planet, but the final decision must be Cesar's.

    Last edit: Keith Marshall 2013-07-06
  • Alan W. Irwin

    Alan W. Irwin - 2013-07-06

    "I think you rather missed the point: there is no TOC, neither present within the tarball, nor generated by tar at any time."

    I didn't miss the point, but I was also not clear in my response so you completely misunderstood me. I was referring to the TOC generated by tar tvf which humans can read to verify for themselves the exact order of files in the archive. And I got your point about the normal sequential processing by tar the first time you mentioned it so there was no need to repeat it with all the annoying bold face. In short, please take a deep breath and relax. I am not your enemy.

    You do make an important point that Cesar may feel the cost of deviating from that sequential processing model is too high for the benefit I have described. But it is obvious that decision must be Cesar's since he is the one maintaining MSYS tar. So please let him get on with that decision without all the editorializing.

  • Cesar Strauss

    Cesar Strauss - 2013-07-11

    I don't mind adding a new option to tar to explicitly extract all symlinks as copies. I have neither the time nor the inclination to do it myself, but maybe some other volunteer will come forward to create a patch.

    I'm flagging this as a feature request.

  • Cesar Strauss

    Cesar Strauss - 2013-07-11
    • status: assigned --> open
