#1810 Native symlinks

MSYS
open
Cesar Strauss
None
Task
none
Unknown
True
2013-02-13
2010-08-16
Ladislav Michl
No

This preliminary patch contains support for native symlinks. See this thread: http://thread.gmane.org/gmane.comp.gnu.mingw.user/32432/

As I do not know which Windows versions are officially supported by MSYS, the create_symlink function is not implemented, because CreateSymbolicLink is only available since Vista. The original patch loaded CreateSymbolicLink dynamically based on the Windows version, but since CreateHardLink is used directly (and IsWow64Process elsewhere as well), let's leave the actual implementation until this question is resolved.

msys_symlink was called from path.cc with arguments swapped, so fix both caller and callee. At least this part of patch would deserve merging.

It also turned out that symlinks are actually easy to implement, but teaching the rest of MSYS about their existence is much harder. There is some code trying to resolve .lnk files, as well as code which stores symlink info in NTFS extended attributes, and also code implementing links with BackupWrite. Everything was probably inherited from Cygwin. How should it be done in MSYS?

Interestingly enough, the BackupWrite approach uses fallback semantics: if a 'symlink' cannot be created, a file copy is performed. On Linux, the link syscall returns -EPERM on a FAT filesystem (I would have expected -ENOTSUPP). I'd rather avoid a silent fallback, or make it optional.

More comments later, there are already enough questions asked.

Thank you,
ladis

Discussion

  • Ladislav Michl
    2010-08-16

    Patch originally by Vincent Richomme

     
    Attachments
  • Earnie Boyd
    2010-08-16

    • assigned_to: earnie --> cstrauss
     
  • Earnie Boyd
    2010-08-16

    > It also turned out that symlinks are actually easy to implement, but
    > teaching the rest of MSYS about their existence is much harder. There
    > is some code trying to resolve .lnk files, as well as code which
    > stores symlink info in NTFS extended attributes, and also code
    > implementing links with BackupWrite. Everything was probably inherited
    > from Cygwin. How should it be done in MSYS?

    Yes, the .lnk code was inherited from Cygwin. I disabled it because Windows native programs do not understand .lnk files. For directories I would suggest using Reparse (Junction) Points. For files I would suggest that, if the OS and file system support it, we use CreateSymbolicLink. If the OS and/or file system does not support CreateSymbolicLink then a copy of the file is correct for what we need. Too many configure scripts use ln -s to not have the fallback of copy by default.

    Cesar Strauss is the current maintainer. He may have more to say.

     
  • Cesar Strauss
    2010-08-17

    > As I do not know which Windows versions are officially supported by
    > MSYS, the create_symlink function is not implemented, because
    > CreateSymbolicLink is only available since Vista. The original patch
    > loaded CreateSymbolicLink dynamically based on the Windows version, but
    > since CreateHardLink is used directly (and IsWow64Process elsewhere as
    > well), let's leave the actual implementation until this question is
    > resolved.

    Please see the source in autoload.cc. If you add a CreateSymbolicLink declaration in there, you can call it in the code directly. You are actually calling stub code that will load the appropriate library on demand, and either forward your call to the real function, or return FALSE if it doesn't exist in your version of Windows.
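
The load-on-demand idea can also be written out by hand, which may help when reading the stubs. The following is only a sketch and is not the autoload.cc mechanism itself: the helper name try_native_symlink is hypothetical, and the lookup uses the documented GetModuleHandleA and GetProcAddress calls against kernel32.dll. On non-Windows builds it compiles to a stub that always reports failure.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Signature of CreateSymbolicLinkA as documented for Vista and later. */
typedef BOOLEAN (WINAPI *create_symlink_fn)(LPCSTR, LPCSTR, DWORD);

/* Hypothetical helper: returns 1 if a native symlink was created, 0 if the
   API is missing from this Windows version or the call itself failed. */
int try_native_symlink(const char *linkpath, const char *target, int is_dir)
{
    HMODULE k32 = GetModuleHandleA("kernel32.dll");
    create_symlink_fn fn = NULL;

    assert(linkpath != NULL && target != NULL);

    if (k32 != NULL)
        fn = (create_symlink_fn) GetProcAddress(k32, "CreateSymbolicLinkA");
    if (fn == NULL)
        return 0;   /* pre-Vista: behave like the stub and report failure */

    /* 0x1 is SYMBOLIC_LINK_FLAG_DIRECTORY; spelled out because older
       headers may not define the macro. */
    return fn(linkpath, target, is_dir ? 0x1 : 0x0) ? 1 : 0;
}
#else
/* Non-Windows builds have no such API; mirror the "return FALSE" stub. */
int try_native_symlink(const char *linkpath, const char *target, int is_dir)
{
    assert(linkpath != NULL && target != NULL);
    (void) is_dir;
    return 0;
}
#endif
```

A caller would treat a 0 result as "native symlinks unavailable here" and move on to whatever fallback is chosen.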

    > msys_symlink was called from path.cc with arguments swapped, so fix both
    > caller and callee. At least this part of patch would deserve merging.

    Please swap the order of the arguments of the symlink function itself as well. According to the POSIX standard, the contents of the symbolic link (frompath) is the first argument. To see it, type "symlink" on the search box in the following page:
    http://www.opengroup.org/onlinepubs/009695399/

    > It also turned out that symlinks are actually easy to implement, but
    > teaching the rest of MSYS about their existence is much harder. There
    > is some code trying to resolve .lnk files, as well as code which stores
    > symlink info in NTFS extended attributes, and also code implementing
    > links with BackupWrite. Everything was probably inherited from Cygwin.
    > How should it be done in MSYS?

    It would be nice to display symlink info in "ls -l".
    Maybe a start would be updating the symlink_info::check method.
    I guess you should ensure the following system calls are aware of symlinks: stat, lstat and readlink (you can read about them in the POSIX standard link I gave above).

    Earnie writes:
    > If the OS and/or file system does not support CreateSymbolicLink then a
    > copy of the file is correct for what we need. Too many configure scripts
    > use ln -s to not have the fallback of copy by default.

    Autoconf currently has an AC_PROG_LN_S macro which will set the $(LN_S) variable to either "ln -s", "ln" or "cp -p", depending on the host capabilities. Given this, is there any longer a need for the symlink-as-copy fallback to be on by default on MSYS?

    Regards,
    Cesar

     
  • Ladislav Michl
    2010-08-17

    > Please see the source in autoload.cc. If you add a CreateSymbolicLink
    > declaration in there, you can call it in the code directly. You are
    > actually calling stub code that will load the appropriate library on
    > demand, and either forward your call to the real function, or return FALSE
    > if it doesn't exist in your version of Windows.

    Very interesting and useful; this answers my questions. I added
    CreateSymbolicLink to the w32api (patch attached) and modified
    autoload.cc accordingly.

    > Please swap the order of the arguments of the symlink function itself as
    > well. According to the POSIX standard, the contents of the symbolic link
    > (frompath) is the first argument. To see it, type "symlink" on the search
    > box in the following page:
    > http://www.opengroup.org/onlinepubs/009695399/

    http://www.opengroup.org/onlinepubs/009695399/functions/symlink.html

    int symlink(const char *path1, const char *path2);

    The symlink() function shall create a symbolic link called path2 that
    contains the string pointed to by path1 ( path2 is the name of the
    symbolic link created, path1 is the string contained in the symbolic link).

    Well, my understanding is quite the opposite, and it is what the patch
    implements: the content of the symbolic link (topath) is the first
    argument, as the symlink name points to the symlink content.

    $ touch file
    $ ln -s file link
    $ ls -l
    -rw-r--r-- 1 ladis ladis 0 2010-08-17 14:52 file
    lrwxrwxrwx 1 ladis ladis 4 2010-08-17 14:52 link -> file

    In patched MSYS with debug enabled, you get:
    $ ln -s file link
    msys_symlink (file, link)
    create_copy_file (C:\MSYS\test\file, C:\MSYS\test\link)

    As you can see, the symlink points from 'link' (second argument) to 'file'
    (first argument).
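
The same argument order can be checked programmatically on a POSIX system. This is a minimal sketch, with link_points_to being a hypothetical helper name: it creates the link with symlink(), reads its content back with readlink(), and compares that with the first argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a symlink named `linkpath` whose stored
   content is `target`, then confirm that reading the link back yields
   `target`. Returns 1 if it does, 0 on any failure or mismatch. */
int link_points_to(const char *target, const char *linkpath)
{
    char buf[1024];
    ssize_t n;

    assert(target != NULL && linkpath != NULL);

    /* POSIX order: path1 is the link's content, path2 is the link's name. */
    if (symlink(target, linkpath) != 0)
        return 0;

    /* readlink() does not NUL-terminate, so terminate by hand. */
    n = readlink(linkpath, buf, sizeof buf - 1);
    if (n < 0)
        return 0;
    buf[n] = '\0';

    return strcmp(buf, target) == 0;
}
```

Calling link_points_to("file", "link") corresponds to ln -s file link. A second call with the same link name returns 0, because symlink() refuses to overwrite an existing entry.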

    > It would be nice to display symlink info in "ls -l".
    > Maybe a start would be updating the symlink_info::check method.

    I read the symlink_info::check method yesterday, as well as the rest of that
    file, and now, after more than 12 hours, I'm able to comment on it without
    showing too much emotion. Ummm... So, the only sane way to update path.cc
    is to write down its desired function and then reimplement it, as it keeps all
    the Cygwin logic, which is of no use to us and makes the code very hard to
    understand and modify (someone already tried to do so). I would
    very much appreciate if this could be done by someone more familiar with
    MSYS, but in the worst case I can do it myself (it will take me quite some
    time).

    > I guess you should ensure the following system calls are aware of
    > symlinks: stat, lstat and readlink (you can read about them in the POSIX
    > standard link I gave above).

    Actually, the most important are unlink and remove, as they are not
    junction-aware.

    > Autoconf currently has an AC_PROG_LN_S macro which will set the
    > $(LN_S) variable to either "ln -s", "ln" or "cp -p", depending on the
    > host capabilities. Given this, is there any longer a need for the
    > symlink-as-copy fallback to be on by default on MSYS?

    I modified the patch to provide some fallback logic, which is currently
    disabled, so the copy method is used. Easy to change once we come to a
    conclusion.

     
  • Ladislav Michl
    2010-08-17

    2nd version with fallback - decide how to set nofallback

     
    Attachments
  • Earnie Boyd
    2010-08-17

    >Given this, is there any longer a need for the symlink-as-copy fallback to be on by default on MSYS?

    Testing is the only way to know but I would guess it would still be needed.

     
  • Keith Marshall
    2010-08-17

    >> Autoconf currently has an AC_PROG_LN_S macro
    >> which will set the $(LN_S) variable to either "ln -s",
    >> "ln" or "cp -p", depending on the host capabilities.
    >> Given this, is there any longer a need for the
    >> symlink-as-copy fallback to be on by default
    >> on MSYS?
    >
    > Testing is the only way to know but I would guess
    > it would still be needed.

    IMO, it is still very much needed. Certainly, there are many projects using autoconf, and provided they've used AC_PROG_LN_S, and substituted LN_S appropriately in their makefiles, then they will most likely DTRT. However, there is still an extensive corpus of projects which *don't* use autoconf, and thus may *not* DTRT when they use ln -s. Indeed, there are still many developers who continue to believe that they can write better configure scripts by hand, than autoconf can generate; (they can't, but convincing them of this fundamental reality is very difficult). We need MSYS to DTRT, when a developer blindly uses ln -s, without checking that it does, in fact, work.

     
  • Keith Marshall
    2010-08-17

    >> Please swap the order of the arguments of the symlink function
    >> itself as well. According to the POSIX standard, the contents of
    >> the symbolic link (frompath) is the first argument. To see it,
    >> type "symlink" on the search box in the following page:
    >> http://www.opengroup.org/onlinepubs/009695399/
    >
    > http://www.opengroup.org/onlinepubs/009695399/functions/symlink.html
    >
    > int symlink(const char *path1, const char *path2);
    >
    > The symlink() function shall create a symbolic link called path2
    > that contains the string pointed to by path1 ( path2 is the name
    > of the symbolic link created, path1 is the string contained in the
    > symbolic link).
    >
    > Well, my understanding is quite the opposite, and it is what the patch
    > implements: the content of the symbolic link (topath) is the first
    > argument, as the symlink name points to the symlink content.

    It's hardly the epitome of clarity! The order of the arguments to the POSIX symlink() function is the same as used in the ln -s command; i.e. the first argument is the existing path name of the entity to which the link will refer, and the second specifies the link entity to be created. Logically, I would interpret a symlink as pointing from the created link entity to the original entity, and thus I would expect the terminology to refer to the first argument as 'topath' and the second as 'frompath'; i.e. my understanding matches Ladis' interpretation.

     
  • Ladislav Michl
    2010-08-17

    > It's hardly the epitome of clarity!

    As I have a problem understanding the meaning of the above sentence
    (irony?), I'd rather clarify: the POSIX specification is accurate, and I
    understand it the opposite way to what Cesar wrote.

    As I do not like mixing bugfixes, cosmetic changes and actual new
    features, is anyone interested in a separate patch? Should I open
    a new ticket? Should I do the same for w32api?

    > However, there is still an extensive corpus of projects which *don't*
    > use autoconf, and thus may *not* DTRT when they use ln -s. Indeed,
    > there are still many developers who continue to believe that they
    > can write better configure scripts by hand, than autoconf can
    > generate; (they can't, but convincing them of this fundamental
    > reality is very difficult).

    Here I assume DTRT means "Do The Right Thing". And, well, the reality
    is that most people can't write a proper configure script either way.
    I'm mostly cross-compiling, and only a tiny minority of packages
    'just work'. However, this does not mean MSYS should make
    things any worse.

    > We need MSYS to DTRT, when a developer blindly uses ln -s,
    > without checking that it does, in fact, work.

    Agree. My only concern is that fallback should be configurable.
    1) Symlink using (recursive) copy
    2) Use symlinks, fail with -EPERM if impossible (*)
    3) Use symlinks, fallback to 1) if impossible.

    (*) It would be more accurate to return -ENOTSUPP here, but
    Linux manpage states: "EPERM The file system containing
    newpath does not support the creation of symbolic links."
    and POSIX does not specify this situation at all.

    environ.cc already contains a definition of the winsymlinks variable,
    but its type is bool. Is it acceptable to change that?

    Yet no one wants to write path.cc functional analysis? I'll wait
    a bit longer... ;-)

     
  • Earnie Boyd
    2010-08-17

    > As I do not like mixing bugfixes, cosmetic changes and actual new
    > features, is anyone interested in a separate patch? Should I open
    > a new ticket? Should I do the same for w32api?

    Yes, these should be separated.

     
  • Ladislav Michl
    2010-08-18

    Created 3047566 and removed CreateSymbolicLink patch from here.
    Fix for msys_symlink argument order added as 3047571.
    I'll update this patch once merged.

     
  • Keith Marshall
    2010-08-18

    Ladis,

    >> It's hardly the epitome of clarity!
    >
    > As I have a problem understanding the meaning
    > of the above sentence (irony?)

    No irony intended on my part, but perhaps ironic that you had difficulty understanding. I'm guessing that your problem may be with "epitome":
    http://oxforddictionaries.com/search?q=epitome

    Hence, even as a native English speaker I found the wording of the POSIX spec to be confusing, so little wonder that you and Cesar, (for both of whom English is, presumably, a second language), may have interpreted it differently. Just to confirm: having read it several times, and having sought additional clarification from the manpage on my Ubuntu box, I concur with your interpretation, rather than Cesar's.

    >> We need MSYS to DTRT, when a developer blindly uses
    >> ln -s, without checking that it does, in fact, work.
    >
    > Agree. My only concern is that fallback should be configurable.
    > 1) Symlink using (recursive) copy
    > 2) Use symlinks, fail with -EPERM if impossible (*)
    > 3) Use symlinks, fallback to 1) if impossible.

    To me, making it (somehow) configurable seems like overkill. Rather, perform just one consistent sequence of operations, in *all* cases:

    1) Attempt to create a (true) symbolic link; if the OS/FS cannot support that, the attempt will report failure, in which case we proceed to (2); if it succeeds, we are done, so immediately return, reporting success.

    2) If the entity to be linked to is a *file*, attempt to create a *hard* link; if that succeeds, then we are done and can return immediately; otherwise we proceed to (3).

    3) If the OS/FS supports reparse points for the type of entity to be linked, attempt to create one; return immediately on success, otherwise proceed to (4).

    4) If the entity to be linked is a file, substitute a simple file copy for the link; otherwise recursively replicate the directory structure, and attempt to hard link individual files into the appropriate duplicate location; substitute file copies, if hard linking fails.
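
The sequence in steps (1) to (4) above can be sketched as a chain of strategies tried in order. This is only an illustration under stated assumptions: every name here (link_backends, create_link, the stand-in backends) is hypothetical, the real Windows calls are abstracted behind function pointers, and the single copy backend stands in for both the plain file copy and the recursive directory replication of step (4).

```c
#include <assert.h>
#include <stddef.h>

/* Which strategy ended up satisfying the request. */
enum link_result {
    LINK_FAILED,
    LINK_SYMLINK,   /* step 1 */
    LINK_HARDLINK,  /* step 2 */
    LINK_REPARSE,   /* step 3 */
    LINK_COPY       /* step 4 */
};

/* A backend returns 0 on success and non-zero on failure. */
typedef int (*link_op)(const char *target, const char *linkpath);

/* Hypothetical table of strategies; a NULL entry means "not available". */
struct link_backends {
    link_op native_symlink;
    link_op hard_link;
    link_op reparse_point;
    link_op copy;   /* file copy, or recursive replication for directories */
};

/* Try each strategy in the order of steps 1-4; stop at the first success. */
enum link_result create_link(const struct link_backends *b,
                             const char *target, const char *linkpath,
                             int target_is_file)
{
    assert(b != NULL && target != NULL && linkpath != NULL);

    if (b->native_symlink && b->native_symlink(target, linkpath) == 0)
        return LINK_SYMLINK;
    if (target_is_file && b->hard_link && b->hard_link(target, linkpath) == 0)
        return LINK_HARDLINK;
    if (b->reparse_point && b->reparse_point(target, linkpath) == 0)
        return LINK_REPARSE;
    if (b->copy && b->copy(target, linkpath) == 0)
        return LINK_COPY;
    return LINK_FAILED;
}

/* Stand-in backends for experimenting with the ordering. */
int backend_unsupported(const char *target, const char *linkpath)
{
    (void) target; (void) linkpath;
    return -1;
}

int backend_ok(const char *target, const char *linkpath)
{
    (void) target; (void) linkpath;
    return 0;
}
```

Returning which strategy succeeded, rather than a bare success flag, would let a caller tell a true symlink from a silent copy.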

    > Yet no one wants to write path.cc functional analysis?

    Personally, I am not sufficiently familiar with it to do that. Cesar is probably best equipped to do so, but he may not wish to devote the time it would take.

     
  • Cesar Strauss
    2010-08-18

    > Well, my understanding is quite the opposite, and it is what the patch
    > implements: the content of the symbolic link (topath) is the first
    > argument, as the symlink name points to the symlink content.

    You are right, sorry for the confusion. I was still thinking in terms of symlink-as-copy, where you copy *from* the original path *to* its new destination.

    >> We need MSYS to DTRT, when a developer blindly uses ln -s,
    >> without checking that it does, in fact, work.

    OK.

    > I would very much appreciate if this could be done by
    > someone more familiar with MSYS, but in the worst case
    > I can do it myself (it will take me quite some time).

    I can't say I am an expert on this part of the code, but I'll try to help. Unfortunately, I currently have access only to Windows XP and 95 installations, so I really have no way to test the symlink part.

    As for your previous question on the mailing list:
    > where to hook code supposed to run only once at dll startup?

    A place would be in dcrt0.cc (dll_crt0_1).

    Regards,
    Cesar

     
  • Earnie Boyd
    2010-08-18

    > 3) If the OS/FS supports reparse points for the type of entity to be
    > linked, attempt to create one; return immediately on success, otherwise
    > proceed to (4).

    Be careful here. I have been able to create reparse points for files and MSYS programs use the file but native programs cannot. We do not want to make use of an undocumented feature.

     
  • Keith Marshall
    2010-08-18

    >> 3) If the OS/FS supports reparse points for the type of entity to
    >> be linked, attempt to create one; return immediately on success,
    >> otherwise proceed to (4).
    >
    > Be careful here. I have been able to create reparse points for files
    > and MSYS programs use the file but native programs cannot.
    > We do not want to make use of an undocumented feature.

    Okay, agreed. Let's rewrite (3):

    3) If the entity to be linked represents a directory, attempt to create a reparse point to represent the link; return immediately on success. In all other cases, proceed to (4).

     
  • Keith Marshall
    2010-08-18

    > Okay, agreed. Let's rewrite (3):
    >
    > 3) If the entity to be linked represents a directory, attempt to create
    > a reparse point to represent the link; return immediately on success.
    > In all other cases, proceed to (4).

    On further reflection, POSIX semantics require creation of a symbolic link to fail if the 'frompath' entity already exists[*]. In this case, Windows should fail to create a reparse point, but we don't then want to proceed to (4) -- we just want to immediately *return* failure. However, just encapsulating that into (3) isn't sufficient; we actually want to insert an initial check *before* (1):

    0) If the path from which the link is to be created represents an existing entity, immediately return failure, otherwise proceed to (1).

    [*] unless we forcibly remove it first, as with ln -sf ...
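
A minimal POSIX sketch of the check in step (0) follows. The helper name check_link_name_free is hypothetical. It uses lstat() rather than stat() so that an existing dangling symlink also counts as "already exists", and it reports EEXIST, the error POSIX assigns to symlink() in this case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper for step (0): returns 0 if nothing exists at
   `linkpath`; otherwise returns -1 with errno set (EEXIST when an entry
   is already present, or the lstat() error for anything else). */
int check_link_name_free(const char *linkpath)
{
    struct stat st;

    assert(linkpath != NULL);

    if (lstat(linkpath, &st) == 0) {
        errno = EEXIST;     /* something is already there: refuse */
        return -1;
    }
    if (errno == ENOENT)
        return 0;           /* name is free, proceed to step (1) */
    return -1;              /* some other error; errno set by lstat() */
}
```

An ln -sf style caller would remove the existing entry first and then repeat the check.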

     
  • Ladislav Michl
    2010-08-18

    cstrauss wrote:
    >> I would very much appreciate if this could be done by
    >> someone more familiar with MSYS, but in the worst case
    >> I can do it myself (it will take me quite some time).

    > I can't say I am an expert on this part of the code, but I'll try to help.
    > Unfortunately, I currently have access only to Windows XP and 95
    > installations, so I really have no way to test the symlink part.

    There is no need to have access to any Windows box. What I need
    to know are the requirements for the path.cc implementation. It is
    already quite a hard job to write them down, so I'll prepare msys-1.0.dll
    and a test case and let users test it on various boxes.
    I'll probably start by cleaning up path.cc to make further
    modifications easier, but it would be really helpful to know
    which functionality is actually required by MSYS and
    implement it from scratch - this part is easy compared
    to the analysis...

    >As for your previous question on the mailing list:
    >> where to hook code supposed to run only once at dll startup?

    > A place would be in dcrt0.cc (dll_crt0_1).

    Things have made some progress in the meantime, so there does not seem
    to be a need for any sort of startup code. But thanks anyway,
    I'll possibly use that knowledge in future contributions.

    keithmarshall wrote:
    > On further reflection, POSIX semantics require creation of a symbolic
    > link to fail if the 'frompath' entity already exists (unless we forcibly
    > remove it first, as with ln -sf ...)

    This functionality is already provided by the current implementation. After
    Cesar decides what to do with the symlink argument order patch, I'll
    send an update based on CVS head...

    Now is probably a good time to unveil my secret plan for world
    domination, aka why I want ln -s to optionally fail on filesystems
    not supporting it. As you have probably noted, many projects provide
    mingw.org as a base for cross-compiler environments, and most
    of them use some homebrew scripts to compile gcc and friends.
    (As a side note, this afternoon I read (hopefully) all relevant threads
    about the mingw-w64 fork and I'm pretty disappointed). These projects
    are pretty hard to maintain in the long term, simply because there is
    not enough manpower. With PTXdist and OSELAS.Toolchain,
    building your own cross-compiler, bootable Linux root filesystem
    image or just any other software package consisting of open source
    projects is as simple as typing 'ptxdist go' - provided somebody
    supplies the relevant configuration. It already works pretty well on Linux
    and the only (well, the 'only' is an obvious euphemism here) thing
    which is preventing this tool from working efficiently on Windows is
    symlink support. Therefore I want to know if ln -s works or 'DTRT'.

     
  • Ladislav Michl
    2010-08-24

     
    Attachments
  • Ladislav Michl
    2010-08-24

    Added new version of patch against recent CVS. To test it just set mode to
    one of SYMLINK_MODE_NATIVE, SYMLINK_MODE_JUNCTION or SYMLINK_MODE_COPY.
    Once set to SYMLINK_MODE_NATIVE, you'll get behaviour described by
    keithmarshall at 2010-08-18 15:58:40 CEST.

    I'll work on making the rest of MSYS symlink-aware, and I'm dumping the patch
    update here just in case someone has more spare time than I do, and to let
    people test it out.

    Btw, MSYS' newlib has almost no Unicode support, and the symlink code needs
    wcscpy, wcscat and wcslen. These are currently replaced with lstrcpyW,
    lstrcatW and lstrlenW, but as those are only available since Windows 2000, it
    would be nice to provide proper implementations. Also note that
    syscalls.cc already contains implementations of getw, putw, wcscmp, wcslen
    and wprintf - some of them are not properly implemented.
    Should Unicode support come from a recent version of newlib or from mingwrt?
    Am I missing anything?
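
For scale, the three wide-character functions the symlink code needs are small enough to write directly. The versions below are only a sketch with hypothetical sketch_ prefixed names; they are not the newlib or mingwrt implementations, and like the standard functions they assume valid NUL-terminated input and a destination buffer that is large enough.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a NUL-terminated wide string, not counting the terminator. */
size_t sketch_wcslen(const wchar_t *s)
{
    const wchar_t *p = s;

    assert(s != NULL);
    while (*p != L'\0')
        p++;
    return (size_t) (p - s);
}

/* Copy src, including its terminator, into dst; returns dst. */
wchar_t *sketch_wcscpy(wchar_t *dst, const wchar_t *src)
{
    wchar_t *d = dst;

    assert(dst != NULL && src != NULL);
    while ((*d++ = *src++) != L'\0')
        ;
    return dst;
}

/* Append src to the end of dst; returns dst. */
wchar_t *sketch_wcscat(wchar_t *dst, const wchar_t *src)
{
    assert(dst != NULL && src != NULL);
    sketch_wcscpy(dst + sketch_wcslen(dst), src);
    return dst;
}
```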

     
  • Ladislav Michl
    2010-09-07

    It seems the whole path_conv code could be simplified considerably by actually
    deleting (not just disabling) the Cygwin symlink stuff. Unfortunately I haven't
    found time for it yet, as it needs extensive testing. Perhaps I'll start
    sending easy-to-review incremental patches.

    Meanwhile as a proof of concept I updated msys\rt\src\newlib\libc\string
    content from newlib-1.18.0 including relevant header files. This allows
    cleaning up msys\rt\src\winsup\cygwin from homebrew functions. I checked
    cygwin repository and they are using recent version of newlib. Question is
    if we should import whole newer newlib library or just a required part.
    Note that recent newlib is huge - 13.5MB. Either way I'm ready to provide
    patch.

     
  • Earnie Boyd
    2010-09-07

    > Question is
    > if we should import whole newer newlib library or just a required part.
    > Note that recent newlib is huge - 13.5MB. Either way I'm ready to provide
    > patch.

    From what I remember you can only update the pieces of newlib for the pieces of Cygwin being modified. The versions between the two go hand in hand. There has been a lot of "let us move this code from Cygwin to newlib" since MSYS forked.

     
  • Michael Gooch
    2012-08-26

    you could at the very least use the following command on supported systems by pushing it through cmd /c:

    MKLINK [[/D] | [/H] | [/J]] Link Target

            /D      Creates a directory symbolic link. Default is a file
                    symbolic link.
            /H      Creates a hard link instead of a symbolic link.
            /J      Creates a Directory Junction.
            Link    specifies the new symbolic link name.
            Target  specifies the path (relative or absolute) that the new link
                    refers to.

     
  • Earnie Boyd
    2012-08-27

    > you could at the very least use the following command on supported systems
    > by pushing it through cmd /c:
    >
    > MKLINK [[/D] | [/H] | [/J]] Link Target
    >
    > /D Creates a directory symbolic link. Default is a file
    > symbolic link.
    > /H Creates a hard link instead of a symbolic link.
    > /J Creates a Directory Junction.
    > Link specifies the new symbolic link name.
    > Target specifies the path (relative or absolute) that the new link
    > refers to.

    I do this manually with

    cmd //c mklink //d LINKbar DIRfoo

    You remove that link with

    cmd //c rmdir LINKbar

    If you're linking a directory be sure you use the /d option or you end up with a pointer to a file and it is broken. You can have a link to a file via

    cmd //c mklink LINKbar FILEfoo

    To remove this link you simply do

    rm -f LINKbar

    To find if a directory or a file is a link use cmd's dir command

    cmd //c dir //x LINKbar

     
  • Keith Marshall
    2012-08-27

    Sure, at CLI level a user may invoke the MKLINK command, when the platform supports it. However, that is NOT the way to make ln -s DTRT, when the platform provides symlink capability; the ONLY sane way to implement that is by calling the CreateSymbolicLink() function, via a hook in MSYS-1.0.DLL, when the system runtime DLL provides it.

    BTW, neither MKLINK nor CreateSymbolicLink() will succeed, even when available (on Vista anyway), for any user who is not an administrator, when he who is lacks the wit to grant the CreateSymbolicLink privilege to regular users; (*nix imposes no such draconian restriction on regular users). Thus, even when the CreateSymbolicLink() capability is supported, the MSYS-1.0.DLL interface will need to fall back to the previous way of doing things, in the event that the necessary privilege has not been granted.

     