#256 Full implementation of realpath() for Win32

closed-wont-fix
nobody
2014-08-26
2005-09-17
Nach M. S.
No

I noticed the other day that Win32 is the only port one
of my applications doesn't have access to realpath()
on. So I wrote realpath() for it, and I might as well
submit it so it becomes part of MinGW and I don't have
to make a special ifdef __WIN32__ for it ;)

The realpath() I wrote should be standards conforming.
It uses GetFullPathNameA() from Win32 as the underlying
means as getting the resolved path. It sets errno
appropriately as needed and even translates Win32
errors to the appropriate errno.

It also makes sure the path exists (via a call to
stat()) and supports the glibc extension to allocate
the resolved path buffer if it was passed in as null to
the call to realpath().

My tests show the function works quite well, but if
someone finds an error with it, I'll be happy to fix it.

I give full permission to MinGW and it's users to use
it, if I messed up on the usage rights in the file,
I'll be happy to change it too.

-Nach

Related

Issues: #2207

Discussion

  • Nach M. S.
    Nach M. S.
    2005-09-17

    realpath() for Win32

     
    Attachments
  • Keith Marshall
    Keith Marshall
    2005-12-02

    Alternative implementation

     
    Attachments
  • Keith Marshall
    Keith Marshall
    2005-12-02

    Logged In: YES
    user_id=823908

    Hmm. I normally just use

    #define realpath(N,R) _fullpath((R),(N),_MAX_PATH)

    if I need realpath().

    _fullpath() is a standard Microsoft function, which somewhat
    more closely resembles realpath() than GetFullPathNameA()
    does, although it still doesn't confirm that the referenced
    entity either exists, or is accessible.

    If you want a somewhat more SUSv3 conformant implementation
    of realpath(), you might consider the more compact version
    I've attached.

    Thanks for your patch, but I will not add it to MinGW in its
    present form. Note that for formal consideration, patches
    should be accompanied by an appropriate ChangeLog entry; in
    the case of a function such as realpath(), which is not a
    standard Microsoft function, and therfore not documented on
    MSDN, I would also like to see documentation, preferably in
    the form of a manpage, describing the function's usage, and
    any peculiarities of its behaviour in it's Win32 adaptation.
    You should also include a patch, in `diff -u' format, to
    add a suitable function prototype to an appropriate header file.

    Also note that, when you assign software to the public
    domain, then you implicitly relinquish any claim to
    copyright over that code. Thus, the preamble to your code
    should not claim copyright, while also asserting assignment
    to the public domain; rather say `Public domain software:
    written by M.S. Nach', or words to that effect, if you wish
    to be identified as the author, or simply donate it anonymously.

    Regards,
    Keith.

     
  • Nach M. S.
    Nach M. S.
    2005-12-14

    Logged In: YES
    user_id=695796

    >Hmm. I normally just use
    >
    > #define realpath(N,R) _fullpath((R),(N),_MAX_PATH)
    >
    >if I need realpath().
    >
    >_fullpath() is a standard Microsoft function, which >somewhat
    >more closely resembles realpath() than GetFullPathNameA()
    >does, although it still doesn't confirm that the
    >referenced
    >entity either exists, or is accessible.
    >
    >If you want a somewhat more SUSv3 conformant implementation
    >of realpath(), you might consider the more compact version
    >I've attached.
    That's very nice, but your implementation isn't standards
    conformant, close doesn't cut it. The one I've written makes
    sure the entity does exist and is accessable. Mine also sets
    errno properly.

    >in the case of a function such as realpath(), which is not
    >a standard Microsoft function, and therfore not documented
    >on MSDN, I would also like to see documentation, preferably
    >in the form of a manpage

    I thought you have access to the manpage for realpath, I'm
    attaching the Linux one I have which is basically what I cloned.

    >and any peculiarities of its behaviour in it's Win32
    >adaptation.

    In the case of Win32, the ELOOP won't ever be returned by my
    implementation since it doesn't do any checks for symbolic
    links. As far as I know, Win32 currently doesn't use
    symbolic links.

    >You should also include a patch, in `diff -u' format, to
    >add a suitable function prototype to an appropriate header
    >file.
    Not a problem.

    --- /usr/i586-mingw32/include/stdlib.h 2004-09-28
    22:56:17.000000000 +0200
    +++ stdlib.h 2005-12-14 15:25:07.000000000 +0200
    @@ -407,6 +407,7 @@
    _CRTIMP void __cdecl _makepath (char*, const char*, const
    char*, const char*, const char*);
    _CRTIMP void __cdecl _splitpath (const char*, char*,
    char*, char*, char*);
    _CRTIMP char* __cdecl _fullpath (char*, const char*, size_t);
    +_CRTIMP char* __cdecl realpath(const char *path, char
    resolved_path[PATH_MAX]);

    _CRTIMP char* __cdecl _itoa (int, char*, int);
    _CRTIMP char* __cdecl _ltoa (long, char*, int);

    >Also note that, when you assign software to the public
    >domain, then you implicitly relinquish any claim to
    >copyright over that code. Thus, the preamble to your code
    >should not claim copyright, while also asserting assignment
    >to the public domain; rather say `Public domain software:
    >written by M.S. Nach', or words to that effect, if you
    >wish to be identified as the author, or simply donate it
    >anonymously.

    Fine, I'll resubmit the file with a different header.

    -Nach

     
  • Nach M. S.
    Nach M. S.
    2005-12-14

    Realpath Manpage

     
    Attachments
  • Nach M. S.
    Nach M. S.
    2005-12-14

    realpath() for Win32 with corrected license

     
    Attachments
  • Logged In: YES
    user_id=1040098

    I'd just like to comment on your implementation n-a-c-h.

    * Firstly, the comment about ELOOP is interesting. Yes,
    current NTFS has limited support for both soft and hard
    links, which almost no Windows applications handle
    correctly, leading to all sorts of weird problems,
    including security problems. I wonder if realpath() for
    Win32 should do anything special. Furthermore, Vista will
    have more pronounced symlink support; I wonder what this
    will change. Anyway, there should be some documented
    rationale, at the very least, for whatever decision is
    made here. On the other hand, POSIX guarantees no
    particular semantics with regard to symbolic links.
    Still, something reasonable should happen.

    * The 'glibc extension' to handle a null resolved_filename
    is inviolation of the standard. POSIX states that it
    shall return EINVAL in that case.

    * You may consider adding 'restrict' as appropriate,
    conditionalized on C99.

    * Is PATH_MAX really the right constant to use here? In
    POSIX, there is no particular compile-time constant
    corresponding to the maximum path length, which may be
    infinity.

    * The behavior to malloc() a new buffer when PATH_MAX is
    not big enough is incorrect. This will almost certainly
    cause memory leaks. Outside of doing strlen() on the
    return value, how can client code determine whether malloc
    has been used, and consequently if free() needs to be
    called? Besides, theres no particular guarantee that the
    client pay attention to the return value of realpath() at
    all.

     
  • Nach M. S.
    Nach M. S.
    2005-12-15

    Logged In: YES
    user_id=695796

    >* The 'glibc extension' to handle a null
    >resolved_filename
    >is inviolation of the standard. POSIX states that it
    >shall return EINVAL in that case.
    Yes that is correct, I only added it because I saw an
    argument on the MinGW mailing list somewhere about
    implementing this a few months back, and one of the devs
    mentioned they wanted the glibc extension.
    It should be noted in the case where the extension fails, it
    does return EINVAL in my implementation.

    >* Is PATH_MAX really the right constant to use here? In
    >POSIX, there is no particular compile-time constant
    >corresponding to the maximum path length, which may be
    >infinity.
    That I'm not sure about, hence why I asked people to comment
    on errors and possible solutions with it.

    >* You may consider adding 'restrict' as appropriate,
    >conditionalized on C99.
    Where here is restrict appropriate? I honestly never looked
    much into that C99 addition. I'd be happy to hear more
    information on it.

    >* The behavior to malloc() a new buffer when PATH_MAX is
    >not big enough is incorrect. This will almost certainly
    >cause memory leaks. Outside of doing strlen() on the
    >return value, how can client code determine whether malloc
    >has been used, and consequently if free() needs to be
    >called? Besides, theres no particular guarantee that the
    >client pay attention to the return value of realpath() at
    >all.
    Perhaps you didn't understand what I did.
    The behavior to malloc a new buffer is when malloc(PATH_MAX)
    wasn't big enough. That is the behavior I saw somewhere on
    MSDN for using GetFullPathName*(). The user knows he has to
    call free, because he called realpath() with a 0 as his
    second parameter. I don't see how this would cause a memory
    leak. The worry about him requesting to allocate memory then
    not checking or freeing it is the same as if someone were to
    write:
    malloc(5000);
    Or
    strdup(mystr);
    We never worry about that, so I don't see the problem here.

    I'll be happy to see further link dicussions. Although the
    available C runtime for Windows seems very limited in it's
    support for things in these areas. Things like stat()'s
    st_ino are always 0 preventing checks to see if two files
    are the same, etc...

    -Nach

     
  • Logged In: YES
    user_id=1040098

    > Where here is restrict appropriate? I honestly
    > never looked much into that C99 addition. I'd be
    > happy to hear more information on it.

    In the general case, where (by the interface) two pointer
    parameters shall not alias. It provides a performance
    boost in some cases.

    In this specific case, POSIX (the modern version which is
    based on C99) specifies that both parameters shall be
    restrict-qualified.

    > Perhaps you didn't understand what I did.

    You're right; I didn't. Nonetheless, I am still concerned
    about how this behavior that differs from standard-
    mandated behavior.

    Another interesting point is that most software that needs
    this function is broken or legacy. Without some operating
    system-specific guarantee, the standard version of this
    function is difficult to use in a strictly-conforming
    manner. This, combined with how past implementations
    (including glibc's) have invariably had multiple serious
    bugs, means most software ships their own implementation
    of this. I am curious what software there is that depends
    specifically on glibc's extensions.

     
  • Nach M. S.
    Nach M. S.
    2005-12-15

    Logged In: YES
    user_id=695796

    >Another interesting point is that most software that needs
    >this function is broken or legacy. Without some operating
    >system-specific guarantee, the standard version of this
    >function is difficult to use in a strictly-conforming
    >manner. This, combined with how past implementations
    >(including glibc's) have invariably had multiple serious
    >bugs, means most software ships their own implementation
    >of this. I am curious what software there is that depends
    >specifically on glibc's extensions.

    I have a program which I currently have it running and
    tested on at least MS-DOS, Windows, Linux, FreeBSD, and Mac
    OS X. It doesn't use the glibc extension (which personally
    makes no sense - unless it becomes part of POSIX), and works
    fine everywhere. My only problem was that I had to make my
    own for Windows, everything else worked fine.

    As to why I use it; basically my program sometimes needs to
    launch another program passing it a file, since the program
    to launch could be located in one directory and the file in
    another and the current directory of the running program
    somewhere else, I pass the absolute path of the file to the
    program being launched.

    -Nach

     
  • Keith Marshall
    Keith Marshall
    2005-12-19

    Logged In: YES
    user_id=823908

    Can you provide a simple test case, which will demonstrate
    why your implementation is any more standards conformant
    than mine?

    If you look closely, you will note that mine also *does*
    confirm that the entity both exists and is accessible, by
    use of an access() call for the *input* path name -- which
    is what SUSv3 requires to be checked -- and this will also
    set errno appropriately.

    Some further comments:--

    * the extension to support auto-allocation of a return
    buffer is *not* in violation of SUSv3 -- this is now allowed
    as an implementation dependent option, and may even be
    mandated in a future version. Nevertheless, since it isn't
    yet mandatory, and certainly isn't portable, it is probably
    better not to support it in a current implementation.

    * If we agree that the above extension is not to be
    supported, then a realpath implementation can become even
    simpler than my earlier proposal; all that is required is to
    check *both* arguments are non-NULL, setting EINVAL if
    either fails; if both pass, perform an access() check on the
    input path name, returning with errno set by access() on
    failure; if this also passes, return its _fullpath()
    resolution, also setting errno appropriately, if this
    happens to be NULL.

    * The ELOOP status is meaningless on Win32, which AFAIK does
    *not* support any form of symbolic link. NTFS supports hard
    links, but there can be no concept of circular link
    references with a hard link. The nearest Win32 gets to
    providing a symbolic link is a "shortcut", but a Win32
    shortcut is *not* a symbolic link, in the *nix sense;
    therefore, there are no circumstances under which a Win32
    implementation of realpath() should need to raise an ELOOP
    condition.

    * The correct -- Microsoft defined -- constant for
    specifying the required buffer size is _MAX_PATH, which is
    defined in stdlib.h.

    * I don't think it appropriate to include
    `resolved_path[PATH_MAX]' in the function prototype -- it
    should be as specified by SUSv3, simply as `char *'; (you
    have no guarantee that the user will include any header
    which defines PATH_MAX, so your prototype may be broken, at
    the point of reference.

    * I *did* try the restrict qualifier, as specified in SUSv3,
    for my implementation, but my MinGW gcc 3.3.2 choked on it;
    perhaps I didn't get the usage right, as I've never used it
    before -- I'll need to explore this further.

    * simply providing an exact copy of a Linux manpage is *not*
    appropriate, as documentation for any Win32 function
    implementation; the manpage must be adapted to correctly
    reflect the details, and limitations, which are specific to
    the Win32 implementation.

    Regards,
    Keith.

     
  • Keith Marshall
    Keith Marshall
    2005-12-20

    Logged In: YES
    user_id=823908

    I've now modified my implementation to use `restrict' for
    the two arguments, as specified by SUSv3.

    I've also provided a trivial test case, which proves that my
    implementation *does* handle all of those error conditions
    specified by SUSv3, which make sense for the Win32
    environment, and *does* set errno appropriately, (with the
    exception of passing a NULL pointer for the input path name
    argument).

    I've not removed the capability for allocating a return
    buffer, if it's passed as NULL by the caller; a further
    reading of SUSv3 confirms:

    * This is *entirely* standards *conformant*, for the
    standard explicitly states that handling of this condition
    is implementation dependent.

    * The realpath function itself is *not* a POSIX standard; it
    is an X/Open System Interfaces Extension to IEEE 1003.1-2004.

    It may also be noted that allowing for auto-allocation of a
    return buffer makes the implementation more robust, in spite
    of the lack of portability of this behaviour; I'm not
    convinced that it should be excluded.

    Regards,
    Keith.

     
  • Keith Marshall
    Keith Marshall
    2005-12-20

    Revised implementation with test case

     
    Attachments
  • Logged In: YES
    user_id=1040098

    HI Keith,

    > the extension to support auto-allocation of a return
    > buffer is *not* in violation of SUSv3

    Can you tell me where you are reading this? I've looked
    at both SUSv3 (Issue 6) and IEEE 1003.1 (2004), and none
    of them mention implementation-defined behavior here.

    What I'm reading says, "The realpath() function shall fail
    if ... [EINVAL] The file_name argument is a null
    pointer." Note that it is in the 'shall' section, not
    the 'may' section, the latter of which denotes optional
    implementation-defined behavior. Is there somewhere else
    in the standard that this is discussed, or a newer version?

    You're right that this isn't part of core POSIX; however,
    I am skeptical of adding XSI-specified functions contrary
    to the specifications of XSI, regardless of what any other
    single particular implementation (glibc2) chooses to do.

    > The ELOOP status is meaningless on Win32, which
    > AFAIK does *not* support any form of symbolic link

    Current NTFS is able to partially support them
    through "reparse points." A reparse point is a special
    file that has additional filesystem-specific semantics
    that are handled by the back-end filesystem driver or
    filter. Unfortunately, complete documentation is only
    availible as part of the IFS kit, which is quite expensive.

    While reparse points themselves are fully capable of doing
    anything symbolic links can, presently (in WinXP) NTFS is
    only capable of supporting directory symbolic links.
    (Note that these are soft links, not hard links.) They
    are commonly referred to as 'junction points,' and there
    is an official Microsoft tool floating around for
    manipulating them, as well as several unofficial.

    I actually have no idea what GetFullPathName() does when
    it hits a loop in reparse points, although I suspect it
    sets a particular GetLastError() value that can be
    translated into an errno value. An easy way to figure out
    would be to set one of these up, and see what happens. :)
    Unfortunately, I'm about to leave on vacation, so I can't
    follow up with this.

    Since ELOOP *is* a manditory error, I think it is somewhat
    important that this issue is handled, if only in
    documentation saying it is not supported.

    Regarding _MAX_PATH, I'm not really sure why it would be
    any more appropriate than MAX_PATH. Actually, I was
    informed the other day that if MAX_PATH is defined by the
    implementation, then it is considered to be the value of
    {MAX_PATH}. (It's the case where its not defined that
    realpath() goes to crap.)

    I haven't looked at Keith's implementation because I don't
    have access to gzip here, but I would suspect that once
    the correct semantics are decided-on, the two
    implementations would converge.

     
  • Logged In: YES
    user_id=1040098

    Scratch my comments regarding resolved_name being null.
    As should be obvious now, I was looking at the wrong
    parameter. :)

    POSIX says, "If resolved_name is a null pointer, the
    behavior of realpath() is implementation-defined."

    And, since there is obviously a significant amount of
    software that depends on the resolved_name==0 feature
    (despite this probably being a bad idea), I'd say that I
    support it being implemented, so long as it is documented.

     
  • Keith Marshall
    Keith Marshall
    2005-12-22

    Logged In: YES
    user_id=823908

    Just for clarification: when I said previously that I
    didn't check for a NULL input file_name pointer, I was
    referring only to the Q&D test case I threw together. My
    actual realpath() implementation does start out by checking
    for this condition, returning NULL with errno = EINVAL, if
    it finds it. I simply neglected to include a test case to
    prove correct operation of this -- should be trivial to add
    though.

    Regarding NTFS Reparse Points
    -----------------------------

    I confess that I hadn't considered the possible effect of
    reparse points, and their passing resemblance to symbolic
    links. How will Microsoft's own access() and _fullpath()
    functions behave, if presented with a path name which
    dereferences a circularly defined reparse point? Will they
    set errno to ELOOP, as they should, or provide some other
    indication of the condition? (Sorry, but I don't see how I
    can test that myself, cross-compiling from GNU/Linux and
    running any test under WINE -- my underlying file system is
    ext2, not NTFS).

    If Microsoft's runtime routines provide suitable ELOOP
    detection support, then any MinGW implementation of
    realpath() should exploit it, on those Win32 variants where
    it may be applicable; otherwise, we should simply document,
    in the manpage, that the condition is either not relevant,
    or detection of it is not supported.

    It is, perhaps, worth noting that neither MSYS nor MinGW
    provide any specific mechanism for identifying, nor for
    manipulating reparse points; it is, of course, possible that
    a user written MinGW application could do so, through an
    appropriate Microsoft API.

    Regarding the Handling of a NULL Return Arg Pointer
    ---------------------------------------------------

    I wonder if there really is any software which relies on the
    automatic allocation of a return buffer, if passed a NULL
    pointer for the `resolved_path' argument; any software
    written this way would not be robust, since there is no
    guarantee that the feature is supported by any target
    implementation of realpath(). That said, however, I think
    that supporting this feature does make for a more robust
    implementation, since it absolves the programmer from the
    chore of deciding what buffer size may be necessary --
    unless you are also going to provide an implementation of
    pathconf().

    There must come a point where we stop trying to make
    MS-Windows behave like a POSIX system -- maybe realpath() is
    already a step too far. Microsoft themselves provide
    _fullpath(), which may be used in place of realpath(),
    unless you require the check for existence and accessibility
    of the referenced entity, which needs the addition of an
    access() call. _fullpath() *does* support the allocation of
    a return buffer if passed a NULL pointer, in which case it
    allocates _MAX_PATH bytes, where _MAX_PATH is defined, (by
    Microsoft, according to my interpretation of the relevant
    MSDN documentation), in stdlib.h, together with the
    prototype for _fullpath(), (and MinGW reproduces this).
    Since my implementation of realpath() uses _fullpath(), I
    suggest that _MAX_PATH is the appropriate buffer size to
    request, in preference to PATH_MAX, which MinGW defines in
    limits.h, and Microsoft may not define at all. (BTW,
    current MinGW defines PATH_MAX as 259, whereas _MAX_PATH is
    defined as 260 -- I don't know why).

    Just one further point: realpath() is a buffer overrun
    waiting to happen. The function design is flawed, since it
    provides no mechanism for preventing the returned string
    from growing beyond the end of the user supplied buffer --
    the function has no way of knowing what size that buffer is.
    In this respect, Microsoft's _fullpath() interface is much
    better designed, since it provides a third parameter to
    specify the buffer size.

    Regards,
    Keith.

     
  • Logged In: YES
    user_id=1040098

    Sounds good. I will look at the reparse point loop stuff
    when I return from vacation in about two weeks, unless
    someone else posts some info about this before me.

    I have a request, though. I think someone should figure
    out what the deal with PATH_MAX is, and if at all
    possible, get it in alignment with _MAX_PATH. Not
    necessarily the same.. just in alignment, if they have
    different semantics.

    I think this is important, because POSIX says that
    PATH_MAX is {PATH_MAX} if it is defined, and {PATH_MAX} is
    what realpath() is required to use.

     
  • Keith Marshall
    Keith Marshall
    2005-12-23

    Logged In: YES
    user_id=823908

    Agreed, we should resolve the inconsistency between PATH_MAX
    and _MAX_PATH. I'll pursue the issue on mingw-dvlpr.

    Thanks, Aaron, for your offer to investigate the reparse
    point feature. It will be great if you can follow this up.

    Regards,
    Keith.

     
  • Where do we stand with this patch?

     
  • Keith Marshall
    Keith Marshall
    2009-12-07

    > Where do we stand with this patch?

    As we left it 4 years ago; no one has provided any further feed back on the likely impact of a junction point loop.

    WRT the PATH_MAX issue, I think that should be correctly redefined as 260; I believe Cygwin made that change some time ago.

    An appropriately written manpage is still outstanding. There seems to be little demand for addition of realpath(), and it definitely should not be progressed further, without appropriate (implementation specific) documentation.

     
  • Earnie Boyd
    Earnie Boyd
    2009-12-09

    • status: open --> pending-wont-fix
     
  • Earnie Boyd
    Earnie Boyd
    2009-12-09

    junction point loops are definitely possible. I have created them. Since we don't wish to create a UNIX runtime then I think we should abandon this patch and mark it as won't fix.

     
  • Keith Marshall
    Keith Marshall
    2009-12-11

    Even if we abandon the patch, we should still address the _MAX_PATH vs. PATH_MAX anomaly; both should have the same value (260). (POSIX is explicit: PATH_MAX is the maximum number of characters allowed in a path name, *including* the space required to accommodate the terminating NUL; making it less than _MAX_PATH exposes a buffer overrun potential, due to insufficient buffer allocation of PATH_MAX bytes, when client code expects to have _MAX_PATH bytes available).

    It would be a shame to completely drop this implementation. We may agree that there is insufficient justification for including it within libmingwex.a, but there could still be mileage in adding the prototype to stdlib.h, with a comment to the effect that the user needs to provide the implementation. Then we could create a "cookbook" of useful resources, (perhaps as a CVS module on SF, or free-standing on MinGW.org), and store the source code there, as an example of a suitable implementation, (with a FIXME added, to highlight the lack of handling for possible junction point loops).

    Just thinking aloud...

     
  • This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 30 days (the time period specified by
    the administrator of this Tracker).

     
    • status: pending-wont-fix --> closed-wont-fix