I noticed the other day that Win32 is the only port one
of my applications doesn't have access to realpath()
on. So I wrote realpath() for it, and I might as well
submit it so it becomes part of MinGW and I don't have
to make a special ifdef __WIN32__ for it ;)
The realpath() I wrote should be standards conforming.
It uses GetFullPathNameA() from Win32 as the underlying
means as getting the resolved path. It sets errno
appropriately as needed and even translates Win32
errors to the appropriate errno.
It also makes sure the path exists (via a call to
stat()) and supports the glibc extension to allocate
the resolved path buffer if it was passed in as null to
the call to realpath().
My tests show the function works quite well, but if
someone finds an error with it, I'll be happy to fix it.
I give full permission to MinGW and it's users to use
it, if I messed up on the usage rights in the file,
I'll be happy to change it too.
-Nach
realpath() for Win32
Alternative implementation
Logged In: YES
user_id=823908
Hmm. I normally just use
#define realpath(N,R) _fullpath((R),(N),_MAX_PATH)
if I need realpath().
_fullpath() is a standard Microsoft function, which somewhat
more closely resembles realpath() than GetFullPathNameA()
does, although it still doesn't confirm that the referenced
entity either exists, or is accessible.
If you want a somewhat more SUSv3 conformant implementation
of realpath(), you might consider the more compact version
I've attached.
Thanks for your patch, but I will not add it to MinGW in its
present form. Note that for formal consideration, patches
should be accompanied by an appropriate ChangeLog entry; in
the case of a function such as realpath(), which is not a
standard Microsoft function, and therfore not documented on
MSDN, I would also like to see documentation, preferably in
the form of a manpage, describing the function's usage, and
any peculiarities of its behaviour in it's Win32 adaptation.
You should also include a patch, in `diff -u' format, to
add a suitable function prototype to an appropriate header file.
Also note that, when you assign software to the public
domain, then you implicitly relinquish any claim to
copyright over that code. Thus, the preamble to your code
should not claim copyright, while also asserting assignment
to the public domain; rather say `Public domain software:
written by M.S. Nach', or words to that effect, if you wish
to be identified as the author, or simply donate it anonymously.
Regards,
Keith.
Logged In: YES
user_id=695796
>Hmm. I normally just use
>
> #define realpath(N,R) _fullpath((R),(N),_MAX_PATH)
>
>if I need realpath().
>
>_fullpath() is a standard Microsoft function, which >somewhat
>more closely resembles realpath() than GetFullPathNameA()
>does, although it still doesn't confirm that the
>referenced
>entity either exists, or is accessible.
>
>If you want a somewhat more SUSv3 conformant implementation
>of realpath(), you might consider the more compact version
>I've attached.
That's very nice, but your implementation isn't standards
conformant, close doesn't cut it. The one I've written makes
sure the entity does exist and is accessable. Mine also sets
errno properly.
>in the case of a function such as realpath(), which is not
>a standard Microsoft function, and therfore not documented
>on MSDN, I would also like to see documentation, preferably
>in the form of a manpage
I thought you have access to the manpage for realpath, I'm
attaching the Linux one I have which is basically what I cloned.
>and any peculiarities of its behaviour in it's Win32
>adaptation.
In the case of Win32, the ELOOP won't ever be returned by my
implementation since it doesn't do any checks for symbolic
links. As far as I know, Win32 currently doesn't use
symbolic links.
>You should also include a patch, in `diff -u' format, to
>add a suitable function prototype to an appropriate header
>file.
Not a problem.
--- /usr/i586-mingw32/include/stdlib.h 2004-09-28
22:56:17.000000000 +0200
+++ stdlib.h 2005-12-14 15:25:07.000000000 +0200
@@ -407,6 +407,7 @@
_CRTIMP void __cdecl _makepath (char*, const char*, const
char*, const char*, const char*);
_CRTIMP void __cdecl _splitpath (const char*, char*,
char*, char*, char*);
_CRTIMP char* __cdecl _fullpath (char*, const char*, size_t);
+_CRTIMP char* __cdecl realpath(const char *path, char
resolved_path[PATH_MAX]);
_CRTIMP char* __cdecl _itoa (int, char*, int);
_CRTIMP char* __cdecl _ltoa (long, char*, int);
>Also note that, when you assign software to the public
>domain, then you implicitly relinquish any claim to
>copyright over that code. Thus, the preamble to your code
>should not claim copyright, while also asserting assignment
>to the public domain; rather say `Public domain software:
>written by M.S. Nach', or words to that effect, if you
>wish to be identified as the author, or simply donate it
>anonymously.
Fine, I'll resubmit the file with a different header.
-Nach
Realpath Manpage
realpath() for Win32 with corrected license
Logged In: YES
user_id=1040098
I'd just like to comment on your implementation n-a-c-h.
* Firstly, the comment about ELOOP is interesting. Yes,
current NTFS has limited support for both soft and hard
links, which almost no Windows applications handle
correctly, leading to all sorts of weird problems,
including security problems. I wonder if realpath() for
Win32 should do anything special. Furthermore, Vista will
have more pronounced symlink support; I wonder what this
will change. Anyway, there should be some documented
rationale, at the very least, for whatever decision is
made here. On the other hand, POSIX guarantees no
particular semantics with regard to symbolic links.
Still, something reasonable should happen.
* The 'glibc extension' to handle a null resolved_filename
is inviolation of the standard. POSIX states that it
shall return EINVAL in that case.
* You may consider adding 'restrict' as appropriate,
conditionalized on C99.
* Is PATH_MAX really the right constant to use here? In
POSIX, there is no particular compile-time constant
corresponding to the maximum path length, which may be
infinity.
* The behavior to malloc() a new buffer when PATH_MAX is
not big enough is incorrect. This will almost certainly
cause memory leaks. Outside of doing strlen() on the
return value, how can client code determine whether malloc
has been used, and consequently if free() needs to be
called? Besides, theres no particular guarantee that the
client pay attention to the return value of realpath() at
all.
Logged In: YES
user_id=695796
>* The 'glibc extension' to handle a null
>resolved_filename
>is inviolation of the standard. POSIX states that it
>shall return EINVAL in that case.
Yes that is correct, I only added it because I saw an
argument on the MinGW mailing list somewhere about
implementing this a few months back, and one of the devs
mentioned they wanted the glibc extension.
It should be noted in the case where the extension fails, it
does return EINVAL in my implementation.
>* Is PATH_MAX really the right constant to use here? In
>POSIX, there is no particular compile-time constant
>corresponding to the maximum path length, which may be
>infinity.
That I'm not sure about, hence why I asked people to comment
on errors and possible solutions with it.
>* You may consider adding 'restrict' as appropriate,
>conditionalized on C99.
Where here is restrict appropriate? I honestly never looked
much into that C99 addition. I'd be happy to hear more
information on it.
>* The behavior to malloc() a new buffer when PATH_MAX is
>not big enough is incorrect. This will almost certainly
>cause memory leaks. Outside of doing strlen() on the
>return value, how can client code determine whether malloc
>has been used, and consequently if free() needs to be
>called? Besides, theres no particular guarantee that the
>client pay attention to the return value of realpath() at
>all.
Perhaps you didn't understand what I did.
The behavior to malloc a new buffer is when malloc(PATH_MAX)
wasn't big enough. That is the behavior I saw somewhere on
MSDN for using GetFullPathName*(). The user knows he has to
call free, because he called realpath() with a 0 as his
second parameter. I don't see how this would cause a memory
leak. The worry about him requesting to allocate memory then
not checking or freeing it is the same as if someone were to
write:
malloc(5000);
Or
strdup(mystr);
We never worry about that, so I don't see the problem here.
I'll be happy to see further link dicussions. Although the
available C runtime for Windows seems very limited in it's
support for things in these areas. Things like stat()'s
st_ino are always 0 preventing checks to see if two files
are the same, etc...
-Nach
Logged In: YES
user_id=1040098
> Where here is restrict appropriate? I honestly
> never looked much into that C99 addition. I'd be
> happy to hear more information on it.
In the general case, where (by the interface) two pointer
parameters shall not alias. It provides a performance
boost in some cases.
In this specific case, POSIX (the modern version which is
based on C99) specifies that both parameters shall be
restrict-qualified.
> Perhaps you didn't understand what I did.
You're right; I didn't. Nonetheless, I am still concerned
about how this behavior that differs from standard-
mandated behavior.
Another interesting point is that most software that needs
this function is broken or legacy. Without some operating
system-specific guarantee, the standard version of this
function is difficult to use in a strictly-conforming
manner. This, combined with how past implementations
(including glibc's) have invariably had multiple serious
bugs, means most software ships their own implementation
of this. I am curious what software there is that depends
specifically on glibc's extensions.
Logged In: YES
user_id=695796
>Another interesting point is that most software that needs
>this function is broken or legacy. Without some operating
>system-specific guarantee, the standard version of this
>function is difficult to use in a strictly-conforming
>manner. This, combined with how past implementations
>(including glibc's) have invariably had multiple serious
>bugs, means most software ships their own implementation
>of this. I am curious what software there is that depends
>specifically on glibc's extensions.
I have a program which I currently have it running and
tested on at least MS-DOS, Windows, Linux, FreeBSD, and Mac
OS X. It doesn't use the glibc extension (which personally
makes no sense - unless it becomes part of POSIX), and works
fine everywhere. My only problem was that I had to make my
own for Windows, everything else worked fine.
As to why I use it; basically my program sometimes needs to
launch another program passing it a file, since the program
to launch could be located in one directory and the file in
another and the current directory of the running program
somewhere else, I pass the absolute path of the file to the
program being launched.
-Nach
Logged In: YES
user_id=823908
Can you provide a simple test case, which will demonstrate
why your implementation is any more standards conformant
than mine?
If you look closely, you will note that mine also *does*
confirm that the entity both exists and is accessible, by
use of an access() call for the *input* path name -- which
is what SUSv3 requires to be checked -- and this will also
set errno appropriately.
Some further comments:--
* the extension to support auto-allocation of a return
buffer is *not* in violation of SUSv3 -- this is now allowed
as an implementation dependent option, and may even be
mandated in a future version. Nevertheless, since it isn't
yet mandatory, and certainly isn't portable, it is probably
better not to support it in a current implementation.
* If we agree that the above extension is not to be
supported, then a realpath implementation can become even
simpler than my earlier proposal; all that is required is to
check *both* arguments are non-NULL, setting EINVAL if
either fails; if both pass, perform an access() check on the
input path name, returning with errno set by access() on
failure; if this also passes, return its _fullpath()
resolution, also setting errno appropriately, if this
happens to be NULL.
* The ELOOP status is meaningless on Win32, which AFAIK does
*not* support any form of symbolic link. NTFS supports hard
links, but there can be no concept of circular link
references with a hard link. The nearest Win32 gets to
providing a symbolic link is a "shortcut", but a Win32
shortcut is *not* a symbolic link, in the *nix sense;
therefore, there are no circumstances under which a Win32
implementation of realpath() should need to raise an ELOOP
condition.
* The correct -- Microsoft defined -- constant for
specifying the required buffer size is _MAX_PATH, which is
defined in stdlib.h.
* I don't think it appropriate to include
`resolved_path[PATH_MAX]' in the function prototype -- it
should be as specified by SUSv3, simply as `char *'; (you
have no guarantee that the user will include any header
which defines PATH_MAX, so your prototype may be broken, at
the point of reference.
* I *did* try the restrict qualifier, as specified in SUSv3,
for my implementation, but my MinGW gcc 3.3.2 choked on it;
perhaps I didn't get the usage right, as I've never used it
before -- I'll need to explore this further.
* simply providing an exact copy of a Linux manpage is *not*
appropriate, as documentation for any Win32 function
implementation; the manpage must be adapted to correctly
reflect the details, and limitations, which are specific to
the Win32 implementation.
Regards,
Keith.
Logged In: YES
user_id=823908
I've now modified my implementation to use `restrict' for
the two arguments, as specified by SUSv3.
I've also provided a trivial test case, which proves that my
implementation *does* handle all of those error conditions
specified by SUSv3, which make sense for the Win32
environment, and *does* set errno appropriately, (with the
exception of passing a NULL pointer for the input path name
argument).
I've not removed the capability for allocating a return
buffer, if it's passed as NULL by the caller; a further
reading of SUSv3 confirms:
* This is *entirely* standards *conformant*, for the
standard explicitly states that handling of this condition
is implementation dependent.
* The realpath function itself is *not* a POSIX standard; it
is an X/Open System Interfaces Extension to IEEE 1003.1-2004.
It may also be noted that allowing for auto-allocation of a
return buffer makes the implementation more robust, in spite
of the lack of portability of this behaviour; I'm not
convinced that it should be excluded.
Regards,
Keith.
Revised implementation with test case
Logged In: YES
user_id=1040098
HI Keith,
> the extension to support auto-allocation of a return
> buffer is *not* in violation of SUSv3
Can you tell me where you are reading this? I've looked
at both SUSv3 (Issue 6) and IEEE 1003.1 (2004), and none
of them mention implementation-defined behavior here.
What I'm reading says, "The realpath() function shall fail
if ... [EINVAL] The file_name argument is a null
pointer." Note that it is in the 'shall' section, not
the 'may' section, the latter of which denotes optional
implementation-defined behavior. Is there somewhere else
in the standard that this is discussed, or a newer version?
You're right that this isn't part of core POSIX; however,
I am skeptical of adding XSI-specified functions contrary
to the specifications of XSI, regardless of what any other
single particular implementation (glibc2) chooses to do.
> The ELOOP status is meaningless on Win32, which
> AFAIK does *not* support any form of symbolic link
Current NTFS is able to partially support them
through "reparse points." A reparse point is a special
file that has additional filesystem-specific semantics
that are handled by the back-end filesystem driver or
filter. Unfortunately, complete documentation is only
availible as part of the IFS kit, which is quite expensive.
While reparse points themselves are fully capable of doing
anything symbolic links can, presently (in WinXP) NTFS is
only capable of supporting directory symbolic links.
(Note that these are soft links, not hard links.) They
are commonly referred to as 'junction points,' and there
is an official Microsoft tool floating around for
manipulating them, as well as several unofficial.
I actually have no idea what GetFullPathName() does when
it hits a loop in reparse points, although I suspect it
sets a particular GetLastError() value that can be
translated into an errno value. An easy way to figure out
would be to set one of these up, and see what happens. :)
Unfortunately, I'm about to leave on vacation, so I can't
follow up with this.
Since ELOOP *is* a manditory error, I think it is somewhat
important that this issue is handled, if only in
documentation saying it is not supported.
Regarding _MAX_PATH, I'm not really sure why it would be
any more appropriate than MAX_PATH. Actually, I was
informed the other day that if MAX_PATH is defined by the
implementation, then it is considered to be the value of
{MAX_PATH}. (It's the case where its not defined that
realpath() goes to crap.)
I haven't looked at Keith's implementation because I don't
have access to gzip here, but I would suspect that once
the correct semantics are decided-on, the two
implementations would converge.
Logged In: YES
user_id=1040098
Scratch my comments regarding resolved_name being null.
As should be obvious now, I was looking at the wrong
parameter. :)
POSIX says, "If resolved_name is a null pointer, the
behavior of realpath() is implementation-defined."
And, since there is obviously a significant amount of
software that depends on the resolved_name==0 feature
(despite this probably being a bad idea), I'd say that I
support it being implemented, so long as it is documented.
Logged In: YES
user_id=823908
Just for clarification: when I said previously that I
didn't check for a NULL input file_name pointer, I was
referring only to the Q&D test case I threw together. My
actual realpath() implementation does start out by checking
for this condition, returning NULL with errno = EINVAL, if
it finds it. I simply neglected to include a test case to
prove correct operation of this -- should be trivial to add
though.
Regarding NTFS Reparse Points
-----------------------------
I confess that I hadn't considered the possible effect of
reparse points, and their passing resemblance to symbolic
links. How will Microsoft's own access() and _fullpath()
functions behave, if presented with a path name which
dereferences a circularly defined reparse point? Will they
set errno to ELOOP, as they should, or provide some other
indication of the condition? (Sorry, but I don't see how I
can test that myself, cross-compiling from GNU/Linux and
running any test under WINE -- my underlying file system is
ext2, not NTFS).
If Microsoft's runtime routines provide suitable ELOOP
detection support, then any MinGW implementation of
realpath() should exploit it, on those Win32 variants where
it may be applicable; otherwise, we should simply document,
in the manpage, that the condition is either not relevant,
or detection of it is not supported.
It is, perhaps, worth noting that neither MSYS nor MinGW
provide any specific mechanism for identifying, nor for
manipulating reparse points; it is, of course, possible that
a user written MinGW application could do so, through an
appropriate Microsoft API.
Regarding the Handling of a NULL Return Arg Pointer
---------------------------------------------------
I wonder if there really is any software which relies on the
automatic allocation of a return buffer, if passed a NULL
pointer for the `resolved_path' argument; any software
written this way would not be robust, since there is no
guarantee that the feature is supported by any target
implementation of realpath(). That said, however, I think
that supporting this feature does make for a more robust
implementation, since it absolves the programmer from the
chore of deciding what buffer size may be necessary --
unless you are also going to provide an implementation of
pathconf().
There must come a point where we stop trying to make
MS-Windows behave like a POSIX system -- maybe realpath() is
already a step too far. Microsoft themselves provide
_fullpath(), which may be used in place of realpath(),
unless you require the check for existence and accessibility
of the referenced entity, which needs the addition of an
access() call. _fullpath() *does* support the allocation of
a return buffer if passed a NULL pointer, in which case it
allocates _MAX_PATH bytes, where _MAX_PATH is defined, (by
Microsoft, according to my interpretation of the relevant
MSDN documentation), in stdlib.h, together with the
prototype for _fullpath(), (and MinGW reproduces this).
Since my implementation of realpath() uses _fullpath(), I
suggest that _MAX_PATH is the appropriate buffer size to
request, in preference to PATH_MAX, which MinGW defines in
limits.h, and Microsoft may not define at all. (BTW,
current MinGW defines PATH_MAX as 259, whereas _MAX_PATH is
defined as 260 -- I don't know why).
Just one further point: realpath() is a buffer overrun
waiting to happen. The function design is flawed, since it
provides no mechanism for preventing the returned string
from growing beyond the end of the user supplied buffer --
the function has no way of knowing what size that buffer is.
In this respect, Microsoft's _fullpath() interface is much
better designed, since it provides a third parameter to
specify the buffer size.
Regards,
Keith.
Logged In: YES
user_id=1040098
Sounds good. I will look at the reparse point loop stuff
when I return from vacation in about two weeks, unless
someone else posts some info about this before me.
I have a request, though. I think someone should figure
out what the deal with PATH_MAX is, and if at all
possible, get it in alignment with _MAX_PATH. Not
necessarily the same.. just in alignment, if they have
different semantics.
I think this is important, because POSIX says that
PATH_MAX is {PATH_MAX} if it is defined, and {PATH_MAX} is
what realpath() is required to use.
Logged In: YES
user_id=823908
Agreed, we should resolve the inconsistency between PATH_MAX
and _MAX_PATH. I'll pursue the issue on mingw-dvlpr.
Thanks, Aaron, for your offer to investigate the reparse
point feature. It will be great if you can follow this up.
Regards,
Keith.
Where do we stand with this patch?
> Where do we stand with this patch?
As we left it 4 years ago; no one has provided any further feed back on the likely impact of a junction point loop.
WRT the PATH_MAX issue, I think that should be correctly redefined as 260; I believe Cygwin made that change some time ago.
An appropriately written manpage is still outstanding. There seems to be little demand for addition of realpath(), and it definitely should not be progressed further, without appropriate (implementation specific) documentation.
junction point loops are definitely possible. I have created them. Since we don't wish to create a UNIX runtime then I think we should abandon this patch and mark it as won't fix.
Even if we abandon the patch, we should still address the _MAX_PATH vs. PATH_MAX anomaly; both should have the same value (260). (POSIX is explicit: PATH_MAX is the maximum number of characters allowed in a path name, *including* the space required to accommodate the terminating NUL; making it less than _MAX_PATH exposes a buffer overrun potential, due to insufficient buffer allocation of PATH_MAX bytes, when client code expects to have _MAX_PATH bytes available).
It would be a shame to completely drop this implementation. We may agree that there is insufficient justification for including it within libmingwex.a, but there could still be mileage in adding the prototype to stdlib.h, with a comment to the effect that the user needs to provide the implementation. Then we could create a "cookbook" of useful resources, (perhaps as a CVS module on SF, or free-standing on MinGW.org), and store the source code there, as an example of a suitable implementation, (with a FIXME added, to highlight the lack of handling for possible junction point loops).
Just thinking aloud...
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 30 days (the time period specified by
the administrator of this Tracker).