From: SourceForge.net <no...@so...> - 2009-12-11 12:29:15
|
Patches item #1294010, was opened at 2005-09-17 19:29 Message generated for change (Comment added) made by keithmarshall You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=302435&aid=1294010&group_id=2435 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: runtime Group: None Status: Pending Resolution: Wont Fix Priority: 5 Private: No Submitted By: Nach M. S. (n-a-c-h) Assigned to: Nobody/Anonymous (nobody) Summary: Full implementation of realpath() for Win32 Initial Comment: I noticed the other day that Win32 is the only port one of my applications doesn't have access to realpath() on. So I wrote realpath() for it, and I might as well submit it so it becomes part of MinGW and I don't have to make a special ifdef __WIN32__ for it ;) The realpath() I wrote should be standards conforming. It uses GetFullPathNameA() from Win32 as the underlying means as getting the resolved path. It sets errno appropriately as needed and even translates Win32 errors to the appropriate errno. It also makes sure the path exists (via a call to stat()) and supports the glibc extension to allocate the resolved path buffer if it was passed in as null to the call to realpath(). My tests show the function works quite well, but if someone finds an error with it, I'll be happy to fix it. I give full permission to MinGW and it's users to use it, if I messed up on the usage rights in the file, I'll be happy to change it too. -Nach ---------------------------------------------------------------------- >Comment By: Keith Marshall (keithmarshall) Date: 2009-12-11 12:29 Message: Even if we abandon the patch, we should still address the _MAX_PATH vs. PATH_MAX anomaly; both should have the same value (260). (POSIX is explicit: PATH_MAX is the maximum number of characters allowed in a path name, *including* the space required to accommodate the terminating NUL; making it less than _MAX_PATH exposes a buffer overrun potential, due to insufficient buffer allocation of PATH_MAX bytes, when client code expects to have _MAX_PATH bytes available). It would be a shame to completely drop this implementation. We may agree that there is insufficient justification for including it within libmingwex.a, but there could still be mileage in adding the prototype to stdlib.h, with a comment to the effect that the user needs to provide the implementation. Then we could create a "cookbook" of useful resources, (perhaps as a CVS module on SF, or free-standing on MinGW.org), and store the source code there, as an example of a suitable implementation, (with a FIXME added, to highlight the lack of handling for possible junction point loops). Just thinking aloud... ---------------------------------------------------------------------- Comment By: Earnie Boyd (earnie) Date: 2009-12-09 17:29 Message: junction point loops are definitely possible. I have created them. Since we don't wish to create a UNIX runtime then I think we should abandon this patch and mark it as won't fix. ---------------------------------------------------------------------- Comment By: Keith Marshall (keithmarshall) Date: 2009-12-07 16:16 Message: > Where do we stand with this patch? As we left it 4 years ago; no one has provided any further feed back on the likely impact of a junction point loop. WRT the PATH_MAX issue, I think that should be correctly redefined as 260; I believe Cygwin made that change some time ago. An appropriately written manpage is still outstanding. There seems to be little demand for addition of realpath(), and it definitely should not be progressed further, without appropriate (implementation specific) documentation. ---------------------------------------------------------------------- Comment By: Chris Sutcliffe (ir0nh34d) Date: 2009-07-18 14:59 Message: Where do we stand with this patch? ---------------------------------------------------------------------- Comment By: Keith Marshall (keithmarshall) Date: 2005-12-23 12:44 Message: Logged In: YES user_id=823908 Agreed, we should resolve the inconsistency between PATH_MAX and _MAX_PATH. I'll pursue the issue on mingw-dvlpr. Thanks, Aaron, for your offer to investigate the reparse point feature. It will be great if you can follow this up. Regards, Keith. ---------------------------------------------------------------------- Comment By: Aaron W. LaFramboise (aaronwl) Date: 2005-12-22 17:34 Message: Logged In: YES user_id=1040098 Sounds good. I will look at the reparse point loop stuff when I return from vacation in about two weeks, unless someone else posts some info about this before me. I have a request, though. I think someone should figure out what the deal with PATH_MAX is, and if at all possible, get it in alignment with _MAX_PATH. Not necessarily the same.. just in alignment, if they have different semantics. I think this is important, because POSIX says that PATH_MAX is {PATH_MAX} if it is defined, and {PATH_MAX} is what realpath() is required to use. ---------------------------------------------------------------------- Comment By: Keith Marshall (keithmarshall) Date: 2005-12-22 15:00 Message: Logged In: YES user_id=823908 Just for clarification: when I said previously that I didn't check for a NULL input file_name pointer, I was referring only to the Q&D test case I threw together. My actual realpath() implementation does start out by checking for this condition, returning NULL with errno = EINVAL, if it finds it. I simply neglected to include a test case to prove correct operation of this -- should be trivial to add though. Regarding NTFS Reparse Points ----------------------------- I confess that I hadn't considered the possible effect of reparse points, and their passing resemblance to symbolic links. How will Microsoft's own access() and _fullpath() functions behave, if presented with a path name which dereferences a circularly defined reparse point? Will they set errno to ELOOP, as they should, or provide some other indication of the condition? (Sorry, but I don't see how I can test that myself, cross-compiling from GNU/Linux and running any test under WINE -- my underlying file system is ext2, not NTFS). If Microsoft's runtime routines provide suitable ELOOP detection support, then any MinGW implementation of realpath() should exploit it, on those Win32 variants where it may be applicable; otherwise, we should simply document, in the manpage, that the condition is either not relevant, or detection of it is not supported. It is, perhaps, worth noting that neither MSYS nor MinGW provide any specific mechanism for identifying, nor for manipulating reparse points; it is, of course, possible that a user written MinGW application could do so, through an appropriate Microsoft API. Regarding the Handling of a NULL Return Arg Pointer --------------------------------------------------- I wonder if there really is any software which relies on the automatic allocation of a return buffer, if passed a NULL pointer for the `resolved_path' argument; any software written this way would not be robust, since there is no guarantee that the feature is supported by any target implementation of realpath(). That said, however, I think that supporting this feature does make for a more robust implementation, since it absolves the programmer from the chore of deciding what buffer size may be necessary -- unless you are also going to provide an implementation of pathconf(). There must come a point where we stop trying to make MS-Windows behave like a POSIX system -- maybe realpath() is already a step too far. Microsoft themselves provide _fullpath(), which may be used in place of realpath(), unless you require the check for existence and accessibility of the referenced entity, which needs the addition of an access() call. _fullpath() *does* support the allocation of a return buffer if passed a NULL pointer, in which case it allocates _MAX_PATH bytes, where _MAX_PATH is defined, (by Microsoft, according to my interpretation of the relevant MSDN documentation), in stdlib.h, together with the prototype for _fullpath(), (and MinGW reproduces this). Since my implementation of realpath() uses _fullpath(), I suggest that _MAX_PATH is the appropriate buffer size to request, in preference to PATH_MAX, which MinGW defines in limits.h, and Microsoft may not define at all. (BTW, current MinGW defines PATH_MAX as 259, whereas _MAX_PATH is defined as 260 -- I don't know why). Just one further point: realpath() is a buffer overrun waiting to happen. The function design is flawed, since it provides no mechanism for preventing the returned string from growing beyond the end of the user supplied buffer -- the function has no way of knowing what size that buffer is. In this respect, Microsoft's _fullpath() interface is much better designed, since it provides a third parameter to specify the buffer size. Regards, Keith. ---------------------------------------------------------------------- Comment By: Aaron W. LaFramboise (aaronwl) Date: 2005-12-21 20:00 Message: Logged In: YES user_id=1040098 Scratch my comments regarding resolved_name being null. As should be obvious now, I was looking at the wrong parameter. :) POSIX says, "If resolved_name is a null pointer, the behavior of realpath() is implementation-defined." And, since there is obviously a significant amount of software that depends on the resolved_name==0 feature (despite this probably being a bad idea), I'd say that I support it being implemented, so long as it is documented. ---------------------------------------------------------------------- Comment By: Aaron W. LaFramboise (aaronwl) Date: 2005-12-21 19:55 Message: Logged In: YES user_id=1040098 HI Keith, > the extension to support auto-allocation of a return > buffer is *not* in violation of SUSv3 Can you tell me where you are reading this? I've looked at both SUSv3 (Issue 6) and IEEE 1003.1 (2004), and none of them mention implementation-defined behavior here. What I'm reading says, "The realpath() function shall fail if ... [EINVAL] The file_name argument is a null pointer." Note that it is in the 'shall' section, not the 'may' section, the latter of which denotes optional implementation-defined behavior. Is there somewhere else in the standard that this is discussed, or a newer version? You're right that this isn't part of core POSIX; however, I am skeptical of adding XSI-specified functions contrary to the specifications of XSI, regardless of what any other single particular implementation (glibc2) chooses to do. > The ELOOP status is meaningless on Win32, which > AFAIK does *not* support any form of symbolic link Current NTFS is able to partially support them through "reparse points." A reparse point is a special file that has additional filesystem-specific semantics that are handled by the back-end filesystem driver or filter. Unfortunately, complete documentation is only availible as part of the IFS kit, which is quite expensive. While reparse points themselves are fully capable of doing anything symbolic links can, presently (in WinXP) NTFS is only capable of supporting directory symbolic links. (Note that these are soft links, not hard links.) They are commonly referred to as 'junction points,' and there is an official Microsoft tool floating around for manipulating them, as well as several unofficial. I actually have no idea what GetFullPathName() does when it hits a loop in reparse points, although I suspect it sets a particular GetLastError() value that can be translated into an errno value. An easy way to figure out would be to set one of these up, and see what happens. :) Unfortunately, I'm about to leave on vacation, so I can't follow up with this. Since ELOOP *is* a manditory error, I think it is somewhat important that this issue is handled, if only in documentation saying it is not supported. Regarding _MAX_PATH, I'm not really sure why it would be any more appropriate than MAX_PATH. Actually, I was informed the other day that if MAX_PATH is defined by the implementation, then it is considered to be the value of {MAX_PATH}. (It's the case where its not defined that realpath() goes to crap.) I haven't looked at Keith's implementation because I don't have access to gzip here, but I would suspect that once the correct semantics are decided-on, the two implementations would converge. ---------------------------------------------------------------------- Comment By: Keith Marshall (keithmarshall) Date: 2005-12-20 17:19 Message: Logged In: YES user_id=823908 I've now modified my implementation to use `restrict' for the two arguments, as specified by SUSv3. I've also provided a trivial test case, which proves that my implementation *does* handle all of those error conditions specified by SUSv3, which make sense for the Win32 environment, and *does* set errno appropriately, (with the exception of passing a NULL pointer for the input path name argument). I've not removed the capability for allocating a return buffer, if it's passed as NULL by the caller; a further reading of SUSv3 confirms: * This is *entirely* standards *conformant*, for the standard explicitly states that handling of this condition is implementation dependent. * The realpath function itself is *not* a POSIX standard; it is an X/Open System Interfaces Extension to IEEE 1003.1-2004. It may also be noted that allowing for auto-allocation of a return buffer makes the implementation more robust, in spite of the lack of portability of this behaviour; I'm not convinced that it should be excluded. Regards, Keith. ---------------------------------------------------------------------- Comment By: Keith Marshall (keithmarshall) Date: 2005-12-19 21:52 Message: Logged In: YES user_id=823908 Can you provide a simple test case, which will demonstrate why your implementation is any more standards conformant than mine? If you look closely, you will note that mine also *does* confirm that the entity both exists and is accessible, by use of an access() call for the *input* path name -- which is what SUSv3 requires to be checked -- and this will also set errno appropriately. Some further comments:-- * the extension to support auto-allocation of a return buffer is *not* in violation of SUSv3 -- this is now allowed as an implementation dependent option, and may even be mandated in a future version. Nevertheless, since it isn't yet mandatory, and certainly isn't portable, it is probably better not to support it in a current implementation. * If we agree that the above extension is not to be supported, then a realpath implementation can become even simpler than my earlier proposal; all that is required is to check *both* arguments are non-NULL, setting EINVAL if either fails; if both pass, perform an access() check on the input path name, returning with errno set by access() on failure; if this also passes, return its _fullpath() resolution, also setting errno appropriately, if this happens to be NULL. * The ELOOP status is meaningless on Win32, which AFAIK does *not* support any form of symbolic link. NTFS supports hard links, but there can be no concept of circular link references with a hard link. The nearest Win32 gets to providing a symbolic link is a "shortcut", but a Win32 shortcut is *not* a symbolic link, in the *nix sense; therefore, there are no circumstances under which a Win32 implementation of realpath() should need to raise an ELOOP condition. * The correct -- Microsoft defined -- constant for specifying the required buffer size is _MAX_PATH, which is defined in stdlib.h. * I don't think it appropriate to include `resolved_path[PATH_MAX]' in the function prototype -- it should be as specified by SUSv3, simply as `char *'; (you have no guarantee that the user will include any header which defines PATH_MAX, so your prototype may be broken, at the point of reference. * I *did* try the restrict qualifier, as specified in SUSv3, for my implementation, but my MinGW gcc 3.3.2 choked on it; perhaps I didn't get the usage right, as I've never used it before -- I'll need to explore this further. * simply providing an exact copy of a Linux manpage is *not* appropriate, as documentation for any Win32 function implementation; the manpage must be adapted to correctly reflect the details, and limitations, which are specific to the Win32 implementation. Regards, Keith. ---------------------------------------------------------------------- Comment By: Nach M. S. (n-a-c-h) Date: 2005-12-15 15:17 Message: Logged In: YES user_id=695796 >Another interesting point is that most software that needs >this function is broken or legacy. Without some operating >system-specific guarantee, the standard version of this >function is difficult to use in a strictly-conforming >manner. This, combined with how past implementations >(including glibc's) have invariably had multiple serious >bugs, means most software ships their own implementation >of this. I am curious what software there is that depends >specifically on glibc's extensions. I have a program which I currently have it running and tested on at least MS-DOS, Windows, Linux, FreeBSD, and Mac OS X. It doesn't use the glibc extension (which personally makes no sense - unless it becomes part of POSIX), and works fine everywhere. My only problem was that I had to make my own for Windows, everything else worked fine. As to why I use it; basically my program sometimes needs to launch another program passing it a file, since the program to launch could be located in one directory and the file in another and the current directory of the running program somewhere else, I pass the absolute path of the file to the program being launched. -Nach ---------------------------------------------------------------------- Comment By: Aaron W. LaFramboise (aaronwl) Date: 2005-12-15 13:55 Message: Logged In: YES user_id=1040098 > Where here is restrict appropriate? I honestly > never looked much into that C99 addition. I'd be > happy to hear more information on it. In the general case, where (by the interface) two pointer parameters shall not alias. It provides a performance boost in some cases. In this specific case, POSIX (the modern version which is based on C99) specifies that both parameters shall be restrict-qualified. > Perhaps you didn't understand what I did. You're right; I didn't. Nonetheless, I am still concerned about how this behavior that differs from standard- mandated behavior. Another interesting point is that most software that needs this function is broken or legacy. Without some operating system-specific guarantee, the standard version of this function is difficult to use in a strictly-conforming manner. This, combined with how past implementations (including glibc's) have invariably had multiple serious bugs, means most software ships their own implementation of this. I am curious what software there is that depends specifically on glibc's extensions. ---------------------------------------------------------------------- Comment By: Nach M. S. (n-a-c-h) Date: 2005-12-15 07:51 Message: Logged In: YES user_id=695796 >* The 'glibc extension' to handle a null >resolved_filename >is inviolation of the standard. POSIX states that it >shall return EINVAL in that case. Yes that is correct, I only added it because I saw an argument on the MinGW mailing list somewhere about implementing this a few months back, and one of the devs mentioned they wanted the glibc extension. It should be noted in the case where the extension fails, it does return EINVAL in my implementation. >* Is PATH_MAX really the right constant to use here? In >POSIX, there is no particular compile-time constant >corresponding to the maximum path length, which may be >infinity. That I'm not sure about, hence why I asked people to comment on errors and possible solutions with it. >* You may consider adding 'restrict' as appropriate, >conditionalized on C99. Where here is restrict appropriate? I honestly never looked much into that C99 addition. I'd be happy to hear more information on it. >* The behavior to malloc() a new buffer when PATH_MAX is >not big enough is incorrect. This will almost certainly >cause memory leaks. Outside of doing strlen() on the >return value, how can client code determine whether malloc >has been used, and consequently if free() needs to be >called? Besides, theres no particular guarantee that the >client pay attention to the return value of realpath() at >all. Perhaps you didn't understand what I did. The behavior to malloc a new buffer is when malloc(PATH_MAX) wasn't big enough. That is the behavior I saw somewhere on MSDN for using GetFullPathName*(). The user knows he has to call free, because he called realpath() with a 0 as his second parameter. I don't see how this would cause a memory leak. The worry about him requesting to allocate memory then not checking or freeing it is the same as if someone were to write: malloc(5000); Or strdup(mystr); We never worry about that, so I don't see the problem here. I'll be happy to see further link dicussions. Although the available C runtime for Windows seems very limited in it's support for things in these areas. Things like stat()'s st_ino are always 0 preventing checks to see if two files are the same, etc... -Nach ---------------------------------------------------------------------- Comment By: Aaron W. LaFramboise (aaronwl) Date: 2005-12-15 07:25 Message: Logged In: YES user_id=1040098 I'd just like to comment on your implementation n-a-c-h. * Firstly, the comment about ELOOP is interesting. Yes, current NTFS has limited support for both soft and hard links, which almost no Windows applications handle correctly, leading to all sorts of weird problems, including security problems. I wonder if realpath() for Win32 should do anything special. Furthermore, Vista will have more pronounced symlink support; I wonder what this will change. Anyway, there should be some documented rationale, at the very least, for whatever decision is made here. On the other hand, POSIX guarantees no particular semantics with regard to symbolic links. Still, something reasonable should happen. * The 'glibc extension' to handle a null resolved_filename is inviolation of the standard. POSIX states that it shall return EINVAL in that case. * You may consider adding 'restrict' as appropriate, conditionalized on C99. * Is PATH_MAX really the right constant to use here? In POSIX, there is no particular compile-time constant corresponding to the maximum path length, which may be infinity. * The behavior to malloc() a new buffer when PATH_MAX is not big enough is incorrect. This will almost certainly cause memory leaks. Outside of doing strlen() on the return value, how can client code determine whether malloc has been used, and consequently if free() needs to be called? Besides, theres no particular guarantee that the client pay attention to the return value of realpath() at all. ---------------------------------------------------------------------- Comment By: Nach M. S. (n-a-c-h) Date: 2005-12-14 13:20 Message: Logged In: YES user_id=695796 >Hmm. I normally just use > > #define realpath(N,R) _fullpath((R),(N),_MAX_PATH) > >if I need realpath(). > >_fullpath() is a standard Microsoft function, which >somewhat >more closely resembles realpath() than GetFullPathNameA() >does, although it still doesn't confirm that the >referenced >entity either exists, or is accessible. > >If you want a somewhat more SUSv3 conformant implementation >of realpath(), you might consider the more compact version >I've attached. That's very nice, but your implementation isn't standards conformant, close doesn't cut it. The one I've written makes sure the entity does exist and is accessable. Mine also sets errno properly. >in the case of a function such as realpath(), which is not >a standard Microsoft function, and therfore not documented >on MSDN, I would also like to see documentation, preferably >in the form of a manpage I thought you have access to the manpage for realpath, I'm attaching the Linux one I have which is basically what I cloned. >and any peculiarities of its behaviour in it's Win32 >adaptation. In the case of Win32, the ELOOP won't ever be returned by my implementation since it doesn't do any checks for symbolic links. As far as I know, Win32 currently doesn't use symbolic links. >You should also include a patch, in `diff -u' format, to >add a suitable function prototype to an appropriate header >file. Not a problem. --- /usr/i586-mingw32/include/stdlib.h 2004-09-28 22:56:17.000000000 +0200 +++ stdlib.h 2005-12-14 15:25:07.000000000 +0200 @@ -407,6 +407,7 @@ _CRTIMP void __cdecl _makepath (char*, const char*, const char*, const char*, const char*); _CRTIMP void __cdecl _splitpath (const char*, char*, char*, char*, char*); _CRTIMP char* __cdecl _fullpath (char*, const char*, size_t); +_CRTIMP char* __cdecl realpath(const char *path, char resolved_path[PATH_MAX]); _CRTIMP char* __cdecl _itoa (int, char*, int); _CRTIMP char* __cdecl _ltoa (long, char*, int); >Also note that, when you assign software to the public >domain, then you implicitly relinquish any claim to >copyright over that code. Thus, the preamble to your code >should not claim copyright, while also asserting assignment >to the public domain; rather say `Public domain software: >written by M.S. Nach', or words to that effect, if you >wish to be identified as the author, or simply donate it >anonymously. Fine, I'll resubmit the file with a different header. -Nach ---------------------------------------------------------------------- Comment By: Keith Marshall (keithmarshall) Date: 2005-12-02 21:50 Message: Logged In: YES user_id=823908 Hmm. I normally just use #define realpath(N,R) _fullpath((R),(N),_MAX_PATH) if I need realpath(). _fullpath() is a standard Microsoft function, which somewhat more closely resembles realpath() than GetFullPathNameA() does, although it still doesn't confirm that the referenced entity either exists, or is accessible. If you want a somewhat more SUSv3 conformant implementation of realpath(), you might consider the more compact version I've attached. Thanks for your patch, but I will not add it to MinGW in its present form. Note that for formal consideration, patches should be accompanied by an appropriate ChangeLog entry; in the case of a function such as realpath(), which is not a standard Microsoft function, and therfore not documented on MSDN, I would also like to see documentation, preferably in the form of a manpage, describing the function's usage, and any peculiarities of its behaviour in it's Win32 adaptation. You should also include a patch, in `diff -u' format, to add a suitable function prototype to an appropriate header file. Also note that, when you assign software to the public domain, then you implicitly relinquish any claim to copyright over that code. Thus, the preamble to your code should not claim copyright, while also asserting assignment to the public domain; rather say `Public domain software: written by M.S. Nach', or words to that effect, if you wish to be identified as the author, or simply donate it anonymously. Regards, Keith. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=302435&aid=1294010&group_id=2435 |