From: SourceForge.net <no...@so...> - 2005-12-21 19:55:45
|
Patches item #1294010, was opened at 2005-09-17 14:29 Message generated for change (Comment added) made by aaronwl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=302435&aid=1294010&group_id=2435 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: runtime Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nach M. S. (n-a-c-h) Assigned to: Nobody/Anonymous (nobody) Summary: Full implementation of realpath() for Win32 Initial Comment: I noticed the other day that Win32 is the only port one of my applications doesn't have access to realpath() on. So I wrote realpath() for it, and I might as well submit it so it becomes part of MinGW and I don't have to make a special ifdef __WIN32__ for it ;) The realpath() I wrote should be standards conforming. It uses GetFullPathNameA() from Win32 as the underlying means as getting the resolved path. It sets errno appropriately as needed and even translates Win32 errors to the appropriate errno. It also makes sure the path exists (via a call to stat()) and supports the glibc extension to allocate the resolved path buffer if it was passed in as null to the call to realpath(). My tests show the function works quite well, but if someone finds an error with it, I'll be happy to fix it. I give full permission to MinGW and it's users to use it, if I messed up on the usage rights in the file, I'll be happy to change it too. -Nach ---------------------------------------------------------------------- Comment By: Aaron W. LaFramboise (aaronwl) Date: 2005-12-21 13:55 Message: Logged In: YES user_id=1040098 HI Keith, > the extension to support auto-allocation of a return > buffer is *not* in violation of SUSv3 Can you tell me where you are reading this? I've looked at both SUSv3 (Issue 6) and IEEE 1003.1 (2004), and none of them mention implementation-defined behavior here. What I'm reading says, "The realpath() function shall fail if ... [EINVAL] The file_name argument is a null pointer." Note that it is in the 'shall' section, not the 'may' section, the latter of which denotes optional implementation-defined behavior. Is there somewhere else in the standard that this is discussed, or a newer version? You're right that this isn't part of core POSIX; however, I am skeptical of adding XSI-specified functions contrary to the specifications of XSI, regardless of what any other single particular implementation (glibc2) chooses to do. > The ELOOP status is meaningless on Win32, which > AFAIK does *not* support any form of symbolic link Current NTFS is able to partially support them through "reparse points." A reparse point is a special file that has additional filesystem-specific semantics that are handled by the back-end filesystem driver or filter. Unfortunately, complete documentation is only availible as part of the IFS kit, which is quite expensive. While reparse points themselves are fully capable of doing anything symbolic links can, presently (in WinXP) NTFS is only capable of supporting directory symbolic links. (Note that these are soft links, not hard links.) They are commonly referred to as 'junction points,' and there is an official Microsoft tool floating around for manipulating them, as well as several unofficial. I actually have no idea what GetFullPathName() does when it hits a loop in reparse points, although I suspect it sets a particular GetLastError() value that can be translated into an errno value. An easy way to figure out would be to set one of these up, and see what happens. :) Unfortunately, I'm about to leave on vacation, so I can't follow up with this. Since ELOOP *is* a manditory error, I think it is somewhat important that this issue is handled, if only in documentation saying it is not supported. Regarding _MAX_PATH, I'm not really sure why it would be any more appropriate than MAX_PATH. Actually, I was informed the other day that if MAX_PATH is defined by the implementation, then it is considered to be the value of {MAX_PATH}. (It's the case where its not defined that realpath() goes to crap.) I haven't looked at Keith's implementation because I don't have access to gzip here, but I would suspect that once the correct semantics are decided-on, the two implementations would converge. ---------------------------------------------------------------------- Comment By: Keith Marshall (keithmarshall) Date: 2005-12-20 11:19 Message: Logged In: YES user_id=823908 I've now modified my implementation to use `restrict' for the two arguments, as specified by SUSv3. I've also provided a trivial test case, which proves that my implementation *does* handle all of those error conditions specified by SUSv3, which make sense for the Win32 environment, and *does* set errno appropriately, (with the exception of passing a NULL pointer for the input path name argument). I've not removed the capability for allocating a return buffer, if it's passed as NULL by the caller; a further reading of SUSv3 confirms: * This is *entirely* standards *conformant*, for the standard explicitly states that handling of this condition is implementation dependent. * The realpath function itself is *not* a POSIX standard; it is an X/Open System Interfaces Extension to IEEE 1003.1-2004. It may also be noted that allowing for auto-allocation of a return buffer makes the implementation more robust, in spite of the lack of portability of this behaviour; I'm not convinced that it should be excluded. Regards, Keith. ---------------------------------------------------------------------- Comment By: Keith Marshall (keithmarshall) Date: 2005-12-19 15:52 Message: Logged In: YES user_id=823908 Can you provide a simple test case, which will demonstrate why your implementation is any more standards conformant than mine? If you look closely, you will note that mine also *does* confirm that the entity both exists and is accessible, by use of an access() call for the *input* path name -- which is what SUSv3 requires to be checked -- and this will also set errno appropriately. Some further comments:-- * the extension to support auto-allocation of a return buffer is *not* in violation of SUSv3 -- this is now allowed as an implementation dependent option, and may even be mandated in a future version. Nevertheless, since it isn't yet mandatory, and certainly isn't portable, it is probably better not to support it in a current implementation. * If we agree that the above extension is not to be supported, then a realpath implementation can become even simpler than my earlier proposal; all that is required is to check *both* arguments are non-NULL, setting EINVAL if either fails; if both pass, perform an access() check on the input path name, returning with errno set by access() on failure; if this also passes, return its _fullpath() resolution, also setting errno appropriately, if this happens to be NULL. * The ELOOP status is meaningless on Win32, which AFAIK does *not* support any form of symbolic link. NTFS supports hard links, but there can be no concept of circular link references with a hard link. The nearest Win32 gets to providing a symbolic link is a "shortcut", but a Win32 shortcut is *not* a symbolic link, in the *nix sense; therefore, there are no circumstances under which a Win32 implementation of realpath() should need to raise an ELOOP condition. * The correct -- Microsoft defined -- constant for specifying the required buffer size is _MAX_PATH, which is defined in stdlib.h. * I don't think it appropriate to include `resolved_path[PATH_MAX]' in the function prototype -- it should be as specified by SUSv3, simply as `char *'; (you have no guarantee that the user will include any header which defines PATH_MAX, so your prototype may be broken, at the point of reference. * I *did* try the restrict qualifier, as specified in SUSv3, for my implementation, but my MinGW gcc 3.3.2 choked on it; perhaps I didn't get the usage right, as I've never used it before -- I'll need to explore this further. * simply providing an exact copy of a Linux manpage is *not* appropriate, as documentation for any Win32 function implementation; the manpage must be adapted to correctly reflect the details, and limitations, which are specific to the Win32 implementation. Regards, Keith. ---------------------------------------------------------------------- Comment By: Nach M. S. (n-a-c-h) Date: 2005-12-15 09:17 Message: Logged In: YES user_id=695796 >Another interesting point is that most software that needs >this function is broken or legacy. Without some operating >system-specific guarantee, the standard version of this >function is difficult to use in a strictly-conforming >manner. This, combined with how past implementations >(including glibc's) have invariably had multiple serious >bugs, means most software ships their own implementation >of this. I am curious what software there is that depends >specifically on glibc's extensions. I have a program which I currently have it running and tested on at least MS-DOS, Windows, Linux, FreeBSD, and Mac OS X. It doesn't use the glibc extension (which personally makes no sense - unless it becomes part of POSIX), and works fine everywhere. My only problem was that I had to make my own for Windows, everything else worked fine. As to why I use it; basically my program sometimes needs to launch another program passing it a file, since the program to launch could be located in one directory and the file in another and the current directory of the running program somewhere else, I pass the absolute path of the file to the program being launched. -Nach ---------------------------------------------------------------------- Comment By: Aaron W. LaFramboise (aaronwl) Date: 2005-12-15 07:55 Message: Logged In: YES user_id=1040098 > Where here is restrict appropriate? I honestly > never looked much into that C99 addition. I'd be > happy to hear more information on it. In the general case, where (by the interface) two pointer parameters shall not alias. It provides a performance boost in some cases. In this specific case, POSIX (the modern version which is based on C99) specifies that both parameters shall be restrict-qualified. > Perhaps you didn't understand what I did. You're right; I didn't. Nonetheless, I am still concerned about how this behavior that differs from standard- mandated behavior. Another interesting point is that most software that needs this function is broken or legacy. Without some operating system-specific guarantee, the standard version of this function is difficult to use in a strictly-conforming manner. This, combined with how past implementations (including glibc's) have invariably had multiple serious bugs, means most software ships their own implementation of this. I am curious what software there is that depends specifically on glibc's extensions. ---------------------------------------------------------------------- Comment By: Nach M. S. (n-a-c-h) Date: 2005-12-15 01:51 Message: Logged In: YES user_id=695796 >* The 'glibc extension' to handle a null >resolved_filename >is inviolation of the standard. POSIX states that it >shall return EINVAL in that case. Yes that is correct, I only added it because I saw an argument on the MinGW mailing list somewhere about implementing this a few months back, and one of the devs mentioned they wanted the glibc extension. It should be noted in the case where the extension fails, it does return EINVAL in my implementation. >* Is PATH_MAX really the right constant to use here? In >POSIX, there is no particular compile-time constant >corresponding to the maximum path length, which may be >infinity. That I'm not sure about, hence why I asked people to comment on errors and possible solutions with it. >* You may consider adding 'restrict' as appropriate, >conditionalized on C99. Where here is restrict appropriate? I honestly never looked much into that C99 addition. I'd be happy to hear more information on it. >* The behavior to malloc() a new buffer when PATH_MAX is >not big enough is incorrect. This will almost certainly >cause memory leaks. Outside of doing strlen() on the >return value, how can client code determine whether malloc >has been used, and consequently if free() needs to be >called? Besides, theres no particular guarantee that the >client pay attention to the return value of realpath() at >all. Perhaps you didn't understand what I did. The behavior to malloc a new buffer is when malloc(PATH_MAX) wasn't big enough. That is the behavior I saw somewhere on MSDN for using GetFullPathName*(). The user knows he has to call free, because he called realpath() with a 0 as his second parameter. I don't see how this would cause a memory leak. The worry about him requesting to allocate memory then not checking or freeing it is the same as if someone were to write: malloc(5000); Or strdup(mystr); We never worry about that, so I don't see the problem here. I'll be happy to see further link dicussions. Although the available C runtime for Windows seems very limited in it's support for things in these areas. Things like stat()'s st_ino are always 0 preventing checks to see if two files are the same, etc... -Nach ---------------------------------------------------------------------- Comment By: Aaron W. LaFramboise (aaronwl) Date: 2005-12-15 01:25 Message: Logged In: YES user_id=1040098 I'd just like to comment on your implementation n-a-c-h. * Firstly, the comment about ELOOP is interesting. Yes, current NTFS has limited support for both soft and hard links, which almost no Windows applications handle correctly, leading to all sorts of weird problems, including security problems. I wonder if realpath() for Win32 should do anything special. Furthermore, Vista will have more pronounced symlink support; I wonder what this will change. Anyway, there should be some documented rationale, at the very least, for whatever decision is made here. On the other hand, POSIX guarantees no particular semantics with regard to symbolic links. Still, something reasonable should happen. * The 'glibc extension' to handle a null resolved_filename is inviolation of the standard. POSIX states that it shall return EINVAL in that case. * You may consider adding 'restrict' as appropriate, conditionalized on C99. * Is PATH_MAX really the right constant to use here? In POSIX, there is no particular compile-time constant corresponding to the maximum path length, which may be infinity. * The behavior to malloc() a new buffer when PATH_MAX is not big enough is incorrect. This will almost certainly cause memory leaks. Outside of doing strlen() on the return value, how can client code determine whether malloc has been used, and consequently if free() needs to be called? Besides, theres no particular guarantee that the client pay attention to the return value of realpath() at all. ---------------------------------------------------------------------- Comment By: Nach M. S. (n-a-c-h) Date: 2005-12-14 07:20 Message: Logged In: YES user_id=695796 >Hmm. I normally just use > > #define realpath(N,R) _fullpath((R),(N),_MAX_PATH) > >if I need realpath(). > >_fullpath() is a standard Microsoft function, which >somewhat >more closely resembles realpath() than GetFullPathNameA() >does, although it still doesn't confirm that the >referenced >entity either exists, or is accessible. > >If you want a somewhat more SUSv3 conformant implementation >of realpath(), you might consider the more compact version >I've attached. That's very nice, but your implementation isn't standards conformant, close doesn't cut it. The one I've written makes sure the entity does exist and is accessable. Mine also sets errno properly. >in the case of a function such as realpath(), which is not >a standard Microsoft function, and therfore not documented >on MSDN, I would also like to see documentation, preferably >in the form of a manpage I thought you have access to the manpage for realpath, I'm attaching the Linux one I have which is basically what I cloned. >and any peculiarities of its behaviour in it's Win32 >adaptation. In the case of Win32, the ELOOP won't ever be returned by my implementation since it doesn't do any checks for symbolic links. As far as I know, Win32 currently doesn't use symbolic links. >You should also include a patch, in `diff -u' format, to >add a suitable function prototype to an appropriate header >file. Not a problem. --- /usr/i586-mingw32/include/stdlib.h 2004-09-28 22:56:17.000000000 +0200 +++ stdlib.h 2005-12-14 15:25:07.000000000 +0200 @@ -407,6 +407,7 @@ _CRTIMP void __cdecl _makepath (char*, const char*, const char*, const char*, const char*); _CRTIMP void __cdecl _splitpath (const char*, char*, char*, char*, char*); _CRTIMP char* __cdecl _fullpath (char*, const char*, size_t); +_CRTIMP char* __cdecl realpath(const char *path, char resolved_path[PATH_MAX]); _CRTIMP char* __cdecl _itoa (int, char*, int); _CRTIMP char* __cdecl _ltoa (long, char*, int); >Also note that, when you assign software to the public >domain, then you implicitly relinquish any claim to >copyright over that code. Thus, the preamble to your code >should not claim copyright, while also asserting assignment >to the public domain; rather say `Public domain software: >written by M.S. Nach', or words to that effect, if you >wish to be identified as the author, or simply donate it >anonymously. Fine, I'll resubmit the file with a different header. -Nach ---------------------------------------------------------------------- Comment By: Keith Marshall (keithmarshall) Date: 2005-12-02 15:50 Message: Logged In: YES user_id=823908 Hmm. I normally just use #define realpath(N,R) _fullpath((R),(N),_MAX_PATH) if I need realpath(). _fullpath() is a standard Microsoft function, which somewhat more closely resembles realpath() than GetFullPathNameA() does, although it still doesn't confirm that the referenced entity either exists, or is accessible. If you want a somewhat more SUSv3 conformant implementation of realpath(), you might consider the more compact version I've attached. Thanks for your patch, but I will not add it to MinGW in its present form. Note that for formal consideration, patches should be accompanied by an appropriate ChangeLog entry; in the case of a function such as realpath(), which is not a standard Microsoft function, and therfore not documented on MSDN, I would also like to see documentation, preferably in the form of a manpage, describing the function's usage, and any peculiarities of its behaviour in it's Win32 adaptation. You should also include a patch, in `diff -u' format, to add a suitable function prototype to an appropriate header file. Also note that, when you assign software to the public domain, then you implicitly relinquish any claim to copyright over that code. Thus, the preamble to your code should not claim copyright, while also asserting assignment to the public domain; rather say `Public domain software: written by M.S. Nach', or words to that effect, if you wish to be identified as the author, or simply donate it anonymously. Regards, Keith. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=302435&aid=1294010&group_id=2435 |