From: SourceForge.net <no...@so...> - 2007-10-09 06:11:48
|
Patches item #1807933, was opened at 2007-10-05 08:13 Message generated for change (Comment added) made by lambourg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=302435&aid=1807933&group_id=2435 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jerome Lambourg (lambourg) Assigned to: Chris Sutcliffe (ir0nh34d) Summary: stat function does not handle trailing slashes Initial Comment: The _stat windows function does not support trailing directory separators (slash or antislash) as parameter: _stat ("/correct/directory/", &st) will return an error while _stat("/correct/directory", &st) will perform correctly. This has impact on gcc for example where -I/my/includes/ is not accepted (accepted on all unix I know of). The proposed patch was done on Mingw runtime 3.13 sources. It creates a new _mingw_stat function, removing the trailing slash if there (unless the root directory is furnished), before calling the Microsoft's C run-time function. Then, the stat.h include defines _stat and stat to point to this new function. ---------------------------------------------------------------------- >Comment By: Jerome Lambourg (lambourg) Date: 2007-10-09 06:11 Message: Logged In: YES user_id=1906084 Originator: YES File Added: patch-stat-3.diff ---------------------------------------------------------------------- Comment By: Danny Smith (dannysmith) Date: 2007-10-08 20:16 Message: Logged In: YES user_id=11494 Originator: NO > Sure, but the argument to stat(), which is being manipulated prior to the > call, is a (possibly) multibyte character string; it must be manipulated in > a locale which will match the character set of the file system. If that > isn't the system locale, then what is it? CP_ACP is the default system locale. The usual way to convert to this is with w32api call to MultibyteToWideChar. Jerome, adacore, please look in adaint.c gnat_stat()in gnat sources. It handles the trailing slashes just fine on mingw32 without mucking with locale data. I suspect Pascal Obry was the author. Danny Here is a reference to the thread safety problem with setlocale: http://lists-archives.org/mingw-users/00944-streams-not-thread-safe.html Danny ---------------------------------------------------------------------- Comment By: Keith Marshall (keithmarshall) Date: 2007-10-08 12:11 Message: Logged In: YES user_id=823908 Originator: NO > -_CRTIMP int __cdecl __MINGW_NOTHROW _stat (const char*, struct _stat*); > +int __cdecl __MINGW_NOTHROW _mingw_stat (const char*, struct _stat*); > +#define _stat(__p,__s) _mingw_stat(__p,__s) > > You may be justified in making the POSIX name follow your rules (the > actual POSIX spec for stat() says nothing about the handling of trailing > slashes), No, but most actual implementations do deal with them in the manner this patch strives to achieve. > but I think forcing a redefinition of the MS _stat is a bit severe. I'm inclined to agree. Since this is a patch to assist in porting POSIX applications, it would be sensible to restrict its scope to only redirect the call those applications will actually use. > I mean after all, MSVC stat will never be POSIX-ly correct and never > claimed to be. What do you intend to do with > > _stat32 > _stat64 > _stati64 > and on Vista > _stat32i64 > _stat64i32 > > and their mappings to _stat() as per MSDN documentation? One could conceive a wrapper function which would take a pointer to the underlying function to invoke, as a third parameter, and redirect as appropriate. Simpler obviously, to just require users of the uglified Microsnot names to ensure Microsnot compatibility up front, (which would not compromise portability of POSIX code anyway). > + /* to handle path names for files in multibyte character locales, > + * we need to set up LC_CTYPE to match the host file system locale. > + */ > + > + char *locale = setlocale (LC_CTYPE, NULL); > + if (locale != NULL) locale = strdup (locale); > + setlocale (LC_CTYPE, ""); > > This is not right. The OS does not use libc locale for file system. > It use win32api language functions and structures. Sure, but the argument to stat(), which is being manipulated prior to the call, is a (possibly) multibyte character string; it must be manipulated in a locale which will match the character set of the file system. If that isn't the system locale, then what is it? > It is also results in heavy penalty to pay for the the majority of > cases that do not have trailing slashes. Perhaps the wrapper could first try the _stat() call; if it succeeds, simply return immediately; if it fails with ENOENT, (this appears to be the appropriate errno value), then proceed to check for, and discard non-significant trailing slashes/backslashes, and return the result of retrying the _stat() call, if any are present, otherwise simply return the status from the original failing call? > Calls to libc setlocale requires locks/unlocks of locale data to > ensure thread safety. This has been an issue in design of C++ locale > classes. In fact, there was a a libstdc++ bug report a few years ago > where repeated calls to setlocale exposed a bug in this locking > mechanism on windows, at least on some versions of mscvcrt.dll > and caused a segfault. I'm sure you do speak with authority, Danny, but I see no references to issues of thread safety, in any MSDN documentation on setlocale(), or on any other function which is dependent on locale settings. If the locale settings are not inherently thread safe, then how can *any* locale dependent function ever be expected to work reliably, in a multithreaded application? > The call to mbstowcs also involves locking/unlocking locale data. Same comment. > This > + if (len > 1 && (refcopy [len - 1] == L'/' > || refcopy [len - 1] == L'\\')) { > > buggers root dirs like "C:/" and "//./C:/" which require the > trailing slash The description of the patch suggests that it avoids this, but like you, I don't see how. It should not be difficult to achieve, but I don't think the necessary protection is in place, in the current patch. ---------------------------------------------------------------------- Comment By: Danny Smith (dannysmith) Date: 2007-10-06 03:39 Message: Logged In: YES user_id=11494 Originator: NO Three comments from a user: 1) -_CRTIMP int __cdecl __MINGW_NOTHROW _stat (const char*, struct _stat*); +int __cdecl __MINGW_NOTHROW _mingw_stat (const char*, struct _stat*); +#define _stat(__p,__s) _mingw_stat(__p,__s) You may be justified in making the POSIX name follow your rules (the actual POSIX spec for stat() says nothing about the handling of trailing slashes), but I think forcing a redefinition of the MS _stat is a bit severe. I mean after all, MSVC stat will never be POSIX-ly correct and never claimed to be. What do you intend to do with. _stat32 _stat64 _stati64 and on Vista _stat32i64 _stat64i32 and their mappings to _stat() as per MSDN documentation? 2) + /* to handle path names for files in multibyte character locales, + * we need to set up LC_CTYPE to match the host file system locale. + */ + + char *locale = setlocale (LC_CTYPE, NULL); + if (locale != NULL) locale = strdup (locale); + setlocale (LC_CTYPE, ""); This is not right. The OS does not use libc locale for file system. It use win32api language functions and structures. It is also results in heavy penalty to pay for the the majority of cases that do not have trailing slashes. Calls to libc setlocale requires locks/unlocks of locale data to ensure thread safety. This has been an issue in design of C++ locale classes. In fact, there was a a libstdc++ bug report a few years ago where repeated calls to setlocale exposed a bug in this locking mechanism on windows, at least on some versions of mscvcrt.dll and caused a segfault. The call to mbstowcs also involves locking/unlocking locale data. 3) This + if (len > 1 && (refcopy [len - 1] == L'/' || refcopy [len - 1] == L'\\')) { buggers root dirs like "C:/" and "//./C:/" which require the trailing slash ---------------------------------------------------------------------- Comment By: Keith Marshall (keithmarshall) Date: 2007-10-05 18:59 Message: Logged In: YES user_id=823908 Originator: NO It is much better, thanks, but I'm still not sure that it is quite right. I see that it does convert the input path to the wchar_t domain; good. Then it examines the last wchar cell, and discounts it if it is a slash or a backslash, so it drops at most one such trailing character; should it not iterate here, in case of multiple such trailers? Finally, it performs a memcpy() of the possibly shortened string, which is still in the wchar_t domain, and passes this to _stat(); should it not transform back to the mbs domain, with wcstombs() here? Or alternatively, and perhaps better, dispense with the extra malloc(), and the memcpy() altogether, zero terminate directly within the refcopy buffer itself, and call _wstat() on that? ---------------------------------------------------------------------- Comment By: Chris Sutcliffe (ir0nh34d) Date: 2007-10-05 15:33 Message: Logged In: YES user_id=570619 Originator: NO I believe all concerns have been addressed. Keith, any comments before I commit? ---------------------------------------------------------------------- Comment By: Jerome Lambourg (lambourg) Date: 2007-10-05 15:15 Message: Logged In: YES user_id=1906084 Originator: YES File Added: patch-stat-2.diff ---------------------------------------------------------------------- Comment By: Keith Marshall (keithmarshall) Date: 2007-10-05 12:44 Message: Logged In: YES user_id=823908 Originator: NO The GCC folks have already addressed this within GCC sources; we experienced the -I/path/ vs. -I/path issue with the GCC-3.4.2 issue of MinGW, but an upstream patch fixed it by the time we issued the GCC-3.4.5 version. However, it still makes a lot of sense to handle it more generally, for all MinGW compiled applications. Perhaps Chris could review the patch, with a view to adoption for mingw-runtime-3.14. However, there are some issues I would like to see addressed, please, before it is considered further: 1) The patch format is unacceptable; it *must* be resubmitted in `diff -u' (preferred), or `diff -c' format. 2) The new source file requires a disclaimer of copyright, and of warranty, and preferably also identification of the author, in its opening comment block. 3) The technique adopted is unsafe in multibyte character locales; it may incorrectly identify trailing slashes, or more likely trailing backslashes, on certain international versions of Windows, where the file system uses such a locale for file names. Take a look at how I adapted the `basename' and `dirname' functions, to accommodate this. FWIW, I do have a small suite of library functions for handling path names, all of which I believe are locale safe, and which I've assembled to support my `man' port; one of these normalises paths, removing trailing slashes, and replacing all sequences of embedded slashes and backslashes, consistently with a single instance of the preferred dirname separator character, (user's choice of either slash or backslash; backslash by default). I was planning to make it available as a GPL standalone library, which would preclude its inclusion in mingw-runtime, but I might be persuaded to make a specific exception to allow that usage. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=302435&aid=1807933&group_id=2435 |