#2062 glob() may fold case in command args with no globbing tokens.

WSL
pending
None
Bug
fixed
Feature_in_WSL_4.1
True
2014-08-14
2013-09-23
Jan Nijtmans
No

Compile the following little program:

#include <stdio.h>
int main(int argc, char **argv){
  /* Echo the first argument exactly as the runtime delivered it. */
  printf("argv1: %s\n", argc > 1 ? argv[1] : "(none)");
  return 0;
}

run it:

>.\a.exe version
argv1: VERSION

This affects fossil when compiled with mingw:

>.\fossil version
.\fossil: unknown command: VERSION
.\fossil: use "help" for more information
1 Attachments

Related

Issues: #2160
Issues: #2182
Issues: #2183

Discussion

1 2 > >> (Page 1 of 2)
  • Jan Nijtmans
    2013-09-23

    Additional information: It turns out that the current directory contained a file named "VERSION". Apparently this is a feature: If argv[1] looks like an existing filename, but it differs from it in case only, the argument is changed to match the file name exactly.

    Is there any compiler option or another way to switch this feature off? Not all
    programs have a filename as first commandline argument.......

     
  • Keith Marshall
    2013-09-23

    This is a consequence of file name globbing, which applies by default for all unquoted arguments on the command line, coupled with the case-insensitive nature of Windows file system, which causes a glob expansion of the unquoted keyword "version" to match the existing file named "VERSION".
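
To illustrate the interaction described above, here is a minimal model in C. It is not the MinGW runtime's code: the names `equal_ignoring_case` and `expand_unguarded`, and the in-memory array standing in for a directory listing, are assumptions made for this sketch. It only shows how matching a token-free argument case-insensitively against directory entries can hand back the on-disk spelling.

```c
#include <ctype.h>
#include <stddef.h>

/* Compare two strings, ignoring ASCII case (hypothetical helper). */
static int equal_ignoring_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/*
 * Model of an expansion step with no check for globbing tokens:
 * every argument is matched against the directory entries, and a
 * match is reported using the entry's spelling, not the argument's.
 * 'entries' stands in for a real directory listing.
 */
const char *expand_unguarded(const char *arg,
                             const char *const *entries, size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i)
        if (equal_ignoring_case(arg, entries[i]))
            return entries[i];   /* entry's spelling replaces the argument */
    return arg;                  /* no match: argument passes through */
}
```

With entries `"Makefile"` and `"VERSION"`, passing `"version"` returns the entry `"VERSION"`, while `"help"` is returned unchanged.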

    You can disable globbing for any single argument by enclosing it in quotes; (double quotes by default, but _mingw.h describes an option to also accept single quotes). Alternatively, you can define the public symbol _CRT_glob, (an integer with C binding), with a value of zero, to disable globbing entirely. There are also other options, described in _mingw.h, which you may wish to consider; of particular interest in this case, may be:

    int _CRT_glob = __CRT_GLOB_USE_MINGW__ + __CRT_GLOB_CASE_SENSITIVE__;
    

    which would still allow globbing, using the MinGW algorithm, but would require file names to match case-sensitively, so that unquoted "version" would not match a file named "VERSION".
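
As a sketch of where such a definition could live, the fragment below places it at file scope next to a small argument-dumping helper. The preprocessor guard and the helper `dump_args` are assumptions added for illustration; the guard only keeps the file compilable with toolchains whose headers do not define these flags, in which case it falls back to the "zero disables globbing" setting mentioned above.

```c
#include <stdio.h>   /* MinGW.org headers are assumed to pull in _mingw.h */

#if defined(__CRT_GLOB_USE_MINGW__) && defined(__CRT_GLOB_CASE_SENSITIVE__)
/* Keep MinGW's globbing, but require file names to match case-sensitively. */
int _CRT_glob = __CRT_GLOB_USE_MINGW__ + __CRT_GLOB_CASE_SENSITIVE__;
#else
/* Flags not available (other runtime): fall back to no globbing at all. */
int _CRT_glob = 0;
#endif

/* Print every argument exactly as the startup code delivered it;
   returns the number of arguments printed. */
int dump_args(FILE *out, int argc, char **argv)
{
    int i;
    for (i = 1; i < argc; ++i)
        fprintf(out, "argv%d: %s\n", i, argv[i]);
    return argc > 1 ? argc - 1 : 0;
}
```

Calling `dump_args(stdout, argc, argv)` from `main()` makes it easy to compare what the program receives under different `_CRT_glob` settings.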

     
    Last edit: Keith Marshall 2013-09-23
  • Jan Nijtmans
    2013-09-23

    If I compile fossil with mingw-w64 or msc (with globbing enabled),
    this behavior is different: globbing only makes sense when
    the argument at least contains the '*' character. My
    suggestion would be to let mingw do command line
    globbing the same as MSVC does it (even though I
    cannot find documentation on it). Experimenting,
    however, with a MSVC build of fossil:

    .\fossil version
    This is fossil version 1.27......
    .\fossil ver*ion
    .\fossil unknown command: VERSION
    .\fossil use "help" for more information

    See: [http://msdn.microsoft.com/en-us/library/8bch7bkk.aspx]
    (This describes how globbing is enabled in a MSVC build, it
    doesn't document the exact globbing algorithm).

    It appears that the mingw command line globbing is different
    from how Windows "operating system commands" do it.

     
  • Keith Marshall
    2013-09-23

    Sometime during the lifetime of WinXP or Vista, Microsoft changed their globbing algorithm. In the process, they broke it, beyond any hope of redemption. (Try to get a globbing token into a command line, with globbing enabled. You can't. Quoting, which should work, doesn't do the trick. Microsoft seem to think this broken crap is actually desirable behaviour, and they have refused to fix it).

    At the request of some of our users, we provided our own algorithm to correct the Microsoft breakage. Other GCC distributors for Windows may not use our runtime, so will not benefit from our improved globbing algorithm; that isn't any justification for us to not offer the enhancement. If you don't want to use it in your application, set _CRT_glob to one to revert to Microsoft's broken globbing, or to zero to disable it altogether, (which IIRC is the default for MSVC).

    That said, our new algorithm does have an internal (static) function to test for the presence of globbing tokens in the pattern it is evaluating. The behaviour you report suggests that this function may be bypassed in some circumstance; this is why I'd like you to at least try the

    int _CRT_glob = __CRT_GLOB_USE_MINGW__ + __CRT_GLOB_CASE_SENSITIVE__;
    

    work around, to confirm my diagnosis before I embark on what may turn out to be a wild goose chase.

     
  • Jan Nijtmans
    2013-09-25

    Thanks! I tried your suggested:

    int _CRT_glob = __CRT_GLOB_USE_MINGW__ + __CRT_GLOB_CASE_SENSITIVE__;
    

    and it indeed helps. However, for compatibility with earlier releases,
    "int _CRT_glob = 1" gives the least surprises, no matter how broken it might be.

    I would prefer a _CRT_glob option which only uses globbing when the argument
    contains at least a single globbing character. Without any globbing character,
    whether the argument matches an available filename or not, it doesn't make sense
    to modify the argument and skipping the globbing will result in a faster startup
    normally. Doing globbing always is what caused the surprising behavior here.
    Anyway I understand what's going on now.

    Many thanks!

     
  • Keith Marshall
    2013-09-25

    > I tried your suggested:
    >
    >     int _CRT_glob = __CRT_GLOB_USE_MINGW__ + __CRT_GLOB_CASE_SENSITIVE__;
    >
    > and it indeed helps.

    Thanks. That confirms that the problem is that the globbing algorithm is indeed bypassing a call to its static is_glob_pattern() function, at some point.

    > However, for compatibility with earlier releases, "int _CRT_glob = 1" gives the least surprises, no matter how broken it might be.

    Of course, because that's exactly what previous versions did. The problem is that, from Vista onwards, that option denies you any chance of injecting a literal globbing token into any command line argument, because Microsoft broke the effect that quoting should deliver.

    > I would prefer a _CRT_glob option which only uses globbing when the argument contains at least a single globbing character.

    It shouldn't need any _CRT_glob option; it should work that way regardless. This is a bug; it requires investigation. Earnie, since I wrote glob.c, feel free to reassign this to me.

    > Without any globbing character, whether the argument matches an available filename or not, it doesn't make sense to modify the argument ...

    Except when the modification results from matching a globbing token, (but that's pretty much what you've said).

    > and skipping the globbing will result in a faster startup normally.

    To some extent, probably; each argument still needs to be parsed character by character, through is_glob_pattern(), to check for unquoted globbing tokens, but we should be able to avoid calls to readdir(), (implemented using _findfirst() and _findnext() on Windows).
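
The guard discussed here can be sketched as follows. This is an illustration only, not the `is_glob_pattern()` function in glob.c, which also has to account for quoting. The names `has_glob_token`, `scan_fn` and `expand_guarded`, and the token set `*`, `?`, `[`, are assumptions made for the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if 'arg' contains a character from an assumed token set
   ('*', '?', '['); quoting and escaping are deliberately ignored. */
int has_glob_token(const char *arg)
{
    return strpbrk(arg, "*?[") != NULL;
}

/* Hypothetical callback type standing in for the directory scan. */
typedef const char *(*scan_fn)(const char *pattern);

/* Expand 'arg' only when it contains a globbing token; otherwise return
   it untouched, without invoking the (comparatively costly) scan. */
const char *expand_guarded(const char *arg, scan_fn scan)
{
    if (!has_glob_token(arg))
        return arg;          /* token-free argument is left as typed */
    return scan(arg);        /* pattern is handed to the matcher */
}
```

Each argument is still inspected character by character, but a token-free argument such as `version` never reaches the directory scan, so its case cannot be altered by a matching file name.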

     
  • Keith Marshall
    2013-09-25

    • status: unread --> assigned
    • Type: Support --> Bug
     
  • Earnie Boyd
    2013-09-25

    • assigned_to: Earnie Boyd --> Keith Marshall
     
  • Keith Marshall
    2013-09-26

    • summary: "version" -> "VERSION" in command line. --> glob() may fold case in command args with no globbing tokens.
    • status: assigned --> open
     
  • Keith Marshall
    2013-10-20

    • status: open --> pending
    • Resolution: none --> limbo
    • Category: Unknown --> Waiting_User_Response
    • Patch attached: False --> True
     