
#2311 std::regex constructor throws with std::string input

OTHER
upstream
nobody
None
Bug
none
Unknown
False
2016-08-25
2016-08-22
Alban Deniz
No

Hello,

The std::regex constructor throws when passed a std::string with a certain value. The attached test program, testRegex.cc, exhibits the behavior: testRegex1 is constructed correctly, but testRegex2 throws.

Output of program under gcc 6.1.0 is in testRegex.output.

To compile: g++ -o testRegex -std=c++11 testRegex.cc

OS: Windows 7 Home Premium, Service Pack 1, AMD Phenom(tm) II X4 955 Processor 3.19 GHz, 2.46 GB ram, 64-bit OS.

Output of gcc -v is in gccVersion.output.

Output of uname -a is in uname.output.

Output of ld -v: GNU ld (GNU Binutils) 2.25

Problem occurs with gcc 6.1.0 and 5.3.0 (x86_64-posix-seh-rev0).

Problem does not occur with gcc 4.9.1 (x86_64-posix-seh-rev1).

Please let me know if you need more info, or if this is something I'm doing incorrectly.

Thank you,
Alban

4 Attachments

Discussion

  • Keith Marshall

    Keith Marshall - 2016-08-23
    • status: unread --> upstream
     
  • Keith Marshall

    Keith Marshall - 2016-08-23

    We have never published GCC-6. We have published GCC-5.3.0, but never a 64-bit version, and most definitely never any that pretends to support SEH. Likely, you are reporting this to the wrong project.

    That said, with the native GCC-4.8.2 on my LinuxMint Debian box, your example throws on the first of your regular expressions, (all of which look odd, to me, BTW):

    $ g++ --version
    g++ (Debian 4.8.2-1) 4.8.2
    Copyright (C) 2013 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
    $ g++ -std=gnu++11 testRegex.cc 
    $ ./a.out
    Trying:  [:[:alnum:]\.]
    Exception:  regex_error
    

    while with a locally built GCC-5.3.0 substitute, I see:

    $ PATH=~/gcc-native/bin g++ --version
    g++ (GCC) 5.3.0
    Copyright (C) 2015 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
    $ PATH=~/gcc-native/bin:$PATH g++ -static -std=gnu++11 testRegex.cc 
    $ ./a.out
    Trying:  [:[:alnum:]\.]
    Trying:  [:\.[:alnum:]\-]
    Trying:  [:[:alnum:]\-\.]
    Exception:  regex_error
    

    (which is the same result as I see with my cross-hosted build of our mingw32-g++).

    I don't know what your objective with these regular expressions may be, (they do look odd to me); if you believe they are valid, (and therefore shouldn't throw), then this would seem to be an upstream GCC project issue, and you should raise it on their bugzilla tracker.
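For readers without access to the attachment, the following is a minimal sketch of a test harness that would produce transcripts of the shape shown above. It is not the attached testRegex.cc; the names try_compile and run_patterns are hypothetical, and the only things taken from the thread are the three patterns and the "Trying:" / "Exception:" output format.

```cpp
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper (not from the attachment): returns true if the
// pattern compiles under the default ECMAScript grammar, and false if
// the std::regex constructor throws std::regex_error.
bool try_compile(const std::string& pattern)
{
    std::cout << "Trying:  " << pattern << '\n';
    try {
        std::regex re(pattern);
        (void)re;  // only construction is of interest here
        return true;
    } catch (const std::regex_error& e) {
        // e.code() is a std::regex_constants::error_type value.
        std::cout << "Exception:  " << e.what()
                  << " (code " << static_cast<int>(e.code()) << ")\n";
        return false;
    }
}

// Tries the three patterns quoted in the transcripts, stopping at the
// first one that throws. Returns how many compiled successfully.
int run_patterns()
{
    // Raw string literals keep the backslashes exactly as typed.
    const char* patterns[] = {
        R"([:[:alnum:]\.])",
        R"([:\.[:alnum:]\-])",
        R"([:[:alnum:]\-\.])",
    };
    int compiled = 0;
    for (const char* p : patterns) {
        if (!try_compile(p))
            break;
        ++compiled;
    }
    return compiled;
}
```

Calling run_patterns() from main and comparing its return value across compiler versions is one way to see which of the three patterns a given standard library rejects; the result depends on the library version, as the transcripts above illustrate.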

     

    Last edit: Keith Marshall 2016-08-23
    • Keith Marshall

      Keith Marshall - 2016-08-23

      FWIW, special characters lose their magic properties within regular expression character class specifications, so it isn't normal to escape them. [\-\] is surely an invalid character range specification; if the intent is to match a literal -, then it must be either first or last in the group. [.] always matches a literal ., and to include a literal ] in the class, it must be specified as the first character.
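The placement rules described above can be checked with a small sketch. It assumes the POSIX extended grammar is selected explicitly with std::regex::extended, since std::regex defaults to ECMAScript; the helper name posix_class_matches is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: compiles `pattern` with the POSIX extended
// grammar and reports whether it matches the one-character string `c`.
// Throws std::regex_error if the pattern is rejected.
bool posix_class_matches(const std::string& pattern, char c)
{
    std::regex re(pattern, std::regex::extended);
    return std::regex_match(std::string(1, c), re);
}
```

With this helper, posix_class_matches("[.]", '.') tests the unescaped dot, while posix_class_matches("[-a]", '-') and posix_class_matches("[a-]", '-') test a hyphen placed first and last. The same call with "[]a]" and ']' exercises the rule for a leading closing bracket.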

       
  • Alban Deniz

    Alban Deniz - 2016-08-25

    Thank you for looking into this; I appreciate your help. It seems that this is the wrong project for my MinGW distribution; I apologize for that.

    I didn't know that to match a literal '-' it must be either the first or last in the group. I'm depending on http://www.cplusplus.com/reference/regex/ECMAScript/ for information about regular expressions, and didn't notice anything about '-' needing to be the first or last character in the group.

    I appreciate your suggestion that this might be an upstream GCC project issue. I'll check that by verifying that the regular expressions are valid, and running my test program on gcc version 5.3.0 running under Linux.

     
    • Keith Marshall

      Keith Marshall - 2016-08-26

      > Thank you for looking into this ...

      You're welcome.

      > I didn't know that to match a literal '-' it must be either the first or last in the group.

      That's the rule for POSIX regular expressions, (with which I'm most familiar; I'm not at all familiar with the ECMAScript flavour).

      > I'm depending on http://www.cplusplus.com/reference/regex/ECMAScript/ for information about regular expressions, and didn't notice anything about '-' needing to be the first or last character in the group.

      Beyond knowing what it is, and that it defines the default regex grammar for C++11, I know next to nothing about ECMAScript. I don't know how reliable cplusplus.com is, as a reference source, but http://www.ecma-international.org/ecma-262/7.0/index.html#sec-classescape does suggest that perhaps [ClassRanges\-ClassRanges] should identify the hyphen-minus character as a literal member of the CharacterClass, rather than as a ClassRange constructor, (which appears to be how GCC-5.3.0 may be interpreting it). If this is the case, then it would appear to be a defect in GCC's ECMAScript regex parser, which would warrant a report upstream.

      > I appreciate your suggestion that this might be an upstream GCC project issue. I'll check that by verifying that the regular expressions are valid, and running my test program on gcc version 5.3.0 running under Linux.

      As I've already noted, GCC-5.3.0 on my Linux box causes your third example to throw, (just as it does with this project's GCC-5.3.0). If that expression is technically valid, as an ECMAScript regex, (and the specification suggests that it may be, although you really shouldn't escape the [.] character within the CharacterClass), then you should report this upstream.
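Independently of whether the escaped form is accepted, one possible workaround is to write the class without any escapes, leaving the dot bare and placing the hyphen last. The sketch below assumes the intent of the third pattern is to match a single colon, alphanumeric, dot, or hyphen; that intent is a guess, since the thread does not state it, and the function name is_allowed_char is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical rewrite of the third pattern from the report, using the
// default ECMAScript grammar. The dot is unescaped and the hyphen is
// the last member of the class, so no backslashes are needed.
bool is_allowed_char(char c)
{
    static const std::regex re("[:[:alnum:].-]");
    return std::regex_match(std::string(1, c), re);
}
```

A trailing hyphen is a literal class member in both the POSIX and ECMAScript grammars, so this form does not depend on how a particular parser treats \- inside a character class.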