Hello,
The std::regex constructor throws when passed a std::string with a certain value. The attached test program, testRegex.cc, exhibits the behavior: testRegex1 is constructed correctly, but testRegex2 throws.
Output of the program under gcc 6.1.0 is in testRegex.output.
To compile: g++ -o testRegex -std=c++11 testRegex.cc
OS: Windows 7 Home Premium, Service Pack 1, AMD Phenom(tm) II X4 955 Processor 3.19 GHz, 2.46 GB RAM, 64-bit OS.
Output of gcc -v is in gccVersion.output.
Output of uname -a is in uname.output.
Output of ld -v: GNU ld (GNU Binutils) 2.25
Problem occurs with gcc 6.1.0 and 5.3.0 (x86_64-posix-seh-rev0).
Problem does not occur with gcc 4.9.1 (x86_64-posix-seh-rev1).
Please let me know if you need more info, or if this is something I'm doing incorrectly.
Thank you,
Alban
We have never published GCC-6. We have published GCC-5.3.0, but never a 64-bit version, and most definitely never any that pretends to support SEH. Likely, you are reporting this to the wrong project.
That said, with the native GCC-4.8.2 on my LinuxMint Debian box, your example throws on the first of your regular expressions, (all of which look odd, to me, BTW):
while with a locally built GCC-5.3.0 substitute, I see:
(which is the same result as I see with my cross-hosted build of our mingw32-g++). I don't know what your objective with these regular expressions may be; if you believe they are valid (and therefore shouldn't throw), then this would seem to be an upstream GCC project issue, and you should raise it on their Bugzilla tracker.
Last edit: Keith Marshall 2016-08-23
FWIW, special characters lose their magic properties within regular expression character class specifications, so it isn't normal to escape them.
[\-\] is surely an invalid character range specification; if the intent is to match a literal '-', then it must be either first or last in the group. [.] always matches a literal '.', and to include a literal ']' in the class, it must be specified as the first character.
Thank you for looking into this; I appreciate your help. It seems that this is the wrong project for my MinGW distribution; I apologize for that.
I didn't know that to match a literal '-' it must be either the first or last in the group. I'm depending on http://www.cplusplus.com/reference/regex/ECMAScript/ for information about regular expressions, and didn't notice anything about '-' needing to be the first or last character in the group.
I appreciate your suggestion that this might be an upstream GCC project issue. I'll check that by verifying that the regular expressions are valid, and running my test program on gcc version 5.3.0 running under Linux.
You're welcome.
That's the rule for POSIX regular expressions, (with which I'm most familiar; I'm not at all familiar with the ECMAScript flavour).
Beyond knowing what it is, and that it defines the default regex grammar for C++11, I know next to nothing about ECMAScript. I don't know how reliable cplusplus.com is as a reference source, but http://www.ecma-international.org/ecma-262/7.0/index.html#sec-classescape does suggest that perhaps [ClassRanges\-ClassRanges] should identify the hyphen-minus character as a literal member of the CharacterClass, rather than as a ClassRange constructor (which may be how GCC-5.3.0 is interpreting it). If this is the case, then it would appear to be a defect in GCC's ECMAScript regex parser, which would warrant a report upstream.
As I've already noted, GCC-5.3.0 on my Linux box causes your third example to throw (just as it does with this project's GCC-5.3.0). If that expression is technically valid as an ECMAScript regex (and the specification suggests that it may be, although you really shouldn't escape the [.] character within the CharacterClass), then you should report this upstream.