The combination of characters R" (i.e. a capital R followed by a quote) prevents the rest of the file from being syntax highlighted in C and C++ language modes, as demonstrated in this short C program:
#include <stdio.h>
#define R "buggy"
int main()
{
    printf("Hello, "R" world!\n");
    // this comment is no longer syntax highlighted!
    return 0;
}
This has been observed on SciTE v3.2.5.
This has also been reported for Notepad++, which uses Scintilla: https://sourceforge.net/p/notepad-plus/bugs/3692/
R" is a raw string literal.
http://en.wikipedia.org/wiki/C%2B%2B11#New_string_literals
The recognition of raw string literals could be tightened up to not trigger on this example.
Ah, I see. Didn't know this construct, thanks for pointing it out. In this case, the parsing rule should definitely be adapted so that this works correctly. In a regular expression, this might be something like this:
However, it only matches the first example on the Wikipedia page, not the one with an additional delimiter string, but that's not the point here.
Fixed by rejecting raw strings when the character after the " is one of " )\\\t\v\f\n".
The C++11 standard documents this in the section "String literals".
Committed as [71d931].
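For readers who want to see the shape of such a check, here is a minimal C++ sketch. It is not the code from the commit: the function names invalidAfterRawQuote and startsRawString are hypothetical, and only the rejected character set is taken from the description above.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not Scintilla's actual code: returns true when `ch`,
// the character right after the quote in R", rules out a raw string.
bool invalidAfterRawQuote(char ch) {
    // Rejected set as described in the fix: space, ')', backslash, tab,
    // vertical tab, form feed, newline.
    static const char rejected[] = " )\\\t\v\f\n";
    // strchr would also match the terminating NUL, so exclude it explicitly.
    return ch != '\0' && std::strchr(rejected, ch) != nullptr;
}

// Hypothetical helper: true when `s` begins with R" and the next character
// does not rule out a raw string. Only that one character is examined.
bool startsRawString(const char *s) {
    return s[0] == 'R' && s[1] == '"' && !invalidAfterRawQuote(s[2]);
}
```

With this sketch, the text R" world!\n" from the original example is rejected because a space follows the quote, while R"world!\n" is still accepted, since only the first character after the quote is checked.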
Are you sure that this fixes the problem completely? If I modify the above example by moving the whitespace into the define, I would say it does not work either:
The fix was only for the case where the character immediately following the " is invalid. I do not know how the standard would interpret R"world!\n" since it looks like a raw string up until the \.
clang thinks it is a quite bogus raw string:
You are totally right. I just tried compiling this example with g++ using the -std=c++0x option in order to compile it with raw string support, and it reports an error:
So, what we could have expected is that you cannot define a macro called "R" in C++11, since the symbol already exists. However, it is a valid example in pure C, so is there a way to separate the C parsing from the C++ parsing?
Furthermore, if we extend the macro name to any string ending in an "R", it compiles in C as well as in C++11:
So, the parser should evaluate the characters BEFORE the R, that's what I tried to express above with the regexp ^"\w?R"(.*?)".
Oh, forgot to indent the regexp:
The quote character has to be removed from the beginning, and I removed the end for clarity:
In plain words: The only allowed character sequences before the "R" are "u", "u8", "U" or "L". Anything else directly before it must not be an alphanumeric character; otherwise the sequence must not be treated as a raw string.
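The rule in plain words can be sketched in C++ roughly as follows. This is only an illustration of the idea, not lexer code: the names isIdentChar and rawPrefixAllowed are hypothetical, and counting '_' as an identifier character is an assumption that goes beyond "alphanumeric".

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true for characters that can be part of an identifier.
// Treating '_' as an identifier character is an assumption.
static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical helper: returns true when the 'R' at text[pos] may start a
// raw string literal, judging only by the characters that precede it.
bool rawPrefixAllowed(const std::string &text, std::size_t pos) {
    if (pos >= text.size() || text[pos] != 'R') return false;
    // Walk back over the identifier characters directly before the R.
    std::size_t start = pos;
    while (start > 0 && isIdentChar(text[start - 1])) --start;
    const std::string prefix = text.substr(start, pos - start);
    // Only an empty prefix or one of the encoding prefixes is acceptable.
    return prefix.empty() || prefix == "u" || prefix == "u8" ||
           prefix == "U" || prefix == "L";
}
```

Under these assumptions, the R in ABCR" is not treated as the start of a raw string because "ABC" precedes it, while R", u8R" and LR" are allowed through to whatever check is applied to the characters after the quote.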
Is there an actual problem with the ABCR example with the updated Scintilla?
There could be a C-only mode but that could hide potential bugs when a file is reused as C++. Particularly a problem for headers.
I cannot try since I haven't built Scintilla from source. Have you tried it?
Looks fine to me. I haven't been able to work out what the difference is to the current behaviour that you are trying to convey with the regex.
Oh, you are right! Didn't realize that the ABCR example already worked in the released version. Was confused, because in Notepad++ it did not work. However, thanks for the fix!