Scintilla / Bugs / #1966 C preprocessor highlighting bug if a variable not defined

Neil Hodgson - 2017-08-05

labels: preprocessor --> preprocessor, lexer, cpp, scintilla

status: open --> open-accepted

Priority: 5 --> 4
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Jannick - 2017-08-07

Neil, here a patch on the issue and a little bit beyond covering the below where steps are chopped to commit pieces:

undefined variables default to 0 in expression expansion (original issue)

evaluation of multiple !s

precedence of arithmetic expressions

return value of expression expansion procedure

EOL treatment on Windows (CRLF)

Separate question after focussing on the parser feature only: Is there a way to feed a list of ex-ante defines (like those defined by the compiler itself) into the lexer?

Hope the patch is helpful after it passed several checks here. Just let me know.

Thanks,
/J.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Jannick - 2017-08-12

Updated patch (including commits of the previous patch):

added index guard when erasing vector items
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

ollydbg - 2017-08-12

Is there a way to feed a list of ex-ante defines (like those defined by the compiler itself) into the lexer?

In my post here: Lexcpp: allow white space when reading hte macro definition - Google Groups

I'm trying adding macro definition to scintilla, but it looks like scintilla does not support such definition

#define mydefine(x,y) x + y

While for scintilla, you should gives a string like "mydef(x,y)=x + y"

Because scintilla will cut the definition by spaces.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2017-08-12

There are 8 change sets now so this is difficult to evaluate. Some appear wrong. For example, the second change set only remembers the first token as the value of a prreprocessor definition so may give bad results with a multi-token definition such as

#define A 5 - 5

Change set 7 is doing something with \r\n but its unclear what since no example is shown.

The C++ lexer has added style metadata for 4.0.0 (releasing in 2 days) so the line numbers no longer match up.

The 5th set of keywords are used to predefine macros. Keyword lists are first split by space so that means the macros themselves can not contain spaces.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Jannick - 2017-08-15
  
  There are 8 change sets now so this is difficult to evaluate.
  
  Not sure how you usually deal with updates, since I thought it would be good to apply the usual practice to chop things into logical tiny pieces. If this is not feasible for you, simply take the whole patch and run tests (I haven't found a test suite in the git repository I am afraid to say) against the new patch version.
  
  Some appear wrong. For example, the second change set only remembers the first token as the value of a prreprocessor definition so may give bad results with a multi-token definition such as
  
  #define A 5 - 5
  
  True - thanks. Here I was a bit too restrictive regarding the definition string of #defines if provided. The latest patch attached remedies this.
  
  Change set 7 is doing something with \r\n but its unclear what since no example is shown.
  
  Well, I think no example needs to be shown, since this deals with the dangling line end issue differently handled on the various platforms: On Windows, e.g., the line end is marked with two characters '\r' '\n' and the patch simply swallows the additional '\r'.
  
  The C++ lexer has added style metadata for 4.0.0 (releasing in 2 days) so the line numbers no longer match up.
  
  I am not sure if I am following. If you are referring to the starting point of the patches, then this is SciTe 3.7.5 as given in the first posting. Since everything is hosted here on SF and the github is just a mirror, I was hesitating posting pull-requests there, so that I better leave it with you to merge the patches into the code which should be an easy but manual exercise; with pull-requests it would certainly have been easier.
  
  The 5th set of keywords are used to predefine macros. Keyword lists are first split by space so that means the macros themselves can not contain spaces.
  
  ... umm. Probably this refers to my question on how to feed predefined keywords into the lexer - I'll have a look at that shortly.Does that include simply keywords or rather a keyword dictionary, i.e. pairs (keyword, definition string) like in the lexer file here?
  
  Separately, attached an updated patch as always against SciTe 3.7.5 (incl. the remedy for the issue you raised - thanks again) and I am including a test cases showing false positives and negatives (against ScitTe 3.7.5) which are all well captured by the latest patch together with many others more (file attached too):
  
  // false highlighting in SciTe 3.7.5 #define A #if A true #endif #if A == false #endif #if A == 2 false #endif #if !!! defined A false #endif #if 1 + 2 * 3 == 9 false #endif #if 1 + 2 * 3 == 7 true #endif #if !!!!defined B false #endif #if ! defined B || B == 0 true #endif #define f(x) 1 + f(x) // crash ... //#if f(1) //#endif
  
  I would be great if this could find its way into scintilla (release date postponed to Aug 16, 2016, if I am not mistaken), since it appears that the CPP lexer would be far more useful if less buggy in terms of false positives and negatives.
  
  BTW: I do not get any email alert from SF on the postings although I am subscribed to this thread. I happened to see your reply today. So sorry for the late reply.
  
  2017-08-15_13.44.09_SciTe375_LexCPP_ParsingRules.patch
  
  test1.cpp
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2017-08-15

Not sure how you usually deal with updates, since I thought it would be good to apply the usual practice to chop things into logical tiny pieces.

The chunks are semi-independent making it difficult to understand what is required to fix the bug, as reported, and what is being added for other reasons. For example the line end change does not appear to be addressing the bug. Segmenting a change set may be appropriate if it enhances understandability but this set obscures more than it enlightens.

Well, I think no example needs to be shown, since this deals with the dangling line end issue differently handled on the various platforms: On Windows, e.g., the line end is marked with two characters '\r' '\n' and the patch simply swallows the additional '\r'.

The 'EOL treatment' chunk changes nothing with "\r\n": line endings. Its a forward loop which terminates early when a '\r' is found. If you have an example where the patch changes behaviour then show it.

The 'index guard' chunk may be checking without need since the conditions leading to the erasures should prevent the possibility of failure. If there are examples where failure may occur then that is interesting but the code should not be made more complex without need.

Changes should use the same style of coding as the original file. Indentation of this file uses tabs and there is no space after '('.

I am not sure if I am following. If you are referring to the starting point of the patches, then this is SciTe 3.7.5 as given in the first posting. Since everything is hosted here on SF and the github is just a mirror, I was hesitating posting pull-requests there, so that I better leave it with you to merge the patches into the code which should be an easy but manual exercise; with pull-requests it would certainly have been easier.

The Scintilla repository is also on SF at https://sourceforge.net/p/scintilla/code/ci/default/tree/ . Since 4.0.0 is a complex major release, it has taken more time then most releases and the repository has diverged further from the last release making it more difficult to reconcile change sets. Continue using patch files - the pull request mechanism is unpleasant and not monitored.

Before a release, the project is frozen and only regressions are fixed. All other changes are delayed.

Keywords 5 is a set of NAME{=VALUE} preprocessor definitions - the example in SciTE's cpp.properties is

keywords5.$(file.patterns.cpp)=_MSC_VER SCI_NAMESPACE GTK_MAJOR_VERSION=2

In the example, there is

#define f(x) 1 + f(x) // crash ... #if f(1) #endif

This does not crash for me - there is a limit to evaluation of 100 iterations that should stop excessive recursion.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

3 changes seem simple:
1. #define A -> #define A 1
2. defined A removes A before replacing defined with value
3. Not found identifier is replaced with 0

diff -r ccc04ea8f7fd lexers/LexCPP.cxx
--- a/lexers/LexCPP.cxx Fri Aug 18 09:09:25 2017 +1000
+++ b/lexers/LexCPP.cxx Fri Aug 18 18:11:47 2017 +1000
@@ -1333,6 +1333,8 @@
                                    while ((startValue < restOfLine.length()) && IsSpaceOrTab(restOfLine[startValue]))
                                        startValue++;
                                    std::string value = restOfLine.substr(startValue);
+                                   if (OnlySpaceOrTab(value))
+                                       value = "1";    // No value defaults to 1
                                    preprocessorDefinitions[key] = value;
                                    ppDefineHistory.push_back(PPDefinition(lineCurrent, key, value));
                                    definitionsChanged = true;
@@ -1520,6 +1522,7 @@
                if (it != preprocessorDefinitions.end()) {
                    val = "1";
                }
+               tokens.erase(tokens.begin() + i + 1, tokens.begin() + i + 2);
            }
            tokens[i] = val;
        } else {
@@ -1582,8 +1585,8 @@
                    tokens.insert(tokens.begin() + i, macroTokens.begin(), macroTokens.end());
                }
            } else {
-               // Identifier not found
-               tokens.erase(tokens.begin() + i);
+               // Identifier not found and value defaults to zero
+               tokens[i] = "0";
            }
        } else {
            i++;

Jannick - 2017-08-18

Great - here some comments on both your email response and your patch (based on the brand new version 4.0.0):

patch @@ 1520 - what happens if tokens.size() = i + 2?

Just downloaded SciTe400.exe and run the crash example. Please type the code snip in instead of copying it into the editor. Doesn't it break for you? Here it still breaks in version 4.0.0 as it did in version 3.7.5.

Is there any test suite? I could not find that in the repository I must admit. Here more examples of false positives / negatives in version 4.0.0:

#if = false #endif #if /* ... */ 1 true #endif

Last edit: Jannick 2017-08-18
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2017-08-22

If there are any problems with the 3 change change set I posted above, I'd much prefer to fix them before committing instead of committing and then fixing the problem with another change. Including regressions in history can complicate bug bisection and cause problems for downstreams that are cherry-picking change sets.

If tokens.size() = i + 2 then I think the invocation is malformed and should be abandoned since there should be a ')' to balance the '('.
It may be safer to give up on this invocation with:

if (tok >= tokens.size()) { i++; continue; }

There are lexer tests run by scintilla/tests/lexTests.py with example files in scintilla/tests/examples. The example for C++ is x.cxx and its styled form is in x.cxx.styled. If a changed lexer disagrees with x.cxx.styled, a new file x.cxx.new contains the new styled form. On Windows, a SciLexer.DLL should be in scintilla/bin and it must be 32-bit for a 32-bit build of Python and 64-bit for a 64-bit Python. On Linux this test requires a Qt/PySide build of ScintillaEditPy.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2017-08-28

Committed 3 change sets as [89e440], [e4cfae], and [f5aeba].

Related

Commit: [89e440]
Commit: [e4cfae]
Commit: [f5aeba]

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2017-10-25

status: open-accepted --> open-fixed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2017-11-13

status: open-fixed --> closed-fixed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

C preprocessor highlighting bug if a variable not defined

Group

Searches

Help

#1966 C preprocessor highlighting bug if a variable not defined

Discussion

Related