#288 Unrelated #define can confuse cscope

open
nobody
None
5
2014-08-17
2013-08-11
dima
No

With the latest cscope (15.8a) it is possible for an unrelated #define to confuse the cscope find-global-definition search.

Say I have a header such as this:

#define x extern
int ef;

I'm observing that 'int ef' is not being found as a global definition. This is happening because the 'extern' on the previous line is confusing cscope into thinking that this isn't the GLOBAL definition. I tried debugging this a bit using the slightly more complicated file

#define y fextrnf
#define z adsf
#define x extern
int ef;
int ef2;

I'm attaching this file as 'e.h'. Here 'ef2' is found as a global, but 'ef' is not. This can be verified by

$ cat cscope.files
e.h

$ cscope -c -b -i cscope.files -f cscope.out

$ cscope -d -L1 ef

$ cscope -d -L1 ef2
e.h ef2 5 int ef2;

Debugging this, I clearly see that the problems occur when the cscope database is created. I can see that when looking at 'int ef' fscanner.l sees external=1 at the '.|\n' lexer rule, suggesting that the 'extern' is seen as applying to the 'int ef' definition.

Furthermore, I see that the "#define x extern" line matches the '("*"|{wsnl})+{identifier}' lexer rule, while the others do not. Unfortunately everything I know about lex I learned in the last 30 minutes, so I can't explain this yet. Hopefully this is enough for more knowledgeable people than me to fix this.

dima

1 Attachments
e.h

Discussion

  • dima
    dima
    2013-08-11

    OK, so looking at this more I see the problem. '#define x' is matched by the

    <IN_PREPROC>define{ws}+{identifier}
    

    rule in fscanner.l. This rule resets the matching state with

    BEGIN(INITIAL)
    

    The text "extern\nint ef" is thus matched more-or-less in isolation, which causes the issue. I'm attaching a patch to fix this problem. This fix looks at the 'ppdefine' variable to only act on the 'extern' if we're not in a pre-processor definition. The nice thing is that this patch is very small and applies ONLY to this case, and thus wouldn't have many unwanted side effects. The not-so-nice thing is that this patch doesn't fix any similar issues that the code may have. In the same general area in the code the code looks for 'enum', 'struct', 'union', 'template' and 'typedef'; I haven't looked for similar problems with those, and they remain unfixed with the attached patch. Thoughts?

    I'm attaching a slightly modified test header, and a small script that runs various cscope queries on this header to look for regressions. I'm also attaching the results of this script when run with cscope before and after the patch. The differences are

    $ diff -t -W60 -y patched.result baseline.result 
    should have 2 refs              should have 2 refs
    e.h <global> 2 #define int b    e.h <global> 2 #define int b
    
    should have 1 def               should have 1 def
    
    should have 2 defs              should have 2 defs
    e.h ef 5 int ef;             <
    e.h ef2 6 int ef2;              e.h ef2 6 int ef2;
    
    should have 2 refs              should have 2 refs
    e.h <global> 5 int ef;          e.h <global> 5 int ef;
    e.h <global> 6 int ef2;         e.h <global> 6 int ef2;
    
    should have 3 defs              should have 3 defs
    e.h x 4 #define x extern        e.h x 4 #define x extern
    e.h y 1 #define y fextrnf       e.h y 1 #define y fextrnf
    e.h z 3 #define z adsf          e.h z 3 #define z adsf
    
    should have 3 refs              should have 3 refs
    e.h <global> 4 #define x ext    e.h <global> 4 #define x ext
    e.h <global> 1 #define y fex    e.h <global> 1 #define y fex
    e.h <global> 3 #define z ads    e.h <global> 3 #define z ads
    
    should have 2 refs              should have 2 refs
    e.h y 1 #define y fextrnf       e.h y 1 #define y fextrnf
    e.h z 3 #define z adsf          e.h z 3 #define z adsf
    
    should have no ref              should have no ref
    

    So the patch has no detected effects other than fixing the bug. I think the patch is safe to use as is, but I can easily imagine other similar bugs lurk and they can be fixed together with this bug in a more generic way.

    Also, I feel like I should comment about the likelyhood that such code may appear in the wild. I found this bug because the codebase I was working on was doing something like this. This codebase has global variables defined in a header, and was #defining EXTERN selectively to avoid multiple definitions of said variables. Maybe only 'extern' is likely to be used in this way and not 'enum', 'struct', etc; then maybe the patch is fine as is.

    dima

     
    • This codebase has global variables defined in a header, and was #defining EXTERN
      selectively to avoid multiple definitions of said variables.

      But it would be highly unusual for source code relying on this old trick to actually ever contain a line like

      int eh;
      

      particularly not directly after the #define EXTERN extern line. That's because a) that #define is invariably inside an #ifdef of its own, and b) such code would spell that line

      EXTERN int eh;
      
       
      • dima
        dima
        2013-08-11

        I see you've seen this idiom before (I haven't before this). The file that showed this issue looks more like this (attached also):

        /**
         * asdf
        */
        
        #ifndef E2_H
        #define E2_H
        
        #ifdef X
        #define X                // asdf
        #else
        #define X      extern    // qwer
        #endif
        
        /*
         * qwer
        */
        
        #define A B
        /*
         * 1234
        */
        X int * ef3;
        

        The unwanted effect of the 'extern' persists all the way to the end, and the patch fixes it.

         
        Attachments
  • Unfortunately the patch does cause problems elsewhere. It causes multi-line preprocessor conditionals like this (found in an X11 header)

    #if (defined(__GNUC__) && ((__GNUC__ * 100 + __GNUC_MINOR__) >= 205)) \
        || (defined(__SUNPRO_C) && (__SUNPRO_C >= 0x590))
    

    to be interpreted as a global definition of __SUNPRO_C

     
  • dima
    dima
    2013-08-12

    OK... I'm not sure what you are seeing. If I modify my test header to place that multi-line #if on top I see cscope detect a global definition of __SUNPRO_C both before and after my patch. I do agree this is wrong.

    What I ALSO see is that the previously-attached patch no longer has the desired effect when reading this augmented test header; as before the 'ef' definition is skipped. Turns out this is because that multi-line #if tickled an entirely different bug in cscope. There are several places in fscanner.l that check to see if we're in a preprocessor construct with this code:

    ppdefine == NO && my_yytext[0] != '#'
    

    This picks up single and multi-line #define statements, and single-line #if statements only. With the above #if the '__SUNPRO_C >= 0x590' text was being construed as an initializer. This, in turn, was causing 'int ef' to not be picked up as a global definition. This is an entirely new bug.

    I fixed this faulty preprocessor check to function with multi-line #if statements also. This makes 'int ef' be detected as a global definition AND makes __SUNPRO_C not be detected as one; these were both caused by the above faulty test.

    I'm attaching the two patches (the first patch should be the same as before). I'm also attaching the modified test header; it's the same as originally posted but with the additional #if from above. I'm also attaching an augmented test script. The difference between the unpatched test run and a post-both-patches run:

    $ diff -t -W60 -y result.baseline result.extern_multiline_if_patch 
    should have 0 defs              should have 0 defs
    e.h __SUNPRO_C 2 || (defined <
    
    should have 1 refs              should have 1 refs
    e.h <global> 5 #define int b    e.h <global> 5 #define int b
    
    should have 0 defs              should have 0 defs
    
    should have 2 defs              should have 2 defs
                                 >  e.h ef 8 int ef;
    e.h ef2 9 int ef2;              e.h ef2 9 int ef2;
    
    should have 2 refs              should have 2 refs
    e.h <global> 8 int ef;          e.h <global> 8 int ef;
    e.h <global> 9 int ef2;         e.h <global> 9 int ef2;
    
    should have 3 defs              should have 3 defs
    e.h x 7 #define x extern        e.h x 7 #define x extern
    e.h y 4 #define y fextrnf       e.h y 4 #define y fextrnf
    e.h z 6 #define z adsf          e.h z 6 #define z adsf
    
    should have 3 refs              should have 3 refs
    e.h <global> 7 #define x ext    e.h <global> 7 #define x ext
    e.h <global> 4 #define y fex    e.h <global> 4 #define y fex
    e.h <global> 6 #define z ads    e.h <global> 6 #define z ads
    
    should have 2 refs              should have 2 refs
    e.h y 4 #define y fextrnf       e.h y 4 #define y fextrnf
    e.h z 6 #define z adsf          e.h z 6 #define z adsf
    
    should have no ref              should have no ref
    

    As noted previously the bogus __SUNPRO_C definition is gone, while the valid 'ef' definition is now detected.

    This second patch should be OK, but it's clearly more invasive than the last. I'm a bit uneasy about asking for this to be merged without a test suite. Are there some tests people run to validate a release? I will certainly run a patched cscope myself from this point on to see if any issues come up, but this isn't really enough. Hans-Bernhard, does the patch look ok to you, at least?

     
    • dima
      dima
      2013-08-14

      The second patch has a regression. I inadvertently changed the
      way preprocessor statements other than #if... and #define were
      handled. Thus the original 'extern' bug could be re-triggered with

      #warning asfd qewr extern
      int a;
      

      I'm attaching new patches, test script, test header that fix this
      regression. Once again patch 0001 should be the same as before.
      Additionally constructs such as multi-line #warnings now work.
      Maybe these aren't standard but at least gcc supports them.

      Test diff:

      $ diff -t -W60 -y result.baseline result.patched 
      should have 0 defs              should have 0 defs
      e.h __SUNPRO_C 2 || (defined <
      
      should have 1 refs              should have 1 refs
      e.h <global> 5 #define int b    e.h <global> 5 #define int b
      
      should have 0 defs              should have 0 defs
      
      should have 2 defs              should have 2 defs
                                   >  e.h ef 8 int ef;
      e.h ef2 9 int ef2;              e.h ef2 9 int ef2;
      
      should have 2 refs              should have 2 refs
      e.h <global> 8 int ef;          e.h <global> 8 int ef;
      e.h <global> 9 int ef2;         e.h <global> 9 int ef2;
      
      should have 3 defs              should have 3 defs
      e.h x 7 #define x extern        e.h x 7 #define x extern
      e.h y 4 #define y fextrnf       e.h y 4 #define y fextrnf
      e.h z 6 #define z adsf          e.h z 6 #define z adsf
      
      should have 3 refs              should have 3 refs
      e.h <global> 7 #define x ext    e.h <global> 7 #define x ext
      e.h <global> 4 #define y fex    e.h <global> 4 #define y fex
      e.h <global> 6 #define z ads    e.h <global> 6 #define z ads
      
      should have 2 refs              should have 2 refs
      e.h y 4 #define y fextrnf       e.h y 4 #define y fextrnf
      e.h z 6 #define z adsf          e.h z 6 #define z adsf
      
      should have no ref              should have no ref
      
      should have 1 def               should have 1 def
                                   >  e.h a 14 int a;
      
      should have 1 ref               should have 1 ref
      e.h <global> 14 int a;          e.h <global> 14 int a;
      
      should have 1 def               should have 1 def
      e.h b 18 int b;                 e.h b 18 int b;
      
      should have 1 ref               should have 1 ref
      e.h <global> 18 int b;          e.h <global> 18 int b;
      
      should have 1 def               should have 1 def
      e.h c 20 int c;                 e.h c 20 int c;
      
      should have 1 ref               should have 1 ref
      e.h <global> 20 int c;          e.h <global> 20 int c;
      

      So these patches

      1. Get rid of the bogus __SUNPRO_C definition
      2. Make '#define X extern' work
      3. Make multi-line preprocessor expressions work (such as #warning)