#728 C++ CPD treats escapes as significant in strings

Status: open
Labels: cpd (39)
Priority: 3
Updated: 2012-10-07
Created: 2007-10-17
Creator: Dale King
Private: No

While looking into bug 1814737 I noticed that the tokenizer for C++ is treating differences in escape sequences as significant in strings even though those differences may not result in differences in the actual text generated.

In particular it would treat this string:

"abc"

as different from this string:

"a\
b\
c"

when they are really identical.

Similarly it would treat these as different strings:

"a\007z" vs. "a\x07z"
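
These equivalences can be checked by asking a C++ compiler to compare the characters it actually stores for each literal. The following is only a sketch for illustration: `same_text` is a made-up helper, not anything in CPD, and it assumes a C++14 or later compiler. One caveat: a hex escape consumes every hex digit that follows it, so the character after `\x07` must not itself be a hex digit (a following `b` would be absorbed into the escape, giving `\x7b`). The check below uses `z` for that reason.

```cpp
#include <cstddef>

// Hypothetical helper: compares the stored characters of two literals,
// including the terminating NUL, so embedded NULs are handled too.
template <std::size_t N, std::size_t M>
constexpr bool same_text(const char (&a)[N], const char (&b)[M]) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i) {
        if (a[i] != b[i]) return false;
    }
    return true;
}

// Backslash-newline pairs are spliced away before the literal is tokenized.
static_assert(same_text("abc", "a\
b\
c"), "line continuation does not change the string");

// Octal \007 and hex \x07 both denote the character with value 7.
static_assert(same_text("a\007z", "a\x07z"), "octal and hex escapes agree");
```

Both assertions hold at compile time, which is the sense in which the differing spellings are "really identical".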

It would be more robust if CPD figured out what will actually be generated for the string to use in comparing the tokens.

A similar issue can occur with concatenated string literals as in

"ab" vs. "a" "b"

which generate the same thing.
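
As a rough illustration of what "figuring out what will actually be generated" could involve, here is a minimal C++ sketch. It is not CPD's code (CPD itself is written in Java), and `literal_value` is a hypothetical name. It assumes ordinary narrow literals only: no encoding prefixes, no raw strings, no `\u` escapes, and no range checking of escape values.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper, not part of CPD: returns the characters denoted by
// one or more adjacent string literals given as source text.
std::string literal_value(std::string_view source) {
    // Step 1: delete every backslash-newline pair (line continuation).
    std::string s;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (source[i] == '\\' && i + 1 < source.size() && source[i + 1] == '\n') {
            ++i;  // skip the newline as well
        } else {
            s += source[i];
        }
    }

    // Simple escapes: the character after the backslash and what it denotes.
    constexpr std::string_view simple_in = "ntrabfv\\\"'?";
    constexpr std::string_view simple_out = "\n\t\r\a\b\f\v\\\"'?";

    // Step 2: decode each literal, appending to one result (concatenation).
    std::string value;
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        if (std::isspace(static_cast<unsigned char>(s[i]))) { ++i; continue; }
        if (s[i] != '"') throw std::invalid_argument("expected a string literal");
        ++i;  // opening quote
        while (i < n && s[i] != '"') {
            if (s[i] != '\\') { value += s[i++]; continue; }
            if (++i >= n) throw std::invalid_argument("dangling backslash");
            const char c = s[i];
            if (c >= '0' && c <= '7') {
                // Octal escape: at most three octal digits.
                unsigned v = 0;
                for (int k = 0; k < 3 && i < n && s[i] >= '0' && s[i] <= '7'; ++k) {
                    v = v * 8 + static_cast<unsigned>(s[i++] - '0');
                }
                value += static_cast<char>(v);
            } else if (c == 'x') {
                // Hex escape: consumes every hex digit that follows.
                unsigned v = 0;
                std::size_t digits = 0;
                ++i;
                while (i < n && std::isxdigit(static_cast<unsigned char>(s[i]))) {
                    const unsigned char d = static_cast<unsigned char>(s[i++]);
                    const int digit = std::isdigit(d) ? d - '0' : std::tolower(d) - 'a' + 10;
                    v = v * 16 + static_cast<unsigned>(digit);
                    ++digits;
                }
                if (digits == 0) throw std::invalid_argument("\\x needs a hex digit");
                value += static_cast<char>(v);
            } else {
                const std::size_t pos = simple_in.find(c);
                if (pos == std::string_view::npos) throw std::invalid_argument("unknown escape");
                value += simple_out[pos];
                ++i;
            }
        }
        if (i >= n) throw std::invalid_argument("unterminated literal");
        ++i;  // closing quote
    }
    return value;
}
```

With something along these lines, a duplicate detector could compare `literal_value(...)` results rather than raw spellings, so that the source texts `"ab"` and `"a" "b"` yield the same value. Each literal is decoded before the results are joined, which matches the order the language uses: escapes are interpreted first, adjacent literals are concatenated afterwards.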

Discussion

  • Tom Copeland - 2007-10-18

    Logged In: YES
    user_id=5159
    Originator: NO

    Hm, interesting... I'm not sure if I entirely agree, but I see where you're coming from there. I guess I'm used to thinking of a tokenizer as being able to produce exactly what was entered as input...

    Yours,

    Tom

     
  • Dale King - 2007-10-18

    Logged In: YES
    user_id=130378
    Originator: YES

    I don't think it really matters what the job of a tokenizer is. The point is what the job of CPD is.

    The job of CPD is to find snippets of code that do not differ in a significant way. CPD ignores whitespace between tokens because whitespace is insignificant to the code produced. Similarly, the use of line continuation in a string is insignificant, as are octal vs. hex escape codes and the concatenation of adjacent strings.

     
