PMD issue #728: C++ CPD treats escapes as significant in strings

Status: open

Owner: Tom Copeland

Labels: cpd

Priority: 3

Updated: 2012-10-07

Created: 2007-10-17

Creator: Dale King

Private: No

While looking into bug 1814737, I noticed that the C++ tokenizer treats differences in escape sequences inside string literals as significant, even though those differences may not change the text the literal actually produces.

In particular, it would treat this string:

"abc"

as different from this string:

"a\
b\
c"

when they are really identical.
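For the line-continuation case, the C++ standard already treats the two forms as the same: in translation phase 2, every backslash immediately followed by a newline is deleted before any tokens are formed. A minimal sketch, assuming CPD's C++ tokenizer could run a pass like this over the raw source before tokenizing (splice_lines is a hypothetical helper, not part of PMD):

```python
import re

# Hypothetical helper, not PMD code. Mirrors C++ translation phase 2:
# a backslash immediately followed by a newline (LF or CRLF) is deleted,
# joining the two physical lines into one logical line.
_SPLICE = re.compile(r"\\\r?\n")


def splice_lines(source: str) -> str:
    """Return source with every backslash-newline pair removed."""
    return _SPLICE.sub("", source)
```

Applied to the three-line literal above, this yields "abc", so the tokenizer would see exactly the same token in both cases.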

Similarly, it would treat these as different strings:

"a\007z" vs. "a\x07z"

It would be more robust if CPD worked out the text each string literal actually produces and compared tokens on that.
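Comparing literals on the values they produce, rather than on their spelling, can be sketched as a small escape decoder. This is an illustrative sketch, not PMD code: decode_escapes is a hypothetical name, it handles only unprefixed (narrow) literals, and it omits universal-character-names (\u, \U) for brevity. One subtlety it has to get right is that an octal escape stops after three digits, while a hex escape consumes every hex digit that follows it.

```python
# Hypothetical helper, not PMD's API: decode the body of an unprefixed C++
# string literal (the text between the quotes) into character values.
_SIMPLE = {
    "'": 0x27, '"': 0x22, "?": 0x3F, "\\": 0x5C,
    "a": 0x07, "b": 0x08, "f": 0x0C, "n": 0x0A,
    "r": 0x0D, "t": 0x09, "v": 0x0B,
}
_OCTAL = "01234567"
_HEX = "0123456789abcdefABCDEF"


def decode_escapes(body: str) -> list[int]:
    """Return the character values the literal body produces."""
    values = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            values.append(ord(ch))  # ordinary character: its own value
            i += 1
            continue
        i += 1  # step past the backslash
        if i == len(body):
            raise ValueError("dangling backslash")
        esc = body[i]
        if esc in _SIMPLE:
            values.append(_SIMPLE[esc])
            i += 1
        elif esc in _OCTAL:
            # Octal escapes take one to three octal digits.
            j = i
            while j < len(body) and j - i < 3 and body[j] in _OCTAL:
                j += 1
            values.append(int(body[i:j], 8))
            i = j
        elif esc == "x":
            # Hex escapes take every following hex digit, so "\x07b" is a
            # single escape with value 0x7B, not "\x07" followed by "b".
            j = i + 1
            while j < len(body) and body[j] in _HEX:
                j += 1
            if j == i + 1:
                raise ValueError("\\x used with no hex digits")
            values.append(int(body[i + 1:j], 16))
            i = j
        else:
            raise ValueError(f"unsupported escape: \\{esc}")
    return values
```

With this decoder, the bodies a\007z and a\x07z both decode to [97, 7, 122], while a\x07b decodes to [97, 123]. A tokenizer could keep the original text for reporting and compare tokens on the decoded values, which leaves the raw input available while still matching equivalent literals.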

A similar issue occurs with adjacent string literals, which the compiler concatenates, as in

"ab" vs. "a" "b"

which produce the same string.
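Adjacent literals could be handled by merging runs of string tokens, but only after their escapes are decoded: C++ keeps the characters of concatenated literals distinct, so "\xA" "B" is the two characters '\xA' and 'B', not the single character '\xAB'. The (kind, text) token shape and merge_adjacent_strings below are hypothetical, chosen only for illustration, and each string token's text is assumed to be already decoded and stripped of its quotes.

```python
# Hypothetical token shape for illustration: (kind, text) pairs, where a
# "string" token's text is the literal's already-decoded body.
def merge_adjacent_strings(tokens):
    """Concatenate each run of adjacent string tokens into one token."""
    merged = []
    for kind, text in tokens:
        if kind == "string" and merged and merged[-1][0] == "string":
            # Extend the previous string token instead of adding a new one.
            merged[-1] = ("string", merged[-1][1] + text)
        else:
            merged.append((kind, text))
    return merged
```

Because whitespace never reaches the token stream, "a" "b" and "ab" both end up as a single ("string", "ab") token.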

Discussion

Tom Copeland - 2007-10-18

Hm, interesting... I'm not sure if I entirely agree, but I see where you're coming from there. I guess I'm used to thinking of a tokenizer as being able to produce exactly what was entered as input...

Yours,

Tom

Dale King - 2007-10-18

I don't think it really matters what the job of a tokenizer is. The point is what the job of CPD is.

The job of CPD is to find snippets of code that do not differ in any significant way. CPD ignores whitespace between tokens because whitespace does not affect the code produced. Similarly, line continuations within a string, octal vs. hex escape codes, and the concatenation of adjacent string literals are all insignificant.
