#379 CPD C grammar problems

Status: open

Owner: nobody

Labels: None

Module:

Priority: 5

Type:

Affects version:

Ruleset / Rule:

Updated: 2014-08-28

Created: 2005-10-25

Creator: Tom Copeland

Private: No

Thx to Jarkko Hietaniemi for the report!

=========================
(1) In VMS C it is legal to have '$' in identifiers, e.g.

            if (!((__vmssts = sys$delprc(&proc,0)) & 1)) {

or

              $DESCRIPTOR(msgdsc,msg);

(also seen in code for another rarer OS, VOS)

(2) String constant lines ending with backslashes

            DEBUG_o( Perl_deb(aTHX_ "Resolving method %"SVf256\
            "' for overloaded%s' in package `%.256s'\n",
            GvSV(gv), cp, HvNAME(stash)) );

where SVf256 is

    #define SVf "_"
    #define SVf256 ".256"SVf

or

    Perl_croak(aTHX_ "suidperl is no longer needed since the kernel can now execute\n\
    setuid perl scripts securely.\n");

(3) Not recognizing the "\xHH" notation?

    if (SvCUR(TARG) == 0 ||
        !is_utf8_string((U8*)tmps, SvCUR(TARG)) ||
        memEQ(tmps, "\xef\xbf\xbd\0", 4)) {

dies on the memEQ constant string argument, or

             char *t0 = "\xcc\x88\xcc\x81";

========================
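
For item (1) of the report, the rule being asked for is roughly "treat '$' like a letter inside identifiers". The following is only a minimal C sketch of that rule, not the CPD grammar change itself; the function name is_identifier_with_dollar is made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s is a non-empty identifier under the
   relaxed rule (letters, digits, '_' and '$'; no leading digit), else 0. */
int is_identifier_with_dollar(const char *s)
{
    assert(s != NULL);
    if (*s == '\0' || isdigit((unsigned char)*s))
        return 0;
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        /* '$' is accepted anywhere, including the first position. */
        if (!isalnum(c) && c != '_' && c != '$')
            return 0;
    }
    return 1;
}
```

Under this rule both sys$delprc and $DESCRIPTOR from the report count as identifiers.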

Discussion

Tom Judge - 2006-01-06

Logged In: YES
user_id=416200

I have submitted a patch that fixes multi-line string
literals with lines ending in \; it is patch request 1398501.
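
For context, a C compiler handles these lines by deleting every backslash that is immediately followed by a newline before it tokenizes anything. The sketch below shows that splicing step on its own; it is not the code from the patch request, and splice_lines is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src into dst, dropping each backslash that is
   immediately followed by a newline. dst must have room for strlen(src) + 1
   bytes. Returns the number of bytes written, not counting the final '\0'. */
size_t splice_lines(const char *src, char *dst)
{
    size_t n = 0;
    assert(src != NULL);
    assert(dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;   /* skip the backslash-newline pair */
            continue;
        }
        dst[n++] = *src++;
    }
    dst[n] = '\0';
    return n;
}
```

After splicing, a string literal that was split with a trailing backslash is back on one logical line, so an ordinary single-line string rule can match it.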

Tom Copeland - 2006-04-11

Logged In: YES
user_id=5159

Updating with current list of problems:

================
(1) the syntax for hexadecimals in string literals is NOT
"\0xabcd", but instead "\xabcd"! Ditto for character literals.
(2) adding "LL" for long longs (didn't feel like adding
another token type since I don't know what that would entail
so instead hitch a ride on the longs)
(3) "L" is not a valid suffix on float literals!
================
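
As a reference point for item (2), C99 allows an integer constant to carry an unsigned suffix (u or U), a long suffix (l or L), a long long suffix (ll or LL, not mixed case), or an unsigned suffix combined with one of the other two in either order. The sketch below simply enumerates those spellings; the function name is hypothetical and this is not how the CPD grammar expresses it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if suffix is empty or one of the integer
   suffixes permitted by C99, else 0. */
int is_c99_integer_suffix(const char *suffix)
{
    static const char *const valid[] = {
        "", "u", "U", "l", "L", "ll", "LL",
        "ul", "uL", "Ul", "UL", "lu", "lU", "Lu", "LU",
        "ull", "uLL", "Ull", "ULL", "llu", "llU", "LLu", "LLU"
    };
    size_t i;
    assert(suffix != NULL);
    for (i = 0; i < sizeof valid / sizeof valid[0]; ++i) {
        if (strcmp(suffix, valid[i]) == 0)
            return 1;
    }
    return 0;
}
```

A case-insensitive tokenizer will be slightly more permissive than this list (it would also take "lL"), which is probably harmless for duplicate detection.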

and the patch from Jarkko:

================
--- etc/grammar/cpp.jj.dist 2006-04-09 16:57:44.000000000 +0300
+++ etc/grammar/cpp.jj 2006-04-09 17:37:44.000000000 +0300
@@ -284,26 +284,25 @@
TOKEN [IGNORE_CASE] :
{
< OCTALINT : "0" (["0"-"7"])* >
-| < OCTALLONG : <OCTALINT> "l" >
-| < UNSIGNED_OCTALINT : <OCTALINT> "u" >
-| < UNSIGNED_OCTALLONG : <OCTALINT> ("ul" | "lu") >
+| < OCTALLONG : <OCTALINT> ("l")? >
+| < UNSIGNED_OCTALINT : <OCTALINT> ("u")? >
+| < UNSIGNED_OCTALLONG : <OCTALINT> ("ul" | "lu" | "ull" | "llu" )? >

| < DECIMALINT : ["1"-"9"] (["0"-"9"])* >
-| < DECIMALLONG : <DECIMALINT> ["u","l"] >
-| < UNSIGNED_DECIMALINT : <DECIMALINT> "u" >
-| < UNSIGNED_DECIMALLONG : <DECIMALINT> ("ul" | "lu") >
-
-
-| < HEXADECIMALINT : "0x" (["0"-"9","a"-"f"])+ >
-| < HEXADECIMALLONG : <HEXADECIMALINT> (["u","l"])? >
-| < UNSIGNED_HEXADECIMALINT : <HEXADECIMALINT> "u" >
-| < UNSIGNED_HEXADECIMALLONG : <HEXADECIMALINT> ("ul" | "lu") >
+| < DECIMALLONG : <DECIMALINT> ("l")? >
+| < UNSIGNED_DECIMALINT : <DECIMALINT> ("u")? >
+| < UNSIGNED_DECIMALLONG : <DECIMALINT> ("ul" | "lu" | "ull" | "llu")? >
+
+| < HEXADECIMALINT : "0x" (["0"-"9","a"-"f","A"-"F"])+ >
+| < HEXADECIMALLONG : <HEXADECIMALINT> ("l")? >
+| < UNSIGNED_HEXADECIMALINT : <HEXADECIMALINT> ("u")? >
+| < UNSIGNED_HEXADECIMALLONG : <HEXADECIMALINT> ("ul" | "lu" | "ull" | "llu")? >

| < FLOATONE : ((["0"-"9"])+ "." (["0"-"9"]) |
(["0"-"9"]) "." (["0"-"9"])+)
- ("e" (["-","+"])? (["0"-"9"])+)? (["f","l"])? >
+ ("e" (["-","+"])? (["0"-"9"])+)? (["f"])? >

-| < FLOATTWO : (["0"-"9"])+ "e" (["-","+"])? (["0"-"9"])+ (["f","l"])? >
+| < FLOATTWO : (["0"-"9"])+ "e" (["-","+"])? (["0"-"9"])+ (["f"])? >
}

TOKEN :
@@ -318,7 +317,7 @@
|
["1"-"9"] (["0"-"9"])
|
- ("0x" | "0X") (["0"-"9","a"-"f","A"-"F"])+
+ ("x" | "X") (["0"-"9","a"-"f","A"-"F"])+
)
)
)
@@ -333,7 +332,7 @@
|
["1"-"9"] (["0"-"9"])
|
- ("0x" | "0X") (["0"-"9","a"-"f","A"-"F"])+
+ ("x" | "X") (["0"-"9","a"-"f","A"-"F"])+
)
)
)*
================

Tom Copeland - 2006-04-19

Logged In: YES
user_id=5159

Clarification for # 1:

===================
Maybe a clarification to this: inside string and character
constants the rule is "\x" followed by ONE OR TWO hexadecimal
digits, and for octals it is ONE TO THREE octal digits:

"\xa"    string one char long (plus the terminating \0)
"\xAB"   string one char long (...)
"\xaby"  string two chars long
'\xa'    one char
'\xAB'   one char
'\xABy'  ERROR
"\7"     string one char long (plus ...)
"\123"   string one char long
"\123y"  string two chars long
'\7'     one char
'\123'   one char
'\123y'  ERROR

My example of \xabcd is a bit misleading: depending on the
C compiler it either parses the 'ab' as the character '\xab'
and then the characters 'c' and 'd', or it warns or throws
an error that it cannot fit 0xabcd into a char.
===================
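
The string cases in that clarification can be checked against a real compiler with sizeof. This is a small sketch rather than a CPD test; LITERAL_LEN and check_escape_lengths are names invented here. Note that ISO C itself puts no upper limit on the number of hex digits after \x, so the cases are limited to ones where the character after the escape is not a hex digit.

```c
#include <assert.h>

/* Number of characters in a string literal, excluding the final '\0'. */
#define LITERAL_LEN(s) (sizeof(s) - 1)

/* Hypothetical check: each assertion mirrors one string case listed above. */
void check_escape_lengths(void)
{
    assert(LITERAL_LEN("\xa") == 1);    /* one hex digit              */
    assert(LITERAL_LEN("\xAB") == 1);   /* two hex digits             */
    assert(LITERAL_LEN("\xaby") == 2);  /* '\xab' followed by 'y'     */
    assert(LITERAL_LEN("\7") == 1);     /* one octal digit            */
    assert(LITERAL_LEN("\123") == 1);   /* three octal digits         */
    assert(LITERAL_LEN("\123y") == 2);  /* '\123' followed by 'y'     */
}
```

A tokenizer used for duplicate detection does not need the character values, only to keep the whole escape inside the literal, so accepting "\x" plus any run of hex digits should be enough.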

Tom Copeland - 2006-06-27

Logged In: YES
user_id=5159

Last test file (maybe):

#include <stdio.h>

int main() {
    printf("s = [%s]\n", "foo"\
    "bar");
    return 0;
}
