From: Andre B. <and...@gm...> - 2003-04-17 19:21:11
|
hello together,

I'm currently working on the "declaration" parsing.

That's stuff like:

  typedef class x { ... } x_type;
  const int * (*f)(...);

I've analyzed the C++ Spec. and identified the following elements.
However, I'm not yet sure whether the parsing has to go that deep at the moment.
Maybe during the implementation phase I will decide to reduce the level of detail for a first version.

Here comes a proposal for a possible AST structure:
===============================================================
* Declaration-Level-Elements

// each declaration consists of a list of specifiers and a list of declarators (with optional init part)
[simple-declaration]
  declaration-specifier-property => [declaration-specifier-list] // specifiers for all declarators...
  n [init-declarator] // each declaration can declare 'n' identifiers

[declaration-specifier-list] // the type declaration is organized as a list
  n [declaration-specifier]

[init-declarator] // each declarator consists of a declarator identifier, an initializer part and the declarator description (pointer, const, ref etc.)
  ?identifier-property => [#declarator-identifier]
  ?initializer => [expression]
  declarator-property => [unparsed-declarator] , [declarator]

[unparsed-declarator]
  #mutate: declarator

[declarator] // declarators are organized hierarchically; the property 'sub-declarator-property' links to the sub-element.
  ?sub-declarator-property => [declarator]
  ?[function-declarator] // int ... (...)
  ?[array-declarator] // int ... [...]
  ?[#braces] // int (...)
  ?[ptr-declarator] // int (* ...)
  ?[#ref-declarator] // int (& ...)
  ?[named-ptr-declarator] // int (::c::d * ...)

[declaration-specifier]
  ?[#type-specifier] // "simple-type-specifier" (char, wchar_t, bool, etc.), also "cv-qualifier" ---> see C++ Spec.
  ?[class-specifier] // full class spec: "class x { ... }"
  ?[enum-specifier] // full enum spec
  ?[elaborated-type-specifier] // something like "class xxx" or "struct y"
===============================================================

I will put this on a wiki page for discussion and add examples which use this AST.

Feel free to send your opinion!

greetings,
André |
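The proposal above can be pictured as plain data structures. The following is only a minimal sketch under stated assumptions: the type names (`Declarator`, `InitDeclarator`, `SimpleDeclaration`), the helper `node`, and the choice to nest the chain in grammar order are hypothetical and are not taken from the rfta code base. It builds one possible tree for `const int * (*f)(...);`.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node kinds, named after the bracketed elements in the proposal.
enum class DeclaratorKind { Function, Array, Braces, Pointer, Reference, NamedPointer };

// One level of the declarator hierarchy; 'sub' plays the role of
// 'sub-declarator-property' and is null at the innermost level.
struct Declarator {
    DeclaratorKind kind;
    std::unique_ptr<Declarator> sub;
};

struct InitDeclarator {
    std::string identifier;                  // optional identifier-property
    std::string initializer;                 // optional, kept as raw text in this sketch
    std::unique_ptr<Declarator> declarator;  // declarator-property
};

struct SimpleDeclaration {
    std::vector<std::string> specifiers;     // declaration-specifier-list
    std::vector<InitDeclarator> declarators; // the 'n' init-declarators
};

// Small helper to build a declarator chain.
std::unique_ptr<Declarator> node(DeclaratorKind kind,
                                 std::unique_ptr<Declarator> sub = nullptr) {
    auto d = std::make_unique<Declarator>();
    d->kind = kind;
    d->sub = std::move(sub);
    return d;
}

// Number of levels in a declarator chain.
int depth(const Declarator* d) {
    int levels = 0;
    for (; d != nullptr; d = d->sub.get()) ++levels;
    return levels;
}

// One possible tree for:  const int * (*f)(...);
// Assumed nesting (grammar order): ptr -> function -> braces -> ptr.
SimpleDeclaration makeExample() {
    SimpleDeclaration decl;
    decl.specifiers = {"const", "int"};

    InitDeclarator f;
    f.identifier = "f";
    f.declarator = node(DeclaratorKind::Pointer,
                        node(DeclaratorKind::Function,
                             node(DeclaratorKind::Braces,
                                  node(DeclaratorKind::Pointer))));
    decl.declarators.push_back(std::move(f));
    return decl;
}
```

Whether the chain should run from the outermost declarator inwards, as here, or from the identifier outwards is left open by the proposal; the sketch simply picks one direction.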
From: Baptiste L. <gai...@fr...> - 2003-04-23 08:25:02
|
I'll give this a more thorough look later on. A few preliminary remarks, though:

Our 'C++' is wider than the standard (we have single-pass parsing). For example, we should support class declarations such as:

  class RFTA_API TextDocument { };

where RFTA_API is stored as the optional 'export macro name'. This is a widely used idiom on Win32.

The AST should capture all data required for source code manipulation (capture the range of every element that might be removed, inserted, or modified).

A primary goal would probably be to identify the function/method bodies (this would solve the issue you raised concerning the starting compound statement for the variable declaration search).

Next, I would probably go for class parsing (rough: just extract the name and methods) and method/function implementation parsing. Having those two opens the door for the ExtractInterface refactoring (which just needs the class methods) and the ExtractMethod refactoring (which needs to add a method to the class declaration, add a new method body, and know what the method parameters are).

Baptiste.

----- Original Message -----
From: "Andre Baresel" <and...@gm...>
To: "CppTool Mailing List" <Cpp...@li...>
Sent: Thursday, April 17, 2003 9:26 PM
Subject: [Cpptool-develop] Declaration parsing started... |
From: Andre B. <and...@gm...> - 2003-04-25 06:44:16
|
Baptiste Lepilleur wrote:

>I'll give this a more thorough look later on. A few preliminary remarks
>though:
>
>Our 'C++' is wider than the standard (we have single pass parsing). For
>example, we should support class declaration such as:
>
>class RFTA_API TextDocument { };
>
>Where RFTA_API is stored as the optional 'export macro name'. This is a
>widely used idiom on Win32.
>
>The AST should capture all data required for the source code manipulation
>(capture the range of all element that might be removed, inserted,
>modified).
>
>A primary goal would probably be able to identify the function/method bodies
>(this would solve the issue you raised concerning the starting compound
>statement for variable declaration search).

This works already. I'm currently at the point of parsing declaration specifiers and declarators. The parsing of specifiers includes class declarations. About the declarators, I'm not yet sure how far I should go. At the moment I'm trying to merge the VariableDeclMutator with my declaration mutator to get a more general solution which can parse any declaration type.
Only one problem is left: function pointers like

  usertype (*f) ( int );

As soon as I have cleaned up the code I will check it in.

>Next, I would probably go for class parsing (rough, just extract the name,
>methods, and method/function implementation parsing. Having those two open
>doors for ExtractInterface refactoring (just needs the class methods), and
>ExtractMethod refactoring (need to add a method in the class declaration,
>adding a new method body, and know what are the method parameters).

This is fine with the current implementation.

until later,
André |
From: Baptiste L. <gai...@fr...> - 2003-04-25 08:41:26
|
Wow, you're fast. This sounds very promising.

Note that, concerning function pointers, the most important place to support them is typedef. In most of the code I have seen, function pointers were nearly always 'typedefed'.

Concerning code clean-up, you might want to reuse the SourceBuilder class we use in the rfta library (or implement a similar class, more dedicated to AST node ranges). This would make the tests easier to write, read and maintain.

Baptiste. |
From: Andre B. <and...@gm...> - 2003-04-26 14:00:10
|
Baptiste Lepilleur wrote:

>Note that concerning function pointers, the most important place to
>support them is typedef. In most of the code I have seen, function pointers
>were nearly always 'typedefed'.

Function pointers and even declarations of 'operator' overloading/type conversion can be detected now. The parser recognizes these and parses them correctly (hopefully; see the tests), but no information is stored yet. At the moment we get Type-Specifiers and Unparsed-Declarators.

Well, typedef itself is no problem, since it is just one keyword. Note that it can even appear in the middle of a declaration:

  int typedef mydef;

Looks strange? Yes, but the EBNF allows this.

The tests took really long (see UnparsedDeclarationMutatorTests), but in the end I arrived at a very clean solution. I first played around with the VariableDeclMutator, which does nearly the same job, but it was not able to pass all the tests and became too complex after adding functionality. I'll try to replace the variable-declaration-mutator as soon as the 'variable detection' is handled by the new parser.

During the tests I also noticed a problem in parsing declarations which, I believe, cannot be solved with parser information alone (it has been added to the wiki):

  typedef int myint;

  class x {
  public:
    static myint y;
  };

  myint ::x::y; // <--- how to parse?

(variant a) type "myint::x::y" + no variable
(variant b) type "myint::x" + variable "::y"
(variant c) type "myint" + variable "::x::y"

btw: spaces are allowed between '::' and the identifier ("x<space>::y" equals "x::y").
"class x ::z::y" has three meanings!
I wrote a test for this (deactivated), but this problem is not high priority.

>Concerning code clean-up, you might want to reuse the SourceBuilder class we
>use in the rfta library (or implement a similar class, more dedicated to ast
>node range). This would make the tests easier to write, read and maintain.

I was thinking about that, but then decided on a different solution. For class body parsing I will check this test solution again. I understand your test routines more and more ;-)...

Open tasks for me at the moment:
- class body parsing, supporting additional declaration elements (e.g. public, protected keywords)
  "UnparsedClassSpecifierMutator" will be created
- simple declarator parsing for detecting variables
  "UnparsedDeclaratorMutator" will be created
- function header parsing, to detect parameters
  "UnparsedFunctionHeaderMutator" will be created
- I'm thinking about a new Mutator to replace the MaxLODMutator. This Mutator would only invoke the more detailed Mutators at the specified source position. For refactoring operations, it would allow us to always start parsing at source level and then step down to the position where the user has triggered a refactoring request. How to name it? What about "PositionMaxLODMutator"?
- An additional Mutator could be written which only parses source positions where some identifier appears (reusing the PositionMaxLODMutator).

For the future:
- to access identifiers: an identifier resolver strategy for class/namespace/function header
  --> we need named scopes, and methods to resolve scoped identifiers

-- André |
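The three variants listed for "myint ::x::y" follow a simple pattern: the first k names form the type, and the remaining names form the declared identifier. A minimal sketch of that enumeration follows; the names `Split` and `candidateSplits` are hypothetical and not part of the parser. It only lists the readings a purely syntactic parser cannot choose between; picking one would need symbol information.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One possible reading of a '::'-separated name sequence.
struct Split {
    std::string type;      // names taken as the type specifier
    std::string variable;  // remaining names taken as the declared identifier
};

// Enumerates every way to cut the sequence into "type + variable",
// starting with the longest type (variant a) down to the shortest (variant c).
std::vector<Split> candidateSplits(const std::vector<std::string>& names) {
    std::vector<Split> result;
    for (std::size_t typeParts = names.size(); typeParts >= 1; --typeParts) {
        Split s;
        for (std::size_t i = 0; i < typeParts; ++i) {
            if (i > 0) s.type += "::";
            s.type += names[i];
        }
        for (std::size_t i = typeParts; i < names.size(); ++i) {
            s.variable += "::" + names[i];
        }
        result.push_back(s);
    }
    return result;
}
```

Calling `candidateSplits({"myint", "x", "y"})` yields the three variants in the order a, b, c.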
From: Baptiste L. <gai...@fr...> - 2003-04-26 22:12:02
|
----- Original Message -----
From: "Andre Baresel" <and...@gm...>
To: "CppTool Mailing List" <Cpp...@li...>
Sent: Saturday, April 26, 2003 4:06 PM
Subject: Re: [Cpptool-develop] Declaration parsing started...

> Baptiste Lepilleur wrote:
>
> >Note that concerning function pointers, the most important place to
> >support them is typedef. In most of the code I have seen, function pointers
> >were nearly always 'typedefed'.
>
> Function pointers and even the declaration of 'operator'
> overloading/type-conversion can be detected now. The parser does
> recognize this and parses it correctly (hopefully!? see Tests), but no
> information is stored yet. We get Type-Specifiers and
> Unparsed-Declarators at the moment.

I skimmed over the tests, and this looks very promising. Nice work!

> Well typedef itself is no problem, since it is just one keyword.
> Note that it can even stay in the middle of a declaration:
>
> int typedef mydef;
>
> Looks strange ? yes, but the EBNF allows this.

What is the common equivalent expression? (That's the kind of stuff that will probably never be used, just like the fact that you can put braces around a parameter name.)

> Tests took really long (see UnparsedDeclarationMutatorTests), but after
> all I managed a very clean solution. I first played around with the
> VariableDeclMutator which does nearly the same job, but was not able to
> fulfill all tests and was too complex after adding functionality. I'll
> try to replace the variable-declaration-mutator as soon as the
> 'variable-detection' is handled by the new Parser.

Local variable declaration is very close to class attribute declaration, so reuse should occur.

> During tests I also recognized a problem in parsing declaration which
> can not be solved with parser information only (I believe):
> (Has been added to the Wiki)
>
> typedef int myint;
>
> class x {
> public:
> static myint y;
> };
>
> myint ::x::y; // <--- how to parse ?
> (variant a) type "myint::x::y" + no variable
> (variant b) type "myint::x" + variable "::y"
> (variant c) type "myint" + variable "::x::y"
>
> btw: spaces are allowed between '::' and identifier ("x<space>::y"
> equals "x::y")
> "class x ::z::y" has three meanings !
> I wrote a test on this (deactivated), but this problem is not high priority.

Could you provide some examples of such variable declarations? From what I've tried, it is not possible to prefix the name of the declared variable with '::'.

  int ::x; // failed to compile => cannot force x to be in the global namespace

The only place I could see this work is when you are referring to variable x in some expression.

> >Concerning code clean-up, you might want to reuse the SourceBuilder class we
> >use in the rfta library (or implement a similar class, more dedicated to ast
> >node range). This would make the tests easier to write, read and maintain.
>
> I was thinking about that, but then decided on a different solution.
> For class body parsing I will check this test solution again.
> I more and more understand your test routines ;-)...

I've looked at some of the tests, and there are some 'magical' constants appearing from nowhere in the AST node range tests. This makes it hard to understand what the range of a given node is. How did you figure out their values anyway?

As for the test routines, they call it test-driven design, or test-first design. I just made the environment to make it possible for us ;-). It was actually the reason why I gave up on using Boost.Spirit (a generic programming parser framework). Compile times were huge (just including the headers in a cpp file resulted in a compile time of around a minute!).

> Open tasks for me, at the moment:
> - class body parsing, supporting additional declaration elements
> (e.g. public, protected keywords)

You may want to add the Qt extensions too:

  public slots: // (any access modifier is ok before slots)
  signals:

slots is a macro that expands to nothing, and signals expands to protected (I think). More info: http://doc.trolltech.com/3.1/signalsandslots.html. They should not be expanded, but memorized as additional access modifiers.

> "UnparsedClassSpecifierMutator" will be created
> - simple declarator parsing for detecting variables
> "UnparsedDeclaratorMutator" will be created
> - function header parsing, to detect parameters
> "UnparsedFunctionHeaderMutator" will be created
> - I'm thinking about a new Mutator to replace the MaxLODMutator.
> This Mutator only accesses Mutator for more details at the
> specified Source Position.
> It would allow us for Refactoring Operations to start parsing
> always at SourceLevel and then step down to the position where the
> user has activated some refactoring request.
> How to name it ? what about "PositionMaxLODMutator"
> - An additional Mutator could be written which only parses source
> positions where some identifier does appear. (reuse of the
> PositionMaxLODMutator)

I was thinking of something along the same lines. I've added a CPPParser class, which should ultimately become the main interface to the rftaparser library (at the moment, it's just a skeleton).

The only service we need at the moment is:
- find and parse the function body which encompasses a specified location (usually, the user selection).

A service we will need later:
- parse all the source, but don't parse struct/class/function bodies

As for the implementation, I would go the other way: keep the MaxLODMutator expanding everything (if given at top level, everything gets expanded). And I would add another LOD mutator to implement the second service (up to function level). Maybe name it BodyLessLODMutator (we parse everything but the bodies) or something like that.
The first service would be implemented roughly like this:
- parse at bodyless level (DeclarationListParser + BodyLessLODMutator)
- walk down the AST node tree until we find a body that includes the specified position (error if not found)
- mutate the body with the MaxLODMutator
- return the function declaration node (we will need to update the refactoring to start at this level and obtain the compound statement node from that one).

We will probably need a service concerning identifiers, but we will study that when we get there (global renaming).

> For the future:
> - to access identifier:
> identifier resolver strategy for class/namespace/function header
> --> we need named scopes, and methods to resolve scoped identifier

Yes, and that promises to be tough. We'll definitely use TDD for that one. Hopefully, we'll have a nice code model to work with by then.

I'll also try to get something out so that we can run the new parser over a large number of sources and study the output (much like astdumper).

Baptiste.

>
> -- André |
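The "walk down the AST node tree" step of the first service can be sketched as a recursive range search. This is only an illustration under stated assumptions: the `Node` struct, its string type labels, and `findEnclosingBody` are hypothetical stand-ins, not the rftaparser AST classes, and the later mutation with the MaxLODMutator is left out.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an AST node produced by a bodyless parse.
struct Node {
    std::string type;     // e.g. "file", "class", "function-body" (assumed labels)
    std::size_t start;    // offset of the first character in the source
    std::size_t length;   // number of characters covered
    std::vector<std::unique_ptr<Node>> children;

    bool contains(std::size_t pos) const {
        return pos >= start && pos < start + length;
    }
};

// Returns the function body that includes 'pos', or nullptr if the
// position is not inside any function body (the "error if not found" case).
const Node* findEnclosingBody(const Node& node, std::size_t pos) {
    if (!node.contains(pos)) return nullptr;           // prune subtrees out of range
    if (node.type == "function-body") return &node;    // found an unparsed body
    for (const auto& child : node.children) {
        if (const Node* found = findEnclosingBody(*child, pos)) return found;
    }
    return nullptr;
}
```

The node returned here would then be the one handed to the full-detail mutator, so only the body under the user's selection gets parsed in depth.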
From: Andre B. <and...@gm...> - 2003-04-27 10:03:22
|
Baptiste Lepilleur wrote:

>>I'll try to replace the variable-
>>declaration-mutator as soon as the 'variable-detection' is handled by
>>the new Parser.
>
>Local variable declaration is very close to class attribute declaration, so
>reuse should occur.

Definitely. However, the declaration list within a class body has an extended syntax (visibility keywords and pure declarations). But I think we can simply extend the usual declaration-list parser for this, since these keywords will not appear at file scope.

>>During tests I also recognized a problem in parsing declaration which
>>can not be solved with parser
>>information only (I believe): (Has been added to the Wiki)
>>
>> typedef int myint;
>>
>> class x {
>> public:
>> static myint y;
>> };
>>
>> myint ::x::y; // <--- how to parse ?
>>(variant a) type "myint::x::y" + no variable
>>(variant b) type "myint::x" + variable "::y"
>>(variant c) type "myint" + variable "::x::y"
>>
>> btw: spaces are allowed between '::' and identifier ("x<space>::y"
>>equals "x::y")
>> "class x ::z::y" has three meanings !
>
>Could you provide some examples of variable declaration ? For what I've
>tried, it is not possible to prefix the name of the declared variable by ::.
>
>int ::x; // failed to compile => can not force x to be in global
>namespace
>
>The only place I could see this work is when you are referring to variable
>x in some expression.

Declarations of class variables or of variables in a namespace use this syntax.

  typedef int myint;
  namespace nnnn { extern int x; }
  myint ::nnnn::x = 0;

Note that the problem only appears with user-defined types, since the syntax then allows a user-defined type to be qualified using "::". For this reason we cannot stop reading after "myint" and have to continue with "::nnnn". The parser does not know where to stop if a "::" follows.
A second example is the use of class variables:

  class nnnn { static int x; };
  myint ::nnnn::x = 0;

>>>Concerning code clean-up, you might want to reuse the SourceBuilder class we
>>>use in the rfta library (or implement a similar class, more dedicated to ast
>>>node range). This would make the tests easier to write, read and maintain.
>>
>>I was thinking about that, but then decided on a different solution.
>>For class body parsing I will check this test solution again.
>>I more and more understand your test routines ;-)...
>
>I've looked at some of the tests, and there is some 'magical' constants
>appearing from nowhere for ast node range test. This makes it hard to
>understand what is the range of a given node. How did you figure out their
>values anyway ?

Counting by hand is the simple answer.
E.g. "const short int" has three specifiers. They start at 0, 6, and 12. Their lengths are 5, 5, and 3. I will clean this up a little bit!

>As for the test routines, they call it test driven design, or test first
>design. I just made the environment to make it possible for us ;-). It was
>actually the reason why I renounced to use Boost.Spirit (a generic
>programming parser framework). Compile time were huge (just including the
>headers in a cpp resulted in a compile time of around a minute !).

I'm not sure yet whether the Boost.Spirit approach would make the parser easier to understand, since even reading the simple examples on the Spirit pages doesn't make me say "wow, that's it"... What do you think: did I miss some point which would make Spirit worthwhile for us?

>I was thinking of something along the same lines. I've added a CPPParser
>class, which should ultimately become the main interface to the rftaparser
>library (at the current time, it's just a skeleton).

Yep, that CPPParser is a nice idea.
>I'll also try to get something out so that we can run the new parser over a
>large number of sources and study the output (much like astdumper).

Yep, I have also already played around with a modified version of ASTDump, but did not check it in...

-- André |
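The hand-counted offsets for "const short int" can also be derived from the source text itself, which avoids unexplained constants in range tests. The helper below is a hypothetical sketch (the names `Range` and `wordRanges` are invented for illustration and are unrelated to SourceBuilder); it assumes specifiers are separated by plain spaces.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Start offset and length of one word in the source text.
struct Range {
    std::size_t start;
    std::size_t length;
};

// Records the range of every space-separated word in 'source'.
std::vector<Range> wordRanges(const std::string& source) {
    std::vector<Range> ranges;
    std::size_t pos = 0;
    // Skip to the next non-space character; stop when none is left.
    while ((pos = source.find_first_not_of(' ', pos)) != std::string::npos) {
        std::size_t end = source.find(' ', pos);
        if (end == std::string::npos) end = source.size();
        ranges.push_back({pos, end - pos});
        pos = end;
    }
    return ranges;
}
```

For "const short int" this gives the starts 0, 6, 12 and the lengths 5, 5, 3 quoted above.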
From: Baptiste L. <gai...@fr...> - 2003-04-27 11:25:29
|
----- Original Message -----
From: Andre Baresel
To: CppTool Mailing List
Sent: Sunday, April 27, 2003 12:09 PM
Subject: Re: [Cpptool-develop] Declaration parsing started...

> Declaration of class variables or variables of a namespace are using this syntax.
>
> typedef int myint;
> namespace nnnn { extern int x; }
> myint ::nnnn::x = 0;
>
> Note that the problem only appears in case of userdefined type. Since then the syntax specifies
> that a userdefined type can be specified using "::". For this reason we can not stop reading
> after "myint" and have to continue with "::nnnn". The parser does not know where to stop if a "::" follows.
>
> A second example is the use of class variables:
>
> class nnnn { static int x; };
> myint ::nnnn::x = 0;

The second example is more of an issue than the first one. Though, I doubt we'll stumble upon it often. In most cases we can rely on the fact that the static will be declared within the namespace and the type won't be prefixed:

  namespace nnnn {
  myint x = 0;
  }

Or:

  myint nnnn::x = 0;

Have you ever seen code like the one you pointed out? I don't think I have.

Baptiste.

PS: check your mailer configuration. I think that mail was in HTML. This makes it difficult to quote when replying. |
From: Baptiste L. <gai...@fr...> - 2003-04-27 11:42:35
|
----- Original Message -----
From: Andre Baresel
To: CppTool Mailing List
Sent: Sunday, April 27, 2003 12:09 PM
Subject: Re: [Cpptool-develop] Declaration parsing started...

Baptiste Lepilleur wrote:

>>I've looked at some of the tests, and there are some 'magical' constants appearing from nowhere for ast node range test. This makes it hard to understand what is the range of a given node. How did you figure out their values anyway ?

>Counting by hand is the simple answer. E.g. "const short int" has three specifiers. They start at 0, 6, and 12. Their lengths are 5, 5, and 3. I will clean this up a little bit !

That would explain why most 'sources' were on a single line. I'll try to see if I can come up with a better version of SourceBuilder for testing (this issue is not specific to your code; I remember that variable declaration testing was quite a mess).

>>As for the test routines, they call it test driven design, or test first design. I just made the environment to make it possible for us ;-). It was actually the reason why I renounced to use Boost.Spirit (a generic programming parser framework). Compile time were huge (just including the headers in a cpp resulted in a compile time of around a minute !).

>I'm not sure yet, if the boost.spirit approach makes the parser better understandable, since even reading the simple examples at the spirit pages don't make me say "wow that's it" ... what do you think - did I miss some point which makes the "Spirit" worth for us ?

I think it would be useful for writing mini-parsers (much akin to ours). The problem, again, is compilation time. Who would use a date parser that takes a minute to compile? You also get more flexibility than with a parser generator, since you can define hand-coded rules and actions.

Another major issue with Spirit is that support for VC 6 is poor. You often stumble upon internal compiler errors, some of which I never managed to work around.

If we need a parser, I think PCCTS would be the best choice.
It's widely supported and has a lot of functionality (its only drawback that I remember from when I looked at it a long while ago was Unicode support, which is not an issue for us). One of its nice features is the ability to generate an AST and then write a 'parser' to modify or visit that AST. Even though it would require code generation, it would still lead to a compile/test cycle way faster than with Boost.Spirit.

Anyway, our hand-written parser is fairly good and allows for very simple testing. One of its most important properties, I think, is that it takes advantage of the hierarchical structure of C++. This provides strong error recovery capability (or at least, it should :-) ).

Baptiste.

>-- André |
From: Baptiste L. <gai...@fr...> - 2003-04-28 18:48:52
|
----- Original Message -----
From: Andre Baresel
To: Baptiste Lepilleur
Sent: Monday, April 28, 2003 5:17 AM
Subject: Re: [Cpptool-develop] Declaration parsing started...

Baptiste Lepilleur wrote:

>>>Counting by hand is the simple answer.
>>>E.g. "const short int" has three specifiers. They start at 0, 6, and 12. Their lengths are 5, 5, and 3. I will clean this up a little bit !

>>That would explain why most 'source' were on a single line. I'll try to see if I can come up with a better version of SourceBuilder for testing (this issue is not specific to your code. I remember that variable declaration testing was quite a mess).

>I have fixed this with a new test util class "KeyedString".

Great job. The code is much more readable.

Baptiste.

-- André |