I am always uncomfortable using an old version of software when developing software. Unfortunately, the only C++ grammar seems to work only with old versions of ANTLR.
I am just going through the grammar and I will post what I find there. Even though I have never worked with ANTLR, I worked quite extensively with lex and yacc recently to generate parsers for a modified SQL and IDL.
This is an excellent idea. Presently, I'm using the opencxx parser to parse C++. Of course, it's not perfect (troubles with templates), but no C++ parser is perfect. One would hope that a proper DB interface would release us from parser dependencies, so we can switch parsers with (relative) ease.
Good luck!
p.s. opencxx will never (I feel sure of this) attempt to resolve template instantiations; it will merely report that the code was instantiating some template. This may well be a strength, depending on the amount of work we have to do with templates...
Having said that, I am concerned about templates. I wouldn't like to reason about the code Andrei Alexandrescu creates (for example...)
Oh well, "that's a bridge what will get crossed when we come to it."
And some compilers (VC6.0 and VC7.0, for example) can't even deal with partial template specialization, statics in templated classes, or template template parameters, to name a few issues...
I am coming around to the view that we might not need a full-fledged AST. So we might not get into all the sorts of problems that regular C++ compilers face.
I was just checking all the parsers that are available (specific to ANTLR). I am impressed with the simplicity and clarity with which ANTLR supports parser creation. I doubt I will ever go for a yacc/lex combination again.
There is an excellent grammar for GNU C on the ANTLR website. It looks clean and covers more or less all the constructs of ANSI C. However, it is written in Java and might not be of direct use to us. Meanwhile, I am looking at Stroustrup and feel that a parser from the ground up is not a bad idea.
There is another concern to do with preprocessing. There are a few C++-specific refactorings that can make use of knowledge of the preprocessor. Think of changing a set of #defines to an enum, or a macro to an inline function. If we do not do any macro expansion and process only #includes, then all preprocessor constructs can also be considered targets for refactoring. Any comments on this?
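To make that concrete, here is the sort of before/after I have in mind (the names are made up, of course):

    // Before: a set of #defines and a function-like macro
    #define COLOR_RED   0
    #define COLOR_GREEN 1
    #define COLOR_BLUE  2
    #define SQUARE(x) ((x) * (x))

    // After: the preprocessor-free forms the tool could produce
    enum Color { COLOR_RED = 0, COLOR_GREEN = 1, COLOR_BLUE = 2 };
    inline int square(int x) { return x * x; }

Both rewrites preserve behaviour in the common cases, and the enum/inline versions give the compiler real symbols to check.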
Hmmm interesting...
First, and this is obvious too, the actual #if logic of the preprocessor must still be followed, even if macro expansions are not; otherwise include guards won't work, and #ifdef ...MSVC... blocks and #if 0 ... #endif comments won't work either...
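For instance, a parser that ignores the #if logic falls over even on a perfectly ordinary header like this (a made-up example):

    // foo.h -- a hypothetical header
    #ifndef FOO_H        // include guard: without the #if logic, a second
    #define FOO_H        // #include of this file redefines everything below

    #if 0
    this text is not even valid C++ and must never reach the parser
    #endif

    class Foo { };

    #endif // FOO_H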
I think we should keep things as simple as possible for now. However, two things strike me.
1) Not expanding macros would mean that we had to do it ourselves, and this could get messy. For example, a user could #define any C++ keyword (see the snippet below), so the parser would have to look at the macros, mark the token as a preprocessor entity, expand the token, and parse the result.
IMHO this is far too much trouble for a couple of refactorings. Far be it from me to throw the idea out, but I think we should concentrate on getting a parser working first, before adding extra requirements and complexity to it.
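The keyword problem in its most contrived form (formally forbidden once standard headers are involved, but the preprocessor itself is perfectly happy):

    #define private public   // some unit-test hacks really do this
    #define class struct
    // A parser working on unexpanded tokens cannot tell that the
    // 'class' and 'private' below are no longer what they seem.
    class Victim {
    private:
        int secret;
    };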
2) I have to admit that we cannot ignore the preprocessor (at present, however, I am ignoring it). This is important to the writer module, which has to regenerate the transformed C++ code. However, how far do you want to go?
I've seen code where the name of a method was the result of a macro expansion, and where the declaration of inheritance, member functions, and even closing braces were all macros (something like the sketch below).
Should we record all such macros and output them again? BTW, they did absolutely nothing for understandability; when I was dealing with this code I was forced to run it through the preprocessor to understand it, and I then manually removed all these macros.
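To give a flavour of that style, an invented reconstruction (not the actual code):

    class BaseHandler { };

    #define DECLARE_HANDLER(name, base) class name : public base {
    #define HANDLER_METHOD(name)        void name()
    #define END_HANDLER                 };

    DECLARE_HANDLER(FooHandler, BaseHandler)
    public:
        HANDLER_METHOD(onOpen)  { /* ... */ }
        HANDLER_METHOD(onClose) { /* ... */ }
    END_HANDLER

Until the macros are expanded there is no visible 'class' keyword and no closing brace for the parser to find, so the raw text is not parseable as C++ at all.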
Macros such as NULL, MAGIC_NUMBER, and INTERNAL_ERROR_CODE should, I think, be kept.
Should macros such as MIN(A, B) and MAX(A, B) be kept too? Probably.
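Side by side, the two kinds of macro I would want the tool to preserve (illustrative definitions only):

    #define MAGIC_NUMBER 0x2A                    // object-like: a named constant
    #define MIN(a, b) ((a) < (b) ? (a) : (b))    // function-like, but idiomatic C

Expanding these away would lose the names the programmer deliberately chose, which is exactly the information a refactoring tool should keep.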
Dagnabit. This is not easy to deal with.
Bleedin' macros ;-)
You have real points against the preprocessor. I also think it is premature at this time to think about those issues. At present, all the work I am doing with ANTLR is on C++ files that are already preprocessed and include only #line statements.
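For anyone who hasn't seen such input, it looks roughly like this (made-up files; some compilers emit '# N "file"' markers instead of #line):

    #line 1 "widget.h"
    class Widget { };
    #line 3 "main.cpp"
    int main() { Widget w; return 0; }

The parser only needs to track the #line directives so that anything we report or rewrite can be mapped back to the original files.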
Maybe we can write a preprocessor-refactoring tool and include it with CppRefactor later ;-)
Given the C/C++ relationship with the preprocessor, I think that's a good idea for later refactorings.
A lot of C++ (const, inline, templates) is supposed to replace the old-style C preprocessor commands, though in practice this hasn't really happened. So I reckon we could probably just concentrate on vanilla C++ initially.
This might be of some use:
http://www.nobugs.org/developer/parsingcpp/
-Shaun