I'm not sure if this is possible, but I was wondering what kind of support PMD had for analyzing comments and documentation within source code. I see that there are rules concerning uncommented empty methods and constructors, but I'd like to be able to do more.
One of the things I am interested in is having PMD check that I have uniform headers across all of my source files, including fields for author, last modified, etc.
I know that Eclipse can already check for missing or malformed Javadoc, but I am not aware of functionality that allows it to scan the content of the comments for specific line inclusions. I attempted to look through a generated AST to solve this, but found nothing concerning comments, which intuitively makes sense. We are dealing with source code, after all. Perhaps it is possible through XPath?
Dear all.
Thank you very much for your help. Thanks to Xavier, I have been working on comment rules using the 4.2.x branch and the idea of using line numbers.
I have written a simple rule to detect whether a method has a comment. The code is at the end of this post.
The main advantage of this approach is that it does not require changing the PMD code.
But there are a couple of drawbacks. You have to consume the comments, so you need a static or persistent data structure to keep track of the remaining comments (in my example I use a HashMap in the static class “StaticIntegers” to store the current index into the list of comments).
But the main problem is that you have to consume all the comments. Let's see an example:
/** A */
int a;
void m() { /* … */ }
/** N */
void n() { /* … */ }
If you use the rule for methods only, you will get the comment of the attribute (/** A */) as the comment of the first method ( m() ), because there is no rule for attributes that consumes that comment.
So, for now, I think this is not a good solution.
Feedback is welcome. Can anybody suggest how to improve this idea, or a different approach?
It would be exciting if we could have something working on 4.2.x using ASTCompilationUnit and without modifying the grammar.
Sorry for my bad English.
public class MethodWithoutComment extends AbstractRule {
    @Override
    public Object visit(ASTMethodDeclaration node, Object data) {
        int methodBeginLine;
        List<Comment> comments;
        Comment lastC, comm = null;
        int counter = StaticIntegers.Get("c");
        [... rest of the rule not included ... ]
Hi Mike,
Have you looked into Checkstyle http://checkstyle.sourceforge.net/ ?
PMD is designed to look at the code structure itself, not its formatting/appearance. The parsers PMD uses to create the ASTs do not generally make the comments/whitespace available in the AST structure itself. There are places where we check for the existence of a comment in a Block, or whether a given Class appears to be used in a JavaDoc comment, and annotate an AST node with that extra information. But the actual comment itself is not available. This is by intent, as comments would vastly complicate navigation of the AST: they can appear nearly anywhere, and we nearly always do not care about them (i.e. they are ignored just like whitespace).
Thinking out loud here...
I have considered allowing the whitespace and comment blocks to be made available in an alternate version of the AST. A Rule would have to explicitly request this AST, and deal with the corresponding complexities. A step further, which might limit those complexities, would be to add the ability to jump between corresponding nodes in the commented and uncommented ASTs. This could be exposed as a function in XPath. One would use the uncommented AST to identify the basic construct needing analysis, and then switch to the commented AST to dig around.
Anyway, there's a bunch of heavy lifting required to make something like that work. It's not likely I'll get around to something like that soon, if ever. Patches are always welcome.
Good luck,
Ryan
One thought - the PMD grammar stores comments in tokens of type "SPECIAL_TOKEN"; in the case of formal comments (e.g., /** FOO */) these get placed in a List<Token> that's attached to ASTCompilationUnit (see etc/grammar/Java.jjt line 203 for where this happens). That means you can get to them from a rule, e.g.:
============================
$ cat src/net/sourceforge/pmd/lang/java/rule/basic/TomTestRule.java
package net.sourceforge.pmd.lang.java.rule.basic;
[... some imports ... ]
public class TomTestRule extends AbstractJavaRule {
    @Override
    public Object visit(ASTCompilationUnit node, Object data) {
        System.out.println("There was a comment: " + node.getFormalComments().get(0).image);
        return super.visit(node, data);
    }
}
$ cat Foo.java
public class Foo {
    /** this is a formal comment */
    int x;
}
$ java net.sourceforge.pmd.PMD Foo.java text basic
There was a comment: /** this is a formal comment */
No problems found!
============================
Single line comments are not currently stored like this, although they could be. They're also SPECIAL_TOKENs, so you'd need to add a little lexical action as with formal comments.
A tricky bit is figuring out which comment goes with which node; the current setup doesn't make this very easy. I suppose you could add some utility methods to figure things out using beginLine/beginColumn and such.
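A rough sketch of what such a line-based utility method could look like. Instead of consuming comments through a shared counter, it picks the last comment that lies entirely between the end of the previous declaration and the start of the current one. CommentSpan and PrecedingCommentLookup are hypothetical stand-ins rather than PMD classes, and the sketch assumes the comment list is in source order and that comments sit on their own lines.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a comment token with line information; not a PMD class. */
final class CommentSpan {
    final int beginLine;
    final int endLine;
    final String text;

    CommentSpan(int beginLine, int endLine, String text) {
        this.beginLine = beginLine;
        this.endLine = endLine;
        this.text = text;
    }
}

final class PrecedingCommentLookup {
    private PrecedingCommentLookup() { }

    /**
     * Returns the closest comment located strictly after previousEndLine and
     * strictly before declBeginLine, or null if there is none.
     * Assumes the list is ordered as the comments appear in the file.
     */
    static CommentSpan commentFor(List<CommentSpan> comments, int previousEndLine, int declBeginLine) {
        CommentSpan closest = null;
        for (CommentSpan c : comments) {
            if (c.beginLine > previousEndLine && c.endLine < declBeginLine) {
                closest = c; // later matches are closer to the declaration
            }
        }
        return closest;
    }

    public static void main(String[] args) {
        // Layout: line 1 "/** A */", line 2 "int a;", line 3 "void m()", line 4 "/** N */", line 5 "void n()"
        List<CommentSpan> comments = Arrays.asList(
                new CommentSpan(1, 1, "/** A */"),
                new CommentSpan(4, 4, "/** N */"));
        CommentSpan forM = commentFor(comments, 2, 3);
        CommentSpan forN = commentFor(comments, 3, 5);
        System.out.println("m(): " + (forM == null ? "no comment" : forM.text));
        System.out.println("n(): " + (forN == null ? "no comment" : forN.text));
    }
}
```

Because the lookup is stateless, a rule that only visits methods would not pick up the field's comment for m() in this layout.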
Anyhow, as Ryan said, there's some work that'd need to be done there. But, in every job that must be done there is an element of fun, you know...
Yours,
Tom
http://generatingparserswithjavacc.com/
Thanks for the clarification Tom. I hadn't realized we kept quite that much information available.
Something like this should be able to find a home for the comments:
1) Look for the highest AST node on the same line as the comment.
2) Look for the next AST node after the comment.
3) If they are trailing comments (e.g. at the end of a block scope), place them onto the containing AST node. Worst case this results in ASTCompilationUnit.
We could create JavaNode.addComment(Comment) and List<Comment> JavaNode.getComments() to support extracting that information. The Comment class could contain methods for extracting various bits of detail from the comments, e.g. formal, single line, JavaDoc info extraction, etc.
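The three steps above could be sketched roughly as follows. SketchNode and CommentHome are hypothetical, simplified stand-ins rather than PMD classes. The sketch works on line numbers only and reads "on the same line" as "a node that begins on the line where the comment begins", which is an assumption on top of the description above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified AST node; real PMD nodes carry much more. */
final class SketchNode {
    final String name;
    final int beginLine;
    final int endLine;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name, int beginLine, int endLine) {
        this.name = name;
        this.beginLine = beginLine;
        this.endLine = endLine;
    }

    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }
}

final class CommentHome {
    private CommentHome() { }

    /** Applies the three steps in order and returns the node that would own the comment. */
    static SketchNode find(SketchNode root, int commentBeginLine, int commentEndLine) {
        // Step 1: breadth-first, so the shallowest ("highest") node starting on that line wins.
        Deque<SketchNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            SketchNode n = queue.remove();
            if (n.beginLine == commentBeginLine) {
                return n;
            }
            queue.addAll(n.children);
        }
        // Narrow down to the innermost node whose line range contains the comment.
        SketchNode enclosing = root;
        boolean descended = true;
        while (descended) {
            descended = false;
            for (SketchNode child : enclosing.children) {
                if (child.beginLine <= commentBeginLine && child.endLine >= commentEndLine) {
                    enclosing = child;
                    descended = true;
                    break;
                }
            }
        }
        // Step 2: the next sibling-level node that starts after the comment.
        for (SketchNode child : enclosing.children) {
            if (child.beginLine > commentEndLine) {
                return child;
            }
        }
        // Step 3: trailing comment, so it belongs to the containing node (worst case: the root).
        return enclosing;
    }

    public static void main(String[] args) {
        SketchNode method = new SketchNode("Method", 4, 7).add(new SketchNode("Statement", 5, 5));
        SketchNode cls = new SketchNode("Class", 1, 8).add(new SketchNode("Field", 3, 3)).add(method);
        SketchNode root = new SketchNode("CompilationUnit", 1, 8).add(cls);
        System.out.println(find(root, 2, 2).name); // comment on its own line before the field
        System.out.println(find(root, 6, 6).name); // trailing comment at the end of the method body
    }
}
```

Searching only among the children of the enclosing node in step 2 keeps a trailing comment from being attached to a declaration outside its block.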
Hmm, this all seems reasonable to me. And it doesn't pollute the normal AST structure! I'll add it to the list for PMD 5.0.
Again, patches most welcome! PMD 5.0 has a lot on the table, and I'm just one guy. :)
Ryan
The formal comment list in ASTCompilationUnit was added when I fixed the UnusedImports rule to parse the @see and similar constructs in the javadoc comments.
Maybe a first step would be to store all the comments (and not just the formal ones) in a flat ordered list or lists in ASTCompilationUnit, as the current mechanism could be extended fairly easily to achieve that. The fact that the comment tokens are discarded at the lexical level makes it a little more complex to attach them to a given AST node, but utility methods could be used to extract subsets for a specific node and deal with getting the preceding comments, enclosed comments, ... I guess it all depends on what kind of post processing we want to apply to them.
But I agree that the AST structure shouldn't be changed if possible as too many rules rely on absolute positions of children nodes...
Xavier
> Maybe a first step would be to store all the comments (and not just the formal ones) in a flat ordered list or lists
> in ASTCompilationUnit as the current mechanism could be extended fairly easily to achieve that.
Agreed, at least getting it so all comment tokens are available.
> The fact that the comment tokens are discarded at the lexical level make it a little more complex to attach
> them to a given AST node but utility methods could be used to extract subsets for a specific node and deal
> with getting the preceding comments, enclosed comments,
If you mean attaching them at the time the Parser builds the AST, then I agree. We could tweak the Parser to push Comment tokens onto a Stack and consume them at various node production match points in the grammar (e.g. via JavaNode.addComment(Comment)). Or we could take the utility method approach and extract the comments post-parse using the line numbers, as Tom suggests. Performance-wise I prefer the former: any AST-level information extracted one time necessarily scales well with the number of Rules. I also just like that design better, as it doesn't look like a bolted-on feature.
Ryan
Thank you for this explanation. Line numbers should be easy enough for my purposes, since I am checking for file headers. Verifying that beginLine/beginColumn are both 1 (is it actually zero-based? this would be a good thing to know) should do the trick, and then regular expressions can take it from there.
If I end up with something abstract and expandable, I'll be sure to contribute back to you guys. Thanks for all the great work you've done!
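A minimal sketch of such a header check, assuming positions are 1-based (which is what JavaCC-generated token managers normally report, but worth verifying against the PMD version in use). HeaderCheck and the two field patterns are hypothetical; the real header layout would dictate the regular expressions.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not part of PMD. */
final class HeaderCheck {
    // Assumed header fields: an "@author" tag and a "Last modified:" line.
    private static final Pattern AUTHOR =
            Pattern.compile("(?m)^\\s*\\*?\\s*@author\\s+\\S+");
    private static final Pattern LAST_MODIFIED =
            Pattern.compile("(?mi)^\\s*\\*?\\s*Last modified:\\s*\\S+");

    private HeaderCheck() { }

    /** True if the comment starts at line 1, column 1 and contains both fields. */
    static boolean isValidHeader(int beginLine, int beginColumn, String commentText) {
        if (beginLine != 1 || beginColumn != 1) {
            return false; // not the first thing in the file
        }
        return AUTHOR.matcher(commentText).find()
                && LAST_MODIFIED.matcher(commentText).find();
    }

    public static void main(String[] args) {
        String header = "/*\n * @author Mike\n * Last modified: 2008-06-01\n */";
        System.out.println(isValidHeader(1, 1, header));
    }
}
```

The position test runs first, so a well-formed comment that appears later in the file is still rejected as a header.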
Hi all.
We have been exploring how to write rules for comment blocks. We have followed a different path.
We found that PMD 4.2.2 has a little support for comments. It stores all comments in a list and adds them to the single ASTCompilationUnit node.
We used that code to add comment support to ASTClassOrInterfaceDeclaration, ASTFieldDeclaration, ASTMethodDeclarator and ASTConstructorDeclaration. So now we can write rules like “Non-private method without comment” or “All methods that return a value must have an @return tag”.
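The core of a rule like the @return one might look roughly like this once the comment text for a method is available. ReturnTagCheck is a hypothetical helper that works on plain strings. How the comment and the return type are obtained from the AST is left out, since that is the part the modified grammar provides.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not part of PMD. */
final class ReturnTagCheck {
    // "@return" at the start of a comment line, followed by at least one non-blank character.
    private static final Pattern RETURN_TAG =
            Pattern.compile("(?m)^[\\s*/]*@return[ \\t]+\\S");

    private ReturnTagCheck() { }

    /**
     * True if a method with the given return type should be reported:
     * it returns a value but its comment is missing or has no @return tag.
     */
    static boolean isViolation(String returnType, String javadoc) {
        if ("void".equals(returnType)) {
            return false; // nothing to document
        }
        return javadoc == null || !RETURN_TAG.matcher(javadoc).find();
    }

    public static void main(String[] args) {
        System.out.println(isViolation("int", "/**\n * Adds two numbers.\n * @return the sum\n */"));
        System.out.println(isViolation("int", "/** Adds two numbers. */"));
    }
}
```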
Although changing the grammar and the code is quick and easy, this path is a bad design. We repeat code, we break the support for tags in UnusedImports and, if there is a new version, we have to change the grammar and the code again. There is also a problem when comment blocks are in unexpected places.
Thus, we hope there will be support for comments and comment rules in the next major version.
Sorry for my English.
Hi Javier,
I'd love to add more advanced comment support to PMD 5.0, but there is no definitive release date yet.
If you can submit a patch against 4.2.x, it could greatly help me in implementing that support, and ultimately getting 5.0 out the door sooner. I might even be able to clean it up slightly and patch it into 4.2.x branch for more immediate use by other users.
Also, any example Rules you have written would be great too. A new RuleSet/category related to comments could be added to PMD, with your Rules as the first entries.
Thanks,
Ryan
I've added support for all comment types in my local version so that they're all available in the ASTCompilationUnit. Different subtypes are created for formal, single-line and multi-line comments.
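The actual subtypes are not shown in this thread. Purely as an illustration, distinguishing the three kinds from a comment token's text could look like the sketch below, where CommentKind is a hypothetical name and not a PMD class.

```java
/** Hypothetical classification of a comment by its raw text; not a PMD class. */
enum CommentKind {
    FORMAL, MULTI_LINE, SINGLE_LINE;

    static CommentKind of(String image) {
        if (image.startsWith("//")) {
            return SINGLE_LINE;
        }
        // "/**/" is an empty multi-line comment, not a formal (Javadoc) one.
        if (image.startsWith("/**") && !image.equals("/**/")) {
            return FORMAL;
        }
        if (image.startsWith("/*")) {
            return MULTI_LINE;
        }
        throw new IllegalArgumentException("Not a comment: " + image);
    }

    public static void main(String[] args) {
        System.out.println(of("/** this is a formal comment */"));
        System.out.println(of("/* plain block */"));
        System.out.println(of("// one line"));
    }
}
```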
The comments are not associated with other AST nodes at this point; the only change is that the list contains all comment types. Associating the comments with AST nodes could be added later on without having to modify the JavaCC grammar file.
I also would like to see a patch with the changes and see how the comments are used in example rules. That way, I could merge it with my changes to make sure it works for the existing rules using comments and the new ones.
Xavier
Dear Ryan and Xavier.
Thank you very much for your replies. I am working on a patch, but I used the 4.2.2 code rather than the CVS code, and I do not know CVS, so it will take me a couple of days.
Xavier, could you give me an idea of how to associate comments with nodes without modifying the grammar?
Thank you very much in advance.
For the patch, you could just send me the differences with the original 4.2.2 code and I'll see how to merge it in the current code base for both the 4.2.x branch and the 5.0 trunk.
It's not really clear how the comments should be accessed at the AST level as depending on the context, we may be interested in the comment(s) located before, after or inside the scope of a specific AST node. What I've done is to generalize building the comment list to include all comment types. Utility methods or post processing actions on the comment list could be used to associate the comments from the list with specific AST nodes based on their location in the file.
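One possible shape for such a utility, as a sketch only: classify each comment in the flat list as before, inside or after a node by comparing line/column positions. SourcePos, SourceRange and CommentRelation are hypothetical names; in PMD the positions would come from the token and node begin/end line and column values.

```java
/** Hypothetical line/column position; not a PMD class. */
final class SourcePos implements Comparable<SourcePos> {
    final int line;
    final int column;

    SourcePos(int line, int column) {
        this.line = line;
        this.column = column;
    }

    @Override
    public int compareTo(SourcePos other) {
        return line != other.line
                ? Integer.compare(line, other.line)
                : Integer.compare(column, other.column);
    }
}

/** Hypothetical begin/end pair, usable for both comments and nodes. */
final class SourceRange {
    final SourcePos begin;
    final SourcePos end;

    SourceRange(int beginLine, int beginColumn, int endLine, int endColumn) {
        this.begin = new SourcePos(beginLine, beginColumn);
        this.end = new SourcePos(endLine, endColumn);
    }
}

enum CommentRelation {
    BEFORE, INSIDE, AFTER;

    /** Where the comment lies relative to the node, based purely on positions. */
    static CommentRelation of(SourceRange comment, SourceRange node) {
        if (comment.end.compareTo(node.begin) < 0) {
            return BEFORE;
        }
        if (comment.begin.compareTo(node.end) > 0) {
            return AFTER;
        }
        return INSIDE;
    }

    public static void main(String[] args) {
        SourceRange method = new SourceRange(3, 5, 6, 5);
        System.out.println(of(new SourceRange(2, 5, 2, 20), method));
        System.out.println(of(new SourceRange(4, 9, 4, 30), method));
        System.out.println(of(new SourceRange(6, 7, 6, 20), method));
    }
}
```

A rule could then filter the comment list for the relation it cares about, without the AST itself having to change.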
It would be interesting to see what kind of rules you've written so please send me the differences with 4.2.2 or even your entire source tree in a private email so that I can start adding extra support for comments.
Thanks,
Xavier
I've just committed some changes on the 4.2.x branch and on the trunk to add all comment types to the list in ASTCompilationUnit.
As it is, the code is not really user-friendly and there's no association of other AST nodes with comments yet, but at least it should get the ball rolling if we want to implement comment-based rules. And it will be in the official 4.2.3, which should be coming soon, so people can start experimenting with it as well without having to deal with svn.
Javier, could you send me your modifications so that I can merge them if possible?