A short description of how the tool works, for those who would like to contribute to the project.
We extend the default PHP tokenizer to identify tabs and line returns and to add a few extra tokens. This allows the project to run on any computer that has PHP installed, without any modification.
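As a rough illustration of extending the tokenizer, the sketch below runs PHP's built-in token_get_all() and then splits each T_WHITESPACE token so that tabs and line returns become tokens of their own. The names sketchTokenize, T_TAB and T_NEW_LINE are assumptions made for this example and are not necessarily the identifiers used in the project.

```php
<?php
// Hypothetical token names for this sketch; not the project's real identifiers.
const SKETCH_T_TAB = 'T_TAB';
const SKETCH_T_NEW_LINE = 'T_NEW_LINE';

/**
 * Tokenize PHP source with the built-in tokenizer, then split each
 * T_WHITESPACE token so that tabs and line returns become separate tokens.
 * Line returns embedded in other tokens (for example the one that follows
 * the opening tag) are left untouched by this sketch.
 *
 * Returns a list of [name, text] pairs.
 */
function sketchTokenize(string $source): array
{
    $result = [];
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as ";" or "{" come back as plain strings.
            $result[] = [$token, $token];
            continue;
        }
        [$id, $text] = $token;
        if ($id !== T_WHITESPACE) {
            $result[] = [token_name($id), $text];
            continue;
        }
        // Split the whitespace run, keeping tabs and line returns as delimiters.
        $parts = preg_split(
            '/(\t|\r\n|\n|\r)/',
            $text,
            -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
        );
        foreach ($parts as $part) {
            if ($part === "\t") {
                $result[] = [SKETCH_T_TAB, $part];
            } elseif ($part === "\n" || $part === "\r\n" || $part === "\r") {
                $result[] = [SKETCH_T_NEW_LINE, $part];
            } else {
                $result[] = ['T_WHITESPACE', $part];
            }
        }
    }
    return $result;
}
```

Because only the output of token_get_all() is post-processed, nothing beyond a standard PHP installation is needed.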
A cleaner, more complete solution would be to use a proper AST (http://en.wikipedia.org/wiki/Abstract_syntax_tree) and parse the files with complete information about each token and its context.
The difficulty is that PHP doesn't have a real official grammar, so building such an analyser is not easy. Some projects could help with that (https://github.com/nikic/PHP-Parser), or we could use Facebook's work on HipHopVM.
To compensate for the lack of a real AST, we build a stack of the currently open statements during the analysis.
StatementItem objects are stored in the "StatementStack". This gives us some limited contextual information.
The stack can easily be visualised by launching the tool with the "--debug" flag. It will display something like this:
CLASS(PHPCheckstyle) -> FUNCTION(_processControlStatement) -> IF -> IF
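The idea behind the stack can be sketched as follows. This is a minimal illustration only: the classes SketchStatementItem and SketchStatementStack and their methods (push, pop, current, isInside) are hypothetical stand-ins, and the real StatementItem and StatementStack classes may look different.

```php
<?php
// Illustrative stand-ins only: the project's real StatementItem and
// StatementStack classes may expose different properties and methods.
final class SketchStatementItem
{
    public $type;
    public $name;

    public function __construct(string $type, ?string $name = null)
    {
        $this->type = $type;
        $this->name = $name;
    }

    public function __toString(): string
    {
        return $this->name === null
            ? $this->type
            : $this->type . '(' . $this->name . ')';
    }
}

final class SketchStatementStack
{
    private $items = [];

    // Called when a statement is opened (class, function, if, ...).
    public function push(SketchStatementItem $item): void
    {
        $this->items[] = $item;
    }

    // Called when the statement's block is closed.
    public function pop(): ?SketchStatementItem
    {
        return array_pop($this->items);
    }

    // Innermost open statement, or null when the stack is empty.
    public function current(): ?SketchStatementItem
    {
        return $this->items[count($this->items) - 1] ?? null;
    }

    // True if any enclosing statement has the given type.
    public function isInside(string $type): bool
    {
        foreach ($this->items as $item) {
            if ($item->type === $type) {
                return true;
            }
        }
        return false;
    }

    public function __toString(): string
    {
        return implode(' -> ', array_map('strval', $this->items));
    }
}

$stack = new SketchStatementStack();
$stack->push(new SketchStatementItem('CLASS', 'PHPCheckstyle'));
$stack->push(new SketchStatementItem('FUNCTION', '_processControlStatement'));
$stack->push(new SketchStatementItem('IF'));
$stack->push(new SketchStatementItem('IF'));
echo $stack, "\n"; // CLASS(PHPCheckstyle) -> FUNCTION(_processControlStatement) -> IF -> IF
```

A check can then ask questions such as "am I inside a function?" without a full syntax tree, which is the limited contextual information mentioned above.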
When a check rule is not satisfied, an error message is sent to one or more reporters. All the reporters (Console, HTML, XML, ...) extend the "Reporter" class.
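The reporter mechanism could look roughly like the sketch below. The class names, the writeError method and its parameters, and the sketchReportError helper are all assumptions made for illustration; they are not the actual API of the "Reporter" class.

```php
<?php
// Hypothetical sketch: names and signatures are assumptions,
// not the project's actual Reporter API.
abstract class SketchReporter
{
    abstract public function writeError(string $file, int $line, string $check, string $message): void;
}

// Prints each error on its own line.
final class SketchConsoleReporter extends SketchReporter
{
    public function writeError(string $file, int $line, string $check, string $message): void
    {
        echo sprintf("%s:%d [%s] %s\n", $file, $line, $check, $message);
    }
}

// Keeps errors in memory, e.g. to build an HTML or XML report at the end.
final class SketchCollectingReporter extends SketchReporter
{
    public $errors = [];

    public function writeError(string $file, int $line, string $check, string $message): void
    {
        $this->errors[] = compact('file', 'line', 'check', 'message');
    }
}

// A failed check sends the same message to every registered reporter.
function sketchReportError(array $reporters, string $file, int $line, string $check, string $message): void
{
    foreach ($reporters as $reporter) {
        $reporter->writeError($file, $line, $check, $message);
    }
}
```

With a common base class, adding a new output format only requires writing one more subclass; the checks themselves do not change.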