Thread: [SrcML] CFG support

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

As we will be spreading development to hopefully several students in about
a month I decided to move our discussions and progress reports to this 
list for later reference. This list will also be used to coordinate design
decisions and for general exchange of ideas.

I'm currently working on a Control Flow Graph (CFG) support for the API and 
it's coming along quite nice. There will be three levels of detail: full, expr
and block. 
Full detail might only be suitable in case we add an actual compiler backend,
as I don't see many other reasons to dissect the control flow within an
addition or similar expressions. 
The expression detail level might become the most used version as it creates a
new node for every expression, but abstracts from the details of the
expression.
For even further abstraction the block level will combine several expressions
following each other without any branching into one node.

The first application which will make use of these CFGs will probably be the
source code presenter. It will be a great aid in teaching programming when you
can easily get a visualization of the control flow for a section of your
program (f.ex. to talk about the possible flows in a for-loop including break,
continue or return statements). However there is no guarantee that the CFG
support will be fully available at the start of the semester.

As for the presentation of the CFG this will not be part of the API. The API
will only support creation of the graph and will provide several helper
methods in order to work with the created graph. Right now a proof of concept
visualization is done using a jython script which outputs a dot-compatible
graph representation. Using the dot tool we can get a simple .ps or what might
proof to be more useful in the future a .svg file for the graph.
Using the ViewPlatform we can get a highly customizable visualization of the
graph, as it allows us to easily render the expression a node is representing
as plain source code. An example of a graph visualized like this can be found
in the attached .ps file.

Now for the remaining problems:

- We need a more formal way of defining what exactly will be a new node in
  the graph and which parts of the DOM tree can share a node (actually three
  definitions for the different levels are needed).
- The CFG can easily be determined for a method on its own, but what about a
  whole class? The current implementation is only dissecting the class into
  its methods and creates a forest of rooted graphs for all those methods.
  We could take a closer look at a main method though and create an actual
  flow with links between those methods (also see the next point for this).
- How are method calls treated within an expression? Obviously this is a form
  of branching, however for the block and expression level detail the
  recursion would normally just treat the whole expression as one node. Of
  course that expression may consist of any number of method calls. Will it
  be enough to just find those calls and add edges to the respective methods'
  CFG trees (and back again)?
- how can we create useful unit tests for the CFG creation? I can only think
  of preselected sample source files with known should-be CFGs with which to
  compare the results. That would at least catch some problems when the CFG
  creation process is modified and returning different results.

Please feel free to join in on the discussion. Especially comments and ideas 
on the open problems are very much appreciated.
-- 
Raiser, Frank
Student @ University of Ulm (www.uni-ulm.de)

The difficult is what takes a little time;
the impossible is what takes a little longer. (Fridtjof Nansen)

Thread: [SrcML] CFG support

srcml-discussion