Thread: [Opencxx-users] [opencxx] OpenC++ , can serve to my goal?

[Opencxx-users] [opencxx] OpenC++ , can serve to my goal?

From: Bernardi M. L. <mar...@in...> - 2002-10-08 21:02:33

Hi all,
I'm a Computer Science Engineering student, working on my master's
thesis on the subject of reverse engineering. To be precise, my thesis
is about recovering scenarios (the use case view) from C++ source
code. I've studied some methodologies for this recovery process,
such as MM-Path and ASF (Atomic System Function). To implement a software
tool that uses these methodologies I need to do some analysis on C++ code,
such as parsing, symbol table management, control flow analysis and
data flow analysis. I have not yet done a deep analysis of *all* the
information I must extract from the code, but from what I have read about
OpenC++ it interests me in some ways. I'd like to know if it is suitable as
an analysis platform to extract from code the info I require (for
example the list of methods, the class hierarchy, symbol information and
the capability to identify some particular types of statements [like I/O
statements] to record in which method an MM-Path starts and/or dies).
If someone with a certain degree of practice with it could summarize
the features that the reflective approach can offer for my goal, I would
be very grateful! I've read some papers from the OpenC++ site but I need
to better understand what OpenC++ can do and what remains on my
shoulders :)
 
Cheers , 
Mario Luca Bernardi

[Opencxx-users] Re: [opencxx] OpenC++ , can serve to my goal?

From: Grzegorz J. <ja...@he...> - 2002-10-11 07:06:54

On 8 Oct 2002, Bernardi Mario Luca wrote:

[intro snipped]
> I'd like to know if it is suitable as
> an analysis platform to extract from code the info I require (for
> example the list of methods, the class hierarchy, symbol information and
> the capability to identify some particular types of statements [like I/O
> statements] to record in which method an MM-Path starts and/or dies).

Yes. Note, however, that there are two major modes of using OpenC++:

(1) Deriving classes from "Class" in order to customize the
    source-to-source translation.

(2) Taking the OpenC++ source code as a code base and building your own
    application on top of it (this usually means deriving from "Walker"
    and/or making modifications to existing code).

From what you wrote about your requirements it looks like you should be
looking at mode (2).

> If someone with a certain degree of practice with it could summarize
> the features that the reflective approach can offer for my goal, I would
> be very grateful! I've read some papers from the OpenC++ site but I need
> to better understand what OpenC++ can do and what remains on my
> shoulders :)

> I need to do some analysis on C++ code,
> such as parsing

OpenC++ probably has almost everything you need with respect to parsing C++.
There are some issues with newer C++ constructs, like the 'template' keyword
inside a scoped identifier, 'explicit' or 'typename'. However, fixing the
parser should not be a big problem.

> , symbol table management ,

The symbol table in OpenC++ is somewhat distributed. It is fundamentally
implemented by objects of "Environment" and "Class" connected through
miscellaneous pointers. The information on variables is not stored, i.e.
once the translator leaves a block or function scope it forgets what types
or variables were defined within. Thus if you want to have this information
available after parsing, you have to implement some storage mechanism
yourself. Also, OpenC++ does not remember the types it derives for symbols or
expressions. If you want to annotate the parse tree with types, you have to
implement this yourself. I am told that the Synopsis project, which
reuses the OpenC++ source, has implemented parse tree type annotations.
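
A minimal sketch of what such a storage mechanism could look like, assuming
a C++11 compiler. Every name here (Scope, ScopeArchive, Enter, Declare,
Leave, TypeOf) is hypothetical and is not part of the OpenC++ API; the idea
is only that leaving a scope closes it without discarding its bindings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-scope symbol table; this is NOT the
// OpenC++ "Environment" class.
struct Scope {
    std::string name;                         // e.g. "f" or "f/block1"
    std::map<std::string, std::string> vars;  // variable name -> type text
};

// Keeps every scope that was ever opened, so the bindings survive after
// the traversal has left the corresponding block.
class ScopeArchive {
public:
    void Enter(const std::string& name) {
        archive_.push_back(Scope{name, {}});
        open_.push_back(archive_.size() - 1);
    }
    void Declare(const std::string& var, const std::string& type) {
        assert(!open_.empty() && "Declare() called outside any scope");
        archive_[open_.back()].vars[var] = type;
    }
    // Leaving a scope only pops the "open" stack; the data stays archived.
    void Leave() {
        assert(!open_.empty() && "Leave() without matching Enter()");
        open_.pop_back();
    }
    // Looks a variable up in an open or already closed scope, by scope name.
    // Returns an empty string when nothing is recorded.
    std::string TypeOf(const std::string& scope, const std::string& var) const {
        for (const Scope& s : archive_) {
            if (s.name != scope) continue;
            auto it = s.vars.find(var);
            if (it != s.vars.end()) return it->second;
        }
        return "";
    }
    std::size_t ScopeCount() const { return archive_.size(); }

private:
    std::vector<Scope> archive_;     // all scopes, in order of entry
    std::vector<std::size_t> open_;  // indices of currently open scopes
};
```

A walker-like traversal would call Enter() and Leave() around each block
and Declare() at each declaration; queries such as TypeOf("f", "i") then
still work after the whole file has been processed.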

> control flow analysis and
> data flow analysis.

I am not sure what exactly you mean, but I think that OpenC++ does not
provide any data structures or algorithms to perform this kind of analysis.
However, in my own project I implemented a simple intra-procedural data
flow analysis and it went quite easily: I just derived my analyzer from the
"Walker" class.
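
To illustrate the kind of bookkeeping a simple intra-procedural analysis
involves, here is a hedged sketch that links each variable use to the most
recent definition in straight-line code. It is not the analysis mentioned
above and does not use OpenC++; Stmt, DefUse and LinkDefsToUses are
hypothetical names, a C++11 compiler is assumed, and branches and loops are
deliberately left out.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified statement: at most one variable defined,
// any number of variables read.
struct Stmt {
    std::string def;                // variable assigned ("" if none)
    std::vector<std::string> uses;  // variables read
};

// One def-use edge: statement `use_at` reads `var` as written by
// statement `def_at` (-1 means no definition was seen in this body,
// e.g. a parameter or an uninitialized variable).
struct DefUse {
    std::string var;
    int def_at;
    int use_at;
};

// Single forward pass over straight-line code; no control flow handled.
std::vector<DefUse> LinkDefsToUses(const std::vector<Stmt>& body) {
    std::map<std::string, int> last_def;  // variable -> index of latest def
    std::vector<DefUse> edges;
    for (int i = 0; i < static_cast<int>(body.size()); ++i) {
        // Uses are read before the assignment of the same statement
        // takes effect, so process them first.
        for (const std::string& v : body[i].uses) {
            auto it = last_def.find(v);
            edges.push_back({v, it == last_def.end() ? -1 : it->second, i});
        }
        if (!body[i].def.empty()) last_def[body[i].def] = i;
    }
    return edges;
}
```

In a real tool the Stmt records would be filled in by a tree traversal, and
handling branches would require a control flow graph plus merging of the
last_def maps at join points.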

To give you the full picture, here is a list of current issues with OpenC++:

* tampering with the parse tree is difficult; the parse tree is very verbose,
  every C++ construct is encoded (roughly) using lisp-like list
  constructors. The problem is that this structure is exposed to clients, in
  particular to the methods of Walker (or its derivatives) which perform
  translation. In other words, your walker is given a pointer 'p' to the tree
  representing a function definition, and in order to get the function name
  you have to call 'p->Cdr()->Car()'; in order to get the function body you
  call 'p->Cdr()->Cdr()->Cdr()->Car()', which again you will access with
  Cdr's and Car's (or other similar functions, which are nevertheless just
  shortcuts for some combinations of Cdr's and Car's). Your program works on
  lists, and you have to know how C++ constructs map into those lists (you
  can learn it from the parser or from tree dumps). Also, when you write your
  code, you are hard-coding this knowledge. Thus if you want to change the
  parse tree, e.g. to add a type annotation node (a so-called 'Decorator'),
  you cannot simply change the representation of the parse tree slightly,
  because you would break all the existing code, *including* the code in
  Walker on which you build your system. I think this is the major design
  problem at the moment, since it makes playing with existing OpenC++ very
  unsafe and difficult (though possible, as the Synopsis example shows). I am
  thinking about a solution to this problem, but I do not expect to have a
  working solution soon. And even if I had one, I probably would not have
  time to implement it all by myself.

* templates are not fully supported. Templates were not in the language when
  OpenC++ was designed, so the support was added later and is a kind of
  a patch. I was trying to fix it some time earlier this year and now the
  HEAD branch can instantiate templates and identify types of the form
  'tmpl<args>::type'. However, this implementation is far from perfect:
  it does not take default arguments into account (this is easy to fix)
  nor does it understand template specializations. It is probably also
  going to be slow, as it does not cache instantiations (I left this issue
  for later, as for now it is not a problem for me).

* there is quite a low limit on the length of type identifiers; all type
  names are encoded and stored as strings. The length of those strings
  cannot exceed 256 or something. With template arguments this may be a
  problem, since type names may be arbitrarily long, and in practice they
  really are long, even in the Standard Library. This is a technical issue,
  but it may require many modifications in the code if you really hit it.

* OpenC++ has problems with compiling the Standard Library coming with newer
  compilers, like g++ versions 3.0 and above. This is not a big problem
  at the moment, since e.g. g++ 2.95 is quite a solid version and it is
  going to be around for a while, but eventually those problems need
  to be addressed. As I remember, the problems arise due to some of the
  limitations listed above.

* The type evaluation is not completely implemented, i.e. typing of some
  expressions is simplified (e.g. operators). I remember that I had to
  augment the typing algorithm in several places and that it went quite
  smoothly. While this is some work due to many different cases, it
  should be generally easy to do.

* Overload resolution in type evaluation is not implemented (this should be
  moderately difficult to implement). However, OpenC++ correctly recognizes
  overloaded function/method definitions.

* The OpenC++ usage model needs a face-lift, e.g. on this list we agreed at
  some point that the "driver" functionality should eventually be moved out
  of the main executable, so that the translation is separated from other
  housekeeping tasks (calling the preprocessor, linker, compiler etc.).
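
As an illustration of the parse tree issue in the first item, the sketch
below shows one common way to confine knowledge of a list layout to a
single place: a thin wrapper with named accessors. It does not use the real
OpenC++ tree classes. Node, Leaf, Cons, Nth and FunctionDef are
hypothetical, and the layout (return-type name args body) is assumed only
so that the positions match the Cdr/Car chains quoted above.

```cpp
#include <string>

// Hypothetical cons cell; a stand-in for a lisp-like parse tree node.
struct Node {
    std::string atom;           // text of a leaf, empty for list cells
    const Node* car = nullptr;  // first element
    const Node* cdr = nullptr;  // rest of the list
};

// Small builders so callers do not depend on the member order of Node.
inline Node Leaf(const std::string& text) {
    Node n;
    n.atom = text;
    return n;
}
inline Node Cons(const Node* head, const Node* tail) {
    Node n;
    n.car = head;
    n.cdr = tail;
    return n;
}

inline const Node* Car(const Node* p) { return p ? p->car : nullptr; }
inline const Node* Cdr(const Node* p) { return p ? p->cdr : nullptr; }

// Returns the n-th element of a list (0-based), or nullptr if too short.
inline const Node* Nth(const Node* list, int n) {
    while (n-- > 0) list = Cdr(list);
    return Car(list);
}

// Assumed layout for this sketch only: (return-type name args body).
// All knowledge of the layout lives here, so a layout change touches
// these four accessors rather than every analysis that uses them.
class FunctionDef {
public:
    explicit FunctionDef(const Node* p) : p_(p) {}
    const Node* ReturnType() const { return Nth(p_, 0); }
    const Node* Name() const { return Nth(p_, 1); }  // p->cdr->car
    const Node* Args() const { return Nth(p_, 2); }
    const Node* Body() const { return Nth(p_, 3); }  // p->cdr->cdr->cdr->car

private:
    const Node* p_;
};
```

Client code would then write FunctionDef(p).Name() instead of spelling out
the Cdr/Car chain, which is one possible direction, not a description of
how OpenC++ or Synopsis solve the problem.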

Those are the bad things. Remember, however, that the list of good things
that OpenC++ provides is much longer. The work already accumulated in the
OpenC++ project is substantial and I doubt that you can quickly write a
similar product from scratch, even if you begin with a yacc C++ grammar.
And despite what I wrote about the syntax tree, OpenC++ has an
object-oriented design and is quite well structured.

Let me know if you have any other questions.

Best regards
Grzegorz

##################################################################
# Grzegorz Jakacki                       Huada Electronic Design #
# Senior Engineer, CAD Dept.              1 Gaojiayuan, Chaoyang #
# tel. +86-10-64365577 x2074               Beijing 100015, China #
# Copyright (C) 2002 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################