From: Bernardi M. L. <mar...@in...> - 2002-10-08 21:02:33
Hi all,

I'm a Computer Science Engineering student working on my master's thesis on the subject of reverse engineering. To be precise, my thesis is about recovering scenarios (the use case view) from C++ source code. I've studied some methodologies for this recovery process, such as MM-Path and ASF (Atomic System Function).

To implement a software tool that uses these methodologies, I need to do some analysis on C++ code, such as parsing, symbol table management, control flow analysis and data flow analysis. I have not yet done a deep analysis of *all* the information I must extract from the code, but what I have read about OpenC++ has interested me in some ways.

I'd like to know if it is suitable as an analysis platform to extract from code the info I require (for example the list of methods, the class hierarchy, symbol information, and the capability to identify some particular types of statements [like I/O statements] to record in which method an MM-Path starts and/or dies).

If someone who has a certain degree of practice with it could summarize for me the features that the reflective approach can offer toward my goal, I would thank him a lot! I've read some papers from the OpenC++ site, but I need to better understand what OpenC++ can do and what remains on my shoulders :)

Cheers,
Mario Luca Bernardi
From: Grzegorz J. <ja...@he...> - 2002-10-11 07:06:54
On 8 Oct 2002, Bernardi Mario Luca wrote:

[intro snipped]

> I'd like to know if it is suitable as an analysis platform to extract
> from code the info I require (for example the list of methods, the
> class hierarchy, symbol information, and the capability to identify
> some particular types of statements [like I/O statements] to record
> in which method an MM-Path starts and/or dies).

Yes. Observe, however, that there are two major modes of using OpenC++:

(1) Deriving classes from "Class" in order to customize the source-to-source translation.

(2) Taking the OpenC++ source code as a code base and building your own application on top of it (this usually means deriving from "Walker" and/or making modifications to existing code).

From what you wrote about your requirements, it looks like you should be looking at mode (2).

> If someone who has a certain degree of practice with it could
> summarize for me the features that the reflective approach can offer
> toward my goal, I would thank him a lot! I've read some papers from
> the OpenC++ site, but I need to better understand what OpenC++ can do
> and what remains on my shoulders :)

> I need to do some analysis on C++ code
> such as parsing

OpenC++ probably has almost everything you need with respect to parsing C++. There are some issues with newer C++ constructs, like the 'template' keyword inside a scoped identifier, 'explicit', or 'typename'. However, fixing the parser should not be a big problem.

> , symbol table management,

The symbol table in OpenC++ is somewhat distributed. It is fundamentally implemented by objects of "Environment" and "Class" connected through miscellaneous pointers. Information on variables is not retained, i.e. once the translator leaves a block or function scope, it forgets what types or variables were defined within. Thus, if you want this information to be available after parsing, you have to implement some storage mechanism yourself. Also, OpenC++ does not remember the types it derives for symbols or expressions.
If you want to annotate the parse tree with types, you have to implement this yourself. I am told that the Synopsis project, which reuses the OpenC++ source, has implemented parse tree type annotations.

> control flow analysis and
> data flow analysis.

I am not sure what exactly you mean, but I think that OpenC++ does not provide any data structures or algorithms to perform this kind of analysis. However, in my own project, for example, I implemented a simple intra-procedural data flow analysis and it went quite easily; I just derived my analyzer from the "Walker" class.

To give you the full picture, this is a list of current issues with OpenC++:

* Tampering with the parse tree is difficult. The parse tree is very verbose; every C++ construct is encoded (roughly) using lisp-like list constructors. The problem is that this structure is exposed to clients, in particular to the methods of Walker (or its derivatives) which perform translation. In other words, your walker is given a pointer 'p' to the tree representing a function definition, and in order to get the function name you have to call 'p->Cdr()->Car()'; in order to get the function body you call 'p->Cdr()->Cdr()->Cdr()->Car()', which again you will access with Cdr's and Car's (or other similar functions, which are nevertheless just shortcuts for some combinations of Cdr's and Car's). Your program works on lists, and you have to know how C++ constructs map onto those lists (you can learn it from the parser or from tree dumps). Also, when you write your code, you are hard-coding this knowledge. Thus, if you want to change the parse tree, e.g. to add a type annotation node (a so-called 'Decorator'), you cannot simply change the representation of the parse tree slightly, because you would break all the existing code, *including* the code in Walker on which you build your system. I think this is the major design problem at the moment, since it makes playing with the existing OpenC++ very unsafe and difficult (though possible, as the Synopsis example shows).
I am thinking about a solution to this problem, but I do not expect to have anything working soon. And even if I had one, I probably would not have time to implement it all by myself.

* Templates are not fully supported. Templates were not in the language when OpenC++ was designed, so the support was added later and is a kind of patch. I tried to fix it earlier this year, and now the HEAD branch can instantiate templates and identify types of the form 'tmpl<args>::type'. However, this implementation is far from perfect: it does not take default arguments into account (this is easy to fix), nor does it understand template specializations. It is probably also going to be slow, as it does not cache instantiations (I left this issue for later, as for now it is not a problem for me).

* There is a quite low limit on the length of type identifiers. All type names are encoded and stored as strings, and the length of those strings cannot exceed 256 or so. With template arguments this may be a problem, since type names may be arbitrarily long, and they really are long even in the Standard Library. This is a technical issue, but it may require many modifications in the code if you really hit it.

* OpenC++ has problems compiling the Standard Library that comes with newer compilers, like g++ versions 3.0 and above. This is not a big problem at the moment, since e.g. g++ 2.95 is quite a solid version and is going to be around for a while, but eventually those problems need to be addressed. As I remember, the problems arise due to some of the limitations listed above.

* The type evaluation is not completely implemented, i.e. the typing of some expressions is simplified (e.g. operators). I remember that I had to augment the typing algorithm in several places and that it went quite smoothly. While this is some work due to the many different cases, it should generally be easy to do.
* Overload resolution in type evaluation is not implemented (this should be moderately difficult to implement). However, OpenC++ correctly recognizes overloaded function/method definitions.

* The OpenC++ use model needs a face-lift; e.g. on this list we agreed at some point that the "driver" functionality should eventually be moved out of the main executable, so that the translation is separated from other housekeeping tasks (calling the preprocessor, linker, compiler etc.).

Those are the bad things. Remember, however, that the list of good things that OpenC++ provides is much longer. The work already accumulated in the OpenC++ project is substantial, and I doubt that you could quickly write a similar product from scratch, or even starting from a yacc C++ grammar. And despite what I wrote about the syntax tree, OpenC++ has an object-oriented design and is quite well structured.

Let me know if you have any other questions.

Best regards
Grzegorz

##################################################################
# Grzegorz Jakacki                    Huada Electronic Design    #
# Senior Engineer, CAD Dept.          1 Gaojiayuan, Chaoyang     #
# tel. +86-10-64365577 x2074          Beijing 100015, China      #
# Copyright (C) 2002 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################
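
[Editorial sketch] The parse-tree issue raised in the reply above (clients reaching into a lisp-like list with chains of Cdr() and Car()) may be easier to see in a small model. The code below is a toy, not OpenC++ code: `Node`, `Arena`, `BuildSampleFunction`, `FunctionNameNode` and `FunctionBodyNode` are hypothetical names, and the four-element layout (return type, name, parameters, body) is only assumed so that the two access chains quoted in the message land on the name and the body. The named accessors are in the spirit of the "shortcut" functions the message mentions; they show one conventional way to keep the layout knowledge in a single place and are not the solution the author says he is still thinking about.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <string>
#include <vector>

// Toy cons cell. An illustrative stand-in, NOT OpenC++'s real tree class.
struct Node {
    std::string atom;            // token text; meaningful only for leaves
    const Node* car = nullptr;   // first element of a list cell
    const Node* cdr = nullptr;   // remainder of the list
    const Node* Car() const { return car; }
    const Node* Cdr() const { return cdr; }
};

// Owns every node so the raw pointers inside the tree stay valid.
class Arena {
public:
    const Node* Leaf(const std::string& text) {
        Node* n = New();
        n->atom = text;
        return n;
    }

    // Builds the list (a b c) as cons(a, cons(b, cons(c, nil))).
    const Node* List(std::initializer_list<const Node*> items) {
        std::vector<const Node*> v(items);
        const Node* tail = nullptr;
        for (std::size_t i = v.size(); i-- > 0;) {
            Node* cell = New();
            cell->car = v[i];
            cell->cdr = tail;
            tail = cell;
        }
        return tail;
    }

private:
    Node* New() {
        nodes_.push_back(std::unique_ptr<Node>(new Node()));
        return nodes_.back().get();
    }
    std::vector<std::unique_ptr<Node>> nodes_;
};

// Assumed layout for this sketch only: (return-type name parameters body).
const Node* BuildSampleFunction(Arena& arena) {
    return arena.List({arena.Leaf("int"), arena.Leaf("main"),
                       arena.Leaf("()"), arena.Leaf("{ return 0; }")});
}

// The only two places that know the layout. If a node were inserted into
// the list, these would change and the callers would not.
const Node* FunctionNameNode(const Node* fn) {
    assert(fn != nullptr);
    return fn->Cdr()->Car();
}

const Node* FunctionBodyNode(const Node* fn) {
    assert(fn != nullptr);
    return fn->Cdr()->Cdr()->Cdr()->Car();
}
```

A caller that writes `fn->Cdr()->Car()` directly hard-codes the position of the name; a caller that writes `FunctionNameNode(fn)` does not. In this toy layout both expressions reach the leaf whose `atom` is "main".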