
Project Structure

Developers
2002-01-25
2002-02-19
  • John Edward Judd

    When I first started, I sat down and had a good think about the structure the software should take. I did some web browsing on similar programs, and came up with the following.

    There are six logical components of a Refactoring Browser.

    The Code Model contains the structure of the software to be refactored.

    The Parser generates the code model from raw source code.

    The Navigator interfaces to the Code Model and allows searching and movement through it.

    The Browser is a GUI (or other) application that allows user interaction with the Code Model and Refactoring Tools through the Navigator. This is effectively the front end and there can be several of these for different systems (Win/Linux/Mac) and requirements (browsers/add-ins/libraries).

    The Refactorer provides a set of tools that use the Code Model to generate a set of commands for the Writer.

    The Writer takes a series of commands, checks the Code Model and writes the changes to the source code.
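    A minimal sketch of how these components might plug together - every name here (parse, renameSymbol, EditCommand) is illustrative, not a committed API. The Parser produces a Code Model, the Refactorer inspects it and emits commands, and the Writer would apply them to the source:

```cpp
#include <string>
#include <vector>

// Illustrative only: the Parser fills a CodeModel, the Refactorer
// turns a rename request into EditCommands for the Writer.
struct CodeModel {
    std::vector<std::string> symbols;   // stand-in for the real structure
};

struct EditCommand {                    // what the Refactorer hands the Writer
    std::string file;
    std::string oldText;
    std::string newText;
};

// Parser (stub): a real one would build the model from source text.
CodeModel parse(const std::string& /*source*/) {
    return CodeModel{{"foo", "bar"}};
}

// Refactorer (stub): rename emits one command per occurrence.
std::vector<EditCommand> renameSymbol(const CodeModel& model,
                                      const std::string& from,
                                      const std::string& to) {
    std::vector<EditCommand> cmds;
    for (const std::string& s : model.symbols)
        if (s == from)
            cmds.push_back({"a.cpp", from, to});
    return cmds;
}
```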

    I've already created a Visual Studio project for this structure and put it into the CVS repository under CppRefactory. (There's another entry called cpptool, but that was an earlier experiment with CVS.) I may have jumped the gun doing this, but hey, I was excited at the time. :-)

     
    • David Barker

      David Barker - 2002-01-25

      I think this is an excellent breakdown of the concerns in the software.

      The 'hard' work will be:
      Deciding on the code model structure.  This will be the deciding factor in what we can reason about, and how easy it is to do so.

      The parsing - hallelujah, this should be "quite easy" - after a lot of reading documentation.

      The writer.  I think this is the computational guts of the project and will need to be able to reason about the code model.

      The GUI is very important; the tool will be too cumbersome to use (I think) if the GUI isn't present.  However, for us, getting started, this isn't such a problem - e.g. we can drive it with automated testing.

       
      • John Edward Judd

        I think the parsing will be trickier than you think; at least we should be prepared for that to be the case.

        The writer also won't be as smart as that. The refactor tool will do the reasoning and issue commands to the writer. These commands contain information about where in the code the changes are to be made, which also allows changes to be undone if need be.

        There are several ways of having the writer operate. One is to rewrite an entire file when a change is made. The other is to patch the file by just adding/removing/changing code where the command specifies. Patching is my preferred option: it preserves the formatting that the developer has used, we don't need to store (or parse) comments, and it's simpler.
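        To make the patching idea concrete, here is one possible shape for such a command - the names (Patch, apply, undo) are mine, a sketch assuming a command records its position plus both the old and new text:

```cpp
#include <string>

// Hypothetical writer command: a sketch, not the project's API.
// Recording the replaced text makes the change reversible.
struct Patch {
    std::size_t offset;   // where in the file the change starts
    std::string oldText;  // text being replaced (enables undo)
    std::string newText;  // replacement text
};

// Only the named span changes, so surrounding formatting and
// comments are preserved exactly as the developer wrote them.
void apply(std::string& source, const Patch& p) {
    source.replace(p.offset, p.oldText.size(), p.newText);
}

// Undo is the same operation with old/new swapped.
void undo(std::string& source, const Patch& p) {
    source.replace(p.offset, p.newText.size(), p.oldText);
}
```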

        Automated testing is definitely an option, and you'll notice that in the CVS repository (and project structure) I've included CppUnit, which is a very good test harness. However, something that I'd like to do is develop this highly iteratively, using small releases. I envision that the first release (0.1) will provide a browser that allows a single refactoring, like rename.

        One advantage of keeping the software modular is that we can create an API for the code model and writer, allowing other developers to easily create new refactor tools. It might even be good to allow developers to create plugins for refactorings. That's a way down the track though; for the moment we need to get first things done first.

         
    • David Barker

      David Barker - 2002-01-26

      I'm in agreement here 100%.
      Your last comment about publishing APIs to the writer and code model is sound.

      As you say, I reckon we both get cracking and try to implement the rename method refactoring.  Then we'll know a whole lot more.

       
      • John Edward Judd

        Good. First though, let's do some spikes [see other post for an explanation] of the parser and preprocessor. I've been having some ideas on the preprocessor [a painful thing ;-) ] and might experiment with that for a while. I'll post some stuff when I've thought it through a bit better.

         
    • Dakshinamurthy Karra

      The structure looks good. I am more interested in making this a little cross-platform (my laptop runs Suse Linux!). My thinking is to concentrate on keeping the Code Model, Parser, Refactor and Writer modules as platform independent as possible. I do not think it will be a major problem.

      I think we need a high level API - I presume that will be provided by the Refactor module. Similar to RenameVariable (variableName, referringSourceInformation) or such. In the scheme of things the Refactor is supposed to be the entry point for all external applications. Am I right?

       
      • David Barker

        David Barker - 2002-02-16

        Hmmm, a high level interface would be nice...
        I'm worrying about a lower-level DB interface at the minute, hoping that I can run over a parse tree registering namespaces, classes, methods and variables with the DB.

        In this vein, I've started an experimental DB interface...  This is what I have right now:

        NamespaceId getGlobalNamespace() = 0;
        TypeId getNoClassId() = 0;

        NamespaceId registerNamespace(const Context &context, const char *name) = 0;
        TypeId registerClassDeclaration(const Context &context, const char *name) = 0;
        VariableId registerVariable(const Context &context, const char *name, const TypeId &type, const Modifiers &modifiers) = 0;
        FunctionId registerFunctionDeclaration(const Context &context, const char *name, const ParameterList &parameters, const TypeId &returnType, bool isStatic, bool isConst) = 0;

        bool forgetDefinitionsInFile(const char *filename) = 0;

        bool functionCallsFunction(const FunctionId &caller, const FunctionId &callee) = 0;
        bool functionUsesVariable(const FunctionId &function, const VariableId &variable) = 0;

        Context has a NamespaceId, a ClassId and a filename...

        This interface is write-only (excepting forgetDefinitionsInFile...); later, I will need methods that alter the DB, and therefore the program.

        Obviously, we have no way of recording the actual code inside the methods.  This is needed - but not yet.  All I want to do is implement the "renameMethod" refactoring, which I think this DB structure will allow.

        What do you think?

         
    • Dakshinamurthy Karra

      I started with a few tests for renameVariable - currently considering only a simple variable. Some structure is emerging out of the chaos.

      The idea runs along the lines of using the same structure for all the non-terminal symbols (read: identifiers). A quick example:

      void f()
      {
          int i;

          for (int i = 0; i < 100; i++)
              ;
      }

      Each of the 'i's will get unique ids and the references are stored using these unique ids.

      I know that I am not very clear yet, but I shall give more details tomorrow.
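      A rough sketch of the unique-id idea (all names here are mine, not the actual design): each declaration of i gets its own id, and a reference resolves to the deepest declaration visible at its scope depth, so the loop's i shadows the outer one:

```cpp
#include <string>
#include <vector>

// Sketch only; names are illustrative, not the CppRefactory design.
// Every declaration gets its own id, and references are stored as ids.
struct Declaration {
    int id;
    std::string name;
    int scopeDepth;   // block-nesting level of the declaration
};

struct SymbolTable {
    std::vector<Declaration> decls;
    int nextId = 0;

    int declare(const std::string& name, int scopeDepth) {
        decls.push_back({nextId, name, scopeDepth});
        return nextId++;
    }

    // A use of `name` at `scopeDepth` resolves to the deepest
    // visible declaration, so inner scopes shadow outer ones.
    int lookup(const std::string& name, int scopeDepth) const {
        int best = -1, bestDepth = -1;
        for (const Declaration& d : decls)
            if (d.name == name && d.scopeDepth <= scopeDepth && d.scopeDepth > bestDepth) {
                best = d.id;
                bestDepth = d.scopeDepth;
            }
        return best;
    }
};
```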

       
    • Dakshinamurthy Karra

      I added a small file called renamevariable.doc to CppRefactory/Doc. Please comment.

       
      • David Barker

        David Barker - 2002-02-17

        Using mangled names internally will yield a simpler schema, for sure.

        I agree with the ideas in the document. As I've mentioned before, I'm worrying about the interface to such a DB.  How does this sound:

        The parser has a context stack.
        To begin with, the context is the global namespace, with no class as yet.
        Whenever a class, struct, namespace declaration, or block is encountered, the parser makes a new context by transforming the old context with a DB call such as

        Context enterNamespace(Context &prevContext, char *nsName);
        or
        Context enterClass(Context &prevContext, char *className);
        etc.

        When a variable declaration is encountered, the parser would use
        VarId registerVariable(Context, varName);
        using the context at the top of its stack.

        When a closing brace is found, the top context is popped.

        The parser would then look up names with another method such as
        VarId lookupVariable(Context, name);
        This would perform the process of disambiguation, and would yield the id of a variable that had previously been registered against that context, or a context accessible from it (i.e. down the stack somewhere).

        All the worries about lookup, name-mangling, and outputting a disambiguated name are left to the DB internals - which will use a scheme such as the one you outlined.

        Does this seem fair enough?
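        The protocol above could be sketched like this - all names are assumptions, and for brevity lookupVariable only checks the exact context rather than walking down the stack as a real implementation would:

```cpp
#include <stack>
#include <string>
#include <utility>
#include <vector>

// Sketch of the context-stack protocol: the parser owns the stack,
// the DB owns mangling and lookup. Names are illustrative only.
struct Context {
    std::string qualifiedPrefix;   // e.g. "::app::"
};

struct DB {
    std::vector<std::pair<std::string, int>> vars;   // mangled name -> id

    Context enterNamespace(const Context& prev, const std::string& nsName) {
        return Context{prev.qualifiedPrefix + nsName + "::"};
    }
    int registerVariable(const Context& ctx, const std::string& name) {
        int id = static_cast<int>(vars.size());
        vars.push_back({ctx.qualifiedPrefix + name, id});
        return id;
    }
    // Mangling and disambiguation live in the DB internals.
    // (A real lookup would also walk contexts down the stack.)
    int lookupVariable(const Context& ctx, const std::string& name) const {
        for (const auto& v : vars)
            if (v.first == ctx.qualifiedPrefix + name)
                return v.second;
        return -1;   // not registered in this context
    }
};
```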

         
        • Dakshinamurthy Karra

          Looks like too much is being put into the DB class. Can't we use something like gdbm and get the work done? I am sure there will be a few ISAM libraries in the open-source world, and we can have a look at those.

          The mangling of names should still be left to the parser, because it owns the context stack (I prefer the term scope stack). I am looking at something similar to what you have mentioned.

          My first infrastructure (along with CppUnit) for the ANTLR-based CppParser is ready. I am trying out the lexer first. I took the lexer from the tinyc example and am enhancing it a bit. Currently it works properly except for trigraphs and some escape sequences.

          I will continue some more work over this week and the weekend. Hope I can get a decent parser up and running by that time.

          PS: I am using a modified version of CppUnit, where the global declaration of the TestCase object registers the unit test - and that too on Linux. I currently do not have access to a Win32 box - hope this is OK with you guys.

           
          • David Barker

            David Barker - 2002-02-18

            I have to agree that in my last description, the DB class was trying to do too much.

            I currently think that the DB should be there to guide where the refactoring is done.  What I mean is that it holds cross-referencing information, so you know what depends on what.  When renaming a method, you have to know where it's declared and who uses it, in order for all the code to recompile; the DB would be there to link things up.  Otherwise, the refactorer would have to go over every translation unit, reparsing the entire source.  Since some source (at least the source I work on at work) takes many hours (seven) to compile afresh, this is not a good idea.

            So I think the DB will not maintain an AST, but rather cross-references; the AST will be rebuilt as the translation units are refactored, and the DB will tell the refactorer which translation units to work on.

            Sound good?
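            One way this cross-reference store could look - the names (CrossRefDB, recordUse, unitsToRefactor) are hypothetical, just to pin down the idea that a rename only touches the files that mention the symbol:

```cpp
#include <map>
#include <set>
#include <string>

// Sketch of a cross-reference DB (names are mine): it records which
// translation units mention each symbol, so a rename reparses only
// those files instead of the entire source base.
struct CrossRefDB {
    std::map<std::string, std::set<std::string>> usersOf;   // symbol -> files

    void recordUse(const std::string& symbol, const std::string& file) {
        usersOf[symbol].insert(file);
    }

    // Translation units the refactorer must revisit for this symbol.
    std::set<std::string> unitsToRefactor(const std::string& symbol) const {
        auto it = usersOf.find(symbol);
        return it == usersOf.end() ? std::set<std::string>() : it->second;
    }
};
```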

             
            • John Edward Judd

              Yep.

              I reckon the DB would work better as a symbol table, separate from any AST. It would still be used by the AST, but wouldn't contain it.

              Is this what you were thinking?

               
          • John Edward Judd

            DB in the context of this project isn't referring to a traditional DBMS. It's really just a collection of data representing the code, like a symbol table.

            We probably don't even need to store this data. Considering that the user will probably change the source code between uses of the refactoring tool, the db will need to be regenerated each time. In fact, I think this is how JRefactory does it.

            The Linux stuff is cool. I do want this to be cross-platform and as portable as possible. I don't have access to a Linux box though, so I can't help you there.

             
            • Dakshinamurthy Karra

              I think it is better to use a traditional database for keeping the reference information; it can be updated whenever a source file is changed or a refactoring is performed on the source. It might be too much of a burden on the refactoring tool if we need to parse each and every source file whenever a change is made.

              Another advantage of having a database is that it might actually simplify our design to a large extent. We can work on the refactoring part by manually creating the db files, and then modify the parser to create them later.

               
              • John Edward Judd

                Ah. No. You wouldn't have to reparse the entire source base each time a change is made. I was thinking of something along the lines of Visual Studio, where, when an external change is made to a file, it reloads just that file. That should be nice and quick.

                One of my concerns with a traditional db is speed. One of our requirements is that the tool be fast [loading, refactoring, writing], since it will become a burden to use if it's slow. Databases generally aren't written for high-performance applications; rather, they tend towards data integrity and security concerns.

                 
