Mapping StructuredBlocks to Blocks

Help
2006-12-07
2013-05-02
  • Trevor Harmon
    Trevor Harmon
    2006-12-07

    I'm trying to use JODE to generate control flow graphs of Java methods. This is easy enough using the StructuredBlock objects of the flow package. For instance, consider the following (non-sensical) Java method:

    if (i > 1) {
    __if (i > 2)
    ____i += 3;
    } else {
    __i += 4;
    __i *= 5;
    }
    int j = 6;
    j += 7;
    return j;

    After telling JODE to decompile the class for this method, I can grab the root StructuredBlock by doing MethodAnalyzer.getMethodHeader().getBlock(). Then, by recursively following its subblocks, I can produce a tree of the method's structured block objects:

    IfThenElseBlock
    __IfThenElseBlock
    ____InstructionBlock
    Else
    __InstructionBlock
    __InstructionBlock
    InstructionBlock
    InstructionBlock
    ReturnBlock

    This is great except for one problem: I need the bytecode that corresponds to each structured block. (I'll be using the CFG for worst-case execution time analysis, so I need to know how much CPU time each block takes up, and that requires knowing the bytecode.)

    I understand that the basic block bytecode for a method can be obtained by calling MethodInfo.getBasicBlocks(), but this is just an array of blocks. There doesn't seem to be a way of mapping these Block objects to their corresponding StructuredBlock objects. Is this possible?

    Thanks.

     
    • It is as you noticed:  StructuredBlock has no reference to the original bytecode any more.  With MethodInfo.getBasicBlock() there is no way to get the If-Then-Else structures or the Loops directly. You would have to do the T1/T2-analysis that the FlowBlock class implements.  Or you could extend the structured blocks to store the original bytecode.

      If you are just interested in the flow graph, then the basic blocks (net.sf.jode.bytecode.MethodInfo) is enough.  You can ask each Block for its Sucessors.  If there are more than one successors, then the last instruction of the block is the conditional jump instruction.  The only caveat is that some instructions are already canonicalized, e.g. unconditional jumps are removed, lookupswitch replaced by tableswitch, etc.  So if you're interested in exact timing, you have to take this into account.

       
      • Chad McHenry
        Chad McHenry
        2006-12-07

        If you're not interested in the actual decompilation to source code, you may find one of the other byte code analyzers to have better documentation, and be more current. JODE's bytecode manipulation classes are excellent, but getting rather old, and the API is not as intuitive as the others. That said, if decompilation is the end goal, I've found nothing better than JODE, including several commercial and academic tools I tried.

        ASM <http://asm.objectweb.org/> is well maintained, current to java 1.5, and has decent documentation. BCEL <http://jakarta.apache.org/bcel/>, though apparently still making releases, does not see regular maintenence, and is behind ASM in capabilities (last I looked it had no support for java 1.5 features).

         
        • Trevor Harmon
          Trevor Harmon
          2006-12-07

          "If you're not interested in the actual decompilation to source code, you may find one of the other byte code analyzers to have better documentation, and be more current."

          Well, I'm not interested in decompilation per se, but I am interested in the source code. I need to be able to display the source code corresponding to each component of the control flow graph. I had considered simply using the LineNumberTable to jump back to the appropriate source code line, but that gets messy due to comments, different brace conventions, etc.

          Because JODE does such a good job of reconstructing the source code, and also generates the control flow graph as part of its analysis, it solves this problem for me. The only catch is that it doesn't maintain the underlying bytecode for a timing analysis. If it could -- and I'm hoping to modify JODE's source code to do so -- then it would be the perfect tool for my needs.

          As for ASM and BCEL, yes, I had actually tried out both of them, as well as a similar tool called Javassist. My plan was to do all the bytecode analysis myself and use the tool simply for extracting the bytecode of a given method. With input programs containing only If/Then/Else constructs, I was having a lot of success, but as soon as I started to consider loops (for/do/while), my task suddenly became much harder. It's quite difficult to identify whether a brach instruction is part of an If, a While, or whatever. ASM and BCEL can't help with this problem, as far as I know, but JODE can. That's why I'm focusing on JODE instead of lower-level bytecode tools.

           
      • Trevor Harmon
        Trevor Harmon
        2006-12-07

        "Or you could extend the structured blocks to store the original bytecode."

        That's the approach I'd like to try. If I can get it to work, I'd have both the high-level StructuredBlock control flow combined with the low-level bytecode for accurate timing analysis.

        It seems like this wouldn't be overly difficult, given that the bytecode is already there as the input to the analysis. I just need to prevent the bytecode from being discarded during the process.

        How should I go about implementing this? Could you give me some pointers on where to get started?

        Thanks!

         
        • > "Or you could extend the structured blocks to store the original bytecode."

          > How should I go about implementing this? Could you give me some pointers on where to get started?

          Add a field to StructuredBlock to store list of instructions.  Then modify decompiler.Opcodes to include the instructions in newly created blocks (I think only the createXYZ methods need to be changed).

          Then check all methods that create new blocks, e.g. grep for "new [A-Za-z]*Block" and check whether you need to move bytecode instructions, because the block that contained them previously is removed (e.g. ConditionalBlock->IfThenBlock).

           
          • Trevor Harmon
            Trevor Harmon
            2006-12-12

            "Add a field to StructuredBlock to store list of instructions. Then modify decompiler.Opcodes to include the instructions in newly created blocks (I think only the createXYZ methods need to be changed)."

            Thanks for the tip. Yes, some of the createXYZ methods had to be changed. I also had to change the handling of the return opcode because it creates a ReturnBlock object directly.

            "Then check all methods that create new blocks, e.g. grep for "new [A-Za-z]*Block" and check whether you need to move bytecode instructions, because the block that contained them previously is removed (e.g. ConditionalBlock->IfThenBlock)."

            I haven't gotten to the control flow expressions yet, but I think I found an easy solution for the basic blocks. Note that basic blocks start out as multiple StructuredBlocks, one for each opcode, and are then merged into one StructuredBlock per Java statement. This merging causes all but the last opcode in each statement to be lost.

            To fix this problem, I'm using the StructuredBlock.moveDefinitions method. In JODE, this method is apparently unused, but because it's called at certain key points during the analysis, I discovered that I can use it to merge bytecode instructions. Using the "from" parameter, I simply copy the bytecode instructions in the first sub-block to the front of the second sub-block's instruction list. This appears to be the only change necessary to avoid losing bytecode instructions in basic blocks.

            I will now work on the higher-level constructs (IfThenElseBlock, LoopBlock, etc.). When I'm finished, would you be interested in merging the changes into the JODE trunk?

            One more thing: I noticed that the return opcode is preserved only for methods that return a value. If the method returns void, JODE doesn't create a ReturnBlock, even in the places where the return opcode appears. In fact, the Opcodes function never gets called with opc_return as a parameter, and even if it did, it would throw an InternalError. Why is this?

            Thanks!

             
            • > One more thing: I noticed that the return opcode is preserved only for methods that return a value. If the method returns void, JODE doesn't create a ReturnBlock, even in the places where the return opcode appears. In fact, the Opcodes function never gets called with opc_return as a parameter, and even if it did, it would throw an InternalError. Why is this?

              A return opcode is translated by the bytecode package to a "jmp null", and null is interpreted as end of method.  The main reason for this is that it automatically leads to better decompiled code.  Consider a simple void function

              void f() { if (cond) then stmt1 else stmt2 }

              An optimizing compiler produces the following code:

              compute cond
              ifeq lab1
              stmt1
              return
              lab1:
              stmt2
              return

              If I would not translate the return to jumps this would be decompiled as
              void f() { if (cond) then {stmt1 return;} else {stmt2 return;}}

              Note, that also instructions for unconditional jumps are completely removed in BasicBlocks.  Instead this information is only stored in the successor field of a block.