Some opcodes are re-mapped as they are read

Help
2006-12-14
2013-05-02
  • Trevor Harmon

    Trevor Harmon - 2006-12-14

    When JODE reads in the opcodes of a class, it performs a kind of multiplexing on them. For example, bipush, sipush, iconst_0, and many other opcodes are all mapped to ldc. You can see this happening in BasicBlockReader.readCode(...).

    While this may simplify the decompiling code, it's a problem for me because I need to preserve the original opcodes in order to do an accurate timing analysis.

    I'm guessing there might be an easy fix. Could I simply "de-multiplex" the code? That is, I could translate this:

    case opc_iconst_0: case opc_iconst_1: ...
        instr = new ConstantInstruction(opc_ldc, constants[opcode - opc_aconst_null]);
        length = 1;
        break;

    into this:

    case opc_iconst_0:
        instr = new ConstantInstruction(opc_iconst_0, constants[opcode - opc_aconst_null]);
        length = 1;
        break;
    case opc_iconst_1:
        instr = new ConstantInstruction(opc_iconst_1, constants[opcode - opc_aconst_null]);
        length = 1;
        break;
    ...

    And then I would simply add handlers for these opcodes in Opcodes.addOpcode. Is that an adequate solution, or is a more complicated change necessary? Thanks.

     
    • Jochen Hoenicke

      Jochen Hoenicke - 2006-12-14

      You could do this, but you would have to change all occurrences of opc_ldc (e.g. in the Verifier, BytecodeInterpreter, and so on), and do the same for the other mapped opcodes (see the comment in Instruction.java):

      [iflda]load_x           -> [iflda]load
      [iflda]store_x          -> [iflda]store
      [ifa]const_xx, ldc_w    -> ldc
      [dl]const_xx            -> ldc2_w
      wide opcode             -> opcode
      tableswitch             -> lookupswitch
      [a]newarray             -> multianewarray
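The mapping listed above can be sketched as a single many-to-one function, which also shows why the original opcode cannot be recovered afterwards. This is only an illustration, not JODE's code: the class name `OpcodeNormalizer` and the method `normalize` are hypothetical, the opcode values are the standard JVM ones, and the `wide` prefix is left out because it needs access to the following opcode.

```java
// Illustrative sketch only; OpcodeNormalizer is a hypothetical name, not a JODE class.
final class OpcodeNormalizer {
    // Opcode values as defined by the JVM specification.
    static final int ACONST_NULL = 0x01, ICONST_5 = 0x08;
    static final int LCONST_0 = 0x09, LCONST_1 = 0x0a;
    static final int FCONST_0 = 0x0b, FCONST_2 = 0x0d;
    static final int DCONST_0 = 0x0e, DCONST_1 = 0x0f;
    static final int BIPUSH = 0x10, SIPUSH = 0x11;
    static final int LDC = 0x12, LDC_W = 0x13, LDC2_W = 0x14;
    static final int ILOAD = 0x15, ILOAD_0 = 0x1a, ALOAD_3 = 0x2d;
    static final int ISTORE = 0x36, ISTORE_0 = 0x3b, ASTORE_3 = 0x4e;
    static final int TABLESWITCH = 0xaa, LOOKUPSWITCH = 0xab;
    static final int NEWARRAY = 0xbc, ANEWARRAY = 0xbd, MULTIANEWARRAY = 0xc5;

    /** Maps an original opcode to the canonical opcode from the table. */
    static int normalize(int opcode) {
        // [iflda]load_x -> [iflda]load: four short forms per type, in type order.
        if (opcode >= ILOAD_0 && opcode <= ALOAD_3) {
            return ILOAD + (opcode - ILOAD_0) / 4;
        }
        // [iflda]store_x -> [iflda]store
        if (opcode >= ISTORE_0 && opcode <= ASTORE_3) {
            return ISTORE + (opcode - ISTORE_0) / 4;
        }
        // [ifa]const_xx, ldc_w -> ldc (bipush and sipush per the first post)
        if ((opcode >= ACONST_NULL && opcode <= ICONST_5)
                || (opcode >= FCONST_0 && opcode <= FCONST_2)
                || opcode == BIPUSH || opcode == SIPUSH || opcode == LDC_W) {
            return LDC;
        }
        // [dl]const_xx -> ldc2_w
        if (opcode == LCONST_0 || opcode == LCONST_1
                || opcode == DCONST_0 || opcode == DCONST_1) {
            return LDC2_W;
        }
        if (opcode == TABLESWITCH) {
            return LOOKUPSWITCH;
        }
        if (opcode == NEWARRAY || opcode == ANEWARRAY) {
            return MULTIANEWARRAY;
        }
        return opcode; // everything else is kept as read
    }

    public static void main(String[] args) {
        int[] samples = {0x03, 0x10, 0x1a, 0x4e, 0xaa, 0x60};
        for (int op : samples) {
            System.out.printf("0x%02x -> 0x%02x%n", op, normalize(op));
        }
    }
}
```

Since several inputs share one output, a timing analysis that runs after this step sees, for example, only ldc where the class file had iconst_0 or bipush.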

      One problem is "wide", as it is a prefix opcode. You should not add a new field to Instruction, as it is size-optimized (the obfuscator package may need to load many instructions into memory).

      BTW, your code above can be simplified to:

      case opc_iconst_0: case opc_iconst_1: ...
          instr = new ConstantInstruction(opcode, constants[opcode - opc_aconst_null]);
          length = 1;
          break;

       
      • Trevor Harmon

        Trevor Harmon - 2006-12-17

        "You could do this, but you have to change all occurrences of opc_ldc (e.g. in the Verifier, BytecodeInterpreter, and so on) and the same for the other mapped opcodes (see comment in Instruction.java)"

        Yes, I see that BasicBlockWriter will also have to be changed. There are actually quite a few changes that will have to be made. Would you be interested in integrating these changes into the JODE trunk? It may be of interest even to those who are not interested in time analysis of bytecode. (For instance, it would allow a disassembly of a class file that includes its structure.)

        "one problem is "wide" as it is a prefix opcode. You should not add a new field to instruction, as it is size optimized (the obfuscator package may need to load many instructions into memory)."

        Interesting. Does the wide instruction actually occur often enough that it would make a noticeable impact on memory? I would not have expected this to be an issue.

        Also, what about the other mappings? For example, how can mapping bipush into ldc save memory? I'm just not understanding the purpose of these other mappings.

         
        • Jochen Hoenicke

          Jochen Hoenicke - 2006-12-17

          > Yes, I see that BasicBlockWriter will also have to be changed. There are actually quite a few changes that will have to be made. Would you be interested in integrating these changes into the JODE trunk? It may be of interest even to those who are not interested in time analysis of bytecode. (For instance, it would allow a disassembly of a class file that includes its structure.)

          As long as not too many things need to be changed, I think it is okay to include it in the trunk.  One problematic class is probably the SyntheticAnalyzer.  It does many checks for "== opc_aload/astore".  Another issue is whether changing the slot of a local variable should change the opcode, too.  The obfuscator has some code that reorders variables.

          > Interesting. Does the wide instruction actually occur often enough that it would make a noticeable impact on memory? I would not have expected this to be an issue.

          No.  As long as the extra memory is only needed for wide instructions, that is no problem at all.  But if you add a field to Instruction, it affects the memory for all instructions.  I wonder if you need this at all.  Every compiler should only issue a wide instruction if it's really needed, so you can just check for getLocalSlot() >= 256, and check whether getIncrement() is in byte range.
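The check described above can be sketched as follows, assuming the compiler only emitted `wide` where it was required. The class `WideCheck` and its methods are hypothetical helpers that take plain ints; in JODE the values would come from getLocalSlot() and getIncrement().

```java
// Illustrative sketch only; WideCheck is a hypothetical helper, not a JODE class.
final class WideCheck {
    /** A load, store, or ret needs the wide prefix once the slot no longer fits in one unsigned byte. */
    static boolean localAccessNeedsWide(int slot) {
        return slot >= 256;
    }

    /** iinc needs wide if the slot is too large or the increment does not fit in a signed byte. */
    static boolean iincNeedsWide(int slot, int increment) {
        return slot >= 256
                || increment < Byte.MIN_VALUE
                || increment > Byte.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("load slot 255: " + localAccessNeedsWide(255));
        System.out.println("load slot 256: " + localAccessNeedsWide(256));
        System.out.println("iinc slot 3 by 128: " + iincNeedsWide(3, 128));
    }
}
```

This infers the prefix from data the instruction already carries, so no extra field is needed. It would misreport a class file in which a compiler emitted `wide` unnecessarily.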

          > Also, what about the other mappings? For example, how can mapping bipush into ldc save memory? I'm just not understanding the purpose of these other mappings.

          These don't save memory, but they save case distinctions in Verifier, decompiler.Opcodes and Interpreter.  They also make it easier to change the variable slot without having to change the opcode.  The only price is slightly more code in BasicBlockWriter.  But since the purpose of BasicBlockWriter is to write out optimized bytecode, we have to pay this price anyway.
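To illustrate the trade-off, here is a rough sketch of the kind of selection a writer has to make when instructions only carry the canonical load opcode plus a slot. It is not BasicBlockWriter's actual code; `LoadEncoder` and `encodeLoad` are hypothetical names, and the opcode values are the standard JVM ones.

```java
// Illustrative sketch only; LoadEncoder is a hypothetical name, not a JODE class.
final class LoadEncoder {
    // Opcode values as defined by the JVM specification.
    static final int ILOAD = 0x15, ALOAD = 0x19, ILOAD_0 = 0x1a, WIDE = 0xc4;

    /**
     * Picks the shortest encoding for a canonical load (iload..aload) of the
     * given local slot and returns its bytes as unsigned values.
     */
    static int[] encodeLoad(int canonicalLoad, int slot) {
        if (canonicalLoad < ILOAD || canonicalLoad > ALOAD) {
            throw new IllegalArgumentException("not a load opcode: " + canonicalLoad);
        }
        if (slot < 0 || slot > 0xffff) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        if (slot <= 3) {
            // One-byte short form, e.g. iload_2 or aload_0.
            return new int[] {ILOAD_0 + (canonicalLoad - ILOAD) * 4 + slot};
        }
        if (slot <= 255) {
            // Plain form with a one-byte slot operand.
            return new int[] {canonicalLoad, slot};
        }
        // wide prefix with a two-byte, big-endian slot operand.
        return new int[] {WIDE, canonicalLoad, slot >>> 8, slot & 0xff};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(encodeLoad(ILOAD, 2)));
        System.out.println(java.util.Arrays.toString(encodeLoad(ALOAD, 17)));
        System.out.println(java.util.Arrays.toString(encodeLoad(ILOAD, 300)));
    }
}
```

With this design, code that renumbers local variables only changes the slot; the opcode form is decided once, at write time.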

          I think the main problem you will face in recovering the original bytecode is the omission of jump instructions at the end of the blocks.  Also, in some code a jump instruction jumps to another jump instruction (sometimes, but not always, the compiler optimizes it away).  After the BasicBlock transformation these jumps are removed, so you cannot recover the original bytecode.

           
