
Help! I would like to make changes to Boomerang

Mary
2009-01-26
2013-05-28

    Mary - 2009-01-26

    First, thank you for taking the time to answer our questions; with your help, Boomerang is not that difficult to understand.
    I am a student working on my thesis. I would like to implement my thesis idea on top of Boomerang, and I want to make sure it is a good choice. Specifically, I plan to do the following:
    A) For a disassembled executable (originally written in assembly language), represent each assembly instruction by its semantics (e.g. in 32-bit code POP EAX is represented as EAX := SS[ESP]; ESP := ESP + 4);
    B) Use local optimization techniques to remove dead code, propagate the values of intermediate variables, and simplify expressions;
    C) Boomerang already does most of this, but I would like to extend it, for example by adding expression operators (e.g. "|", "&", "^") and by simplifying expressions in my own way (I have designed simplification rules and the corresponding algorithms) for a special purpose. That is, I need to modify and extend Boomerang's expression simplification.
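A minimal sketch of what A) to C) could look like on a single basic block. It is written in Python rather than Boomerang's C++ and uses none of Boomerang's real classes: expressions are plain tuples, `("m", addr)` stands for a memory access, and a register name stands for that register's value on entry to the block. The names `simplify`, `subst`, `run_block`, `push` and `pop` and the rule set are hypothetical. It assumes every memory access is a stack slot addressed from ESP (so syntactically different addresses never alias) and it ignores flags.

```python
MASK = 0xFFFFFFFF  # 32-bit wraparound for constant folding

def simplify(e):
    """Bottom-up rewriting with a few example rules, including bitwise ones."""
    if not isinstance(e, tuple):
        return e
    if e[0] == "m":
        return ("m", simplify(e[1]))
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if isinstance(a, int) and isinstance(b, int):
        return {"+": a + b, "-": a - b, "&": a & b, "|": a | b, "^": a ^ b}[op] & MASK
    if op in ("+", "-") and isinstance(b, int) and isinstance(a, tuple) \
            and a[0] in ("+", "-") and isinstance(a[2], int):
        # (x +/- c1) +/- c2  ->  x +/- c, so stack addresses get one canonical form
        c = (a[2] if a[0] == "+" else -a[2]) + (b if op == "+" else -b)
        return a[1] if c == 0 else ("+", a[1], c) if c > 0 else ("-", a[1], -c)
    if op in ("+", "-", "|", "^") and b == 0:
        return a                      # x + 0, x - 0, x | 0, x ^ 0  ->  x
    if op == "&" and b == 0:
        return 0                      # x & 0  ->  0
    if op in ("&", "|") and a == b:
        return a                      # x & x, x | x  ->  x
    if op in ("-", "^") and a == b:
        return 0                      # x - x, x ^ x  ->  0
    return (op, a, b)

def subst(e, state):
    """Forward propagation: replace each location read in e by its current value."""
    if isinstance(e, str):
        return state.get(e, e)
    if isinstance(e, tuple) and e[0] == "m":
        addr = simplify(subst(e[1], state))
        return state.get(("m", addr), ("m", addr))
    if isinstance(e, tuple):
        return (e[0], subst(e[1], state), subst(e[2], state))
    return e

def run_block(transfers, live_out):
    """Evaluate the block symbolically and keep only live-out locations, so
    assignments that are overwritten or never used simply disappear."""
    state = {}
    for dest, expr in transfers:
        value = simplify(subst(expr, state))
        if isinstance(dest, tuple):   # memory store m[addr] := value
            dest = ("m", simplify(subst(dest[1], state)))
        state[dest] = value
    return {loc: state.get(loc, loc) for loc in live_out}

# Register-transfer semantics for 32-bit PUSH/POP (PUSH ESP / POP ESP not handled).
def push(r):
    return [("ESP", ("-", "ESP", 4)), (("m", "ESP"), r)]

def pop(r):
    return [(r, ("m", "ESP")), ("ESP", ("+", "ESP", 4))]

block = ([("EAX", ("^", "EAX", "EAX"))]        # xor eax, eax
         + push("EBX") + pop("EAX")           # push ebx / pop eax
         + [("EAX", ("|", "EAX", "ECX")),      # or  eax, ecx
            ("EAX", ("&", "EAX", "EAX"))])     # and eax, eax
print(run_block(block, ["EAX", "ESP"]))  # {'EAX': ('|', 'EBX', 'ECX'), 'ESP': 'ESP'}
```

Evaluating the block against a running state is one simple way to get propagation and dead-code removal together on straight-line code; across basic blocks you would need real data-flow analysis (Boomerang uses SSA form for this).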
    I would like to know whether there are design documents for Boomerang, so that I can understand it more easily and get to work quickly. I would also like to know whether my plan is realistic (I plan to spend two months implementing my thesis).

    Thanks.

     

      Anonymous - 2009-02-25

      Hi Mary,

      Really interesting...

      I have a project similar to Boomerang.

      In the past I contacted Mike Van Emmerik (if I remember correctly) to ask about Boomerang, because I had worked on something similar and wanted to learn from Boomerang to implement my own system in my project.

      But I got nothing more than "you need to do much effort on it" and a few other words that really didn't say much.

      I think the lack of interest in Boomerang is because Mike works for a company that has a project similar to Boomerang, which is an evident conflict of interest. You can read more about it here:
      https://sourceforge.net/forum/forum.php?forum_id=611853

      Well, I looked a bit at the Boomerang code, and... I think there is a big flaw in the design, maybe because of a lack of time or a lack of care. The reason is simple: Boomerang translates the opcodes directly into the intermediate language [SSL].
      In my opinion this is not right. It would be better to output the code to ASM first and only later prepare it for translation into the intermediate language. Decoding opcodes can go wrong in many ways, and if anything goes wrong in the decoding stage, the translation is very likely to be based on a wrong interpretation.
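A minimal sketch of that first stage: decode raw bytes into an inspectable ASM listing before any translation happens. It covers only a handful of 32-bit x86 opcodes (PUSH r32 = 50+r, POP r32 = 58+r, NOP = 90, RET = C3, JMP rel8 = EB) and prints unknown bytes as `db` instead of guessing. A real x86 decoder also has to handle prefixes, ModR/M and SIB bytes and operand-size modes. `decode` and `to_asm` are hypothetical names.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode(code):
    """bytes -> [(address, mnemonic, operand)] for a tiny subset of 32-bit x86."""
    out, i = [], 0
    while i < len(code):
        b = code[i]
        if 0x50 <= b <= 0x57:                          # PUSH r32 (50+r)
            out.append((i, "push", REGS32[b - 0x50]))
        elif 0x58 <= b <= 0x5F:                        # POP r32 (58+r)
            out.append((i, "pop", REGS32[b - 0x58]))
        elif b == 0x90:
            out.append((i, "nop", None))
        elif b == 0xC3:
            out.append((i, "ret", None))
        elif b == 0xEB and i + 1 < len(code):          # JMP rel8, relative to next insn
            rel = code[i + 1] - 0x100 if code[i + 1] >= 0x80 else code[i + 1]
            out.append((i, "jmp", i + 2 + rel))
            i += 2
            continue
        else:                                          # unknown: show it, don't guess
            out.append((i, "db", hex(b)))
        i += 1
    return out

def to_asm(insns):
    """Render decoded instructions as a readable listing."""
    lines = []
    for addr, mn, op in insns:
        text = mn if op is None else f"{mn} {op:#x}" if isinstance(op, int) else f"{mn} {op}"
        lines.append(f"{addr:08x}  {text}")
    return lines

listing = to_asm(decode(bytes([0x53, 0x90, 0xEB, 0x00, 0x58, 0xC3])))
print("\n".join(listing))
```

Keeping the listing as an explicit intermediate result means a decoding mistake is visible (and fixable) before it turns into wrong semantics.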

      Also, the x86 architecture has three "general" instruction sets (16-bit, 32-bit and 64-bit), plus at least eight specialized instruction sets [there are many others] such as FPU, 3DNow!, MMX and SSE1/2/3/4/5...
      For x86, Boomerang works only with the 32-bit general instruction set. What I mean is that, given the way Boomerang is designed, adding support for the other instruction sets will be a much bigger problem. I bet the best approach is: first, "decode the instructions to ASM"; second, "pre-process the decoded instructions" to remove possible obfuscations such as "interleaved jmps", "null instructions" and many others; third, recognize certain structures, and maybe some other things.
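A minimal sketch of the second step, pre-processing a decoded listing. It drops NOPs and JMPs whose target is simply the next instruction (the simplest kind of interleaved jump), redirects any jump that pointed into removed code to the next surviving instruction, and repeats until nothing changes. Instructions are assumed to be `(address, mnemonic, operand)` tuples in address order, with a jump's operand holding its target address; `clean_once` and `clean` are hypothetical names, and real obfuscations need many more patterns than these two.

```python
def clean_once(insns):
    """One pass: remove NOPs and jmp-to-next, then retarget jumps."""
    removed = set()
    for k, (addr, mn, op) in enumerate(insns):
        nxt = insns[k + 1][0] if k + 1 < len(insns) else None
        if mn == "nop" or (mn == "jmp" and op == nxt):
            removed.add(addr)   # execution would just fall through to the next instruction
    kept = [ins for ins in insns if ins[0] not in removed]

    def forward(target):
        # A jump into removed code now lands on the next surviving instruction.
        return next((a for a, _, _ in kept if a >= target), target)

    return [(a, mn, forward(op) if mn.startswith("j") else op) for a, mn, op in kept]

def clean(insns):
    """Repeat clean_once until the listing stops changing (a fixed point)."""
    while True:
        out = clean_once(insns)
        if out == insns:
            return out
        insns = out

# jmp over two junk/null instructions, then a jmp to the very next instruction
listing = [(0, "jmp", 3), (2, "nop", None), (3, "nop", None),
           (4, "push", "ebx"), (5, "jmp", 7), (7, "ret", None)]
print(clean(listing))   # [(4, 'push', 'ebx'), (7, 'ret', None)]
```

The fixed-point loop matters: after the first pass the jump at address 0 points straight at the next instruction, so only a second pass can remove it.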
      So, until you do a thorough cleanup and have working code from the disassembled file, I don't think you can do any kind of translation to the intermediate language. There are many things to do first, such as removing C runtime functions, buffer security checks, SEH exception handling and other things the compiler leaves in a compiled executable [of course, this applies only to C and C++ programs].
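A minimal sketch of that cleanup as a pass over a call graph. It assumes the functions already have names (from symbols or signature matching, which is not shown). The helper names below are examples of MSVC startup, /GS security-cookie and SEH helpers; the exact set depends on the compiler and its version. `strip_runtime` and the sample graph are hypothetical.

```python
# Illustrative, not exhaustive: MSVC startup, /GS cookie and SEH helper names.
RUNTIME_HELPERS = {
    "mainCRTStartup", "_initterm",
    "__security_init_cookie", "__security_check_cookie",
    "_except_handler4", "__SEH_prolog4", "__SEH_epilog4",
}

def strip_runtime(call_graph):
    """call_graph: {function: [callees]}. Drop runtime helpers and calls to them."""
    return {fn: [c for c in callees if c not in RUNTIME_HELPERS]
            for fn, callees in call_graph.items() if fn not in RUNTIME_HELPERS}

graph = {
    "mainCRTStartup": ["__security_init_cookie", "_initterm", "main"],
    "main": ["__SEH_prolog4", "parse_args", "__security_check_cookie", "__SEH_epilog4"],
    "parse_args": [],
    "__security_check_cookie": [],
}
print(strip_runtime(graph))   # {'main': ['parse_args'], 'parse_args': []}
```

This only shows the bookkeeping. In practice the matching prologue/epilogue instructions inside each function (loading and checking the security cookie, building the SEH frame) would also have to be recognized and removed.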

      For recognizing Win32 API structures, I bet the best thing to do is to read all the C/C++ headers from the Win32 SDK. That way it will be possible to recognize any type of structure, and also to extract a lot of information such as calling conventions, variable names, variable types/sizes, equates and structure specifications. Here Boomerang was going in the right direction; the only problem is that it can't handle C/C++ headers like the ones available in the Win32 SDK and in real programs.
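A minimal sketch of pulling structure layouts and calling conventions out of header text. The header below is simplified, not verbatim SDK text (`tagDEMO` is made up), and the type sizes assume a 32-bit Windows target with natural alignment and default packing. Real SDK headers need a C preprocessor and a real C parser (macros, SAL annotations like `_In_`, unions, nested types, `#pragma pack`), which regular expressions cannot replace. `layout` and `parse_header` are hypothetical names.

```python
import re

# Assumed 32-bit Windows sizes; each field is aligned to its own size.
TYPE_SIZES = {"BYTE": 1, "WORD": 2, "DWORD": 4, "LONG": 4, "BOOL": 4, "HANDLE": 4}
# In the Windows headers, WINAPI and CALLBACK expand to __stdcall.
CONVENTIONS = {"WINAPI": "__stdcall", "CALLBACK": "__stdcall",
               "__stdcall": "__stdcall", "__cdecl": "__cdecl"}

STRUCT_RE = re.compile(r"typedef\s+struct\s+\w*\s*\{([^}]*)\}\s*(\w+)\s*;")
PROTO_RE = re.compile(r"(\w+)\s+(WINAPI|CALLBACK|__stdcall|__cdecl)\s+(\w+)\s*\(([^)]*)\)\s*;")

def layout(body):
    """Return ([(field, type, offset)], total size) for 'TYPE name;' fields."""
    fields, offset, max_align = [], 0, 1
    for decl in (d.strip() for d in body.split(";")):
        if not decl:
            continue
        ftype, fname = decl.split()
        size = TYPE_SIZES[ftype]
        offset = (offset + size - 1) // size * size       # align field to its size
        fields.append((fname, ftype, offset))
        offset += size
        max_align = max(max_align, size)
    return fields, (offset + max_align - 1) // max_align * max_align  # tail padding

def parse_header(text):
    structs = {name: layout(body) for body, name in STRUCT_RE.findall(text)}
    protos = {name: (ret, CONVENTIONS[conv],
                     [tuple(p.split()) for p in params.split(",") if p.strip()])
              for ret, conv, name, params in PROTO_RE.findall(text)}
    return structs, protos

HEADER = """
typedef struct tagPOINT { LONG x; LONG y; } POINT;
typedef struct tagDEMO { BYTE b; DWORD d; WORD w; } DEMO;
BOOL WINAPI CloseHandle(HANDLE hObject);
"""
structs, protos = parse_header(HEADER)
print(structs["DEMO"])        # ([('b', 'BYTE', 0), ('d', 'DWORD', 4), ('w', 'WORD', 8)], 12)
print(protos["CloseHandle"])  # ('BOOL', '__stdcall', [('HANDLE', 'hObject')])
```

The `DEMO` result shows why sizes and alignment matter: the padding after `b` and after `w` makes the structure 12 bytes rather than 7, which a decompiler needs to know to map stack or heap offsets back to field names.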

      Sorry for writing so much.

      Best Regards. ;-)