|
From: Stephen T. <st...@to...> - 2007-01-04 21:30:03
|
Is the guest_bytes_addr variable marking the beginning of the memory that contains the binary instructions? Is it similar to the image base in Windows PE files?

For example, if a program loaded into memory has an image base of 0x1000000 and a base of code of 0x1000, its code should begin at 0x1001000. Is 0x1001000 what I put in guest_bytes_addr?

Stephen |
|
From: Stephen T. <st...@to...> - 2007-01-04 22:03:58
|
This question is in reference to the VEX library. Sorry for not mentioning it.

Stephen |
|
From: Julian S. <js...@ac...> - 2007-01-05 15:21:46
|
On Thursday 04 January 2007 21:29, Stephen Torri wrote:
> Is the guest_bytes_addr variable marking the beginning of the memory
> containing binary instructions in memory? That is similar to the image
> base in Windows PE files?

Well, sort of. Bear in mind that vex just does the donkeywork of disassembling small fragments of code: extended basic blocks. You have to tell it, via repeated calls to LibVEX_Translate, where those blocks are; it has no understanding of executable file formats or anything like that.

That said, UChar* guest_bytes needs to point at the address in the host's memory (i.e., the machine running vex) where the instructions are, whereas Addr64 guest_bytes_addr needs to say which (guest) program counter corresponds to that fragment of code.

> For example if a program loaded into memory has a image base of
> 0x1000000 and a base of code 0x1000 so that our file loaded into memory
> should begin at 0x1001000. Is the 0x1001000 what I put in the
> guest_bytes_addr?

In that case you need to put 0x1001000 in guest_bytes_addr, and the actual address where you have the code in the host's memory in guest_bytes.

J |
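To make the two values concrete, here is a minimal C sketch. It assumes your own loader has already copied the code section into a host buffer (vex will not do this for you), and it maps a guest program counter to the host pointer you would pass as guest_bytes; the guest address itself is what goes in guest_bytes_addr. CodeRegion and host_ptr_for_guest_pc are hypothetical names invented for illustration; only guest_bytes and guest_bytes_addr come from vex.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one region of guest code that your own
   loader has copied into host memory. vex itself knows nothing about it. */
typedef struct {
    const unsigned char *host_base; /* where the bytes live in the process running vex */
    uint64_t guest_base;            /* guest address of the first byte,
                                       e.g. image base + base of code */
    size_t size;                    /* length of the region in bytes */
} CodeRegion;

/* For a guest pc inside the region, return the host address holding that
   instruction (the value for guest_bytes). The pc itself is the value for
   guest_bytes_addr. Returns NULL if pc lies outside the region. */
static const unsigned char *host_ptr_for_guest_pc(const CodeRegion *r, uint64_t pc)
{
    if (pc < r->guest_base || pc - r->guest_base >= r->size)
        return NULL;
    return r->host_base + (size_t)(pc - r->guest_base);
}
```

With Stephen's numbers, a region whose guest_base is 0x1000000 + 0x1000 = 0x1001000 maps guest pc 0x1001000 to the first byte of the host buffer, wherever malloc or mmap happened to put it.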
|
From: Stephen T. <st...@to...> - 2007-01-05 15:43:12
|
Thanks. I think I am beginning to understand. I see that I need to make repeated calls to LibVEX_Translate. How do I get back the address it's trying to jump to when it reaches the end of a basic block? If I enable tracing, I can see that it wants to jump to, or call a function at, an address. How do I get that address?

Stephen |
|
From: Julian S. <js...@ac...> - 2007-01-05 16:04:28
|
On Friday 05 January 2007 15:42, Stephen Torri wrote:
> Thanks. I think I am beginning to understand. I see how I need to make
> repeated calls to LibVEX_Translate. How do I get back the address its
> trying to jump to when it finishes the end of a basic block? If I enable
> tracing I can see that it wants to jump or call a function at a address.
> How do I get that address?

Well, if it's an unconditional branch at the end of a block, the destination will be in the IRSB.next field. The destination of a conditional exit from the block is contained in the IRStmt.Ist.Exit.dst field.

Note that these names apply to the most recent trunk versions of vex, not the 3.2.1 release. You should use the trunk if you are not already doing so: its libvex_ir.h contains enhanced documentation and some renaming to make things a bit clearer.

J |
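A sketch of collecting a block's successors, under simplifying assumptions. In libvex_ir.h the exit destination is an IRConst, so conditional-exit targets are always literal addresses, while IRSB.next is a general expression: a constant for a direct jump, but a run-time value for an indirect one (check your copy of the header). SimpleBlock and known_successors are hypothetical stand-ins, not vex types; a real tool would fill them in by walking the IRSB's statements.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified view of one translated block. */
typedef struct {
    bool next_is_const;         /* true if 'next' is a literal address */
    uint64_t next_addr;         /* valid only when next_is_const is true */
    const uint64_t *exit_dsts;  /* targets of conditional side exits */
    size_t n_exits;
} SimpleBlock;

/* Write every statically known successor into out[] (capacity cap) and
   return how many were written. *next_unknown is set when the block's
   'next' is computed at run time and so cannot be followed statically. */
static size_t known_successors(const SimpleBlock *b, uint64_t *out, size_t cap,
                               bool *next_unknown)
{
    size_t n = 0;
    for (size_t i = 0; i < b->n_exits && n < cap; i++)
        out[n++] = b->exit_dsts[i];
    *next_unknown = !b->next_is_const;
    if (b->next_is_const && n < cap)
        out[n++] = b->next_addr;
    return n;
}
```

A block ending in something like `jmp *%eax` would report next_unknown, which is exactly the case Julian warns about later in the thread.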
|
From: Stephen T. <st...@to...> - 2007-01-05 19:48:35
|
Do I need to provide an instrument function so that I can have access to the IRSB just processed by LibVEX_Translate?

I have been looking at valgrind/VEX/test_main.c as an example of how to work with VEX. In vex_main.c I see how the instrument1 and instrument2 functions are used, but I am concerned about keeping a pointer to the irsb variable in the class that calls LibVEX_Translate. It seems that the contents of irsb in vex_main.c's LibVEX_Translate will be destroyed when the call returns to my class, so if I try to use my copy of the pointer I will get a segfault. Is that true?

There are two things I would like to do:

1. Using the IRSB, grab the value of the 'next' field and use it to prepare the translation of the next block.
2. Get access to the assembly code, or the data used to produce it, so that I can write a basic block graph writer that displays the basic block structure graphically.

I know the first has been suggested to me as possible, but how to do it eludes me. The second might be just a dream.

Stephen |
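One way around the lifetime problem, without keeping the block at all, is to copy out the few values you need (such as the 'next' target) while the block is still alive, for example from inside the instrumentation callback that LibVEX_Translate hands the IRSB to. The C sketch below models that pattern with a toy translator; toy_translate, ToyBlock, BlockSummary and record_next are all hypothetical. Real vex instrumentation callbacks receive an IRSB* plus layout/extents arguments and must return an IRSB*, with the exact signature depending on the vex version, so check libvex.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy block standing in for an IRSB. */
typedef struct {
    uint64_t guest_addr;
    bool next_is_const;
    uint64_t next_addr;
} ToyBlock;

/* Hypothetical short-lived storage, valid only during one translate call. */
static ToyBlock scratch;

typedef void (*BlockCallback)(const ToyBlock *b, void *user);

/* Hypothetical translator: builds a block in scratch storage, hands it to
   the callback, then scribbles over the storage, so a pointer kept past
   the return would see garbage (the situation Stephen describes). */
static void toy_translate(uint64_t addr, BlockCallback cb, void *user)
{
    scratch.guest_addr = addr;
    scratch.next_is_const = true;
    scratch.next_addr = addr + 0x20;        /* pretend the block is 0x20 bytes */
    cb(&scratch, user);
    memset(&scratch, 0xAA, sizeof scratch); /* storage reclaimed */
}

/* Caller-owned record that outlives the translation. */
typedef struct {
    uint64_t block_addr;
    bool next_known;
    uint64_t next_addr;
} BlockSummary;

/* Callback: copy the needed fields while the block is still valid. */
static void record_next(const ToyBlock *b, void *user)
{
    BlockSummary *s = user;
    s->block_addr = b->guest_addr;
    s->next_known = b->next_is_const;
    s->next_addr = b->next_addr;
}
```

Copying plain values this way avoids deep-copying the whole block; if you need the complete IR later, the allocator-switching approach Julian describes in the next message is the alternative.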
|
From: Julian S. <js...@ac...> - 2007-01-06 17:34:51
|
On Friday 05 January 2007 19:48, Stephen Torri wrote:
> Do I need to provide a instrument function so that I can have access to
> the IRSB just processed by LibVEX_Translate?
>
> I have been looking at valgrind/VEX/test_main.c as an example of how to
> work with VEX. In vex_main.c I see how the instrument1 and instrument2
> function are used but I am concerned about keeping the pointer to the
> irsb variable in the class that is calling LibVEX_Translate. Its seems
> that the contents of irsb in the vex_main.c LibVEX_Translate function
> will be destroyed when the function call returns back to my class. So
> then should I attempt to use this pointer copy I will get a segfault. Is
> that true?

Yes. If you want to hold onto the IRSB permanently, you need to copy it elsewhere. deepCopyIRSB is provided for that purpose. Unfortunately, it just allocates memory (by calling LibVEX_Alloc), which is short-lived, so that doesn't immediately solve your problem. What you will need to do is mess with LibVEX_Alloc (see libvex.h) so that you switch to allocating from a permanent area of your own, then call deepCopyIRSB, then switch back. It's ugly; it is like that because vex is tuned to be part of Valgrind and to allocate as quickly as possible.

> 1. Using the IRSB grab the value of the 'next' field and use that to
> prep the translation of the next block.

What specific problem are you having here?

> 2. Get access to the assembly code or the data used to make the assembly
> code so that I can create a basic block graph writer to graphically
> display the basic block structure.

Well, if you unpick the procedure by doing multiple calls to LibVEX_Translate, then you should be able to build up the graph.

----------

Are you aware that disassembling arbitrary blocks of code back into a control flow graph is, in the general case, impossible (that is, equivalent to solving the halting problem)? That is because you have no way to know what is code and what is data. Indeed, one of the basic design decisions in Valgrind, disassembling basic blocks only when execution demands them, is in part intended to sidestep precisely this problem. Your difficulties will set in when the 'next' field is not a constant but a value computed at run time; then you cannot know where it might jump to.

J |
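"Unpicking the procedure by doing multiple calls" amounts to a worklist traversal: translate a block, queue its constant successors, repeat until nothing new turns up. The C sketch below shows that loop under stated assumptions: DecodeFn is a hypothetical hook where a real tool would call LibVEX_Translate and inspect the resulting IR, and BlockInfo, Cfg, cfg_seen and cfg_build are invented names. Blocks whose 'next' is computed at run time are counted rather than followed, which is precisely the limit Julian describes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 64

/* Hypothetical result of translating one block. Its successor list
   doubles as the block's outgoing edges in the graph. */
typedef struct {
    uint64_t addr;
    uint64_t succ[2];          /* constant successors (exit dst, next) */
    size_t n_succ;
    bool has_computed_next;    /* 'next' is only known at run time */
} BlockInfo;

/* Hypothetical decoder hook: returns false if addr cannot be decoded. */
typedef bool (*DecodeFn)(uint64_t addr, BlockInfo *out);

typedef struct {
    BlockInfo blocks[MAX_BLOCKS];
    size_t n_blocks;
    size_t n_unresolved;       /* blocks ending in a computed jump */
} Cfg;

static bool cfg_seen(const Cfg *g, uint64_t addr)
{
    for (size_t i = 0; i < g->n_blocks; i++)
        if (g->blocks[i].addr == addr)
            return true;
    return false;
}

/* Discover blocks reachable from entry through constant targets only. */
static void cfg_build(Cfg *g, uint64_t entry, DecodeFn decode)
{
    uint64_t work[2 * MAX_BLOCKS + 1]; /* at most 1 + 2 pushes per block */
    size_t top = 0;
    g->n_blocks = 0;
    g->n_unresolved = 0;
    work[top++] = entry;
    while (top > 0 && g->n_blocks < MAX_BLOCKS) {
        uint64_t a = work[--top];
        if (cfg_seen(g, a))
            continue;
        BlockInfo bi;
        if (!decode(a, &bi))
            continue;
        bi.addr = a;
        g->blocks[g->n_blocks++] = bi;
        if (bi.has_computed_next)
            g->n_unresolved++;
        for (size_t i = 0; i < bi.n_succ && top < 2 * MAX_BLOCKS + 1; i++)
            work[top++] = bi.succ[i];
    }
}
```

Any block counted in n_unresolved marks a hole in the static graph; filling it requires observing the program at run time, which is where Stephen's follow-up heads.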
|
From: Stephen T. <st...@to...> - 2007-01-06 21:02:23
|
On Sat, 2007-01-06 at 17:45 +0000, Julian Seward wrote:
> Yes. If you want to hold onto the IRSB permanently, you need to copy
> it elsewhere. deepCopyIRSB is provided for that purpose. Unfortunately
> that just allocates memory (by calling LibVEX_Alloc) which is short-lived
> so doesn't immediately solve your problem. What you will need to do is mess
> with LibVEX_Alloc (see libvex.h), so that you switch to allocating from a
> permanent area of your own, then call deepCopyIRSB, then switch back.
> It's ugly - it is like that because vex is tuned to be part of Valgrind
> and allocate as quickly as possible.
>
> > 1. Using the IRSB grab the value of the 'next' field and use that to
> > prep the translation of the next block.
>
> What specific problem are you having here?

A problem of understanding. You said I need the IRSB so that I can read the IRSB->next field to find out which address to move to next. But when I look at LibVEX_Translate, I see that I do not get a handle to the IRSB pointer, so I am not sure how to get the value you say I need. That is why I call this a problem of understanding: I know what you want done, and it makes perfect sense. How do I do it?

> > 2. Get access to the assembly code or the data used to make the assembly
> > code so that I can create a basic block graph writer to graphically
> > display the basic block structure.
>
> Well, if you unpick the procedure by doing multiple calls to LibVEX_Translate,
> then you should be able to build up the graph.
>
> ----------
>
> You're aware that disassembling arbitrary blocks of code back into a
> control flow graph is in the general case impossible? (that is, equivalent
> to solving the halting problem). Because you have no way to know what
> is code and what is data. Indeed, one of the the basic design decisions
> in Valgrind - to disassemble basic blocks only when execution demands
> them - is in part so as to sidestep precisely this problem. Your
> difficulties will set in when the 'next' field is not a constant but a
> value computed at run-time; then you cannot know where it might jump to.

I understand that statically disassembling arbitrary blocks of code read from a binary file does not guarantee that the result is the code that actually executes. That is why I was wondering whether this information could be gathered at run time. Here is an idea, whether or not it, or some variation of it, is possible: I thought I could use VEX to run a segment of code step by step and output its real assembly, as opposed to what is grabbed statically (e.g. the real code of a compressed program is not known until run time).

This is what I am working towards. As a by-product of investigating compressed binaries, I can also look at the decompression programs themselves. So I want to run the decompressed programs step by step to recover their code structure and generate a control flow graph, and also to stop a decompressed program when it attempts to execute code that was loaded as data (e.g. self-modifying code).

So, to recap: I am having a problem of understanding. I know what I would like VEX to do; how to do it is where I am stuck. I know I am new to this library and that there will be a learning curve to climb.

Stephen |
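For the last goal, stopping when the program executes bytes it wrote itself, one possible approach (not something vex provides, and not how Valgrind does it) is to record guest stores in a shadow table and consult it before translating each block. The C sketch below assumes a page-granular table over one fixed window of guest memory; WriteTracker and the wt_* functions are hypothetical, and the store notifications would have to come from instrumentation you add yourself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12   /* 4 KiB pages */
#define NUM_PAGES 256   /* hypothetical: track a 1 MiB window of guest memory */

/* One flag per tracked guest page, set once the guest writes to it. */
typedef struct {
    uint64_t window_base;
    bool written[NUM_PAGES];
} WriteTracker;

static void wt_init(WriteTracker *t, uint64_t base)
{
    t->window_base = base;
    memset(t->written, 0, sizeof t->written);
}

/* Visit tracked pages overlapping [addr, addr+len). Returns true if any was
   already flagged as written; if mark is true, flags them as well. */
static bool wt_range(WriteTracker *t, uint64_t addr, size_t len, bool mark)
{
    bool hit = false;
    if (len == 0 || addr + len <= t->window_base)
        return false;
    uint64_t lo = addr < t->window_base ? t->window_base : addr;
    uint64_t first = (lo - t->window_base) >> PAGE_SHIFT;
    uint64_t last = (addr + len - 1 - t->window_base) >> PAGE_SHIFT;
    for (uint64_t p = first; p <= last && p < NUM_PAGES; p++) {
        if (t->written[p])
            hit = true;
        if (mark)
            t->written[p] = true;
    }
    return hit;
}

/* Call for every guest store, e.g. from an instrumentation helper. */
static void wt_note_store(WriteTracker *t, uint64_t addr, size_t len)
{
    (void)wt_range(t, addr, len, true);
}

/* Call before translating the block at pc spanning len bytes: true means
   the guest is about to run code it wrote after loading. */
static bool wt_block_was_written(WriteTracker *t, uint64_t pc, size_t len)
{
    return wt_range(t, pc, len, false);
}
```

Page granularity keeps the table small but reports a false positive when data and code share a page; a per-byte table trades memory for precision.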