|
From: Eric L. <ew...@an...> - 2006-06-02 20:19:26
|
Hi, How would I go about translating a binary into the VEX IR and obtaining the IR so I can do further processing with it? I've looked into VEX's exported interface and I only see LibVEX_Translate which does all the steps (bb to ir, optimize, instrument, etc.) of translating from the guest bytes to the host bytes but I only want the IR that it spits out in the middle, preferably wrapped in some kinda nice clean data structure. Any help is appreciated. Thanks! Eric |
|
From: Nicholas N. <nj...@cs...> - 2006-06-02 22:13:41
|
On Fri, 2 Jun 2006, Eric Li wrote: > How would I go about translating a binary into the VEX IR and obtaining > the IR so I can do further processing with it? > > I've looked into VEX's exported interface and I only see LibVEX_Translate > which does all the steps (bb to ir, optimize, instrument, etc.) of > translating from the guest bytes to the host bytes but I only want the IR > that it spits out in the middle, preferably wrapped in some kinda nice > clean data structure. You can use the --trace-flags option to dump the IR at various stages as text. Use --help-debug or look in the manual for details. If text is not what you want, I think you're out of luck; Vex doesn't have, AFAIK, any way of handing over the IR. You'll just have to stick your code into the middle of it. But if you tell us what you want to do we might be able to give you more help. Nick |
|
From: Nicholas N. <nj...@cs...> - 2006-06-02 22:30:38
|
On Sat, 3 Jun 2006, Nicholas Nethercote wrote: > You can use the --trace-flags option to dump the IR at various stages as > text. Use --help-debug or look in the manual for details. Oh yes: use --trace-notbelow=0 with this, otherwise it won't give you the full details. Nick |
|
From: Julian S. <js...@ac...> - 2006-06-03 00:17:54
|
> I've looked into VEX's exported interface and I only see LibVEX_Translate > which does all the steps (bb to ir, optimize, instrument, etc.) of > translating from the guest bytes to the host bytes but I only want the IR > that it spits out in the middle, That much at least is easy enough. It hands the optimised, uninstrumented IR to the tool's primary instrumentation function; you can print it out or whatever at that point. > preferably wrapped in some kinda nice clean data structure. Personally I think the IR types are about as clean as you're going to get. I've found them pleasant to work with over the past couple of years. J |
|
From: Eric L. <ew...@an...> - 2006-06-03 05:02:28
|
>> I've looked into VEX's exported interface and I only see >> LibVEX_Translate which does all the steps (bb to ir, optimize, >> instrument, etc.) of translating from the guest bytes to the host bytes >> but I only want the IR that it spits out in the middle, > > That much at least is easy enough. It hands the optimised, > uninstrumented IR to the tool's primary instrumentation function; you can > print it out or whatever at that point. > >> preferably wrapped in some kinda nice clean data structure. > > Personally I think the IR types are about as clean as you're going to get. > I've found them pleasant to work with over the past couple of years. Yea you're right, the IR data structures are easy enough to understand and use after I looked into them some more. But what I want is to use VEX as a library to generate IR and then use that IR in my own project. Initially, I thought VEX did just that and handed the IR over to the tools for further processing, but now it's clear to me that it only lets you define the instrumentation functions. Would it be a fairly involved endeavor to take apart VEX and extract just the parts that generate the IR? e.g. make calls to bb_to_IR? I mean does VEX have sub modules that would make that easier? or would I have to first understand the whole thing, then change and recompile it? Thanks, Eric > > J > > |
|
From: Nicholas N. <nj...@cs...> - 2006-06-03 06:44:59
|
On Sat, 3 Jun 2006, Eric Li wrote: > But what I want is to use VEX as a library to generate IR and then use > that IR in my own project. Initially, I thought VEX did just that and > handed the IR over to the tools for further processing, but now it's clear > to me that it only lets you define the instrumentation functions. "Defining the instrumentation functions" is exactly equivalent to "handing over the IR for further processing". The instrumentation function is given each BB's IR and can then do anything it wants to it (although if the end result isn't functionally equivalent the program's behaviour will be changed). > Would it be a fairly involved endeavor to take apart VEX and extract just > the parts that generate the IR? AIUI that's really all Vex does. It has a couple of minor Valgrind-specific hooks, but it should be usable standalone. Nick |
|
From: Julian S. <js...@ac...> - 2006-06-03 11:46:44
|
On Saturday 03 June 2006 07:33, Nicholas Nethercote wrote: > On Sat, 3 Jun 2006, Eric Li wrote: > > But what I want is to use VEX as a library to generate IR and then use > > that IR in my own project. Initially, I thought VEX did just that and > > handed the IR over to the tools for further processing, but now it's > > clear to me that it only lets you define the instrumentation functions. > > "Defining the instrumentation functions" is exactly equivalent to "handing > over the IR for further processing". The instrumentation function is given > each BB's IR and can then do anything it wants to it (although if the end > result isn't functionally equivalent the program's behaviour will be > changed). One question is: do you really want to ship bits of IR off outside the Valgrind framework? Usually people want to mess with the IR and then run the results, which is precisely what the whole framework (Valgrind) exists for. It provides loads of useful infrastructure. I guess you could send the IR to an external tool for some kind of offline analysis, but once outside the Valgrind infrastructure it will be difficult to run the results. > > Would it be a fairly involved endeavor to take apart VEX and extract just > > the parts that generate the IR? > > AIUI that's really all Vex does. It has a couple of minor > Valgrind-specific hooks, but it should be usable standalone. Yes, it's very self-contained. VEX/test_main.c uses the main function (LibVEX_Translate) to deal with single BBs. VEX/switchback/switchback.c, despite being somewhat broken, is a simple dynamic-translation based program-runner based around Vex, in one file. (Both of these are for testing/debugging it.) J |
|
From: Eric L. <ew...@an...> - 2006-06-05 18:56:31
|
Thanks for all the useful advice! Really appreciate it. I hope it's not asking too much but I have a few more questions. I'm gonna pass my own instrumentation function into LibVEX_Translate to obtain a handle to the IR. I actually want to be able to pass in an entire binary, and get back the IRBB's, but VEX, as far as I can see, only translates each BB to IR. Is there a module that parses binaries to BB's that I can use? I'm guessing there's something built into coregrind that does that but can I use it without the rest of coregrind, i.e. call it directly somehow? Thanks again! Eric > On Saturday 03 June 2006 07:33, Nicholas Nethercote wrote: >> On Sat, 3 Jun 2006, Eric Li wrote: >>> But what I want is to use VEX as a library to generate IR and then >>> use that IR in my own project. Initially, I thought VEX did just that >>> and handed the IR over to the tools for further processing, but now >>> it's clear to me that it only lets you define the instrumentation >>> functions. >> >> "Defining the instrumentation functions" is exactly equivalent to >> "handing over the IR for further processing". The instrumentation >> function is given each BB's IR and can then do anything it wants to it >> (although if the end result isn't functionally equivalent the program's >> behaviour will be changed). > > One question is: do you really want to ship bits of IR off outside the > Valgrind framework? Usually people want to mess with the IR and then run > the results, which is precisely what the whole framework (Valgrind) exists > for. It provides loads of useful infrastructure. I guess you could send > the IR to an external tool for some kind of offline analysis, but once > outside the Valgrind infrastructure it will be difficult to run the > results. > >>> Would it be a fairly involved endeavor to take apart VEX and extract >>> just the parts that generate the IR? >> >> AIUI that's really all Vex does. It has a couple of minor >> Valgrind-specific hooks, but it should be usable standalone. > > Yes, it's very self-contained. VEX/test_main.c uses the main function > (LibVEX_Translate) to deal with single BBs. VEX/switchback/switchback.c, > despite being somewhat broken, is a simple dynamic-translation based > program-runner based around Vex, in one file. (Both of these are for > testing/debugging it.) > > J > > |
|
From: Josef W. <Jos...@gm...> - 2006-06-06 13:59:38
|
On Monday 05 June 2006 20:56, Eric Li wrote:
> Is there a module that parses binaries to BB's that I can use?
I once wrote a Valgrind tool (for 2.x) to get static information about binaries.
The first time an ELF object was touched, I iterated over the code segment
space [start;end[ like this, ignoring code without debug info:
...
addr = start;
while(addr < end) {
    /* search for address with line debug info */
    while(addr < end) {
        if (VG_(get_filename_linenum)(addr, filename, FILENAME_LEN, &line))
            break;
        addr++;
    }
    if (addr == end) break;
    /* this always should be inside of a function */
    if (!VG_(get_fnname)(addr, fn_name, FN_NAME_LEN)) { addr++; continue; }
    /* decode a basic block */
    bb_addr = addr;
    cb = VG_(alloc_UCodeBlock)();
    cb->orig_eip = addr;
    size = VG_(disBB)(cb, addr);
    if (size <= 0) {
        /* skip on error: not decodable? */
        VG_(free_UCodeBlock)(cb);
        addr++; /* advance, otherwise the same address is retried forever */
        continue;
    }
...
I am not sure if VEX has an API similar to VG_(alloc_UCodeBlock)() and
VG_(disBB)(cb, addr) of VG 2.x.
Note that the above was still a hack, as the UCode block structure returned
by VG_(disBB) was officially not visible to tools, so I copied the definition
into the tool.
The above code would be useful for a code coverage tool: callgrind/cachegrind
optionally could include information about code which never was executed.
A postprocessing tool could say: "In this library, only 80% of code which
has debug info was touched."
This currently is impossible.
Josef
|
|
From: Julian S. <js...@ac...> - 2006-06-05 22:40:11
|
> translates each BB to IR. Is there a module that parses binaries to BB's > that I can use? I'm guessing there's something built into coregrind that > does that but can I use it without the rest of coregrind, i.e. call it > directly somehow? No. This is somewhere between very difficult and impossible; in the most general case distinguishing code from data is equivalent to solving the halting problem I believe. Valgrind carefully avoids this by translating code on demand. What are you really trying to achieve? J |
|
From: Eric L. <ew...@an...> - 2006-06-06 01:12:28
|
>> translates each BB to IR. Is there a module that parses binaries to >> BB's that I can use? I'm guessing there's something built into coregrind >> that does that but can I use it without the rest of coregrind, i.e. call >> it directly somehow? > > No. This is somewhere between very difficult and impossible; in the most > general case distinguishing code from data is equivalent to solving the > halting problem I believe. Valgrind carefully avoids this by translating > code on demand. OK, well, at least now I know not to bother trying. > > What are you really trying to achieve? I'm working on a research project that generates vulnerability signatures (signatures that let you detect exploits and all their polymorphic variations in a binary). The framework translates from BB to IR to GCL to WP(weakest preconditions) and we were hoping to replace our IR with the one in Valgrind because it's more mature. > > J > > |
|
From: Nicholas N. <nj...@cs...> - 2006-06-06 02:05:52
|
On Mon, 5 Jun 2006, Eric Li wrote: >> What are you really trying to achieve? > > I'm working on a research project that generates vulnerability signatures > (signatures that let you detect exploits and all their polymorphic > variations in a binary). The framework translates from BB to IR to GCL to > WP(weakest preconditions) and we were hoping to replace our IR with the > one in Valgrind because it's more mature. How do you go from BB to IR? Something must be identifying the BBs. Couldn't you keep that and then pass its output to Vex? Nick |
|
From: Eric L. <ew...@an...> - 2006-06-06 03:49:53
|
> On Mon, 5 Jun 2006, Eric Li wrote: > >>> What are you really trying to achieve? >> >> I'm working on a research project that generates vulnerability >> signatures (signatures that let you detect exploits and all their >> polymorphic variations in a binary). The framework translates from BB to >> IR to GCL to WP(weakest preconditions) and we were hoping to replace our >> IR with the one in Valgrind because it's more mature. > > How do you go from BB to IR? Something must be identifying the BBs. > Couldn't you keep that and then pass its output to Vex? > Yea, that's what I'm planning to do. But at first we wanted to try to replace as much of the binary to IR stuff with Valgrind as we can since Valgrind's code is more complete and mature. > Nick > > > |
|
From: Eric L. <ew...@an...> - 2006-06-06 17:43:38
|
Right now, my existing code gives me pointers to the start and end of a BB and I want to use LibVEX_Translate to process it. In LibVEX_Translate, is the "guest_bytes" argument a pointer to the start of the BB I want to translate? What are "guest_bytes_addr" and "guest_bytes_addr_noredir" for? And how do I specify the length (or the end) of the block I want to translate? Thanks, Eric > On Mon, 5 Jun 2006, Eric Li wrote: > >>> What are you really trying to achieve? >> >> I'm working on a research project that generates vulnerability >> signatures (signatures that let you detect exploits and all their >> polymorphic variations in a binary). The framework translates from BB to >> IR to GCL to WP(weakest preconditions) and we were hoping to replace our >> IR with the one in Valgrind because it's more mature. > > How do you go from BB to IR? Something must be identifying the BBs. > Couldn't you keep that and then pass its output to Vex? > > Nick > > > |
|
From: Eric L. <ew...@an...> - 2006-06-06 22:17:20
|
Oh, I forgot to mention that I'm doing all this translation statically. So my program currently takes in the name of the executable that I want to examine, then does all the disassembly, translation, and analysis. It seems like the 6th and 7th arguments to LibVEX_Translate, guest_bytes_addr and guest_bytes_addr_noredir, have to do with where in the process's address space the BB is loaded? But my binary has no process because it's not running, so what should I fill in for those? Thanks again, Eric > On Mon, 5 Jun 2006, Eric Li wrote: > >>> What are you really trying to achieve? >> >> I'm working on a research project that generates vulnerability >> signatures (signatures that let you detect exploits and all their >> polymorphic variations in a binary). The framework translates from BB to >> IR to GCL to WP(weakest preconditions) and we were hoping to replace our >> IR with the one in Valgrind because it's more mature. > > How do you go from BB to IR? Something must be identifying the BBs. > Couldn't you keep that and then pass its output to Vex? > > Nick > > > |
|
From: Julian S. <js...@ac...> - 2006-06-06 22:34:40
|
> It seems like the 6th and 7th arguments to LibVEX_Translate, Don't use the 3.1.1 vex; use the one in the 3.2.0 release candidate (http://www.valgrind.org/downloads/valgrind-3.2.0rc1.tar.bz2). All these zillions of parameters got put in a struct, which is a lot cleaner. guest_bytes points at the actual insn bytes to read. guest_bytes_addr, as you say, is where we claim those bytes are in the guest process address space. You can put what you like there; I guess it depends on whether you care about the address relationship between different basic blocks. guest_bytes_addr_noredir, or whatever, is gone. All that ugly redirection crap is handled entirely on the valgrind side now. You have two main problems. One is you need to make vex not chase over bb boundaries. You can do this by using a VexControl struct with guest_chase_thresh set to zero (you have to supply this struct at some point, I can't remember where). If you do that then, if you're lucky, vex will not try to disassemble beyond the blocks of code you give it. But that's fragile since it relies on vex's decision about the end of a bb matching that from your own disassembler. I guess one kludge is: if you want to vexify a block which you know contains N instructions, you can initialise vex and set VexControl.guest_max_insns to N (along with setting .guest_chase_thresh to zero). Then it should stop after N insns even if it thinks it could go further. Does that make any sense at all? J |
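The two settings described in this message can be sketched as below. This is only an illustration: SketchControl and control_for_block are hypothetical names, and the struct mirrors just the two VexControl fields mentioned here (the real struct in libvex.h has more fields and should be used instead when linking against Vex).

```c
#include <assert.h>

/* Hypothetical stand-in for the two VexControl fields named in the
   message; not the real libvex.h definition. */
typedef struct {
    int guest_max_insns;     /* stop after this many guest instructions */
    int guest_chase_thresh;  /* 0 = do not chase across block boundaries */
} SketchControl;

/* Build settings for a block known to contain n_insns instructions. */
SketchControl control_for_block(int n_insns)
{
    SketchControl c;
    assert(n_insns >= 1);          /* an empty block makes no sense here */
    c.guest_max_insns    = n_insns; /* stop after exactly N instructions */
    c.guest_chase_thresh = 0;       /* never follow jumps into other code */
    return c;
}
```

For example, control_for_block(7) describes a block of seven instructions that should be translated on its own, without chasing.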
|
From: Julian S. <js...@ac...> - 2006-06-06 22:40:43
|
One other thing -- if it hasn't been pointed out already -- is to make friends with the Valgrind flag combination --tool=none --trace-flags=10000000 --trace-notbelow=0 (also --trace-flags=10001000). I always find that seeing the IR printed out nicely makes it much easier to understand what's going on. J |
|
From: Eric L. <ew...@an...> - 2006-06-06 22:51:30
|
> You have two main problems. One is you need to make vex not chase over bb > boundaries. You can do this by using a VexControl struct with > guest_chase_thresh set to zero (you have to supply this struct at some > point, I can't remember where). What if I just created a dummy chase_into_ok function that always returned false and passed that into LibVEX_Translate? Would that work? |
|
From: Julian S. <js...@ac...> - 2006-06-06 22:58:41
|
On Tuesday 06 June 2006 23:51, Eric Li wrote: > > You have two main problems. One is you need to make vex not chase over > > bb boundaries. You can do this by using a VexControl struct with > > guest_chase_thresh set to zero (you have to supply this struct at some > > point, I can't remember where). > > What if I just created a dummy chase_into_ok function that always returned > false and passed that into LibVEX_Translate? Would that work? Yes, although it seems pointless since you have to supply a VexControl struct at some point anyway. It wouldn't solve the problem of ensuring that vex disassembles the right number of instructions though. For that you do need the other kludge I mentioned. J |
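A dummy callback of the kind Eric describes might look like the sketch below. The typedefs are hypothetical stand-ins for Vex's own Bool and 64-bit address types, and the exact signature that LibVEX_Translate expects should be checked against libvex.h for the release in use. As noted above, this only stops chasing; it does not control how many instructions get disassembled.

```c
#include <stdint.h>

typedef unsigned char SketchBool;    /* stand-in for Vex's Bool */
typedef uint64_t      SketchAddr64;  /* stand-in for a 64-bit guest address */

/* Always refuse to chase into the target of a jump or call. */
SketchBool never_chase(SketchAddr64 target)
{
    (void)target;  /* the address is deliberately ignored */
    return 0;      /* false: do not continue into that code */
}
```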
|
From: Eric L. <ew...@an...> - 2006-06-07 06:35:27
|
What's the "dispatcher" at the end of VexTranslateArgs in 3.2.0 (but not 3.1.1) for? And what should go there in my case? I've read the comments but I'm still not clear on it. Thanks, Eric |
|
From: Julian S. <js...@ac...> - 2006-06-07 11:52:15
|
On Wednesday 07 June 2006 07:35, Eric Li wrote: > What's the "dispatcher" at the end of VexTranslateArgs in 3.2.0 (but not > 3.1.1) for? And what should go there in my case? I've read the comments but > I'm still not clear on it. More or less irrelevant from an analysis point of view. It's where execution (on the host) jumps to after the generated code is run, so that the next block can be found/run. I'd set it to NULL. You won't see it in the IR anyway. J |
|
From: Eric L. <ew...@an...> - 2006-06-07 18:11:29
|
> I guess one kludge is: if you want to vexify a block which you know > contains N instructions, you can initialise vex and set > VexControl.guest_max_insns to N (along with setting .guest_chase_thresh to > zero). Then it should stop after N insns even if it thinks it could go > further. If I want to vexify an array of BB's, each with a different number of instructions, in a loop, do I have to call LibVEX_Init every iteration after setting the guest_max_insns of the VexControl argument for the current BB? Or can I just modify the VexControl struct only and VEX would pick it up? Are there any undesired side effects to calling LibVEX_Init multiple times? Thanks, Eric |
|
From: Eric L. <ew...@an...> - 2006-06-08 19:15:58
|
> On Wednesday 07 June 2006 07:35, Eric Li wrote: >> What's the "dispatcher" at the end of VexTranslateArgs in 3.2.0 (but >> not 3.1.1) for? And what should go there in my case? I've read the >> comments but I'm still not clear on it. > > More or less irrelevant from an analysis point of view. It's where > execution (on the host) jumps to after the generated code is run, so that > the next block can be found/run. I'd set it to NULL. You won't see it in > the IR anyway. There are asserts in LibVEX_Translate (specifically the switch stmt where the arch is X86 or AMD64) that check that "dispatch" is not NULL. Can I just create a dummy function for it instead? Thanks, Eric |
|
From: Julian S. <js...@ac...> - 2006-06-08 19:46:43
|
On Thursday 08 June 2006 20:15, Eric Li wrote: > > On Wednesday 07 June 2006 07:35, Eric Li wrote: > >> What's the "dispatcher" at the end of VexTranslateArgs in 3.2.0 (but > >> not 3.1.1) for? And what should go there in my case? I've read the > >> comments but I'm still not clear on it. > > > > More or less irrelevant from an analysis point of view. It's where > > execution (on the host) jumps to after the generated code is run, so that > > the next block can be found/run. I'd set it to NULL. You won't see it > > in the IR anyway. > > There are asserts in LibVEX_Translate (specifically the switch stmt where > the arch is X86 or AMD64) that check that "dispatch" is not NULL. Can I > just create a dummy function for it instead? Yes. J |
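A placeholder dispatcher for a translate-only setup could be as small as the sketch below. SketchTranslateArgs is a hypothetical one-field stand-in for VexTranslateArgs, and the field is declared here as a function pointer for simplicity; the real field's type should be taken from libvex.h. The function is never meant to run, since the generated host code is not executed in a static analysis.

```c
#include <stddef.h>

/* Hypothetical stand-in for the one VexTranslateArgs field discussed here. */
typedef struct {
    void (*dispatch)(void);
} SketchTranslateArgs;

/* Exists only so that the non-NULL check passes; never called. */
void dummy_dispatch(void)
{
}

/* Fill in the dispatcher field with the placeholder. */
SketchTranslateArgs make_args(void)
{
    SketchTranslateArgs a;
    a.dispatch = dummy_dispatch;
    return a;
}
```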
|
From: Eric L. <ew...@an...> - 2006-06-08 20:44:10
|
In bb_to_IR and in LibVEX_Init, there are checks that the guest_max_insns is between 1 and 100 inclusive. Is there a reason that the max instructions translated cannot exceed 100? How can I translate a BB that has more than 100 instructions? Thanks, Eric |
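The thread does not answer this question. One possible workaround, offered only as an untested assumption, is to split a long block into consecutive pieces of at most 100 instructions and translate each piece separately, which requires knowing from your own disassembler the byte offset where each piece starts. The arithmetic for that split is sketched below; SKETCH_MAX_INSNS, chunks_needed and chunk_len are hypothetical names.

```c
#include <stddef.h>

#define SKETCH_MAX_INSNS 100  /* upper limit mentioned in the message */

/* Number of separate translations needed for n_insns instructions. */
size_t chunks_needed(size_t n_insns)
{
    return (n_insns + SKETCH_MAX_INSNS - 1) / SKETCH_MAX_INSNS;
}

/* Instruction count of chunk i (0-based); 0 if i is out of range. */
size_t chunk_len(size_t n_insns, size_t i)
{
    size_t start = i * SKETCH_MAX_INSNS;
    size_t left;
    if (start >= n_insns)
        return 0;
    left = n_insns - start;
    return left < SKETCH_MAX_INSNS ? left : SKETCH_MAX_INSNS;
}
```

A block of 250 instructions would then be handled as three pieces of 100, 100 and 50 instructions, each with guest_max_insns set to the piece's length.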