|
From: Emre C. S. <ec...@nc...> - 2005-07-28 14:39:13
|
I am trying to write a tool that will capture memory writes and print out the address that is being written to and the EIP. However, the address I'm looking for is a memory address so I can see what variable is being updated.

My main problem I'll admit is my lack of knowledge in the internal workings of both valgrind and a real CPU. So I've written a lot of things which I think are happening, and almost certainly basic for most of you, to give you an idea of where I'm at and how much I know or don't.

I tried tracking down memory writes by the function provided in tool.h (VG_(pre_mem_write)) and haven't gotten what I wanted. At this point I'm not even sure if this captures all memory writes or just those before a signal. I think due to optimization, the value of the variable is mainly kept in registers and almost never written to memory. Therefore I thought that I needed to keep track of every instruction that modifies registers.

Then reading thru the forums I came across Dullard. Looking thru the code now got me puzzled however. As far as I can understand, each x86 instruction is translated to generally more than one UInstr. The list of UInstr's end with one that has opcode INCEIP in which case the "end_of_x86_..." instruction is called from the instrumentation code, which checks to see if the original x86 instr was a read or write and so forth and calls the helper functions. The address that is passed to this function is the result of a call to newTemp(), which returns a temporary register (mainly an int).

At this point, SK_INSTRUMENT() calls VG_(ccall_LR_0) () with two integers as arguments. I have printed out these arguments right before the function call and in function, and they are different. So my guess is that these are offset values passed into VG_(ccall_LR_0) which are retrieved from the VG_(baseBlock) and passed onto the function actually being called. Is this correct? Then the addresses I'm reading in functions like "mem_read" print out memory locations?

Another problem I think I'm facing is the lazy EIP update. If EIP is only updated when necessary how can I get an accurate EIP? The paper says that it will not be allowed to fall back more than 4 bytes. Does this mean that it is at most a single 32-bit instruction behind?

I hope I've been able to give you enough information about what I'm trying to do and how much I know or don't for that matter. I would really appreciate some help and ideas.

Thanks
|
From: Nicholas N. <nj...@cs...> - 2005-07-28 15:31:26
|
On Thu, 28 Jul 2005, Emre Can Sezer wrote:

> I am trying to write a tool that will capture memory writes and print out
> the address that is being written to and the EIP. However, the address I'm
> looking for is a memory address so I can see what variable is being
> updated.

Note that mapping addresses to variables is not easy. Valgrind has some support for it, but perhaps it would be easier to use source-level instrumentation rather than binary instrumentation?

> My main problem I'll admit is my lack of knowledge in the internal
> workings of both valgrind and a real CPU. So I've written a lot of things
> which I think are happening, and almost certainly basic for most of you,
> to give you an idea of where I'm at and how much I know or don't.
>
> I tried tracking down memory writes by the function provided in tool.h
> (VG_(pre_mem_write)) and haven't gotten what I wanted. At this point I'm
> not even sure if this captures all memory writes or just those before a
> signal. I think due to optimization, the value of the variable is mainly
> kept in registers and almost never written to memory. Therefore I thought
> that I needed to keep track of every instruction that modifies registers.

The pre_mem_read, pre_mem_write and post_mem_write events cover only a fraction of the memory accesses -- the ones that are not directly visible from the instruction stream.

> Then reading thru the forums I came across Dullard. Looking thru the code
> now got me puzzled however. As far as I can understand, each x86
> instruction is translated to generally more than one UInstr. The list of
> UInstr's end with one that has opcode INCEIP in which case the
> "end_of_x86_..." instruction is called from the instrumentation code,
> which checks to see if the original x86 instr was a read or write and so
> forth and calls the helper functions. The address that is passed to this
> function is the result of a call to newTemp(), which returns a temporary
> register (mainly an int).
>
> At this point, SK_INSTRUMENT() calls VG_(ccall_LR_0) () with two integers
> as arguments. I have printed out these arguments right before the function
> call and in function, and they are different. So my guess is that these
> are offset values passed into VG_(ccall_LR_0) which are retrieved from the
> VG_(baseBlock) and passed onto the function actually being called. Is this
> correct? Then the addresses I'm reading in functions like "mem_read" print
> out memory locations?

Be aware that VG_(ccall_LR_0)() is adding instrumentation at instrumentation-time, and the arguments involved are not the actual addresses that are seen at run-time -- the distinction can be subtle.

Dullard is definitely the best place to start, if you are having trouble understanding it then you should read more about how Valgrind works (see below).

> Another problem I think I'm facing is the lazy EIP update. If EIP is only
> updated when necessary how can I get an accurate EIP? The paper says that
> it will not be allowed to fall back more than 4 bytes. Does this mean that
> it is at most a single 32-bit instruction behind?

Note that x86 instructions can be anywhere from 1 byte to 15(?) bytes long. I think the EIP updating doesn't fall behind, otherwise Cachegrind doesn't work, but I can't remember the details.

You mentioned "the paper", but you didn't say which one it was. If you are reading the technical docs that come with Valgrind note that they are very out of date in places.

You should read chapter 2 of http://www.valgrind.org/docs/phd2004.pdf to understand how Valgrind's instrumentation works, particularly sections 2.3.5 and 2.4.1--2.4.3, but preferably the whole chapter. Hopefully reading this and looking again at Dullard will clarify things.

N
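
The instrumentation-time versus run-time distinction raised in the reply can be illustrated with a small toy model. This is only a sketch and does not use Valgrind's API: the names HelperCall, instrument_store, run and mem_write_helper are hypothetical, and the addresses are made up. It assumes the instrumenter knows only which temporary will hold the address, while the helper receives whatever value that temporary contains when the translated code executes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class HelperCall:
    """A helper call emitted at instrumentation time (hypothetical model).

    It records the NUMBER of the temp that will hold the address,
    not the address itself.
    """
    addr_temp: int                       # temp register number
    size: int                            # literal known when instrumenting
    helper: Callable[[int, int], None]   # function to invoke at run time


# Every (address, size) pair the helper has observed at run time.
writes_seen: List[Tuple[int, int]] = []


def mem_write_helper(addr: int, size: int) -> None:
    """Run-time helper: receives the value the temp holds at that moment."""
    writes_seen.append((addr, size))


def instrument_store(addr_temp: int, size: int) -> HelperCall:
    """Instrumentation time: only the temp's number is available here."""
    return HelperCall(addr_temp, size, mem_write_helper)


def run(call: HelperCall, temps: Dict[int, int]) -> None:
    """Run time: read the temp's current value and pass it to the helper."""
    call.helper(temps[call.addr_temp], call.size)


# Instrument once; the only "argument" visible here is the temp number.
call = instrument_store(addr_temp=3, size=4)
print("instrumentation-time argument:", call.addr_temp)

# Execute the same instrumented code twice with different temp contents
# (made-up addresses); the helper sees a different address each time.
run(call, temps={3: 0x0804A010})
run(call, temps={3: 0x0804A014})
for addr, size in writes_seen:
    print("run-time write of %d bytes at %#x" % (size, addr))
```

In this model, printing the arguments where the call is emitted shows the temp number, while printing inside the helper shows the run-time addresses. That would be consistent with the two sets of values differing, as described in the question.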