|
From: Emre C. S. <ec...@nc...> - 2005-10-25 19:44:34
|
Hi,
I am looking at the instrumentation in Dullard and I can't understand the
reason why the MOV instruction is added. Here is an example from
instrumentation:
switch (u_in->opcode) {
   // For memory-ref instrs, copy the data_addr into a temporary to be
   // passed to the mem_* helper at the end of the instruction.
   case LOAD:
   case SSE3ag_MemRd_RegWr:
      t_read = u_in->val1;
      t_read_addr = newTemp(cb);
      uInstr2(cb, MOV, 4, TempReg, u_in->val1, TempReg, t_read_addr);
      data_size = u_in->size;
      VG_(copy_UInstr)(cb, u_in);
      break;
When the INCEIP opcode is seen, the instrumentation calls another
function, which finally calls the helper functions to translate the
TempReg values to real values and pretty-print the memory addresses.
What I don't understand is why the tool adds the MOV UInstr there. I'm
wondering if this is a simple design choice to make things look good, or
is there a good reason for it?
Eventually, I am trying to capture updates to %EBP, and I'm trying to
extend Dullard to do that. However, I call the helper function as soon as I
see the opcode. Here is what it looks like:
case PUT:
   if (u_in->tag2==1 && u_in->val2==5) { // destination is %EBP
      VG_(ccall_R_0)(cb, (Addr) set_ebp, u_in->val1, 1);
   }
   VG_(copy_UInstr)(cb, u_in);
   break;
So will this capture all %EBP updates? And is there anything wrong with
it? It seems to be working but I haven't had the chance to try it on large
scale programs yet and it is unlikely that I will for some time.
Even if what I'm doing seems right, I still would like to know the reason
for the addition of the MOV UInstr.
Thanks,
John
|
|
From: Nicholas N. <nj...@cs...> - 2005-10-25 21:18:04
|
On Tue, 25 Oct 2005, Emre Can Sezer wrote:
> I am looking at the instrumentation in Dullard and I can't understand the
> reason why the MOV instruction is added. Here is an example from
> instrumentation:
>
> switch (u_in->opcode) {
>
> // For memory-ref instrs, copy the data_addr into a temporary to be
> // passed to the mem_* helper at the end of the instruction.
> case LOAD:
> case SSE3ag_MemRd_RegWr:
> t_read = u_in->val1;
> t_read_addr = newTemp(cb);
> uInstr2(cb, MOV, 4, TempReg, u_in->val1, TempReg, t_read_addr);
> data_size = u_in->size;
> VG_(copy_UInstr)(cb, u_in);
> break;
>
> When the INCEIP opcode is seen, the instrumentation calls another
> functions which finally calls the helper functions to translate the
> tempReg values to real values and pretty print the memory addresses.
>
> What I don't understand is why the tool adds the MOV UInstr there. I'm
> wondering if this is a simple design choice to make things look good, or
> is there a good reason for it?
It's for safety. You could skip it and just use u_in->val1 directly as
the argument to the eventual CCALL... so long as you are confident that
u_in->val1 doesn't change between the LOAD and the CCALL. I think in
practice it doesn't, but saving the value in t_read_addr with the MOV
guarantees this problem cannot occur.
And why is the CCALL delayed? It's because Cachegrind has to handle
instructions that modify a memory location, i.e. do a read followed by a
write. These occur in UCode as a LOAD/STORE pair, so if you want to detect
modify instructions you have to look for these pairs.
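To make the LOAD/STORE pairing concrete, here is a minimal sketch in plain C. It does not use the real UCode types: MockUInstr, MockOpcode, AccessKind and classify_accesses are hypothetical stand-ins, and the rule that a modify shows up as a LOAD and a STORE through the same address temporary within one original instruction (ended by INCEIP) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for UCode; NOT the real Valgrind types. */
typedef enum { OP_LOAD, OP_STORE, OP_OTHER, OP_INCEIP } MockOpcode;

#define NO_TEMP (-1)

typedef struct {
    MockOpcode opcode;
    int addr_temp;   /* temporary holding the data address, or NO_TEMP */
} MockUInstr;

typedef enum {
    ACC_NONE,        /* no memory access */
    ACC_READ,        /* LOAD only */
    ACC_WRITE,       /* STORE only */
    ACC_MODIFY,      /* LOAD and STORE through the same address temporary */
    ACC_READ_WRITE   /* LOAD and STORE through different temporaries */
} AccessKind;

/* Walk the stream and emit one AccessKind per original instruction.
 * The decision is delayed until INCEIP, because only then is it known
 * whether a LOAD was followed by a STORE to the same address.
 * Returns the number of entries written to out. */
size_t classify_accesses(const MockUInstr *code, size_t n,
                         AccessKind *out, size_t out_cap)
{
    int t_read = NO_TEMP, t_write = NO_TEMP;
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        switch (code[i].opcode) {
        case OP_LOAD:
            t_read = code[i].addr_temp;
            break;
        case OP_STORE:
            t_write = code[i].addr_temp;
            break;
        case OP_INCEIP:
            assert(count < out_cap);
            if (t_read != NO_TEMP && t_write != NO_TEMP)
                out[count] = (t_read == t_write) ? ACC_MODIFY : ACC_READ_WRITE;
            else if (t_read != NO_TEMP)
                out[count] = ACC_READ;
            else if (t_write != NO_TEMP)
                out[count] = ACC_WRITE;
            else
                out[count] = ACC_NONE;
            count++;
            t_read = t_write = NO_TEMP;   /* reset for the next instruction */
            break;
        default:
            break;
        }
    }
    return count;
}
```

Calling the helper on each LOAD instead would report the read immediately, before it is known that a STORE to the same address follows, which is why the decision waits for INCEIP.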
> Eventually, I am trying to capture updates to the %EBP and I'm trying to
> extend Dullard for that. But I'm calling the helper function as soon as I
> see the opcode. Here is what it looks like:
>
> case PUT:
> if (u_in->tag2==1 && u_in->val2==5) { // destination is %EBP
> VG_(ccall_R_0)(cb, (Addr) set_ebp, u_in->val1, 1);
> }
> VG_(copy_UInstr)(cb, u_in);
> break;
>
> So will this capture all %EBP updates?
It should work, assuming the magic numbers '1' and '5' are right -- why
not use the defined constants? It's safer and easier to read.
Nick
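As an illustration of the named-constants suggestion, here is a small self-contained sketch. The types and constants are local mock-ups (MockUInstr, TAG_ARCH_REG, MOCK_R_EBP and is_put_to_ebp are all hypothetical); the Valgrind 2.x tool headers are assumed to provide equivalents along the lines of ArchReg and R_EBP, so check your own tree before relying on those names. The sketch also assumes that for PUT the destination register is described by the second operand (tag2/val2), as in the later code in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for illustration only. The real headers are assumed
 * to define equivalents (e.g. an ArchReg tag and an R_EBP register number). */
typedef enum { TAG_TEMP_REG = 0, TAG_ARCH_REG = 1 } MockTag;
enum { MOCK_R_ESP = 4, MOCK_R_EBP = 5 };
typedef enum { OP_GET, OP_PUT } MockOpcode;

typedef struct {
    MockOpcode opcode;
    MockTag    tag1;  unsigned val1;   /* first operand (source for PUT) */
    MockTag    tag2;  unsigned val2;   /* second operand (destination for PUT) */
} MockUInstr;

/* True if the instruction writes a temporary into the %EBP arch register. */
bool is_put_to_ebp(const MockUInstr *u)
{
    assert(u != NULL);
    return u->opcode == OP_PUT
        && u->tag2 == TAG_ARCH_REG
        && u->val2 == MOCK_R_EBP;
}
```

With named constants the test reads as "destination is an architectural register, and that register is %EBP", and a check against the wrong operand (for example a source temporary that merely happens to be number 5) is easier to spot.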
|
|
From: Emre C. S. <ec...@nc...> - 2005-10-27 01:52:15
|
> On Tue, 25 Oct 2005, Emre Can Sezer wrote:
>
>> I am looking at the instrumentation in Dullard and I can't understand
>> the
>> reason why the MOV instruction is added. Here is an example from
>> instrumentation:
>>
>> switch (u_in->opcode) {
>>
>> // For memory-ref instrs, copy the data_addr into a temporary to
>> be
>> // passed to the mem_* helper at the end of the instruction.
>> case LOAD:
>> case SSE3ag_MemRd_RegWr:
>> t_read = u_in->val1;
>> t_read_addr = newTemp(cb);
>> uInstr2(cb, MOV, 4, TempReg, u_in->val1, TempReg,
>> t_read_addr);
>> data_size = u_in->size;
>> VG_(copy_UInstr)(cb, u_in);
>> break;
>>
>> When the INCEIP opcode is seen, the instrumentation calls another
>> functions which finally calls the helper functions to translate the
>> tempReg values to real values and pretty print the memory addresses.
>>
>> What I don't understand is why the tool adds the MOV UInstr there. I'm
>> wondering if this is a simple design choice to make things look good, or
>> is there a good reason for it?
>
> It's for safety. You could skip it and just use u_in->val1 directly as
> the argument to the eventual CCALL... so long as you are confident that
> u_in->val1 doesn't change between the LOAD and the CCALL. I think in
> practice it doesn't, but saving the value in t_read_addr with the MOV
> guarantees this problem cannot occur.
>
> And why is the CCALL delayed? It's because Cachegrind has to handle
> instructions that modify a memory location, ie. do a read followed by a
> write. These occur in UCode as a LOAD/STORE pair, if you want to detect
> modify instructions you have to look for these pairs.
>
>> Eventually, I am trying to capture updates to the %EBP and I'm trying to
>> extend Dullard for that. But I'm calling the helper function as soon as
>> I
>> see the opcode. Here is what it looks like:
>>
>> case PUT:
>> if (u_in->tag2==1 && u_in->val2==5) { // destination is %EBP
>> VG_(ccall_R_0)(cb, (Addr) set_ebp, u_in->val1, 1);
>> }
>> VG_(copy_UInstr)(cb, u_in);
>> break;
>>
>> So will this capture all %EBP updates?
>
> It should work, assuming the magic numbers '1' and '5' are right -- why
> not use the defined constants? It's safer and easier to read.
>
:) I don't have any good reason for that. I guess I got caught up with
the details and forsook good coding.
While I'm at it I have another question. It might be a little out of
context but you guys are my only source when it comes to low level stuff.
I believe that I can capture function calls and returns and also %EBP
updates. And I can determine the offset of local variables by looking at
the objdump of the binary. However, between the %EBP and the beginning of
the local variables, there are some saved registers. How can I determine
the size of this region so I can get the actual offsets of variables with
respect to %EBP?
And I need to learn this without running the code. I guess the information
should be available at compile time since debugging tools resolve memory
locations to local variable names. I tried readelf and objdump but can't
seem to find the info there.
Thanks a bunch,
John
> Nick
>
>
|
|
From: Josef W. <Jos...@gm...> - 2005-10-27 13:39:26
|
On Thursday 27 October 2005 03:52, Emre Can Sezer wrote:
> While I'm at it I have another question. It might be a little out of
> context but you guys are my only source when it comes to low level stuff.
>
> I believe that I can capture function calls and returns and also %EBP
> updates.
What do you want to do with %EBP updates?
It is *not* required for the compiler to always use %EBP as
stackframe pointer, e.g. for static C functions, or when compiled
with -fomit-frame-pointer. In these cases, the compiler is free to
use %EBP for whatever it wants. You would need to distinguish the
different use cases (e.g. check that %EBP is "near" %ESP).
> And I can determine the offset of local variables by looking at
> the objdump of the binary. However, between the %EBP and the beginning of
> the local variables, there are some saved registers. How can I determine
> the size of this region so I can get the actual offsets of variables with
> respect to %EBP?
Should be stored in debug info. Google for "dwarfdump". This command
can print out any information which is stored in DWARF debug info in
human readable form (dwarf is e.g. produced by gcc3/4 per default).
This also should include how to get the value of variables.
Josef
|
|
From: Emre C. S. <ec...@nc...> - 2005-10-27 14:12:36
|
> On Thursday 27 October 2005 03:52, Emre Can Sezer wrote:
>> While I'm at it I have another question. It might be a little out of
>> context but you guys are my only source when it comes to low level
>> stuff.
>>
>> I believe that I can capture function calls and returns and also %EBP
>> updates.
>
> What do you want to do with %EBP updates?
> It is *not* required for the compiler to always use %EBP as
> stackframe pointer, e.g. for static C functions, or when compiled
> with -fomit-frame-pointer. In these cases, the compiler is free to
> use %EBP for whatever it wants. You would need to distinguish the
> different use cases (e.g. check that %EBP is "near" %ESP).
I am currently working on implementing a new memory level IDS. Eventually,
all I want is to determine if a memory address belongs to a variable. For
local variables I can only get offset information so I have to know where
the function's activation record is and %EBP seemed to be it. If you have
any other ideas that would help me figure out which local variable a given
memory address belongs to, I would welcome them.
As for testing if %EBP is near %ESP, how near is near? In the case where
%EBP doesn't point to the stack frame, what is it used for? Is it used as
a general purpose register?
>
>> And I can determine the offset of local variables by looking at
>> the objdump of the binary. However, between the %EBP and the beginning
>> of the local variables, there are some saved registers. How can I
>> determine the size of this region so I can get the actual offsets of
>> variables with respect to %EBP?
>
> Should be stored in debug info. Google for "dwarfdump". This command
> can print out any information which is stored in DWARF debug info in
> human readable form (dwarf is e.g. produced by gcc3/4 per default).
> This also should include how to get the value of variables.
>
> Josef
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by the JBoss Inc.
> Get Certified Today * Register for a JBoss Training Course
> Free Certification Exam for All Training Attendees Through End of 2005
> Visit http://www.jboss.com/services/certification for more information
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users
>
|
|
From: Josef W. <Jos...@gm...> - 2005-10-27 14:43:12
|
On Thursday 27 October 2005 16:12, Emre Can Sezer wrote:
> I am currently working on implementing a new memory level IDS. Eventually,
> all I want is to determine if a memory address belongs to a variable. For
> local variables I can only get offset information so I have to know where
> the function's activation record is and %EBP seemed to be it. If you have
> any other ideas that would help me figure out which local variable a given
> memory address belongs to, I would welcome it.
As I said: in the debug info. Look into the DWARF specification.
But the debug info actually gives you the mapping the other way round: from
variable name to address. That is all a debugger needs. DWARF says something
like this: "For variable foo in function bar, which is of type int, look at
[%EBP+8]". The formula can get quite complex, and you need the inverse to
map from address to variable name.
Unfortunately, AFAIK the DWARF reader for variable info does not exist yet.
But perhaps offline analysis with dwarfdump is enough for you?
> As for testing if %EBP is near %ESP, how near is near? In the case where
> %EBP doesn't point to the stack frame, what is it used for? Is it used as
> a general purpose register?
Yes. Less register pressure.
You can store %ESP at the time of the function call, and check if %EBP
is between the stored and the current value of %ESP. Note that you have to
track thread (i.e. stack) switches for multithreaded code.
Josef
|
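The "%EBP between the stored and the current %ESP" test can be sketched as below. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, addresses are 32-bit, the stack grows downwards, and the caller of this check is responsible for using the %ESP values of the right thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Heuristic: %EBP is plausibly the current function's frame pointer if it
 * lies between the current %ESP and the %ESP recorded when the function
 * was called. The stack grows downwards, so esp_now <= esp_at_call. */
bool ebp_looks_like_frame_pointer(uint32_t esp_at_call,
                                  uint32_t esp_now,
                                  uint32_t ebp)
{
    assert(esp_now <= esp_at_call);
    return ebp >= esp_now && ebp <= esp_at_call;
}
```

A value of %EBP that still points into the caller's frame, or that holds unrelated data because the compiler uses %EBP as a general purpose register, falls outside this range and is rejected.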
|
From: Emre C. S. <ec...@nc...> - 2005-10-27 15:46:20
|
> As I said: in the debug info. Look into the DWARF specification.
> But in the debug info you actually get the other way round: from variable
> name
> to address. That is all a debugger needs. DWARF says something like this:
> "For variable foo in function bar, which is of type int, look at
> [%EBP+8]".
> The formula can get quite complex, and you need the inverse to map from
> address to variable name.
>
> Unfortunately, AFAIK the DWARF reader for variable info reader is
> not existing yet.
> But perhaps offline analysis with dwarfdump is enough for you?
>
I only need to know the offset from %EBP. Static analysis ought to be enough.
I objdumped my code and looked into the instructions. I can see that the
offset I read from STABS output is directly used as an offset from %EBP.
So does this mean I'm not getting the EBP right? Here is the disassembled
code and the stabs output for the variables.
STABS output from objdump:
Symnum n_type n_othr n_desc n_value  n_strx String
29     LSYM   0      6      fffffff4 937    a:(0,1)
30     LSYM   0      7      ffffffd8 945    buffer:(1,1)=ar(1,2)=r(1,2);0000000000000;0037777777777;;0;9;(0,2)
********************************************
*** Disassembled code for function func: ***
********************************************
...
08048390 <func>:
static char buf[10];
static int isAdmin;
int func (void)
{
8048390: 55 push %ebp
8048391: 89 e5 mov %esp,%ebp
8048393: 83 ec 28 sub $0x28,%esp
int a;
char buffer[10];
a = 10;
8048396: c7 45 f4 0a 00 00 00 movl $0xa,0xfffffff4(%ebp)
...
***********************************************
As you can see the offset being used is the same as that given from stabs
output. When I run the code and apply this offset to the EBP that I
recorded I get a difference of 0x10 between the address I find and this
value. So I'm inclined to think that I am not doing a good job in
capturing the EBP.
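For reference, the offset arithmetic can be checked offline with a small sketch like the one below. It treats the STABS n_value as a signed 32-bit offset from %EBP (fffffff4 is -12, ffffffd8 is -40) and maps an address back to a local variable. The LocalVar type and both function names are hypothetical, the variable sizes come from the C declarations rather than from the STABS output, and the %EBP value used with it has to be the one set by the callee's prologue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One local variable, described relative to the frame pointer. */
typedef struct {
    const char *name;
    int32_t     ebp_offset;   /* signed offset from %EBP */
    uint32_t    size;         /* size in bytes, from the C declaration */
} LocalVar;

/* Interpret a 32-bit STABS n_value as a signed (two's complement) offset
 * without relying on implementation-defined conversions. */
int32_t stabs_value_to_offset(uint32_t v)
{
    if (v <= (uint32_t)INT32_MAX)
        return (int32_t)v;
    return -(int32_t)(uint32_t)~v - 1;
}

/* Return the local variable whose storage contains addr, or NULL. */
const LocalVar *find_local(uint32_t ebp, uint32_t addr,
                           const LocalVar *locals, size_t n)
{
    assert(locals != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Unsigned arithmetic wraps modulo 2^32, which matches x86-32. */
        uint32_t start = ebp + (uint32_t)locals[i].ebp_offset;
        if (addr - start < locals[i].size)
            return &locals[i];
    }
    return NULL;
}
```

If the address computed this way is off by a constant from the one the program prints, the recorded %EBP most likely belongs to a different frame than the one the offsets refer to.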
Here is the original simple stupid code I wrote which has a buffer
overflow vulnerability as well :)
***************
*** main1.c ***
***************
#include <stdio.h>
#include <string.h>
static char buf[10];
static int isAdmin;
int func (void)
{
   int a;
   char buffer[10];
   a = 10;
   /* %p with a (void *) cast is the portable way to print pointers */
   printf ("In func %p, %p\n", (void *)&a, (void *)buffer);
   return 0;
}
int main (int argc, char* argv[])
{
   isAdmin = 0;
   scanf("%s", buf); /* unbounded read: the buffer overflow mentioned above */
   if (!strncmp(buf,"admin",5))
   {
      isAdmin = 1;
   } else {
      isAdmin = 0;
   }
   scanf("%s", buf);
   if (isAdmin)
   {
      printf ("ADMIN\t: %s\n", buf);
   } else {
      printf ("NOT ADMIN\t: %s\n", buf);
   }
   func();
   isAdmin = 10;
   return 0;
}
*******************************
*******************************
Any thoughts? If you agree with me that I might be getting the wrong EBP
values, here is how I do it. I'm guessing that the EBP is updated
BEFORE the jump to the function call. So in the instrumentation I capture
PUT UInstrs and see if the destination is EBP. The relevant code segments
are below:
static __attribute__ ((regparm (1)))
void set_ebp (Int addr)
{
   VG_(printf)("%%EBP set to %p\n", addr);
   CEBP = addr;
   return;
}
UCodeBlock* SK_(instrument)(UCodeBlock* cb_in, Addr orig_addr)
{
   ...
   cb = VG_(setup_UCodeBlock)(cb_in);
   for (i = 0; i < VG_(get_num_instrs)(cb_in); i++) {
      u_in = VG_(get_instr)(cb_in, i);
      if (instrumented_Jcc) sk_assert(u_in->opcode == JMP);
      switch (u_in->opcode) {
         case PUT:
            // %EBP is 5
            if (u_in->tag2 == 1 && u_in->val2 == 5)
            {
               VG_(ccall_R_0)(cb, (Addr)set_ebp, u_in->val1, 1);
            }
            VG_(copy_UInstr)(cb, u_in);
            break;
         ...
      }
      ...
   }
|
|
From: Tom H. <to...@co...> - 2005-10-27 16:07:01
|
In message <235...@we...>
Emre Can Sezer <ec...@nc...> wrote:
> Any thoughts? If you aggree with me on the fact that I might be getting
> wrong EBP values here is how I do it. I'm guessing that the EBP is updated
> BEFORE the jump to the function call. So in the instrumentation I capture
> PUT UInstr's and see if the destination is EBP. The relevant code segments
> are below:
I'd stop guessing and start reading up on the x86 ABI properly...
If you have frame pointers then EBP will be updated by the callee
after the function call, not by the caller beforehand. The typical
prologue for a function with a frame pointer is:
pushl %ebp
movl %esp, %ebp
So it pushes the old frame pointer on the stack and then establishes
the new frame pointer for this function by copying the stack pointer.
Of course if you don't have frame pointers (the default on x86-64) then
the variable references in the debug data will typically be ESP relative
unless the variable is in a register.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
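The effect of that prologue on the two registers can be traced with a tiny model. This is a sketch, not an emulator: MockRegs and run_prologue are hypothetical names, the 0x28 frame size in the usage below is taken from the "sub $0x28,%esp" in the disassembly earlier in the thread, and the concrete addresses are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Just the two registers that matter for the prologue. */
typedef struct {
    uint32_t esp;
    uint32_t ebp;
} MockRegs;

/* Model of:  pushl %ebp ; movl %esp, %ebp ; subl $frame_size, %esp
 * Returns the caller's frame pointer, i.e. the value that was pushed. */
uint32_t run_prologue(MockRegs *r, uint32_t frame_size)
{
    assert(r != NULL);
    uint32_t old_ebp = r->ebp;
    r->esp -= 4;            /* pushl %ebp: make room and save old_ebp there */
    r->ebp  = r->esp;       /* movl %esp, %ebp: new frame pointer */
    r->esp -= frame_size;   /* subl: reserve space for locals */
    return old_ebp;
}
```

Starting from esp = 0xbffff000 and ebp = 0xbffff028, run_prologue(&r, 0x28) leaves ebp at 0xbfffeffc and esp at 0xbfffefd4. Until the second instruction of the callee has run, %EBP still holds the caller's frame pointer, so a value captured before the call describes the caller's frame, not the callee's.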
|
|
From: Emre C. S. <ec...@nc...> - 2005-10-27 17:53:11
|
> In message <235...@we...>
> Emre Can Sezer <ec...@nc...> wrote:
>
>> Any thoughts? If you agree with me on the fact that I might be getting
>> wrong EBP values here is how I do it. I'm guessing that the EBP is
>> updated BEFORE the jump to the function call. So in the instrumentation
>> I capture PUT UInstr's and see if the destination is EBP. The relevant
>> code segments are below:
>
> I'd stop guessing and start reading up on the x86 ABI properly...
>
> If you have frame pointers then EBP will be updated by the callee
> after the function call, not by the caller beforehand. The typical
> prologue for a function with a frame pointer is:
>
> pushl %ebp
> movl %esp, %ebp
>
> So it pushes the old frame pointer on the stack and then establishes
> the new frame pointer for this function by copying the stack pointer.
>
This helped. It works now. Thank you.
> Of course if you don't have frame pointers (the default on x86-64) then
> the variable references in the debug data will typically be ESP relative
> unless the variable is in a register.
>
> Tom
>
> --
> Tom Hughes (to...@co...)
> http://www.compton.nu/
|