|
From: Julian S. <js...@ac...> - 2006-06-09 10:17:24
|
On Thursday 08 June 2006 21:43, Eric Li wrote:
> In bb_to_IR and in LibVEX_Init, there are checks that the guest_max_insns
> is between 1 and 100 inclusive. Is there a reason that the max instructions
> translated cannot exceed 100?

Yes, two reasons. Limiting bbs to that length has negligible performance
impact when running V since almost all are shorter anyway. The benefits
are (1) it bounds the amount of intermediate storage (grep for
N_TEMPORARY_BYTES) needed to translate any given bb, and (2) (somewhat
secondarily) it limits worst-case translation time should an O(N^2) or
worse algorithm accidentally exist in the compilation pipeline,
particularly in iropt.c.

> How can I translate a BB that has more than 100 instructions?

Chop up any bbs longer than 100 insns into pieces.

J
|
From: Julian S. <js...@ac...> - 2006-06-09 10:19:13
|
On Wednesday 07 June 2006 19:10, Eric Li wrote:
> > I guess one kludge is: if you want to vexify a block which you know
> > contains N instructions, you can initialise vex and set
> > VexControl.guest_max_insns to N (along with setting .guest_chase_thresh
> > to zero). Then it should stop after N insns even if it thinks it could
> > go further.
>
> If I want to vexify an array of BB's, each with a different number of
> instructions, in a loop, do I have to call LibVEX_Init every iteration
> after setting the guest_max_insns of the VexControl argument for the
> current BB?

Yes.

> Or can I just modify the VexControl struct only and VEX would pick it up?

No.

> Are there any undesired side effects to calling LibVEX_Init multiple
> times?

No.

J
|
From: Eric L. <ew...@an...> - 2006-06-09 14:52:37
|
>> If I want to vexify an array of BB's, each with a different number of
>> instructions, in a loop, do I have to call LibVEX_Init every iteration
>> after setting the guest_max_insns of the VexControl argument for the
>> current BB?
>
> Yes.

I tried this, but LibVEX_Init has its own internal static variable to keep
track of the fact that it's been called, so it fails an assert if I call
it again. Is there an unInit of some kind that undoes this?

>> Or can I just modify the VexControl struct only and VEX would pick it
>> up?
>
> No.
>
>> Are there any undesired side effects to calling LibVEX_Init multiple
>> times?
>
> No.
>
> J
|
From: Eric L. <ew...@an...> - 2006-06-12 20:41:39
|
Are there replacements for the
  --optimise=no
  --instrument=no
  --single-step=yes
  --cleanup=no
flags in the 3.x.x versions (which are documented online as ways of
viewing/debugging the IR)? When I tried them, it said those were bad
options.

Thanks,
Eric

> One other thing -- if it hasn't been pointed out already -- is to make
> friends with the Valgrind flag combination
>
>   --tool=none --trace-flags=10000000 --trace-notbelow=0
>
> (also --trace-flags=10001000). I always find that seeing the IR printed
> out nicely makes it much easier to understand what's going on.
>
> J
|
From: Nicholas N. <nj...@cs...> - 2006-06-12 23:24:01
|
On Mon, 12 Jun 2006, Eric Li wrote:
> Are there replacements for the
> --optimise=no
> --instrument=no
> --single-step=yes
> --cleanup=no
> flags in the 3.x.x versions (which are documented online as ways of
> viewing/debugging the IR)? When I tried them, it said those were bad
> options.
Look at the --vex-* options in the output of --help-debug:
--vex-iropt-verbosity 0 .. 9 [0]
--vex-iropt-level 0 .. 2 [2]
--vex-iropt-precise-memory-exns [no]
--vex-iropt-unroll-thresh 0 .. 400 [120]
--vex-guest-max-insns 1 .. 100 [50]
--vex-guest-chase-thresh 0 .. 99 [10]
They're not documented in the manual, but hopefully you can work them out.
Using --vex-guest-max-insns=1 is like using the old --single-step, I think.
Nick
|
|
From: Eric L. <ew...@an...> - 2006-06-13 18:48:31
|
Sorry about dragging this thread out; I'm just having some difficulty with
this. All the help and advice is very much appreciated, though! Two things:

1. If I write a simple tool, and just get the IR that way from inside the
instrumentation functions, is there a way to get all BB's of the target
executable translated to IR? As far as I can see, only the BB's on the
path of execution got translated.

2. I'm passing BB's to LibVEX_Translate with the orig_addr argument set to
point to the first instruction in the BB, but the translation that comes
out does not match the translation from just running
valgrind --tool=none --trace-flags=10000000 --trace-notbelow=0. And it's
not a pre/post IR optimization issue, because I also compared against all
the other --trace-flags. Any suggestions as to what could be the cause?
Is there a particular format the BB's have to be in for VEX, e.g. do they
need some kind of header?

Thanks,
Eric
|
From: Nicholas N. <nj...@cs...> - 2006-06-13 23:36:23
|
On Tue, 13 Jun 2006, Eric Li wrote:
> Two things:
>
> 1. If I write a simple tool, and just get the IR that way from inside the
> instrumentation functions, is there a way to get all BB's of the target
> executable translated to IR?

Not really.

> As far as I can see, only the BB's on the path of execution got
> translated.

Yes.

> 2. I'm passing BB's to LibVEX_Translate with the orig_addr argument set
> to point to the first instruction in the BB but the translation that
> comes out does not match the translation from just running valgrind
> --tool=none --trace-flags=10000000 --trace-notbelow=0. And it's not a
> pre/post IR optimization issue because I also compared against all the
> other --trace-flags.

I don't quite understand what you're saying; can you give a more detailed
example, e.g. with --trace-flags output? Are you comparing --tool=none
against a tool you've written? Note that memory can be laid out
differently for different Valgrind tools.

> Any suggestions as to what could be the cause? Is there a particular
> format the BB's have to be in for VEX, e.g. have some kinda header?

Not that I know of.

N
|
From: Julian S. <js...@ac...> - 2006-06-14 11:04:59
|
> > 2. I'm passing BB's to LibVEX_Translate with the orig_addr argument set
> > to point to the first instruction in the BB

You need to be clear about the meaning of "point to the first instruction
in the BB". (Didn't we discuss this before?)

Vex is set up so that a bb to be translated is characterised by two
addresses: the address at which the bytes happen to reside in the host's
memory (VexTranslateArgs.guest_bytes) and the address they are claimed to
come from in the guest (simulated) machine's memory
(VexTranslateArgs.guest_bytes_addr).

> > but the translation that comes out does not match the translation from
> > just running valgrind --tool=none --trace-flags=10000000
> > --trace-notbelow=0. And it's not a pre/post IR optimization issue
> > because I also compared against all the other --trace-flags.

As Nick says, you need to send some examples of what you put in, what you
got out, and how that differs from what you expected to see. Without that
it's more or less impossible for us to diagnose.

> > Is there a particular format the BB's have to be in for VEX, e.g. have
> > some kinda header?

No.

J
|
From: Julian S. <js...@ac...> - 2006-06-14 12:20:16
|
On Friday 09 June 2006 15:52, Eric Li wrote:
> >> If I want to vexify an array of BB's, each with a different number of
> >> instructions, in a loop, do I have to call LibVEX_Init every iteration
> >> after setting the guest_max_insns of the VexControl argument for the
> >> current BB?
> >
> > Yes.
>
> I tried this, but LibVEX_Init has its own internal static variable to
> keep track of the fact that it's been called, so it fails an assert if I
> call it again.

Oh well, just get rid of the assert then; that was just me being
over-paranoid. The main thing is that you really do need to redo
LibVEX_Init for each translation, because you want to change some of the
translation parameters for each translation.

J
|
From: Eric L. <ew...@an...> - 2006-06-14 19:44:26
|
It turns out there was a problem in the module that parsed the binary into
BB's. So I was finally able to get VEX to churn out the correct IR! Thank
you guys for the help! Sorry about the vagueness of my question before.

I have another question about PUT/GET in VEX. In the outdated
documentation, it says PUT/GET is for moving values between CPU registers
and Temp registers. In the VEX IR, PUT/GET uses offsets, which implies
that it's addressed like memory, so then where do all the registers (eax,
ebx, etc.) go? I looked at the IR for some common instructions and noticed
that eax seems to be at PUT(0); is this always true? If so, is there a
mapping somewhere of which registers go at which offsets?

Thanks,
Eric

>>> 2. I'm passing BB's to LibVEX_Translate with the orig_addr argument
>>> set to point to the first instruction in the BB
>
> You need to be clear about the meaning of "point to the first instruction
> in the BB". (Didn't we discuss this before?)
>
> Vex is set up so that a bb to be translated is characterised by two
> addresses: the address which they happen to reside in the host's memory
> (VexTranslateArgs.guest_bytes) and the address which they are claimed to
> come from in the guest (simulated) machine's memory
> (VexTranslateArgs.guest_bytes_addr).
>
>>> but the translation that comes out does not match the translation from
>>> just running valgrind --tool=none --trace-flags=10000000
>>> --trace-notbelow=0. And it's not a pre/post IR optimization issue
>>> because I also compared against all the other --trace-flags.
>
> As Nick says, you need to send some examples of what you put in, what you
> got out and how that differs from what you expected to see. Without that
> it's more or less impossible for us to diagnose.
>
>>> Is there a particular format the BB's have to be in for VEX, e.g. have
>>> some kinda header?
>
> No.
>
> J
|
From: Eric L. <ew...@an...> - 2006-06-14 21:30:38
|
Right now, I pass my own instrumentation function (call it instr()) into
LibVEX_Translate, and then inside instr() I save the IRBB that was passed
in to a global. That's how I obtain a handle to the IR for later use, i.e.

IRBB *global;

void instr(IRBB *irbb)
{
    global = irbb;
}

void translate(bb)
{
    LibVEX_Translate(bb, instr);
    // Do something with global here...
}

But the inline comments for LibVEX_Alloc (where the memory for the above
IRBB came from) say that the temporary storage it grants will only stay
alive until translation of the current BB is complete. This implies that I
can't just save irbb, and that I need to make a copy of it. I saw that
there is a deep copy constructor, dopyIRBB, but the memory for that also
comes from LibVEX_Alloc. So my question is: is the only way to get back a
usable copy of the irbb to write my own deep copy constructor that doesn't
rely on LibVEX_Alloc?
Thanks again,
Eric
P.S. If I could buy you guys lunch I would.
|
|
From: Nicholas N. <nj...@cs...> - 2006-06-14 23:21:21
|
On Wed, 14 Jun 2006, Eric Li wrote:
> I have another question about PUT/GET in VEX. In the outdated
> documentation, it says PUT/GET is for moving values between CPU registers
> and Temp registers. In the VEX IR, PUT/GET uses offsets, which implies
> that it's addressed like memory, so then where do all the registers (eax,
> ebx, etc.) go?

The guest state (registers) is stored by default in a block of memory.
Each register value gets pulled into real machine registers in order to be
used, and if it is changed it then gets written back to the memory block
before the end of the BB.

> I looked at the IR for some common instructions and noticed that eax
> seems to be at PUT(0), is this always true? If so, is there a mapping of
> which registers go at which offsets somewhere?

Yes. Look at VEX/libvex_guest_*.h; it has the mapping for each
architecture. Also, VEX/libvex_guest_offsets.h is auto-generated by
auxprogs/genoffsets.c; I think it just gives a handy name for the offset
of each integer register on each platform, which can be useful.

Nick
|
From: Julian S. <js...@ac...> - 2006-06-15 10:41:07
|
On Thursday 15 June 2006 00:21, Nicholas Nethercote wrote:
> On Wed, 14 Jun 2006, Eric Li wrote:
> > I looked at the IR for some common instructions and noticed that eax
> > seems to be at PUT(0), is this always true? If so, is there a mapping
> > of which registers go at which offsets somewhere?
>
> Yes. Look at VEX/libvex_guest_*.h, it has the mapping for each
> architecture.

Grepping for OFFB_ in priv/guest-<whatever>/toIR.c should make this
clearer. In the x86 guest case, eax just happens to be at offset zero
because guest_EAX is the first field in the struct. However, the front
ends are written in such a way that you can put the fields in any order
and still get correct IR.

J
|
From: Nicholas N. <nj...@cs...> - 2006-06-14 23:23:52
|
On Wed, 14 Jun 2006, Eric Li wrote:
> But the inline comments for LibVEX_Alloc (where the mem for the above
> IRBB came from) say that the temporary storage this grants will only
> stay alive until translation of the current BB is complete. This implies
> that I can't just save irbb, and that I need to make a copy of it. I saw
> that there is a deep copy constructor, dopyIRBB, but the memory for that
> also comes from LibVEX_Alloc. So my question is: is the only way to get
> back a usable copy of the irbb to write my own deep copy constructor
> that doesn't rely on LibVEX_Alloc?

Hmm, as far as I know, yes. Julian might know otherwise.

Nick
|
From: Julian S. <js...@ac...> - 2006-06-15 10:54:04
|
On Thursday 15 June 2006 00:23, Nicholas Nethercote wrote:
> On Wed, 14 Jun 2006, Eric Li wrote:
> > But the inline comments for LibVEX_Alloc(where the mem for the above IRBB
> > came from) say that the temporary storage this grants will only stay
> > alive until translation of the current BB is complete. This implies that
> > I can't just save irbb, and that I need to make a copy of it. I saw that
> > there is a deep copy constructor, dopyIRBB, but the memory for that also
> > comes from LibVEX_Alloc. So my question is is the only way to get back a
> > usable copy of the irbb to write my own deep copy constructor that
> > doesn't rely on LibVEX_Alloc?
>
> Hmm, as far as I know. Julian might know otherwise.
Your analysis is correct.
You are of course free to replace LibVEX_Alloc with a different
allocation function more suited to your purposes, and this seems
a lot less hassle than rewriting all the constructors. One possibility
is to make it able to allocate in one of a number of different
blocks, by consulting some global variable which you add. Then
what you can do is, first make the IR translation into some
temporary block. Then switch blocks and do a deep copy, which
gets you the copy into a block you can keep hold of permanently.
The problem with just using malloc is that there are no
corresponding free-the-IR functions, and the JIT allocates
very fast, so you soon wind up running out of memory. Really
it's based around the idea of allocating fast from a big block,
then throwing the whole block away when dead ("arena allocation")
which is nicely explained in the Fraser and Hanson book about
retargetable compilers, and also in Hanson's C-I-I book
(http://www.cs.princeton.edu/software/cii)
J
|
|
From: Eric L. <ew...@an...> - 2006-07-12 21:00:06
|
Hey guys,
I have some questions about the IR I'd appreciate anyone's input on.
1. How does VEX handle exceptions like #DE(divide error) on div instructions when the quotient is too large for the destination operand?
2. Where do you set eflags for certain instructions like add? This instruction:
add $0x4,%eax
Gets translated into:
IRBB {
t0:I32 t1:I32 t2:I32
---IMark(...)---
t2 GET:I32(0)
t1 = 0x4:I32
t0 = Add32(t2,t1)
PUT(32) = 0x3:I32
PUT(36) = t2
PUT(40) = t1
PUT(44) = 0x0:I32
PUT(0) = t0
goto {Boring} ...
}
which doesn't have any assignments to any part of eflags?
3. How does the 4-word thunk mechanism involving CC_OP, CC_DEP1, CC_DEP2, and CC_NDEP work?
4. For some of these flag setting instructions (like inc) VEX reads the above 4 CC_ registers then calls a helper like x86g_calculate_eflags_c with their values passed in in order to set eflags. Why doesn't it just emit IR that sets those flags instead?
e.g. t2 = Add32(t0, t1)
PUT(ZF) = CmpEQ32(t2, 0)
Thanks,
Eric
|
|
From: Julian S. <js...@ac...> - 2006-07-12 22:11:12
|
> 1. How does VEX handle exceptions like #DE(divide error) on div
> instructions when the quotient is too large for the destination operand?
It doesn't. Basically anything in vex to do with exceptions is a kludge
of one kind or another. If I had it to do over it could be done cleaner.
Right now the divide insn is simply regenerated from IR by the back end;
if a divide by zero happens, a host signal will hit the generated code,
and who knows what will happen then. A better solution would be to generate
IR which tests the denominator for zero and does an IR side exit from the
BB, carrying a flag which tells the scheduler that the insn couldn't be
executed - in the same style that side exits are done if a segment override
prefix causes an addressing problem. Look for Ijk_MapFail in
guest-x86/toIR.c.
Even then, the IR representation of division is pretty half-arsed.
For one thing, on ppc, divide by zero doesn't give an exn, it just
produces an undefined result. On x86 you get an exn. So that would
have to be taken into account.
> 2. Where do you set eflags for certain instructions like add? This
> instruction: add $0x4,%eax
> Gets translated into:
> IRBB {
> t0:I32 t1:I32 t2:I32
>
> ---IMark(...)---
> t2 GET:I32(0)
> t1 = 0x4:I32
> t0 = Add32(t2,t1)
> PUT(32) = 0x3:I32
> PUT(36) = t2
> PUT(40) = t1
> PUT(44) = 0x0:I32
> PUT(0) = t0
> goto {Boring} ...
> }
> which doesn't have any assignments to any part of eflags?
In short we don't. Instead offsets 32/36/40/44 hold enough data about
the current operation to be able to calculate eflags (specifically,
the O S Z A C and P flags) later if needed. See section "Condition code
stuff" in guest-x86/gdefs.h.
> 3. How does the 4-word thunk mechanism involving CC_OP, CC_DEP1, CC_DEP2,
> and CC_NDEP work?
Have a look at big comment in guest-x86/gdefs.h line 201.
> 4. For some of these flag setting instructions (like inc) VEX reads the
> above 4 CC_ registers then calls a helper like x86g_calculate_eflags_c with
> their values passed in in order to set eflags. Why doesn't it just emit IR
> that sets those flags instead? e.g. t2 = Add32(t0, t1)
> PUT(ZF) = CmpEQ32(t2, 0)
x86g_calculate_eflags_c exists because of the idiocy that inc/dec set
only 5 of the flags - O S Z A P - and don't change C. Therefore, to set
the thunk after inc/dec it is first necessary to compute the old C flag,
and so the old thunk components are handed off to this helper to compute
it.
You should preferably use the post-optimisation IR. The x86 IR flag
scheme is carefully set up so that the IR optimiser can knock out most
of this junk. This not only gives much more efficient IR, it also
replaces many of the helper function calls with in-line IR, which
makes the data flows much clearer. That may or may not be important
for you.
J
|
|
From: Eric L. <ew...@an...> - 2006-08-01 19:40:21
|
Hi,

What do you mean by the post-optimization IR? I get a handle to the IR
through my own instrumentation function passed into LibVEX_Translate. By
the time the IR gets to me, it should have already gone through an
optimization phase, right? Do you mean that I should set the ir-opt level
to 2 so that the code is optimized as much as possible before the
instrumentation function gets it? There is another optimization phase
after instrumentation, but I have no way of getting the IR after that
phase.

Also, I set the ir-opt level to 2 (the max), and all the x86g_calculate_*
type CCall helper functions still remain. Is there a way I can get those
replaced with inline IR?

Lastly, in the IR translation of the "sar" instruction, CC_OP can either
be the op for sar (which is X86G_CC_OP_SHR*) or it can be whatever was set
as the previous CC_OP. Why is this?

Thanks,
Eric

> You should preferably use the post-optimisation IR. The x86 IR flag
> scheme is carefully set up so that the IR optimiser can knock out most
> of this junk. This not only gives much more efficient IR, it also
> replaces many of the helper function calls with in-line IR, which makes
> the data flows much clearer. That may or may not be important for you.
>
> J
|
From: Nicholas N. <nj...@cs...> - 2006-08-01 22:49:28
|
On Tue, 1 Aug 2006, Eric Li wrote:
> What do you mean by the post-optimization IR? I get a handle to the IR
> through my own instrumentation function passed into LibVEX_Translate. By
> the time the IR gets to me, it should have already gone through an
> optimization phase right? Do you mean that I should set the ir-opt level
> to 2 so that the code is optimized as much as possible before the
> instrumentation function gets it?
>
> There is another optimization phase after instrumentation, but I have no
> way of getting the IR after that phase.

There is a second optimisation phase. You can't invoke it via the
core/tool interface, but you could do a little hacking to access it
directly. Look for "vta.instrument2" in coregrind/m_translate.c.

> Also, I set the ir-opt level to 2 (the max), and all the x86g_calculate_*
> type CCall helper functions still remain. Is there a way I can get those
> replaced with inline IR?

They may be gone by the time the 2nd optimisation pass occurs.

> Lastly, in the IR translation of the "sar" instruction, CC_OP can either
> be the op for sar (which is X86G_CC_OP_SHR*) or it can be whatever was
> set as the previous CC_OP. Why is this?

I don't know about that one.

Nick
|
From: Julian S. <js...@ac...> - 2006-08-02 07:48:35
|
> What do you mean by the post-optimization IR? I get a handle to the IR
> through my own instrumentation function passed into LibVEX_Translate. By
> the time the IR gets to me, it should have already gone through an
> optimization phase right?

Yes.

> Do you mean that I should set the ir-opt level to 2 so that the code is
> optimized as much as possible before the instrumentation function gets
> it?

Yes.

> There is another optimization phase after instrumentation, but I have no
> way of getting the IR after that phase.

True. But it's much weaker than the initial optimisation pass, so it won't
do anything helpful. It's there to clean up dead code added by
instrumentation and do a bit more constant folding.

> Also, I set the ir-opt level to 2 (the max), and all the x86g_calculate_*
> type CCall helper functions still remain. Is there a way I can get those
> replaced with inline IR?

They should mostly go. It's too difficult to guess what's going on without
some specific examples. If you send some examples of "I put this x86 code
in, and got this IR out" that would help.

> Lastly, in the IR translation of the "sar" instruction, CC_OP can either
> be the op for sar (which is X86G_CC_OP_SHR*) or it can be whatever was
> set as the previous CC_OP. Why is this?

That doesn't sound right; if it's really true it's probably a bug. Can you
send the bit(s) of code from which you conclude this?

J
|
From: Eric L. <ew...@an...> - 2006-08-02 18:11:03
|
So for the "sar" instruction case, here's some Valgrind output that shows what I mean:
-------------------------------------------------------------------------
[notic@localhost temp]$ objdump -D ./foo
./foo: file format elf32-i386
Disassembly of section .text:
08048080 <.text>:
8048080: b8 ef be ad de mov $0xdeadbeef,%eax
8048085: d3 f8 sar %cl,%eax
8048087: 25 0f 00 00 00 and $0xf,%eax
Disassembly of section .comment:
00000000 <.comment>:
0: 00 54 68 65 add %dl,0x65(%eax,%ebp,2)
4: 20 4e 65 and %cl,0x65(%esi)
7: 74 77 je 0x80
9: 69 64 65 20 41 73 73 imul $0x65737341,0x20(%ebp),%esp
10: 65
11: 6d insl (%dx),%es:(%edi)
12: 62 6c 65 72 bound %ebp,0x72(%ebp)
16: 20 30 and %dh,(%eax)
18: 2e 39 38 cmp %edi,%cs:(%eax)
1b: 2e 33 39 xor %cs:(%ecx),%edi
...
[notic@localhost temp]$ valgrind --tool=none --trace-flags=10000000 --trace-notbelow=0 ./foo
==28730== Nulgrind, a binary JIT-compiler.
==28730== Copyright (C) 2002-2005, and GNU GPL'd, by Nicholas Nethercote.
==28730== Using LibVEX rev 1471, a library for dynamic binary translation.
==28730== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==28730== Using valgrind-3.1.0, a dynamic binary instrumentation framework.
==28730== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==28730== For more details, rerun with: -v
==28730==
==== BB 0 (0x8048080) BBs exec'd 0 ====
------------------------ Front end ------------------------
0x8048080: movl $0xDEADBEEF,%eax
------ IMark(0x8048080, 5) ------
PUT(0) = 0xDEADBEEF:I32
0x8048085: sarl %cl, %eax
------ IMark(0x8048085, 2) ------
PUT(60) = 0x8048085:I32
t0 = GET:I32(0)
t5 = And8(GET:I8(4),0x1F:I8)
t2 = t0
t3 = Sar32(t2,t5)
t4 = Sar32(t2,And8(Sub8(t5,0x1:I8),0x1F:I8))
PUT(32) = Mux0X(t5,GET:I32(32),0x1B:I32)
PUT(36) = Mux0X(t5,GET:I32(36),t3)
PUT(40) = Mux0X(t5,GET:I32(40),t4)
PUT(44) = 0x0:I32
t1 = t3
PUT(0) = t1
0x8048087: andl $0xF, %eax
------ IMark(0x8048087, 5) ------
PUT(60) = 0x8048087:I32
t6 = GET:I32(0)
t7 = 0xF:I32
t8 = And32(t6,t7)
PUT(32) = 0xF:I32
PUT(36) = t8
PUT(40) = 0x0:I32
PUT(44) = 0x0:I32
PUT(0) = t8
0x804808C: addb %dl,101(%eax,%ebp,2)
------ IMark(0x804808C, 4) ------
PUT(60) = 0x804808C:I32
t12 = Add32(Add32(GET:I32(0),Shl32(GET:I32(20),0x1:I8)),0x65:I32)
t11 = LDle:I8(t12)
t10 = GET:I8(8)
t9 = Add8(t11,t10)
PUT(32) = 0x1:I32
PUT(36) = 8Uto32(t11)
PUT(40) = 8Uto32(t10)
PUT(44) = 0x0:I32
STle(t12) = t9
0x8048090: andb %cl,101(%esi)
------ IMark(0x8048090, 3) ------
PUT(60) = 0x8048090:I32
t16 = Add32(GET:I32(24),0x65:I32)
t15 = LDle:I8(t16)
t14 = GET:I8(4)
t13 = And8(t15,t14)
PUT(32) = 0xD:I32
PUT(36) = 8Uto32(t13)
PUT(40) = 0x0:I32
PUT(44) = 0x0:I32
STle(t16) = t13
0x8048093: jz-8 0x804810C
------ IMark(0x8048093, 2) ------
PUT(60) = 0x8048093:I32
if (32to1(x86g_calculate_condition[mcx=0x13]{0xb008df10}(0x4:I32,GET:I32(32),GET:I32(36),GET:I32(40),GET:I32(44)):I32)) goto {Boring} 0x804810C:I32
goto {Boring} 0x8048095:I32
. 0 8048080 21
. B8 EF BE AD DE D3 F8 25 F0 00 00 00 00 54 68 65 20 4E 65 74 77
-------------------------------------------------------------------------
The IR output for "sar" has a bunch of Mux0X expressions for the
assignments to the CC thunk. One of the arms seems to be the previous
values of the thunk. Why is this so?
I realize this is unoptimized IR, and indeed in this case the optimized IR
gets rid of these Mux0Xs. But on longer binaries (e.g. I was using
"iwconfig") such cases remain even after optimization.
Regarding the x86g_* helpers not getting replaced with IR after
optimization: if I translate one asm instruction at a time as its own
basic block, would that affect Valgrind's ability to optimize away the
CCall helpers?
Thanks so much,
Eric
>> What do you mean by the post-optimization IR? I get a handle to the IR
>> through my own instrumentation function passed into LibVEX_Translate.
>> By the time the IR gets to me, it should have already gone through an
>> optimization phase right?
>
> Yes.
>
>> Do you mean that I should set the ir-opt level to 2 so that the code is
>> optimized as much as possible before the instrumentation function gets
>> it?
>
> Yes.
>
>> There is another optimization phase after instrumentation, but i have
>> no way of getting the IR after that phase.
>
> True. But it's much weaker than the initial optimisation pass, so won't
> do anything helpful. It's there to clean up dead code added by
> instrumentation and do a bit more constant folding.
>
>> Also, I set the ir-opt level to 2(the max), and all the
>> x86g_calculate_* type CCall helper functions still remain. Is there a
>> way I can get those replaced with inline IR?
>
> They should mostly go.
>
> It's too difficult to guess what's going on without some specific
> examples. If you send some examples of "I put this x86 code in, and got
> this IR out" that would help.
>
>
>> Lastly, in the IR translation of the "sar" instruction, CC_OP can
>> either be the op for sar (which is X86G_CC_OP_SHR*) or it can be
>> whatever was set as the previous CC_OP. Why is this?
>
> That doesn't sound right; if it's really true it's probably a bug. Can
> you send the bit(s) of code from which you conclude this?
>
> J
>
>
|
|
From: Julian S. <js...@ac...> - 2006-08-02 22:42:38
|
> So for the "sar" instruction case, here's some Valgrind output that
> shows what I mean:
>
> 0x8048085: sarl %cl, %eax
>
> ------ IMark(0x8048085, 2) ------
> PUT(60) = 0x8048085:I32
> t0 = GET:I32(0)
> t5 = And8(GET:I8(4),0x1F:I8)
> t2 = t0
> t3 = Sar32(t2,t5)
> t4 = Sar32(t2,And8(Sub8(t5,0x1:I8),0x1F:I8))
> PUT(32) = Mux0X(t5,GET:I32(32),0x1B:I32)
> PUT(36) = Mux0X(t5,GET:I32(36),t3)
> PUT(40) = Mux0X(t5,GET:I32(40),t4)
> PUT(44) = 0x0:I32
> t1 = t3
> PUT(0) = t1
>
> The IR output for "sar" has a bunch of Mux0X expressions for assignment
> of the CC thunks. One of the arms seems to be the previous values of the
> thunk. Why is this so?

Because 'sar' is defined by x86 to not change the flags if the shift
amount is zero. (Nuts? Maybe, but we have to simulate it right.) So these
Mux0Xs test the shift amount and, if it is zero, write the flag thunk
fields with their previous values. Mostly iropt can tell the shift amount
is nonzero and so this junk gets folded out.

> Regarding the x86g_helpers not getting translated into IR after
> optimization issue, if I translate 1 asm instruction at a time as its
> own basic block, would that affect Valgrind's ability to optimize away
> the CCall helpers?

Yes, majorly. It would completely negate such optimisation; doing so
requires the flag-setting and flag-using instructions to be in the same
block.

J