From: Jeremy F. <je...@go...> - 2002-11-25 22:50:19
I've been thinking about the flags problem a bit, and it seems that we
can't do a great deal better than we currently do. Well, we can get rid
of some really bogus code, but we still need to keep some moderately
bogus code.
I was thinking that we could probably improve on sequences such as:

    and %eax, %eax
    pushf
    popl 32(%ebp)
    testl $64, 32(%ebp)
    jz XXX
    jmp NEXT

Well, we can, but we still need the pushf/popl there, because we need to
save the flags for the next basic block (even though the chances are
high that the flags are actually dead across the boundary from one basic
block to the next).

In this case, the best we can do is:

    and %eax, %eax
    pushf
    popl 32(%ebp)
    jnz XXXXX
    jmp NEXT

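A small C model of why the rewrite is safe, assuming only ZF matters for this branch: ZF is bit 6 of EFLAGS (the 64 in the testl), so "test the saved copy, then jz" is taken exactly when ZF is clear, which is what jnz on the live flags does. The function names are hypothetical, and both versions still write the saved copy, because the next basic block may read it.

```c
#include <stdbool.h>
#include <stdint.h>

/* ZF is bit 6 of EFLAGS, i.e. the mask 64 (0x40) tested above. */
#define EFLAGS_ZF 0x40u

/* Model only the ZF produced by "and %eax, %eax"; other bits are ignored. */
static uint32_t zf_after_and(uint32_t eax)
{
    return (eax & eax) == 0 ? EFLAGS_ZF : 0u;
}

/* Current sequence: spill the flags, then "testl $64, 32(%ebp); jz XXX".
   testl gives zero (so jz is taken) exactly when the saved ZF is clear. */
static bool taken_via_saved_copy(uint32_t eax, uint32_t *saved_eflags)
{
    *saved_eflags = zf_after_and(eax);          /* pushf; popl 32(%ebp) */
    return (*saved_eflags & EFLAGS_ZF) == 0;    /* testl $64; jz        */
}

/* Improved sequence: the spill stays for the next basic block, but the
   branch is a plain jnz on the live flags (taken when ZF == 0). */
static bool taken_via_live_flags(uint32_t eax, uint32_t *saved_eflags)
{
    uint32_t live = zf_after_and(eax);
    *saved_eflags = live;                       /* pushf; popl 32(%ebp) */
    return !(live & EFLAGS_ZF);                 /* jnz XXXXX            */
}
```

Calling both functions over a range of %eax values shows they always agree, and both leave the same value in the saved copy.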
The really bogus code is where one instruction sets up the flags for the
immediately following one, but the codegen puts a redundant save/load in
there:

    add %edx, %eax
    pushf
    popl 32(%ebp)
    pushl 32(%ebp)
    popf
    adc %ecx, %ebx

Which should be

    add %edx, %eax
    adc %ecx, %ebx

of course.

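As a sketch of the "really bogus" case, assuming a simplified instruction form (the Op and Insn types below are hypothetical, not Valgrind's), a peephole pass can delete a "pushl d(%ebp); popf" that directly follows "pushf; popl d(%ebp)", since the real flags already hold the value just saved. It deliberately leaves the save itself alone: dropping that too requires knowing nothing reads 32(%ebp) before adc's flags replace it, which is a liveness question rather than a peephole one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction form; only flag spills are modelled. */
typedef enum { OP_OTHER, OP_PUSHF, OP_POPL_MEM, OP_PUSHL_MEM, OP_POPF } Op;

typedef struct {
    Op  op;
    int disp;   /* displacement from %ebp for the *_MEM forms, else unused */
} Insn;

/* Remove "pushl d(%ebp); popf" when it directly follows "pushf; popl d(%ebp)"
   with the same d.  Works in place; returns the new instruction count. */
static size_t drop_redundant_flag_reload(Insn *code, size_t n)
{
    assert(code != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (out >= 2 && i + 1 < n
            && code[out - 2].op == OP_PUSHF
            && code[out - 1].op == OP_POPL_MEM
            && code[i].op == OP_PUSHL_MEM
            && code[i].disp == code[out - 1].disp
            && code[i + 1].op == OP_POPF) {
            i++;        /* skip both the pushl and the popf */
            continue;
        }
        code[out++] = code[i];
    }
    return out;
}
```

Run over the six-instruction add/adc sequence, this leaves add, pushf, popl, adc; a reload from a different slot is left untouched.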
I don't know how much time we're spending in bogus flag saving vs.
unavoidable flag saving. Once again, it seems the only way of getting
very good improvements is to work out a way of increasing the working
basic block size in order to make our local analysis a bit more global.
But that sounds like hard work.
J
From: Julian S. <js...@ac...> - 2002-11-25 23:50:14
Yes, doing a good job of iccs is the hardest part of dynamic binary
translation, many would say.

> The really bogus code is where one instruction sets up the flags for the
> immediately following one, but the codegen puts a redundant save/load in
> there:
>
>     add %edx, %eax
>     pushf
>     popl 32(%ebp)
>     pushl 32(%ebp)
>     popf
>     adc %ecx, %ebx
>
> Which should be
>
>     add %edx, %eax
>     adc %ecx, %ebx
>
> of course.

Doesn't the mythical lazy-eflags-save/restore pass clean up this particular
case?

> I don't know how much time we're spending in bogus flag saving vs.
> unavoidable flag saving. Once again, it seems the only way of getting
> very good improvements is to work out a way of increasing the working
> basic block size in order to make our local analysis a bit more global.
> But that sounds like hard work.

Good icc handling is known to be difficult in dynamic translators. I think
we can say we're running up against the limits of our local analysis. Most
systems which do better (WABI, Daisy, surely others) translate groups of
bbs at a time and track/optimise icc liveness across the whole group.
Also, that would allow register allocation across the whole group. If I
had another spare year and reimplemented the JIT from scratch I'd think about
something like this. However, reality being what it is ...

J
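A minimal sketch of backward icc liveness over a straight-line group of instructions, assuming each instruction is summarised by the flags it reads and writes (the FlagInsn type and flag bits are hypothetical). Flags are treated as live at every exit, since the successor is unknown; that conservative assumption is what keeps the pushf/popl before a jnz side exit, and translating larger groups is what lets fewer points be treated that way.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, one per x86 status flag. */
enum { F_C = 1, F_P = 2, F_A = 4, F_Z = 8, F_S = 16, F_O = 32 };
#define F_ALL (F_C | F_P | F_A | F_Z | F_S | F_O)

typedef struct {
    uint8_t uses;     /* flags the instruction reads  */
    uint8_t sets;     /* flags the instruction writes */
    bool    is_exit;  /* jump to code outside the analysed group */
} FlagInsn;

/* keep[i] is true when some flag written by instruction i may be read
   later, so those flags must be preserved.  Liveness is computed backwards:
   live_in = (live_out & ~sets) | uses, with everything live at exits. */
static void flag_liveness(const FlagInsn *code, size_t n, bool *keep)
{
    uint8_t live = F_ALL;                   /* conservative at the end */
    for (size_t i = n; i-- > 0; ) {
        if (code[i].is_exit)
            live = F_ALL;                   /* the exit target may read them */
        keep[i] = (code[i].sets & live) != 0;
        live = (uint8_t)((live & ~code[i].sets) | code[i].uses);
    }
}
```

For "and; add; jmp" the and's flags come out dead (add overwrites them all), while for "and; jnz; add; jmp" they stay live because of the side exit at jnz.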
From: Jeremy F. <je...@go...> - 2002-11-26 00:28:37
On Mon, 2002-11-25 at 15:57, Julian Seward wrote:
> > Which should be
> >
> >     add %edx, %eax
> >     adc %ecx, %ebx
> >
> > of course.
>
> Doesn't the mythical lazy-eflags-save/restore pass clean up this particular
> case?
Should do. It doesn't look that hard to implement, either. I was vaguely
thinking of hanging a mechanism off VG_(new_emit), by adding "uses" and
"sets" flagset arguments to it and having it manage flag saves and
restores. new_emit seemed like a nice place to hang it, since it's
already being called in all the right places.
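A sketch of the lazy save/restore idea, assuming a hypothetical emitter (FlagEmitter and new_emit_flags are invented here; this is not VG_(new_emit)'s real signature) that is told which guest flags each instruction uses and sets, plus whether it is a codegen scratch instruction that trashes %eflags. It tracks whether the real %eflags and the copy at 32(%ebp) are current, restoring only before a use and saving only before the flags would be lost or at block end. Instead of emitting code it logs S (save), R (restore) and I (instruction).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag-set bits (x86 status flags). */
enum { F_C = 1, F_P = 2, F_A = 4, F_Z = 8, F_S = 16, F_O = 32 };
#define F_ALL ((unsigned)(F_C | F_P | F_A | F_Z | F_S | F_O))

/* Hypothetical emitter state. Invariant: at least one copy is current. */
typedef struct {
    bool   real_valid;   /* %eflags hold the current guest flags  */
    bool   saved_valid;  /* 32(%ebp) holds the current guest flags */
    char   log[64];      /* 'S' save, 'R' restore, 'I' instruction */
    size_t len;
} FlagEmitter;

static void log_op(FlagEmitter *e, char c)
{
    assert(e->len + 1 < sizeof e->log);
    e->log[e->len++] = c;
    e->log[e->len] = '\0';
}

static void flags_init(FlagEmitter *e)
{
    e->real_valid = false;   /* on block entry the guest flags are in memory */
    e->saved_valid = true;
    e->len = 0;
    e->log[0] = '\0';
}

/* uses/sets are guest flag sets; trashes marks scratch code that
   clobbers %eflags with non-guest values. */
static void new_emit_flags(FlagEmitter *e, unsigned uses, unsigned sets, bool trashes)
{
    assert(!(sets != 0 && trashes));
    /* A partial write must preserve the other flags, so treat it as a use. */
    bool needs_real = uses != 0 || (sets != 0 && sets != F_ALL);
    if (needs_real && !e->real_valid) {
        assert(e->saved_valid);
        log_op(e, 'R');                       /* pushl 32(%ebp); popf */
        e->real_valid = true;
    }
    if (trashes && e->real_valid && !e->saved_valid) {
        log_op(e, 'S');                       /* pushf; popl 32(%ebp) */
        e->saved_valid = true;
    }
    log_op(e, 'I');
    if (sets != 0) {
        e->real_valid = true;
        e->saved_valid = false;
    }
    if (trashes)
        e->real_valid = false;
}

/* At the end of the block, make sure the memory copy is current. */
static void flags_flush(FlagEmitter *e)
{
    if (e->real_valid && !e->saved_valid) {
        log_op(e, 'S');
        e->saved_valid = true;
    }
}
```

For the add/adc pair this logs IIS: a single save at the end of the block instead of a save/restore pair between the two instructions. A scratch instruction between a flag-setting and a flag-using instruction gives ISIRI, i.e. one save and one restore around it.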
> Good icc handling is known to be difficult in dynamic translators. I think
> we can say we're running up against the limits of our local analysis. Most
> systems which do better (WABI, Daisy, surely others) translate groups of
> bbs at a time and track/optimise icc liveness across the whole group.
> Also, that would allow register allocation across the whole group. If I
> had another spare year and reimplemented the JIT from scratch I'd think about
> something like this. However, reality being what it is ...
Well, I think a longer term, but not completely impractical, approach is
the extended basic block idea. It gives us translation of multiple BBs
(including register allocation) more or less within our existing
infrastructure.
J