[Valgrind-developers] Re: basic block chaining


On Sat, 2002-11-16 at 04:09, Julian Seward wrote:
> During a long train journey late this summer I worked through most of the 
> design details needed to support t-chaining cleanly.  Then I forgot most of
> them.  I am inclined to agree, it's an obvious (perhaps overdue) optimisation
> which should be looked into.  Having said that, my priority is still to freeze
> and ship 2.0 tho.
> 
> The basic idea is that each translation exists in one of two states:
> chained and unchained, and can be moved back and forth between them as
> needed.
> 
> - chained means that jumps out of it to known addresses jump directly
>   to the target translation.
> 
> - unchained means we always do a lookup in the orig->new code address
>   mapping, ie we go via the dispatcher
> 
> New translations are created in the unchained state.  Permanently associated
> with each translation is enough metadata to facilitate chaining or unchaining
> it at will.
> 
> When an unchained translation wants to make a jump to a known (orig) address,
> it pushes the orig-address it wants to call, and *calls* "patch_me"
> which is a short piece of assembly code.  This pops the args (orig-addr)
> and also pops the return address -- which points just after the call
> insn on the original translation.  patch_me can arrange to find the 
> translation and patch the caller to jump directly to it.
> 
> There is some fiddly stuff to be sorted out here:
> 
> - how to most cleanly and robustly store info to enable chaining/unchaining
> 
> - how to minimise the number of magic assembly code sequences needed (these
>   amount, you'll notice, to an ultra-minimal runtime linker)

Well, I suppose there are two: the sequence the codegen generates
for jumps, and patch_me.  Neither of those is complex.

> 
> - how to cleanly deal with jumps to unknown addresses, which always require
>   a lookup
> 
> - how to deal with jumps which have "extra semantics", ie a JumpSyscall or
>   JumpClientReq, etc.

Ignore them: just generate the same code for them that we generate now.

> - how to handle the event-counter falling to zero in chained translations

I think we should generate the decrement inline and fall into the
dispatcher if we hit 0.

> Finally -- and this is the last part of the trick -- whenever we want to
> move or discard any translations, we first unchain *all* of them.

OK, that's nice and simple.

> - Jeremy: you mentioned something about possibly calling _from_ translations
>   to the translator machinery if a target translation is missing.  I prefer to
>   stick with the structure as it stands, on the basis that it's more rugged, in
>   which translations run as the highest point in the call stack.  For all
>   exceptional situations (missing translation, JmpSyscall, etc) the 
>   translations return to C land (the scheduler) which handles the situation.
>   Therefore the call stack looks like one of the following:
> 
>        scheduler(C) -> run_thread_for_a_while(C) -> run_innerloop(ASM) ->
>             (translations)
> 
>    or
> 
>        scheduler(C) -> translation-generating-machinery(C)
> 
>   But specifically I never have
> 
>        scheduler(C) -> run_thread_for_a_while(C) -> run_innerloop(ASM) ->
>          some-translation -> translation-generating-machinery(C)
> 
>   there is no case where C land runs a translation which calls back into C
>   land, and I think that is more robust.
> 
>   [ok, not entirely true; translations call helper fns, but these are 
>    pretty simple and don't mess with the global translation state at all]

Well, OK, but you didn't address what patch_me would do if the target
address isn't present.  Would it fall back into the dispatcher loop, which
would then trigger a codegen, so that the next time through this BB
we'd do the chaining?  That seems reasonable to me.

> So: I have no time to chase up any of this stuff (apart from discuss possible
> designs), but if you feel the need to do some feasibility-assessment hacking,
> please do!  It would be very interesting to know if the extra performance gain
> is worth the complication.

I'll look at it if I get a moment.  I want to finish up everything I've
got open at the moment, with the expectation I'll have very little
hacking time available in a month or two (whereupon my first-born
appears and I get that harried new-parent look).

> If it can be done simply and cleanly I'm in favour.  Generally my approach is
> to shoot for 80% of the available performance for 20% of the complication.
> This strikes me as good engineering for a resource-constrained small group.
> See http://www.cs.princeton.edu/software/lcc for a strikingly effective
> demonstration of the same attitude.

Yes, I like lcc's internals.

	J