From: Jeremy F. <je...@go...> - 2002-11-16 18:17:07
On Sat, 2002-11-16 at 04:09, Julian Seward wrote:
> During a long train journey late this summer I worked through most of the
> design details needed to support t-chaining cleanly. Then I forgot most of
> them. I am inclined to agree, it's an obvious (perhaps overdue) optimisation
> which should be looked into. Having said that, my priority is still to freeze
> and ship 2.0 tho.
>
> The basic idea is that each translation exists in one of two states:
> chained and unchained, and can be moved back and forth between them as
> needed.
>
> - chained means that jumps out of it to known addresses jump directly
>   to the target translation.
>
> - unchained means we always do a lookup in the orig->new code address
>   mapping, ie we go via the dispatcher
>
> New translations are created in the unchained state. Permanently associated
> with each translation is enough metadata to facilitate chaining or unchaining
> it at will.
>
> When an unchained translation wants to make a jump to a known (orig)address,
> it pushes the orig-address it wants to call, and *calls* "patch_me"
> which is a short piece of assembly code. This pops the args (orig-addr)
> and also pops the return address -- which points just after the call
> insn on the original translation. patch_me can arrange to find the
> translation and patch the caller to jump directly to it.
>
> There is some fiddly stuff to be sorted out here:
>
> - how to most cleanly and robustly store info to enable chaining/unchaining
>
> - how to minimise the number of magic assembly code sequences needed (these
>   amount, you'll notice, to an ultra-minimal runtime linker)

Well, I suppose there are two: the sequence the codegen generates for
jumps, and patch_me. Neither of those is complex.

>
> - how to cleanly deal with jumps to unknown addresses, which always require
>   a lookup
>
> - how to deal with jumps which have "extra semantics", ie a JumpSyscall or
>   JumpClientReq, etc.

Ignore them: just generate the code we generate for them now.

> - how to handle the event-counter falling to zero in chained translations

I think we generate the decrement inline and fall into the dispatcher if
we hit 0.

> Finally -- and this is the last part of the trick -- whenever we want to
> move or discard any translations, we first unchain *all* of them.

OK, that's nice and simple.

> - Jeremy: you mentioned something about possibly calling _from_ translations
> to the translator machinery if a target translation is missing. I prefer to
> stick with the structure as it stands on the basis it's less rugged, in
> which translations run as the highest point in the call stack. For all
> exceptional situations (missing translation, JmpSyscall, etc) the
> translations return to C land (the scheduler) which handles the situation.
> Therefore the call stack looks like one of the following:
>
>   scheduler(C) -> run_thread_for_a_while(C) -> run_innerloop(ASM) ->
>   (translations)
>
> or
>
>   scheduler(C) -> translation-generating-machinery(C)
>
> But specifically I never have
>
>   scheduler(C) -> run_thread_for_a_while(C) -> run_innerloop(ASM) ->
>   some-translation -> translation-generating-machinery(C)
>
> there is no case where C land runs a translation which calls back into C
> land, and I think that is more robust.
>
> [ok, not entirely true; translations call helper fns, but these are
> pretty simple and don't mess with the global translation state at all]

Well, OK, but you didn't address what patch_me would do if the target
address isn't present. Would it fall back into the dispatcher loop, which
would then trigger a codegen, so that the next time through this BB we'd do
the chaining? That seems reasonable to me.

> So: I have no time to chase up any of this stuff (apart from discuss possible
> designs), but if you feel the need to do some feasibility-assessment hacking,
> please do! It would be very interesting to know if the extra performance gain
> is worth the complication.

I'll look at it if I get a moment. I want to finish up everything I've got
open at the moment, with the expectation I'll have very little hacking time
available in a month or two (whereupon my first-born appears and I get that
harried new-parent look).

> If it can be done simply and cleanly I'm in favour. Generally my approach is
> to shoot for 80% of the available performance for 20% of the complication.
> This strikes me as good engineering for a resource-constrained small group.
> See http://www.cs.princeton.edu/software/lcc for a strikingly effective
> demonstration of the same attitude.

Yes, I like lcc's internals.

	J
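
For concreteness, here is a rough C model of the scheme discussed in this
message. It is only a sketch under stated assumptions: the real patch_me
would be a short assembly stub that rewrites a call insn in place, whereas
here a "chained" translation is simply one whose exit pointer is non-NULL.
The struct layout, the fixed-size table, and the names lookup,
add_translation, unchain_all, discard and run are all hypothetical and not
taken from the actual sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRANSLATIONS 64

/* One translated basic block with a single known-target exit
   (hypothetical layout, for illustration only). */
typedef struct Translation {
    uintptr_t orig_addr;          /* original code address of this block  */
    uintptr_t jump_target;        /* orig address this block jumps to     */
    struct Translation *chained;  /* direct link if chained, else NULL    */
    int in_use;
} Translation;

static Translation table[MAX_TRANSLATIONS];

/* The orig->new lookup: the slow path taken via the dispatcher. */
Translation *lookup(uintptr_t orig_addr) {
    for (size_t i = 0; i < MAX_TRANSLATIONS; i++)
        if (table[i].in_use && table[i].orig_addr == orig_addr)
            return &table[i];
    return NULL;
}

/* New translations are created in the unchained state. */
Translation *add_translation(uintptr_t orig_addr, uintptr_t jump_target) {
    for (size_t i = 0; i < MAX_TRANSLATIONS; i++) {
        if (!table[i].in_use) {
            table[i] = (Translation){ orig_addr, jump_target, NULL, 1 };
            return &table[i];
        }
    }
    assert(!"translation table full");
    return NULL;
}

/* Stand-in for patch_me: if the target translation exists, chain the
   caller to it. Otherwise return NULL so the caller falls back to the
   dispatcher, and chaining is retried the next time through. */
Translation *patch_me(Translation *caller) {
    Translation *target = lookup(caller->jump_target);
    if (target != NULL)
        caller->chained = target;
    return target;
}

/* Before moving or discarding any translation, unchain all of them. */
void unchain_all(void) {
    for (size_t i = 0; i < MAX_TRANSLATIONS; i++)
        table[i].chained = NULL;
}

void discard(Translation *t) {
    unchain_all();
    t->in_use = 0;
}

/* "Execute" blocks starting at t, following chained exits. Returns the
   orig address at which the dispatcher should resume, either because
   the event counter reached zero or because the target is missing. */
uintptr_t run(Translation *t, int *event_counter) {
    for (;;) {
        if (--*event_counter <= 0)   /* inline decrement, bail out at 0 */
            return t->jump_target;
        Translation *next = t->chained ? t->chained : patch_me(t);
        if (next == NULL)            /* no translation yet: dispatcher  */
            return t->jump_target;
        t = next;
    }
}
```

In this model the dispatcher would call run(), translate the returned orig
address if lookup() finds nothing for it, and call run() again. Translations
never call into the translation machinery themselves, which matches the call
stack structure Julian describes.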