From: Julian S. <js...@ac...> - 2002-11-19 19:08:35
> > 1. Lack of translation chaining. Has been discussed a lot already, and I
> > think has reasonable probability of getting addressed at some point,
> > since we're now floating detailed ideas for doing it.
>
> I think this is major, and we should be able to tell soon: I hacked up
> most of an implementation on the train this afternoon. Doesn't quite
> work yet, but it looks convincing.

Cool! Keep on hacking. I'd be very interested to hear the outcome of this.

> > 2. The inability to cache simulated regs in real ones across bb
> > boundaries (they all get flushed with PUTs at the end). I think JF
> > mentioned this at some point. I don't know how much effect this has.
>
> I think this is relatively minor compared to chaining,

Possibly true. I think trace cacheing would break cachegrind, which makes
some assumptions about uniqueness of bb translations (or something? Nick?
but that can't be right, because it copes with multiple tail translations
of bbs, as we discussed once). Unfortunately there is no easy way I can
see to assess the performance loss from this inability to hold values in
regs for long.

> > 3. (nobody mentioned this, but I think it is very significant): the
> > inability of the code generator to consider groups of UInstrs together
> > when generating code. Specifically, this has the following bad effect.
> > [...]
> > This is surely something worth looking into. An interesting mechanism
> > to create would be one which measures the (dynamic) frequency of
> > adjacent uinstr pairs, to search for common sequences (pairs, to start
> > with) which would be worth putting special code-generation effort into.
> > Of course the results would depend on what instrumentation had been added
> > by the skin.
>
> I'm not sure its all that important. It would reduce the actual
> instruction count and register pressure, but I'm not sure it would do
> all that much for modern CPUs (the Via C3 being the exception, and it
> does need all the help it can get).

Um, yes, I guess because it merely produces insns which will turn mostly
into single micro-ops in the P6/P7/K6/K7 cores anyway, yes? I'd forgotten
about that. What it does mean is that the translated code is more likely
to stall due to lack of insn decoder bandwidth, compared to "normal" code.

J
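
For reference, the pair-frequency measurement floated in point 3 could be sketched roughly as below. This is only an illustration, not code from the tree: the opcode enum is a made-up stand-in for the real UInstr opcodes, and pair_count, record_block and hottest_pair are hypothetical names. It assumes something calls record_block once per dynamic execution of a bb, and it only counts pairs inside a bb, not across bb boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical opcode set: stand-ins for UInstr opcodes, not the real list. */
enum { OP_GET, OP_PUT, OP_LOAD, OP_STORE, OP_ADD, OP_COUNT };

static const char *const op_name[OP_COUNT] = {
    "GET", "PUT", "LOAD", "STORE", "ADD"
};

/* pair_count[a][b] = number of times opcode b ran directly after opcode a. */
static unsigned long pair_count[OP_COUNT][OP_COUNT];

/* Record one dynamic execution of a basic block, given as an opcode array.
   Only adjacent pairs inside the block are counted. */
static void record_block(const int *ops, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        assert(ops[i] >= 0 && ops[i] < OP_COUNT);
        assert(ops[i + 1] >= 0 && ops[i + 1] < OP_COUNT);
        pair_count[ops[i]][ops[i + 1]]++;
    }
}

/* Find the most frequent adjacent pair seen so far.  Returns its count and
   stores the two opcodes in *first and *second (-1 if nothing recorded). */
static unsigned long hottest_pair(int *first, int *second)
{
    unsigned long best = 0;
    *first = *second = -1;
    for (int a = 0; a < OP_COUNT; a++) {
        for (int b = 0; b < OP_COUNT; b++) {
            if (pair_count[a][b] > best) {
                best = pair_count[a][b];
                *first = a;
                *second = b;
            }
        }
    }
    return best;
}

/* Print the current winner, e.g. at exit. */
static void print_hottest_pair(void)
{
    int a, b;
    unsigned long n = hottest_pair(&a, &b);
    if (n > 0)
        printf("%s,%s: %lu\n", op_name[a], op_name[b], n);
}
```

A table indexed by opcode pair keeps the per-execution cost to one increment per uinstr, which matters if the counting is done at run time. As noted above, whatever pairs come out on top would depend on the instrumentation the skin has added.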