From: Julian S. <js...@ac...> - 2002-11-22 09:01:44
> Maybe we should at least keep EIP up to date at the basic block level,
> so we only need to deal with the intra-BB case.
As per recent msg to Nick, doing so at least makes it easy to find the
%eip (and hence %EIP) by scanning the stack for helper-function
return addresses. One PUTEIP per bb doesn't seem too onerous.
> I was thinking of a table with one entry per original instruction, which
> contains the length of the original instruction and the length of the
> corresponding generated code. It may be possible to pack this into 2
> nibbles per instruction, with an escape mechanism to deal with rare
> cases (ie, if length == 0xF then get the details from the following
> byte). That could be built as the code is generated then tacked onto
> the end of the generated code (the TTE would contain the generated code
> length plus the length of the whole translation, including the
> instruction mapping table). That way doing the mapping is a simple
> linear scan:
>
> cur_eip = tte->trans_addr;
> cur_EIP = tte->orig_addr;
> entry = 0;
> while(eip > cur_eip) {
> cur_eip += tab[entry].trans_len;
> cur_EIP += tab[entry].orig_len;
> entry++;
> }
I was thinking of the same thing. We could be simpler and just use a pair
of bytes (orig_insn_len, trans_insn_len) for each orig insn; this gives an
overhead of 2 bytes per orig insn, which is not too bad. (About 2 mbyte for
a run of Mozilla).
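
A minimal sketch of the (orig_insn_len, trans_insn_len) byte-pair table and the linear scan over it, for illustration only. The names InsnMapEntry, MapTTE and map_eip_to_EIP are hypothetical and not taken from the Valgrind sources. It assumes eip points inside the translation of some instruction; a caller holding a helper-function return address may want to pass that address minus one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per original instruction (hypothetical layout). */
typedef struct {
    uint8_t orig_len;   /* bytes of original code */
    uint8_t trans_len;  /* bytes of generated code */
} InsnMapEntry;

/* Hypothetical cut-down view of a translation table entry. */
typedef struct {
    uintptr_t orig_addr;        /* %EIP of the first original insn */
    uintptr_t trans_addr;       /* %eip of the first translated byte */
    const InsnMapEntry *tab;    /* one entry per original insn */
    size_t n_entries;
} MapTTE;

/* Map a real %eip inside the translation to the %EIP of the original
   instruction whose generated code contains it.  Returns 0 if eip
   lies outside the translation. */
uintptr_t map_eip_to_EIP(const MapTTE *tte, uintptr_t eip)
{
    uintptr_t cur_eip, cur_EIP;
    size_t i;

    assert(tte != NULL);
    cur_eip = tte->trans_addr;
    cur_EIP = tte->orig_addr;
    if (eip < cur_eip)
        return 0;

    for (i = 0; i < tte->n_entries; i++) {
        uintptr_t next_eip = cur_eip + tte->tab[i].trans_len;
        if (eip < next_eip)
            return cur_EIP;     /* eip falls in this insn's translation */
        cur_eip = next_eip;
        cur_EIP += tte->tab[i].orig_len;
    }
    return 0;                   /* past the end of the translation */
}
```

The scan costs one step per original instruction in the bb, which should be acceptable if the lookup only happens on the slow path (error reporting and the like).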
We can easily do a bit better with some kind of cleverer
encoding scheme as you mention, if that's needed [probably an array of
12-bit entities, 4 bits for orig len (they are all in the range 1 .. 17)
and 8 for translated length, with perhaps (N, 0xFF) indicating that the
entire next 12-bit group is a translated length, in the (appalling) case
that one orig insn generates more than 254 bytes of translation].
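
A rough sketch of how such a 12-bit packing might look, with two groups stored in every three bytes. All names here (put12, get12, map_append, map_read, ESCAPE) are made up for illustration. Two assumptions beyond the description above: the orig length is stored directly in the 4-bit field, so this version only handles lengths 1 .. 15, and the buffer is zero-initialised and large enough for the groups written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ESCAPE 0xFFu  /* trans-length field value meaning "see next group" */

/* Write 12-bit value v as group number idx (2 groups per 3 bytes). */
static void put12(uint8_t *buf, size_t idx, unsigned v)
{
    size_t byte = (idx * 3) / 2;
    if (idx % 2 == 0) {
        buf[byte]     = (uint8_t)(v >> 4);
        buf[byte + 1] = (uint8_t)((buf[byte + 1] & 0x0F) | ((v & 0xF) << 4));
    } else {
        buf[byte]     = (uint8_t)((buf[byte] & 0xF0) | ((v >> 8) & 0xF));
        buf[byte + 1] = (uint8_t)(v & 0xFF);
    }
}

/* Read group number idx. */
static unsigned get12(const uint8_t *buf, size_t idx)
{
    size_t byte = (idx * 3) / 2;
    if (idx % 2 == 0)
        return ((unsigned)buf[byte] << 4) | (unsigned)(buf[byte + 1] >> 4);
    return (((unsigned)buf[byte] & 0x0F) << 8) | (unsigned)buf[byte + 1];
}

/* Append one instruction's lengths; returns the new group count. */
size_t map_append(uint8_t *buf, size_t n_groups,
                  unsigned orig_len, unsigned trans_len)
{
    assert(orig_len >= 1 && orig_len <= 15);
    assert(trans_len <= 0xFFF);
    if (trans_len < ESCAPE) {
        put12(buf, n_groups++, (orig_len << 8) | trans_len);
    } else {
        /* (N, 0xFF): the whole next group holds the translated length */
        put12(buf, n_groups++, (orig_len << 8) | ESCAPE);
        put12(buf, n_groups++, trans_len);
    }
    return n_groups;
}

/* Read the instruction starting at group *pos and advance *pos. */
void map_read(const uint8_t *buf, size_t *pos,
              unsigned *orig_len, unsigned *trans_len)
{
    unsigned g = get12(buf, (*pos)++);
    *orig_len  = g >> 8;
    *trans_len = g & 0xFF;
    if (*trans_len == ESCAPE)
        *trans_len = get12(buf, (*pos)++);
}
```

With this layout the common case costs 1.5 bytes per orig insn instead of 2, and an escaped entry costs 3 bytes.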
One comment is that it might be worth having in each TT entry a small number
of such slots, say 8. That way most basic blocks can store their table in
the TT entry. For the occasional large bb, TT will have to contain a ptr to
a table allocated in VG_(malloc)ville. This seems like a good idea because
there's a considerable space overhead using VG's malloc stuff -- about 6 words
per alloc -- so doing a large number of small mallocs is quite wasteful of
space.
Adding 8 slots to each TT covers most bbs, and would take 4 words
with the (byte,byte) encoding scheme and 3 words with the (4,8)
scheme.
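
To make the inline-slots idea concrete, here is a sketch using the (byte,byte) encoding. InsnMap, Slot and the insn_map_* functions are hypothetical names, and the standard malloc/free stand in for VG's own allocator, whose interface is not shown here. Small bbs keep their table inside the entry; only bbs with more than 8 insns pay for a separate allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_SLOTS 8

typedef struct {
    uint8_t orig_len;
    uint8_t trans_len;
} Slot;

/* Hypothetical per-TT-entry mapping storage. */
typedef struct {
    size_t n_insns;
    Slot   inline_tab[INLINE_SLOTS]; /* 8 * 2 bytes = 4 32-bit words */
    Slot  *big_tab;                  /* NULL unless n_insns > INLINE_SLOTS */
} InsnMap;

/* Copy n slots into the map.  Returns 0 on success, -1 if the
   out-of-line allocation fails. */
int insn_map_init(InsnMap *m, const Slot *src, size_t n)
{
    assert(m != NULL && src != NULL);
    m->n_insns = n;
    m->big_tab = NULL;
    if (n <= INLINE_SLOTS) {
        memcpy(m->inline_tab, src, n * sizeof(Slot));
        return 0;
    }
    /* Occasional large bb: fall back to a heap-allocated table. */
    m->big_tab = malloc(n * sizeof(Slot));
    if (m->big_tab == NULL)
        return -1;
    memcpy(m->big_tab, src, n * sizeof(Slot));
    return 0;
}

/* Return whichever table is in use. */
const Slot *insn_map_slots(const InsnMap *m)
{
    return m->big_tab != NULL ? m->big_tab : m->inline_tab;
}

/* Release the out-of-line table, if any. */
void insn_map_free(InsnMap *m)
{
    free(m->big_tab);
    m->big_tab = NULL;
}
```

The word counts follow from the slot sizes on a 32-bit machine: 8 slots * 2 bytes = 16 bytes = 4 words for (byte,byte), and 8 slots * 12 bits = 96 bits = 12 bytes = 3 words for (4,8).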
J