[B2-devel] JIT Compiler: Direct Addressing likely to be mandatory

Hi,

Facts: the JIT compiler is awfully slow when banked memory addressing
is used. It is actually slower than running without JIT compilation at
all!

Current solution to make things faster: the "baseaddr" table hack. That
table contains the offsets to add to a virtual Mac address in order to
obtain its native (host) equivalent. A table of MEMBaseDiff values, if
you prefer.
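
To make the idea concrete, here is a minimal sketch of such a table. The
64 KB bank size and the helper names map_region() and mac_to_host() are
assumptions made for illustration; only the baseaddr[] name comes from
the description above, and the real layout may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64 KB banks covering a 32-bit Mac address space. */
#define BANK_SHIFT 16
#define BANK_COUNT (1u << (32 - BANK_SHIFT))

/* One offset per bank, chosen so that host = mac + baseaddr[bank]. */
static uintptr_t baseaddr[BANK_COUNT];

/* Hypothetical helper: map [mac_start, mac_start + size) onto the host
 * memory starting at host_start. size must be greater than 0. */
static void map_region(uint32_t mac_start, uint32_t size, uint8_t *host_start)
{
    uintptr_t diff = (uintptr_t)host_start - mac_start;
    uint32_t first = mac_start >> BANK_SHIFT;
    uint32_t last = (mac_start + size - 1) >> BANK_SHIFT;

    for (uint32_t bank = first; bank <= last; bank++)
        baseaddr[bank] = diff;
}

/* What compiled code would do for each access: one table lookup and one
 * addition, with no call into the banked access routines. */
static inline uint8_t *mac_to_host(uint32_t mac_addr)
{
    return (uint8_t *)(mac_addr + baseaddr[mac_addr >> BANK_SHIFT]);
}
```

Note that the lookup cannot express "this bank needs conversion", which
is exactly the limitation discussed next.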

That table is only used in compiled code because we still need the old
memory access routines to know whether it is possible to use the
baseaddr[] or not. baseaddr[] can't be used when we access the frame
buffer and some conversion is required (e.g. for an RGB565 layout).

Sure, some of you will tell me that I could use the VOSF "technology".
That's true but if we can use it, that means that we could use direct or
real addressing mode as well.

Unfortunately, "knowing whether it is possible to use the baseaddr[] or
not" is pure guess-work! Indeed, paraphrasing Bernie's words, we
actually make the assumption that any given instruction will either
always access real memory, or always access memory that needs some post-
or pre-processing.

Let us take an example: say I have a basic block of instructions whose
purpose is to copy some data. Hmm, say this is the _BlockMove trap. We
are likely to compile this piece of code with the assumption that only
"real" memory is accessed, i.e. the host address is determined through
the baseaddr[] table. Unfortunately, what happens if _BlockMove is used
to move data from a screen buffer to the MacFrameBuffer? Well, if the
screen depth is greater than 8bpp, this simply draws garbage on portions
of the screen.
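
A small sketch of why the raw copy goes wrong. It assumes the guest
writes 16-bit x1R5G5B5 pixels, that the host frame buffer expects RGB565,
and that pixel values are already in host byte order. The function names
are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one x1R5G5B5 pixel to RGB565 (byte order ignored here). */
static uint16_t rgb555_to_rgb565(uint16_t p)
{
    uint16_t r = (p >> 10) & 0x1f;
    uint16_t g = (p >> 5) & 0x1f;
    uint16_t b = p & 0x1f;
    /* Widen green from 5 to 6 bits by replicating its top bit. */
    uint16_t g6 = (uint16_t)((g << 1) | (g >> 4));

    return (uint16_t)((r << 11) | (g6 << 5) | b);
}

/* What the compiled block does when it assumes "real" memory: the
 * pixels land in the frame buffer untouched. */
static void copy_raw(uint16_t *dst, const uint16_t *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);
}

/* What the frame buffer access routines would have done instead. */
static void copy_converting(uint16_t *dst, const uint16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = rgb555_to_rgb565(src[i]);
}
```

With these definitions a pure red guest pixel (0x7c00) stays 0x7c00
after copy_raw() but must become 0xf800 on an RGB565 display, hence the
wrong colours.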

Solutions:

1) Use _BlockMove and _BlockMoveData trap replacements. That will
probably not work since there are, for example, custom-made memcpy()
implementations, so the same problem is bound to persist.

2) Use the blitters from VOSF mode. Making it portable would require
using the old memcmp() method to determine the screen regions to update.
That's not a problem but just makes things slower. The other major
problem is that we would also lose the benefit of DGA, as in VOSF mode.
Well, in a way, slower is better than buggy.

3) Forbid usage of banked memory addressing with the JIT compiler. I
lean towards this solution since direct addressing is so much faster
than banked memory addressing. Both with the JIT compiler, of course.
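
For solution 2, the memcmp() method could look roughly like the sketch
below: compare each scanline of the Mac frame buffer against a shadow
copy and report the range of rows that changed. The function name, the
row-granular tracking and the parameters are assumptions for
illustration, not the existing video code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare frame against shadow row by row. Changed rows are copied into
 * shadow so that the next call only reports new changes. Returns 1 and
 * sets *first / *last to the dirty row range, or returns 0 if nothing
 * changed. */
static int find_dirty_rows(const uint8_t *frame, uint8_t *shadow,
                           size_t bytes_per_row, int rows,
                           int *first, int *last)
{
    int found = 0;

    for (int y = 0; y < rows; y++) {
        const uint8_t *cur = frame + (size_t)y * bytes_per_row;
        uint8_t *old = shadow + (size_t)y * bytes_per_row;

        if (memcmp(cur, old, bytes_per_row) != 0) {
            memcpy(old, cur, bytes_per_row);
            if (!found) {
                *first = y;
                found = 1;
            }
            *last = y;
        }
    }
    return found;
}
```

The blitter would then convert only rows *first to *last to the host
layout. The cost is one full pass over the frame buffer per refresh,
which is where the slowdown comes from.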

Lauri, is it really impossible to do Direct Addressing under Win 9X?
As far as the Unix ports are concerned, I really don't have to do
"triple_allocation".

Rough process:

- allocate RAM + ROM at the same time but keep their respective base
address properly page-aligned

- ROMBaseMac seems to be relocatable. Therefore, it is simply
        RAMBaseMac + aligned_ram_size;

- Init MEMBaseDiff to be RAMBaseHost - RAMBaseMac. As RAMBaseMac is 0,
MEMBaseDiff simply turns out to be RAMBaseHost.

- MacFrameBaseMac seems to be relocatable too. Then, when it is time to
initialize VideoMonitor.mac_frame_base, you simply assign it the result
of the call to Host2MacAddr(the_host_screen_buffer);

It will work because we have distinct regions and the offset between a
virtual (Mac) address and a native (host) address is constant.

- As for write-protecting the ROM region, you can stick with your
step_over() method or simply implement the ScratchMem subterfuge as
well.
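
The rough process above can be sketched as follows. This is a simplified
illustration, not the actual port code: it uses malloc() with manual
alignment instead of a platform allocator, assumes 4 KB pages, and the
two translation helpers are stand-ins written for this example. It also
assumes, as on a 32-bit host, that every host address of interest lies
within 4 GB of RAMBaseHost.

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE_ASSUMED 4096u   /* assumption: 4 KB host pages */

static uint8_t *RAMBaseHost, *ROMBaseHost;
static uint32_t RAMBaseMac, ROMBaseMac;
static uintptr_t MEMBaseDiff;

static uint32_t page_align(uint32_t size)
{
    return (size + PAGE_SIZE_ASSUMED - 1) & ~(PAGE_SIZE_ASSUMED - 1);
}

/* Allocate RAM + ROM as one block, with both bases page-aligned. */
static int init_direct_addressing(uint32_t ram_size, uint32_t rom_size)
{
    uint32_t aligned_ram_size = page_align(ram_size);
    uint32_t aligned_rom_size = page_align(rom_size);
    /* Over-allocate by one page so the start can be aligned by hand. */
    uint8_t *block = malloc((size_t)aligned_ram_size + aligned_rom_size
                            + PAGE_SIZE_ASSUMED);
    if (block == NULL)
        return -1;

    RAMBaseHost = (uint8_t *)(((uintptr_t)block + PAGE_SIZE_ASSUMED - 1)
                              & ~(uintptr_t)(PAGE_SIZE_ASSUMED - 1));
    ROMBaseHost = RAMBaseHost + aligned_ram_size;

    RAMBaseMac = 0;
    ROMBaseMac = RAMBaseMac + aligned_ram_size;

    /* RAMBaseMac is 0, so this is simply RAMBaseHost. */
    MEMBaseDiff = (uintptr_t)RAMBaseHost - RAMBaseMac;
    return 0;
}

/* Translation is a single addition or subtraction of a constant. */
static inline uint8_t *Mac2HostAddr(uint32_t addr)
{
    return (uint8_t *)(MEMBaseDiff + addr);
}

static inline uint32_t Host2MacAddr(const uint8_t *p)
{
    return (uint32_t)((uintptr_t)p - MEMBaseDiff);
}
```

The Mac-side frame buffer base would then be obtained with
Host2MacAddr(the_host_screen_buffer), as described in the list.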

Conclusion:

What should I do? Should I struggle to make the JIT compiler "fast" in
banked memory addressing and therefore fix the problems I described
above? You know, current results show that banked memory addressing is
twice as slow as direct addressing... I am just wondering if it is
worth the effort ;-)

PS: I placed a new source tarball on my website.

Bye.