Re: [B2-devel] New UAE JIT Compiler

Hi,

> I think the complete part in uae_cpu/basilisk_glue.cpp, lines 59..82
> (#if REAL_ADDRESSING .. #endif) could be moved into main.cpp/InitAll(),
> after the call to CheckROM() has been made.

I think so as well, especially since the switch/case statements are
already in place. Even memory_init() could be moved, because I made it
a no-op in DIRECT_ADDRESSING mode.

> > (b) In order to get a chance to mmap() the address space as
> > above-mentioned, MacRAM would not get allocated before VideoInit() is
> > executed.
> 
> This is harder. I placed VideoInit() at the latest possible point in the
> initialization order because it may switch to a screen mode where it's
> no longer possible to put up dialog boxes for error messages from modules
> that are initialized earlier.

Still in direct addressing mode, the trick was to call the other buffer
initialization routines in main_unix.cpp after InitAll() has completed.
The video routine, VideoInitBuffer(), takes only one parameter: the newly
allocated memory area for the temporary frame buffer.
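To make the ordering concrete, here is a minimal sketch of how direct addressing can reserve the whole emulated address space with a single mmap(), so that translating a Mac address is just an addition. The layout (RAM at offset 0, then ROM, then the temporary frame buffer) and the helpers direct_addressing_init(), mac_to_host() and temp_frame_buffer() are hypothetical. Only InitAll() and VideoInitBuffer() come from the discussion above, and neither is called here.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE             /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical layout: one contiguous host mapping holds Mac RAM at
   offset 0, followed by the ROM and then the temporary frame buffer. */
static uint8_t *host_base;          /* host address of Mac address 0 */
static size_t ram_size, rom_size, fb_size;

/* Reserve the whole emulated address space in one mmap() call.
   Returns 0 on success, -1 on failure. */
int direct_addressing_init(size_t ram, size_t rom, size_t fb)
{
    void *p = mmap(NULL, ram + rom + fb, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    host_base = p;
    ram_size = ram;
    rom_size = rom;
    fb_size = fb;
    return 0;
}

/* With direct addressing, address translation is a single addition. */
static inline uint8_t *mac_to_host(uint32_t mac_addr)
{
    return host_base + mac_addr;
}

/* The area that would be handed to VideoInitBuffer() once InitAll()
   has returned (hypothetical: placed right after the ROM). */
uint8_t *temp_frame_buffer(void)
{
    return host_base + ram_size + rom_size;
}
```

In this scheme, main_unix.cpp would only need to pass temp_frame_buffer() to VideoInitBuffer() after InitAll(), so VideoInit() can stay at the end of the initialization order.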

> > (d) Due to the different frame layout that could be used, I implemented
> > video handling on SIGSEGV (see below). Drawback: DGA mode will be slower
> > since a temp frame buffer will have to be used too.
> 
> But the whole point of DGA mode is to avoid a temporary frame buffer...

The current rationale, by DGA screen depth, is as follows:
-  8 : no temporary frame buffer is required
- 15 : a temp buffer is needed because long gets/puts have to be
word swapped
- 16 : temp buffer required because color conversion is necessary
- 24 : temp buffer required because pixels have to be byte-swapped

All of the cases above apply only if the host is little-endian.
Otherwise, no temp buffer is required at all.
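The depth rules above can be written down as a small decision function. This sketch encodes the list exactly as stated (including the rule that big-endian hosts never need the buffer). The names dga_needs_temp_buffer(), host_is_little_endian() and mac555_to_565() are hypothetical, and the 16-bit conversion assumes the Mac side uses 5-5-5 pixels stored big-endian while the DGA visual is 5-6-5.

```c
#include <stdint.h>

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}

/* Encodes the per-depth rationale: 1 = temp frame buffer needed,
   0 = not needed, -1 = depth not covered by the list. */
int dga_needs_temp_buffer(int depth, int little_endian)
{
    switch (depth) {
    case 8:
        return 0;                   /* single-byte pixels: no swapping */
    case 15:                        /* long gets/puts need word swapping */
    case 16:                        /* color conversion */
    case 24:                        /* pixels need byte swapping */
        return little_endian ? 1 : 0;
    default:
        return -1;
    }
}

/* Converts one big-endian Mac 5-5-5 pixel into a host-order 5-6-5 pixel
   (assumed formats, see above). */
uint16_t mac555_to_565(const uint8_t *src)
{
    uint16_t v = (uint16_t)((src[0] << 8) | src[1]);
    uint16_t r = (v >> 10) & 0x1f;
    uint16_t g = (v >> 5) & 0x1f;
    uint16_t b = v & 0x1f;
    uint16_t g6 = (uint16_t)((g << 1) | (g >> 4)); /* widen green to 6 bits */
    return (uint16_t)((r << 11) | (g6 << 5) | b);
}
```

Pure red (0x7c00 on the Mac side) comes out as 0xf800, and white (0x7fff) as 0xffff.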

Note: for windowed mode, the reasoning is a bit different since I have
to take into account the bit order of the underlying bitmap rather than
the host's. See the tests you perform when calling
video_x.cpp/set_video_monitor().

Another thought about direct addressing:

MacOS seems to write to ROM and then read it back for some testing
purpose, right? I used Lauri's method to handle that, i.e. write-protect
the ROM area. When an access violation occurs, Screen_fault_handler()
simply advances the host (x86) instruction pointer to the next
instruction. But since I don't want to write an advance() function for
every target processor, I wonder whether the ScratchMem method used for
real addressing is always safe. If so, the appropriate #if should also
be added in main.cpp and rsrc_patches.cpp.
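For reference, here is a minimal sketch of catching ROM writes with mprotect() and an SA_SIGINFO handler that reads the faulting address from siginfo_t. It deliberately uses a different recovery than Lauri's method: instead of advancing the instruction pointer (which needs the per-CPU advance() logic mentioned above), it unprotects the page and lets the store be retried, so the write does go through. It only illustrates the portable fault-detection part. All names are hypothetical, and because some systems (e.g. macOS) report this fault as SIGBUS, both signals are hooked.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE             /* MAP_ANONYMOUS, SA_SIGINFO on glibc */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t *rom_base;           /* hypothetical emulated ROM area */
static size_t rom_len;
static volatile sig_atomic_t rom_write_faults;

/* If the fault address lies in the ROM area, count it and make the area
   writable so the faulting store is restarted and succeeds. mprotect() is
   not on POSIX's async-signal-safe list, though this use is common. */
static void rom_fault_handler(int sig, siginfo_t *si, void *ctx)
{
    uint8_t *addr = (uint8_t *)si->si_addr;
    (void)ctx;
    if (addr >= rom_base && addr < rom_base + rom_len) {
        rom_write_faults++;
        mprotect(rom_base, rom_len, PROT_READ | PROT_WRITE);
        return;
    }
    signal(sig, SIG_DFL);           /* not ours: default action on retry */
}

/* Maps a page-rounded ROM area, installs the handler, write-protects it. */
int rom_protect_init(size_t len)
{
    struct sigaction sa;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p;

    rom_len = ((len + page - 1) / page) * page;
    p = mmap(NULL, rom_len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    rom_base = p;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = rom_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
        sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return mprotect(rom_base, rom_len, PROT_READ);
}
```

After the first fault the area stays writable, so a real ROM handler would still need to discard or undo the write, which is exactly the part that Lauri's instruction skipping or the ScratchMem method takes care of.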

> > TC flush will occur when:
> > - Code is created and executed from Execute68k(), Execute68kTrap()
> > - FlushCodeCache() is called
> > - A-Traps: FlushCodeCacheRange, FlushInstructionCache
> > - BlockMove()
> 
> Why is this not simply done every time a MOVEC *,CACR or CPUSH is executed
> to clear the emulated 680x0's caches?

Yes, Bernie's compiler does that, but I was wondering about
self-modifying code and other ways to detect it that would avoid
checksumming entire basic blocks.
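To show the cost being avoided, here is a minimal sketch of the full-checksum approach: each translated block records a checksum of its 68k bytes at compile time, and those bytes are re-checksummed before the translation is reused. The struct, the function names and the rotate/XOR checksum are assumptions for illustration, not Bernie's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation-cache entry. */
struct tc_block {
    const uint8_t *m68k_start;      /* source 68k code */
    size_t length;                  /* size of the source block in bytes */
    uint32_t checksum;              /* checksum taken at compile time */
    int valid;
};

/* Rotate-left-by-5 then XOR each byte. Every step is a bijection, so a
   single changed byte always changes the result; multiple changes can
   still collide in principle. */
static uint32_t block_checksum(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = ((sum << 5) | (sum >> 27)) ^ p[i];
    return sum;
}

void tc_block_compile(struct tc_block *b, const uint8_t *start, size_t len)
{
    b->m68k_start = start;
    b->length = len;
    b->checksum = block_checksum(start, len);
    b->valid = 1;
    /* ... code generation for the block would go here ... */
}

/* Called before reusing a translation: a mismatch means the 68k code was
   modified, so the block is invalidated and must be recompiled. */
int tc_block_still_valid(struct tc_block *b)
{
    if (b->valid && block_checksum(b->m68k_start, b->length) != b->checksum)
        b->valid = 0;
    return b->valid;
}
```

The check costs a full pass over the source block on every reuse, which is why cheaper detection schemes (write-protecting code pages, or relying on explicit cache flushes) are attractive.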

BTW, I was also wondering how hard it would be to patch the Segment
Loader and compile, possibly more aggressively, the blocks it loads.

I was also thinking about using MAE's dispatching method instead of
going through the TLB every time to look up and check for a compiled
block. In other words, take an unused opcode, put it in place of the
first opcode of the block, and patch the jump table accordingly. This
would make it easy to port the JIT compiler to a system that doesn't
have GCC and its "Labels as Values" extension, say Windows with VC++ ;-)

The TLB is still needed to recover the original opcode and to put the
special compile_opcode back. Overlapping compiled blocks must be handled
as well, since the original m68k opcode gets replaced with one of the
special compile_opcodes.
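The MAE-style dispatch described above can be sketched roughly as follows: the first opcode of a compiled block is overwritten with a reserved opcode whose jump-table entry enters compiled code, and a small TLB keeps the displaced opcode so it can be restored on a flush. The value of COMPILE_OPCODE (0xFFFF), the 16-bit word memory model and all function names are hypothetical, the linear TLB search stands in for whatever lookup a real implementation would use, and overlapping blocks are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode assumed unused by the guest; a real choice must not
   collide with anything MacOS or the emulator itself relies on. */
#define COMPILE_OPCODE 0xFFFFu
#define MAX_BLOCKS 16

typedef void (*op_handler)(uint16_t opcode, uint32_t pc);

static op_handler op_table[65536];  /* opcode -> handler jump table */
static uint16_t *mem;               /* guest code as 16-bit words */

/* TLB entry: block start PC, the opcode it displaced, its compiled code. */
struct tlb_entry {
    uint32_t pc;
    uint16_t orig_opcode;
    void (*compiled)(void);
};
static struct tlb_entry tlb[MAX_BLOCKS];
static size_t tlb_count;

static int interpreted_runs, compiled_runs;

static void op_interpret(uint16_t opcode, uint32_t pc)
{
    (void)opcode; (void)pc;
    interpreted_runs++;             /* stand-in for a real handler */
}

/* Reached only through the jump table, so ordinary opcodes never pay for
   a TLB lookup; the lookup happens once per compiled-block entry. */
static void op_compiled_block(uint16_t opcode, uint32_t pc)
{
    (void)opcode;
    for (size_t i = 0; i < tlb_count; i++)
        if (tlb[i].pc == pc) {
            tlb[i].compiled();
            return;
        }
}

void demo_compiled_block(void) { compiled_runs++; }

void dispatch_init(uint16_t *memory)
{
    for (size_t op = 0; op < 65536; op++)
        op_table[op] = op_interpret;
    op_table[COMPILE_OPCODE] = op_compiled_block;
    mem = memory;
    tlb_count = 0;
}

/* Patch the block's first opcode, remembering the original in the TLB. */
int install_block(uint32_t pc, void (*compiled)(void))
{
    if (tlb_count == MAX_BLOCKS)
        return -1;
    tlb[tlb_count].pc = pc;
    tlb[tlb_count].orig_opcode = mem[pc / 2];
    tlb[tlb_count].compiled = compiled;
    tlb_count++;
    mem[pc / 2] = COMPILE_OPCODE;
    return 0;
}

/* On a cache flush, restore displaced opcodes in reverse install order. */
void flush_blocks(void)
{
    while (tlb_count > 0) {
        tlb_count--;
        mem[tlb[tlb_count].pc / 2] = tlb[tlb_count].orig_opcode;
    }
}

void dispatch(uint32_t pc)
{
    uint16_t op = mem[pc / 2];
    op_table[op](op, pc);
}
```

Because entry into compiled code goes through an ordinary function-pointer table, nothing here depends on GCC extensions.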

I have not started working on compemu_*.c yet. To begin with, I will
probably create a similar framework that just generates calls to the
appropriate instruction handlers. Then, all that would be needed is a
"call <target>" instruction generator specific to each target
processor.
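As an illustration of such a per-target "call <target>" generator, here is a sketch for x86 that emits one "call rel32" (opcode E8, displacement relative to the end of the 5-byte instruction) per instruction handler and a final "ret" (C3). Handler addresses are plain integers, nothing is executed, and argument passing, stack alignment and executable memory are deliberately left out. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stores a 32-bit value in little-endian byte order, as x86 expects. */
static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Emits "call rel32" into buf, which will live at host address 'at'.
   Assumes addresses below 2^63. Returns bytes written, or 0 if the
   target is out of rel32 range. */
size_t emit_call_x86(uint8_t *buf, uint64_t at, uint64_t target)
{
    int64_t rel = (int64_t)target - (int64_t)(at + 5);
    if (rel < INT32_MIN || rel > INT32_MAX)
        return 0;
    buf[0] = 0xE8;
    put_le32(buf + 1, (uint32_t)(int32_t)rel);
    return 5;
}

/* Translates a block of n guest instructions into a chain of calls to
   their handlers followed by "ret". Returns total bytes, 0 on error. */
size_t emit_call_block_x86(uint8_t *buf, uint64_t at,
                           const uint64_t *handlers, size_t n)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t k = emit_call_x86(buf + len, at + len, handlers[i]);
        if (k == 0)
            return 0;
        len += k;
    }
    buf[len++] = 0xC3;
    return len;
}
```

Porting to another processor would then mean rewriting only emit_call_x86() and the return opcode, which is the point of the approach.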

I am also thinking about, experimenting with, and working on another
emulator that should make it possible to retarget a JIT compiler in
almost no time. In fact, it's not really a code generator, just a code
"copier".

GCC will be mandatory because of (at least) the following two features:

- "Labels as Values" : to determine the code ranges to copy
- "Explicit Reg Vars" : for static register allocation. If the host
permits it, I intend to cache D0, D1, A0, A1 and A7 in host registers,
in no particular order. Quick-and-dirty profiling shows that those are
the most frequently used registers.

Sure, this won't perform as well as a custom JIT compiler with dynamic
register allocation, but the point is that no code generator is
needed. ;-)

I have not seen this implemented in a real emulator before. I got the
idea from rereading:

"Optimizing direct threaded code by selective inlining"
Ian Piumarta and Fabio Riccardi
PLDI '98 (ACM)

If you don't have access to the ACM, I could try to remember the link
where I found the article.
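The starting point of the paper cited above is direct-threaded dispatch built on GCC's "Labels as Values". Below is a minimal sketch using a made-up three-opcode instruction set (OP_INC, OP_DEC and OP_HALT are hypothetical) that shows how each opcode is translated into the address of its handler label. The selective-inlining step, i.e. copying the code between handler labels into a buffer, is not shown.

```c
/* GCC/Clang only: relies on the "Labels as Values" extension
   (&&label yields a label's address, goto *p jumps to it). */
#include <stddef.h>

enum { OP_INC, OP_DEC, OP_HALT };   /* hypothetical toy opcodes */

/* Interprets 'program' (which must be non-empty and end with OP_HALT)
   using direct-threaded dispatch; returns the final accumulator. */
long run_threaded(const int *program, size_t len)
{
    static void *const handlers[] = { &&do_inc, &&do_dec, &&do_halt };
    void *threaded[len];            /* C99 VLA: one handler address per op */
    long acc = 0;

    /* Translation step: replace each opcode with its handler's address. */
    for (size_t i = 0; i < len; i++)
        threaded[i] = handlers[program[i]];

    void **ip = threaded;
    goto **ip++;                    /* dispatch: jump straight to handler */

do_inc:
    acc++;
    goto **ip++;
do_dec:
    acc--;
    goto **ip++;
do_halt:
    return acc;
}
```

A "code copier" would additionally use label pairs around each handler body to know which byte range to copy, which is where the extension becomes essential.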

> If you are convinced that it will still work on non-x86 machines and other
> operating systems (including NetBSD/m68k, where it runs without CPU
> emulation), you can check it in.

I won't commit the direct addressing diffs until I see them working, at
least on Solaris/SPARC, which supports siginfo_t.

> > Actually, I never used CVS
> 
> There's a nice tutorial at
> http://www-classic.be.com/aboutbe/benewsletter/volume_III/Issue40.html#Insight

Thanks, I will check it out.

> > Should I make it the default one when an i386 cpu is detected ?
> 
> If it's an improvement, then yes.

Is "it works (enable scrollbars)" the right answer? ;-)

PS: Sorry, I did not notice how much I had written.

Bye,
-- 
Gwenolé Beauchesne