From: <gb...@di...> - 2001-01-05 17:54:48
Hi,

Facts: using the JIT compiler with banked memory addressing is awfully slow. It is actually slower than running without JIT compilation at all!

Current solution to make things faster: the "baseaddr" table hack. That table contains the offsets to add to a virtual Mac address to obtain its native (host) equivalent. A table of MEMBaseDiff values, if you prefer. The table is only used in compiled code because we still need the old memory access routines to know whether it is possible to use baseaddr[] or not. baseaddr[] can't be used when we access the frame buffer and some conversion is required (e.g. for an RGB565 layout).

Sure, some of you will tell me that I could use the VOSF "technology". That's true, but if we can use it, then we could use direct or real addressing mode as well.

Unfortunately, "knowing whether it is possible to use baseaddr[] or not" is pure guesswork! Indeed, paraphrasing Bernie's words, we actually assume that any given instruction will either always access real memory, or always access memory that needs some pre- or post-processing.

Let us take an example: say I have a basic block of instructions whose purpose is to copy some data. Hmm, say this is the _BlockMove trap. We are likely to compile this piece of code with the assumption that only "real" memory is accessed, i.e. the host address is determined through the baseaddr[] table. Unfortunately, what happens if _BlockMove is used to move data from a screen buffer to the MacFrameBuffer? Well, if the screen depth is greater than 8 bpp, the data is copied without the required conversion and horrible portions of the screen get drawn.

Solutions:

1) Use _BlockMove and _BlockMoveData trap replacements. That will probably not work since there are, for example, custom-made memcpy() implementations, and the same problem is bound to persist.

2) Use the blitters from VOSF mode. Making it portable would require using the old memcmp() method to determine the screen regions to update.
That's not a problem in itself, it just makes things slower. The other major problem is that we would also lose the benefit of DGA, as in VOSF mode. Well, in a way, slower is better than buggy.

3) Forbid usage of banked memory addressing with the JIT compiler. I lean towards this solution since direct addressing is so much faster than banked memory addressing. Both with the JIT compiler, of course.

Lauri, is it really impossible to do Direct Addressing under Win 9X? As far as the Unix ports are concerned, I really don't have to do "triple_allocation".

Rough process:
- allocate RAM + ROM at the same time but keep their respective base addresses properly page-aligned
- ROMBaseMac seems to be relocatable. Therefore, it is simply RAMBaseMac + aligned_ram_size;
- Init MEMBaseDiff to RAMBaseHost - RAMBaseMac. As RAMBaseMac is 0, MEMBaseDiff simply turns out to be RAMBaseHost.
- MacFrameBaseMac seems to be relocatable too. Then, when it is time to initialize VideoMonitor.mac_frame_base, you simply assign the result of the call to Host2MacAddr(the_host_screen_buffer); It will work because we have distinct regions and the offset between a virtual (Mac) address and a native (host) address is constant.
- As for write-protecting the ROM region, you can stick with your step_over() method or simply implement the ScratchMem subterfuge as well.

Conclusion: What should I do? Should I struggle to make the JIT compiler "fast" with banked memory addressing and therefore fix the problems I described above? You know, current results show that banked memory addressing is twice as slow as direct addressing... I am just wondering if it is worth the effort ;-)

PS: I placed a new source tarball on my website.

Bye.
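
For reference, the first three steps of the rough process (RAM and ROM in one allocation, page-aligned bases, a single constant offset) can be sketched roughly as below. This is only an illustration and not the actual Basilisk II code: the sketch_* names, the struct and the fixed 4096-byte page size are all assumptions made for the example, and plain malloc() stands in for whatever allocator a real port would use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed page size for the sketch; a real port would query the host. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical container for the values named in the mail. */
typedef struct {
    void     *raw;            /* pointer to hand back to free() */
    uint8_t  *ram_base_host;  /* RAMBaseHost, page-aligned */
    uint8_t  *rom_base_host;  /* host address of the ROM, page-aligned */
    uint32_t  ram_base_mac;   /* RAMBaseMac, always 0 here */
    uint32_t  rom_base_mac;   /* RAMBaseMac + aligned_ram_size */
    uintptr_t mem_base_diff;  /* MEMBaseDiff = RAMBaseHost - RAMBaseMac */
} sketch_layout;

/* Round a size up to the next multiple of the page size. */
static uint32_t sketch_page_align(uint32_t size)
{
    return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Allocate RAM and ROM as one block; returns 0 on success, -1 on failure. */
static int sketch_layout_init(sketch_layout *l, uint32_t ram_size, uint32_t rom_size)
{
    uint32_t aligned_ram = sketch_page_align(ram_size);
    uint32_t aligned_rom = sketch_page_align(rom_size);

    /* Over-allocate by one page so the base can be rounded up. */
    l->raw = malloc((size_t)aligned_ram + aligned_rom + SKETCH_PAGE_SIZE);
    if (l->raw == NULL)
        return -1;

    uintptr_t base = ((uintptr_t)l->raw + SKETCH_PAGE_SIZE - 1)
                     & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);

    l->ram_base_host = (uint8_t *)base;
    l->rom_base_host = l->ram_base_host + aligned_ram;
    l->ram_base_mac  = 0;
    l->rom_base_mac  = l->ram_base_mac + aligned_ram;
    l->mem_base_diff = base - l->ram_base_mac;  /* equals RAMBaseHost */
    return 0;
}

/* Mac -> host: add the constant offset. */
static uint8_t *sketch_mac2host(const sketch_layout *l, uint32_t mac_addr)
{
    return (uint8_t *)(l->mem_base_diff + mac_addr);
}

/* Host -> Mac: subtract the constant offset; the result must fit in 32 bits. */
static uint32_t sketch_host2mac(const sketch_layout *l, const void *host_addr)
{
    uintptr_t off = (uintptr_t)host_addr - l->mem_base_diff;
    assert(off <= UINT32_MAX);
    return (uint32_t)off;
}
```

With such a layout, a host screen buffer would get its Mac address through the same subtraction, provided the resulting offset still fits in a 32-bit Mac address; the sketch only asserts that condition and does not try to guarantee it.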