
CPU emulation refactoring

When Steem was first written by Russell and Anthony Hayward, the performance of contemporary computers was an issue, and some compromises were necessary. Timings were computed in one go (14 cycles at once, for instance). For register as well as memory operations, a pointer was used to write the result and update the flags. It was smart and fast, but hard to follow. In the source, the Steem authors indeed recommended treating CPU emulation as a black box.
Long (32bit) operations were executed in one go, but the real CPU must use two 16bit bus accesses, which can change the outcome in corner cases.
Prefetching (getting operands from memory) used a fast, simplified method: a bounded pointer.
Today, our computers are more powerful, and we can afford to be more precise and closer to the hardware, within limits. On the other hand, the performance of some operations can be improved.

So in v4, besides the new interrupt model, the following changes were applied to CPU emulation.

  • Every R/W (read/write) is limited to 16bit data.
  • Prefetching operands is a form of reading data: a function call for every fetch.
  • Function pointers are used to dispatch to STF or STE specific timing and R/W functions.
  • R/W is integrated into the timing functions, so a single macro like CPU_BUS_ACCESS_WRITE now computes the timing and performs the write.
  • Using the current operand stored in IRC (Instruction Register Capture) and fetching the new operand into IRC are separate operations. This is why some operands could appear to be "refetched" in MOVE.
  • For a typical instruction, we read the source, we read the destination, we compute the result, we update the flags, then we write the result. We also fetch the next instruction. It is all very straightforward and easier to follow, but it uses some more variables and operations.
  • C unions are used to avoid shifting data between the high and low order parts of registers and memory.
    I'm so proud of this that I even call them smart unions. Maybe it's nothing new. C++ references are used to reduce code clutter.

For example:

pch=dataword; // high word of pc

instead of

pc&=0x0000ffff;
pc|=(dataword<<16);
  • We try to keep some bus information up-to-date: abus (23bit) for the address bus, dbus (16bit) for the data bus.
  • We use dbus for an STF Shifter quirk: unused palette bits reflect data bus activity.
  • We use a bus mask to describe the current operation type (fetch, write high byte...). Function codes in case of bus error should now always be correct.
  • IO functions now use the LDS (lower data strobe) and UDS (upper data strobe) signals, in practice the bus mask. That way, there's only one function call for each byte or word R/W.
  • PC is a 32bit variable. On each fetch, the high byte is ignored. Likewise, there's a 32bit CPU internal address bus, while abus is 23bit.
  • A table of 65536 function pointers is used to dispatch opcodes directly to the correct function. It proved faster than a chain of functions, and illegal instructions trigger at once (this was a pain in previous versions).
  • Some functions were split to avoid tests inside the instruction, notably Bcc. More could be done (e.g. splitting write to register from write to memory). It's a trade-off between code bloat and test iterations. It's also easier to follow.
  • Status register flags are handled as separate booleans, packed into or unpacked from SR each time SR is read or written. This should be faster than manipulating bits.

It's just refactoring, but it was a lot of fun to do.
As I've already said, the original Steem was pretty well coded: none of these CPU emulation changes broke the Debugger.

Posted by STeven 2019-06-09
  • Cyprian

    Cyprian - 2019-11-09

    A lot of cool changes.

    "We use dbus for an STF Shifter quirk: unused palette bits reflect data bus activity."
    Reflecting bus activity sounds promising.
    It would be cool to also apply it to the Shifter when it tries to fetch data from a memory area with no RAM assigned, e.g. an ST with 1MB where the Shifter buffer is set above 1MB.

    Last edit: Cyprian 2019-11-09
  • STeven

    STeven - 2019-11-10

    This is too heavy for high-level video emulation, but I'll approximate the effect by displaying memory from $0 instead.

    Maybe more importantly, peeking non-existent RAM should reflect the MMU/Shifter bus; for the moment it just returns $FFFF. I'm going to change that.

  • STeven

    STeven - 2019-11-10

    Here are the pics.

    (screenshot: very approximate)

    (screenshot: more accurate, low-level)

  • Cyprian

    Cyprian - 2019-11-12

    The more accurate one looks cool.
    Real bus activity isn't stable, though. What about that?

  • STeven

    STeven - 2019-11-23

    Of course, those are just screenshots. The display even varies with mouse movements (2nd pic).

