x128 gets stuck when opening borders with Z80
I experimented with opening borders with the Z80. The program z80openborders.prg works fine on a real PAL C128 and in Z64K. VICE x128, however, gets stuck.
After a moment the program should return to the READY prompt. With VICE x128 this does not happen. This appears to be related to interrupts or the HALT instruction. The counter in register BC is not decremented.
Code listing below.
main .org $1c01
.byte $0c,$08,$0a,$00,$9e,$37,$31,$38,$31,$00,$00,$00
lda $ff00 ;
pha ; store RAM config to stack
sei ; disable interrupts
lda $ffee
pha
lda $ffef
pha
lda $fff0
pha
lda #$3e ;
sta $ff00 ; select RAM config for Z80
lda #$c3 ;
sta $ffee ; store JP instruction for Z80 mode start
lda #<z80code ;
sta $ffef ; store lo-byte address
lda #>z80code;
sta $fff0 ; store hi-byte address
lda $d505 ;
pha ; store mode to stack
lda #$b0 ;
sta $d505 ; set Z80 mode - this instruction deactivates 8502 and jumps by Z80 PC to $ffee
nop
nop
pla
sta $d505
pla
sta $fff0
pla
sta $ffef
pla
sta $ffee
pla
sta $ff00
cli
rts
z80code
.byte $F3; di
.byte $01,$00,$02; ld bc,0200h
.byte $11,$00,$38; ld de,3800h
.byte $21; ld hl,endcode
.byte <endcode
.byte >endcode
.byte $ED,$B0; ldir
.byte $C3,$00,$38; jp 3800h
endcode
.org 3800h
ld sp,37F0h
ld bc,0D012h
ld a,0f9h
out (c),a
ld bc,0dc0eh
in a,(c)
ld (3101h),a
ld a,0f9h
out (c),a
ld a,01h
ld bc,0d019h
out (c),a
inc c
out (c),a
ld a,01bh
ld bc,0d011h
out (c),a
ld bc,3000h
ld a,31h
_loop1:
ld (bc),a
inc c
jr nz, _loop1
inc b
ld (bc),a
ld a,0c3h
ld (3131h),a
ld bc,irq
ld (3132h),bc
ld a,30h
ld i,a
im 2
ei
ld bc,500
_loop3:
halt
dec bc
ld a,b
cp 0ffh
jr nz, _loop3
ld bc,0dc0eh
ld a,(3101h)
out (c),a
jp 0ffe0h
irq:
push af
push bc
ld bc,0d019h
ld a,01h
out (c),a
ld bc,0d020h
in a,(c)
inc a
out (c),a
ld bc,0d011h
in a,(c)
and 0f7h
out (c),a
inc c
_loop2:
in a,(c)
cp 00h
jr nz, _loop2
dec c
in a,(c)
and 7fh
or 08h
out (c),a
ld bc,0d020h
in a,(c)
dec a
out (c),a
pop bc
pop af
ei
ret
.end
It appears that Z80 HALT is not properly handled.
Attached is a test program, which was tested on two PAL C128s.
x128 does not respond.
Are you able to provide the source code for z80rastertimingtest.prg? I note you've also included a timing test for writing to the Z80 IO port!
Sources attached. It uses both IM1 and IM2 and two interrupt handlers with slightly different timing.
Edit: a few inconsequential changes. - 2.2.24 minor fixes; added tentative (not confirmed on real hardware) NTSC version of the test.
2024-02-20: a minor fix.
Last edit: Jussi Ala-Könni 2024-02-20
Hmm. Apparently no one is interested in fixing this. That is disappointing, to say the least. I would say Z80 HALT and raster interrupts work pretty well together: raster jitter is simply absent, and I see outright stability in all my experiments. Achieving a stable raster interrupt is a pretty simple task on the C128 with the Z80. The mess with stable raster interrupt routines is simply absent.
Just a casual example of opening the side border demonstrates how easy things actually are using Z80:
Needless to say, VICE does not run it. Z64K is not perfect in Z80 related timings either, but it runs this example.
Edit. This example is for PAL C128.
Last edit: Jussi Ala-Könni 2024-02-11
I'll be looking into this. It will take me a bit to catch up on the Z80 CPU code. I did some work with it before, but from what I can see right now, it doesn't handle HALT correctly. It seems it just waits 4 cycles and moves on. It should wait for an interrupt, but it clearly doesn't. I will also have to be careful with this, as the same core code is used by other Z80 add-ons, like the Commodore CP/M cart.
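To make the difference concrete, here is a minimal Python sketch of the two behaviours described above. It is a toy model, not VICE code: the function names are hypothetical, and it assumes the documented Z80 behaviour that a halted CPU keeps executing internal NOPs of 4 T-states each and only notices an interrupt at the end of one of them.

```python
NOP_T_STATES = 4  # a halted Z80 repeats internal NOPs of 4 T-states each


def halt_wait_correct(irq_at_t_state):
    """T-states spent in HALT when the IRQ line goes active
    `irq_at_t_state` T-states after HALT starts (toy model)."""
    elapsed = NOP_T_STATES  # HALT itself always takes at least one NOP slot
    while elapsed < irq_at_t_state:
        elapsed += NOP_T_STATES  # keep idling until the IRQ has arrived
    return elapsed


def halt_wait_broken(irq_at_t_state):
    """The behaviour described in the post above: wait 4 cycles and move on,
    regardless of whether an interrupt has arrived."""
    return NOP_T_STATES


# With the IRQ arriving 126 T-states later, the correct model idles until then,
# rounded up to a whole NOP; the broken model falls through immediately.
print(halt_wait_correct(126), halt_wait_broken(126))
```

In this model the wake-up time is always a multiple of 4 T-states, which is consistent with the later observation in this thread that HALT takes n * 2 1MHz cycles.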
I have a new test program based on an earlier C64 test gfxfetch made by Hannu Nuotio and Antti Lankila. I found it interesting to test these ideas with C128 and Z80 cpu. It runs properly only on real hardware (PAL C128).
Source will be provided if there is interest, I will clean it up a bit first.
And link to the original test:
https://sourceforge.net/p/vice-emu/code/HEAD/tree/testprogs/VICII/gfxfetch/
Last edit: Jussi Ala-Könni 2024-02-17
I did some tests with CIA timer measuring the number of cycles between successive interrupts, going from one scan line to the next.
Results from a real PAL C128 seem solid: the timing is stable, and in half of the measurements there is a 2-cycle difference, depending on the scan line and the number of Z80 cycles run.
Z80 and raster interrupt seem to be in sync, which explains the lack of jitter.
x128 does not run the test and Z64K has a phase error.
Last edit: Jussi Ala-Könni 2024-03-06
Okay. I committed an initial fix for this in r45031. The halt is handled, but the timing results seem to look the same as z64k. I'm not exactly sure what you are doing here to derive these numbers, but I don't think the VIC and CIA are the best ways to do this because of the badlines. If you are testing instruction timing, I suggest you look at:
testprogs/general/Lorenz-2.15/src/cputiming.s
It uses a very creative way to test instruction timing. As far as I know, we don't have a way of verifying the timing of the Z80 emulation. We have a functional tester (zex), but I don't think it does timing, as it is generic to CP/M.
The "Risen From Oblivion" demo measures the Z80 timing vs the VICII (iirc) at startup; perhaps that can serve as a starting point for a test program.
halt_timingtest simply measures the interval between interrupts of successive scanlines. On a PAL machine the value is indeed expected to be close to 65536 - 63, but it varies since HALT takes n * 2 1MHz clock cycles to execute. Constant timing (= value FFC1) is expected only when the total time spent on a scanline equals 63. When it does not, one NOP (= 2 1MHz cycles) more or less is spent on a scanline during the HALT state, and which of the two happens seems to depend on the scanline. Bad lines are not measured here (there is also not enough CPU time to measure a bad line). The result seems to confirm that regularization happens, possibly during bad lines. Z64K seems to have a phase opposite to that of real hardware here; in general Z64K has an accurate basic run of Z80 emulation (on a PAL machine one scanline is 126 Z80 cycles).
I have other tests which draw test bars on the screen, which previously did not run, but now x128 actually gave quite a good result. I was expecting a total mess on the screen, but instead it seems that x128 is consistently delayed slightly too much during Z80 code execution: by 2 1MHz cycles per scanline. The 8502 bar is straight, as expected.
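The expected readings can be worked out with a few lines of Python. This is only an arithmetic sketch: `timer_reading` is a hypothetical helper, and it assumes a 16-bit down-counter that starts from 0 (= 65536) and is read after the given number of 1MHz cycles.

```python
PAL_CYCLES_PER_LINE = 63   # 1MHz cycles per PAL scanline
Z80_T_PER_1MHZ = 2         # 126 Z80 cycles per scanline, as stated above


def timer_reading(elapsed_1mhz_cycles):
    """Value of a 16-bit down-counter after the given number of 1MHz cycles."""
    return (0x10000 - elapsed_1mhz_cycles) & 0xFFFF


# Exactly one scanline between interrupts:
print(f"{timer_reading(PAL_CYCLES_PER_LINE):04X}")       # FFC1
# One HALT NOP (2 1MHz cycles) less or more on a scanline:
print(f"{timer_reading(PAL_CYCLES_PER_LINE - 2):04X}")   # FFC3
print(f"{timer_reading(PAL_CYCLES_PER_LINE + 2):04X}")   # FFBF
```

Under these assumptions, a line that is one NOP short and a line that is one NOP long still average out to 63 cycles, which fits the idea that the timing regularizes over several scanlines.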
To be honest, I'm not sure if I got the whole interrupt thing right. I'm reading data sheets and other materials and it is a little vague. There is no tester for this stuff. So given that I'm at best a novice with Z80 coding and you clearly have more skills, you should develop a good tester for this. I think you are on track but should avoid the VIC and blank the screen. Just use the CIAs and fire interrupts at particular times to measure when the IRQ code runs via inspection of the CIA timers; also check the stack contents to see what return address is placed there. Also keep clock stretching in mind when accessing any IO locations.
The zex instruction exerciser only tests basic ALU and memory operations. It doesn't cover everything, so knowing how soon an interrupt is processed after an EI, or if an interrupt is stopped immediately after a DI, is useful.
Running repetitive instructions, measuring the overall time difference and then dividing by the number of instructions can also give us an idea of whether we have the delays correct in emulation. I saw a test program that shows LDIR is slower on VICE than on a real machine (https://csdb.dk/release/?id=170651). I'm not sure why, as I've checked the numbers, but there are a lot of Z80 variants out there, so I'm not sure if they are the same as the one in the C128.
I'm still not 100% sure how Z80 timing is handled in VICE. This area of the system is new to me, so I have to look into it further.
I have also been puzzled how Z80 raster interrupts work. I suspect that some kind of regularization happens, but I don't fully understand how it works. I think it may happen during bad lines.
As for clock stretching, adding or removing one Z80 cycle may or may not make a difference in 1 MHz I/O output, but there are no delays. See that earlier rastertimingtest.asm source - there are two interrupt routines, which are 1 Z80 cycle apart, but produce the same output on real hardware. No delays - OUTs take 12 Z80 cycles = 6 1MHz cycles. Good that you mentioned it - I think I recognized the cause of the timing error. rastertimingtest.prg has 3 OUTs in the loop, and I identified a 4 (1MHz) cycle delay in output. Similarly, testcpuswitch has 2 OUTs in the loop, and a 2 (1MHz) cycle delay in output! So, have you tried what happens if the delay with OUTs is removed?
(Edit. 3 OUTs in the loop part, no delays associated with them, so just a clean calculation of 126 cycles (PAL) for non-bad raster lines.)
When EI is executed, the interrupt is processed after the instruction following EI. So, for example, if
ei
ret
is coded, the interrupt can be processed after RET, not before.
(Edit: after RET, in place of after EI)
Interrupts are checked after the execution of every instruction, so an interrupt is not processed immediately after DI.
http://www.z80.info/interrup.htm
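The EI and DI rules described in this post can be written down as a small Python model. It is a sketch only: the instruction list and the function name are hypothetical, and it assumes an interrupt request is already pending for the whole run.

```python
def first_interrupt_point(program):
    """Return the index of the instruction after which a pending maskable
    interrupt is accepted, or None if it never is (toy model)."""
    interrupts_enabled = False
    for index, op in enumerate(program):
        if op == "di":
            interrupts_enabled = False
        elif op == "ei":
            interrupts_enabled = True
        # Interrupts are sampled at the end of every instruction, but never
        # at the end of EI itself: acceptance waits for the next instruction.
        if interrupts_enabled and op != "ei":
            return index
    return None


print(first_interrupt_point(["ei", "ret"]))         # 1: after RET, not before
print(first_interrupt_point(["ei", "di", "nop"]))   # None: DI wins
```

Because EI never accepts an interrupt at its own end, the same rule also delays acceptance across a run of consecutive EI instructions until the first instruction after the run.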
Last edit: Jussi Ala-Könni 2024-03-25
Z80 CycleTimer gives 10.50634770 cycles per byte for a real C128, which is very close to the theoretical value, which is 10.5. Which again confirms that number of cycles can be taken "from the book". I am not aware of any differences of cycles taken by Z80 CPUs of different batches.
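The theoretical figure can be reproduced as follows. This is a small arithmetic sketch; it assumes the standard 21 T-states per repeated LDIR iteration and the ratio of 2 Z80 cycles per 1MHz cycle used elsewhere in this thread.

```python
LDIR_T_STATES_PER_BYTE = 21   # LDIR while BC != 0 (the final iteration takes 16)
Z80_T_PER_1MHZ = 2            # 126 Z80 cycles per 63-cycle PAL scanline

theoretical = LDIR_T_STATES_PER_BYTE / Z80_T_PER_1MHZ   # 1MHz cycles per byte
measured = 10.50634770                                  # real C128, quoted above

relative_error = (measured - theoretical) / theoretical
print(theoretical)                  # 10.5
print(f"{relative_error:.4%}")      # well below 0.1 %
```

The small surplus in the measurement is not explained here. It may come from overhead outside the LDIR loop itself, but that is a guess.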
Z80 instruction breakdown:
http://www.z80.info/z80ins.txt
DI and EI behavior described (p. 21):
http://www.z80.info/zip/z80-documented.pdf
Actually, a series of EI instructions cannot be interrupted; when a series of EI instructions is programmed, interrupts are enabled only after the next instruction after the last EI. This has been described somewhere, and I can try to find the source, but I wrote a test program for this.
The interrupt is triggered in the middle of a series of EI instructions; there are 10 EI instructions still to be executed, which make 20 1MHz cycles, plus the res instruction, which makes 12 more, until the interrupt is processed.
Z64K processes the interrupt in the middle of EI instruction series, which is incorrect, giving the value FFC1h = -63. x128 apparently processes one or two more EI, giving FFBDh = -67.
Real hardware with real Z80 gives FFA1h = -95. The difference FFC1h-FFA1h = 20h = 32 agrees with the calculation above.
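The numbers in this post can be checked with a short Python sketch. The cycle costs are the ones quoted above (2 1MHz cycles per EI, 12 for the res instruction); `to_signed16` is a hypothetical helper for reading the timer values as signed 16-bit numbers.

```python
def to_signed16(value):
    """Interpret a 16-bit value as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value


EI_1MHZ_CYCLES = 2     # EI: 4 Z80 T-states = 2 1MHz cycles
RES_1MHZ_CYCLES = 12   # figure quoted in the post above
REMAINING_EIS = 10

expected_extra = REMAINING_EIS * EI_1MHZ_CYCLES + RES_1MHZ_CYCLES

real_hw, z64k, x128 = 0xFFA1, 0xFFC1, 0xFFBD
print(to_signed16(z64k), to_signed16(x128), to_signed16(real_hw))  # -63 -67 -95
print(z64k - real_hw, expected_extra)                              # 32 32
print((z64k - x128) // EI_1MHZ_CYCLES)                             # 2 extra EIs in x128
```

The 4-cycle gap between the Z64K and x128 readings corresponds to the time of two EI instructions under the same assumption.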
So, to repeat: series of EI instructions cannot be interrupted, interrupts are processed only after the next instruction after the series. Years ago I wrote my own Z80 emulation engine and implemented it in this way, since I saw it so described. Nice to see it confirmed.
So, both VICE and Z64K are incorrect here.
Last edit: Jussi Ala-Könni 2024-03-25
Back-to-back EIs result in interrupts being delayed until one instruction after the last EI.
See:
https://floooh.github.io/2021/12/06/z80-instruction-timing.html
and
http://www.visual6502.org/JSSim/expert-z80.html?a=0&d=ed56fbfbfbfbfbfbfbfbfbfbfbfbfbfbfbfbfb0000000000000000&a=38&d=c9&int0=48&steps=200&graphics=false
So this doesn't run like you think it does. But I'm close to your results; I get FF9F right now.
I've been doing a deep dive on the current state of the code and how the Z80 clock relates to the 1MHz clock. I've been able to get closer on the LDIR test, as the cycle counts weren't correct for that instruction in VICE, but I can't seem to get a firm grasp on the clock stretching. As such, I would prefer if you could avoid using any OUT/IN instructions at this point. Try to just use registers, i.e. a series of increments, to see what the register value should be in the interrupt service routine. I know it makes things more challenging, but right now I don't know the effects of clock stretching on the 1MHz cycle count. From what I can see with the LDIR test, I am off by 1 cycle without clock stretching, which is impossible. So either the cycle counts are wrong or something weird is happening when the Z80 is running code. Maybe they do the Z80 refresh during an 8502 refresh cycle.
Another way to maybe do the cycle counting is to use the 8502 to set up and read the timer at 1MHz, so we can avoid the Z80 clock stretching entirely.
I suggest you check cycle count accuracy first; the test disables the VIC screen, so there shouldn't be anything weird. If this is any clue, Z80 cycle counts can be freely combined: for example, 7 + 7 Z80 cycles execute at the same speed as 10 + 4 (judging on the basis of VIC display stability). There is no "jerkiness" or "rounding" to coarser 1 MHz cycles, if you understand what I mean.
So far, everything looks right, but as I said, I'm 1 cycle off without stretching. There are a lot of moving parts in VICE so it helps to have testers that focus on one thing. By removing clock stretching from the measurements, I can determine if the cycle count is proper and then move on from there.
The most reliable way is to use the CIA timer on the 8502 and avoid the VIC for anything as x128 is not cycle exact, so anything with the raster may be an issue elsewhere. I'm trying just to focus on the z80 right now.
Okay. I committed another patch that fixed the LDIR test and gets better results in all of your tests. See r45044.
It seems the Z80 doesn't do clock stretching as all of the memory and IO operations work at 1MHz.
At the moment r45044 is still delayed and not available for download.
Ugh. Trailing white space. Get r45045.
I'm thinking the remaining timing issues depend on when the timer is read during the execution of IN. VICE emulates CPUs one instruction at a time; the Z80 on the C128 is clocked at 2MHz, but the memory and IO are at 1MHz. So the problem may be when the instruction "runs" with respect to the read from the CIA.
To test this, if you can add an "odd" number of cycle delays before the IN on measurements that are currently correct, it should make the measurements wrong versus real hardware.
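The reasoning behind the odd-delay test can be illustrated with a toy Python model. It is only a sketch: `timer_delta` is a hypothetical helper, and it assumes the CIA timer ticks once every 2 Z80 T-states and that a read sees the tick count of the 1MHz cycle it falls in.

```python
Z80_T_PER_1MHZ = 2  # Z80 T-states per 1MHz timer tick (assumed, see lead-in)


def timer_delta(start_phase, delay_t_states):
    """Timer ticks that elapse during `delay_t_states` Z80 T-states when the
    delay starts on T-state phase 0 or 1 within a 1MHz cycle (toy model)."""
    end = start_phase + delay_t_states
    return end // Z80_T_PER_1MHZ - start_phase // Z80_T_PER_1MHZ


# An even delay shifts the reading by the same amount for both phases...
print(timer_delta(0, 4), timer_delta(1, 4))   # 2 2
# ...while an odd delay exposes the phase of the read.
print(timer_delta(0, 5), timer_delta(1, 5))   # 2 3
```

In this model a measurement that matches real hardware only by coincidence of phase would change by a different amount than the hardware does once an odd delay is inserted.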