SDCC uses this pattern for the __critical sections:
ld a, i
di
push af
...
pop af
jp po, 1$
ei
1$:
but this implementation has severe problem on the real hardware: when interrupt happens on the ld a,i instruction, PO condition is met, and that leaves interrupts enabled, so the next halt halts forever.
manual states:
LD A,I
P/V contains contents of IFF2
If an interrupt occurs during execution of this instruction, the Parity flag contains a 0.
and i can confirm, it really does. in the current gbdk-2020 library implementation that pattern is used to protect the Master System / MSX VDP writes, and that is frequently enough to trigger the bug in games.
i suggest simpler sm83 approach for the __critical sections.
looking forward for the fix.
probably, something like this is possible, but arghhh....
this is so sloooow. and, in theory, not a 100% guarantee.
Last edit: Tony Pavlov 2024-03-30
Even that might fail, if there are two interrupts quickly after each other. I guess we'd need something like this (i.e. check if the ld a, i was interrupted, if yes, repeat):
There is an earlier report about this, but from that one, it seemed that only NMOS Z80 were affected: [feature-requests:#476].
Related
Feature Requests:
#476Last edit: Philipp Klaus Krause 2024-04-02
Yes, back then I had wrong info about this.
Anyway I remember reading on some MSX forum of a possible solution that involved zeroing and then checking the top of the stack to see if an interrupt happened while saving the IF flag status. Not sure I can find that topic again...
Related
Feature Requests:
#476That is basically what sdcc currently does for
--nmos-z80. However, even that can fail, if the ld a, i happens to be code at an address < 256. That's why I suggested to use the label l1, so instead of zeroing the top of the stack, we set it to a value known to be different from the upper byte of l1.Last edit: Philipp Klaus Krause 2024-04-02
Do we have a source (which could be experiments one of you has done) that CMOS Z80 are definitely affected?
also, is not P/V affected by
cp a, #>(l1+256), while we need zero for thepush affor the correct critical section exit processing?If the result of the
cpis 0, and we thus proceed to thepush af, there was no overflow in the subtraction, and thus P/V is set to 0, as we need.ah, correct. we retry if IFF2==0 arguing with the actual value below stack, not the opposite. but probably not loop there but just transfer NZ condition to P/V flag, is there faster code?
Sorry, I don't understand what you meant here. Are you suggesting to not retry? The NZ condition only tells us that an interrupt happened, not what the state of IFF2 was: the interrupt that resulted in getting to the NZ condition might have been an NMI.
As i get the problem, when doing
ld a, i,PEcondition, if met, is always correct, butPOcondition may not reflect the real situation.By checking the byte below the
SP, you are checking ifPOcondition above was wrong or not. So, you don't have to loop, you need to setPEcondition in flags, if bytes not match, and setPOcondition in flags, when they match.after that at the point of
l2:PEflag indicate "interrupts enabled",POindicate "interrupts disabled". You always DI just afterl2:.Last edit: Tony Pavlov 2024-04-02
By stack probing (i.e. checking that byte) I can check if an interrupt occured, which might have resulted in the PE being wrong. But the stack probing will not tell me reliably that the PE was wrong. So when the stack probing tells me there was an interrupt that might have resulted in the wrong PE result, the correct thing to do is to redo the ld a, i.
Here is the
___sdcc_critical_enterby Sergey Belyashov. It is working correctly and there is no loop. But call/ret is slow, we need the same but inline, because on the SMS/MSX-like systems you are typically protecting only the couple of port writes. So we need to make it position-independent (that#>(l1+256)trick above).combination of the both approaches may be:
jpis faster on Z80, so use themLast edit: Tony Pavlov 2024-04-02
While Sergey's code is what SDCC currently uses for
--nmos-z80, I now don't think it is fully correct. I think it has two bugs:1) Possible false positive: Consider the following case: Code placed at address >= 256. Interrupts are disabled, we enter that code. Just after the
ld a, ia NMI happens. PE will be false, but the or a will give a nonzero value. Thus the function will return true, and at the end of the critical section, interrupts will be enabled, even though they were disabled before.2) Possible false negative: If that code happens to be placed at an address < 256, and interrupts are enabled, we enter that code, a normal interrupt happens at
ld a, i, PE will be false, but the or a will give a zero value (since that stack byte contains the upper byte of the return address for the interrupt routine, which is 0 fr an address < 256). So after the critical section, interrupts will be disabled, though they were enabled before.Regarding the call/ret overhead: that is just a code speed / size trade-off, so I guess the variant should be chosen according to
--opt-code-sizevs.--opt-code-speed.yes, you are right about the NMI. looks like looping is the only option.
Do we know, which Z80 variants are affected, other than z80? r2k, r2ka and r3ka use a differnet approach, and should be fine. I've seen a claim that r800 is not affected (https://www.msx.org/forum/msx-talk/development/do-not-use-ld-ai-read-interrupt-enabled-state?page=0). Neither z180 nor ez80_z80 documentation has the "If an interrupt occurs during execution of this instruction, the Parity flag contains a 0." sentence, that z80 documentation has. z80n aims for bug-for-big compatibility with z80, so I assume that it is affected. sm83 and tlcs90 have only plain di/ei critical sections.
z80 with --nmos-z80 already uses something similar to what I proposed above.
actually, there is no much sense in knowing which z80 variants are affected from the point of writing the user code. because if you build for some target device, like ZX Spectrum, you never know which type of the CPU your code actually will run on, not even know if it is genuine Zilog or whatever obscure clones including something like the soviet Т34ВМ1. So more "strict" checking should probably be default.
Well, if the user compiles for the z80 target, we will have to do the correct thing for Z80 (indeed we do not know which Z80-compatible system the code will run on). But if e.g. Z180 and Z180compatible devices all reliably report IFF2 in PE, we can still use the old, shorter code for users that compile for the z180 target.
well, is not user already pass his intention for z180 code with the
-mz180and there is no need for the additional flag--nmos-z80neither its inverted version if you add one.so, my suggestion is that
--nmos-z80(not the exact current call, but some optimized approach we discuss above!) should be the default behavior for the z80 systems, and the opposite behavior flag is needed instead, if the user is sure that his exact Z80 is 100% safe.Last edit: Tony Pavlov 2024-04-02
Yes, removing
--nmos-z80, making the safe version the default for z80 is the way forward. I still need to know which variants are affected, so I know if the "safe" version has to become the default for some of z180, ez80_z80, z80n, r800, too.also on page 2 (3-130) starting from
"Q: I don't seem to get correct state"
http://z80.info/zip/ZilogProductSpecsDatabook129-143.pdf
Hmm. According to that document, only NMOS Z80 should be affected.
Calindro - emulicious's developer- wrote that it also affect CMOS version of some Game Gear devices
Just asked a person who was present in the discussions when Next's CPU was designed, apparently they decided to go with CMOS there, as then less code was needed for every check, and no known code depended on the older behavior.