Menu

#3721 Z80 Critical section implementation is faulty

open
nobody
None
Z80
5
2024-04-02
2024-03-30
Tony Pavlov
No

SDCC uses this pattern for the __critical sections:

    ld a, i
    di
    push af
...
    pop af
    jp po, 1$
    ei
1$:

but this implementation has severe problem on the real hardware: when interrupt happens on the ld a,i instruction, PO condition is met, and that leaves interrupts enabled, so the next halt halts forever.

manual states:

LD A,I
P/V contains contents of IFF2
If an interrupt occurs during execution of this instruction, the Parity flag contains a 0.

and i can confirm, it really does. in the current gbdk-2020 library implementation that pattern is used to protect the Master System / MSX VDP writes, and that is frequently enough to trigger the bug in games.

i suggest simpler sm83 approach for the __critical sections.

looking forward for the fix.

Related

Feature Requests: #476

Discussion

  • Tony Pavlov

    Tony Pavlov - 2024-03-30

    probably, something like this is possible, but arghhh....

        ld a, i
        push af
        ld a, i
        di
        push af
        pop de       ; put F to E
        ld hl, #0
        add hl, sp   ; hl points stored F on stack
        ld a, e
        or (hl)      ; construct conjuncted flags
        ld (hl), a   ; update the value on stack
    ...
        pop af
        jp po, 1$
        ei
    1$:
    

    this is so sloooow. and, in theory, not a 100% guarantee.

     

    Last edit: Tony Pavlov 2024-03-30
    • Philipp Klaus Krause

      Even that might fail, if there are two interrupts quickly after each other. I guess we'd need something like this (i.e. check if the ld a, i was interrupted, if yes, repeat):

      l0:
        ld a, #>(l1+256)
        push af
        pop af
        ld a, i
      l1:
        jp pe, l2
        dec sp
        dec sp
        pop af
        cp a, #>(l1+256)
        jr nz, l0
      l2:
        push af
      …
        pop af
        jp op, l4
        ei
      l4:
      

      There is an earlier report about this, but from that one, it seemed that only NMOS Z80 were affected: [feature-requests:#476].

       

      Related

      Feature Requests: #476


      Last edit: Philipp Klaus Krause 2024-04-02
      • sverx

        sverx - 2024-04-02

        There is an earlier report about this, but from that one, it seemed that only NMOS Z80 were affected: [feature-requests:#476].

        Yes, back then I had wrong info about this.

        Anyway I remember reading on some MSX forum of a possible solution that involved zeroing and then checking the top of the stack to see if an interrupt happened while saving the IF flag status. Not sure I can find that topic again...

         

        Related

        Feature Requests: #476

        • Philipp Klaus Krause

          That is basically what sdcc currently does for --nmos-z80. However, even that can fail, if the ld a, i happens to be code at an address < 256. That's why I suggested to use the label l1, so instead of zeroing the top of the stack, we set it to a value known to be different from the upper byte of l1.

           
          👍
          1

          Last edit: Philipp Klaus Krause 2024-04-02
        • Philipp Klaus Krause

          Do we have a source (which could be experiments one of you has done) that CMOS Z80 are definitely affected?

           
      • Tony Pavlov

        Tony Pavlov - 2024-04-02
        1. about the two interrupts happening quickly one after the other. that is unlikely happen, more often situation is when the interrupt is pending. but that should not be a problem, because EI/RETI makes pending interrupt fire immediately after return and no instructions are executed in main. so, there is probability, but very very slight. NMI should not be the problem either, as i get from the http://www.myquest.nl/z80undocumented/z80-documented-v0.91.pdf
        2. your code is missing DI, which position is important. DI must always be present, that's the whole point of the critical section, EI must be conditional, depending on the state of the IFF2.
         
      • Tony Pavlov

        Tony Pavlov - 2024-04-02

        also, is not P/V affected by cp a, #>(l1+256), while we need zero for the push af for the correct critical section exit processing?

         
        • Philipp Klaus Krause

          If the result of the cp is 0, and we thus proceed to the push af, there was no overflow in the subtraction, and thus P/V is set to 0, as we need.

           
          • Tony Pavlov

            Tony Pavlov - 2024-04-02

            ah, correct. we retry if IFF2==0 arguing with the actual value below stack, not the opposite. but probably not loop there but just transfer NZ condition to P/V flag, is there faster code?

             
            • Philipp Klaus Krause

              Sorry, I don't understand what you meant here. Are you suggesting to not retry? The NZ condition only tells us that an interrupt happened, not what the state of IFF2 was: the interrupt that resulted in getting to the NZ condition might have been an NMI.

               
              • Tony Pavlov

                Tony Pavlov - 2024-04-02

                As i get the problem, when doing ld a, i, PE condition, if met, is always correct, but PO condition may not reflect the real situation.

                By checking the byte below the SP, you are checking if PO condition above was wrong or not. So, you don't have to loop, you need to set PE condition in flags, if bytes not match, and set PO condition in flags, when they match.

                after that at the point of l2: PE flag indicate "interrupts enabled", PO indicate "interrupts disabled". You always DI just after l2:.

                 

                Last edit: Tony Pavlov 2024-04-02
                • Philipp Klaus Krause

                  By stack probing (i.e. checking that byte) I can check if an interrupt occured, which might have resulted in the PE being wrong. But the stack probing will not tell me reliably that the PE was wrong. So when the stack probing tells me there was an interrupt that might have resulted in the wrong PE result, the correct thing to do is to redo the ld a, i.

                   
                  👍
                  1
                  • Tony Pavlov

                    Tony Pavlov - 2024-04-02

                    Here is the ___sdcc_critical_enter by Sergey Belyashov. It is working correctly and there is no loop. But call/ret is slow, we need the same but inline, because on the SMS/MSX-like systems you are typically protecting only the couple of port writes. So we need to make it position-independent (that #>(l1+256) trick above).

                    ___sdcc_critical_enter::
                        xor  a
                        push af
                        pop  af
                        ld   a, i
                        di
                        ret  pe  ;enabled interrupts
                        dec  sp
                        dec  sp
                        pop  af
                        or   a   ;A = 0 if interrupts disabled
                        jr   NZ, 00100$
                    ;inetrrupts disabled
                        sub  a   ;force P/V = 0
                        ret
                    ;interrupts enabled
                    00100$:
                        xor  a   ;force P/V = 1
                        ret
                    

                    combination of the both approaches may be:

                        ld   a, #>(l1+256)
                        push af
                        pop  af
                    l1:
                        ld   a, i
                        di
                        jp   pe, l3  ; jump if interrupts enabled
                        dec  sp
                        dec  sp
                        pop  af
                        sub  #>(l1+256)
                        jp   NZ, l2
                    ; inetrrupts disabled
                        jp   l3      ; if ZF then P/V = 0
                    ; interrupts enabled
                    l2:
                        xor a, a     ; force P/V = 1
                    l3:
                        push af
                    
                    ; critical section payload
                    ... 
                    
                        pop af
                        jp op, l4
                        ei
                    l4:
                    

                    jp is faster on Z80, so use them

                     

                    Last edit: Tony Pavlov 2024-04-02
                    • Philipp Klaus Krause

                      While Sergey's code is what SDCC currently uses for --nmos-z80, I now don't think it is fully correct. I think it has two bugs:
                      1) Possible false positive: Consider the following case: Code placed at address >= 256. Interrupts are disabled, we enter that code. Just after the ld a, i a NMI happens. PE will be false, but the or a will give a nonzero value. Thus the function will return true, and at the end of the critical section, interrupts will be enabled, even though they were disabled before.
                      2) Possible false negative: If that code happens to be placed at an address < 256, and interrupts are enabled, we enter that code, a normal interrupt happens at ld a, i, PE will be false, but the or a will give a zero value (since that stack byte contains the upper byte of the return address for the interrupt routine, which is 0 fr an address < 256). So after the critical section, interrupts will be disabled, though they were enabled before.

                      Regarding the call/ret overhead: that is just a code speed / size trade-off, so I guess the variant should be chosen according to --opt-code-size vs. --opt-code-speed.

                       
                      👍
                      1
                      • Tony Pavlov

                        Tony Pavlov - 2024-04-02

                        yes, you are right about the NMI. looks like looping is the only option.

                         
  • Philipp Klaus Krause

    Do we know, which Z80 variants are affected, other than z80? r2k, r2ka and r3ka use a differnet approach, and should be fine. I've seen a claim that r800 is not affected (https://www.msx.org/forum/msx-talk/development/do-not-use-ld-ai-read-interrupt-enabled-state?page=0). Neither z180 nor ez80_z80 documentation has the "If an interrupt occurs during execution of this instruction, the Parity flag contains a 0." sentence, that z80 documentation has. z80n aims for bug-for-big compatibility with z80, so I assume that it is affected. sm83 and tlcs90 have only plain di/ei critical sections.
    z80 with --nmos-z80 already uses something similar to what I proposed above.

     
    • Tony Pavlov

      Tony Pavlov - 2024-04-02

      actually, there is no much sense in knowing which z80 variants are affected from the point of writing the user code. because if you build for some target device, like ZX Spectrum, you never know which type of the CPU your code actually will run on, not even know if it is genuine Zilog or whatever obscure clones including something like the soviet Т34ВМ1. So more "strict" checking should probably be default.

       
      • Philipp Klaus Krause

        Well, if the user compiles for the z80 target, we will have to do the correct thing for Z80 (indeed we do not know which Z80-compatible system the code will run on). But if e.g. Z180 and Z180compatible devices all reliably report IFF2 in PE, we can still use the old, shorter code for users that compile for the z180 target.

         
        • Tony Pavlov

          Tony Pavlov - 2024-04-02

          well, is not user already pass his intention for z180 code with the -mz180 and there is no need for the additional flag --nmos-z80 neither its inverted version if you add one.

          so, my suggestion is that --nmos-z80(not the exact current call, but some optimized approach we discuss above!) should be the default behavior for the z80 systems, and the opposite behavior flag is needed instead, if the user is sure that his exact Z80 is 100% safe.

           

          Last edit: Tony Pavlov 2024-04-02
          • Philipp Klaus Krause

            Yes, removing --nmos-z80, making the safe version the default for z80 is the way forward. I still need to know which variants are affected, so I know if the "safe" version has to become the default for some of z180, ez80_z80, z80n, r800, too.

             
  • Przemek Sulikowski

    also on page 2 (3-130) starting from
    "Q: I don't seem to get correct state"
    http://z80.info/zip/ZilogProductSpecsDatabook129-143.pdf

     
    • Philipp Klaus Krause

      Hmm. According to that document, only NMOS Z80 should be affected.

       
      • Przemek Sulikowski

        Calindro - emulicious's developer- wrote that it also affect CMOS version of some Game Gear devices

         
      • Janko Stamenović

        Just asked a person who was present in the discussions when Next's CPU was designed, apparently they decided to go with CMOS there, as then less code was needed for every check, and no known code depended on the older behavior.

         

Log in to post a comment.

MongoDB Logo MongoDB