Small Device C Compiler (SDCC) / Feature Requests / #624 Atomics

Description has changed:

Diff:

--- old
+++ new
@@ -1,13 +1,13 @@
 SDCC currently does not have any official support for atomics. Some operations on some operands have traditionally been atomic, and we see bugs reports when that changes.

-It might make snese to (partially) support atomics as in the C standard:
+It might make sense to (partially) support atomics as in the C standard:
 1) For communication between the main program and interrupt handlers
 2) For inter-thread communication on multicore devices (SDCC does not yet support such devices).
 3) For inter-thread communication when using some timesharing OS.

 The two most basic types are sig_atomic_t and atomic_flag.

-To me it seems, we coould (as I have done in the post-3.9 branhc) just make sig_atomic_t a typedef to unsigned char.
+To me it seems, we could (as I have done in the post-3.9 branhc) just make sig_atomic_t a typedef to unsigned char.
 atomic_flag, on the other hand will require a test_and_set instruction. On many targets, we could implement using an inverted 0/1 value using a right shift with indirect addressing mode. Or we could just use plain exchange, where available.

 This would give us atomic_flag on stm8, z80 and related, hc08 and s08. For mcs51, we could have atomic_flag in iram (which seems okay to me, we&#39;d just place every atomic_flag in iram, even if the memory model is different, and emit an error if someone e.g. tries to put it explicitly into xram).

Group: -->

Philipp Klaus Krause - 2024-06-09

We currently support volatile sig_atomic_t for all our ports. It is useful, and should be sufficient for typical use when there is only one hardware thread, and the normal program is being interrupted by an interrupt.

However, once it gets more complex (i.e. multiple hardware threads or interrupt handlers that can interrupt each other), more atomics are needed for some uses. And even for the simple case of just one interrupt handler, they can be useful to more naturally implement some algorithms.

In C, an atomic type if "lock-free", if it can be used in signal handlers. For SDCC, I usually consider interrupt handlers to be the closest match to C's signal handlers, and try to make interrupt handlers work in the way C mandates for signal handlers.

SDCC does not currently support multiple hardware threads, though we might want to do so in the future (in particular for the Padauk ports). But many of our targets have interrupt priorities where interrupt handlers can themselves be interrupted (e.g. mcs51 and z80).

C mandates that atomic_flag is lock-free. We currently support it well for most ports, and for some have limited support (for mcs51, atomic_flag needs to be in __idata, which is a slight non-compliance if the heap is in __xdata, as then memory from the heap cannot be used as atomic_flag; there are C2Y proposals to allow other character arrays to be used as atomic_flag, which would add to that problem ).

It would be possible to improve this situation: I reply to my earlier concerns about the implementability of atomic_flag on mcs51 and Padauk for future C version, Hans Boehm proposed a restartable sequence implementation for atomic_flag for the case of a single hardware thread. I've since further refined the approach to eliminate its interrupt latency burden, and generalized it to support more lock-free atomics:

Still, this works only for a single hardware thread. Let's look at mcs51 for now, where we would need these 3 asm functions

atomic_exchange_xdata_impl: 0: movx a, @dptr 1a: xch a, r0 2a: movx @dptr, a 3: ret atomic_compare_exchange_xdata_impl: 0: movx a, @dptr 1a: cjne a, ar1, 4 2a: xch a, r0 3a: movx @dptr, a 4: ret atomic_compare_exchange_idata_impl: 0: mov a, @r0 1a: cjne a, ar1, 4 2a: xch a, r2 3a: movx @r0, a 4: ret

These would be used to implement atomic_exchange and atomic_compare_and_exchange primitives that in turn would be used to implement atomic_flag, and lock-free atomics of size up to 8, i.e. we'd get lock-free atomic types for char, _BitInt(7), bool, etc. Once we have atomic_exchange and atomic_compare_and_exchange, we can implement all other functionality for atomics.

At the end of an interrupt handler, we take the return address from the stack, and compare it against all labels marked with labels that end in a above. If we are at one of these, we need to get registers values from the stack, undo the effect of the xch there, and set the stored pc back to corresponding label 0.

This is clearly an extra overhead, but:
1) It happens at the end of the interrupt handler, which is less critical than it would be at the beginning.
2) static analysis could prove that a given interrupt handler does not use some of these, in which case the extra code at the end of the interrupt handler could be left out.

P.S.: The asm functions only require an atomic store instruction of suitable size. For most ports we only have these up to 8 bits, but on some ports we have 16-bit ones, and use 16-bit pointers. I.e. we could have lock-free atomic pointer types, which would allow the use of a lot of non-blocking data structures. On the other hand, we'd then have to potentially check for even more pc values at the end of signal handlers.

P.P.S.: It would be good, if we could guarantee that those asm functions are all within a single 256B-aligned block of 256B, so the pc comparison can be done cheaper (i.e compare the upper byte once, then if it matches compare the lower byte against multiple values).
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2024-06-09
  
  Looks like my idea is essentially what NetBSD already did 20 years ago:
  https://www.usenix.org/legacy/events/usenix2003/tech/freenix03/full_papers/mcgarry/mcgarry_html/
  
  P.S.: The first proposal to use restartable sequences to implement atomics is AFAIK "Fast mutual exclusion for uniprocessors" by Bershad et alii from 1992: https://dl.acm.org/doi/10.1145/143365.143523
  
  Last edit: Philipp Klaus Krause 2024-06-10
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Maarten Brock - 2024-06-10
  
  Maybe we should try to keep those restartable functions in a single area and at power-of-two offsets (8?). The return address check can then be against the lower and upper bound and a simple AND operation to restart. Also what to roll back can be easier if the same registers are used at the same offsets.
  
  For mcs51 things can be more difficult. The pointer can be a generic pointer which first requires finding out the memory space. This can be done in a wrapper that calls the appropriate restartable functions. And the pointer can also point to __pdata which is the default for the medium memory model.
  
  atomic_exchange_gptr_impl: jnb B.6, atomic_exchange_xdata_impl mov r0, dpl jb B.5, atomic_exchange_pdata_impl ; sjmp atomic_exchange_idata_impl atomic_exchange_idata_impl: mov a, r2 xch a, @r0 ret atomic_compare_exchange_gptr_impl: jnb B.6, atomic_compare_exchange_xdata_impl mov r0, dpl jb B.5, atomic_compare_exchange_pdata_impl ; sjmp atomic_compare_exchange_idata_impl .bndry 8 ; does not work yet! atomic_compare_exchange_idata_impl: 0: mov a, @r0 ; 1 B 1a: cjne a, ar1, 4 ; 3 B 2a: xch a, r2 ; 1 B 3a: mov @r0, a ; 1 B 4: ret ; 1 B nop ; 1 B ;===== ; 8 B atomic_compare_exchange_pdata_impl: 0: movx a, @r0 ; 1 B 1a: cjne a, ar1, 4 ; 3 B 2a: xch a, r2 ; 1 B 3a: movx @r0, a ; 1 B 4: ret ; 1 B nop ; 1 B ;===== ; 8 B atomic_compare_exchange_xdata_impl: 0: movx a, @dptr ; 1 B 1a: cjne a, ar1, 4 ; 3 B 2a: xch a, r2 ; 1 B 3a: movx @dptr, a ; 1 B 4: ret ; 1 B nop ; 1 B ;===== ; 8 B atomic_exchange_pdata_impl: 0: movx a, @r0 ; 1 B 1a: nop nop nop ; 3 B 2a: xch a, r2 ; 1 B 3a: movx @r0, a ; 1 B 4: ret ; 1 B ;===== ; 8 B atomic_exchange_xdata_impl: 0: movx a, @dptr ; 1 B 1a: nop nop nop ; 3 B 2a: xch a, r2 ; 1 B 3a: movx @dptr, a ; 1 B 4: ret ; 1 B ;===== ; 8 B
  
  The mcs51 also has __bit variables which can be accessed with JBC to both test and clear the flag. Unfortunately they cannot be accessed indirectly.
  
  Last edit: Maarten Brock 2024-07-10
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Philipp Klaus Krause - 2024-06-10
    
    Isn't __pdata just a subspace of __xdata, so we don't need separate functions for it?
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Maarten Brock - 2024-06-10
      
      Well, yes and no. __pdata does lie somewhere in __xdata space. But a (single byte) pdata pointer does not hold the page. And a (3 bytes) generic pointer does not either.
      
      It is expected instead that the page selection is active with the high byte of s_PSEG in __XPAGE (see crtxclear.asm) for MOVX @Ri to work.
      
      Last edit: Maarten Brock 2024-07-10
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Philipp Klaus Krause - 2024-06-10
    
    Actually, having thought about it a little bit more, it gets much easier, if we never have to roll back, and can just set the pc back. We do that by having a spare copy that we can use to restore register values, e.g.:
    
    atomic_exchange_xdata_impl: 0: mov r0, ar2 1: movx a, @dptr 2a: xch a, r0 3a: movx @dptr, a 4: ret
    
    If the pc is at 2a or 3a, we set it back to 0, and the instruction at 0 will do all the work.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Maarten Brock - 2024-06-10
      
      R0 should be R3 or higher IMHO. See the pdata and _compare_ variants.
      
      This will require 16 bytes aligned routines and maybe that would finally be the first opportunity for SDCC to actually emit the XCHD A,@Ri instruction.
      
      ; roll back PC push ar1 push acc mov r1, SP dec r1 dec r1 dec r1 ; mov a,#0xF0 ; anl a,@r1 ; mov @r1,a clr a xchd a, @r1 ; clear low nibble of return address pop acc pop ar1
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Philipp Klaus Krause - 2024-06-11
        
        To enforce the alignment, I guess we could just emit this at the end of the IVT (prepending nops as necessary to avoid the routines crossing a 256B-boundary)?
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Based on Maarten's code sample above, I now have this proposal, which should work and be reasonably efficient:

; Align the start of these 5 functions to an 8B boundary. Possibly by
; putting the code at the end of the IVT. Ensure that no 256B boundary
; is crossed.

atomic_exchange_rollback_impl:

atomic_exchange_pdata_impl:
0:  movx a, @r0                                   ; 1 B
1a: mov  r3, a                                    ; 1 B
2a: nop nop                                       ; 2 B
3a: mov  a, r2                                    ; 1 B
4a: movx @r0, a                                   ; 1 B
5:  mov  a, r3                                    ; 1 B
6:  ret                                           ; 1 B
                                                  ;=====
                                                  ; 8 B

atomic_exchange_xdata_impl:

0:  movx a, @dptr                                 ; 1 B
1a: mov  r3, a                                    ; 1 B
2a: nop nop                                       ; 2 B
3a: mov  a, r2                                    ; 1 B
4a: movx @dptr, a                                 ; 1 B
5:  mov  a, r3                                    ; 1 B
6:  ret                                           ; 1 B
                                                  ;=====
                                                  ; 8 B                                                

atomic_compare_exchange_idata_impl:
0:  mov  a, @r0                                   ; 1 B
1a: cjne a, ar2, 4                                ; 3 B
2a: mov  a, r3                                    ; 1 B
3a: mov @r0, a                                    ; 1 B
4:  ret                                           ; 1 B
5:  nop                                           ; 1 B
                                                  ;=====
                                                  ; 8 B

atomic_compare_exchange_pdata_impl:
0:  movx a, @r0                                   ; 1 B
1a: cjne a, ar2, 4                                ; 3 B
2a: mov  a, r3                                    ; 1 B
3a: movx @r0, a                                   ; 1 B
4:  ret                                           ; 1 B
5:  nop                                           ; 1 B
                                                  ;=====
                                                  ; 8 B

atomic_compare_exchange_xdata_impl:
0:  movx a, @dptr                                 ; 1 B
1a: cjne a, ar2, 4                                ; 3 B
2a: mov  a, r3                                    ; 1 B
3a: movx @dptr, a                                 ; 1 B
4:  ret                                           ; 1 B
                                                  ;=====
                                                  ; 7 B

; These are normal library functions, to be placed by the linker:

; Store value in r2 into byte at b:dptr, return previous byte at b:dptr in a.
; Overwrites r0, r2, r3.
atomic_exchange_gptr_impl:
    jnb B.6, atomic_exchange_xdata_impl
    mov r0, dpl
    jb  B.5, atomic_exchange_pdata_impl
atomic_exchange_idata_impl:
    mov a, r2
    xch a, @r0
    ret

; If the value of the byte at b:dptr is the value of r2, store the value
; of r3 into that byte. Return the new value of that byte in a.
; Overwrites r0, r2, r3.
atomic_compare_exchange_gptr_impl:
    jnb B.6, atomic_compare_exchange_xdata_impl
    mov r0, dpl
    jb  B.5, atomic_compare_exchange_pdata_impl
    sjmp atomic_compare_exchange_idata_impl

; Rollback restartable sequence at end of interrupt handler (version for
; non-critical handler and stack in __idata).
maybe_rollback:
    push ar0
    mov  r0, SP
    dec  r0
    cjne @r0, #>atomic_exchange_rollback_impl, outer_skip
    dec  r0
    cjne @r0, #40, here0
here0:
    jc   outer_skip
    ; we now know we are somewhere among the restartable implementations of
    ; atomic functions.
    push acc
    mov  a, @r0
    and  a, #0x07
    dec  a
    cjne @r0, #6, here1
here1:
    jc   inner_skip
    ; we actually need to restart.
    mov  a, @r0
    and  a, #0xf8
    mov  @r0, a
inner_skip:
    pop acc
outer_skip:
    pop ar0
    reti

We'll still need a version of the rollback for xstack, and see how we'll actually use those atomic_exchange_gptr_impl and atomic_compare_exchange_gptr_impl from libary functions and codegen. We also need actual compiler support for atomic types.

IMO, we can put the above into SDCC soon, and start using it for atomic_flag before we introduce support for more atomic types.

Last edit: Maarten Brock 2024-06-12

Maarten Brock - 2024-06-12

For mcs51 the call stack is always in __idata. Only parameters and local variables can go to __pdata for xstack.

Although atomic_exchange_gptr_impl and atomic_exchange_idata_impl need not be placed at 8 byte offsets, they do need to be close to atomic_exchange_pdata_impl and atomic_exchange_xdata_impl since the jump on bit instructions use relative addressing. I suggest to keep them in the same asm file. The same goes for the _compare_ variants.

The compare against #40 assumes that this code is 256 bytes aligned, not just 8 bytes aligned.

cjne @r0, #6, here1 here1: jc inner_skip

Needs to be:

cjne a, #6, here1 here1: jnc inner_skip
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2024-06-12
  
  I see. Let's make that #40 into #<atomic_exchange_rollback_impl+40, and put those atomic_exchange_gptr_impl and atomic_exchange_idata_impl before or after the rest (whichever makes sense to keep code mem usage low - depends on the number of interrupt vectors).
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2024-07-10

This first implementation passes mcs51 regression tests.

Further steps:
0) Someone else having a look.
1) Port it to ds390.
2) Implement basic check for interrupt handlers not using any atomics (on those handlers we then omit the rollback check).
3) Port it to the pdk ports.
4) Implement support for the _Atomicqualifier in the front-end, and support more atomic types throughout the compiler (to be done much later).

P.S.: I think this passed regressions tests for the huge memory model of mcs51 by chance only.

Last edit: Philipp Klaus Krause 2024-07-10

mcs51-atomic.patch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2024-07-10
  
  Updated patch. Also contains an attempted port to ds390. Still fails for ds390 and for the huge memory model of mcs51.
  
  Last edit: Philipp Klaus Krause 2024-07-10
  
  mcs51-ds390-atomic.patch
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Philipp Klaus Krause - 2024-07-10
    
    ds390 should be fixed now, and the check for handlers that don't use atomics is implemented, too. Still fails regression tests for mcs51 huge memory model.
    
    mcs51-ds390-atomic2.patch
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Maarten Brock - 2024-07-10

First review:
Comment atomic_flag_test_and_set.c - C run-time: C11 atomic flag is wrong for the rollback.
Leaving out the default areas is dangerous though less so for libraries.
here0, etc are not proper local labels.
There is no check for the return address lsb to be >= __sdcc_atomic_exchange_rollback_impl
jc outer_skip should be jnc I think.
cjne @r0, #6, here1 should be cjne a, #5, here1
jc inner_skip should be jnc I think.
beign is not a word.
There seems no need for mov a,r3 at the end of each asm block.

Maarten

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2024-07-11
  
  Thanks for the feedback. Here's a new version of the patch.
  It fixes the mcs51 huge memory model failures, and the issues in your comment, except for these two:
  
  I did not change the #6 to #5. My understanding: if we are at start-of-routine+#5, the "pc" points to the instruction that writes the new value, but that instruction has not yet been executed, so no value was written, and we still need to roll back. So we want to check for < 6.
  
  I did not remove the mov a, r3. The sdcc_atomic_exchange routines need to return the original value of the atomic object, and the comments say that they do so in a (and atomic_flag_test_and_set currently assumes this). I have no idea if it would be better to specify that the original value is returned in r3 instead.
  
  Last edit: Philipp Klaus Krause 2024-07-11
  
  mcs51-ds390-atomic3.patch
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Maarten Brock - 2024-07-11
    
    What is the dec a for then? Is it not for skipping start-of-routine+#0 ?
    
    If you specify the return to be in r3, you can load dpl in the caller directly from r3. This only applies to the non-comparing case.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Philipp Klaus Krause - 2024-07-11
      
      I see. I suggest we remove that dec a instead of changing the constant. The dec a allows us to skip a restart in rare cases, but adds one extra instruction, so its probably not worth it.
      
      Via a is faster: we currently do mov a, r3 + mov dpl, a, which is 24 cycles. the alternative would be nop + mov dpl, r3, which is 36 cycles.
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Maarten Brock - 2024-07-12
        
        Removing dec a is fine with me. It's not really restarting there either because it hasn't done anything yet. But this gives room for more optimizations as the 2 nops can be moved out of the routine.
        
        Further, we do not need to execute an extra nop when r3 contains the return value. Instead we can return earlier, placing the nop after the ret. Or even better, don´t use a call at all saving precious stack space.
        
        And we can drop another call if we rename the gptr entry point to _atomic_flag_test_and_set
        
        0: nop ; 1 B 1: nop ; 1 B sdcc_atomic_exchange_pdata_impl: 2: movx a, @r0 ; 1 B 3a: mov r3, a ; 1 B 4a: mov a, r2 ; 1 B 5a: movx @r0, a ; 1 B 6: sjmp sdcc_atomic_exchange_exit ; 2 B ;===== ; 8 B 0: nop ; 1 B 1: nop ; 1 B sdcc_atomic_exchange_xdata_impl: 2: movx a, @dptr ; 1 B 3a: mov r3, a ; 1 B 4a: mov a, r2 ; 1 B 5a: movx @dptr, a ; 1 B 6: sjmp sdcc_atomic_exchange_exit ; 2 B ;===== ; 8 B sdcc_atomic_exchange_exit: mov dpl, r3 ret _atomic_flag_test_and_set:: mov r2, #1 jnb B.6, sdcc_atomic_exchange_xdata_impl mov r0, dpl jb B.5, sdcc_atomic_exchange_pdata_impl ; sjmp sdcc_atomic_exchange_idata_impl sdcc_atomic_exchange_idata_impl: mov a, r2 xch a, @r0 mov dpl, a ret
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Maarten Brock - 2024-07-12
        
        Here is the atomic3.patch with many comments marked with >>>. It is not complete.
        
        Last edit: Maarten Brock 2024-07-12
        
        mcs51-ds390-atomic3.nopatch
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Philipp Klaus Krause - 2024-07-15
        
        Thanks. I moved those nop in [r14918]. The dec a never made it to trunk anyway.
        Regarding the other two changes I'm not sure yet: later, these support functions will also be used to implement atomic_exchangefor 8-bit types, so we don't want to overspecialize them towards atomic_flag_test_and_set now.
        
        That would be the function family:
        
        C atomic_exchange(volatile _Atomic C *object, C desired);
        
        For every unqualified 8-bit type C (see section 7.17.7.3 of the ISO C23 standard).
        
        Related
        
        Commit: [r14918]
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Maarten Brock - 2024-07-16
        
        The dec r0 bug is still present. I will fix this.
        
        I will also rename __sdcc_atomic_exchange_rollback_impl to sdcc_atomic_exchange_rollback_start. I see absolutely no reason why this implementation must be visible in the C namespace.
        
        I will add sdcc_atomic_exchange_rollback_end.
        
        I will add sdcc_atomic_exchange_exit for returning in DPL.
        
        sdcc_atomic_exchange_exit: mov dpl, r3 ret
        
        I will modify _atomic_flag_test_and_set to
        
        mov r2, #0x01 ljmp ___sdcc_atomic_exchange_gptr_impl
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Maarten Brock - 2024-07-17
        
        The additional dec r0 is not needed if push psw is moved down.
        Along with the above implemented in [r14923].
        
        Related
        
        Commit: [r14923]
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Philipp Klaus Krause - 2024-07-17
        
        Thanks. I don't understand the point of
        
        .area HOME (CODE) + .area GSINIT (CODE) + .area GSFINAL (CODE) + .area HOME (CODE)
        
        in __sdcc_atomic_maybe_rollback.c, though. After all this is a C source file, not an assembler source file, so the compiler already places the ordering at the beginning of the file, before that inline assembler code.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Atomics

The Small Device C Compiler (SDCC), targeting 8-bit architectures

Group

Searches

Help

#624 Atomics

Discussion

Related

Related