SDCC currently does not have any official support for atomics. Some operations on some operands have traditionally been atomic, and we see bugs reports when that changes.
It might make sense to (partially) support atomics as in the C standard:
1) For communication between the main program and interrupt handlers
2) For inter-thread communication on multicore devices (SDCC does not yet support such devices).
3) For inter-thread communication when using some timesharing OS.
The two most basic types are sig_atomic_t and atomic_flag.
To me it seems, we could (as I have done in the post-3.9 branhc) just make sig_atomic_t a typedef to unsigned char.
atomic_flag, on the other hand will require a test_and_set instruction. On many targets, we could implement using an inverted 0/1 value using a right shift with indirect addressing mode. Or we could just use plain exchange, where available.
This would give us atomic_flag on stm8, z80 and related, hc08 and s08. For mcs51, we could have atomic_flag in iram (which seems okay to me, we'd just place every atomic_flag in iram, even if the memory model is different, and emit an error if someone e.g. tries to put it explicitly into xram).
Unfortunately, pdk does not have indirect adressing mode on their right shift and exchange instructions.
The C standard requires atomic_flag to be lock-free. I don't see a way to do this on pdk. On the other hand, being lock-free only has real menaing in interaction with signal handlers. And we could just declare interrupt handlers on pdk to be something other than signal handlers. Though this will greatly reduce the usefulness of atomic_flag on pdk13, pdk14, pdk15.
atomic_flag, in turn, can be used to implement spinlocks, which give us all atomic types (though not in a lock-free version). Atomics that are not lock-free seem only really useful for multithreading, though.
Diff:
We currently support
volatile sig_atomic_t
for all our ports. It is useful, and should be sufficient for typical use when there is only one hardware thread, and the normal program is being interrupted by an interrupt.However, once it gets more complex (i.e. multiple hardware threads or interrupt handlers that can interrupt each other), more atomics are needed for some uses. And even for the simple case of just one interrupt handler, they can be useful to more naturally implement some algorithms.
In C, an atomic type if "lock-free", if it can be used in signal handlers. For SDCC, I usually consider interrupt handlers to be the closest match to C's signal handlers, and try to make interrupt handlers work in the way C mandates for signal handlers.
SDCC does not currently support multiple hardware threads, though we might want to do so in the future (in particular for the Padauk ports). But many of our targets have interrupt priorities where interrupt handlers can themselves be interrupted (e.g. mcs51 and z80).
C mandates that
atomic_flag
is lock-free. We currently support it well for most ports, and for some have limited support (for mcs51,atomic_flag
needs to be in__idata
, which is a slight non-compliance if the heap is in__xdata
, as then memory from the heap cannot be used asatomic_flag
; there are C2Y proposals to allow other character arrays to be used asatomic_flag
, which would add to that problem ).It would be possible to improve this situation: I reply to my earlier concerns about the implementability of
atomic_flag
on mcs51 and Padauk for future C version, Hans Boehm proposed a restartable sequence implementation foratomic_flag
for the case of a single hardware thread. I've since further refined the approach to eliminate its interrupt latency burden, and generalized it to support more lock-free atomics:Still, this works only for a single hardware thread. Let's look at mcs51 for now, where we would need these 3 asm functions
These would be used to implement atomic_exchange and atomic_compare_and_exchange primitives that in turn would be used to implement atomic_flag, and lock-free atomics of size up to 8, i.e. we'd get lock-free atomic types for
char
,_BitInt(7)
,bool
, etc. Once we have atomic_exchange and atomic_compare_and_exchange, we can implement all other functionality for atomics.At the end of an interrupt handler, we take the return address from the stack, and compare it against all labels marked with labels that end in a above. If we are at one of these, we need to get registers values from the stack, undo the effect of the
xch
there, and set the stored pc back to corresponding label 0.This is clearly an extra overhead, but:
1) It happens at the end of the interrupt handler, which is less critical than it would be at the beginning.
2) static analysis could prove that a given interrupt handler does not use some of these, in which case the extra code at the end of the interrupt handler could be left out.
P.S.: The asm functions only require an atomic store instruction of suitable size. For most ports we only have these up to 8 bits, but on some ports we have 16-bit ones, and use 16-bit pointers. I.e. we could have lock-free atomic pointer types, which would allow the use of a lot of non-blocking data structures. On the other hand, we'd then have to potentially check for even more pc values at the end of signal handlers.
P.P.S.: It would be good, if we could guarantee that those asm functions are all within a single 256B-aligned block of 256B, so the pc comparison can be done cheaper (i.e compare the upper byte once, then if it matches compare the lower byte against multiple values).
Looks like my idea is essentially what NetBSD already did 20 years ago:
https://www.usenix.org/legacy/events/usenix2003/tech/freenix03/full_papers/mcgarry/mcgarry_html/
P.S.: The first proposal to use restartable sequences to implement atomics is AFAIK "Fast mutual exclusion for uniprocessors" by Bershad et alii from 1992: https://dl.acm.org/doi/10.1145/143365.143523
Last edit: Philipp Klaus Krause 2024-06-10
Maybe we should try to keep those restartable functions in a single area and at power-of-two offsets (8?). The return address check can then be against the lower and upper bound and a simple AND operation to restart. Also what to roll back can be easier if the same registers are used at the same offsets.
For mcs51 things can be more difficult. The pointer can be a generic pointer which first requires finding out the memory space. This can be done in a wrapper that calls the appropriate restartable functions. And the pointer can also point to __pdata which is the default for the medium memory model.
The mcs51 also has __bit variables which can be accessed with JBC to both test and clear the flag. Unfortunately they cannot be accessed indirectly.
Last edit: Maarten Brock 2024-07-10
Isn't
__pdata
just a subspace of__xdata
, so we don't need separate functions for it?Well, yes and no.
__pdata
does lie somewhere in__xdata
space. But a (single byte) pdata pointer does not hold the page. And a (3 bytes) generic pointer does not either.It is expected instead that the page selection is active with the high byte of s_PSEG in
__XPAGE
(see crtxclear.asm) for MOVX @Ri to work.Last edit: Maarten Brock 2024-07-10
Actually, having thought about it a little bit more, it gets much easier, if we never have to roll back, and can just set the pc back. We do that by having a spare copy that we can use to restore register values, e.g.:
If the pc is at 2a or 3a, we set it back to 0, and the instruction at 0 will do all the work.
R0 should be R3 or higher IMHO. See the
pdata
and_compare_
variants.This will require 16 bytes aligned routines and maybe that would finally be the first opportunity for SDCC to actually emit the XCHD A,@Ri instruction.
To enforce the alignment, I guess we could just emit this at the end of the IVT (prepending nops as necessary to avoid the routines crossing a 256B-boundary)?
Based on Maarten's code sample above, I now have this proposal, which should work and be reasonably efficient:
We'll still need a version of the rollback for xstack, and see how we'll actually use those
atomic_exchange_gptr_impl
andatomic_compare_exchange_gptr_impl
from libary functions and codegen. We also need actual compiler support for atomic types.IMO, we can put the above into SDCC soon, and start using it for
atomic_flag
before we introduce support for more atomic types.Last edit: Maarten Brock 2024-06-12
For mcs51 the call stack is always in
__idata
. Only parameters and local variables can go to__pdata
for xstack.Although
atomic_exchange_gptr_impl
andatomic_exchange_idata_impl
need not be placed at 8 byte offsets, they do need to be close toatomic_exchange_pdata_impl
andatomic_exchange_xdata_impl
since the jump on bit instructions use relative addressing. I suggest to keep them in the same asm file. The same goes for the_compare_
variants.The compare against #40 assumes that this code is 256 bytes aligned, not just 8 bytes aligned.
Needs to be:
I see. Let's make that
#40
into#<atomic_exchange_rollback_impl+40
, and put thoseatomic_exchange_gptr_impl
andatomic_exchange_idata_impl
before or after the rest (whichever makes sense to keep code mem usage low - depends on the number of interrupt vectors).This first implementation passes mcs51 regression tests.
Further steps:
0) Someone else having a look.
1) Port it to ds390.
2) Implement basic check for interrupt handlers not using any atomics (on those handlers we then omit the rollback check).
3) Port it to the pdk ports.
4) Implement support for the
_Atomic
qualifier in the front-end, and support more atomic types throughout the compiler (to be done much later).P.S.: I think this passed regressions tests for the huge memory model of mcs51 by chance only.
Last edit: Philipp Klaus Krause 2024-07-10
Updated patch. Also contains an attempted port to ds390. Still fails for ds390 and for the huge memory model of mcs51.
Last edit: Philipp Klaus Krause 2024-07-10
ds390 should be fixed now, and the check for handlers that don't use atomics is implemented, too. Still fails regression tests for mcs51 huge memory model.
First review:
Comment
atomic_flag_test_and_set.c - C run-time: C11 atomic flag
is wrong for the rollback.Leaving out the default areas is dangerous though less so for libraries.
here0
, etc are not proper local labels.There is no check for the return address lsb to be >=
__sdcc_atomic_exchange_rollback_impl
jc outer_skip
should bejnc
I think.cjne @r0, #6, here1
should becjne a, #5, here1
jc inner_skip
should bejnc
I think.beign is not a word.
There seems no need for
mov a,r3
at the end of each asm block.Maarten
Thanks for the feedback. Here's a new version of the patch.
It fixes the mcs51 huge memory model failures, and the issues in your comment, except for these two:
mov a, r3
. The sdcc_atomic_exchange routines need to return the original value of the atomic object, and the comments say that they do so ina
(and atomic_flag_test_and_set currently assumes this). I have no idea if it would be better to specify that the original value is returned inr3
instead.Last edit: Philipp Klaus Krause 2024-07-11
dec a
for then? Is it not for skipping start-of-routine+#0 ?dec a
instead of changing the constant. Thedec a
allows us to skip a restart in rare cases, but adds one extra instruction, so its probably not worth it.a
is faster: we currently domov a, r3
+mov dpl, a
, which is 24 cycles. the alternative would benop
+mov dpl, r3
, which is 36 cycles.dec a
is fine with me. It's not really restarting there either because it hasn't done anything yet. But this gives room for more optimizations as the 2nop
s can be moved out of the routine.nop
whenr3
contains the return value. Instead we can return earlier, placing thenop
after theret
. Or even better, don´t use acall
at all saving precious stack space._atomic_flag_test_and_set
Here is the atomic3.patch with many comments marked with >>>. It is not complete.
Last edit: Maarten Brock 2024-07-12
Thanks. I moved those
nop
in [r14918]. Thedec a
never made it to trunk anyway.Regarding the other two changes I'm not sure yet: later, these support functions will also be used to implement
atomic_exchange
for 8-bit types, so we don't want to overspecialize them towardsatomic_flag_test_and_set
now.That would be the function family:
For every unqualified 8-bit type C (see section 7.17.7.3 of the ISO C23 standard).
Related
Commit: [r14918]
dec r0
bug is still present. I will fix this.__sdcc_atomic_exchange_rollback_impl
tosdcc_atomic_exchange_rollback_start
. I see absolutely no reason why this implementation must be visible in the C namespace.sdcc_atomic_exchange_rollback_end
.sdcc_atomic_exchange_exit
for returning in DPL._atomic_flag_test_and_set
toThe additional
dec r0
is not needed ifpush psw
is moved down.Along with the above implemented in [r14923].
Related
Commit: [r14923]
Thanks. I don't understand the point of
in
__sdcc_atomic_maybe_rollback.c
, though. After all this is a C source file, not an assembler source file, so the compiler already places the ordering at the beginning of the file, before that inline assembler code.