Small Device C Compiler (SDCC) / Patches / #462 mcs51: implement stack probing

Maarten Brock - 2023-07-03

The only option is to rebuild the library with that option.
Is the user supposed to bring his own stack_probe_err() function?
Why not implement a default one in the library?

Why are you adding 8 to the SP? Why not the size of the stack frame?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Oleg Endo - 2023-07-03

Thanks for looking at it @maartenbrock

The only option is to rebuild the library with that option.

That's what I've been doing so far -- hack it in the device lib Makefile.in as compiler option. Doesn't "scale well" though.

Real world example: I've got two different 8051 devices. The first one has a larger flash and is prone to stack overflows. So the checks help a lot during development and increased code size is not an issue. The second device has only a small code flash and has usually no issue with the stack, so would like to omit the stack frame checks to not waste code flash.

Recompiling & reinstalling the device libs in SDCC is really cumbersome.

So what I'd like to add is something like GCC's multilib for --stack-probe option. It seems something similar is already done for stack-auto?

Is the user supposed to bring his own stack_probe_err() function?
Why not implement a default one in the library?

Yes, that's what I've been doing for myself. The stack_probe_err is a special function that is implemented by the application / "board support library". It uses board specific ways to print an abort message somewhere -- usually the uart.
A default implementation could be added, which could use printf / puts. But it should be overrideable by the user application. Not sure how to implement that within SDCC.

Why are you adding 8 to the SP? Why not the size of the stack frame?

Good point! Didn't think about it and/or didn't bother to check how to obtain the stack frame size at that time, as I needed "something working quickly".

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Maarten Brock - 2023-07-03
  
  The default could be as simple as looping forever. If you have in-circuit-debugging you can break and inspect things.
  
  Any function in the library can be overridden by supplying your own. The linker uses all objects you give to it and then looks for the missing pieces in the libraries.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Oleg Endo - 2023-07-04
    
    The default could be as simple as looping forever. If you have in-circuit-debugging you can break and inspect things.
    
    Something like the default impl. of device/lib/__assert.c looks useful. It printfs something and infinite-loops after.
    
    Any function in the library can be overridden by supplying your own. The linker uses all objects you give to it and then looks for the missing pieces in the libraries.
    
    Alright. I'll give a another shot. Thanks for the pointer.
    
    What about the multi-lib issue? Do the same as for stack-auto? Looks like it's going to be a bigger changeset ...
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Oleg Endo - 2023-07-18

I've integrated the stack probing into the stack adjustment code. Default impl. of library function 'stack_probe_err' is still not there, will add it later.

@maartenbrock could you please have a look whether this is reasonable like that? In particualr, I'm not sure how reliable 'pop sp' is across all the different 8051 core implementations?

0001-implement-stack-probing-for-mcs51_1.patch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Maarten Brock - 2023-07-19
  
  There are some issues with this patch.
  * There is no 'push a' and 'push acc' is 2 bytes
  * stack_probe_restore_acc_code_line is still unused
  * Internal stack can also be probed when xstack is specified as it is still used
  * 'i + 1' vs (i - 1)
  
  0001-implement-stack-probing-for-mcs51_2.patch
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Maarten Brock - 2023-07-19
  
  I've tested POP SP on a SiLabs EFM8 Universal Bee and it does what you expect: SP contains the popped value afterwards. This case is also explicitly documented in the MCS-51 Family of Single Chip Microcomputers User's Manual.
  
  The stack pointer originally contains the value 32H, and internal RAM locations 30H through 32H contain the values 20H, 23H, and 01H, respectively. The instruction sequence,
  POP DPH
  POP DPL
  will leave the stack pointer equal to the value 30H and the data pointer set to 0123H. At this point the instruction,
  POP SP
  will leave the stack pointer set to 20H. Note that in this special case the stack pointer was decremented to 2FH before being loaded with the value popped (20H).
  
  Although contradictory to the following definition of operation:
  
  (direct) <- ((SP))
  (SP) <- (SP) - 1
  
  On the EFM8 PUSH SP stores SP as it was before the push. This seems contradictory to the original User's Manual which has this definition of operation:
  
  (SP) <- (SP) + 1
  ((SP)) <- (direct)
  
  It has no special example for PUSH SP.
  
  The ucsim s51 simulator stores the incremented value.
  
  But further down in the User's Manual in the 8051 Instruction Set Simulator code it says:
  
  /* "PUSH direct" INSTRUCTION: */ PUSH$DIR: PROCEDURE; DIR$ADDR=FETCH$PROGRAM(PC+1); DIR$DATA=FETCH$DIR(DIR$ADDR); CALL PUSH$STACK(DIR$DATA); PC=PC+2; END PUSH$DIR;
  
  The stack pointer is incremented inside the PUSH$STACK procedure.
  
  This means that PUSH SP should store the non-incremented stack pointer value.
  
  Last edit: Maarten Brock 2023-07-19
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Oleg Endo - 2023-07-20

@maartenbrock thanks for checking it!

There is no 'push a' and 'push acc' is 2 bytes

It assembles and links just fine here it seems. The assembler seems to take care of it. But I in other places SDCC emits 'acc', so perhaps it's good to be consistent. I know that that insn is 2 bytes in size.

stack_probe_restore_acc_code_line is still unused

Oops. I forgot to remove this leftover piece.

Internal stack can also be probed when xstack is specified as it is still used

Oh, OK. I've never used xstack myself, so didn't pay attention to that.

'i + 1' vs (i - 1)
/ +/-1 Which one is correct? /

That was basically my main uncertainty regarding using 'pop sp'. 'push sp' implementation is irrelevant as it's not used. If EFM8's CIP-51 core implements the special case of 'pop sp' as per original manual, then perhaps that's what can be expected there? I will try it out on the DW8051 (synopsys) core to double check.

Your change of original patch:

if (i > 5)

{

...

emitcode ("pop", "sp"); / sp = @(sp); sp -= 1 /

/ total sp adjust insn length = 14 bytes /

to

if (i > 7)

{

...

emitcode ("pop", "sp"); / sp = @(sp); sp -= 1 /

/ total sp adjust insn length = 18 bytes /

I was counting from the first insn after the 'jc' insn onwards. Not the whole code emitted in the if block, but only the code to do the actual adjustment:

emitcode ("xch", "a,r0"); // 1 emitcode ("push", "acc"); // 2 emitcode ("xch", "a,r0"); // 1 emitcode ("mov", "r0,sp"); // 2 emitcode ("dec", "r0"); // 1 emitcode ("xch", "a,@r0"); // 1 emitcode ("xch", "a,r0"); // 1 emitcode ("pop", "acc"); // 2 emitcode ("xch", "a,r0"); // 1 emitcode ("pop", "sp"); // 2 total = 14 bytes

... because the 'add' and 'jc' will be always there, in all cases.
So the 14 bytes are worth 7 'inc sp' insns. However, that case includes also a 'pop acc', so it's worth only effectively 6 'insn sp' insns. Hence I put the threshold to 'i > 5'.

emitcode ("add", "a,#(__start__stack - 1 + !constbyte)", i);

emitcode ("jc", "%s_stack_probe_err", sym->rname);

emitcode ("pop", "acc");

while (i--)

emitcode ("inc", "sp");

/ total sp adjust insn length <= 16 bytes /

If I count all the code that is emitted in that else block, then I arrive at 20 for i = 7, which is the max. i that this else case can hit as per your change to 'if (i > 7)' above. Or not?

Last edit: Oleg Endo 2023-07-21
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Maarten Brock - 2023-07-20
  
  'i + 1' vs (i - 1) / +/-1 Which one is correct? /
  
  That was basically my main uncertainty regarding using 'pop sp'.
  
  What I meant was that the comment states you need i+1 and the code actually uses i-1 !
  And now we know that you don't need either.
  
  Let's count all bytes. And my second piece was wrong in that regard. Your threshold of 5 is correct.
  
  The non-probing case for i>9 is not right. Not only is there no post-decrement on the popped SP value, but you have already pushed twice when SP is read into A. Also SP can be read before saving R0 which saves the need for DEC R0.
  
  And similarly in the stack probing case, when A is pushed because there is no free reg, the offset changes. ~~Come to think of it, all R0-R7 registers should always be free at function entry, unless it's switching register bank.~~
  
  But why not make the stack probing a support routine that takes the stack offset as parameter in A and returns the new stack value in A? E.g.
  
  xch a,freereg ;optional mov a,#stackAdjust lcall ___sdcc_stack_probe mov SP,a xch a,freereg ;optional
  
  The stack probing routine can check underflow and overflow, has the caller automatically on stack and needs no lcall before the function.
  
  Last edit: Maarten Brock 2023-07-21
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Oleg Endo - 2023-07-21
    
    Let's count all bytes.
    
    OK, perhaps easier to follow.
    
    The non-probing case for i>9 is not right.
    
    Good point. I've refined it. Code can be shortened a bit.
    
    And similarly in the stack probing case, when A is pushed because there is no free reg, the offset changes.
    
    Should be fixed.
    
    Come to think of it, all R0-R7 registers should always be free at function entry, unless it's switching register bank.
    
    All my software never hits those cases when accIsFree == false || freereg == NULL. I also thought it's a bit odd that freereg == NULL is possible at all. But then, this path was already there so I didn't dig further...
    
    But why not make the stack probing a support routine that takes the stack offset as parameter in A and returns the new stack value in A?
    
    Yeah, I've seen DS390 uses this approach. However, one of my main devices here has über-slow jumps -- ~48 cycles or so. So having the lcall ___sdcc_stack_probe in there costs each function call an additional 100 cycles at least ('lcall' + 'ret' from the probe function). Hence I implemented it originally inlined in the epilogue.
    
    Devices like EFM8 (which I also use) don't suffer from that so much, but still an 'lcall' is up to 7 cycles and a 'ret' is up to '8' cycles, while the not taken 'jc' conditional branches cost only 2 cycles. So I'd prefer to have inlined stack probing.
    
    Attached updated patch now also includes the default implementation for the __stack_probe_err function and documentation update.
    
    Last edit: Oleg Endo 2023-07-21
    
    0001-implement-stack-probing-for-mcs51_3.patch
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Maarten Brock - 2023-07-21
      
      The epilogue is what happens at the end (of the function). The prologue is what happens at the beginning. Your code is in the prologue.
      
      The _stack_probe_err can be marked _Noreturn or [[ noreturn ]] to make sure it will not return.
      
      I would definitely not use printf() in this error function. It is a very complex function that is very likely to fail when the stack is corrupted.
      
      Instead of __start__stack you can also use s_SSEG (they are equal). It is accompanied by l_SSEG which holds the length of the stack segment. And since the linker doesn't support adding two relocatable symbols either, I consider adding e_SSEG.
      
      The underflow check does not yet take the PUSH ACC into account and neither does the i<=5 case. With the underflow check fixed the overflow no longer needs i-1.
      
      There are very few 8051 derivatives where jumps are so relatively expensive. How many cycles do your checks cost in the accIsFree case? Does the device have a prefetch buffer and/or cache? I would prefer to use a support routine and save bytes.
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Philipp Klaus Krause - 2023-07-21
        
        To me, it looks like there is a trade-off between code size and speed where different users have different preferences.
        Usually, in such cases, we go with the solution that yields lower code size by default, but use the faster one if --opt-code-size is specified by the user.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Oleg Endo - 2023-07-23
        
        Usually, in such cases, we go with the solution that yields lower code size by default, but use the faster one if --opt-code-size is specified by the user.
        
        -opt-code-speed vs. --opt-code-sizetypo?
        
        Like I wrote in my other reply, I would suggest to add an alternative space optimizing implementation of stack probing afterwards.
        
        I'm not going to propose to enable stack probing by default, so there will be no code size regression/increase at all, unless stack probing is enabled by the user specifically.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Philipp Klaus Krause - 2023-07-23
        
        Yes, I meant --opt-code-speed.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Oleg Endo - 2023-07-21
        
        The epilogue is what happens at the end (of the function). The prologue is what happens at the beginning. Your code is in the prologue.
        
        I know. Just a typo. Fixed.
        
        The _stack_probe_err can be marked _Noreturn or [[ noreturn ]] to make sure it will not return.
        
        Yes, that's what I have in my app code. I wasn't sure if that's OK for the device libs. I've copied the code from assert.c which doesn't specify noreturn.
        Since the call site always uses 'lcall' I think specifying noreturn will do effectively nothing in this case. But I've added it anyway.
        
        I would definitely not use printf() in this error function. It is a very complex function that is very likely to fail when the stack is corrupted.
        
        Ttrue. I didn't consider that (In my app implementation it resets the stack before doing any printing). Removed.
        
        Instead of __start__stack you can also use s_SSEG (they are equal). It is accompanied by l_SSEG which holds the length of the stack segment. And since the linker doesn't support adding two relocatable symbols either, I consider adding e_SSEG.
        
        I'm not sure how to use the area/section/segment names? What would that improve? Can you give an example?
        
        The underflow check does not yet take the PUSH ACC into account
        
        By underflow you mean the 'subb' check? If so ...
        Why does it have to? At that point, all it cares about is whether the stack pointer has rolled over and is somewhere in the range 0x00 - start_stack. Note that earlier in the prologue, there might be other stack pushes, e.g. for the frame pointer. But I don't see how that would be relevant at that point.
        
        and neither does the i<=5 case. With the underflow check fixed the overflow no longer needs i-1.
        
        Sorry, I don't follow. Please point out in the code.
        
        There are very few 8051 derivatives where jumps are so relatively expensive.
        
        IMO, even on the EFM8UB I wouldn't want to have those extra 15 cycles for each function call. The thing is already slow as-is.
        
        How many cycles do your checks cost in the accIsFree case?
        
        emitcode ("clr", "c"); emitcode ("mov", "a,sp"); emitcode ("subb", "a,#__start__stack - 1"); emitcode ("jc", "%s_stack_probe_err", sym->rname); // EFM8UB: 1+2+2+2 = 7 cycles (jc not taken) // DW8051: 1+2+2+3 = 8 cycles emitcode ("add", "a,#(__start__stack - 1 + !constbyte)", i); emitcode ("jc", "%s_stack_probe_err", sym->rname); emitcode ("mov", "sp,a"); // EFM8UB:2+2+2 = 6 cycles (jc not taken) // DW8051: 2+3+2 = 7 cycles // total cycles // EFM8: 7 + 6 = 13 cycles (6 cycles are for stack adjust) // DW8051: 8 + 7 = 15 cycles (6 cycles are for stack adjust)
        
        Does the device have a prefetch buffer and/or cache?
        
        The EFM8UB does have a prefetch buffer, and it makes taken jumps more expensive than if the prefetch buffer is disabled. The DW8051 device doesn't have any buffer/cache.
        
        I would prefer to use a support routine and save bytes.
        
        For the cases like freereg == null, it might be good to outsource the stack adjustment + probing code into a support routine, when optimizing for size. For the other cases, I can't agree with it. Can we leave that as a potential future improvement? (If anybody cares at all, since I won't enable stack probing by default for everything).
        
        0001-implement-stack-probing-for-mcs51_4.patch
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Oleg Endo - 2023-07-21
        
        Instead of __start__stack you can also use s_SSEG (they are equal). It is accompanied by l_SSEG which holds the length of the stack segment. And since the linker doesn't support adding two relocatable symbols either, I consider adding e_SSEG.
        
        I'm not sure how to use the area/section/segment names? What would that improve? Can you give an example?
        
        OK, I've tried it out, I can replace __start__stack with s_SSEG. But as such, it doesn't change much for the current implementation.
        
        Support for stack end != 0xFF could be added of course, but is there any practical use case for that? I'm not sure ....
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Maarten Brock - 2023-07-25
        
        I am also not sure about checking stack end != 0xFF. It can happen, but is unlikely.
        But at least by using l_SSEG it is possible. There is no __end__stack or __len__stack or similar and there probably never will be.
        
        However, if the stack probing was done by an external assembly routine, it would be easier for any user to modify it to add this length check. It could even be implemented behind an .if clause.
        
        Example:
        
        .area DSEG (DATA) stack_temp: .ds 1 .area HOME (CODE) __sdcc_stack_probe_1:: mov stack_temp,a mov a,#s_SSEG + 1 sjmp __sdcc_stack_cmp __sdcc_stack_probe:: mov stack_temp,a ; stack_temp = requested additional stack space mov a,#s_SSEG __sdcc_stack_cmp: xrl a,#0xFF add a,sp ; a = allocated stack space before call here jnc __sdcc_stack_error add a,stack_temp ; a = needed allocated stack space jc __sdcc_stack_error .if 0 mov stack_temp,a ; stack_temp = needed allocated stack space mov a,#l_SSEG xrl a,#0xFF add a,stack_temp ; a = space - l_SSEG - 1 jc __sdcc_stack_error mov a,stack_temp .endif add a,#s_SSEG - 1 ; a = new stack pointer value ret __sdcc_stack_error:: sjmp .
        
        __sdcc_stack_probe_1 is for the case that A needs to pushed before the call.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Maarten Brock - 2023-07-25
        
        Specifying noreturn should prevent the user from creating a function that does return since it has no proper return address. It should probably also be added to the intrinsic functions prototypes in SDCC so it will always protest when noreturn is left out.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Maarten Brock - 2023-07-25
        
        By underflow you mean the 'subb' check? If so ...
        Why does it have to? At that point, all it cares about is whether the stack pointer has rolled over and is somewhere in the range 0x00 - start_stack. Note that earlier in the prologue, there might be other stack pushes, e.g. for the frame pointer. But I don't see how that would be relevant at that point.
        
        You can at least take into account that you pushed ACC here. If more pushes were done in between, it would be best if they are taken into account as well. Who knows, it might have overflown back into valid stack space!
        
        emitcode ("clr", "c"); emitcode ("mov", "a,sp"); if (!accIsFree && freereg == NULL) emitcode ("subb", "a,#__start__stack"); else emitcode ("subb", "a,#__start__stack - 1"); emitcode ("jc", "%s_stack_probe_err", sym->rname);
        
        After this, you don't need to use 'i - 1' any more in the (i > 5) case and neither in the else where you missed it (i<=5).
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Maarten Brock - 2023-07-25
        
        The DW8051 has no buffer/cache but making a jump still takes approx. 50 cycles where other instructions take approx. 2. I'm very surprised.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Oleg Endo - 2023-07-20

@maartenbrock I've tried out the following on an DW8051 core:

void test (void)
{

__asm mov r0,sp mov r1,sp mov a,#0x20 push a pop sp mov a,sp mov sp,r1 __endasm; uint8_t test_sp_val = ACC; printf ("AAAAAAAAAAAAAAA test %u\n", test_sp_val); while (true); }

it prints '32'.

Looks like it also follows the 'pop sp' special case of the original mcs-51 like the EFM8. So can we assume that this works? I've tried to look at some open source 8051 soft core implementations but their operation in this regard is not straight forward to see through for me...
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Oleg Endo - 2023-07-26

@maartenbrock replying in new message (it's getting too cluttered here)

Specifying noreturn should prevent the user from creating a function that does return since it has no proper return address. It should probably also be added to the intrinsic functions prototypes in SDCC so it will always protest when noreturn is left out.

OK, makes sense to me.

am also not sure about checking stack end != 0xFF. It can happen, but is unlikely.
But at least by using l_SSEG it is possible. There is no __end__stack or __len__stack or similar and there probably never will be.

If we set iram size = 128, then stack end is implicitly 0x7F. That's what I was primarily thinking of. But I'm not sure how widely devices with iram size = 128 are still used. Most have 256 these days, I think. There is a --stack-size option. Theoretically it can be used to place user variables after the stack space (up to end of iram). But also not sure what's the advantage of that over placing those variables before the stack ...

Perhaps easiest way for now is to assume that iram size = 256 and stack space sits at the end of iram. If that's not good enough for somebody, they are free to report :)

However, if the stack probing was done by an external assembly routine, it would be easier for any user to modify it to add this length check. It could even be implemented behind an .if clause.

Example:
....

Yes, sure, no doubt about it. The external function would allow for a more flexible stack probing implementation and it's easy to add complex sophisticated detailed checks, whereas my goal for this one was to have fast-best-effort checks so that stack probing can be left in the production build of the software.

We could make this available via options. E.g. --stack-probe would do the function call approach while --stack-probe-inline would do what I'm proposing here.

You can at least take into account that you pushed ACC here. If more pushes were done in between, it would be best if they are taken into account as well.

After this, you don't need to use 'i - 1' any more in the (i > 5) case and neither in the else where you missed it (i<=5).

Ah! OK, I get it now. Yes, it's more obvious that way. I'll change it. Thanks!

Who knows, it might have overflown back into valid stack space!

Yes, it's not perfect. Every push onto the stack in the prologue code could overflow it. In fact, the call-insn itself might overflow it, too. Of course we can't check after every single push insn. The 'subb' check does basically cumulative checking of all those, in the hope that the number of bytes pushed onto the stack so far has not exceeded 'stack_start - 0x00'. If that were the case, it would wrongly judge that the 'sp' is OK.

We can figure out how many bytes would need to be pushed by the prologue. I'd just count all the emitted push insns and then insert the following snippet at the very top of the function:

push a mov a,sp add a,#prologue_num_bytes pop a jc stack_probe_err

It would take this whole thing back to almost the original version of my patch, where I was doing the 'add' check first, followed by 'subb'.

The current version of the patch produces overall smaller code size for me, as the 'add' check is integrated into the actual stack adjustment code and the number of pushed bytes in the prologue is usually zero (I'm always using --fomit-frame-pointer).

So if there are any number of pushed bytes in the prologue, it should emit the 'add' check at the top of the function and just add the 'stackAdjust' number to the number of prologue bytes. That would be more robust. I could add this if you insist. Although I don't have any need for that myself at the moment.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Oleg Endo - 2023-08-29

@maartenbrock Sorry, got worked up with another thing.

I've revised the patch, to address your point:

You can at least take into account that you pushed ACC here. If more pushes were done in between, it would be best if they are taken into account as well. Who knows, it might have overflown back into valid stack space!

The patch is a bit shorter now.

0001-implement-stack-probing-for-mcs51_5.patch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

@maartenbrock While rebasing the patch onto the current SVN trunk, I've noticed that there is a problem with the --all-callee-saves option.

The following test code:

void bleh_10 (uint32_t x);

void bleh_11 (uint32_t x)
{
   bleh_10 (0x11223344ul);
   bleh_10 (x);
}

Compiled with sdcc -c --model-large --std-sdcc2x -mmcs51 --stack-auto --nogcse --fverbose-asm --stack-probe --all-callee-saves

000000                        120 _bleh_11_stack_probe_err:
      000000 12r00r00         [24]  121     lcall   __stack_probe_err
      000003                        122 _bleh_11:
                           000007   123     ar7 = 0x07
                           000006   124     ar6 = 0x06
                           000005   125     ar5 = 0x05
                           000004   126     ar4 = 0x04
                           000003   127     ar3 = 0x03
                           000002   128     ar2 = 0x02
                           000001   129     ar1 = 0x01
                           000000   130     ar0 = 0x00
      000003 C0 07            [24]  131     push    ar7
      000005 C0 06            [24]  132     push    ar6
      000007 C0 05            [24]  133     push    ar5
      000009 C0 04            [24]  134     push    ar4
                                    135 ;   genReceive
      00000B AC 82            [24]  136     mov r4,dpl
      00000D AD 83            [24]  137     mov r5,dph
      00000F AE F0            [24]  138     mov r6,b
      000011 FF               [12]  139     mov r7,a
      000012 C3               [12]  140     clr c
      000013 E5 81            [12]  141     mov a,sp
      000015 94rFF            [12]  142     subb    a,#__start__stack - 1
      000017 40 E7            [24]  143     jc  _bleh_11_stack_probe_err
                                    144 ;   bug.c:872: bleh_10 (0x11223344ul);
                                    145 ;   genCall
                                    146 ;   Peephole 182.b  used 16 bit load of dptr
      000019 90 33 44         [24]  147     mov dptr,#0x3344
      00001C 75 F0 22         [24]  148     mov b,#0x22
      00001F 74 11            [12]  149     mov a,#0x11
      000021 12r00r00         [24]  150     lcall   _bleh_10
                                    151 ;   bug.c:873: bleh_10 (x);
                                    152 ;   genCall
      000024 8C 82            [24]  153     mov dpl,r4
      000026 8D 83            [24]  154     mov dph,r5
      000028 8E F0            [24]  155     mov b,r6
      00002A EF               [12]  156     mov a,r7
                                    157 ;   bug.c:874: }
                                    158 ;   Peephole 400.b  replaced lcall/ret with ljmp
      00002B 02r00r00         [24]  159     ljmp    _bleh_10
                                    160 ;
                                    161 ;   eliminated unneeded push/pop ar7
                                    162 ;   eliminated unneeded push/pop ar6
                                    163 ;   eliminated unneeded push/pop ar5
                                    164 ;   eliminated unneeded push/pop ar4
                                    165     .area BANK1   (CODE)
                                    166     .area BANK1   (CODE)
                                    167     .area XINIT   (CODE)
                                    168     .area CABS    (ABS,CODE)

The code at the end of genEndFunction wrongly eliminates the pop insns. This happens when any one of the stack adjustment or stack probe insns is inserted.

This happens because those stack modification insns will have the iCode set to FUNCTION. And the code in genEndFunction uses that as a terminating condition for scanning for push insns when it checks if a pop insn can be eliminated. And thus it wrongly eliminates the pop insns.

This looks like a latent bug, because it would also happen if any of the previous stack / xstack adjustment insns are emitted. It seems to work OK if the iCode of the emitted insns is set to NULL.

Attached is the updated patch which should now apply cleanly onto trunk.

0001-implement-stack-probing-for-mcs51_6.patch

Oleg Endo - 2023-10-18

@maartenbrock any updates on this one?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

mcs51: implement stack probing

The Small Device C Compiler (SDCC), targeting 8-bit architectures

Group

Searches

Help

#462 mcs51: implement stack probing

Discussion