Menu

#462 mcs51: implement stack probing

open
nobody
None
5
4 hours ago
2023-07-01
Oleg Endo
No

I've been using this patch in my local version for a while now. As the title says, it implements stack probing for mcs51,but not for xstack, only the regular stack.

It does so by injecting the stack pointer check in the prologue of each function and assumes that the conditional branch is normally not taken if the stack check passes.

This is not perfect of course. It doesn't catch wraparound of the stack pointer and also doesn't catch a stack overflow after allocating the stack frame inside of the function (because the check is done before that). But it's better than nothing at all and has proven useful for me.

One issue I don't know how to deal with is how to extend the device lib set with the permutation for the option '--stack-probe'. The idea is to have stack probing also enabled for rt / c std device libs (printf can overflow the stack easily for example) when it's specified in the user application.

@spth do you have any suggestion?

1 Attachments

Discussion

1 2 > >> (Page 1 of 2)
  • Maarten Brock

    Maarten Brock - 2023-07-03

    The only option is to rebuild the library with that option.
    Is the user supposed to bring his own stack_probe_err() function?
    Why not implement a default one in the library?

    Why are you adding 8 to the SP? Why not the size of the stack frame?

     
  • Oleg Endo

    Oleg Endo - 2023-07-03

    Thanks for looking at it @maartenbrock

    The only option is to rebuild the library with that option.

    That's what I've been doing so far -- hack it in the device lib Makefile.in as compiler option. Doesn't "scale well" though.

    Real world example: I've got two different 8051 devices. The first one has a larger flash and is prone to stack overflows. So the checks help a lot during development and increased code size is not an issue. The second device has only a small code flash and has usually no issue with the stack, so would like to omit the stack frame checks to not waste code flash.

    Recompiling & reinstalling the device libs in SDCC is really cumbersome.

    So what I'd like to add is something like GCC's multilib for --stack-probe option. It seems something similar is already done for stack-auto?

    Is the user supposed to bring his own stack_probe_err() function?
    Why not implement a default one in the library?

    Yes, that's what I've been doing for myself. The stack_probe_err is a special function that is implemented by the application / "board support library". It uses board specific ways to print an abort message somewhere -- usually the uart.
    A default implementation could be added, which could use printf / puts. But it should be overrideable by the user application. Not sure how to implement that within SDCC.

    Why are you adding 8 to the SP? Why not the size of the stack frame?

    Good point! Didn't think about it and/or didn't bother to check how to obtain the stack frame size at that time, as I needed "something working quickly".

     
    • Maarten Brock

      Maarten Brock - 2023-07-03

      The default could be as simple as looping forever. If you have in-circuit-debugging you can break and inspect things.

      Any function in the library can be overridden by supplying your own. The linker uses all objects you give to it and then looks for the missing pieces in the libraries.

       
      • Oleg Endo

        Oleg Endo - 2023-07-04

        The default could be as simple as looping forever. If you have in-circuit-debugging you can break and inspect things.

        Something like the default impl. of device/lib/__assert.c looks useful. It printfs something and infinite-loops after.

        Any function in the library can be overridden by supplying your own. The linker uses all objects you give to it and then looks for the missing pieces in the libraries.

        Alright. I'll give a another shot. Thanks for the pointer.

        What about the multi-lib issue? Do the same as for stack-auto? Looks like it's going to be a bigger changeset ...

         
  • Oleg Endo

    Oleg Endo - 2023-07-18

    I've integrated the stack probing into the stack adjustment code. Default impl. of library function 'stack_probe_err' is still not there, will add it later.

    @maartenbrock could you please have a look whether this is reasonable like that? In particualr, I'm not sure how reliable 'pop sp' is across all the different 8051 core implementations?

     
    • Maarten Brock

      Maarten Brock - 2023-07-19

      There are some issues with this patch.
      * There is no 'push a' and 'push acc' is 2 bytes
      * stack_probe_restore_acc_code_line is still unused
      * Internal stack can also be probed when xstack is specified as it is still used
      * 'i + 1' vs (i - 1)

       
    • Maarten Brock

      Maarten Brock - 2023-07-19

      I've tested POP SP on a SiLabs EFM8 Universal Bee and it does what you expect: SP contains the popped value afterwards. This case is also explicitly documented in the MCS-51 Family of Single Chip Microcomputers User's Manual.

      The stack pointer originally contains the value 32H, and internal RAM locations 30H through 32H contain the values 20H, 23H, and 01H, respectively. The instruction sequence,
      POP DPH
      POP DPL
      will leave the stack pointer equal to the value 30H and the data pointer set to 0123H. At this point the instruction,
      POP SP
      will leave the stack pointer set to 20H. Note that in this special case the stack pointer was decremented to 2FH before being loaded with the value popped (20H).

      Although contradictory to the following definition of operation:

      (direct) <- ((SP))
      (SP) <- (SP) - 1

      On the EFM8 PUSH SP stores SP as it was before the push. This seems contradictory to the original User's Manual which has this definition of operation:

      (SP) <- (SP) + 1
      ((SP)) <- (direct)

      It has no special example for PUSH SP.

      The ucsim s51 simulator stores the incremented value.

      But further down in the User's Manual in the 8051 Instruction Set Simulator code it says:

      /* "PUSH direct" INSTRUCTION: */
      PUSH$DIR: PROCEDURE;
          DIR$ADDR=FETCH$PROGRAM(PC+1);
          DIR$DATA=FETCH$DIR(DIR$ADDR);
          CALL PUSH$STACK(DIR$DATA);
          PC=PC+2;
      END PUSH$DIR;
      

      The stack pointer is incremented inside the PUSH$STACK procedure.

      This means that PUSH SP should store the non-incremented stack pointer value.

       

      Last edit: Maarten Brock 2023-07-19
  • Oleg Endo

    Oleg Endo - 2023-07-20

    @maartenbrock thanks for checking it!

    • There is no 'push a' and 'push acc' is 2 bytes

    It assembles and links just fine here it seems. The assembler seems to take care of it. But I in other places SDCC emits 'acc', so perhaps it's good to be consistent. I know that that insn is 2 bytes in size.

    stack_probe_restore_acc_code_line is still unused

    Oops. I forgot to remove this leftover piece.

    Internal stack can also be probed when xstack is specified as it is still used

    Oh, OK. I've never used xstack myself, so didn't pay attention to that.

    'i + 1' vs (i - 1)
    / +/-1 Which one is correct? /

    That was basically my main uncertainty regarding using 'pop sp'. 'push sp' implementation is irrelevant as it's not used. If EFM8's CIP-51 core implements the special case of 'pop sp' as per original manual, then perhaps that's what can be expected there? I will try it out on the DW8051 (synopsys) core to double check.

    Your change of original patch:

    • if (i > 5)
    • {
    • ...
    • emitcode ("pop", "sp"); / sp = @(sp); sp -= 1 /
    • / total sp adjust insn length = 14 bytes /

    to

    • if (i > 7)
    • {
    • ...
    • emitcode ("pop", "sp"); / sp = @(sp); sp -= 1 /
    • / total sp adjust insn length = 18 bytes /

    I was counting from the first insn after the 'jc' insn onwards. Not the whole code emitted in the if block, but only the code to do the actual adjustment:

                  emitcode ("xch", "a,r0");  // 1
                  emitcode ("push", "acc");  // 2
                  emitcode ("xch", "a,r0");  // 1
    
                  emitcode ("mov", "r0,sp");  // 2
                  emitcode ("dec", "r0");     // 1
                  emitcode ("xch", "a,@r0");  // 1
    
                  emitcode ("xch", "a,r0");   // 1
                  emitcode ("pop", "acc");    // 2
                  emitcode ("xch", "a,r0");   // 1
    
                  emitcode ("pop", "sp");     // 2     total = 14 bytes
    

    ... because the 'add' and 'jc' will be always there, in all cases.
    So the 14 bytes are worth 7 'inc sp' insns. However, that case includes also a 'pop acc', so it's worth only effectively 6 'insn sp' insns. Hence I put the threshold to 'i > 5'.

    • emitcode ("add", "a,#(__start__stack - 1 + !constbyte)", i);
    • emitcode ("jc", "%s_stack_probe_err", sym->rname);
    • emitcode ("pop", "acc");
    • while (i--)
    • emitcode ("inc", "sp");
    • / total sp adjust insn length <= 16 bytes /

    If I count all the code that is emitted in that else block, then I arrive at 20 for i = 7, which is the max. i that this else case can hit as per your change to 'if (i > 7)' above. Or not?

     

    Last edit: Oleg Endo 2023-07-21
    • Maarten Brock

      Maarten Brock - 2023-07-20
      'i + 1' vs (i - 1)
      / +/-1 Which one is correct? /
      

      That was basically my main uncertainty regarding using 'pop sp'.

      What I meant was that the comment states you need i+1 and the code actually uses i-1 !
      And now we know that you don't need either.

      Let's count all bytes. And my second piece was wrong in that regard. Your threshold of 5 is correct.

      The non-probing case for i>9 is not right. Not only is there no post-decrement on the popped SP value, but you have already pushed twice when SP is read into A. Also SP can be read before saving R0 which saves the need for DEC R0.

      And similarly in the stack probing case, when A is pushed because there is no free reg, the offset changes. Come to think of it, all R0-R7 registers should always be free at function entry, unless it's switching register bank.

      But why not make the stack probing a support routine that takes the stack offset as parameter in A and returns the new stack value in A? E.g.

      xch     a,freereg           ;optional
      mov     a,#stackAdjust
      lcall   ___sdcc_stack_probe
      mov     SP,a
      xch     a,freereg           ;optional
      

      The stack probing routine can check underflow and overflow, has the caller automatically on stack and needs no lcall before the function.

       

      Last edit: Maarten Brock 2023-07-21
      • Oleg Endo

        Oleg Endo - 2023-07-21

        Let's count all bytes.

        OK, perhaps easier to follow.

        The non-probing case for i>9 is not right.

        Good point. I've refined it. Code can be shortened a bit.

        And similarly in the stack probing case, when A is pushed because there is no free reg, the offset changes.

        Should be fixed.

        Come to think of it, all R0-R7 registers should always be free at function entry, unless it's switching register bank.

        All my software never hits those cases when accIsFree == false || freereg == NULL. I also thought it's a bit odd that freereg == NULL is possible at all. But then, this path was already there so I didn't dig further...

        But why not make the stack probing a support routine that takes the stack offset as parameter in A and returns the new stack value in A?

        Yeah, I've seen DS390 uses this approach. However, one of my main devices here has über-slow jumps -- ~48 cycles or so. So having the lcall ___sdcc_stack_probe in there costs each function call an additional 100 cycles at least ('lcall' + 'ret' from the probe function). Hence I implemented it originally inlined in the epilogue.

        Devices like EFM8 (which I also use) don't suffer from that so much, but still an 'lcall' is up to 7 cycles and a 'ret' is up to '8' cycles, while the not taken 'jc' conditional branches cost only 2 cycles. So I'd prefer to have inlined stack probing.

        Attached updated patch now also includes the default implementation for the __stack_probe_err function and documentation update.

         

        Last edit: Oleg Endo 2023-07-21
        • Maarten Brock

          Maarten Brock - 2023-07-21

          The epilogue is what happens at the end (of the function). The prologue is what happens at the beginning. Your code is in the prologue.

          The _stack_probe_err can be marked _Noreturn or [[ noreturn ]] to make sure it will not return.

          I would definitely not use printf() in this error function. It is a very complex function that is very likely to fail when the stack is corrupted.

          Instead of __start__stack you can also use s_SSEG (they are equal). It is accompanied by l_SSEG which holds the length of the stack segment. And since the linker doesn't support adding two relocatable symbols either, I consider adding e_SSEG.

          The underflow check does not yet take the PUSH ACC into account and neither does the i<=5 case. With the underflow check fixed the overflow no longer needs i-1.

          There are very few 8051 derivatives where jumps are so relatively expensive. How many cycles do your checks cost in the accIsFree case? Does the device have a prefetch buffer and/or cache? I would prefer to use a support routine and save bytes.

           
          • Philipp Klaus Krause

            To me, it looks like there is a trade-off between code size and speed where different users have different preferences.
            Usually, in such cases, we go with the solution that yields lower code size by default, but use the faster one if --opt-code-size is specified by the user.

             
            • Oleg Endo

              Oleg Endo - 2023-07-23

              Usually, in such cases, we go with the solution that yields lower code size by default, but use the faster one if --opt-code-size is specified by the user.

              -opt-code-speed vs. --opt-code-sizetypo?

              Like I wrote in my other reply, I would suggest to add an alternative space optimizing implementation of stack probing afterwards.

              I'm not going to propose to enable stack probing by default, so there will be no code size regression/increase at all, unless stack probing is enabled by the user specifically.

               
              • Philipp Klaus Krause

                Yes, I meant --opt-code-speed.

                 
          • Oleg Endo

            Oleg Endo - 2023-07-21

            The epilogue is what happens at the end (of the function). The prologue is what happens at the beginning. Your code is in the prologue.

            I know. Just a typo. Fixed.

            The _stack_probe_err can be marked _Noreturn or [[ noreturn ]] to make sure it will not return.

            Yes, that's what I have in my app code. I wasn't sure if that's OK for the device libs. I've copied the code from assert.c which doesn't specify noreturn.
            Since the call site always uses 'lcall' I think specifying noreturn will do effectively nothing in this case. But I've added it anyway.

            I would definitely not use printf() in this error function. It is a very complex function that is very likely to fail when the stack is corrupted.

            Ttrue. I didn't consider that (In my app implementation it resets the stack before doing any printing). Removed.

            Instead of __start__stack you can also use s_SSEG (they are equal). It is accompanied by l_SSEG which holds the length of the stack segment. And since the linker doesn't support adding two relocatable symbols either, I consider adding e_SSEG.

            I'm not sure how to use the area/section/segment names? What would that improve? Can you give an example?

            The underflow check does not yet take the PUSH ACC into account

            By underflow you mean the 'subb' check? If so ...
            Why does it have to? At that point, all it cares about is whether the stack pointer has rolled over and is somewhere in the range 0x00 - start_stack. Note that earlier in the prologue, there might be other stack pushes, e.g. for the frame pointer. But I don't see how that would be relevant at that point.

            and neither does the i<=5 case. With the underflow check fixed the overflow no longer needs i-1.

            Sorry, I don't follow. Please point out in the code.

            There are very few 8051 derivatives where jumps are so relatively expensive.

            IMO, even on the EFM8UB I wouldn't want to have those extra 15 cycles for each function call. The thing is already slow as-is.

            How many cycles do your checks cost in the accIsFree case?

                  emitcode ("clr", "c");
                  emitcode ("mov", "a,sp");
                  emitcode ("subb", "a,#__start__stack - 1");
                  emitcode ("jc", "%s_stack_probe_err", sym->rname);
            
                  // EFM8UB: 1+2+2+2 = 7 cycles (jc not taken)
                  // DW8051: 1+2+2+3 = 8 cycles
            
                  emitcode ("add", "a,#(__start__stack - 1 + !constbyte)", i);
                  emitcode ("jc", "%s_stack_probe_err", sym->rname);
                  emitcode ("mov", "sp,a");
            
                  // EFM8UB:2+2+2 = 6 cycles (jc not taken)
                  // DW8051: 2+3+2 = 7 cycles
            
                  // total cycles
                   //  EFM8: 7 + 6 = 13 cycles (6 cycles are for stack adjust)
            
                   // DW8051: 8 + 7 = 15 cycles (6 cycles are for stack adjust)
            

            Does the device have a prefetch buffer and/or cache?

            The EFM8UB does have a prefetch buffer, and it makes taken jumps more expensive than if the prefetch buffer is disabled. The DW8051 device doesn't have any buffer/cache.

            I would prefer to use a support routine and save bytes.

            For the cases like freereg == null, it might be good to outsource the stack adjustment + probing code into a support routine, when optimizing for size. For the other cases, I can't agree with it. Can we leave that as a potential future improvement? (If anybody cares at all, since I won't enable stack probing by default for everything).

             
            • Oleg Endo

              Oleg Endo - 2023-07-21

              Instead of __start__stack you can also use s_SSEG (they are equal). It is accompanied by l_SSEG which holds the length of the stack segment. And since the linker doesn't support adding two relocatable symbols either, I consider adding e_SSEG.

              I'm not sure how to use the area/section/segment names? What would that improve? Can you give an example?

              OK, I've tried it out, I can replace __start__stack with s_SSEG. But as such, it doesn't change much for the current implementation.

              Support for stack end != 0xFF could be added of course, but is there any practical use case for that? I'm not sure ....

               
              • Maarten Brock

                Maarten Brock - 2023-07-25

                I am also not sure about checking stack end != 0xFF. It can happen, but is unlikely.
                But at least by using l_SSEG it is possible. There is no __end__stack or __len__stack or similar and there probably never will be.

                However, if the stack probing was done by an external assembly routine, it would be easier for any user to modify it to add this length check. It could even be implemented behind an .if clause.

                Example:

                    .area DSEG    (DATA)
                stack_temp:
                    .ds 1
                
                    .area HOME    (CODE)
                
                __sdcc_stack_probe_1::
                    mov stack_temp,a
                    mov a,#s_SSEG + 1
                    sjmp    __sdcc_stack_cmp
                __sdcc_stack_probe::
                    mov stack_temp,a        ; stack_temp = requested additional stack space
                    mov a,#s_SSEG
                __sdcc_stack_cmp:
                    xrl a,#0xFF
                    add a,sp                ; a = allocated stack space before call here
                    jnc __sdcc_stack_error
                    add a,stack_temp        ; a = needed allocated stack space
                    jc  __sdcc_stack_error
                    .if 0
                    mov stack_temp,a        ; stack_temp = needed allocated stack space
                    mov a,#l_SSEG
                    xrl a,#0xFF
                    add a,stack_temp        ; a = space - l_SSEG - 1
                    jc  __sdcc_stack_error
                    mov a,stack_temp
                    .endif
                    add a,#s_SSEG - 1       ; a = new stack pointer value
                    ret
                __sdcc_stack_error::
                    sjmp    .
                

                __sdcc_stack_probe_1 is for the case that A needs to pushed before the call.

                 
            • Maarten Brock

              Maarten Brock - 2023-07-25

              Specifying noreturn should prevent the user from creating a function that does return since it has no proper return address. It should probably also be added to the intrinsic functions prototypes in SDCC so it will always protest when noreturn is left out.

               
            • Maarten Brock

              Maarten Brock - 2023-07-25

              By underflow you mean the 'subb' check? If so ...
              Why does it have to? At that point, all it cares about is whether the stack pointer has rolled over and is somewhere in the range 0x00 - start_stack. Note that earlier in the prologue, there might be other stack pushes, e.g. for the frame pointer. But I don't see how that would be relevant at that point.

              You can at least take into account that you pushed ACC here. If more pushes were done in between, it would be best if they are taken into account as well. Who knows, it might have overflown back into valid stack space!

                    emitcode ("clr", "c");
                    emitcode ("mov", "a,sp");
                    if (!accIsFree && freereg == NULL)
                      emitcode ("subb", "a,#__start__stack");
                    else
                      emitcode ("subb", "a,#__start__stack - 1");
                    emitcode ("jc", "%s_stack_probe_err", sym->rname);
              

              After this, you don't need to use 'i - 1' any more in the (i > 5) case and neither in the else where you missed it (i<=5).

               
            • Maarten Brock

              Maarten Brock - 2023-07-25

              The DW8051 has no buffer/cache but making a jump still takes approx. 50 cycles where other instructions take approx. 2. I'm very surprised.

               
  • Oleg Endo

    Oleg Endo - 2023-07-20

    @maartenbrock I've tried out the following on an DW8051 core:

    void test (void)
    {

    __asm
      mov  r0,sp
      mov  r1,sp
    
      mov  a,#0x20
      push a
    
      pop sp
    
      mov a,sp
    
      mov sp,r1
    __endasm;
    
      uint8_t test_sp_val = ACC;
    
      printf ("AAAAAAAAAAAAAAA test %u\n", test_sp_val);
    
      while (true);
    }
    

    it prints '32'.

    Looks like it also follows the 'pop sp' special case of the original mcs-51 like the EFM8. So can we assume that this works? I've tried to look at some open source 8051 soft core implementations but their operation in this regard is not straight forward to see through for me...

     
  • Oleg Endo

    Oleg Endo - 2023-07-26

    @maartenbrock replying in new message (it's getting too cluttered here)

    Specifying noreturn should prevent the user from creating a function that does return since it has no proper return address. It should probably also be added to the intrinsic functions prototypes in SDCC so it will always protest when noreturn is left out.

    OK, makes sense to me.

    am also not sure about checking stack end != 0xFF. It can happen, but is unlikely.
    But at least by using l_SSEG it is possible. There is no __end__stack or __len__stack or similar and there probably never will be.

    If we set iram size = 128, then stack end is implicitly 0x7F. That's what I was primarily thinking of. But I'm not sure how widely devices with iram size = 128 are still used. Most have 256 these days, I think. There is a --stack-size option. Theoretically it can be used to place user variables after the stack space (up to end of iram). But also not sure what's the advantage of that over placing those variables before the stack ...

    Perhaps easiest way for now is to assume that iram size = 256 and stack space sits at the end of iram. If that's not good enough for somebody, they are free to report :)

    However, if the stack probing was done by an external assembly routine, it would be easier for any user to modify it to add this length check. It could even be implemented behind an .if clause.

    Example:
    ....

    Yes, sure, no doubt about it. The external function would allow for a more flexible stack probing implementation and it's easy to add complex sophisticated detailed checks, whereas my goal for this one was to have fast-best-effort checks so that stack probing can be left in the production build of the software.

    We could make this available via options. E.g. --stack-probe would do the function call approach while --stack-probe-inline would do what I'm proposing here.

    You can at least take into account that you pushed ACC here. If more pushes were done in between, it would be best if they are taken into account as well.

    After this, you don't need to use 'i - 1' any more in the (i > 5) case and neither in the else where you missed it (i<=5).

    Ah! OK, I get it now. Yes, it's more obvious that way. I'll change it. Thanks!

    Who knows, it might have overflown back into valid stack space!

    Yes, it's not perfect. Every push onto the stack in the prologue code could overflow it. In fact, the call-insn itself might overflow it, too. Of course we can't check after every single push insn. The 'subb' check does basically cumulative checking of all those, in the hope that the number of bytes pushed onto the stack so far has not exceeded 'stack_start - 0x00'. If that were the case, it would wrongly judge that the 'sp' is OK.

    We can figure out how many bytes would need to be pushed by the prologue. I'd just count all the emitted push insns and then insert the following snippet at the very top of the function:

      push a
      mov a,sp
      add a,#prologue_num_bytes
      pop a
      jc stack_probe_err
    

    It would take this whole thing back to almost the original version of my patch, where I was doing the 'add' check first, followed by 'subb'.

    The current version of the patch produces overall smaller code size for me, as the 'add' check is integrated into the actual stack adjustment code and the number of pushed bytes in the prologue is usually zero (I'm always using --fomit-frame-pointer).

    So if there are any number of pushed bytes in the prologue, it should emit the 'add' check at the top of the function and just add the 'stackAdjust' number to the number of prologue bytes. That would be more robust. I could add this if you insist. Although I don't have any need for that myself at the moment.

     
  • Oleg Endo

    Oleg Endo - 2023-08-29

    @maartenbrock Sorry, got worked up with another thing.

    I've revised the patch, to address your point:

    You can at least take into account that you pushed ACC here. If more pushes were done in between, it would be best if they are taken into account as well. Who knows, it might have overflown back into valid stack space!

    The patch is a bit shorter now.

     
  • Oleg Endo

    Oleg Endo - 2023-09-01

    @maartenbrock While rebasing the patch onto the current SVN trunk, I've noticed that there is a problem with the --all-callee-saves option.

    The following test code:

    void bleh_10 (uint32_t x);
    
    void bleh_11 (uint32_t x)
    {
       bleh_10 (0x11223344ul);
       bleh_10 (x);
    }
    

    Compiled with sdcc -c --model-large --std-sdcc2x -mmcs51 --stack-auto --nogcse --fverbose-asm --stack-probe --all-callee-saves

    000000                        120 _bleh_11_stack_probe_err:
          000000 12r00r00         [24]  121     lcall   __stack_probe_err
          000003                        122 _bleh_11:
                               000007   123     ar7 = 0x07
                               000006   124     ar6 = 0x06
                               000005   125     ar5 = 0x05
                               000004   126     ar4 = 0x04
                               000003   127     ar3 = 0x03
                               000002   128     ar2 = 0x02
                               000001   129     ar1 = 0x01
                               000000   130     ar0 = 0x00
          000003 C0 07            [24]  131     push    ar7
          000005 C0 06            [24]  132     push    ar6
          000007 C0 05            [24]  133     push    ar5
          000009 C0 04            [24]  134     push    ar4
                                        135 ;   genReceive
          00000B AC 82            [24]  136     mov r4,dpl
          00000D AD 83            [24]  137     mov r5,dph
          00000F AE F0            [24]  138     mov r6,b
          000011 FF               [12]  139     mov r7,a
          000012 C3               [12]  140     clr c
          000013 E5 81            [12]  141     mov a,sp
          000015 94rFF            [12]  142     subb    a,#__start__stack - 1
          000017 40 E7            [24]  143     jc  _bleh_11_stack_probe_err
                                        144 ;   bug.c:872: bleh_10 (0x11223344ul);
                                        145 ;   genCall
                                        146 ;   Peephole 182.b  used 16 bit load of dptr
          000019 90 33 44         [24]  147     mov dptr,#0x3344
          00001C 75 F0 22         [24]  148     mov b,#0x22
          00001F 74 11            [12]  149     mov a,#0x11
          000021 12r00r00         [24]  150     lcall   _bleh_10
                                        151 ;   bug.c:873: bleh_10 (x);
                                        152 ;   genCall
          000024 8C 82            [24]  153     mov dpl,r4
          000026 8D 83            [24]  154     mov dph,r5
          000028 8E F0            [24]  155     mov b,r6
          00002A EF               [12]  156     mov a,r7
                                        157 ;   bug.c:874: }
                                        158 ;   Peephole 400.b  replaced lcall/ret with ljmp
          00002B 02r00r00         [24]  159     ljmp    _bleh_10
                                        160 ;
                                        161 ;   eliminated unneeded push/pop ar7
                                        162 ;   eliminated unneeded push/pop ar6
                                        163 ;   eliminated unneeded push/pop ar5
                                        164 ;   eliminated unneeded push/pop ar4
                                        165     .area BANK1   (CODE)
                                        166     .area BANK1   (CODE)
                                        167     .area XINIT   (CODE)
                                        168     .area CABS    (ABS,CODE)
    

    The code at the end of genEndFunction wrongly eliminates the pop insns. This happens when any one of the stack adjustment or stack probe insns is inserted.

    This happens because those stack modification insns will have the iCode set to FUNCTION. And the code in genEndFunction uses that as a terminating condition for scanning for push insns when it checks if a pop insn can be eliminated. And thus it wrongly eliminates the pop insns.

    This looks like a latent bug, because it would also happen if any of the previous stack / xstack adjustment insns are emitted. It seems to work OK if the iCode of the emitted insns is set to NULL.

    Attached is the updated patch which should now apply cleanly onto trunk.

     
  • Oleg Endo

    Oleg Endo - 2023-10-18

    @maartenbrock any updates on this one?

     
1 2 > >> (Page 1 of 2)

Log in to post a comment.