#195 Optimize _divulong on mcs51

None
closed-rejected
None
5
2015-11-09
2012-09-06
Anonymous
No

The _divulong function uses a bool variable, but this is not optimal:

- This forces the use of the "T" reg as soon as a 32-bit division is used on
mcs51, thus potentially reducing the stack size by up to 24 bytes.
- This forces the compiler to use a bit, which produces suboptimal code.
- This does not reduce stack usage, as the compiler can allocate the bool
variable to a register.

The following patch uses an unsigned char for the boolean variable. The generated
code changes as follows for the large stack-auto code model:

;x Allocated to stack - _bp +1
;reste Allocated to registers r2 r3 r6 r7
;count Allocated to registers r5
-;c Allocated to registers b0
+;c Allocated to registers r4
;------------------------------------------------------------
; _divulong.c:335: _divulong (unsigned long x, unsigned long y)
; -----------------------------------------
@@ -148,8 +134,6 @@
rl a
anl a,#0x01
mov r4,a
- add a,#0xff
- mov b0,c
; _divulong.c:345: x <<= 1;
mov r0,_bp
inc r0
@@ -182,7 +166,8 @@
rlc a
mov r7,a
; _divulong.c:347: if (c)
- jnb b0,00102$
+ mov a,r4
+ jz 00102$
; _divulong.c:348: reste |= 1L;
orl ar2,#0x01
00102$:

On the cc253x SoC this leads to a reduction of 4 bytes in code size and 5 cycles per
loop iteration, thus saving 160 cycles (5 cycles x 32 iterations) on a 32-bit division.
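
For readers without the source at hand, here is a rough, portable sketch of the kind of shift-and-subtract loop the listing above appears to come from. It is reconstructed only from the source-line comments in the diff (c, count, reste, "x <<= 1", "if (c)", "reste |= 1L"). It is not a copy of SDCC's _divulong.c. The name divulong_sketch and the fixed-width types are assumptions made for illustration. The flag c is declared as a byte, as the patch proposes, instead of a bit.

```c
#include <stdint.h>

/* Hypothetical name: a sketch of a 32-bit restoring division,
 * not the actual SDCC library source. */
uint32_t divulong_sketch(uint32_t x, uint32_t y)
{
    uint32_t reste = 0;   /* running remainder */
    uint8_t count = 32;   /* one iteration per bit of the dividend */
    uint8_t c;            /* byte-sized flag, as in the proposed patch */

    do {
        /* Save the bit about to be shifted out of x. */
        c = (uint8_t)((x >> 31) & 1u);
        x <<= 1;
        reste <<= 1;
        if (c)
            reste |= 1u;

        /* If the divisor fits, subtract it and set the quotient bit. */
        if (reste >= y) {
            reste -= y;
            x |= 1u;
        }
    } while (--count);

    return x;             /* x now holds the quotient */
}
```

The loop body runs 32 times per call, which is why a saving of 5 cycles per iteration adds up to 160 cycles per division.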

Discussion

  • Philipp Klaus Krause

    I don't like having a special case for such a small gain. IMO, the right thing to do would be to just implement _Bool for mcs51 (instead of the non-compliant fake using __bit we have now).

    Philipp

     
  • Maarten Brock

    Maarten Brock - 2013-07-19

    A full implementation of _Bool for mcs51 would make no difference IMO, because it should still map booleans to bitspace whenever possible. In general this should be more optimal. Also, most of the time there are ISRs that use alternate register banks and some global variables that easily fill the 24 bytes.

    What remains is that the current implementation does not generate optimal code.

     
  • Maarten Brock

    Maarten Brock - 2015-11-09
    • status: open --> closed-rejected
    • assigned_to: Maarten Brock
    • Group: -->
     
  • Maarten Brock

    Maarten Brock - 2015-11-09

    The current version of SDCC 3.5.5 #9397 no longer generates the inefficient

    rl a
    anl a,#0x01
    mov r4,a
    add a,#0xff
    

    But instead gives

    rlc a
    

    This makes the current implementation with bool 1 byte more efficient than the unsigned char in this patch.

     
