## Re: [Sdcc-user] Optimising a+=b;

 Re: [Sdcc-user] Optimising a+=b; From: Philipp Klaus Krause - 2014-06-24 17:40:53 ```-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I think it would be helpful to readers of your post, if you'd * State which target this is about * State which data types the variables are (or even better: give a compileable function) While these could be inferred from the assembly code you posted, they would still make reading easier. Philipp -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Using GnuPG with Icedove - http://www.enigmail.net/ iEYEARECAAYFAlOpuBkACgkQbtUV+xsoLprMJQCgnSxz07CLU7dkx1eGdQMK6IGh TBcAnj62xyTYqBXFYQ8qm8nKpyKKshKr =AENS -----END PGP SIGNATURE----- ```

 [Sdcc-user] Optimising a+=b; From: Kustaa Nyholm - 2014-06-24 17:23:01 ```Hi, its me again ;) Any idea how difficult it would be to add optimisation operators like '+=' and company? Here is a piece of compiled C from my project: ; .line 150; stepperirq.c MOTOR.nco += MOTOR.speed; MOVF (_g + 10), W, B ADDWF (_g + 4), W, B MOVWF r0x00 MOVF (_g + 11), W, B ADDWFC (_g + 5), W, B MOVWF r0x01 MOVF r0x00, W MOVWF (_g + 4), B MOVF r0x01, W MOVWF (_g + 5), B This look rather long compared to this snipper I copied from the web: movf a,w addwf b,f btfsc STATUS,C incf b+1,f movf a+1,w addwf b+1,f 6 versus 10 instructions and a two one byte temp vars. I have a beef with the temp vars especially as (unlike I thought before) these get overlaid with temp vars (registers really for the compiler I guess) which necessitates altogether four push/pull instructions in my critical interrupt handler. That [MOTOR.nco += MOTOR.speed] innocent looking C statement is responsible for almost about 30% of my execution time! Of course I could do this in in assembly but that would be more pain as the compiler does a nice job of simplifying the MOTOR.xxxx variable access which is a bit more complicated than it looks and if I go to assembly then I'm on my own. OTH then I could take advantage of the carry flag of this 16 bit add operation that would be useful for me and further improve my execution speed. Anyway would it be difficult to add this optimisation, in the peephole optimiser or earlier in the pipeline? Any other ideas how to optimise that piece of code? br Kusti This e-mail may contain confidential or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. We will not be liable for direct, indirect, special or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on or as of transmission of this e-mail in general. ```
 Re: [Sdcc-user] Optimising a+=b; From: Philipp Klaus Krause - 2014-06-24 17:40:53 ```-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I think it would be helpful to readers of your post, if you'd * State which target this is about * State which data types the variables are (or even better: give a compileable function) While these could be inferred from the assembly code you posted, they would still make reading easier. Philipp -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Using GnuPG with Icedove - http://www.enigmail.net/ iEYEARECAAYFAlOpuBkACgkQbtUV+xsoLprMJQCgnSxz07CLU7dkx1eGdQMK6IGh TBcAnj62xyTYqBXFYQ8qm8nKpyKKshKr =AENS -----END PGP SIGNATURE----- ```
 Re: [Sdcc-user] Optimising a+=b; From: Kustaa Nyholm - 2014-06-24 18:56:27 ```Right, sorry about that. I'm working on PIC18F4550. Compilable function ... well I can give you the defs and a code snippet: typedef struct { union { uint16_t nco; struct { unsigned nco_low_bits :15; unsigned nco_hi_bit :1; }; }; struct { unsigned has_next :1; unsigned reserve_b1; unsigned reserve_b2 :1; unsigned reserve_b3 :1; unsigned reserve_b4 :1; unsigned reserve_b5 :1; unsigned reserve_b6 :1; unsigned reserve_b7 :1; }; uint16_t speed; uint8_t next_steps; uint16_t next_speed; uint8_t next_dir; uint8_t steps; } motor_t; #define MOTOR g.motor[0] and the code snippet: .... MOTOR.nco += MOTOR.speed; if (MOTOR.steps) { if (MOTOR.nco_hi_bit) { // speed NCO overflowed and we have steps left => generate pulse MOTOR.steps--; .... so basically it is just advancing a 16 bit unsigned variable by adding an other 16 bit unsigned to it. I treat the value as 15 bit and check the highest bit for overflow, in assembler I would have access to the carry bit and could use the full 16 bits which would better but I'm a happy to use just 15 bits if I can stay in the C-domain. Sorry that I did not provide this info in the first place, I was so deeply into my code that I took all this granted, which it isn't of course. br Kusti On 24/06/2014 20:40, "Philipp Klaus Krause" wrote: >-----BEGIN PGP SIGNED MESSAGE----- >Hash: SHA1 > >I think it would be helpful to readers of your post, if you'd > >* State which target this is about >* State which data types the variables are (or even better: give a >compileable function) > >While these could be inferred from the assembly code you posted, they >would still make reading easier. > >Philipp > >-----BEGIN PGP SIGNATURE----- >Version: GnuPG v1 >Comment: Using GnuPG with Icedove - http://www.enigmail.net/ > >iEYEARECAAYFAlOpuBkACgkQbtUV+xsoLprMJQCgnSxz07CLU7dkx1eGdQMK6IGh >TBcAnj62xyTYqBXFYQ8qm8nKpyKKshKr >=AENS >-----END PGP SIGNATURE----- > >-------------------------------------------------------------------------- >---- >Open source business process management suite built on Java and Eclipse >Turn processes into business applications with Bonita BPM Community >Edition >Quickly connect people, data, and systems into organized workflows >Winner of BOSSIE, CODIE, OW2 and Gartner awards >http://p.sf.net/sfu/Bonitasoft >_______________________________________________ >Sdcc-user mailing list >Sdcc-user@... >https://lists.sourceforge.net/lists/listinfo/sdcc-user > -- Kustaa Nyholm Research Manager, Software Research and Technology Division PLANMECA OY Asentajankatu 6 00880 HELSINKI FINLAND Please note our new telephone and fax numbers! Tel: +358 20 7795 572 (direct) Fax: +358 20 7795 676 GSM: +358 40 580 5193 e-mail: kustaa.nyholm@... This e-mail may contain confidential or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. We will not be liable for direct, indirect, special or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on or as of transmission of this e-mail in general. ```
 Re: [Sdcc-user] Optimising a+=b; From: Raphael Neider - 2014-06-24 20:21:12 Attachments: Message as HTML ```Hi, I can currently not verify this, but anyways: I think that union's were and possibly are implicitly treated as volatile by SDCC and/or the PIC backend(s) due to problems in maintaining correct potential alias sets. This might explain the bad code that is generated for the a+=b operation you see, as your code does use unions. To test this, you could x+=y two global uint16_t variables and see what happens. For structs, I would expect the same code as for globals. It might also be interesting to see what happens if the right operand is simple (say, just a global variable or even a function argument): uint16_t x, y; struct { uint16_t v; uint16_t w; } s; union { uint16_t a; uint16_t b; } u; void foo() { x += y; s.v += s.w; u.a += u.b; } void bar() { x += y; s.v += y; u.a += y; } By the way, I can imagine that += on bitfields is even worse ... Best regards, Raphael On Tue, Jun 24, 2014 at 8:56 PM, Kustaa Nyholm wrote: > Right, > > sorry about that. > > I'm working on PIC18F4550. > > Compilable function ... well I can give you the defs and a code > snippet: > > typedef struct { > union { > uint16_t nco; > struct { > unsigned nco_low_bits :15; > unsigned nco_hi_bit :1; > }; > }; > struct { > unsigned has_next :1; > unsigned reserve_b1; > unsigned reserve_b2 :1; > unsigned reserve_b3 :1; > unsigned reserve_b4 :1; > unsigned reserve_b5 :1; > unsigned reserve_b6 :1; > unsigned reserve_b7 :1; > }; > uint16_t speed; > uint8_t next_steps; > uint16_t next_speed; > uint8_t next_dir; > uint8_t steps; > } motor_t; > > > #define MOTOR g.motor[0] > > > and the code snippet: > > .... > MOTOR.nco += MOTOR.speed; > if (MOTOR.steps) { > if (MOTOR.nco_hi_bit) { // speed NCO overflowed and we have steps left => > generate pulse > MOTOR.steps--; > .... > > > so basically it is just advancing a 16 bit unsigned variable > by adding an other 16 bit unsigned to it. I treat the value > as 15 bit and check the highest bit for overflow, in assembler > I would have access to the carry bit and could use the full > 16 bits which would better but I'm a happy to use just 15 bits > if I can stay in the C-domain. > > Sorry that I did not provide this info in the first place, > I was so deeply into my code that I took all this granted, > which it isn't of course. > > br Kusti > > > > > > > > On 24/06/2014 20:40, "Philipp Klaus Krause" wrote: > > >-----BEGIN PGP SIGNED MESSAGE----- > >Hash: SHA1 > > > >I think it would be helpful to readers of your post, if you'd > > > >* State which target this is about > >* State which data types the variables are (or even better: give a > >compileable function) > > > >While these could be inferred from the assembly code you posted, they > >would still make reading easier. > > > >Philipp > > > >-----BEGIN PGP SIGNATURE----- > >Version: GnuPG v1 > >Comment: Using GnuPG with Icedove - http://www.enigmail.net/ > > > >iEYEARECAAYFAlOpuBkACgkQbtUV+xsoLprMJQCgnSxz07CLU7dkx1eGdQMK6IGh > >TBcAnj62xyTYqBXFYQ8qm8nKpyKKshKr > >=AENS > >-----END PGP SIGNATURE----- > > > >-------------------------------------------------------------------------- > >---- > >Open source business process management suite built on Java and Eclipse > >Turn processes into business applications with Bonita BPM Community > >Edition > >Quickly connect people, data, and systems into organized workflows > >Winner of BOSSIE, CODIE, OW2 and Gartner awards > >http://p.sf.net/sfu/Bonitasoft > >_______________________________________________ > >Sdcc-user mailing list > >Sdcc-user@... > >https://lists.sourceforge.net/lists/listinfo/sdcc-user > > > > > -- > Kustaa Nyholm > Research Manager, Software > Research and Technology Division > PLANMECA OY > Asentajankatu 6 > 00880 HELSINKI > FINLAND > > Please note our new telephone and fax numbers! > Tel: +358 20 7795 572 (direct) > Fax: +358 20 7795 676 > GSM: +358 40 580 5193 > e-mail: kustaa.nyholm@... > > > > > > > This e-mail may contain confidential or privileged information. If you are > not the intended recipient (or have received this e-mail in error) please > notify the sender immediately and destroy this e-mail. Any unauthorized > copying, disclosure or distribution of the material in this e-mail is > strictly forbidden. We will not be liable for direct, indirect, special or > consequential damages arising from alteration of the contents of this > message by a third party or as a result of any virus being passed on or as > of transmission of this e-mail in general. > > > ------------------------------------------------------------------------------ > Open source business process management suite built on Java and Eclipse > Turn processes into business applications with Bonita BPM Community Edition > Quickly connect people, data, and systems into organized workflows > Winner of BOSSIE, CODIE, OW2 and Gartner awards > http://p.sf.net/sfu/Bonitasoft > _______________________________________________ > Sdcc-user mailing list > Sdcc-user@... > https://lists.sourceforge.net/lists/listinfo/sdcc-user > ```
 Re: [Sdcc-user] Optimising a+=b; From: Kustaa Nyholm - 2014-06-25 03:39:26 ```>I can currently not verify this, but anyways: I think that union's were >and possibly are implicitly >treated as volatile by SDCC and/or the PIC backend(s) due to problems in >maintaining correct >potential alias sets. This might explain the bad code that is generated >for the a+=b operation >you see, as your code does use unions. To test this, you could x+=y two >global uint16_t variables >and see what happens. For structs, I would expect the same code as for >globals. > It might also be interesting to see what happens if the right operand is >simple > (say, just a global variable or even a function argument): Thanks for the suggestions, it looks you are right, see below. Adding two globals produces indeed is best with no unnecessary temp vars. Unfortunately this brings back the unnecessary BANKSEL problem. I'd be happy to use globals, the unions were only introduced to get rid of the extra BANSELs. The right hand operator type (struct or union) does not seem to make a difference. _bar: ; .line 225; stepperirq.c void bar() MOVFF FSR2L, POSTDEC1 MOVFF FSR1L, FSR2L MOVFF r0x00, POSTDEC1 MOVFF r0x01, POSTDEC1 BANKSEL _y ; .line 227; stepperirq.c x += y; MOVF _y, W, B BANKSEL _x ADDWF _x, F, B BANKSEL (_y + 1) MOVF (_y + 1), W, B BANKSEL (_x + 1) ADDWFC (_x + 1), F, B BANKSEL _y ; .line 228; stepperirq.c s.v += y; MOVF _y, W, B BANKSEL _s ADDWF _s, W, B MOVWF r0x00 BANKSEL (_y + 1) MOVF (_y + 1), W, B BANKSEL (_s + 1) ADDWFC (_s + 1), W, B MOVWF r0x01 MOVF r0x00, W ; removed redundant BANKSEL MOVWF _s, B MOVF r0x01, W ; removed redundant BANKSEL MOVWF (_s + 1), B BANKSEL _y ; .line 229; stepperirq.c u.a += y; MOVF _y, W, B BANKSEL _u ADDWF _u, W, B MOVWF r0x00 BANKSEL (_y + 1) MOVF (_y + 1), W, B BANKSEL (_u + 1) ADDWFC (_u + 1), W, B MOVWF r0x01 MOVF r0x00, W ; removed redundant BANKSEL MOVWF _u, B MOVF r0x01, W ; removed redundant BANKSEL MOVWF (_u + 1), B MOVFF PREINC1, r0x01 MOVFF PREINC1, r0x00 MOVFF PREINC1, FSR2L RETURN ; ; Starting pCode block S_stepperirq__foo code _foo: ; .line 218; stepperirq.c void foo() MOVFF FSR2L, POSTDEC1 MOVFF FSR1L, FSR2L MOVFF r0x00, POSTDEC1 MOVFF r0x01, POSTDEC1 BANKSEL _y ; .line 220; stepperirq.c x += y; MOVF _y, W, B BANKSEL _x ADDWF _x, F, B BANKSEL (_y + 1) MOVF (_y + 1), W, B BANKSEL (_x + 1) ADDWFC (_x + 1), F, B BANKSEL (_s + 2) ; .line 221; stepperirq.c s.v += s.w; MOVF (_s + 2), W, B ; removed redundant BANKSEL ADDWF _s, W, B MOVWF r0x00 ; removed redundant BANKSEL MOVF (_s + 3), W, B ; removed redundant BANKSEL ADDWFC (_s + 1), W, B MOVWF r0x01 MOVF r0x00, W ; removed redundant BANKSEL MOVWF _s, B MOVF r0x01, W ; removed redundant BANKSEL MOVWF (_s + 1), B BANKSEL _u ; .line 222; stepperirq.c u.a += u.b; MOVF _u, W, B MOVWF r0x00 ADDWF r0x00, F ; removed redundant BANKSEL RLCF (_u + 1), W, B MOVWF r0x01 MOVF r0x00, W ; removed redundant BANKSEL MOVWF _u, B MOVF r0x01, W ; removed redundant BANKSEL MOVWF (_u + 1), B MOVFF PREINC1, r0x01 MOVFF PREINC1, r0x00 MOVFF PREINC1, FSR2L RETURN Making the second operator function parameter does change things a bit, but that is not an option for me as moving the global state to a local would defeat the saving I get in the addition: ; .line 225; stepperirq.c void bar(uint16_t y) MOVFF FSR2L, POSTDEC1 MOVFF FSR1L, FSR2L MOVFF r0x00, POSTDEC1 MOVFF r0x01, POSTDEC1 MOVFF r0x02, POSTDEC1 MOVFF r0x03, POSTDEC1 MOVLW 0x02 MOVFF PLUSW2, r0x00 MOVLW 0x03 MOVFF PLUSW2, r0x01 ; .line 227; stepperirq.c x += y; MOVF r0x00, W BANKSEL _x ADDWF _x, F, B MOVF r0x01, W ; removed redundant BANKSEL ADDWFC (_x + 1), F, B ; .line 228; stepperirq.c s.v += y; MOVF r0x00, W BANKSEL _s ADDWF _s, W, B MOVWF r0x02 MOVF r0x01, W ; removed redundant BANKSEL ADDWFC (_s + 1), W, B MOVWF r0x03 MOVF r0x02, W ; removed redundant BANKSEL MOVWF _s, B MOVF r0x03, W ; removed redundant BANKSEL MOVWF (_s + 1), B BANKSEL _u ; .line 229; stepperirq.c u.a += y; MOVF _u, W, B ADDWF r0x00, F ; removed redundant BANKSEL MOVF (_u + 1), W, B ADDWFC r0x01, F MOVF r0x00, W ; removed redundant BANKSEL MOVWF _u, B MOVF r0x01, W ; removed redundant BANKSEL MOVWF (_u + 1), B MOVFF PREINC1, r0x03 MOVFF PREINC1, r0x02 MOVFF PREINC1, r0x01 MOVFF PREINC1, r0x00 MOVFF PREINC1, FSR2L RETURN So I'm in a bit of 22 unless I can somehow tell the compiler to not use so many BANKSELs in which case I could use straight forward globals. The contorted union/structs were only introduced to force the compiler to generate faster code and indeed they do, except for this addition. About the only thing I would need to use the union is to check the high bit of the nco accumulator, the union really makes checking the high bit fast! Any ideas how to get forward? ...and of course, many thanks for the support and SDCC in the first place. br Kusti This e-mail may contain confidential or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. We will not be liable for direct, indirect, special or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on or as of transmission of this e-mail in general. ```
 Re: [Sdcc-user] Optimising a+=b; From: Kustaa Nyholm - 2014-06-27 03:38:18 Attachments: Message as HTML ```Hi, last night I decided to bite the bullet and coded all of my interrupt in assembly which allowed me to rise the interrupt frequency from 60 kHz up to 170 kHz after which it it leaves so few cycles for the anything else that the USB stack stops working. This is fine, as at my target freq of 120 kHz there is still about 50% of cycles available for other tasks which should fine. But that is all beside the point except that it shows that with little bit of effort I was able to optimise the code a lot and I'm virtually a PIC assembler virgin and that got me thinking: Even with the union/struc{} trick that allows the compiler to throw away many of the BANKSEL a few of them still remained. Apparently because of the union{} trick the compiler generated very wasteful code for a lot of the operations, for example: This: ; .line 126; stepperirq.c MOTOR.steps = MOTOR.next_steps; ; MOVFF (_g + 21), r0x00 ; MOVF r0x00, W ; MOVWF (_g + 17), B ; .line 127; stepperirq.c MOTOR.speed = MOTOR.next_speed; ; MOVFF (_g + 19), r0x00 ; MOVFF (_g + 20), r0x01 ; MOVF r0x00, W ; MOVWF (_g + 15), B ; MOVF r0x01, W ; MOVWF (_g + 16), B I replaced with this: ; MOVFF (_g + 21), (_g + 17) ; MOVFF (_g + 19), (_g + 15) ; MOVFF (_g + 20), (_g + 16) So it looks to me that a #pragma or something that could be used to tell the compiler that certain variables are on the same bank would pay hi dividends in terms of code size and execution speed. Seems like this should not be a big deal for a person who knows the compiler inner working and really help with the code quality. An other #pragma that would tell the compiler not to overlay registers would simple implement and but useful when the coding gets though. Combine that with a less conservative and more analytical function prologue/epilogue generation would go a long way towards smaller and faster code I think. Maybe I'm wrong. Oh, af, in case you missed it, I'm working on PIC18. Ok, enough of day dreaming, I have a concrete question: How can I ensure that a variable or number of variables are all in the same bank? Because SDCC generates code for struct's as if they are on the same page I presume that somehow this is ensured but I could not spot anything in the generated assembly that tells the linker to place it all on one page. Nor in the gpasm or MPASM manuals, on the contrary I found a warning that things may span several banks. So how does the magic happen? br Kusti ________________________________ This e-mail may contain confidential or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. We will not be liable for direct, indirect, special or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on or as of transmission of this e-mail in general. ```