From: Bernhard Held <bernhard@be...>  2004-02-11 21:40:21

I've committed some more optimizations: previously only mul/div/mod operations with two _unsigned_ char operands were processed by the genMult/Div/ModOneByte functions. With 1 or 2 signed operands, the operands were promoted to int, requiring a 16 bit operation.

Now all mul/div/mod operations with two 8 bit operands in any signed/unsigned combination are handled by the genMult/Div/ModOneByte functions. Moreover these functions must be able to return 8 or 16 bit results (they should be prepared for 32 bit). This way a 16 bit software operation can be avoided and we make full use of the mcu capabilities.

Another change: the modulo result's sign follows the dividend only, and never the divisor's sign. Anything else is not mathematically correct.

I've also committed a comprehensive regression test: onebyte.c. The mcs51, ds390 and z80 ports pass the regression tests. The z80 port is very impressive: apart from one problem, which has been fixed by Erik, the port immediately passed all tests.

I guess the pic14 and pic16 ports don't fulfill the new requirements. The release deadline is near, and I want neither to delay the release nor to put any pressure on you. Please let me know if I should revert to the old behaviour (= promotion to int) for any other port.
The environment variable SDCC_NEWONEBYTEOPS selects the new, SDCC_OLDONEBYTEOPS selects the old behaviour.

I had a short look at the hc08 port. The only problem I could see is that in a modulo operation the sign of the divisor should have no influence on the result:

diff -p -c -r1.13 gen.c
*** src/hc08/gen.c	11 Feb 2004 21:30:32 -0000	1.13
--- src/hc08/gen.c	11 Feb 2004 21:33:04 -0000
*************** genModOneByte (operand * left,
*** 3765,3771 ****
        /* AND literal negative */
        if (val < 0) {
            emitcode ("ldx", "#0x%02x", val);
-           negLiteral = TRUE;
        } else {
            emitcode ("ldx", "#0x%02x", val);
        }
--- 3765,3770 ----

Bernhard

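
The sign rule and the result-size requirement described above can be checked on a host compiler. The following is a minimal sketch, assuming a C99 compiler (where integer division truncates toward zero, so the remainder takes the dividend's sign). The function names mod_s8 and mul_s8_u8 are made up for illustration and are not part of SDCC.

```c
#include <stdint.h>

/* Remainder of two signed 8-bit operands. With C99 truncating division
   the result has the sign of the dividend, never of the divisor.
   The divisor must be non-zero. */
int8_t mod_s8(int8_t dividend, int8_t divisor)
{
    return (int8_t)(dividend % divisor);
}

/* Product of a signed and an unsigned 8-bit operand. The extremes are
   -128 * 255 = -32640 and 127 * 255 = 32385, so a 16-bit signed result
   is always wide enough. */
int16_t mul_s8_u8(int8_t a, uint8_t b)
{
    return (int16_t)(a * b);
}
```

The mixed signed/unsigned product shows why a one-byte routine has to be able to deliver a 16 bit result: the operands fit in one byte each, the product does not.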
From: Vangelis Rokas <vrokas@ot...>  2004-02-12 21:48:48

On Wednesday 11 February 2004 11:39 pm, Bernhard Held wrote:
> previously only mul/div/mod operations with two _unsigned_ char operands
> have been processed by genMult/Div/ModOneByte functions. With 1 or 2
> signed operands the operands have been promoted to int requiring a 16 bit
> operation.
>
> Now all mul/div/mod operations with two 8 bit operands in any
> signed/unsigned combination are handled by genMult/Div/ModOneByte
> functions. Moreover these functions must be able to return 8 or 16 bit
> results (they should be prepared for 32 bit). This way a 16 bit software
> operation can be avoided and we make full use of the mcu capabilities.
>
> Another change: the modulo result's sign follows the dividend only, and
> never the divisor's sign. Anything else is not mathematically correct.
>
> I guess the pic14 and pic16 ports don't fulfill the new requirements. The
> release deadline is near, and I want neither to delay the release nor
> to put any pressure on you. Please let me know if I should revert to the old
> behaviour (= promotion to int) for any other port.
> The environment variable SDCC_NEWONEBYTEOPS selects the new,
> SDCC_OLDONEBYTEOPS selects the old behaviour.

Just to make sure that I understand what you mean: the PIC16 port should be modified so that all the arithmetic operations use 8 bits when both operands are 8 bit, right? This should not be a problem for the pic16 (at least; I see no reason why the pic14 port should fail either) since it loads the result operand according to the size of 'result'. At least I can guarantee this for the genMult function, because I just finished reworking the function to use the processor MULL instructions. I'll check with other operands too. I hope I am on the right track...

Also a question. When I multiply a char by a number which is a power of two, e.g. a = a * 8, the iCode optimizer replaces the multiplication with a left shift. This is a true optimization for processors that can shift more than one bit at a time, but on PICs this is not possible, so for 8 the processor has to perform 3 shift-left instructions, which is slower than multiplying once by 8. Is it possible to leave this optimization to the pic16 port code generator?

Regards,
Vangelis Rokas

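
The trade-off in the question above can be made concrete with a small host-side sketch. It assumes nothing about SDCC internals: power_of_two_shift and shift_left_one_bit_at_a_time are hypothetical helper names. The first finds the shift count for a power-of-two literal, the second emulates a CPU that can only shift one bit per instruction.

```c
#include <stdint.h>

/* Returns n if lit == 2^n, or -1 if lit is not a power of two. */
int power_of_two_shift(unsigned lit)
{
    if (lit == 0 || (lit & (lit - 1)) != 0)
        return -1;
    int n = 0;
    while (lit > 1) {
        lit >>= 1;
        n++;
    }
    return n;
}

/* Emulates a target that shifts a byte by one bit per instruction:
   a shift by n costs n iterations (n shift instructions). */
uint8_t shift_left_one_bit_at_a_time(uint8_t a, int n)
{
    for (int i = 0; i < n; i++)
        a = (uint8_t)(a << 1);
    return a;
}
```

For the literal 8 the shift count is 3, so the shifted form needs three single-bit shifts where a hardware multiplier needs one multiply, while both give the same low byte.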
From: Laurence Withers <lwithers@us...>  2004-02-12 23:35:14

On Thursday 12 February 2004 21:49, Vangelis Rokas wrote:
> Also a question. When I multiply a char with a number which is a
> power of two i.e. a = a * 8, the iCode optimizer, optimizes the
> multiplication with a left shift. This is a true optimization for
> processors that can shift more than one bit at a time, but in PICs
> this is not possible, so for 8, the processor has to perform 3 shift
> left instructions, which is slower than multiplying once with 8. Is it
> possible this optimization to be left for the pic16 port code
> generator?

Actually, it might be better to add some rules to the peephole optimiser which detect sequences of left shifts without carry and replace them with a multiplication by a constant.

Bye for now,
--
Laurence Withers, lwithers@..., l.withers@...
http://xmlpcbrender.sf.net/
http://lwgui.sf.net/
http://pgp.dtype.org:11371/pks/lookup?op=get&search=0x04A646EA

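
The idea of such a rule can be sketched in C as a scan over an instruction sequence. This is an illustration only and does not use SDCC's peephole rule syntax: enum opcode, shift_run_length and multiplier_for_run are hypothetical names, and the instruction stream is reduced to "one-bit left shift without carry" versus "anything else".

```c
#include <stddef.h>

/* Hypothetical opcode set for the sketch. */
enum opcode { OP_SHIFT_LEFT_1, OP_OTHER };

/* Counts the consecutive one-bit left shifts starting at code[start]. */
size_t shift_run_length(const enum opcode *code, size_t len, size_t start)
{
    size_t n = 0;
    while (start + n < len && code[start + n] == OP_SHIFT_LEFT_1)
        n++;
    return n;
}

/* Constant that could replace a run of n one-bit left shifts.
   Only meaningful for n < 8 when the operand is a single byte. */
unsigned multiplier_for_run(size_t n)
{
    return 1u << n;
}
```

A run of three shifts would map to a multiplication by 8. Whether the replacement is actually cheaper depends on the target, which is the point debated in the rest of the thread.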
From: Vangelis Rokas <vrokas@ot...>  2004-02-13 00:21:56

On Thu, 12 Feb 2004, Laurence Withers wrote:
> Actually, it might be better to add some rules to the peephole optimiser
> which detect sequences of left shifts without carry and replace them
> with a multiplication by a constant.

It is a possibility, but for the time being the pic16 peephole optimizer is out of order. There are some issues that I need to overcome, so I can't rely on it (at least for the near future). There are many things to fix, and the release is near. I don't want to mess with portions that need a lot of work.

Regards,
Vangelis

From: Bernhard Held <bernhard@be...>  2004-02-13 21:54:09

> Just to make sure that I understand what you mean. PIC16 port should be
> modified for all the arithmetic operations to use 8 bits when both operands
> are 8 bit, right?

  if (op == mult/div/mod
      && getSize (leftOp) == 1
      /* size of right and left are always the same: */
      && getSize (rightOp) == 1)
    {
      this is what can happen here:
      - getSize (result) can be 1 or 2; it would be nice if you were prepared for 4
      - leftOp can be signed or unsigned, independent of rightOp
      - rightOp can be signed or unsigned, independent of leftOp
    }

> This should not be a problem for the pic16 (at least, but
> I see no reason why pic14 port should fail) since it loads the result
> operand according to the size of 'result'.

Perfect.

> Also a question. When I multiply a char with a number which is a power of
> two i.e. a = a * 8, the iCode optimizer, optimizes the multiplication with
> a left shift. This is a true optimization for processors that can shift
> more than one bit at a time, but in PICs this is not possible, so for 8,
> the processor has to perform 3 shift left instructions, which is slower than
> multiplying once with 8. Is it possible this optimization to be left for
> the pic16 port code generator?

Same story for mcs51: it can shift only one bit at a time. I wonder whether this should be disabled for mcs51 too. I'll have a look some other day. But in the meantime it's disabled for pic14 and pic16.

Bernhard

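
The condition in the pseudocode above could be written as a small C predicate. This is a sketch under stated assumptions: struct operand_info, enum arith_op and use_one_byte_routine are hypothetical stand-ins, not SDCC's own types or functions. Note that signedness is carried in the struct but deliberately not tested, since any signed/unsigned combination qualifies.

```c
#include <stdbool.h>

/* Hypothetical description of an operand: size in bytes and signedness. */
struct operand_info {
    unsigned size;
    bool is_unsigned;
};

/* Hypothetical operator tags; OP_ADD stands for "anything else". */
enum arith_op { OP_MULT, OP_DIV, OP_MOD, OP_ADD };

/* True when a mul/div/mod has two one-byte operands, whatever their
   signedness. The result may still be 1 or 2 bytes wide. */
bool use_one_byte_routine(enum arith_op op,
                          struct operand_info left,
                          struct operand_info right)
{
    bool is_mul_div_mod = (op == OP_MULT || op == OP_DIV || op == OP_MOD);
    return is_mul_div_mod && left.size == 1 && right.size == 1;
}
```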
From: Scott Dattalo <scott@da...>  2004-02-13 23:16:23

On Thu, 12 Feb 2004, Vangelis Rokas wrote:
> Also a question. When I multiply a char with a number which is a power of two
> i.e. a = a * 8, the iCode optimizer, optimizes the multiplication with a
> left shift. This is a true optimization for processors that can shift more
> than one bit at a time, but in PICs this is not possible, so for 8, the
> processor has to perform 3 shift left instructions, which is slower than
> multiplying once with 8. Is it possible this optimization to be left for the
> pic16 port code generator?

Actually, the code for a multiply-by-8 and a shift-left-3 are the same:

    movlw   8
    mulwf   _a
    movff   PRODL, _a    ; two word instruction

versus:

    swapf   _a,W         ; roll left (or right) 4
    rrncf   WREG         ; roll right 1
    andlw   0xff << 3    ; get rid of the upper bits that rolled in
    movwf   _a           ; store _a * 8

Depending on the results of a live-range analysis, one implementation may be preferred.

Scott

PS. I hope I just didn't consume my annual SDCC contribution :)

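
The two instruction sequences above can be modelled in C to confirm that they leave the same byte in _a. This is a host-side sketch, not generated code: the helper names are made up, the nibble swap stands in for swapf, the one-bit rotate for rrncf, and the low byte of the product for PRODL.

```c
#include <stdint.h>

/* Models swapf: exchange the two nibbles, i.e. rotate by 4 bits. */
uint8_t swap_nibbles(uint8_t x)
{
    return (uint8_t)((x << 4) | (x >> 4));
}

/* Models rrncf: rotate right by one bit, not through carry. */
uint8_t rotate_right_1(uint8_t x)
{
    return (uint8_t)((x >> 1) | (x << 7));
}

/* swapf, rrncf, andlw: a net rotate left by 3, then clear the three
   low bits that were rotated in from the top of the byte. */
uint8_t times8_by_rotate(uint8_t a)
{
    uint8_t w = swap_nibbles(a);
    w = rotate_right_1(w);
    w &= (uint8_t)(0xff << 3);   /* mask is 0xF8 after truncation */
    return w;
}

/* movlw 8, mulwf: keep only the low byte of the 16-bit product. */
uint8_t times8_by_multiply(uint8_t a)
{
    return (uint8_t)(a * 8u);
}

/* Returns 1 if both variants agree for every possible byte value. */
int times8_variants_agree(void)
{
    for (unsigned a = 0; a < 256; a++) {
        if (times8_by_rotate((uint8_t)a) != times8_by_multiply((uint8_t)a))
            return 0;
    }
    return 1;
}
```

Rotating left by 4 and right by 1 is a net rotate left by 3, which is why a single mask turns the rotate into a true shift.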
From: HansJuergen Dorn <hans.dorn@ap...>  2004-02-13 23:32:53

Scott Dattalo wrote:
> On Thu, 12 Feb 2004, Vangelis Rokas wrote:
>
>> Also a question. When I multiply a char with a number which is a power of two
>> i.e. a = a * 8, the iCode optimizer, optimizes the multiplication with a
>> left shift. This is a true optimization for processors that can shift more
>> than one bit at a time, but in PICs this is not possible, so for 8, the
>> processor has to perform 3 shift left instructions, which is slower than
>> multiplying once with 8. Is it possible this optimization to be left for the
>> pic16 port code generator?
>
> Actually, the code for a multiply-by-8 and a shift-left-3 are the same:
>
>     movlw   8
>     mulwf   _a
>     movff   PRODL, _a    ; two word instruction
>
> versus:
>
>     swapf   _a,W         ; roll left (or right) 4
>     rrncf   WREG         ; roll right 1
>     andlw   0xff << 3    ; get rid of the upper bits that rolled in
>     movwf   _a           ; store _a * 8
>
> Depending on the results of a live-range analysis one implementation may
> be preferred.
>
> Scott
>
> PS. I hope I just didn't consume my annual SDCC contribution :)

Hi Scott!

I had the same thoughts when looking at long shifts. Using multiplies only saves time in a few cases, especially when left and result are the same and there is no efficient way to use both result bytes. The resulting code (mixed shifts and multiplies) might become a nightmare to support.

P.S: Your simulator is performing nicely for pic16 and helps us a lot while debugging changes to the pic16 port.

Regards
Hans

From: Scott Dattalo <scott@da...>  2004-02-14 00:19:16

On Sat, 14 Feb 2004, HansJuergen Dorn wrote:
> Hi Scott!
>
> I've come along the same thoughts when looking at long shifts.
> Using multiplies only saves time for a few cases, especially when
> left and result are the same and there is no efficient way to use
> both result bytes.
> The resulting code (mixed shifts and multiplies) might become a
> nightmare to support.

Yep!

> P.S:
>
> Your simulator is performing nicely for pic16 and helps us a lot
> while debugging changes to the pic16 port.

Great - so indirectly I still am helping the SDCC port!

Scott

From: Vangelis Rokas <vrokas@ot...>  2004-02-14 02:24:26

On Fri, 13 Feb 2004, Scott Dattalo wrote:
> Actually, the code for a multiply-by-8 and a shift-left-3 are the same:
>
>     movlw   8
>     mulwf   _a
>     movff   PRODL, _a    ; two word instruction
>
> versus:
>
>     swapf   _a,W         ; roll left (or right) 4
>     rrncf   WREG         ; roll right 1
>     andlw   0xff << 3    ; get rid of the upper bits that rolled in
>     movwf   _a           ; store _a * 8

Correct. But this is the weird hacker's way to do a multiply by literal! ;) Not to mention that if it is a shift-left-4 (aka multiply-by-16) then the shift method wins!

Anyway, the first version saves us a banksel directive when the result is not the same as the operand... (I have a point, right?! ;))

> PS. I hope I just didn't consume my annual SDCC contribution :)

Nope, you never will... ;)

PS. If I try to embed all these optimizations in the code generator, the next person who reads the source might commit suicide before reaching the end of it. Well, someone said something about the peephole optimizer... I'm beginning to support his point... It's about time to get to it. It is these MOVFFs that confuse it; otherwise it would be OK. Scott, if you can give us any hints I'd appreciate it.

Regards,
Vangelis

From: HansJuergen Dorn <hans.dorn@ap...>  2004-02-14 03:15:40

Vangelis Rokas wrote:
> On Fri, 13 Feb 2004, Scott Dattalo wrote:
>
>> Actually, the code for a multiply-by-8 and a shift-left-3 are the same:
>>
>>     movlw   8
>>     mulwf   _a
>>     movff   PRODL, _a    ; two word instruction
>>
>> versus:
>>
>>     swapf   _a,W         ; roll left (or right) 4
>>     rrncf   WREG         ; roll right 1
>>     andlw   0xff << 3    ; get rid of the upper bits that rolled in
>>     movwf   _a           ; store _a * 8
>
> Correct. But this is the weird hacker's way to do multiply by
> literal!;) Not to mention, that if it is shift-left-4 (aka
> multiply-by-16) then the shift method wins!

That's the only way to do it for a WISC CPU. (Weird Instruction Set Computer) :o)

> ANW the first version, saves us a banksel directive when the result is
> not the same as the operand... (I have a point right?!;))

You're right. I tend to forget about banksel most of the time. I guess we should use MOVFFs in the code generator whenever possible.

Regards
Hans

From: Bernhard Held <bernhard@be...>  2004-02-14 10:12:16

> Correct. But this is the weird hacker's way to do multiply by
> literal!;) Not to mention, that if it is shift-left-4 (aka
> multiply-by-16) then the shift method wins!

I bail out :) If you want to get back the shift optimization, you now know where to patch SDCCicode.c.

Bernhard

From: Vangelis Rokas <vrokas@ot...>  2004-02-14 14:41:01

----- Original Message -----
From: "Bernhard Held" <bernhard@...>
To: <sdccdevel@...>
Subject: Re: [sdccdevel] Commit: OneByteOps and promotion

> I bail out :) If you want to get back the shift optimization, you now know
> where to patch SDCCicode.c.

I know it's driving you mad... Thanks for the prompt response anyway...

Vangelis

From: Bernhard Held <bernhard@be...>  2004-02-14 09:59:10

> PS. I hope I just didn't consume my annual SDCC contribution :)

:)) You'll get one for free after the release ;)

Bernhard