From: Russel W. <ru...@ru...> - 2001-11-09 11:49:28
|
For various (possibly silly) reasons, I had the code:

void multiplyByte(Byte a, Byte b, Byte xdata * low, Byte xdata * high) {
    unsigned int r = a * b ;
    *low = r ;
    *high = r >> 8 ;
}

For the mcs51, this generates the code:

0079                    324 _multiplyByte:
                        325 ;     multiplyByte_test.c 65
0079 AA 82              326         mov     r2,dpl
                        327 ;     multiplyByte_test.c 63
007B 85*00 F0           328         mov     b,_multiplyByte_PARM_2
007E EA                 329         mov     a,r2
007F A4                 330         mul     ab
0080 FA                 331         mov     r2,a
0081 AB F0              332         mov     r3,b
                        333 ;     multiplyByte_test.c 64
0083 85*01 82           334         mov     dpl,_multiplyByte_PARM_3
0086 85*02 83           335         mov     dph,(_multiplyByte_PARM_3 + 1)
0089 8A 04              336         mov     ar4,r2
008B EC                 337         mov     a,r4
008C F0                 338         movx    @dptr,a
                        339 ;     multiplyByte_test.c 65
008D 85*03 82           340         mov     dpl,_multiplyByte_PARM_4
0090 85*04 83           341         mov     dph,(_multiplyByte_PARM_4 + 1)
0093 8B 02              342         mov     ar2,r3
0095 7B 00              343         mov     r3,#0x00
0097 EA                 344         mov     a,r2
0098 F0                 345         movx    @dptr,a
0099                    346 00101$:
0099 22                 347         ret

The code works, so there is no actual error, but the use of r4 at
0089/008B seems totally superfluous. The hand-written optimization I
came up with is:

    _asm
        mov     a, dpl
        mov     b, _multiplyByte_PARM_2
        mul     ab
        mov     dpl,_multiplyByte_PARM_3
        mov     dph,(_multiplyByte_PARM_3 + 1)
        movx    @dptr,a
        mov     dpl,_multiplyByte_PARM_4
        mov     dph,(_multiplyByte_PARM_4 + 1)
        mov     a,b ;
        movx    @dptr,a ;
    _endasm ;

Is it likely to be impossible for SDCC to get any closer to this sort of
code? My suspicion is that a global rather than peephole optimizer
would be required.

-- 
Russel.
Dr Russel Winder        +44 20 7585 2200
41 Buckmaster Road      +44 7770 465 077
London SW11 1EN, UK     ru...@ru...
|
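For anyone who wants to check the intended result off-target, here is a minimal host-side sketch of the same function. It assumes Byte is an 8-bit unsigned type and drops the SDCC-specific xdata qualifier; neither assumption comes from the original post.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char Byte;   /* assumption: Byte is an 8-bit unsigned type */

/* Host-side version of multiplyByte; the SDCC `xdata` qualifier is dropped. */
void multiplyByte(Byte a, Byte b, Byte *low, Byte *high)
{
    unsigned int r;

    assert(low != NULL && high != NULL);
    r = (unsigned int)a * b;      /* at most 255 * 255 = 65025, fits in 16 bits */
    *low  = (Byte)(r & 0xFFu);    /* the byte `mul ab` leaves in A */
    *high = (Byte)(r >> 8);       /* the byte `mul ab` leaves in B */
}
```

Calling it with 200 and 100 should give the two halves of 20000 (0x4E20), which is also what the hand-written assembly stores through the two pointers.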
From: Scott D. <sc...@da...> - 2001-11-09 15:37:56
|
On 9 Nov 2001, Russel Winder wrote:

> Is it likely to be impossible for SDCC to get any closer to this sort of
> code? My suspicion is that a global rather than peephole optimizer
> would be required.

This is the purpose of the pCode (post code generation) optimizer. It is
currently supported only in the PIC port. It essentially works by
replacing each generated assembly instruction with an "object" that knows
about the instruction. This knowledge is then used by an optimizer to
simplify the code.

I currently use the pCode in two areas. The first is the peephole
optimizer. I found that even the simplest peephole optimizations require
state information from prior execution. For example, some snippets can be
optimized based on one of the status bits. Checking all possible sequences
of code that affect the status register is prohibitively expensive. So the
pCode peephole optimizer can examine the instructions prior to the one
being optimized and determine their effect on the status register (without
regard to what the instructions actually do).

The other area where I use pCode optimization is register allocation. I
force ralloc and gen to use overlaid registers for local variables. I then
build a call tree and resolve the register conflicts. I do something
similar for the parameter "stack" (the PIC has no accessible hardware
stack).

The amount of effort needed to incorporate pCode into the rest of the
ports is fairly significant. Furthermore, the pCode is still not mature
(for example, the pCode optimization needs to be supported in the linker
if we are to use it to fully resolve register conflicts). I think it could
be done in two parts. The first part is to port the infrastructure without
regard to any optimization. In other words, the pCode's intelligence
would be totally ignored. Once this is done, we can begin incorporating
some of the optimizations.

Scott
|
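The idea of an instruction "object" that knows its effect on the status register can be sketched roughly as follows. This is not the actual pCode data structure from the PIC port: the Insn type, the flag names and both helper functions are hypothetical, and the scan only handles a straight-line block with no branches.

```c
#include <assert.h>
#include <stddef.h>

/* Status bits, loosely modelled on the PIC STATUS register. */
enum { FLAG_C = 1u << 0, FLAG_DC = 1u << 1, FLAG_Z = 1u << 2 };

/* Hypothetical instruction object: it records which status bits the
   instruction reads and writes, not what the instruction computes. */
typedef struct {
    const char *mnemonic;
    unsigned reads;
    unsigned writes;
} Insn;

/* Scan backwards from code[pos] and return the index of the most recent
   instruction that writes `flag`, or -1 if none precedes it in the block. */
int lastWriter(const Insn *code, int pos, unsigned flag)
{
    int i;

    assert(code != NULL && pos >= 0);
    for (i = pos - 1; i >= 0; i--)
        if (code[i].writes & flag)
            return i;
    return -1;
}

/* Return 1 if the value of `flag` after code[pos] is overwritten before
   anything reads it. Reaching the end of the block returns 0, i.e. the
   flag is conservatively assumed to be live on exit. */
int flagResultUnused(const Insn *code, int n, int pos, unsigned flag)
{
    int i;

    assert(code != NULL && pos >= 0 && pos < n);
    for (i = pos + 1; i < n; i++) {
        if (code[i].reads & flag)
            return 0;
        if (code[i].writes & flag)
            return 1;
    }
    return 0;
}
```

A peephole rule that depends on, say, the Z bit can then ask lastWriter() which earlier instruction set it, instead of enumerating every instruction sequence that might have touched the status register.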
From: Sandeep D. <sa...@wi...> - 2001-11-09 17:27:26
|
I recently took a good look at the pcode stuff; very interesting,
although I got the impression that porting it to the other arches
will not be easy. I have been unsatisfied with the peephole optimizer
for some time now (it was a decent first try), and I think it is time to
have something more heavyweight.

I'm planning to add pcode-like extensions to the peephole optimizer. They
will run as a separate pass after the peephole optimization, will have
flow information, and will steal some of the pcode stuff.

I'm currently working on a register allocation improvement (for 8051
cores); after I'm done, I'll tackle this one.

Sandeep

> -----Original Message-----
> From: sdc...@li...
> [mailto:sdc...@li...] On Behalf Of Scott Dattalo
> Sent: Friday, November 09, 2001 7:38 AM
> Cc: SDCC_Developers
> Subject: Re: [sdcc-devel] A question of optimization...
|
From: Michael H. <mic...@ju...> - 2001-11-09 17:57:11
|
If there is some way the pcode system could interact with the register
allocator, it would be very useful to the z80 port. The basic idea would
be to use the pcode and a set of rules to pack into the scratch registers
(DE on the gbz80, HL on the z80) instead of having a disconnected set of
rules in the register allocator itself. You get the benefits of having
one set of packing rules along with more ideal register usage.

-- Michael

On Fri, 9 Nov 2001, Sandeep Dutta wrote:

> I'm planning to pcode like extensions to the peephole optimizer, which
> will run as a separate pass after the peephole optimization, and will be
> have flow information, will steal some of the pcode stuff.
>
> Currently working on a register allocation improvement (for 8051 cores),
> after I'm done , I'll tackle this one.
|
From: Sandeep D. <sa...@wi...> - 2001-11-09 19:04:06
|
Michael,

You are absolutely right. I'm working on an infrastructure and will
commit it very shortly; this is just the start.

SDCClrange will have a new function, "computeClash". Each liveRange
(iTemp) will have a bitVect called "clashes". This gives us the basic
means to check whether a given variable has overlapping usage/definition
with others and, if so, with which others. Then we can assign/pack special
registers like "hl" (in your case) and "DPTR" (in the mcs51/ds390 case).

Sandeep

> -----Original Message-----
> From: Michael Hope [mailto:mic...@ju...]
> Sent: Friday, November 09, 2001 9:57 AM
> To: Sandeep Dutta
> Cc: 'Scott Dattalo'; 'SDCC_Developers'
> Subject: RE: [sdcc-devel] A question of optimization...
|
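A rough sketch of what a clash computation along these lines might look like is given below. Only the names computeClash and clashes come from the message above; the LiveRange struct, the single-word bit set, the closed-interval overlap test and the canUseSpecialReg helper are all assumptions made for illustration, not the code that was committed to SDCC.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANGES 32   /* assumption: keeps each clash set in one word */

/* Hypothetical stand-in for an iTemp's live range: first definition and
   last use, expressed as instruction sequence numbers. */
typedef struct {
    int from;
    int to;
    unsigned long clashes;   /* bit j set => overlaps live range j */
} LiveRange;

/* Fill in `clashes` for every range. Two ranges are treated as clashing
   when their closed intervals [from, to] intersect, which is a
   conservative choice. */
void computeClash(LiveRange *lr, size_t n)
{
    size_t i, j;

    assert((lr != NULL || n == 0) && n <= MAX_RANGES);
    for (i = 0; i < n; i++)
        lr[i].clashes = 0;
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (lr[i].from <= lr[j].to && lr[j].from <= lr[i].to) {
                lr[i].clashes |= 1UL << j;
                lr[j].clashes |= 1UL << i;
            }
}

/* A range may take a special register (DPTR, HL, ...) only if none of the
   ranges it clashes with already holds it. `holders` is a bit mask of the
   ranges currently assigned to that register. */
int canUseSpecialReg(const LiveRange *lr, size_t idx, unsigned long holders)
{
    assert(lr != NULL);
    return (lr[idx].clashes & holders) == 0;
}
```

With the clash sets in hand, packing a value into DPTR or HL reduces to a bit-mask test against the ranges that already hold the register, so the same check could serve every port.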