From: Erik P. <epe...@iv...> - 2003-07-08 15:58:11
On Mon, 7 Jul 2003, Josh Stone wrote:

> <test.asm snip>
> 164:    mov  r3,a
> 165:    mov  dptr,#(_FC_Commands + 0x0100)
> 166:    movx @dptr,a
> 168:    mov  r3,#0x00
> 171:    mov  a,r2
> 172:    add  a,#_FC_Commands
> 173:    mov  dpl,a
> 175:    mov  a,r3
> 176:    addc a,#(_FC_Commands >> 8)
> 177:    mov  dph,a
>
> The entire use of r3 here is unnecessary. There's a few optimizations that
> would remove it completely. First, line 164 is a dead store due to 168.
> Constant propagation from 168 to 175 lets you simplify 175 to "clr a". At
> this point, the definition at 168 is now also dead code.

Code like this bothers me too, and I have been trying to come up with a
good general solution for a while. The root of this problem is that SDCC
performs its dead code and constant propagation optimizations only at the
intermediate code level and not at the assembly language register level.
The redundancies in this sample of code were introduced by the back end
code generator. The code generator originally produced the following for
lines 164-166:

  164:    mov  r3,a
  165:    mov  dptr,#(_FC_Commands + 0x0100)
  165.5:  mov  a,r3
  166:    movx @dptr,a

The peephole optimizer has a rule that shows that line 165.5 is redundant
and so removes it. However, by the nature of a peephole optimizer, it
can't tell that line 164 is now a dead store, so the store is retained
because it _might_ actually be used later. Another peephole rule could be
written that would check line 167 for the reload of r3, but this is a
rather futile solution, since the reload is often separated from the store
by several arbitrary lines of code.

The other problem is that the code generator doesn't like adding operands
of dissimilar sizes, so your unsigned char array index was promoted to an
unsigned int before the actual addition. I think a peephole rule could be
used to fix the constant propagation in this particular case, but it still
couldn't safely remove line 168.
Plus, one would have to decide whether this code sequence occurs often
enough to justify the time it takes to match another rule.

I have written a kludge that locates and removes the dead register stores,
but it conflicts with the peephole optimizer. To get both optimizations, I
end up iteratively running the dead register optimizer and the peephole
optimizer until neither one can optimize further; this adds a noticeable
delay. The other thing that bothers me about this approach is that at this
level the register can't easily be reallocated for other use; temporary
values may be spilling to RAM while this newly freed register remains
unused.

So at the moment I am still pondering a really good solution.

  Erik
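
The iterate-until-stable approach described above can be sketched roughly
as follows. This is a minimal Python illustration, not SDCC's code: the
function names (remove_dead_stores, peephole, optimize) and the single
peephole rule are hypothetical. It assumes one straight-line block with no
labels or jumps, no direct-address aliases of r0-r7, and that every
register is live at the end of the block.

```python
import re

REG = re.compile(r"\br[0-7]\b")  # working registers r0..r7


def parse(line):
    """Split 'mov r3,a' into ('mov', ['r3', 'a'])."""
    op, _, rest = line.strip().partition(" ")
    operands = [o.strip() for o in rest.split(",", 1)] if rest else []
    return op, operands


def reads_and_write(line):
    """Return (set of rN read, rN written or None) for one line.

    Only a plain 'mov rN,src' counts as a pure write. Any other mention
    of rN is conservatively treated as a read.
    """
    op, operands = parse(line)
    written, sources = None, operands
    if op == "mov" and operands and REG.fullmatch(operands[0]):
        written, sources = operands[0], operands[1:]
    reads = set()
    for src in sources:
        reads.update(REG.findall(src))
    return reads, written


def remove_dead_stores(lines):
    """Drop 'mov rN,src' when rN is overwritten before it is read again."""
    kept = []
    for i, line in enumerate(lines):
        _, written = reads_and_write(line)
        dead = False
        if written is not None:
            for later in lines[i + 1:]:
                reads, rewritten = reads_and_write(later)
                if written in reads:
                    break              # value is used: the store is live
                if rewritten == written:
                    dead = True        # overwritten first: the store is dead
                    break
            # Reaching the end of the block leaves dead == False (assume live).
        if not dead:
            kept.append(line)
    return kept


def peephole(lines):
    """Hypothetical rule: after 'mov rN,a', a 'mov a,rN' is redundant if
    only 'mov dptr,#imm' lines (which touch neither a nor rN) intervene."""
    out = []
    for line in lines:
        op, operands = parse(line)
        is_reload = (op == "mov" and len(operands) == 2
                     and operands[0] == "a" and REG.fullmatch(operands[1]))
        if is_reload:
            j = len(out) - 1
            while j >= 0 and out[j].startswith("mov dptr,#"):
                j -= 1
            if j >= 0 and parse(out[j]) == ("mov", [operands[1], "a"]):
                continue               # a already holds rN: drop the reload
        out.append(line)
    return out


def optimize(lines):
    """Alternate both passes until neither changes the block."""
    passes = 0
    while True:
        new = peephole(remove_dead_stores(lines))
        passes += 1
        if new == lines:
            return lines, passes
        lines = new


block = [
    "mov r3,a",                           # 164
    "mov dptr,#(_FC_Commands + 0x0100)",  # 165
    "mov a,r3",                           # 165.5
    "movx @dptr,a",                       # 166
    "mov r3,#0x00",                       # 168
    "mov a,r2",                           # 171
]
result, passes = optimize(block)
print("\n".join(result))
print("passes:", passes)
```

On this fragment the dead store pass changes nothing the first time
through, because the reload at 165.5 still reads r3. Only after the
peephole rule drops the reload does the store at 164 become removable,
which is why the two passes have to be repeated. The store at 168 is kept,
since the sketch cannot see past the end of the block.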