From: Nuno L. <nun...@sa...> - 2008-03-25 15:08:12
>> Currently it only removes redundant MOVs between virtual registers that
>> can be propagated forward. Imagine this:
>>
>>    9 movl %vr16,%vr85   ; %vr16 isn't referenced below this line
>>   10 subl $0x4,%vr85
>>
>> (maps %vr85 to %vr16)
>>
>> the movl is removed and translated into:
>>    9 subl $0x4,%vr16
>
> Yes. However the register allocator does the same transformation
> (virtual-to-virtual register move coalescing), in the case where
> the source register's live range ends at the move instruction and
> the destination register's live range starts at the move instruction.
>
> That allows the instruction selectors to generate apparently-stupid
> code with lots of v-v moves, and reg-alloc then cleans it up later.
> This makes construction of the instruction selectors much simpler,
> especially for 2-address targets (x86, amd64).

Uhm, strange. I must confess that I didn't read the register allocation
part of the PLDI paper, as I have no experience with register allocation
algorithms. Looking at it now, it seems that some of the moves I'm
eliminating should already have been handled by the register allocator,
but in fact they aren't. If you check my other e-mail, I was able to
reduce the number of instructions in a single block (already
reg-allocated) from 122 to "just" 108.

> Or are you doing something else apart from V-V coalescing?

I'm also killing R-V dead stores. (This happens when some instruction
updates e.g. %eax, then %eax is moved to a virtual reg that is never
used.)

>> (memcheck creates some unnecessary moves because
>> of the dirty handler arguments).
>
> Yes .. it could probably do better in cases where a V(irtual) register
> is moved to a (R)eal register, and the V's live range ends at that
> point. I think I tried to add a "preference" mechanism to regalloc2.c,
> which says, for each virtual reg, which real reg it would "prefer" to
> be in, for this reason.
> But I also think that didn't help, because
> it constrains regalloc's choice of registers and so just moves the
> reg-to-reg moves elsewhere. However, I didn't investigate much; you
> may be able to do better.
>
>> Do you know what might be causing the problem?
>
> Check carefully the validity of the transformation(s) you do.
> It's easy to break stuff at this level.

Ah, found the problem. The problem is in the register allocator. I knew
the problem couldn't be in my nice code :P

Seriously, take a look at this:

after peephole optimization:
   9 call[1] 0x38007070
  10 movl %eax,%vr30
  11 movzwl (%vr2),%vr69
  12 cmpl $0x0,%vr26
  13 callnz[0] 0x38006920
  14 andl $0xFFFF,%vr30
  15 movl %vr30,%edx
  16 movl %vr1,%eax
  17 call[2] 0x38006C30

after register allocation:
  12 call[1] 0x38007070
  13 movl %eax,%edx
  14 movl 0x288(%ebp),%ecx
  15 movzwl (%ecx),%eax
  16 cmpl $0x0,0x2A0(%ebp)
  17 movl %edx,0x2A8(%ebp)   ; %vr30 to memory
  18 movl %eax,0x2B8(%ebp)
  19 callnz[0] 0x38006920
  20 movl 0x2A8(%ebp),%edx   ; load %vr30 from memory
  21 andl $0xFFFF,%edx
  22 movl 0x2A8(%ebp),%edx   ; <-- %vr30 is in %edx, not in memory
  23 movl %edi,%eax
  24 call[2] 0x38006C30

So the problem is that the register allocator somehow gets confused and
loads %vr30 twice, destroying its value (i.e. line 22 is bogus and could
be removed altogether).

Thanks,
Nuno