From: Borja F. <bor...@gm...> - 2010-11-19 14:08:27

We need to settle a few important things in the compiler before we move on to bigger features. The things I'm talking about are:

1) Aligning data wider than 1 byte on even registers, to favour movw and argument passing into functions.

    int16 a = b;    // a = r21:r20, b = r13:r12
    movw r20, r12   <--- good reg allocation

    int16 a = b;    // a = r20:r21, b = r12:r13
    mov r21, r13
    mov r20, r12    <--- bad reg allocation

2) THE BIG ONE: working with register pairs. Some instructions, like sbiw/adiw, movw and ld/st, only work with register pairs, and we need to figure out how to introduce them.

For example, I wrote a pass that converts 2 moves in a row into a movw. It is very basic, and I don't like it, because if the compiler places an instruction between the two moves the pass won't be able to replace them with a movw. I also find it very hacky: the compiler doesn't know that there really is a move instruction that works on 16-bit data, so it has no global view of the costs, which probably makes register allocation worse.

If we worked with 16-bit pseudo regs this should be straightforward, but we would face the problem I mentioned some time ago: all data operations would need custom lowering, because LLVM isn't able to split a wider reg into its subregs (say R1312 into R13:R12). An initial guess would be to split the reg into its subregs manually, perform the operation, and finally recombine the subregs into the bigger reg for the next instruction, but I'm not sure yet whether this is a good solution.

We really need to think about it because it's a very important design decision. This was my first email to LLVM's dev mailing list, but I didn't get any real solution.

If you can think of any other points related to this, add them.

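To make the "2 moves in a row" pass above concrete, here is a minimal C sketch. It is only an illustration under assumptions: `struct insn`, `forms_movw` and `combine_adjacent_movs` are hypothetical names, and the instruction model is a toy, not LLVM's MachineInstr API. It shows the two conditions the message relies on: both pairs must start on an even register, and the low and high halves must line up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction model; NOT LLVM's MachineInstr. */
enum opcode { OP_MOV, OP_MOVW, OP_OTHER };

struct insn {
    enum opcode op;
    unsigned dst;   /* register number; for OP_MOVW the low (even) register */
    unsigned src;
};

/* True when two movs copy both halves of one even-aligned pair into another
   even-aligned pair, low byte to low byte and high byte to high byte. */
static int forms_movw(const struct insn *a, const struct insn *b)
{
    return a->op == OP_MOV && b->op == OP_MOV
        && b->dst == (a->dst ^ 1u)           /* destinations share a pair */
        && b->src == (a->src ^ 1u)           /* sources share a pair */
        && (a->dst & 1u) == (a->src & 1u);   /* low maps to low, high to high */
}

/* Rewrites `code` in place, merging only directly adjacent movs.
   Returns the new instruction count. */
size_t combine_adjacent_movs(struct insn *code, size_t n)
{
    assert(code != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; ) {
        if (i + 1 < n && forms_movw(&code[i], &code[i + 1])) {
            struct insn w = { OP_MOVW, code[i].dst & ~1u, code[i].src & ~1u };
            code[out++] = w;
            i += 2;
        } else {
            code[out++] = code[i++];
        }
    }
    return out;
}
```

With this model, `mov r21, r13` followed by `mov r20, r12` collapses into `movw r20, r12`. Any instruction placed between the two moves leaves both of them untouched, which is the weakness described above.
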
From: Borja F. <bor...@gm...> - 2010-11-26 11:35:52

During the past few days I've been investigating how to do this. I'm playing with the register file description and custom lowering. My idea is to always use register pairs for data wider than 8 bits: split them to perform the 8-bit operation, then join them again for the next operation.

Right now I'm getting promising results, but it still needs improvement: LLVM is using a new register when splitting the pair instead of reusing the same subreg. I'll continue investigating and post any results. We really need to get this clear before advancing to more complex operations like memory accesses and branching.

2010/11/19 Borja Ferrer <bor...@gm...>

> We need to figure out some important things that need to be introduced in
> the compiler before we advance with bigger things. The things i'm talking
> about are:
>
> 1) aligning data wider than 1 byte in even registers to favour movws and
> argument passing into functions.
> int16 a = b; // a = r21:r20, b = r13:r12
> movw r20, r12 <--- good reg allocation
>
> int16 a = b; // a = r20:r21, b = r12:r13
> mov r21, r13
> mov r20, r12 <--- bad reg allocation
>
> 2) THE BIG ONE: working with register pairs. there are some instructions
> like sbiw/adiw, movw, ld/st that only work with register pairs. we need to
> figure out how to introduce them.
> For example, I wrote a pass that converts 2 moves in a row into a movw.
> This is very basic, but i find it bad, because if the compiler places an
> instruction between both moves this pass wont be able to replace it with the
> movw. Also i find it very hacky, because the compiler doesnt know that there
> is really a move instruction that work with 16bit data so it doesnt have a
> global view of the costs and probably worsening reg allocation.
>
> If we worked with pseudo regs of 16 bits this should be straight forward,
> but we would face the problem i mentioned some time ago, all data operations
> would need custom lowering, because LLVM isnt able to split a wider reg into
> subregs (say R1312 into R13:R12). An initial guess would be manually
> splitting the reg into its subregs, perform the operation and finally
> recombining the subregs again into the bigger one for next instruction, but
> im not sure yet if this is a good solution.
>
> We really need to think about it because it's a very important design
> decision. This was my first email into llvm's dev mailing list, but i really
> didnt got any real solution.
> If you can think of any other points related to this, add them.
>

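The "split the pair, do the 8-bit operation, join again" idea from this message can be sketched in plain C for a 16-bit add. This is only an illustration of the data flow, assuming a hypothetical `struct regpair` and helper names; it says nothing about how the custom lowering is actually written in the backend. The low bytes are added first and the carry feeds the high bytes, which is what an add/adc sequence on the two subregs would do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 16-bit value held in a register pair. */
struct regpair {
    uint8_t lo;   /* low subreg, e.g. r12 */
    uint8_t hi;   /* high subreg, e.g. r13 */
};

/* Split a 16-bit value into its two 8-bit halves. */
static struct regpair split16(uint16_t v)
{
    struct regpair p = { (uint8_t)(v & 0xFFu), (uint8_t)(v >> 8) };
    return p;
}

/* Join the two halves back into one 16-bit value. */
static uint16_t join16(struct regpair p)
{
    return (uint16_t)(((unsigned)p.hi << 8) | p.lo);
}

/* 16-bit add expressed as two 8-bit operations on the subregs. */
uint16_t add16_via_bytes(uint16_t a, uint16_t b)
{
    assert(join16(split16(a)) == a);              /* split/join round-trips */
    struct regpair x = split16(a), y = split16(b), r;

    unsigned lo_sum = (unsigned)x.lo + y.lo;      /* like: add lo, lo */
    unsigned carry  = lo_sum >> 8;                /* carry out of the low byte */
    r.lo = (uint8_t)lo_sum;
    r.hi = (uint8_t)((unsigned)x.hi + y.hi + carry);  /* like: adc hi, hi */

    return join16(r);                             /* rejoin for the next op */
}
```

The point of the sketch is that the pair only exists at the edges of each operation; everything in the middle is 8-bit work on the two halves.
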
From: Weddington, E. <Eri...@at...> - 2010-11-29 18:07:14

> -----Original Message-----
> From: Borja Ferrer [mailto:bor...@gm...]
> Sent: Friday, November 26, 2010 4:28 AM
> To: avr...@li...
> Subject: Re: [avr-llvm-devel] Important stuff
>
> During the past few days i've been investigating how to do this, i'm
> playing with the register file description and custom lowering. What i
> thought is to always use register pairs, split them to perform the 8 bit
> operation and join them again for the next operation, of course this is
> for data wider than 8bits.
> Right now i'm getting promising results, but still needs improvement, llvm
> is using a new register when splitting the pair instead of using the same
> subreg. I'll continue investigating, and post any results, we really need
> to get this clear before advancing into more complex operations like
> memory stuff and branching.

Thanks, Borja, for looking into this. Agreed that this is an important step to get right before doing more complex operations. Your help in this area is very much appreciated. :-)

Eric Weddington

From: Borja F. <bor...@gm...> - 2010-12-05 18:40:11

I'm currently discussing this issue on the llvmdev mailing list, because it seems there's no easy way to work with register pairs. You can find it in the thread titled "Register Pairing". This is the code I'm testing; it shows how some regs aren't paired correctly, so opportunities to insert a movw are missed:

    typedef short t;
    extern t mcos(t a);
    extern t mdiv(t a, t b);

    t foo(t a, t b)
    {
        short p1 = mcos(b);
        short p2 = mcos(a);
        return mdiv(p1&p2, p1^p2);
    }

This C code produces the following (for purists: ignore the fact that it's using scratch regs between calls, this is unimplemented):

    ; a <- r25:r24, b <- r23:r22
    mov r18, r24
    mov r19, r25    <-- can be combined into a movw r19:r18, r25:r24
    mov r25, r23
    mov r24, r22    <-- can be combined into a movw r25:r24, r23:r22
    call mcos
    ; here we have the case I was explaining: the pairs don't match because
    ; they're the other way round. The function result is in r25:r24, but
    ; it's storing the hi part in r20 instead of r21, so we can't insert a movw
    mov r20, r25
    mov r21, r24    <--- should be mov r21, r25; mov r20, r24 to be able to insert a movw
    mov r25, r19
    mov r24, r18    <-- can be combined into a movw r25:r24, r19:r18
    call mcos
    ; same problem as above: again it's moving the hi part in r25 into r18
    ; instead of r19, so we've lost another movw here
    mov r18, r25    <-- should be mov r19, r25
    and r18, r20
    mov r19, r24    <-- should be mov r18, r24
    and r19, r21
    mov r23, r25    <-- this *
    eor r23, r20
    mov r22, r24    <-- * and this could be combined into movw r23:r22, r25:r24
    eor r22, r21
    mov r25, r18
    mov r24, r19    <-- because the result returned by the second call to mcos is
                        stored in r18:r19 (other way round)
    ; we've lost another movw
    call mdiv
    ret

John, if you have any suggestions they're welcome. I've also asked how to combine two 8-bit instructions into a 16-bit one, mainly for movw and adiw/sbiw. I wrote a function pass that searches for 2 moves in a row and combines them into a movw, but if other instructions get in between the moves, as happens in the previous example (note those ands and eors), they get missed.

GCC has an easy way of handling the register pairing issue. On the mailing list, Lang suggested using his register allocator, which is able to work with constraints like the ones we have; look in the source for PBQP.

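The listing above has one spot (marked with `*`) where two matching moves are separated by an `eor`. As a hedged sketch of how a pass might look past such an instruction, the C function below checks whether the second mov could be hoisted up to the first so that both become one movw. Everything here is an assumption for illustration: the `struct insn` model, the opcode set, and the name `can_merge_split_movs` are hypothetical, calls are treated as barriers, and every non-mov instruction is assumed to read both operands and write its first one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-operand instruction model; NOT LLVM's MachineInstr.
   OP_MOV writes dst and reads src; OP_AND/OP_EOR read both and write dst. */
enum opcode { OP_MOV, OP_AND, OP_EOR, OP_CALL };

struct insn {
    enum opcode op;
    unsigned dst;
    unsigned src;
};

/* Returns 1 when code[first] and code[second] are movs that could be
   replaced by a single movw placed at `first` (second mov hoisted). */
int can_merge_split_movs(const struct insn *code, size_t first, size_t second)
{
    assert(first < second);
    const struct insn *a = &code[first];
    const struct insn *b = &code[second];

    /* Both must be movs covering two even-aligned pairs, lo->lo and hi->hi. */
    if (a->op != OP_MOV || b->op != OP_MOV) return 0;
    if (b->dst != (a->dst ^ 1u) || b->src != (a->src ^ 1u)) return 0;
    if ((a->dst & 1u) != (a->src & 1u)) return 0;

    /* Hoisting the second mov is only safe if nothing in between touches
       its destination or overwrites its source. */
    for (size_t k = first + 1; k < second; k++) {
        const struct insn *m = &code[k];
        if (m->op == OP_CALL) return 0;                       /* barrier */
        if (m->dst == b->dst || m->src == b->dst) return 0;   /* uses dst */
        if (m->dst == b->src) return 0;                       /* clobbers src */
    }
    return 1;
}
```

Applied to `mov r23, r25; eor r23, r20; mov r22, r24`, the check passes, since the `eor` touches neither r22 nor r24. Applied to `mov r18, r25; and r18, r20; mov r19, r24`, it fails on the pairing test alone, which matches the "other way round" cases in the listing: no amount of reordering helps when the allocator picked the halves in the wrong order.
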
From: John M. <ato...@gm...> - 2010-12-06 06:57:33

On Sun, Dec 5, 2010 at 10:40 AM, Borja Ferrer <bor...@gm...> wrote:

> I'm currently discussing this issue in the llvmdev mailing list because it
> seems there's no easy way to work with register pairs. You can check it in
> the thread with title "Register Pairing". Basically this is the code im
> testing, it shows how some regs arent paired correctly missing movw
> insertion opportunities:
> <snip>
> ; here we have the case i was explaining, pairs dont match because they're
> the other way round, function result is in r25:r24
> ; but it's storing the hi part in r20 instead of r21, so we cant insert a
> movw
> mov r20, r25
> mov r21, r24 <--- should be mov r21, r25; mov r20, r24 to be able to
> insert a movw
> <snip>
> John if you have any suggestions they're welcome. I've also asked how to
> combine two 8 bit instructions into a 16 bit one, mainly for movw and
> adiw/sbiw. I wrote a function pass that searches 2 moves in a row and
> combines them into a movw, but if other instructions get in between the
> moves like it happens in the previous example (note those ands and xors)
> then they get missed. GCC has an easy way of handling the register pairing
> issue, Lang in the mailing list suggested using his register allocator which
> is able to work with constraints like the ones we have, look in the src for
> PBQP.
>

From reading the other thread, it's starting to sound like we will need to modify the LLVM code generator and/or tablegen to get the registers assigned in the correct order. So the PBQP solver can create pairs, but I'm wondering whether it will also be able to assign them in the correct order.

PBQP sounds really interesting. I've tried reading up on the concept a few times before, but I haven't looked at the source code yet.

From: Borja F. <bor...@gm...> - 2010-12-12 14:02:01

I'm currently working with the PBQP allocator to add the pairing constraints. It can manage ordered pairs; for that, we need to tell it whether a virtual register maps to the lo or the hi part of the data, so that it produces the correct code.

John, now that at least the pairing stuff is a bit clearer for the moment, we have to look at inserting 16-bit instructions like movw and adiw/sbiw. You can look at the pass I wrote to replace moves with movws. Ideally I would like LLVM to do it automatically, but I don't know if this is possible, so I guess this is what we need to investigate before proceeding to other things.

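In PBQP, a constraint between two virtual registers is expressed as a cost matrix indexed by the physical register each one might receive. As a rough sketch of what an ordered pairing constraint could look like, the C below fills such a matrix for a (lo, hi) pair over AVR's 32 registers. This is an assumption-laden illustration only: `pair_cost`, `fill_pair_costs` and the flat `MISMATCH_COST` penalty are hypothetical and are not taken from LLVM's PBQP allocator source.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 32        /* AVR general purpose registers r0..r31 */
#define MISMATCH_COST 1u   /* assumed penalty: one extra mov instead of a movw */

/* Cost of giving the lo half `lo_reg` and the hi half `hi_reg`.
   Zero only for an even-aligned, correctly ordered pair such as r24/r25. */
unsigned pair_cost(unsigned lo_reg, unsigned hi_reg)
{
    assert(lo_reg < NUM_REGS && hi_reg < NUM_REGS);
    return ((lo_reg & 1u) == 0 && hi_reg == lo_reg + 1) ? 0u : MISMATCH_COST;
}

/* Fill the edge cost matrix: row = register chosen for the lo half,
   column = register chosen for the hi half. */
void fill_pair_costs(unsigned costs[NUM_REGS][NUM_REGS])
{
    for (unsigned lo = 0; lo < NUM_REGS; lo++)
        for (unsigned hi = 0; hi < NUM_REGS; hi++)
            costs[lo][hi] = pair_cost(lo, hi);
}
```

Because rows and columns are tied to the lo and hi roles, the matrix is not symmetric: r24/r25 is free while r25/r24 is penalized. That asymmetry is why the allocator has to be told which virtual register holds which half.
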