From: Borja F. <bor...@gm...> - 2011-04-05 21:31:07
|
Hello, it's been a little while since the last update. Although I haven't committed any code during the last 3 months, that doesn't mean the project is dead; quite the opposite. I haven't stopped working on this at all, so I want to explain the current status.

From January to approximately March I investigated different ways to handle the insane register constraints needed to pair adjacent registers for storing 16-bit and wider data, using the PBQP allocator and other ideas I had. I think I had about 4 branches to see which one was best. I want to give a big thank you to Lang Hames, the author of this allocator, for all the help he gave me and for his interest. But solving the problem with this approach was getting pretty hard: tracking which virtual register holds the low and the high part of a 16-bit value, in order to build the constraint matrix for the PBQP allocator, is simply impossible in LLVM with the current interfaces. In addition, there were other underlying problems when working with only 8-bit types as legal sizes, such as 16-bit pointers.

So around mid March I started a new backend from scratch with a new idea that has simplified everything enormously. Like everything in life, it has pros and cons, so we have to strike a balance. The biggest pro of this approach is its simplicity; the con is that we'll have to work a bit harder to get some nice optimizations in. But the cons aren't too bad, because most of them would have to be handled in every approach, so either way we will have to get our hands dirty to get really nice code.

The basic idea is to make 16-bit data legal. That means working with pseudo register pairs and pseudo 16-bit instructions that get expanded into 8-bit instructions at a later stage. When we get a movw or adiw/sbiw we leave it alone because those are legal instructions, but when we get, say, a 16-bit and instruction, we have to expand it into two 8-bit and instructions. As I said, further down the road we can start thinking about introducing optimizations over pseudo instructions, which is something that gcc doesn't do very well, and after all we want to beat it.

So the current status is:
- Added support for all arithmetic and binary instructions for all data types, except mul. This includes INC/DEC, COM/NEG and SWAP.
- Convert all (add x, imm) into sub(x, -imm), since we don't have an add-with-immediate instruction. This turns an ldi+add pair into a single subi.
- Handle adiw/sbiw instructions when it's beneficial.
- Allocate everything in register pairs, as it should be, and always emit movw; no need for more heuristics.
- Implemented calling conventions, including argument passing, return values, and function calls, including external symbols, which aren't handled in the current backend in SVN.
- Pointers are now 16 bits, as they should be, and not 8 bits as a hack.
- Added support for ICALL.
- BIG ONE: Added support for data memory operations; this includes ld, ldd, st and std. This adds support for the reg+imm addressing mode when imm<64.
- Print the X, Y, Z register names for ld, ldd, st and std when emitting asm code.

I think this is it for the moment. I've probably left something out, but those are the basic points. I'm currently working on adding support for lds and sts, so we can load/store global symbols. After that is implemented we'll be able to implement stack frames, which is an important milestone.

I'll commit the new code in a few days; I want to get something done first. Let me know your thoughts. For me the most important point is that we're moving forward.

PS: John, does clang get rebuilt each time you recompile LLVM? If so, do you know a way to stop this? Linking it takes about 40 secs here. I'm using ccache as mentioned in some doc file. |
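A minimal sketch of the (add x, imm) to sub(x, -imm) rewrite from the status list, for the 16-bit case. This is not code from the backend: `subi`, `sbci` and `addImm16` are hypothetical helpers that model the two instructions and their borrow chain in plain C++, assuming the usual AVR behaviour where subi sets the carry flag on borrow and sbci subtracts the immediate plus the carry.

```cpp
#include <cassert>
#include <cstdint>

// subi Rd, K: Rd = Rd - K; carry is set when the subtraction borrows.
inline void subi(uint8_t &rd, uint8_t k, bool &carry) {
    carry = rd < k;
    rd = static_cast<uint8_t>(rd - k);
}

// sbci Rd, K: Rd = Rd - K - carry; carry is set when the subtraction borrows.
inline void sbci(uint8_t &rd, uint8_t k, bool &carry) {
    unsigned sub = static_cast<unsigned>(k) + (carry ? 1u : 0u);
    carry = rd < sub;
    rd = static_cast<uint8_t>(rd - sub);
}

// Adds a 16-bit immediate by subtracting its two's complement negation,
// one byte at a time: subi lo, lo8(-imm) ; sbci hi, hi8(-imm).
inline uint16_t addImm16(uint16_t x, uint16_t imm) {
    uint16_t neg = static_cast<uint16_t>(0u - imm); // -imm modulo 2^16
    uint8_t lo = static_cast<uint8_t>(x & 0xFF);
    uint8_t hi = static_cast<uint8_t>(x >> 8);
    bool carry = false;
    subi(lo, static_cast<uint8_t>(neg & 0xFF), carry);
    sbci(hi, static_cast<uint8_t>(neg >> 8), carry);
    uint16_t result = static_cast<uint16_t>((hi << 8) | lo);
    // The subtraction form must agree with a plain 16-bit addition.
    assert(result == static_cast<uint16_t>(x + imm));
    return result;
}
```

For instance, `addImm16(0x1234, 0x0101)` subtracts 0xFEFF byte by byte and yields 0x1335, without needing an ldi to materialise the constant first.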
From: Weddington, E. <Eri...@at...> - 2011-04-05 22:59:21
|
Wow! Thanks for the update! :-)

I like what I'm hearing so far. The cons don't sound all that bad, considering the pros that you mentioned. It sounds to me that you're on the right track, but I'll be interested in what others have to say, too.

My only caution comes from what you said about using movw always: there are AVR chip variants that don't have a movw instruction. For that matter, there are different instruction set variants that depend on the AVR device type. AVR GCC sort-of handles this by organizing the list of chips into arbitrary "architectures", which are then associated with memory ranges and loosely based instruction sets. However, there always seem to be some exceptions to the rule.

So my only advice to you is to keep this in mind and that we'll need to have some sort of mechanism to tie the AVR device type to enabling/disabling specific instructions. I don't really care if we use a similar mechanism as AVR GCC with these arbitrary architectures. However, I would be fine too if we dropped the "architectures", because these are arbitrary, there are exceptions, and sometimes they're more of a maintenance headache than they're worth. (As a side note, this is one area where I hope to help out, though I still have quite a learning curve.)

With this, also keep in mind that there is a relatively new AVR sub-family, based on the newly released ATtiny10 (all variants: ATtiny10/4/5/9/20/40). This is a weird family:
- The register file is cut in half. It only has R16-R31, however R0-R15 still exist, they just map to the upper half of the register file.
- Reduced instruction set. There are a number of disabled instructions (have to see datasheets for full list).
- Really small amounts of flash (code), and even smaller amounts of RAM (data).

Yet, with all of these limitations, there is still a desire to have a C compiler for these devices. The latest "AVR Toolchain" release from Atmel has support for these devices.
Thanks again, Borja, for your work on this! I'm really excited to see your work committed! :-)

Eric Weddington |
From: Borja F. <bor...@gm...> - 2011-04-05 23:44:46
|
Hello Eric, sorry, I phrased that badly :) When I wrote that movws were going to be emitted always, I meant that they are going to be emitted directly by the compiler rather than inserted manually, as currently happens. The current implementation searches for 8-bit moves and tries to transform 2 moves in a row into a single movw, but it misses many cases, which is why they don't always get emitted. With the new implementation, since the compiler emits real 16-bit moves, we will get a movw whenever one is needed, so there's no danger of losing movws.

About the other devices you mentioned: indeed, this is something that has to be done. I already have it in mind, thinking about the tiny devices that lack mul, movw and other instructions and have 8-bit pointers, and the largest ones that have 24-bit pointers. LLVM has a very nice interface to handle this sort of thing, so in theory it shouldn't be hard to implement, just really tedious. Basically, first you define a device model, say the ATMEGA644PA, and in it you list its supported features. In x86 these features would be SSE1, SSE2, AVX, etc.; in our case we would have MUL, MOVW, ELPM and friends. Then, when emitting code, we check whether, for example, movw is supported on the current device. If it isn't, LLVM has to take another path and emit a different instruction. Likewise, if the device doesn't have a built-in multiplier, we either expand the mul instruction into a chain of adds or make a libcall.

I'm really interested in your help when we start supporting other devices, to get things right and support every single device on the market. We'll need a good classification of the devices to list the features I mentioned, so we can add them for every supported device.

For the moment I'm focusing on the atmega644, which has a lot of features in its CPU core. For smaller devices we'll have to add restrictions as explained above to avoid emitting illegal instructions; for larger devices... well, I can't talk about those for now because I've never worked with them.

OFFTOPIC: During the past weeks I've been implementing a 4-stage pipelined MIPS core in Verilog for an FPGA for a master class project, and now I really have a good feeling for the beauty of what is inside RISC CPUs. It's nice to have different points of view: the programmer, the compiler, the CPU core hardware... |
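The feature-check idea above can be sketched roughly as follows. This is plain C++ rather than LLVM's actual subtarget machinery, and every name in it (`Feature`, `Device`, `emitCopy16`) is a hypothetical stand-in chosen to mirror the MUL / MOVW / ELPM examples in the message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits, named after the examples in the message.
enum Feature : uint32_t {
    FeatureMUL  = 1u << 0,
    FeatureMOVW = 1u << 1,
    FeatureELPM = 1u << 2,
};

// Hypothetical device model: a name plus the set of features it supports.
struct Device {
    std::string name;
    uint32_t features;
    bool has(Feature f) const { return (features & f) != 0; }
};

// Emits a 16-bit register-pair copy. dst and src are the even (low)
// register numbers of each pair.
inline std::vector<std::string> emitCopy16(const Device &dev, unsigned dst, unsigned src) {
    assert(dst % 2 == 0 && src % 2 == 0 && "pairs start at an even register");
    auto reg = [](unsigned n) { return "r" + std::to_string(n); };
    if (dev.has(FeatureMOVW))
        return {"movw " + reg(dst) + ", " + reg(src)};
    // Fallback for cores without movw: copy each byte separately.
    return {"mov " + reg(dst) + ", " + reg(src),
            "mov " + reg(dst + 1) + ", " + reg(src + 1)};
}
```

The same pattern would apply to the multiplier case: check the MUL bit, and fall back to an expansion or a libcall when it is absent.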
From: Weddington, E. <Eri...@at...> - 2011-04-08 23:29:34
|
> -----Original Message----- > From: Borja Ferrer [mailto:bor...@gm...] > Sent: Tuesday, April 05, 2011 5:45 PM > To: Weddington, Eric > Cc: avr...@li... > Subject: Re: [avr-llvm-devel] Status update > > Hello Eric, I phrased that in a wrong way sorry :) Basically when i wrote > that movws were going to be emitted always i meant that they were going to > be emitted directly by the compiler and not manually inserted as it's > currently happening. The current implementation searches for 8 bit moves > and tries to transform 2 moves in a row to a single movw, but it's missing > many cases so that's why they dont get always emitted. With the new > implementation, since the compiler is emitting real 16bit moves we will > have a movw when it's needed so there's no danger of getting movws lost. Ah, ok. That makes more sense. Thanks for the clarification. <snip> > OFFTOPIC: During the past weeks i've been implementing a 4 stage pipelined > MIPS core in Verilog for an FPGA for a master class project, and now i > really have a good feeling of the beauty of what is inside the RISC cpus. > It's nice to have different point of views: the programmer, the compiler, > the CPU core hardware... Very cool! :-) Eric |
From: Borja F. <bor...@gm...> - 2011-04-10 15:48:26
|
Small update: Today I've finished introducing support for post-increment loads and stores. Pre-decrement should be nearly copy-paste with some small changes. After this is done, I'll take a look at lds/sts as mentioned in a previous email so we can load/store globals (only data memory at the moment).

John, I will try renaming the clang folder since I'm working on Linux. |
From: Weddington, E. <Eri...@at...> - 2011-04-10 17:15:52
|
> -----Original Message----- > From: Borja Ferrer [mailto:bor...@gm...] > Sent: Sunday, April 10, 2011 9:48 AM > To: Weddington, Eric > Cc: avr...@li... > Subject: Re: [avr-llvm-devel] Status update > > Small update: > Today I've finished introducing support for post-increment load and > stores. Pre-decrement should be nearly copy paste with some small changes. > After this is done, i'll take a look at lds/sts as mentioned in a previous > email so we can load/store globals (only data memory at the moment). Thanks for doing this! What are your plans for committing your new work? |
From: Borja F. <bor...@gm...> - 2011-04-11 16:02:43
|
I would like to finish up the global load/store stuff before committing the new code. Also, during the next week I'm going to be very, very busy, so I can't say precisely when it will happen, unless there's some big reason that requires committing the code :) So I would say at the end of next week.

As a side note, I've finished working on pre-decrement loads/stores, so now it's time for lds/sts for real. |
From: Weddington, E. <Eri...@at...> - 2011-04-11 21:43:26
|
> -----Original Message----- > From: Borja Ferrer [mailto:bor...@gm...] > Sent: Monday, April 11, 2011 10:03 AM > To: Weddington, Eric > Cc: avr...@li... > Subject: Re: [avr-llvm-devel] Status update > > I would like to finish up the global load/store stuff before committing > the new code. Also during the next week i'm going to be very very busy so > i can't precisely say when it will happen, unless threre's some sort of > big reason that requires to commit the code :) So i would say at the end > of next week. Sounds good. No, there's no pressing reason, just excited to see it. :-) > As a side note, i've finished up working with predrecement loads/stores, > so now it's time for lds/sts for real. :-D |
From: Borja F. <bor...@gm...> - 2011-04-15 22:23:25
|
John, can you make a diff patch of the following changes and include it in the patch file in svn? This patch to LLVM is necessary to be able to know the size of function arguments when calling external symbols, and is required for my latest changes.

TargetCallingConv.h

Add

    static const uint64_t SplitPiece = 0x3FULL << 11;
    static const uint64_t SplitPieceOffs = 11;

after

    static const uint64_t SplitOffs = 10;

Add

    unsigned getSplitPiece() const {
      return (unsigned)((Flags & SplitPiece) >> SplitPieceOffs);
    }
    void setSplitPiece(unsigned S) {
      Flags = (Flags & ~SplitPiece) | (uint64_t(S) << SplitPieceOffs);
    }

after

    void setByValSize(unsigned S) {
      Flags = (Flags & ~ByValSize) | (uint64_t(S) << ByValSizeOffs);
    }

(Introduce a newline after the setByValSize closing brace.)

SelectionDAGBuilder.cpp

Add

    MyFlags.Flags.setSplitPiece(j);

after

    ISD::OutputArg MyFlags(Flags, Parts[j].getValueType(), i < NumFixedArgs);

(around line 6115)

Add

    MyFlags.Flags.setSplitPiece(i);

after

    ISD::InputArg MyFlags(Flags, RegisterVT, isArgValueUsed);

(around line 6310) |
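To see what the two accessors in the patch do in isolation, here is a small self-contained sketch. `ArgFlags` is a hypothetical stand-in for the real flags class and only models the 6-bit SplitPiece field at bit 11; the range assertion in the setter is an addition for illustration and is not part of the patch, whose setter does not mask its argument.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the flags word; only the SplitPiece field is modelled.
struct ArgFlags {
    static const uint64_t SplitPiece = 0x3FULL << 11; // 6-bit field mask
    static const uint64_t SplitPieceOffs = 11;        // field starts at bit 11
    uint64_t Flags = 0;

    // Extracts the field: mask first, then shift down to bit 0.
    unsigned getSplitPiece() const {
        return static_cast<unsigned>((Flags & SplitPiece) >> SplitPieceOffs);
    }

    // Clears the old field value, then ORs in the new one at its offset.
    void setSplitPiece(unsigned S) {
        assert(S <= 0x3F && "piece index must fit in 6 bits");
        Flags = (Flags & ~SplitPiece) | (uint64_t(S) << SplitPieceOffs);
    }
};
```

Because the setter clears only the masked bits, neighbouring flags such as the bit at offset 10 are left untouched when the piece index changes.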
From: Borja F. <bor...@gm...> - 2011-04-18 10:49:51
|
John, it seems you made a diff patch for SelectionDAG (which is empty) instead of SelectionDAGBuilder.cpp. The patch for the other file is correct, thanks.

Also, could you please make a branch for the 2.9 release? There is some code there that will be useful in the future. |
From: Borja F. <bor...@gm...> - 2011-04-21 12:14:09
|
I'm 50% done with the global lds/sts stuff. Currently, these instructions get selected for non-array variables of char and int types. For the array case, something like char g_var[11] = 0 is generating:

    ldi r24, 0
    ldi r30, g_var // should have the lo8, hi8 macros, i know xD
    ldi r31, g_var
    std Z+11, r24

Obviously this add has to be performed at compile time, giving this:

    ldi r24, 0
    sts g_var+11, r24

Also, I'm looking into how to generate the hi8/lo8 macros that are needed to get the low and high parts of the addresses. So at this stage I'm working out these cases to get the optimal code sequence; I'll post an update when I advance further with this. |
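A minimal sketch of the two pieces discussed above: what lo8/hi8 compute once an address is known, and folding a constant array offset into the sts operand instead of going through Z. The helper names (`lo8`, `hi8`, `foldedStore`) are hypothetical and only build assembly text; they are not the backend's instruction selection code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// lo8/hi8 as they would evaluate for an already resolved 16-bit address.
inline uint8_t lo8(uint16_t addr) { return static_cast<uint8_t>(addr & 0xFF); }
inline uint8_t hi8(uint16_t addr) { return static_cast<uint8_t>(addr >> 8); }

// Hypothetical helper: folds symbol + constant offset into a single sts
// operand, so the base address never has to be loaded into Z.
inline std::string foldedStore(const std::string &symbol, unsigned offset, unsigned srcReg) {
    assert(srcReg < 32 && "AVR has 32 general purpose registers");
    std::string operand = symbol;
    if (offset != 0)
        operand += "+" + std::to_string(offset);
    return "sts " + operand + ", r" + std::to_string(srcReg);
}
```

With these, `foldedStore("g_var", 11, 24)` produces the `sts g_var+11, r24` line from the message, leaving the actual address arithmetic to the assembler and linker.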
From: Borja F. <bor...@gm...> - 2011-04-27 02:32:12
|
Ok, right now at 4:30am I've just finished all this stuff. So finally we can say that global loading and storing is fully functional. |
From: Borja F. <bor...@gm...> - 2011-05-04 09:57:48
|
Already working on stack frames. |
From: John M. <ato...@gm...> - 2011-04-06 02:37:29
|
On Tue, Apr 5, 2011 at 2:31 PM, Borja Ferrer <bor...@gm...> wrote:

> PS: John do you get clang rebuilt each time you recompile LLVM, if that's the case do you know a way to stop this because linking it takes like 40secs here. Im using ccache as mentioned in some doc file.

No, if I remember correctly, mostly only the clang source files that have changed get compiled. I think certain LLVM libraries cause parts of clang to be rebuilt, though. I remember that working on some parts of LLVM seemed to make the clang compilation take a long time compared to when I modified other files in LLVM.

On Linux, using *configure*, I used to rename *tools/clang*, which keeps the LLVM build system from trying to compile clang as well. With CMake+VS you can disable compilation of each project separately. |
From: Borja F. <bor...@gm...> - 2011-05-18 22:00:00
|
Hello, while working on the stack and frame stuff a question came to my mind: are we going to use exactly the same ABI as gcc? I'm asking because each time we need to increment or decrement the SP we have to use many instructions; for example, in a function prologue gcc uses 8. From what I've seen, IAR stores SP in Y, so manipulating it doesn't need in/out instructions, reading SREG or disabling interrupts, which is much shorter. When passing parameters to functions through the stack it gets longer, because SP has to be decremented and restored using in/outs on each call. Gah Eric, why was SP mapped in the I/O space? xD

Anyway, if somebody has a better idea let me know, otherwise I'll implement what gcc does.

PS: John, with clang do you get double or long long variables aligned to an 8-byte boundary? I've set their alignments to 1 byte in a constructor (I can't remember the name now, but it's in the clang patch files in svn), but they still get over-aligned, wasting memory. Any ideas there? |
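For readers who have not seen it, the 8-instruction sequence being counted looks roughly like the one built below. The sequence is the commonly seen avr-gcc frame setup pattern, written down here as an assumption rather than taken from this thread; it can differ between gcc versions, options and devices. The helper `gccStyleFrameSetup` is hypothetical and only produces the instruction text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the instructions that reserve frameSize bytes
// of stack in the avr-gcc style (the pushes of r28/r29 are not included).
inline std::vector<std::string> gccStyleFrameSetup(unsigned frameSize) {
    assert(frameSize >= 1 && frameSize <= 63 && "sbiw takes an immediate in 0..63");
    return {
        "in r28, __SP_L__",                        // Y = SP, low byte
        "in r29, __SP_H__",                        // Y = SP, high byte
        "sbiw r28, " + std::to_string(frameSize),  // Y -= frameSize
        "in __tmp_reg__, __SREG__",                // save interrupt state
        "cli",                                     // the 2-byte SP update must not be interrupted
        "out __SP_H__, r29",                       // write new SP, high byte
        "out __SREG__, __tmp_reg__",               // restore interrupt state
        "out __SP_L__, r28",                       // write new SP, low byte
    };
}
```

Every SP adjustment pays for the in/out pairs and the SREG save/restore, which is the overhead the message contrasts with keeping the data stack pointer in Y.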
From: Weddington, E. <Eri...@at...> - 2011-05-19 20:21:48
|
Hi Borja,

I totally agree that the ABI for AVR GCC basically sucks. However, we have compatibility issues if we hope to win over users. Basically this means that we should endeavor to be able to use avr-libc with avr-llvm, and this means that we have to keep to the same ABI as avr-gcc. Avr-libc is written in mostly optimized assembly, so changing the ABI means an awful lot of work in avr-libc.

However, we should also keep in mind 2 things:
- There has been some talk, here and there, of *eventually* changing the ABI of AVR GCC to something more rational and efficient. The argument stops because of how much work there is to do it and generally not enough resources. But everyone usually agrees that it needs to be done. So I can easily see that the ABI will eventually change in the future.
- To that end, if there is a way for you to do *both* ABIs, and be able to switch back and forth with a switch, then I would highly recommend that you do that. I know it means extra work. But this would be a great selling point. Have it default to the AVR GCC ABI, but if you can come up with a more efficient ABI (and I can do some internal research as to what would be ideal and report back on that) *and* implement it, then it could be shown to users how much better this ABI is *and* that avr-llvm already implements it. Then it would be easier to get the needed work done on avr-libc to switch it over.

HTH,
Eric Weddington |
From: Borja F. <bor...@gm...> - 2011-05-19 22:53:58
|
Hello Eric,

It could be possible to implement both ABIs, but right now I'm unsure how much work that would take and how difficult it would be. I personally haven't really thought about a new ABI standard, since this is something that needs deep thought and some analysis; indeed, some opinions from Atmel would be very helpful. I ran into these issues while looking at how gcc and IAR manipulate frames and the stack during the past days, and that is where I noticed the huge difference in performance and code size. I haven't really looked at the IAR ABI standard, but I've been comparing code produced by IAR and llvm (mainly stressing register allocation with math calculations on double precision types), and the overall difference in code size is only related to the ABI. What do I mean? Well, if we ignore the prologue, epilogue and stack manipulation, which are purely ABI specific, llvm produces shorter code and requires less frame memory; otherwise they perform about equally on average because of the overhead the gcc ABI has.

My biggest concern, supposing the ABI is changed in the future, is the custom assembly written by users. That would mean a huge change for them, since they would be forced to rewrite their code, and I'm uncertain about their reactions. The libc transition would be easier, since many assembly functions use registers defined in macros instead of their real names, which would make things simpler. But again, would you then have two different libc's, or would you deprecate the one using the old ABI? It's not that obvious.

In relation to the development, right now I'm stuck on reserving the Y register when stack frames are needed, so that it doesn't get used by anything else. To know whether registers get spilled, so that stack memory is allocated and Y is needed to access it, you need to run the register allocator; but the allocator will use Y for other purposes if it's available. So it's like needing to know the future to resolve something in the present. LLVM handles stack stuff at a later stage, after register allocation is done, and only there do you know whether you really need to reserve Y to access the stack or can use it for other things. |
From: Kevin S. <sch...@kw...> - 2011-05-20 00:12:12
|
Hi folks,

LLVM has support for multiple calling conventions. In principle someone could eventually define additional calling conventions to match IAR, gcc (if that is not the default), or others. But the two important ones (quoting from the assembly reference) are -

> "ccc" - The C calling convention:
> This calling convention (the default if no other calling
> convention is specified) matches the target C calling conventions.
> This calling convention supports varargs function calls and
> tolerates some mismatch in the declared prototype and implemented
> declaration of the function (as does normal C).

> "fastcc" - The fast calling convention:
> This calling convention attempts to make calls as fast as
> possible (e.g. by passing things in registers). This calling
> convention allows the target to use whatever tricks it wants
> to produce fast code for the target, without having to conform
> to an externally specified ABI (Application Binary Interface).

Optimization promotes calls to fastcc where possible (generally, if the function has fixed arguments and is not externally visible).

I think in the (not very) long term the normal use case for AVR would be to have libraries of LLVM bitcode, rather than object code, and to do link-time whole-program optimization. Then nearly all calls could be promoted to fastcc, and the performance of the external calling convention is moot. The fastcc is not externally visible, so it can be changed over time.

Regards,
-- Kevin Schoedel <sch...@kw...> VA3TCS |
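The promotion rule in the parenthesis above can be written down as a small predicate. This is only a sketch with hypothetical types (`FunctionInfo`, `CallConv`), not LLVM's optimizer code; the extra `addressTaken` condition is an assumption added here, on the reasoning that indirect callers could not all be rewritten to the new convention.

```cpp
#include <cassert>

enum class CallConv { C, Fast };

// Hypothetical summary of the properties the promotion rule looks at.
struct FunctionInfo {
    bool isVarArg;          // takes a variable argument list
    bool externallyVisible; // may be called from outside this module
    bool addressTaken;      // assumption: may be called indirectly
    CallConv cc;
};

// A function qualifies when every caller is known and none relies on varargs.
inline bool canPromoteToFastCC(const FunctionInfo &f) {
    return !f.isVarArg && !f.externallyVisible && !f.addressTaken;
}

// Switches the function to fastcc when the rule allows it.
inline void promote(FunctionInfo &f) {
    if (!canPromoteToFastCC(f))
        return;
    f.cc = CallConv::Fast;
    assert(!f.isVarArg && "a promoted function never takes varargs");
}
```

Under whole-program optimization most functions stop being externally visible, which is why nearly all calls would then pass this check.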
From: Weddington, E. <Eri...@at...> - 2011-05-19 23:12:28
|
> -----Original Message-----
> From: Borja Ferrer [mailto:bor...@gm...]
> Sent: Thursday, May 19, 2011 3:54 PM
> To: Weddington, Eric
> Cc: avr...@li...
> Subject: Re: [avr-llvm-devel] Status update
>
> Hello Eric,
>
> It could be possible to implement both ABIs, but right now I'm unsure
> about how much work that would take and its difficulty. I personally
> haven't really thought about a new ABI standard since this is something
> that needs deep thought and some analysis; indeed, some opinions from
> Atmel would be very helpful.

And I'm happy to help out in this area. Agreed that it will take quite a bit of thinking *and* tinkering.

> I faced these issues while taking a look at
> how gcc and IAR manipulated frames and the stack during the past days, and
> that is where I noticed the huge difference in performance and code size.
> I really haven't taken a look at the IAR ABI standard,

And something to keep in mind is that IAR uses *2* stacks, one for local variables and one for function return addresses to quickly pop out of a function.

> but I've been
> comparing code produced (mainly stressing register allocation with math
> calculations with double precision types) by IAR and llvm and the global
> difference in code size is only related to the ABI. What do I mean? Well,
> that if we ignore prologue, epilogue and stack manipulation stuff which is
> purely ABI specific, llvm produces shorter code and requires less frame
> memory, otherwise they perform quite equally on average because of the
> overhead the gcc ABI has.

Well this is *very* good news then! This means that it is theoretically possible to do even better than IAR, if only we design an appropriate ABI. I do feel that an alternative, better ABI can be developed. It's just a matter of some effort and time.

> My biggest concern here supposing the ABI is changed in the future is the
> custom assembly written by users, that would mean a huge change for them
> since they would get forced to rewrite their code and I'm uncertain about
> their reactions.

I'm a little less concerned, and that's mainly to do with having been in this area for several years.

Yes, any ABI change could induce a cost to the end user. However:

- Any cost to the end user would have to be offset by the benefits involved. If we can show that avr-llvm can produce smaller code than the best commercial AVR compiler on the market, then the benefits far outweigh the costs.

- The actual costs would be minimal. Most users write their code in C, or even C++. There are very few actual projects out there that include some form of assembly. Of those that do, most would use some form of inline assembly. There are fewer users that even have assembly-only functions within a larger C project where they have to adjust their prologues and epilogues due to an ABI change. And even then, there are usually a very small finite number of assembly functions. Even rarer is the true all assembly-only application. So, I think, in reality the impact of changing the ABI on user applications is extremely small and therefore not as relevant. The biggest impact would be on avr-libc.

> The libc transition would be easier to do since many
> assembly functions use registers defined in macros instead of using their
> real names, so that would make things simpler.

Avr-libc would have to be audited to make sure that truly is the case. But it's just "work". Not that hard, but a little time-consuming. But I think very doable.

> But again, would you then
> have two different libc's or would you deprecate the one using the old
> ABI? It's not that obvious.

Well, ideally, it might be best to make sure that we could compile avr-libc with either ABI while we do a transition. But I think we're putting the cart before the horse. I'm willing to look into what that will take with avr-libc once we have a working avr-llvm with promise of supplanting avr-gcc with better code generation.

> In relation to the development, right now I'm stuck reserving the Y reg
> when stack frames are needed so it doesn't get used by anything else. To
> know if all regs get spilled so stack memory is allocated thus needing Y
> to access stack memory you need to run the reg allocator but that will use
> the Y reg for other purposes if it's available, so it's like needing to
> know the future to resolve something in the present. LLVM handles stack
> stuff in a later stage, after register allocation is done, and that is
> where you know if you really need to reserve Y to access the stack or you
> can use it for other things.

It sounds like you've got a handle on what needs to be done. Is there a way to run the register allocator more than once? Run it with Y available to see if the result works, if not, then reserve Y and run the allocator again without Y available?

HTH,
Eric Weddington |
From: Borja F. <bor...@gm...> - 2011-05-19 23:53:07
|
2011/5/20 Weddington, Eric <Eri...@at...>

> > -----Original Message-----
> > From: Borja Ferrer [mailto:bor...@gm...]
> > Sent: Thursday, May 19, 2011 3:54 PM
> > To: Weddington, Eric
> > Cc: avr...@li...
> > Subject: Re: [avr-llvm-devel] Status update
> >
> > Hello Eric,
> >
> > It could be possible to implement both ABIs, but right now I'm unsure
> > about how much work that would take and its difficulty. I personally
> > haven't really thought about a new ABI standard since this is something
> > that needs deep thought and some analysis; indeed, some opinions from
> > Atmel would be very helpful.
>
> And I'm happy to help out in this area. Agreed that it will take quite a
> bit of thinking *and* tinkering.

Ok then, I guess the best thing to do would be sticking to the gcc ABI until we get a functional compiler, and then patch it with the new ABI to support both. However, we could open a discussion about the new ABI in another thread in the meantime, since this is something that will take some time to reach consensus on, and that way it won't get forgotten.

> > I faced these issues while taking a look at
> > how gcc and IAR manipulated frames and the stack during the past days, and
> > that is where I noticed the huge difference in performance and code size.
> > I really haven't taken a look at the IAR ABI standard,
>
> And something to keep in mind is that IAR uses *2* stacks, one for local
> variables and one for function return addresses to quickly pop out of a
> function.

Very interesting, I didn't know about this. I guess I'll have to take a look at the IAR specs to see what it does. Is it even possible to beat IAR's ABI, since you have a better understanding of it?

> > but I've been
> > comparing code produced (mainly stressing register allocation with math
> > calculations with double precision types) by IAR and llvm and the global
> > difference in code size is only related to the ABI. What do I mean? Well,
> > that if we ignore prologue, epilogue and stack manipulation stuff which is
> > purely ABI specific, llvm produces shorter code and requires less frame
> > memory, otherwise they perform quite equally on average because of the
> > overhead the gcc ABI has.
>
> Well this is *very* good news then! This means that it is theoretically
> possible to do even better than IAR, if only we design an appropriate ABI. I
> do feel that an alternative, better ABI can be developed. It's just a matter
> of some effort and time.

What I've tested so far is very "movw" and "call" intensive, since all double operations are function calls. When we get down to lots of arithmetic operations and the like, we'll have to get our hands dirty to combine operations like IAR does; this is where IAR outperforms gcc. For now I'm not focusing on optimizations at all since I want to have a functional compiler first, but the code produced is quite decent, so when optimizations get introduced the code quality upgrade should be huge.

> > My biggest concern here supposing the ABI is changed in the future is the
> > custom assembly written by users, that would mean a huge change for them
> > since they would get forced to rewrite their code and I'm uncertain about
> > their reactions.
>
> I'm a little less concerned, and that's mainly to do with having been in
> this area for several years.
>
> Yes, any ABI change could induce a cost to the end user. However:
>
> - Any cost to the end user would have to be offset by the benefits
> involved. If we can show that avr-llvm can produce smaller code than the
> best commercial AVR compiler on the market, then the benefits far outweigh
> the costs.
>
> - The actual costs would be minimal. Most users write their code in C, or
> even C++. There are very few actual projects out there that include some
> form of assembly. Of those that do, most would use some form of inline
> assembly. There are fewer users that even have assembly-only functions within
> a larger C project where they have to adjust their prologues and epilogues
> due to an ABI change. And even then, there are usually a very small finite
> number of assembly functions. Even rarer is the true all assembly-only
> application. So, I think, in reality the impact of changing the ABI on user
> applications is extremely small and therefore not as relevant. The biggest
> impact would be on avr-libc.

I absolutely agree with you on that. Btw, big words there mentioning beating the best commercial compiler on the market xD
During the years I've been coding I've never cared about backwards compatibility; when I received the question about breaking some library interface or binary format my reply was always to go for it. But this is the first time I think I'm caring, because we have to adapt to the users first, not the other way round as it always happened to me.

> > The libc transition would be easier to do since many
> > assembly functions use registers defined in macros instead of using their
> > real names, so that would make things simpler.
>
> Avr-libc would have to be audited to make sure that truly is the case. But
> it's just "work". Not that hard, but a little time-consuming. But I think
> very doable.
>
> > But again, would you then
> > have two different libc's or would you deprecate the one using the old
> > ABI? It's not that obvious.
>
> Well, ideally, it might be best to make sure that we could compile avr-libc
> with either ABI while we do a transition. But I think we're putting the cart
> before the horse. I'm willing to look into what that will take with avr-libc
> once we have a working avr-llvm with promise of supplanting avr-gcc with
> better code generation.

Indeed, no need to advance on these things until we get a working compiler.

> > In relation to the development, right now I'm stuck reserving the Y reg
> > when stack frames are needed so it doesn't get used by anything else. To
> > know if all regs get spilled so stack memory is allocated thus needing Y
> > to access stack memory you need to run the reg allocator but that will use
> > the Y reg for other purposes if it's available, so it's like needing to
> > know the future to resolve something in the present. LLVM handles stack
> > stuff in a later stage, after register allocation is done, and that is
> > where you know if you really need to reserve Y to access the stack or you
> > can use it for other things.
>
> It sounds like you've got a handle on what needs to be done. Is there a way
> to run the register allocator more than once? Run it with Y available to see
> if the result works, if not, then reserve Y and run the allocator again
> without Y available?

That's what I've thought as well. I'm going to ask the llvm dev list to see if it's possible to do it. About 2 weeks ago the llvm devs released a new reg alloc, now the one on by default, that reduces frame size by using more register swapping. I've seen some improvements in the little tests I ran, so at least it's good to know that they're very active in improving codegen quality.

> HTH,
> Eric Weddington
> |
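The "run the allocator twice" idea discussed above can be sketched with a toy model. This is only an illustration under stated assumptions, not how LLVM's register allocator works or is driven: `allocate`, `allocateTwoPass`, `AllocResult` and `FinalAlloc` are hypothetical names, the register file is reduced to the three AVR pointer pairs X, Z and Y, and a "spill" simply means more simultaneously live values than registers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Result of one toy allocation run (hypothetical type, for illustration only).
struct AllocResult {
    std::size_t spills;  // values that did not get a register
    bool usesY;          // whether Y was handed out as a general register
};

// Toy allocator: gives one register to each of `liveValues` simultaneously
// live values, in order; anything beyond the register count is spilled.
AllocResult allocate(std::size_t liveValues,
                     const std::vector<std::string>& regs) {
    AllocResult r{0, false};
    for (std::size_t i = 0; i < liveValues; ++i) {
        if (i < regs.size()) {
            if (regs[i] == "Y") r.usesY = true;
        } else {
            ++r.spills;
        }
    }
    return r;
}

struct FinalAlloc {
    AllocResult result;
    bool yReserved;  // true when Y had to be kept free for stack access
};

// Pass 1: allocate with Y available. If nothing spills, no frame is needed
// and Y stays usable. Otherwise reserve Y and allocate again without it.
FinalAlloc allocateTwoPass(std::size_t liveValues) {
    const std::vector<std::string> withY = {"X", "Z", "Y"};
    const std::vector<std::string> withoutY = {"X", "Z"};

    AllocResult first = allocate(liveValues, withY);
    if (first.spills == 0) return {first, false};
    return {allocate(liveValues, withoutY), true};
}
```

In this model, three live values fit without a frame, so Y is used freely; a fourth forces a spill, and the second pass keeps Y reserved at the cost of one more spill. That extra spill is the price of taking Y away, which is why reserving it unconditionally is unattractive.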
From: John M. <ato...@gm...> - 2011-05-25 03:08:29
|
On Wed, May 18, 2011 at 2:59 PM, Borja Ferrer <bor...@gm...> wrote:

> PS: John, with clang do you get doubles or long long variables aligned to
> an 8 byte boundary? I've set their alignments to 1 byte in a constructor
> (can't remember the name now, but it's in the clang patch files in svn) but
> they get aligned too much, wasting memory. Any ideas there?

I've never checked the correctness of what the AVR target info produces. It's basically a patchwork of other targets like PIC16 and MSP430. There isn't much clang documentation for creating a port (at least compared to LLVM). I also think clang is x86-centric and has some hard-coded assumptions about the target. There are more source files that will need to be modified in clang, but I believe Target.cpp is the minimum that allows it to be compiled. I'll try to find some time to look into it. |
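To make the memory cost of the alignment question concrete, here is a small layout calculation. It is a sketch under assumptions: `Field`, `alignUp` and `structSize` are hypothetical helpers, the layout rule is the usual "pad each field to its alignment, then pad the struct to its strictest alignment", and the sizes (1-byte char, 8-byte long long) are assumed rather than taken from the AVR clang patches.

```cpp
#include <cstddef>
#include <vector>

// One struct member: its size and required alignment, both in bytes.
struct Field {
    std::size_t size;
    std::size_t align;
};

// Round `offset` up to the next multiple of `align` (align must be >= 1).
std::size_t alignUp(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Total size of a struct laid out field by field, with tail padding so the
// size is a multiple of the strictest member alignment.
std::size_t structSize(const std::vector<Field>& fields) {
    std::size_t offset = 0;
    std::size_t maxAlign = 1;
    for (const Field& f : fields) {
        offset = alignUp(offset, f.align) + f.size;
        if (f.align > maxAlign) maxAlign = f.align;
    }
    return alignUp(offset, maxAlign);
}
```

For a struct holding a char followed by a long long, `structSize({{1, 1}, {8, 8}})` gives 16 bytes with 8-byte alignment, while `structSize({{1, 1}, {8, 1}})` gives 9 bytes with 1-byte alignment. On an 8-bit target with a few kilobytes of RAM, those 7 bytes of padding per object are the waste being described.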
From: Borja F. <bor...@gm...> - 2011-05-25 21:55:57
|
Ok John, thanks for looking into it. I'm still waiting for a reply from the llvmdev list about reserving the Y reg only if accessing the stack; I bumped the issue again to see if somebody replies. Always reserving the Y reg would be very bad for memory operations.

2011/5/25 John Myers <ato...@gm...>

> On Wed, May 18, 2011 at 2:59 PM, Borja Ferrer <bor...@gm...> wrote:
>
>> PS: John, with clang do you get doubles or long long variables aligned to
>> an 8 byte boundary? I've set their alignments to 1 byte in a constructor
>> (can't remember the name now, but it's in the clang patch files in svn) but
>> they get aligned too much, wasting memory. Any ideas there?
>>
> I've never checked the correctness of what the AVR target info produces.
> It's basically a patchwork of other targets like PIC16 and MSP430. There
> isn't much clang documentation for creating a port (at least compared to
> LLVM). I also think clang is x86-centric and has some hard-coded assumptions
> about the target. There are more source files that will need to be modified in
> clang, but I believe Target.cpp is the minimum that allows it to be compiled.
> I'll try to find some time to look into it.
> |