From: Frieder F. <fri...@we...> - 2005-05-15 20:38:11
Maarten Brock wrote:
> Thanks for looking into it. I have some comments on your proposals as well.

>> b) a slight variation over your approach a)
>> "R0,R1,R2_indexed_lcall_over_1_generic_trampoline"
>>
>> Resources: 6 byte in calling, 0 byte in called routine,
>> + (trampoline in common bank), 3 bytes stack, 3 registers,
>> (no overhead when called from same bank)

> There is only no overhead when the compiler knows you're calling from the same bank.
> If the linker decides you're calling from the same bank, it can reduce to 6 bytes calling
> overhead.

Reading from your comment I assume you'd use the keyword banked without further argument (and thus leave it up to the linker where to put it). I'd rather give the compiler a stronger position by using banked(0), banked(1), ... respectively.

Traditionally the decision about which bank something is located in is made at the linking stage. This is probably suboptimal: the opportunity for some optimizations (e.g. using setb/clrb in proposal e)) would be lost.

>> c) another slight variation over your approach:
>> "R0,R1_indexed_lcall_over_n_generic_trampolines" (n-number of banks)
>> (for 4 banks you'd have __sdcc_banked_call_00, __sdcc_banked_call_01,
>> which would switch to the respective bank
>> ... in essence you'd use the trampoline address instead of R2
>> to encode the destination bank)
>>
>> Resources: 4 byte in calling, 2 byte in called routine,
>> + (n trampolines in common bank), 3 bytes stack, 2 registers
>> (no overhead when called from same bank)

> The linker must find the destination address and convert that into bank selecting call.

With banked(0), banked(1) (as above) the decision would be made by the compiler, so this wouldn't be a problem.
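
To illustrate the idea behind c), here is a rough host-side model in C, not 8051 code and not what the compiler would actually emit. It only shows how the choice of trampoline can carry the destination bank, so that just the function address (R0/R1 in the proposal) has to be passed. The names `psbank`, `select_bank`, `sdcc_banked_call_0n`, `banked_fn` and `bank_probe` are made up for this sketch, and the bank register is a plain variable rather than an SFR. The "2 byte in called routine" part of the proposal is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BANKS 4

/* Hypothetical stand-in for the bank-select register (an SFR or port bits on
   real hardware); a plain variable so the model runs on a host. */
static uint8_t psbank = 0;

static void select_bank(uint8_t bank)
{
    assert(bank < NUM_BANKS);
    psbank = bank;
}

/* Models the R0/R1 "index": the address of the banked function. */
typedef int (*banked_fn)(int);

/* One generic trampoline per destination bank. The bank is encoded by which
   trampoline is called, so no third register is needed to pass it. */
#define DEFINE_BANKED_CALL(n)                            \
    int sdcc_banked_call_0##n(banked_fn target, int arg) \
    {                                                    \
        uint8_t caller_bank = psbank;                    \
        select_bank(n);                                  \
        int result = target(arg);                        \
        select_bank(caller_bank);                        \
        return result;                                   \
    }

DEFINE_BANKED_CALL(0)
DEFINE_BANKED_CALL(1)
DEFINE_BANKED_CALL(2)
DEFINE_BANKED_CALL(3)

/* Example target: reports the bank that was active while it ran. */
static int bank_probe(int x)
{
    return psbank * 100 + x;
}
```

In this model a call such as `sdcc_banked_call_02(bank_probe, 5)` runs `bank_probe` with bank 2 selected and restores the caller's bank afterwards. The saved `caller_bank` plays the role of the third stack byte.
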
>> e) another variation over your approach, similar to d)
>> "lcall_over_mn_specific_trampolines"
>>
>> (this would need trampolines with __trampoline_function_call_from_00,
>> so if a banked function in bank 2 would be called from bank 0
>> and bank 1 you'd need two trampolines.
>> Trampolines could be of the form:
>> __trampoline_my_func_from_00:
>>     mov _PSBANK,#0x02 ; eventually changing a bit would do
>>     lcall _my_func
>>     mov _PSBANK,#0x00 ; eventually changing a bit would do
>>     ret
>>
>> Resources: 0 byte in calling, 0 byte in called routine,
>> + (<m*n trampolines in common bank), 2 bytes stack, 0 registers
>> (no overhead when called from same bank)

> All your solutions require more stack space and the stack is a limited resource, esp. for
> large programs.

Ack. It's all a compromise. Guessing about typical usage now: for most programs you'll be fine with fewer than 10 stacked inter-bank calls (costing at most 20 to 30 bytes of additional idata space in my variants). You'd have to migrate idata vars to pdata vars then. If we neglect the "subtle" differences between idata and pdata, this adds 30/(128+256) = about 8% memory pressure on combined idata/pdata. Seems fair to me. (But if the program needs more than 128 bytes of stack, my argument falls to pieces.)

>> I personally like d) and e) because if a program is hierarchically
>> organized you often don't need many interbank calls. Within the
>> same code bank the program then runs without overhead.
>>
>> As programs get larger (or cannot be split into functional units)
>> the generic trampolines a,b,c become increasingly more attractive.
>> I don't see us there yet:)

> If your program is hierarchical you can make the entry point for a calling tree banked
> and the rest you make either static functions or you use them in "trust me" mode
> (default non-banked). If "trust me" can be checked by the linker, we could also do 11 bit
> acall/ajmp instructions (near?).
> In "trust me" mode overhead is still low with few
> interbank calls, but the library probably needs to reside in the common area. Only the
> entry functions need return overhead and with a) there is very little trampoline function
> overhead.

The "trust me" mode has a similar effect to banked(0), banked(1) but seems more complicated.

> Also having many trampoline functions complicates things for users as they must
> change many routines to their hardware.

I had in mind that the user would supply 4 bank-switching macros for 4 banks and the compiler would then generate the trampolines automatically.

> Not all bankswitching is done inside a uC,

Ack, probably a large fraction are devices with a 128 kByte external EPROM which is switched on A15/A16 by port bits (typically P3_0, P3_1).

> mostly outside. And having only one trampoline also spares the common area. That's
> why I did not go for another variation:
>
> f) Like a) but one for every register bank so we can push ar0 and ar1 directly.

Ack, around 4# wouldn't be worth it.

> So my argument is to use as few as possible stack and trampolines.

Proposals a-c) also use registers. This resource currently comes at no cost, but I hope this will change some day. (Functions should pass their register usage upwards, so that the calling function does not push all the registers it cares about but only those which are used by the callee (tree).)

We can probably agree that we are on extreme ends: one side uses (more stack, more code, no registers, few cycles) and the other one (less stack, less code, more registers, more cycles).

If an application fits (by code and stack size) into d) or e) it will be superior to a-c). Very nice.
If an application doesn't fit (by code _OR_ stack size) into d) or e) it won't work at all. Too poor to rate.

If we don't agree on the weighing of these resources you should take the pragmatic approach and go ahead.
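
As a rough sketch of the "user supplies the bank-switching macros, compiler generates the trampolines" idea for e): the following is a host-side model in C, not 8051 output. The macro names `SELECT_BANK_n` and `TRAMPOLINE`, the helper `current_bank`, and the use of plain variables `p3_0`/`p3_1` in place of the real port bits are all assumptions made for illustration. Only `my_func` and the `trampoline_..._from_0n` naming pattern follow the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of two port bits driving A15/A16 of an external EPROM;
   plain variables here so the sketch runs on a host. */
static uint8_t p3_0, p3_1;

/* The four user-supplied bank-switching macros (hardware specific). */
#define SELECT_BANK_0() do { p3_0 = 0; p3_1 = 0; } while (0)
#define SELECT_BANK_1() do { p3_0 = 1; p3_1 = 0; } while (0)
#define SELECT_BANK_2() do { p3_0 = 0; p3_1 = 1; } while (0)
#define SELECT_BANK_3() do { p3_0 = 1; p3_1 = 1; } while (0)

static unsigned current_bank(void)
{
    return ((unsigned)p3_1 << 1) | p3_0;
}

/* What a compiler could generate: one trampoline per
   (function, calling bank) pair. Both banks are compile-time constants,
   so nothing is saved on the stack besides the return address. */
#define TRAMPOLINE(func, to, from)              \
    int trampoline_##func##_from_0##from(int a) \
    {                                           \
        SELECT_BANK_##to();                     \
        int r = func(a);                        \
        SELECT_BANK_##from();                   \
        return r;                               \
    }

/* A function that is assumed to live in bank 2. */
static int my_func(int a)
{
    assert(current_bank() == 2);
    return a + 1;
}

/* my_func is called from bank 0 and from bank 1: two trampolines. */
TRAMPOLINE(my_func, 2, 0)
TRAMPOLINE(my_func, 2, 1)
```

Code in bank 1 would then call `trampoline_my_func_from_01(x)` instead of `my_func(x)`. Porting to different hardware would only mean changing the four `SELECT_BANK_n` macros.
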

One thing I'd like to have, though, is that the compiler should know about the bank it is switching to (by pragma or with a banked(n) keyword).

Other arguments/opinions (in the order named?) anyone?

Greetings,
Frieder