From: Ivo R. <iv...@iv...> - 2017-09-14 07:56:57
2017-09-12 21:12 GMT+02:00 Peter Bergner <be...@vn...>:
> On 9/12/17 12:51 PM, Ivo Raisr wrote:
>> Are there any comments, suggestions, objections to the patch attached to bug:
>> https://bugs.kde.org/show_bug.cgi?id=384584
>> Callee saved registers listed first for AMD64, X86, and PPC architectures
>
> My guess on why the caller saved (aka volatile) regs are listed before
> the callee saved (aka non-volatile) registers, is that is the order most
> register allocators in compilers (eg, gcc, etc.) try and assign them.
> They attempt to use caller saved regs for the majority of pseudos/vregs
> that are not live across a function call, since those regs do not need
> to be saved/restored in the prologue/epilogue (ie, they're cheap to use)
> and it leaves the callee saved regs available for pseudos/vregs that
> are live across calls, which means you don't have to spill them around
> calls.

Thank you for your response. This is something our current VEX register
allocators do not do at present. Would you like to extend v3?

> Looking through host_generic_reg_alloc3.c, it doesn't seem like the VEX
> register allocator keeps track of vregs that are live across calls...

Yes, that's true. It only keeps track of each vreg's start-end live range.
It then allocates registers on a first-come, first-served basis within the
constraints imposed by each instruction's register usage, trying the
callee-saved registers first. So it can happen that a callee-saved register
is allocated to a short-lived vreg which does not span a call boundary,
and several instructions later, under register pressure, a long-lived vreg
is allocated to a caller-saved register, leading to a necessary spill
before the call.

> Is that for simplicity reasons or it just didn't seem like it was needed?

Simplicity and performance are probably the main drivers here. VEX does
not have any runtime profile-feedback mechanism which could tell which
blocks are hot and which are cold, so all blocks get the same treatment.
It would need to be carefully measured first whether the added complexity
of tracking vreg live ranges against call spans would benefit overall
performance. Would you like to try that? Perhaps the current algorithm can
be easily extended?

I.