From: Julian S. <js...@ac...> - 2011-10-25 08:07:04
> > and significantly
> > poorer code generation due to non-availability of MOVW and MOVT
> > for 32-bit constant generation.
>
> pc-relative LDR of the constant is just as fast and just as small
> as MOVW+MOVT. For the specific case of a call to an internal helper:
>     adr lr,L101      // add lr,pc,#4     return address
>     ldr pc,L100      // ldr pc,[pc,#-4]  goto the helper
> L100: .word helper   // may be re-used in same translation block
>     // insert other constants here!
> L101:

I don't remember the details, but when I measured it, pulling constants
out of memory gave significantly worse performance than using MOVW+MOVT
when running on a Cortex-A8. From a microarchitectural perspective that
doesn't surprise me:

* it causes Dcache pollution, by having to have the constants in Dcache
  in a situation where we already have a high cache miss rate

* it means the code is subject to at least one load-use stall, even in
  the case where the constant is in D1

* there's a lot less latitude for the hardware to schedule the load
  earlier. Moving the MOVW+MOVT pair earlier is easier, since they
  aren't data dependent on anything. (related to the previous point).
  To be fair, this is mostly of significance to A9, since A8 isn't
  dynamically scheduled.

J
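For comparison with the literal-pool sequence quoted above, here is a minimal sketch in C of how a code generator might materialise a 32-bit constant as a MOVW+MOVT pair. It assumes the ARM-mode (A32) A1 encodings with condition AL; the function names (encode_movw, encode_movt, emit_imm32) are hypothetical and are not taken from any existing code base.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build an ARM-mode (A32) 16-bit-immediate move.
 * Both MOVW and MOVT (encoding A1, condition AL) split the immediate
 * into imm4 (bits 19:16) and imm12 (bits 11:0), with Rd in bits 15:12. */
static uint32_t encode_mov16(uint32_t base, unsigned rd, uint16_t imm16)
{
    assert(rd < 16);                        /* r0..r15 only */
    uint32_t imm4  = (imm16 >> 12) & 0xFu;  /* top 4 bits of the immediate */
    uint32_t imm12 = imm16 & 0xFFFu;        /* low 12 bits of the immediate */
    return base | (imm4 << 16) | ((uint32_t)rd << 12) | imm12;
}

/* MOVW Rd, #imm16: writes the low halfword and zeroes the high halfword. */
uint32_t encode_movw(unsigned rd, uint16_t imm16)
{
    return encode_mov16(0xE3000000u, rd, imm16);
}

/* MOVT Rd, #imm16: writes the high halfword and keeps the low halfword. */
uint32_t encode_movt(unsigned rd, uint16_t imm16)
{
    return encode_mov16(0xE3400000u, rd, imm16);
}

/* Emit the two-instruction sequence that loads `value` into Rd.
 * Neither instruction reads memory, so nothing goes through the Dcache. */
void emit_imm32(uint32_t out[2], unsigned rd, uint32_t value)
{
    out[0] = encode_movw(rd, (uint16_t)(value & 0xFFFFu));
    out[1] = encode_movt(rd, (uint16_t)(value >> 16));
}
```

Calling emit_imm32(out, 0, 0x12345678) should yield the words 0xE3050678 (movw r0, #0x5678) and 0xE3410234 (movt r0, #0x1234). Both sequences cost two words per constant, but this one keeps the constant in the instruction stream rather than the data stream.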