From: Julian S. <js...@ac...> - 2007-06-07 06:32:36
> Using a "ret" only as subroutine return is not only a stylistic
> issue: on modern x86 processors, using a return stack for branch prediction
> is quite common. Ie. it gives bad performance when you use a "ret" as
> a computed goto, as you confuse the branch predictor.

That's true, and it's also true on ppc. If a branch-to-LR appears, the branch predictor needs to decide whether to predict from the return stack or using the normal mechanism. Given that it has no obvious way to decide, the ppc optimisation guides suggest that in practice it is best to use branch-to-LR only for procedure returns, and to use branch-to-CTR for all other computed gotos, and so the cpu has that convention embedded in it. If you see what I mean. (LR and CTR are different registers.)

Indeed, coregrind/m_dispatch in V on ppc used to use branch-to-LR to jump to translations. When I changed that to branch-to-CTR I got a measurable performance increase on POWER4, presumably due to not trashing the branch predictors so much.

J
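To make the effect concrete, here is a toy model of a return-address stack, written in Python purely for illustration. It is a sketch under simplifying assumptions, not a description of how any real x86 or POWER4 predictor works: the class `ReturnStackPredictor`, the function `run`, the stack depth, and all addresses are hypothetical. Calls push a return address, and return-style branches pop one and use it as the prediction. A dispatcher that jumps to its target with a return-style branch pops an entry that has nothing to do with the target, so that branch mispredicts, and the real return that follows finds the stack out of sync.

```python
class ReturnStackPredictor:
    """Toy return-address stack: calls push, return-style branches pop."""

    def __init__(self, depth=16):
        self.depth = depth          # hypothetical capacity
        self.stack = []
        self.mispredicts = 0

    def call(self, return_addr):
        if len(self.stack) == self.depth:
            self.stack.pop(0)       # oldest entry falls off a full stack
        self.stack.append(return_addr)

    def ret(self, actual_target):
        # Predict the popped address; an empty stack cannot predict.
        predicted = self.stack.pop() if self.stack else None
        if predicted != actual_target:
            self.mispredicts += 1


def run(dispatch_with_ret, iterations=1000):
    """Count return-stack mispredictions for a simple dispatch loop."""
    p = ReturnStackPredictor()
    for i in range(iterations):
        p.call(0x1000)              # caller enters the dispatcher
        if dispatch_with_ret:
            # Dispatcher jumps to a translation with a return-style
            # branch (like branch-to-LR): the stack gets consulted.
            p.ret(0x2000 + (i % 4) * 0x100)
        # Otherwise the jump uses a separate indirect branch (like
        # branch-to-CTR) that this model does not route via the stack.
        p.call(0x3000)              # translation calls a helper
        p.ret(0x3000)               # helper returns
        p.ret(0x1000)               # dispatcher returns to its caller
    return p.mispredicts


print(run(dispatch_with_ret=False))
print(run(dispatch_with_ret=True))
```

In this model the first variant never mispredicts, while the second mispredicts twice per iteration: once on the dispatch jump itself and once on the final return, whose entry was already consumed. The model ignores the accuracy of the ordinary indirect-branch predictor, so it says nothing about the net cost on real hardware.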