From: John R.
>>Note that x86 has the same generic difficulty.  "jmp *%ecx"
>>could be a switch or a return, and a 'ret' _also_ could be a switch
>>or a return.  Both are "merely" a computed GOTO.  Stylistically
>>nearly every 'ret' is a subroutine return, and nearly every "jmp *reg"
>>is not; but there are legitimate exceptions both ways.
>
> In practice, callgrind currently is working really well with x86
> (and x86-64) code.
>
> Using a "ret" only as subroutine return is not only a stylistic
> issue: on modern x86 processors, using a return stack for branch prediction
> is quite common. I.e., it gives bad performance when you use a "ret" as
> a computed goto, as you confuse the branch predictor.

The branch predictor gets confused only because Intel refused to allow
the programmer to inform the branch predictor of the program's intent.
The opcode FF /7 is available to say "PUSH a return address", and the
opcodes 8F /1 through 8F /7 are available to say "POP a return address".
These are right next to the existing opcodes FF /6 (PUSH r/m) and
8F /0 (POP r/m), so there would be almost no cost in the hardware decoding.
[In the typical case where r/m is in fact a register, the programmer
pays 1 byte for using each new opcode, in contrast to a single-byte
PUSH/POP register instruction (50/58 series opcodes).]

Sending the PUSH/POP signal from the decoder to the on-chip return address
stack also would use existing wires.  For want of two gates and two
multiplexors, measurable performance is lost.
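
To make the argument concrete, here is a toy model of a return address
stack (RAS), written as a rough sketch rather than a description of any
real processor.  Everything in it is an assumption for illustration: the
event names, the 16-entry depth, and in particular the "push_ret" and
"pop_ret" events, which stand in for the hypothetical hinted opcodes
(FF /7 and 8F /1) that no actual x86 chip implements.

```python
def simulate(trace, depth=16):
    """Count mispredicted 'ret' events in a trace of (op, addr) pairs.

    Toy model only: 'call' pushes its return address onto the RAS and
    'ret' pops a prediction.  Plain 'push', 'pop' and 'jmp' never touch
    the RAS.  'push_ret' and 'pop_ret' are HYPOTHETICAL hinted forms
    that do update it.
    """
    ras = []      # modeled on-chip return address stack
    misses = 0
    for op, addr in trace:
        if op in ("call", "push_ret"):
            ras.append(addr)
            if len(ras) > depth:
                ras.pop(0)          # oldest entry falls off the bottom
        elif op == "pop_ret":
            if ras:
                ras.pop()           # discard the prediction, stay in sync
        elif op == "ret":
            predicted = ras.pop() if ras else None
            if predicted != addr:   # addr is where the ret actually goes
                misses += 1
    return misses


# 'ret' used as a computed goto: push a target, then ret to it.
goto_plain = [("call", 0x100), ("push", 0x500), ("ret", 0x500), ("ret", 0x100)]
goto_hinted = [("call", 0x100), ("push_ret", 0x500), ("ret", 0x500), ("ret", 0x100)]

# A return done as "pop reg; jmp *reg" instead of 'ret'.
jmp_plain = [("call", 0x100), ("call", 0x200), ("pop", 0x200),
             ("jmp", 0x200), ("ret", 0x100)]
jmp_hinted = [("call", 0x100), ("call", 0x200), ("pop_ret", 0x200),
              ("jmp", 0x200), ("ret", 0x100)]

results = {
    "goto_plain": simulate(goto_plain),
    "goto_hinted": simulate(goto_hinted),
    "jmp_plain": simulate(jmp_plain),
    "jmp_hinted": simulate(jmp_hinted),
}
```

In this model the unhinted traces mispredict not only the unusual
transfer itself but also a later, perfectly ordinary 'ret', because the
RAS has fallen out of step with the real stack.  The hinted traces
mispredict nothing.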