From: John R. <jr...@bi...> - 2011-10-24 03:15:00
> Even smaller: put the addresses of the top 75 helpers into a vector
> at the end of each guest state block:
>       adr lr,L101           // mov lr,pc return address
>       ldr pc,[r8,#k+4*j]    // k=sizeof(old guest block), j=helper#
> L101:
>

memcheck has only 35 helper subroutines. 12 of them are big-endian vs
little-endian specializations, so 6 of these are unused in any given run.
10 of them are --track-origins=yes/no specializations, so 5 of these are
totally unused when --track-origins=no, and the other 5 are mostly unused
when --track-origins=yes. The table can have 24==(35 - 6 - 5) or
29==(35 - 6) slots.

Put the slots at negative offsets from the register which points to the
guest state (r8 in the case of ARM). On x86_64, 16 slots will be
addressable via a one-byte displacement, which saves 8 bytes and a register
per call to an internal helper:

        callq *-8*slot(%gstate)   # 3 bytes for 16 slots; else 6 bytes
vs
        movq $8_bytes,%reg        # 9 bytes
        callq *%reg               # 2 bytes

This difference in size has a measurable impact on the Icache.
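
The layout described above can be sketched in C. This is only an illustration under stated assumptions, not Valgrind code: the names `guest_block`, `gstate_ptr`, `slot_disp`, `install_helper`, `call_helper`, and `demo_helper` are hypothetical, the 64-byte guest state is a stand-in, and the table is sized for the 29-slot case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N_SLOTS = 29 };  /* 35 helpers minus the 6 wrong-endian ones */

typedef uint64_t (*helper_fn)(uint64_t);

/* Hypothetical layout: the helper table sits immediately below the guest state. */
struct guest_block {
    helper_fn slots[N_SLOTS];   /* slot j is read at gstate - j*sizeof(helper_fn) */
    unsigned char gstate[64];   /* stand-in for the real guest state */
};

/* Pointer that would live in the guest-state register (r8 on ARM). */
static unsigned char *gstate_ptr(struct guest_block *b) {
    return (unsigned char *)b + offsetof(struct guest_block, gstate);
}

/* Byte displacement of slot j (1-based) relative to the guest state pointer. */
static ptrdiff_t slot_disp(int j) {
    return -(ptrdiff_t)sizeof(helper_fn) * j;
}

/* Number of slots reachable with a signed 8-bit displacement (-128..-1). */
static int slots_in_disp8(void) {
    return 128 / (int)sizeof(helper_fn);
}

static void install_helper(unsigned char *gstate, int j, helper_fn f) {
    assert(j >= 1 && j <= N_SLOTS);
    *(helper_fn *)(gstate + slot_disp(j)) = f;
}

/* C analogue of the indirect call through a negative offset from gstate. */
static uint64_t call_helper(unsigned char *gstate, int j, uint64_t arg) {
    assert(j >= 1 && j <= N_SLOTS);
    helper_fn f = *(helper_fn *)(gstate + slot_disp(j));
    return f(arg);
}

/* Trivial helper used only to exercise the table. */
static uint64_t demo_helper(uint64_t x) { return x + 1; }
```

With 8-byte function pointers, `slots_in_disp8()` gives the 16 slots mentioned above; slots beyond that need the longer four-byte displacement form of the call.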