Having epilogue yield points allows the adaptive system to attribute a sample (gathered at a thread switch or taken yield point) to a method that ends with a lot of straight-line code. Without epilogue yield points, we take samples on entry to a method (attributing them to the calling method) and on loop back edges (attributing them to the current method). If a method ends with a long sequence of non-loop statements, we are unable to attribute any time to that part of the method, and for a method with no loops at all, no time is attributed to it.
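To make the attribution rule concrete, here is a minimal Java sketch. It is not Jikes RVM code: the names YieldPointKind, SampleAttribution, record and samplesFor are all made up for illustration, and it assumes an epilogue sample is credited to the current method, which is how I read the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not the Jikes RVM API.
enum YieldPointKind { PROLOGUE, BACKEDGE, EPILOGUE }

final class SampleAttribution {
    // Sample counts keyed by method name.
    private final Map<String, Integer> samples = new HashMap<>();

    // Credit one sample taken at a yield point of the given kind.
    void record(YieldPointKind kind, String currentMethod, String callerMethod) {
        // Entry (prologue) samples go to the caller; back-edge and
        // epilogue samples go to the method that is executing.
        // (Assumption: epilogue samples are credited to the current method.)
        String target = (kind == YieldPointKind.PROLOGUE) ? callerMethod : currentMethod;
        samples.merge(target, 1, Integer::sum);
    }

    // Number of samples credited to the given method so far.
    int samplesFor(String method) {
        return samples.getOrDefault(method, 0);
    }
}
```

With only PROLOGUE and BACKEDGE samples, a loop-free method is never the target of record(); the EPILOGUE case is the only one that can credit it.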

I did actually see this happen in mpegaudio. There was a method that accounted for about 1% of the total execution time but was showing up as only 0.01%, because it had no loops and contained about 300 HIR instructions. As a result, the adaptive system did not detect that it was hot enough until several iterations into the benchmark (something like the 7th or 8th). (At the time we had a hack to give this method some of the samples: basically, we attributed an entry yield point to both the caller and the current method.)

After adding epilogue yield points, the method was detected as hot sooner (around the 4th or 5th iteration?), which brought something like a 3% performance improvement sooner.

Caveat: I'm quoting all these numbers from memory. It was over a year ago.


- - - - - - - - - - - - - - - -
Michael Hind, Manager, Dynamic Optimization Group, Jalapeño Project
IBM Watson Research Center
hind@watson.ibm.com, 914 784-7589, tie: 863-7589
Jikes RVM open source release:  http://www.ibm.com/developerworks/oss/jikesrvm