I did something like this for our 2002 PLDI paper on load-value prediction (http://www.inf.usi.ch/faculty/hauswirth/publications.html). However, that was for a much older version of the Jikes RVM (or Jalapeño), targeting PowerPC processors.
I modified both the baseline and the optimizing compilers to add instrumentation (in the opt compiler, after register allocation). I think the instrumentation wrote into a buffer at a fixed address and flushed the buffer on overflow (detected via a page fault), compressing the address trace on the fly and writing it to a file. Offline, I classified memory accesses into stack, heap, and statics (based on information about the memory layout) and ran them through various cache and value-predictor simulators.
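If it helps to see the pipeline end to end, here is a small offline model of it in Python. This is only a sketch under several assumptions, not the code we actually used. The real instrumentation was emitted by the JIT compilers and detected overflow with a page fault; here an explicit length check stands in for that trap. `zlib` is a stand-in for whatever compression you prefer. The memory layout (`LAYOUT`) is made up, and the direct-mapped cache is just the simplest simulator you could plug in. All names (`TraceBuffer`, `read_trace`, `classify`, `simulate_direct_mapped`) are hypothetical.

```python
import struct
import zlib
from collections import Counter

class TraceBuffer:
    """Hypothetical trace buffer: collects addresses, flushes compressed chunks when full."""
    def __init__(self, capacity, sink):
        self.capacity = capacity
        self.sink = sink          # list of compressed chunks (stands in for the trace file)
        self.entries = []

    def record(self, addr):
        self.entries.append(addr)
        # Explicit check standing in for the page-fault overflow trap.
        if len(self.entries) == self.capacity:
            self.flush()

    def flush(self):
        if not self.entries:
            return
        raw = struct.pack(f"<{len(self.entries)}Q", *self.entries)
        self.sink.append(zlib.compress(raw))   # compress each chunk as it is flushed
        self.entries = []

def read_trace(chunks):
    """Decompress the chunks and yield the 64-bit addresses in order."""
    for chunk in chunks:
        for (addr,) in struct.iter_unpack("<Q", zlib.decompress(chunk)):
            yield addr

# Hypothetical memory layout: (region, start, end) as half-open ranges.
LAYOUT = [
    ("statics", 0x1000_0000, 0x2000_0000),
    ("heap",    0x2000_0000, 0x7000_0000),
    ("stack",   0x7000_0000, 0x8000_0000),
]

def classify(addr, layout=LAYOUT):
    """Map an address to stack/heap/statics using the layout table."""
    for name, lo, hi in layout:
        if lo <= addr < hi:
            return name
    return "other"

def simulate_direct_mapped(addrs, num_lines=64, line_size=32):
    """Direct-mapped cache: returns (hits, misses)."""
    tags = [None] * num_lines
    hits = misses = 0
    for addr in addrs:
        block = addr // line_size
        index = block % num_lines
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag
    return hits, misses

if __name__ == "__main__":
    chunks = []
    buf = TraceBuffer(capacity=4, sink=chunks)
    for a in [0x1000_0000, 0x1000_0004, 0x2000_0100, 0x7FFF_FF00, 0x2000_0104, 0x1000_0000]:
        buf.record(a)
    buf.flush()  # drain the partially filled buffer at exit
    trace = list(read_trace(chunks))
    print(Counter(classify(a) for a in trace))
    print(simulate_direct_mapped(trace))
```

Keeping collection and analysis separate like this means the trace only has to be captured once. You can then replay it through as many classifiers and simulators (cache, value predictor, ...) as you like.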