|
From: Ben L. <li...@cs...> - 2006-12-15 01:54:51
|
I am creating a custom Valgrind tool that adds instrumentation code to Ist_Exit instructions. However, the extra instrumentation is only needed the first time each Ist_Exit instruction executes. After that, I'd like to discard the instrumentation and leave the Ist_Exit instruction exactly as it was in the uninstrumented basic block.

When my instrumentor function sees an Ist_Exit instruction, it injects a dirty call (via unsafeIRDirty_0_N). The call provides one argument: the current code address as revealed by the closest preceding Ist_IMark instruction. Call this "codeAddr".

Inside the called dirty function, I use VALGRIND_DISCARD_TRANSLATIONS(codeAddr, codeAddr + 1) to discard Valgrind's translation of the instrumented Ist_Exit instruction. Some bookkeeping on the side will let me notice that the *next* time the same instruction is considered for instrumentation, I should just pass the instruction along unchanged.

Is it OK to use VALGRIND_DISCARD_TRANSLATIONS from within a dirty-called function? Is it OK if the instrumented translation being discarded includes the code which made the dirty call in the first place? (That is, can instrumentation be self-removing in this manner?)

Empirically, Valgrind doesn't explode or set my cat on fire when I make this call. But it appears that the translation is not actually being discarded either. The number of basic blocks my code is asked to instrument is identical with or without the VALGRIND_DISCARD_TRANSLATIONS call. That suggests that the translations are not actually being discarded, because if they were, they would need to be reinstrumented later when the code loops back to the same basic block for a second time.

Any tips on how to make this work would be much appreciated.

--
Ben
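A note on the call above: in valgrind.h the second argument of VALGRIND_DISCARD_TRANSLATIONS is a length, not an end address; the request covers code in [addr, addr + len - 1]. The one-byte form is therefore VALGRIND_DISCARD_TRANSLATIONS(codeAddr, 1), while passing codeAddr + 1 as the length asks for a very large range. The plain-C sketch below is only a simplified model of that (address, length) convention, not Valgrind's translation-table code: the Translation type, ranges_overlap() and discard_range() are hypothetical names, and "discarding" just clears a flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model (not Valgrind's code): each cached translation
   covers guest bytes [start, start + len). */
typedef struct {
    uintptr_t start;
    size_t    len;
    bool      valid;
} Translation;

/* True if [a, a + alen) and [b, b + blen) share at least one byte.
   Written with subtractions so that large lengths cannot overflow. */
static bool ranges_overlap(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
    if (alen == 0 || blen == 0)
        return false;
    return (a >= b) ? (a - b < blen) : (b - a < alen);
}

/* Invalidate every valid translation that intersects [addr, addr + len),
   mirroring the (address, length) arguments of the discard request.
   Returns how many translations were invalidated. */
static size_t discard_range(Translation *tt, size_t n, uintptr_t addr, size_t len)
{
    size_t discarded = 0;
    for (size_t i = 0; i < n; i++) {
        if (tt[i].valid && ranges_overlap(tt[i].start, tt[i].len, addr, len)) {
            tt[i].valid = false;
            discarded++;
        }
    }
    return discarded;
}
```

For example, with translations starting at 0x8048000 (0x20 bytes), 0x8048020 (0x18 bytes) and 0x9000000 (0x10 bytes), discard_range(tt, 3, 0x8048010, 1) invalidates only the first, whereas discard_range(tt, 3, 0x8048010, 0x8048011), the analogue of passing codeAddr + 1 as the length, invalidates all three.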
|
From: Nicholas N. <nj...@cs...> - 2006-12-15 02:47:10
|
On Thu, 14 Dec 2006, Ben Liblit wrote:

> I am creating a custom Valgrind tool that adds instrumentation code to
> Ist_Exit instructions. However, the extra instrumentation is only
> needed the first time each Ist_Exit instruction executes. After that,
> I'd like to discard the instrumentation and leave the Ist_Exit
> instruction exactly as it was in the uninstrumented basic block.
>
> When my instrumentor function sees an Ist_Exit instruction, it injects a
> dirty call (via unsafeIRDirty_0_N). The call provides one argument: the
> current code address as revealed by the closest preceding Ist_IMark
> instruction. Call this "codeAddr".
>
> Inside the called dirty function, I use
> VALGRIND_DISCARD_TRANSLATIONS(codeAddr, codeAddr + 1) to discard
> Valgrind's translation of the instrumented Ist_Exit instruction. Some
> bookkeeping on the side will let me notice that the *next* time the same
> instruction is considered for instrumentation, I should just pass the
> instruction along unchanged.
>
> Is it OK to use VALGRIND_DISCARD_TRANSLATIONS from within a dirty-called
> function? Is it OK if the instrumented translation being discarded
> includes the code which made the dirty call in the first place? (That
> is, can instrumentation be self-removing in this manner?)
>
> Empirically, Valgrind doesn't explode or set my cat on fire when I make
> this call. But it appears that the translation is not actually being
> discarded either. The number of basic blocks my code is asked to
> instrument is identical with or without the
> VALGRIND_DISCARD_TRANSLATIONS call. That suggests that the translations
> are not actually being discarded, because if they were, they would need
> to be reinstrumented later when the code loops back to the same basic
> block for a second time.
>
> Any tips on how to make this work would be much appreciated.

I'm surprised what you're trying doesn't make Valgrind explode. Perhaps the code in the cache is marked as removed but not immediately overwritten. I have no idea why the code would not be genuinely discarded.

First question: do you really need this optimisation? Does it make a big difference? I'd be interested to know what the tool does.

If so, I'd probably use the support for conditional dirty calls -- create a boolean in memory for each Ist_Exit, set it initially, clear it once it's reached, and make the call conditional on it. It's a little slower at run-time, but it avoids repeatedly JITting the block.

Nick
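The run-time behaviour of Nick's conditional-call suggestion can be modelled in plain C as below. This is a sketch under stated assumptions, not tool code: in a real tool the flag test would presumably be the guard expression of the dirty call, built in Vex IR (see VEX/pub/libvex_ir.h), and exit_pending, register_exit(), first_exit_helper() and at_exit_site() are hypothetical names. Each exit's flag is set when the exit is instrumented and cleared by the helper, so the helper runs once per exit however often the block is re-executed, and no translation ever has to be discarded.

```c
#include <stddef.h>

/* Hypothetical upper bound on the number of Ist_Exit sites tracked.
   Callers must pass exit_id < MAX_EXITS. */
#define MAX_EXITS 1024

/* One flag per exit: set at instrumentation time, cleared by the helper
   the first time the exit is reached. Static storage starts zeroed. */
static unsigned char exit_pending[MAX_EXITS];
static unsigned long helper_calls;

/* Stand-in for the dirty helper: do the first-time work, then clear
   the flag so the guard is false on every later execution. */
static void first_exit_helper(size_t exit_id)
{
    helper_calls++;
    /* ... the tool's real first-time work would go here ... */
    exit_pending[exit_id] = 0;
}

/* Instrumentation time: called once for each newly seen Ist_Exit. */
static void register_exit(size_t exit_id)
{
    exit_pending[exit_id] = 1;
}

/* Run time: what the instrumented code does each time the exit is
   reached, i.e. a call guarded by the per-exit flag. */
static void at_exit_site(size_t exit_id)
{
    if (exit_pending[exit_id])
        first_exit_helper(exit_id);
}
```

The trade-off Nick mentions shows up here as the flag load and test, which still happens on every execution after the first; in exchange, the block is translated only once.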
|
From: Ben L. <li...@cs...> - 2006-12-15 03:03:51
|
Nick Nethercote wrote:
> I'm surprised what you're trying doesn't make Valgrind explode.
> [...] I have no idea why the code would not be genuinely discarded.

Heh. :-D I'm a bit surprised too. My best guess is that the VALGRIND_DISCARD_TRANSLATIONS call is behaving as a no-op, just like any other client call would when executed in non-Valgrind-translated code. I'm invoking VALGRIND_DISCARD_TRANSLATIONS from an external function, and presumably Valgrind leaves these alone.

> First question: do you really need this optimisation? Does it make a
> big difference? I'd be interested to know what the tool does.

I do not know how big a difference it makes, since it's not actually working yet. I was hoping to measure that experimentally.

Actually, perhaps I can do a better job of estimating the maximum possible benefit of the optimization. It will never be faster than running the code with no instrumentation inserted at all, ever. So I can profile that, profile the instrumentation without self-removal, and see if there's a big difference. If the difference is insignificant, it's not worth pursuing this idea any further.

> If so, I'd probably use the support for conditional dirty calls

Ah, I hadn't noticed that before. OK, I may give this a try. It's likely that this will be slower than doing things unconditionally, though. The actual instrumentation work is pretty fast, and can even be "inlined" using just a handful of UCode instructions instead of a dirty call if I'm not trying to do crazy things like call VALGRIND_DISCARD_TRANSLATIONS.

Thanks for the feedback and ideas, Nick!

--
Ben
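Ben's bound can be written down directly. Self-removing instrumentation can at best bring the run time down to that of the uninstrumented run, so the fraction of run time it could save is at most (t_always - t_none) / t_always, where t_none is measured with no instrumentation at all and t_always with the instrumentation left in place permanently. A minimal C helper, with max_fraction_saved() as a hypothetical name:

```c
/* Upper bound on the fraction of run time that self-removing
   instrumentation could save.
   t_none:   run time with no instrumentation inserted at all.
   t_always: run time with the instrumentation never removed.
   Self-removal can at best reach t_none, so the saving is at most
   t_always - t_none, expressed here as a fraction of t_always. */
static double max_fraction_saved(double t_none, double t_always)
{
    if (t_always <= 0.0 || t_always <= t_none)
        return 0.0; /* no measurable overhead: nothing to win */
    return (t_always - t_none) / t_always;
}
```

For example, illustrative (not measured) timings of 10.0 s without instrumentation and 12.5 s with it would cap the possible saving at 20%. The real saving would be lower, since retranslating and discarding have costs of their own.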
|
From: Nicholas N. <nj...@cs...> - 2006-12-15 03:35:06
|
On Thu, 14 Dec 2006, Ben Liblit wrote:
> Heh. :-D I'm a bit surprised too. My best guess is that the
> VALGRIND_DISCARD_TRANSLATIONS call is behaving as a no-op just like any
> other client call would when executed in non-Valgrind-translated code.
> I'm invoking VALGRIND_DISCARD_TRANSLATIONS from an external function,
> and presumably Valgrind leaves these alone.

Ah, yes, that's what'll happen. If you run Valgrind under Valgrind things will be different :)

>> If so, I'd probably use the support for conditional dirty calls
>
> Ah, I hadn't noticed that before. OK, I may give this a try. It's
> likely that this will be slower than doing things unconditionally,
> though. The actual instrumentation work is pretty fast, and can even be
> "inlined" using just a handful of UCode instructions instead of a dirty
> call if I'm not trying to do crazy things like call
> VALGRIND_DISCARD_TRANSLATIONS.

Which version of Valgrind are you using? The UCode representation hasn't been used since version 3.0.0, when we switched to the "Vex" representation. This was about two years ago. Vex supports conditional calls, but I don't think UCode did.

Nick
|
From: Julian S. <js...@ac...> - 2006-12-15 04:40:19
|
On Friday 15 December 2006 03:34, Nicholas Nethercote wrote:
> On Thu, 14 Dec 2006, Ben Liblit wrote:
> > Heh. :-D I'm a bit surprised too. My best guess is that the
> > VALGRIND_DISCARD_TRANSLATIONS call is behaving as a no-op just like any
> > other client call would when executed in non-Valgrind-translated code.
> > I'm invoking VALGRIND_DISCARD_TRANSLATIONS from an external function,
> > and presumably Valgrind leaves these alone.

Yes, I'd agree with that analysis. Apart from this problem, your scheme should work.

However, making translations is expensive. From crude measurements we see that making a translation costs in the order of 500k host instructions. And VALGRIND_DISCARD_TRANSLATIONS isn't free either, since it involves finding all translations that intersect a given address range and invalidating the dispatcher's fast-case cache. So I'd suggest it's likely to be a net performance loss most of the time anyway. It's probably simpler and easier to reorganise your helper functions so that it doesn't matter how many times they are called.

> >> If so, I'd probably use the support for conditional dirty calls
> >
> > Ah, I hadn't noticed that before. OK, I may give this a try.

You could use conditional dirty calls, but then you have to store the information about whether a call should happen or not somewhere.

J
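Julian's cost argument can be turned into a rough break-even estimate. The instrumented translation is made either way, so self-removal adds one discard request plus one extra translation of roughly 500k host instructions (the crude figure quoted above), and it only pays off if the instructions saved on later executions exceed that. The sketch assumes a fixed saving per execution; per_exec_saving and discard_cost are hypothetical inputs that the thread does not quantify, and break_even_executions() is a hypothetical name.

```c
#include <math.h>

/* Crude per-translation cost quoted in the thread, in host instructions. */
#define TRANSLATION_COST 500000.0

/* Number of further executions of the block needed before removing the
   instrumentation repays one retranslation plus the discard request.
   per_exec_saving: host instructions saved per execution once the
                    instrumentation is gone (assumed constant).
   discard_cost:    assumed cost of the discard request itself. */
static double break_even_executions(double per_exec_saving, double discard_cost)
{
    if (per_exec_saving <= 0.0)
        return INFINITY; /* nothing saved per execution: never pays off */
    return (TRANSLATION_COST + discard_cost) / per_exec_saving;
}
```

With a hypothetical saving of 50 host instructions per execution and the discard cost ignored, the block would have to run another 10,000 times just to break even, which illustrates why Julian expects a net loss for most blocks.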
|
From: Ben L. <li...@cs...> - 2006-12-15 04:31:08
|
Nick Nethercote wrote:
> The UCode representation hasn't been used since version 3.0.0, when
> we switched to the "Vex" representation.

Oops, I meant Vex. The Valgrind technical documentation currently online still prominently refers to UCode, so I assumed that's what it was still called. I've got valgrind-3.2.1, so that puts me in the Vex world with (apparently) outdated docs.
|
From: Nicholas N. <nj...@cs...> - 2006-12-15 04:40:10
|
On Thu, 14 Dec 2006, Ben Liblit wrote:
>> The UCode representation hasn't been used since version 3.0.0, when
>> we switched to the "Vex" representation.
>
> Oops, I meant Vex. The Valgrind technical documentation currently
> online still prominently refers to UCode, so I assumed that's what it
> was still called. I've got valgrind-3.2.1, so that puts me in the Vex
> world with (apparently) outdated docs.

The "Design and Implementation of Valgrind" section says in the intro "[Note: this document is now very old, and a lot of its contents are out of date, and misleading.]" The best documentation at present for Vex is in VEX/pub/libvex_ir.h.

Nick