|
From: Jeroen N. W. <jn...@xs...> - 2005-10-27 20:12:08
Attachments:
lk_main.c.diff
|
Greetings,

Recent changes in VEX's instrumentation interface seem to allow the
reinstatement of the full functionality of lackey. The attached patch
attempts to do this, with the following questions:

- To count the number of guest instructions, I count the number of
  Ist_IMark statements executed. Is this the correct approach?

- One additional guest instruction is still counted for each basic
  block (in add_one_BB). Is this (and the comment explaining it) still
  relevant?

- lackey still uses the word 'UInstr'. Should this be replaced by
  something like 'VEX statement'?

- In the 'switch (st->tag)' statement, the 'case Ist_Exit:' adds a deep
  copy of the statement to bb; whereas the 'default:' adds the statement
  itself. Is there a rationale behind this difference?

Jeroen.
|
From: Nicholas N. <nj...@cs...> - 2005-10-27 20:27:28
|
On Thu, 27 Oct 2005, Jeroen N. Witmond wrote:
> Recent changes in VEX's instrumentation interface seem to allow the
> reinstatement of the full functionality of lackey. The attached patch
> attempts to do this, with the following questions:
>
> - To count the number of guest instructions, I count the number of
> Ist_IMark statements executed. Is this the correct approach?
Yes.
> - One additional guest instruction is still counted for each basic
> block (in add_one_BB). Is this (and the comment explaining it) still
> relevant?
I don't think so. Adding instrumentation at each IMark to count the
instruction should be enough.
> - lackey still uses the word 'UInstr'. Should this be replaced by
> something like 'VEX statement'?
Yes.
Similarly, the notion of basic block counting is no longer accurate
either, since Vex uses superblocks (single-entry, multiple-exit sequences
of code).
> - In the 'switch (st->tag)' statement, the 'case Ist_Exit:' adds a deep
> copy of the statement to bb; whereas the 'default:' adds the statement
> itself. Is there a rationale behind this difference?
I don't know. Cachegrind doesn't do deep copies like this. If you remove
the deep copy does it still work?
Another thing... as this comment explains...
/* We need to know the entry point for this bb to do this. In any
case it's pretty meaningless in the presence of bb chasing since
we may enter this function part way through an IRBB. */
... calling get_fnname_if_entry() only at the start of a block might cause
the entry to the function to be missed. You could call
get_fnname_if_entry() for every instruction (ie. on every IMark) instead.
Nick
|
|
From: Jeroen N. W. <jn...@xs...> - 2005-10-29 14:49:15
Attachments:
lk_main.c
lk-manual.xml
|
On Thu, 27 Oct 2005, Nicholas Nethercote wrote:
> On Thu, 27 Oct 2005, Jeroen N. Witmond wrote:
>
>> Recent changes in VEX's instrumentation interface seem to allow the
>> reinstatement of the full functionality of lackey. The attached patch
>> attempts to do this, with the following questions:
>>
>> - To count the number of guest instructions, I count the number of
>>   Ist_IMark statements executed. Is this the correct approach?
>
> Yes.
>
>> - One additional guest instruction is still counted for each basic
>>   block (in add_one_BB). Is this (and the comment explaining it) still
>>   relevant?
>
> I don't think so. Adding instrumentation at each IMark to count the
> instruction should be enough.

Done. See attached file lk_main.c.

>> - lackey still uses the word 'UInstr'. Should this be replaced by
>>   something like 'VEX statement'?
>
> Yes.

Done.

> Similarly, the notion of basic block counting is no longer accurate
> either, since Vex uses superblocks (single-entry, multiple-exit
> sequences of code).

Lackey now counts both the number of BBs entered and the number
completed.

>> - In the 'switch (st->tag)' statement, the 'case Ist_Exit:' adds a
>>   deep copy of the statement to bb; whereas the 'default:' adds the
>>   statement itself. Is there a rationale behind this difference?
>
> I don't know. Cachegrind doesn't do deep copies like this. If you
> remove the deep copy does it still work?

Without them, it still works the same.

> Another thing... as this comment explains...
>
>   /* We need to know the entry point for this bb to do this. In any
>      case it's pretty meaningless in the presence of bb chasing since
>      we may enter this function part way through an IRBB. */
>
> ... calling get_fnname_if_entry() only at the start of a block might
> cause the entry to the function to be missed. You could call
> get_fnname_if_entry() for every instruction (ie. on every IMark)
> instead.

Done.

In the part of lk_fini() where the counts are printed (beginning at
line 190 in attached file lk_main.c), I had to change the format from
%u to %llu to get it to work properly. I'm not sure that %u is the
correct format for the ratios that are printed below the counts.

I've also updated the Lackey manual. See attached file lk-manual.xml.

Jeroen.
|
From: Julian S. <js...@ac...> - 2005-10-30 01:03:46
|
Jeroen, great stuff.
> >> - To count the number of guest instructions, I count the number of
> >> Ist_IMark statements executed. Is this the correct approach?
> >
> > Yes.
Yes, although you can (optionally) use a different strategy which is
cheaper. Rather than call add_one_guest_instr each time an IMark
is passed, make the instrumentation loop increment a counter when
it passes an IMark. Then, either at the end of the BB or when you
get to an Ist_Exit, call a (new) fn add_N_guest_instrs and pass it
the counter. Then reset the counter to zero. In other words,
call the instruction-counting function once for each piece of
straight-line code. Cachegrind uses a similar strategy.
I'm not saying you should do this, considering lackey is supposed
to be a simple tool example, but it is a possible go-faster option.
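The batching scheme described above can be sketched in plain C. This is a schematic mock-up, not the real VEX API: the statement tags are invented stand-ins, and where real instrumentation would emit a helper call into the generated code, this sketch simply calls the helper directly.

```c
#include <stddef.h>

/* Stand-in for IRStmt tags; only the kinds relevant to the sketch. */
typedef enum { St_IMark, St_Exit, St_Other } StTag;

static unsigned long long n_guest_instrs = 0;

/* Helper "called from generated code": one call per straight-line run,
   instead of one call per guest instruction. */
static void add_N_guest_instrs(unsigned n) { n_guest_instrs += n; }

/* Walk one (mock) superblock: accumulate a local count at each IMark,
   and flush it at each side exit and at the end of the block. */
static void instrument_block(const StTag *stmts, size_t n_stmts)
{
   unsigned pending = 0;
   for (size_t i = 0; i < n_stmts; i++) {
      if (stmts[i] == St_IMark) {
         pending++;                        /* count at instrument time */
      } else if (stmts[i] == St_Exit) {
         if (pending) add_N_guest_instrs(pending);  /* flush before exit */
         pending = 0;
      }
      /* other statements pass through unchanged */
   }
   if (pending) add_N_guest_instrs(pending);        /* flush at block end */
}
```

The point of the batching is that straight-line code needs only one helper call per run of instructions, rather than one per IMark.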
Depending on what Nick thinks, and your hacking enthusiasm, there is
something that would make Lackey more useful whilst still being a
nice simple demo of how to make a tool. That is, generate counts
for all the following events:
guest instructions
conditional branches (split into: taken, not taken)
loads (split into: integer, FP, 64-bit SIMD, 128-bit SIMD)
stores (split into: integer, FP, 64-bit SIMD, 128-bit SIMD)
alu ops (split into: integer, FP, 64-bit SIMD, 128-bit SIMD)
Someone on the users list asked for something like this just the
other day (Christian Stimming, "Fast profiling in valgrind?", 25 Oct).
Personally I think it'd be a valuable addition.
Not hard to do either: for stores, examine Ist_Store, and use
typeOfIRExpr(bb->tyenv, st->Ist.Store.data) to get the store type.
For loads and ALU ops, you only need to look at Ist_Tmp cases
where the Ist.Tmp.data is either Iex_Load or Iex_{Unop,Binop}.
All statements you will ever encounter will satisfy isFlatIRStmt
which essentially constrains them to being flat SSA-style.
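The classification recipe above can be sketched with simplified stand-in structures. The real code would walk VEX's IRStmt/IRExpr from libvex_ir.h and call typeOfIRExpr(); the enum and struct names here are invented purely for illustration.

```c
#include <stddef.h>

/* Mocked-up fragments of the IR; real VEX has many more types/tags. */
typedef enum { Ty_I32, Ty_I64, Ty_F64, Ty_V128, Ty_N } Ty;
typedef enum { Ex_Load, Ex_Unop, Ex_Binop, Ex_Other } ExTag;
typedef enum { St_Store, St_Tmp, St_Misc } StKind;

typedef struct { ExTag tag; Ty ty; } Expr;  /* a flat (SSA-style) rhs */
typedef struct { StKind tag; Expr data; } Stmt;

static unsigned long long loads[Ty_N], stores[Ty_N], aluops[Ty_N];

static void classify(const Stmt *st)
{
   switch (st->tag) {
      case St_Store:
         /* cf. typeOfIRExpr(bb->tyenv, st->Ist.Store.data) */
         stores[st->data.ty]++;
         break;
      case St_Tmp:
         /* t = <flat expr>: inspect the right-hand side */
         if (st->data.tag == Ex_Load)
            loads[st->data.ty]++;
         else if (st->data.tag == Ex_Unop || st->data.tag == Ex_Binop)
            aluops[st->data.ty]++;
         break;
      default:
         break;
   }
}
```

Because the statements are guaranteed flat, a load or ALU op can only appear as the immediate rhs of an Ist_Tmp, which is what makes this single-level inspection sufficient.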
> >> - In the 'switch (st->tag)' statement, the 'case Ist_Exit:' adds a deep
> >> copy of the statement to bb; whereas the 'default:' adds the statement
> >> itself. Is there a rationale behind this difference?
No .. me being over cautious. I'm pretty sure you can simply copy
the statements themselves; doing deep copies just makes the instrumenter
run slower :-)
> 190 in attached file lk_main.c), I had to change the format from %u to
> %llu to get it to work properly.
%llu is the correct format for ULong, yes.
J
|
|
From: Jeroen N. W. <jn...@xs...> - 2005-10-30 12:30:30
|
On Sun, 30 Oct 2005, Julian Seward wrote:
>
> Jeroen, great stuff.
>
>> >> - To count the number of guest instructions, I count the number of
>> >> Ist_IMark statements executed. Is this the correct approach?
>> >
>> > Yes.
>
> Yes, although you can (optionally) use a different strategy which is
> cheaper. Rather than call add_one_guest_instr each time an IMark
> is passed, make the instrumentation loop increment a counter when
> it passes an IMark. Then, either at the end of the BB or when you
> get to an Ist_Exit, call a (new) fn add_N_guest_instrs and pass it
> the counter. Then reset the counter to zero. In other words,
> call the instruction-counting function once for each piece of
> straight-line code. Cachegrind uses a similar strategy.
>
> I'm not saying you should do this, considering lackey is supposed
> to be a simple tool example, but it is a possible go-faster option.
>
I don't like to burden Lackey with this, but I'll keep it in mind for the
next tool I'm working on: Blanket, a basic code coverage tool (mentioned
in '3.2.2. Suggested tools' in file docs/xml/writing-tools.xml).
> Depending on what Nick thinks, and your hacking enthusiasm, there is
> something that would make Lackey more useful whilst still being a
> nice simple demo of how to make a tool. That is, generate counts
> for all the following events:
>
> guest instructions
> conditional branches (split into: taken, not taken)
> loads (split into: integer, FP, 64-bit SIMD, 128-bit SIMD)
> stores (split into: integer, FP, 64-bit SIMD, 128-bit SIMD)
> alu ops (split into: integer, FP, 64-bit SIMD, 128-bit SIMD)
>
> Someone on the users list asked for something like this just the
> other day (Christian Stimming, "Fast profiling in valgrind?", 25 Oct).
> Personally I think it'd be a valuable addition.
>
> Not hard to do either: for stores, examine Ist_Store, and use
> typeOfIRExpr(bb->tyenv, st->Ist.Store.data) to get the store type.
> For loads and ALU ops, you only need to look at Ist_Tmp cases
> where the Ist.Tmp.data is either Iex_Load or Iex_{Unop,Binop}.
> All statements you will ever encounter will satisfy isFlatIRStmt
> which essentially constrains them to being flat SSA-style.
>
Done. See attached files lk_main.c and lk-manual.xml. This output is
generated only with command line option --show-guest-details=yes. To
achieve a nice format I had to hack coregrind/m_debuglog.c to:
- accept '*' as width specifier in format strings.
- fix the interpretation of VG_MSG_LJUSTIFY for strings, which was
  inverted.
See attached file coregrind/m_debuglog.c.diff.
Some notes:
- Binary operations are counted under the type of their first argument.
The second argument is ignored.
- With the changes I made in the output, the regression test for Lackey
should fail, but does not, and I don't see why.
Jeroen.
>> >> - In the 'switch (st->tag)' statement, the 'case Ist_Exit:' adds a
>> >>   deep copy of the statement to bb; whereas the 'default:' adds the
>> >>   statement itself. Is there a rationale behind this difference?
>
> No .. me being over cautious. I'm pretty sure you can simply copy
> the statements themselves; doing deep copies just makes the instrumenter
> run slower :-)
>
>> 190 in attached file lk_main.c), I had to change the format from %u to
>> %llu to get it to work properly.
>
> %llu is the correct format for ULong, yes.
>
> J
>
|
|
From: Julian S. <js...@ac...> - 2005-10-30 17:15:59
Attachments:
lk_main.c
|
> Done. See attached files lk_main.c and lk-manual.xml. This output is
> generated only with command line option --show-guest-details=yes. To
> achieve a nice format I had to hack coregrind/m_debuglog.c to:
> - accept '*' as width specifier in format strings.

Cool.

I simplified the instrumentation loop a bit, removed the assumptions
about Ity_.. ordering, and changed the output a bit, so it now prints
this:

==11136== IR-level counts by type:
==11136==    Type        Loads       Stores         AluOps
==11136==    ---------------------------------------------
==11136==    I1              0            0    647,904,654
==11136==    I8     43,422,956   10,545,468    102,858,666
==11136==    I16    11,699,415    1,784,133      4,123,740
==11136==    I32   743,727,534  467,327,806  2,151,045,236
==11136==    I64        11,107        1,921      1,702,131
==11136==    I128            0            0              0
==11136==    F32     1,999,224           75             75
==11136==    F64     3,900,295      999,008     11,441,729
==11136==    V128            0            0              0
==11136==

It does run slowly though, largely as a result of doing at least one
helper call per IR stmt. Maybe that's not important.

J
|
From: Nicholas N. <nj...@cs...> - 2005-11-01 16:26:48
|
On Sun, 30 Oct 2005, Julian Seward wrote:
>>>> - To count the number of guest instructions, I count the number of
>>>> Ist_IMark statements executed. Is this the correct approach?
>
> Yes, although you can (optionally) use a different strategy which is
> cheaper. Rather than call add_one_guest_instr each time an IMark
> is passed, make the instrumentation loop increment a counter when
> it passes an IMark. Then, either at the end of the BB or when you
> get to an Ist_Exit, call a (new) fn add_N_guest_instrs and pass it
> the counter. Then reset the counter to zero. In other words,
> call the instruction-counting function once for each piece of
> straight-line code. Cachegrind uses a similar strategy.
Why not just increment the real global counter in-line? As opposed to
incrementing a temporary counter, and periodically adding it to global
counter?
> Depending on what Nick thinks, and your hacking enthusiasm, there is
> something that would make Lackey more useful whilst still being a
> nice simple demo of how to make a tool. That is, generate counts
> for all the following events:
>
> guest instructions
> conditional branches (split into: taken, not taken)
> loads (split into: integer, FP, 64-bit SIMD, 128-bit SIMD)
> stores (split into: integer, FP, 64-bit SIMD, 128-bit SIMD)
> alu ops (split into: integer, FP, 64-bit SIMD, 128-bit SIMD)
>
> Someone on the users list asked for something like this just the
> other day (Christian Stimming, "Fast profiling in valgrind?", 25 Oct).
> Personally I think it'd be a valuable addition.
>
> Not hard to do either: for stores, examine Ist_Store, and use
> typeOfIRExpr(bb->tyenv, st->Ist.Store.data) to get the store type.
> For loads and ALU ops, you only need to look at Ist_Tmp cases
> where the Ist.Tmp.data is either Iex_Load or Iex_{Unop,Binop}.
> All statements you will ever encounter will satisfy isFlatIRStmt
> which essentially constrains them to being flat SSA-style.
I'm happy for Lackey to change. People very often ask how to get the
stream of memory accesses made by a program, I wrote "Dullard" (see
http://www.valgrind.org/downloads/variants.html?njn) to do this, but it's
based on 2.1.2 and so works with UCode.
It would be great if Lackey could give the memory accesses as well as the
info Julian suggested above, so it would serve as a much better example
tool. Ideally the different bits of functionality (getting instruction
counts, getting memory access traces) would be clearly delineated so that
people could chop out the bits they don't need easily... perhaps having
various options like --trace-mem-accesses, --do-instr-counts, etc, would
make this obvious.
As for efficiency, it might be best to keep things simple -- eg. one C
call per IMark -- but have comments that briefly describe how things might
be done more efficiently.
Nick
|
|
From: Julian S. <js...@ac...> - 2005-11-01 17:41:57
|
> Why not just increment the real global counter in-line? As opposed to
> incrementing a temporary counter, and periodically adding it to global
> counter?

Ah yes, that never occurred to me. You could do that too (for a
massive speedup) but in fact the two approaches (in-line increments vs
rolling multiple increments into one) are independent and so could
both be applied, which would give lackey a virtually insignificant
overhead compared to no instrumentation at all.

The only objection might be that then generated code would be
accessing memory in areas that belong to V and not to the client. So
it wouldn't have worked with pointercheck, but that's no longer the
case. (Didn't JosefW have a similar problem some time back?) Whereas
mediating everything through helper calls is 'cleaner' in some respect.

> I'm happy for Lackey to change.

Ok good. I'll commit what I have.

J
|
From: Nicholas N. <nj...@cs...> - 2005-11-01 16:39:29
|
On Sun, 30 Oct 2005, Jeroen N. Witmond wrote:

> I don't like to burden Lackey with this, but I'll keep it in mind for
> the next tool I'm working on: Blanket, a basic code coverage tool
> (mentioned in '3.2.2. Suggested tools' in file
> docs/xml/writing-tools.xml).

Some work has already gone into coverage tools. Benoit Peccatte was
working on one earlier this year. Look for this email to
valgrind-developers, and others from around the same time.

   Date: Fri, 29 Apr 2005 11:33:38 +0200
   From: Benoit Peccatte <ben...@en...>
   To: val...@li...
   Subject: Re: [Valgrind-developers] Code Coverage

I was also working on a coverage tool (VCov) then. I've put a tarball
of my working source directory at www.cs.utexas.edu/~njn/vcov.tar.bz2.
It is a bit old -- it uses Vex, but if you SVN update you'll have to
make some changes to get it to compile again (if you just compile it
as is it should work). vcov/vc_main.c is moderately well commented, so
hopefully you'll be able to understand what's going on.
vcov/vc_annotate.in is the annotation script.

The approach taken relies totally on the debug information being
present and correct -- I don't see how else to do it -- and I found
that one version of GCC (3.3.4? can't remember now) was not producing
correct debug info and so it wasn't working well.

Anyway, IIRC it basically works, although I haven't tested it
thoroughly. It should serve as a useful starting point, or perhaps you
can think of a better way of doing things.

Nick
|
From: Josef W. <Jos...@gm...> - 2005-11-01 20:27:14
|
On Tuesday 01 November 2005 18:42, Julian Seward wrote:
>
> > Why not just increment the real global counter in-line? As opposed
> > to incrementing a temporary counter, and periodically adding it to
> > global counter?
>
> Ah yes, that never occurred to me.

Regarding helper call vs. inlining: How much effort would it be to let
Valgrind do the inlining of a C helper?

> The only objection might be that then generated code would be
> accessing memory in areas that belong to V and not to the client. So
> it wouldn't have worked with pointercheck, but that's no longer the
> case. (Didn't JosefW have a similar problem some time back?)

Yes. My instrumented code writes to a global tool variable. I switched
off pointer check. I had a patch for a UCode memory store instruction
to bypass pointer check, but it was not worth it.

Josef
|
From: Nicholas N. <nj...@cs...> - 2005-11-01 21:13:30
|
On Tue, 1 Nov 2005, Josef Weidendorfer wrote:

> Regarding helper call vs. inlining: How much effort would it be to
> let Valgrind do the inlining of a C helper?

A lot, I think. How would you do it?

I believe Pin can do this, but the inlining fails if the C function
modifies the condition codes, so in practice anything more than the
tiniest function will not be inlined.

Nick
|
From: Josef W. <Jos...@gm...> - 2005-11-01 22:24:46
|
On Tuesday 01 November 2005 22:13, Nicholas Nethercote wrote:
> On Tue, 1 Nov 2005, Josef Weidendorfer wrote:
>
> > Regarding helper call vs. inlining: How much effort would it be to
> > let Valgrind do the inlining of a C helper?
>
> A lot, I think. How would you do it?

I just thought we could let VEX chasing do the work, and it would help
if the helper is found on the client side (in a vgpreload*.so). But it
is obviously a lot more difficult, and instrumenting the helper is not
really useful :-)

> I believe Pin can do this,

I read about this, too. You can give a hint to Pin if inlining at
different points is possible, so Pin would be able to choose a point
where condition codes do not have to be restored afterwards.

> but the inlining fails if the C function
> modifies the condition codes, so in practice anything more than the
> tiniest function will not be inlined.

Probably. And such tiny helpers can be inlined manually.

Josef
|
From: Jeroen N. W. <jn...@xs...> - 2005-11-02 19:46:09
|
On Tue, 1 Nov 2005, Nicholas Nethercote wrote:
> On Sun, 30 Oct 2005, Jeroen N. Witmond wrote:
>
>> I don't like to burden Lackey with this, but I'll keep it in mind
>> for the next tool I'm working on: Blanket, a basic code coverage
>> tool (mentioned in '3.2.2. Suggested tools' in file
>> docs/xml/writing-tools.xml).
>
> Some work has already gone into coverage tools. Benoit Peccatte was
> working on one earlier this year. Look for this email to
> valgrind-developers, and others from around the same time.
>
>   Date: Fri, 29 Apr 2005 11:33:38 +0200
>   From: Benoit Peccatte <ben...@en...>
>   To: val...@li...
>   Subject: Re: [Valgrind-developers] Code Coverage
>
> I was also working on a coverage tool (VCov) then. I've put a tarball
> of my working source directory at www.cs.utexas.edu/~njn/vcov.tar.bz2.
> It is a bit old -- it uses Vex, but if you SVN update you'll have to
> make some changes to get it to compile again (if you just compile it
> as is it should work). vcov/vc_main.c is moderately well commented,
> so hopefully you'll be able to understand what's going on.
> vcov/vc_annotate.in is the annotation script.

Thanks. This is useful as a starting point.

> The approach taken relies totally on the debug information being
> present and correct -- I don't see how else to do it -- and I found
> that one version of GCC (3.3.4? can't remember now) was not producing
> correct debug info and so it wasn't working well.

The approach taken by cachegrind, vcov and cover relies on debug
information being present and correct. This approach determines the
file name and line number of a guest instruction each time it is
instrumented, and uses that information to group the coverage data.

In the approach I am taking in blanket, the guest instructions are
grouped by address into ranges. Each range has a single entry point at
the first guest instruction and a single (conditional) exit at the
last guest instruction in the range. During instrumentation, a table
of ranges is built, and for each range one or two helpers are called
to update the relevant entry in the table.

In a perfect world, all translation of guest instruction addresses
into source files and line numbers is done once for each guest
instruction by bk_fini(). In the real world, for a shared object this
translation must be done before the object is unloaded. The loading
and unloading of shared objects, and the presence of self-modifying
programs or generated guest instructions, will require the use of a
versioning scheme in the table of ranges.

If the debug information is sufficient, blanket will be able to report
multiple execution counts and branch results for one source line,
overcoming some of the problems mentioned in
http://www.bullseye.com/coverage.html. If debug information is absent
or incomplete, blanket can still report all coverage data collected,
using executable/library names (from /proc/self/maps) and offset
ranges to label the data.

What do you think?

> Anyway, IIRC it basically works, although I haven't tested it
> thoroughly. It should serve as a useful starting point, or perhaps
> you can think of a better way of doing things.
>
> Nick
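The range table described above can be sketched as follows. All names here (Range, bk_new_range, and the helpers) are hypothetical, invented for this sketch; in the real tool the two helpers would be invoked from the instrumented code rather than called directly.

```c
#include <stddef.h>

typedef unsigned long Addr;   /* guest address */

/* One single-entry, single-conditional-exit range of guest code. */
typedef struct {
   Addr first, last;              /* address range [first, last] */
   unsigned long long n_entered;  /* executions of the first instruction */
   unsigned long long n_fallthru; /* times the final exit fell through */
} Range;

#define MAX_RANGES 1024
static Range ranges[MAX_RANGES];  /* no overflow check in this sketch */
static size_t n_ranges = 0;

/* Called at instrumentation time when a new range is discovered. */
static Range* bk_new_range(Addr first, Addr last)
{
   Range *r = &ranges[n_ranges++];
   r->first = first;
   r->last  = last;
   r->n_entered = r->n_fallthru = 0;
   return r;
}

/* The one or two helpers the generated code would call at run time. */
static void bk_enter(Range *r)    { r->n_entered++; }
static void bk_fallthru(Range *r) { r->n_fallthru++; }
```

Because each range has a single entry and a single conditional exit, the number of times the exit was taken can be recovered as n_entered - n_fallthru, so two counters per range suffice for branch coverage.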
|
From: Nicholas N. <nj...@cs...> - 2005-11-02 20:17:05
|
On Wed, 2 Nov 2005, Jeroen N. Witmond wrote:

> The approach taken by cachegrind, vcov and cover relies on debug
> information being present and correct. This approach determines the
> file name and line number of a guest instruction each time it is
> instrumented, and uses that information to group the coverage data.
>
> In the approach I am taking in blanket, the guest instructions are
> grouped by address into ranges. Each range has a single entry point
> at the first guest instruction and a single (conditional) exit at the
> last guest instruction in the range. During instrumentation, a table
> of ranges is built, and for each range one or two helpers are called
> to update the relevant entry in the table.

It would be worth starting with the simple approach of one C call per
instruction, and then moving to the more complex range approach, so
that you can measure how much of a speedup you get. With Cachegrind
I've had surprising experiences with this kind of optimization;
sometimes they don't work as well as you would expect.

> In a perfect world, all translation of guest instruction addresses
> into source files and line numbers is done once for each guest
> instruction by bk_fini(). In the real world, for a shared object this
> translation must be done before the object is unloaded. The loading
> and unloading of shared objects, and the presence of self-modifying
> programs or generated guest instructions, will require the use of a
> versioning scheme in the table of ranges.

Why wait until bk_fini() to do the debug info lookup? If you do it in
bk_instrument() you don't have to worry about the debug info having
been unloaded.

> If the debug information is sufficient, blanket will be able to
> report multiple execution counts and branch results for one source
> line, overcoming some of the problems mentioned in
> http://www.bullseye.com/coverage.html. If debug information is absent
> or incomplete, blanket can still report all coverage data collected,
> using executable/library names (from /proc/self/maps) and offset
> ranges to label the data.

That sounds similar to Cachegrind/VCov, but for instructions that lack
debug info you are not putting them into a single "???" bucket, but
rather giving the location in the binary. How will a user utilise the
binary location?

Nick
|
From: Jeroen N. W. <jn...@xs...> - 2005-11-03 19:40:03
|
On Wed, 2 Nov 2005, Nicholas Nethercote wrote:
> On Wed, 2 Nov 2005, Jeroen N. Witmond wrote:
>
>> The approach taken by cachegrind, vcov and cover relies on debug
>> information being present and correct. This approach determines the
>> file name and line number of a guest instruction each time it is
>> instrumented, and uses that information to group the coverage data.
>>
>> In the approach I am taking in blanket, the guest instructions are
>> grouped by address into ranges. Each range has a single entry point
>> at the first guest instruction and a single (conditional) exit at
>> the last guest instruction in the range. During instrumentation, a
>> table of ranges is built, and for each range one or two helpers are
>> called to update the relevant entry in the table.
>
> It would be worth starting with the simple approach of one C call per
> instruction, and then moving to the more complex range approach, so
> that you can measure how much of a speedup you get. With Cachegrind
> I've had surprising experiences with this kind of optimization;
> sometimes they don't work as well as you would expect.

The grouping of guest instructions into ranges is intended to be an
infrastructure that can be reused, for instance to create control flow
graphs. blanket just happens to be the first tool to use this
infrastructure. The advantage for the tools using this infrastructure
is that they can treat the range as an atom. (This just happens to be
the level of granularity blanket needs to produce correct results.) It
may not optimize the execution of the tool, but it should (will)
optimize the performance of the programmers using it. :-)

>> In a perfect world, all translation of guest instruction addresses
>> into source files and line numbers is done once for each guest
>> instruction by bk_fini(). In the real world, for a shared object
>> this translation must be done before the object is unloaded. The
>> loading and unloading of shared objects, and the presence of
>> self-modifying programs or generated guest instructions, will
>> require the use of a versioning scheme in the table of ranges.
>
> Why wait until bk_fini() to do the debug info lookup? If you do it in
> bk_instrument() you don't have to worry about the debug info having
> been unloaded.

A combination of motives: the entire concept of source location is
irrelevant for the instrumentation loop and the helper functions; the
(probably slight) performance increase from doing the lookup once for
each guest instruction instead of each time it is instrumented; not
having the debug information occupying space until after the execution
of the guest program; and the feeling it is neater this way.

>> If the debug information is sufficient, blanket will be able to
>> report multiple execution counts and branch results for one source
>> line, overcoming some of the problems mentioned in
>> http://www.bullseye.com/coverage.html. If debug information is
>> absent or incomplete, blanket can still report all coverage data
>> collected, using executable/library names (from /proc/self/maps)
>> and offset ranges to label the data.
>
> That sounds similar to Cachegrind/VCov, but for instructions that
> lack debug info you are not putting them into a single "???" bucket,
> but rather giving the location in the binary. How will a user utilise
> the binary location?

That depends on what you use blanket for. When blanket is a tool in an
ordinary development process, the user does not need (and should not
have to resort to) binary locations. However, when reverse engineering
a module or library for which you do have the rights, but not the
sources, blanket can help, for instance by determining entry and exit
instruction sequences, or by determining which options activate which
ranges of code.

Anyway, I'm not saying that the binary locations will be output by
default, just that they can be available.

Jeroen.
|
From: Nicholas N. <nj...@cs...> - 2005-11-03 19:58:52
|
On Thu, 3 Nov 2005, Jeroen N. Witmond wrote:

> The grouping of guest instructions into ranges is intended to be an
> infrastructure that can be reused, for instance to create control
> flow graphs. blanket just happens to be the first tool to use this
> infrastructure. The advantage for the tools using this infrastructure
> is that they can treat the range as an atom. (This just happens to be
> the level of granularity blanket needs to produce correct results.)
> It may not optimize the execution of the tool, but it should (will)
> optimize the performance of the programmers using it. :-)

Ok, but be careful... in computer science we are taught that
generalizing things is a good idea. It often is, but not always.
Sometimes factoring out things in a "general" way just creates a level
of indirection that makes things harder to understand. Particularly so
if one factors out things that are supposedly common based on only a
single example -- in such a case it's very easy to get the factoring
wrong.

So I suggest taking the simple route at first, and then worrying about
creating infrastructure later.

Nick
|
From: Josef W. <Jos...@gm...> - 2005-11-03 21:32:39
|
On Thursday 03 November 2005 20:39, Jeroen N. Witmond wrote:
> The grouping of guest instructions into ranges is intended to be an
> infrastructure that can be reused, for instance to create control
> flow graphs.
> ...

I agree that it would be nice for tool authors to have a pool of
reusable tool components available. If there is interest, I could try
to pull out some functionality of callgrind into such components
(e.g. call graph tracing, jump tracing, event set registration,
dumping events in cachegrind/callgrind format, CLO parsing for
function patterns, ...)

The problem is that such components sometimes have instrumentation
needs, and it is not obvious to me how to compose different component
instrumentations together in a robust way. Perhaps we would have to
tag the VEX code added by one component, so that another does not try
to instrument the other component's instrumentation?

> That depends on what you use blanket for. When blanket is a tool in
> an ordinary development process, the user does not need (and should
> not have to resort to) binary locations. However, when reverse
> engineering a module or library for which you do have the rights,
> but not the sources, blanket can help, for instance by determining
> entry and exit instruction sequences, or by determining which options
> activate which ranges of code.

I have the feeling that KCachegrind's disassembler annotation with
callgrind's jump tracing already gives you a lot of information for
such reverse engineering tasks, although it was not planned to be used
this way. A better visualization of inner-function control flow and
loop nest detection is on my long todo list for KCachegrind. But an
open problem here is how to map back to source in the presence of
compiler optimizations.

Josef