|
From: Havoc P. <hp...@re...> - 2004-12-28 17:53:07
Attachments:
coverage_main.c
|
[resend with python script dropped so it's below 40K limit]

Hi,

I've made some lame progress on such a thing (attached), trying to replace my gcov-based hack for D-BUS. http://cvs.freedesktop.org/dbus/dbus/test/decode-gcov.c?view=markup is the current hack. The main purpose of the tool for me is an overall report for the whole source tree, "make check-coverage" - an example is in this old post: https://listman.redhat.com/archives/message-bus-list/2003-April/msg00191.html

To implement, I have a valgrind tool that collects the data for a process, and started on a python script that is intended to merge the valgrind data files for every test in "make check" and generate a report. Right now it just does gcov-type annotation.

make check -> numerous coverage.out.NNNN files -> overall report

I have a couple of questions if anyone has insights.

1. Thoughts on how a coverage tool can find out about basic blocks that don't get translated? I'm not sure how to limit this; obviously it would be bad to instrument piles of system library code that won't get executed, but I'd like to be able to analyze all the blocks in the source tree that "make check-coverage" applies to. So maybe some way to ask to see blocks based on the object file they are in, or source files referenced in the debuginfo, I don't know. So I could say "for libdbus, just instrument the whole thing."

I also turned up this in the archives: http://sourceforge.net/mailarchive/message.php?msg_id=6556760 I don't think I really need static info though, just a larger selected subset of the dynamic info.

2. I'm having problems with fork() not followed by exec() in my test suite, which causes valgrind to call SK_(fini) once per child. Once I fix the trivial bug (that the pid is saved in the output filename at init rather than fini time), there's a harder issue of making it do something sensible. I guess resetting the execution counts on all blocks and arcs in the child would be logical, to avoid duplicates. I'm not sure how to hook into fork(), though, or if there's prior art on the best way to do this. (I see a VG_(atfork) feature, but it's not available to tools?)

Anyhow, any comments welcome. In case it isn't obvious, I'm mostly ignorant of assembler and compilers. I don't really have time to maintain a generic coverage tool, but I'd like to spend a few more days on it and get it to work for my purposes. I saw a couple of blog posts about people doing work in this area, so I thought I'd post.

Thanks,
Havoc |
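The merge step described in the pipeline above ("make check -> numerous coverage.out.NNNN files -> overall report") could be sketched roughly as follows. This is a hypothetical illustration: the per-line `file:line:count` record format is invented here, since the actual output format of coverage_main.c isn't shown in the thread.

```python
# Sketch: merge per-process coverage.out.NNNN files into one report.
# The record format "source_file:line_number:count" is an assumption
# made for illustration, not the real tool's format.
import collections
import glob

def merge_coverage(pattern="coverage.out.*"):
    """Sum execution counts per (source file, line) across all files."""
    totals = collections.defaultdict(int)
    for path in glob.glob(pattern):
        with open(path) as f:
            for record in f:
                src, line, count = record.rsplit(":", 2)
                totals[(src, int(line))] += int(count)
    return totals

def overall_percentage(totals):
    """Fraction of known lines executed at least once, as a percentage."""
    executed = sum(1 for c in totals.values() if c > 0)
    return 100.0 * executed / len(totals) if totals else 0.0
```

Summing counts (rather than taking a maximum) means a line run once in each of ten tests reports a count of ten, which matches the gcov-style annotation the script aims for.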
|
From: Josef W. <Jos...@gm...> - 2004-12-31 19:18:25
|
Hi Havoc,

On Tuesday 28 December 2004 18:53, Havoc Pennington wrote:
> [resend with python script dropped so it's below 40K limit]
>
> Hi,
>
> I've made some lame progress on such a thing (attached), trying to
> replace my gcov-based hack for D-BUS.

What is the problem with gcov?

> http://cvs.freedesktop.org/dbus/dbus/test/decode-gcov.c?view=markup
> is the current hack. The main purpose of the tool for me is an overall
> report for the whole source tree, "make check-coverage" - an example is
> in this old post:
> https://listman.redhat.com/archives/message-bus-list/2003-April/msg00191.html
>
> To implement I have a valgrind tool that collects the data for a
> process, and started on a python script that is intended to merge the
> valgrind data files for every test in "make check" and generate a
> report. Right now it just does gcov-type annotation.
>
> make check -> numerous coverage.out.NNNN files -> overall report
>
> I have a couple of questions if anyone has insights.
>
> 1. Thoughts on how a coverage tool can find out about basic blocks that
> don't get translated? I'm not sure how to limit this; obviously it would
> be bad to instrument piles of system library code that won't get
> executed, but I'd like to be able to analyze all the blocks in the
> source tree that "make check-coverage" applies to. So maybe some way to
> ask to see blocks based on object file they are in, or source files
> referenced in the debuginfo, I don't know. So I could say "for libdbus,
> just instrument the whole thing"

I did it in my testing code (stagrind, your link below) the following way: when detecting the first basic block executed for an executable/shared lib, go through the whole object (boundaries given via SegInfo data by Valgrind's core), and for any address with debug line info, start instrumenting, with the only purpose of getting x86 instruction boundaries (and some info about the type of instruction, e.g. whether it is a FLOP).

The data for every shared lib is dumped separately in a cachegrind-like file, giving a 1 for every instruction where debug info exists. As this is static info, this could also be done without valgrind, but the benefit is e.g. to see which libraries are loaded with dlopen.

Hmm... Perhaps this is not the best for batch processing. One could integrate this step into the same tool that is gathering the execution profile (like callgrind), and only do it if no up-to-date file already exists: include the creation time and absolute path of the library file.

> I also turned up this in the archives:
> http://sourceforge.net/mailarchive/message.php?msg_id=6556760

Yup, that's mine. Did you check it out? I'm not sure it is still compiling, as some VG core functions are used which are not in tool.h.

> I don't think I really need static info though, just a larger selected
> subset of the dynamic info.

What is really "dynamic" about the additional data you need?

> 2. I'm having problems with fork() not followed by exec() in my test
> suite, which causes valgrind to call SK_(fini) once per child.
> Once I fix the trivial bug (that the pid is saved in the output filename
> at init rather than fini time), there's a harder issue of making it do
> something sensible. I guess maybe reset the execution counts on all
> blocks and arcs in the child would be logical, to avoid duplicates.

Yes. But is this really needed? Isn't it enough to be able to say "executed at least once"?

> I'm not sure how to hook into fork() though or if there's prior art on
> the best way to do this. (I see a VG_(atfork) feature but not available
> to tools?)
>
> Anyhow, any comments welcome. In case it isn't obvious I'm mostly
> ignorant of assembler and compilers. I don't really have time to
> maintain a generic coverage tool but I'd like to spend a few more days
> on it and get it to work for my purposes. I saw a couple blog posts
> about people doing work in this area so I thought I'd post.

Your tool does a lot of work gathering conditional jump info. I think that my tool (callgrind) with "--trace-jumps=yes" should give you the same info.

I am having a little problem following your code. Why are there cases where you can't detect the jump target? I think I do it the other way round: when the target block of a jump is executed, I know the previous block and the last jmpkind, and sum up the arc info.

You dump XML. I use cachegrind's format. I would be interested to either write an import filter for your XML data or a converter, to be able to use KCachegrind for visualization. BTW, KCachegrind can show (conditional) jump info in a source/disassembler annotation widget.

Loading of multiple files and summing can also be done by KCachegrind. I intend to provide some kind of CLI mode in the future, exporting post-processed data (e.g. after simple summing of multiple files, or after cycle detection).

Josef

> Thanks,
> Havoc |
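The arc-accounting scheme Josef outlines - credit a jump arc only when the *target* block executes, using the remembered previous block - can be illustrated with a toy sketch. The block-id trace below is invented for illustration; in a real valgrind tool the "trace" would be the sequence of translated basic blocks as they run.

```python
# Sketch of arc counting done target-side: rather than recording the
# destination at the jump site, wait until the next block runs, then
# credit the (previous block, current block) arc.
import collections

def count_arcs(block_trace):
    """block_trace: iterable of basic-block ids in execution order.
    Returns dict (prev_block, cur_block) -> number of traversals."""
    arcs = collections.defaultdict(int)
    prev = None
    for block in block_trace:
        if prev is not None:
            arcs[(prev, block)] += 1
        prev = block
    return arcs
```

The advantage of doing it this way round, as Josef notes, is that the target never needs to be decoded from the jump instruction: it is simply whatever block runs next.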
|
From: Havoc P. <hp...@re...> - 2005-01-01 00:47:43
|
On Fri, 2004-12-31 at 17:31 +0100, Josef Weidendorfer wrote:
> What is the problem with gcov?

The data it collects is fine for me, but the command line tool doesn't do what I want. So I wrote my own command line tool, but it hasn't really worked since gcc 3.2. gcc 3.3 changed the file format and introduced a bug (in fork-without-exec) that was a showstopper for me. gcc 3.4 is supposed to fix that bug (at least my report was closed), but the file format has changed again and I still only support the 3.2 and 3.3 formats.

Really the simplest thing for me is probably to update to the gcc 3.4 file format, but I thought I'd try writing a valgrind skin instead. It has the virtue that it doesn't require recompiling with -fprofile-arcs etc.

What I'm trying to achieve, again, is the "make check-coverage" that runs "make check" and writes an overall report for the whole source tree. It also lets you do a gcov-style annotation of any file using the data from the entire "make check".

Other than "overall report" support, the other thing my tool does that isn't in gcov is skip basic blocks inside #ifdef DBUS_BUILD_TESTS. That way the coverage percentages in the report don't count all the test failure blocks:

  if (!foo)
    fail_test (); /* 100% coverage still would not run this line */

I guess I also had some idea that with the valgrind architecture some more interesting data than gcov supports could be collected; for example, the manual suggests recording the values from tests. So you could see whether specific values such as -1, 0, 1, INT_MAX, etc. had been tested.

The question I'm really trying to answer is "which additional tests should I write?" so that means I need a view of the whole codebase, not single files, with the ability to drill into the single files that have poor coverage. It's important to *not* see the test framework itself, or libraries that aren't part of my codebase, in addition to seeing the entirety of my codebase, if that makes sense. I.e. it wouldn't be helpful if the tool said 1% coverage because I only used 1% of libc. ;-)

> Hmm... Perhaps this is not the best for batch processing. One could
> integrate this step into the same tool that is gathering the execution
> profile (like callgrind), and only do it if no up-to-date file already
> exists: include the creation time and absolute path of the library file.

I think it would be nicer to automatically read the static data needed whenever you run the coverage tool; having two separate tools would be a little bit clunky. But either one will work, of course.

> > I also turned up this in the archives:
> > http://sourceforge.net/mailarchive/message.php?msg_id=6556760
>
> Yup, that's mine. Did you check it out? I'm not sure it is still
> compiling, as some VG core functions are used which are not in tool.h.

I haven't had a chance to look at it in detail yet; I just found the link while I was composing my earlier mail. I decided I should work on finishing dbus 1.0 instead of a coverage tool for now ;-)

> > I don't think I really need static info though, just a larger selected
> > subset of the dynamic info.
>
> What is really "dynamic" about the additional data you need?

There is nothing dynamic about it, but what I'm saying is that there's no reason to support collecting the data without running a test binary. I.e. we could have a tool like:

  read_basic_blocks_data libdbus.so

that worked on any object, without an executable. For my purposes it isn't important to have that, though; it's fine if when you do:

  valgrind --tool=coverage ./dbus-test

it would just automatically collect the basic blocks data from the interesting objects loaded by the dbus-test process. I think there probably has to be some way to specify which objects are interesting; I don't want a huge file with every basic block in libc, really...

> > 2. I'm having problems with fork() not followed by exec() in my test
> > suite, which causes valgrind to call SK_(fini) once per child.
> > Once I fix the trivial bug (that the pid is saved in the output filename
> > at init rather than fini time), there's a harder issue of making it do
> > something sensible. I guess maybe reset the execution counts on all
> > blocks and arcs in the child would be logical, to avoid duplicates.
>
> Yes. But is this really needed? Isn't it enough to be able to say
> "executed at least once"?

It's enough to start, sure. I think it's useful to have execution counts though, and eventually details on some of the important variable values that were tested. I think the counts are also helpful in convincing myself that the tests (and coverage tool) are behaving sensibly. With tons of children (as I have in my test suite) another problem is just the number of times the tool prints the summary message at the end. ;-)

> Your tool does a lot about gathering conditional jump info.
> I think that my tool (callgrind) with "--trace-jumps=yes" should give you
> the same info.

Yeah. I don't really know what that info is good for yet. I just figured out how to collect it, so I coded it while I remembered, in case it turns out to be useful.

> I am having a little problem following your code. Why are there
> cases where you can't detect the jump target?

I'm not sure there are. I am just unclear on when the different kinds of register are used.

> You dump XML. I use cachegrind's format.
> I would be interested to either write an import filter for your XML data
> or a converter to be able to use KCachegrind for visualization.

I don't really care what the format is; XML just meant that I didn't have to write a parser or figure out a new format. I hate writing parsers. ;-)

> Loading of multiple files and summing can also be done by KCachegrind.
> I intend to provide some kind of CLI mode in the future, exporting
> post-processed data (e.g. after simple summing of multiple files, or
> after cycle detection).

That would be handy. I was thinking of generating an overall report, with links to HTML-ized versions of each source file. Then you could "make check-coverage" on a tinderbox server and put the results online, among other things.

Havoc |
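The fork-without-exec fix discussed in this exchange - take the pid at dump time rather than at init, and zero the counters in the child so parent and child don't double-count - might look like the following in outline. This is a Python sketch of the idea, not valgrind tool code: `os.register_at_fork` stands in for the VG_(atfork)-style hook Havoc mentions, and the output record format is invented for illustration.

```python
# Sketch of per-process coverage dumping that survives fork() without
# exec(). The pid goes into the filename at dump (fini) time, and the
# child's counters are reset so counts aren't duplicated.
import os

counts = {}  # (source_file, line) -> execution count

def reset_counts_in_child():
    # Child starts from zero; the parent keeps its own totals.
    counts.clear()

# Python-level analogue of an atfork hook (Python 3.7+).
os.register_at_fork(after_in_child=reset_counts_in_child)

def dump(directory="."):
    # The pid is read here, at dump time, so that a forked child that
    # later dumps gets its own coverage.out.NNNN file.
    path = os.path.join(directory, "coverage.out.%d" % os.getpid())
    with open(path, "w") as f:
        for (src, line), n in sorted(counts.items()):
            f.write("%s:%s:%d\n" % (src, line, n))
    return path
```

With this arrangement, every process (parent and each forked child) emits a separate file covering only the work it did after the fork, and the merge step can simply sum them.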
|
From: Brad H. <br...@fr...> - 2005-01-01 01:30:44
|
On Sat, 1 Jan 2005 11:48 am, Havoc Pennington wrote:
> On Fri, 2004-12-31 at 17:31 +0100, Josef Weidendorfer wrote:
> > What is the problem with gcov?
<snip>
> The question I'm really trying to answer is "which additional tests
> should I write?" so that means I need a view of the whole codebase, not
> single files, with ability to drill into the single files that have poor
> coverage.

I think that this is probably the most important question that any coverage tool can try to answer. Now, I don't think that the coverage tool can tell me the test - the realistic answer I'm hoping to see is a big gap where a class isn't being instantiated, or a routine is never called.

It would be cooler if there was some indication that I'm always testing something with the same value (say zero), but given most projects, having every routine tested at all is a 99.9% solution.

> It's important to *not* see the test framework itself, or libraries that
> aren't part of my codebase; in addition to seeing the entirety of my
> codebase, if that makes sense. i.e. it wouldn't be helpful if the tool
> said 1% coverage because I only used 1% of libc. ;-)

As a slightly different view, I'm interested in being able to selectively choose which dynamic libraries are considered for coverage, and that is where gcov comes apart. Most of my code is in .so plugins (in the tree), so I really care about those. I don't want to consider libqt-mt or libc though...

Michael: you blogged (http://michael.ellerman.id.au/index.cgi/2004/12/19#valgrind) about this stuff. What is happening?

Brad |
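Brad's wish - selectively choosing which shared objects count toward coverage - amounts to a per-object filter applied before instrumentation. A minimal sketch; the glob-style pattern semantics and the default exclusions (`libc`, `libqt-mt`) are chosen here purely for illustration:

```python
# Sketch: decide per shared object whether it should be instrumented,
# based on user-supplied include patterns plus some exclusions.
# Pattern handling here is hypothetical, not any real tool's option.
import fnmatch

def is_interesting(obj_path, include_patterns,
                   exclude_patterns=("*/libc*", "*/libqt-mt*")):
    """True if obj_path matches an include pattern and no exclude."""
    if any(fnmatch.fnmatch(obj_path, p) for p in exclude_patterns):
        return False
    return any(fnmatch.fnmatch(obj_path, p) for p in include_patterns)
```

In a valgrind tool, a predicate like this would be consulted when a new object is mapped in, so in-tree .so plugins get instrumented while system libraries are skipped entirely.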