From: Christian S. <sti...@tu...> - 2005-10-25 13:58:46
Hi developers,

some years ago there was a modification patch available that introduced an additional tool called "profgrind". This was, as I understood it, a slightly faster version of cachegrind, as it counted only the instructions but not the reads/writes.

Is this profgrind still available for 3.0.1 or at least 2.4.0? It cannot be found anywhere on the web or on your site.

I am quite interested in a tool like that because I frequently want to profile only the instructions but not the memory accesses, and any tool that has the smallest possible runtime overhead would be a big help. In case that tool is no longer available, would it be difficult to modify cachegrind to remove the memory profiling and create a comparable tool by myself? Or do you think it won't reduce the runtime overhead that much?

As an additional question, I was wondering whether it were possible to count only the floating-point arithmetic instructions separately. That would be a nice verification tool for the profiling of numerical algorithms, where the theory would predict things like "This numerical algorithm will take N*log(N) multiplications and N^2 additions", and if it were possible to verify these numbers in actual implementations then this would be very neat. Is it possible to integrate such different instruction counts in the existing cachegrind infrastructure? How difficult would this be for myself, and where (code region in cachegrind/cg_main.c) should I start to think about such things?

Thanks for your tool in any case! Your program really rocks, and my Windows colleagues are always envious when I tell them how easy debugging is on Linux.

Christian Stimming
|
From: Nicholas N. <nj...@cs...> - 2005-10-25 14:54:06
On Tue, 25 Oct 2005, Christian Stimming wrote:
> some years ago there was a modification patch available that introduced an
> additional tool called "profgrind". This was, as I understood it, a slightly
> faster version of cachegrind, as it counted only the instructions but not the
> reads/writes.
>
> Is this profgrind still available for 3.0.1 or at least 2.4.0? It cannot be
> found anywhere on the web or on your site.
I haven't heard of this tool. There have been a couple of attempts at
simpler coverage tools; Benoit Peccante wrote one a while back that he
posted to the list.
> I am quite interested in a tool like that because I frequently want to
> profile only the instructions but not the memory accesses, and any tool that
> has the smallest possible runtime overhead would be a big help. In case that
> tool is no longer available, would it be difficult to modify cachegrind to
> remove the memory profiling and create a comparable tool by myself? Or do
> you think it won't reduce the runtime overhead that much?
It's easy to change Cachegrind to do this. Look for the various "log"
functions, that look like this:
static VG_REGPARM(3)
void log_1I_1Dr_cache_access(InstrInfo* n, Addr data_addr, Word data_size)
{
   //VG_(printf)("1I_1Dr: CCaddr=0x%010lx, iaddr=0x%010lx, isize=%lu\n"
   //            "        daddr=0x%010lx, dsize=%lu\n",
   //            n, n->instr_addr, n->instr_len, data_addr, data_size);
   VGP_PUSHCC(VgpCacheSimulate);
   cachesim_I1_doref(n->instr_addr, n->instr_len,
                     &n->parent->Ir.m1, &n->parent->Ir.m2);
   n->parent->Ir.a++;
   cachesim_D1_doref(data_addr, data_size,
                     &n->parent->Dr.m1, &n->parent->Dr.m2);
   n->parent->Dr.a++;
   VGP_POPCC(VgpCacheSimulate);
}

static VG_REGPARM(1)
void log_1I_0D_cache_access(InstrInfo* n)
{
   //VG_(printf)("1I_0D : CCaddr=0x%010lx, iaddr=0x%010lx, isize=%lu\n",
   //            n, n->instr_addr, n->instr_len);
   VGP_PUSHCC(VgpCacheSimulate);
   cachesim_I1_doref(n->instr_addr, n->instr_len,
                     &n->parent->Ir.m1, &n->parent->Ir.m2);
   n->parent->Ir.a++;
   VGP_POPCC(VgpCacheSimulate);
}
Just remove all the cachesim_I1_doref() and cachesim_D1_doref() calls and
the "n->parent->Dr.a++" increments. Leave in the "n->parent->Ir.a++"
increments. The result will be something like this:
static VG_REGPARM(3)
void log_1I_1Dr_cache_access(InstrInfo* n, Addr data_addr, Word data_size)
{
   n->parent->Ir.a++;
}
It should run substantially faster. You might get divide-by-zero errors
at the end for the counters that are not being used, but they should be
easy to fix.
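
The divide-by-zero errors mentioned above would presumably come from the end-of-run summary dividing miss counts by access counters that now stay at zero. A minimal sketch of one way to guard such a division follows. The helper name safe_percent is hypothetical and is not a function in Cachegrind's source; where the guard belongs in cg_main.c is left as an assumption to check against the actual code.

```c
/* Hypothetical helper, not part of Cachegrind: returns misses/accesses
   as a percentage, or 0.0 when the access counter was never incremented
   (as happens for the data counters once their increments are removed). */
static double safe_percent(unsigned long long misses,
                           unsigned long long accesses)
{
   if (accesses == 0)
      return 0.0;   /* avoid dividing by an unused counter */
   return 100.0 * (double)misses / (double)accesses;
}
```

Any place that prints a ratio for the unused counters could go through a guard like this, or the corresponding output columns could simply be dropped.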
> As an additional question, I was wondering whether it were possible to count
> only the floating-point arithmetic instructions separately. That would be a nice
> verification tool for the profiling of numerical algorithms, where the theory
> would predict things like "This numerical algorithm will take N*log(N)
> multiplications and N^2 additions", and if it were possible to verify these
> numbers in actual implementations then this would be very neat. Is it
> possible to integrate such different instruction counts in the existing
> cachegrind infrastructure? How difficult would this be for myself, and where
> (code region in cachegrind/cg_main.c) should I start to think about such
> things?
It would not be a case of modifying Cachegrind so much as writing a
completely new tool. This is quite feasible, but takes some work. The
first thing you should do is think carefully how such a tool would work --
what analysis state (metadata) will it track? How will it use that
metadata? What instrumentation must be added? In my experience there is
a large gap between the initial idea of "a tool that does something like
X" and an actual working implementation. The devil is in the details.
Good luck :)
Nick
|
From: Julian S. <js...@ac...> - 2005-10-25 15:02:41
> > As an additional question, I was wondering whether it were possible to
> > count only the floating-point arithmetic instructions separately. That
> > would be a nice verification tool for the profiling of numerical
> > algorithms, where the theory would predict things like "This numerical
> > algorithm will take N*log(N) multiplications and N^2 additions", and if
> > it were possible to verify these numbers in actual implementations then
> > this would be very neat. Is it possible to integrate such different
> > instruction counts in the existing cachegrind infrastructure? How
> > difficult would this be for myself, and where (code region in
> > cachegrind/cg_main.c) should I start to think about such things?
>
> It would not be a case of modifying Cachegrind so much as writing a
> completely new tool. This is quite feasible, but takes some work. The
> first thing you should do is think carefully how such a tool would work --
> what analysis state (metadata) will it track? How will it use that
> metadata? What instrumentation must be added? In my experience there is
> a large gap between the initial idea of "a tool that does something like
> X" and an actual working implementation. The devil is in the details.

Lackey might be a good place to start, if you just want to have a single
count of fp adds / fp muls for the whole program. Basically you need to
enhance lk_instrument() to find IR statements of the form

   IRStmt_Tmp( ..., IRExpr_Binop(op, atom1, atom2))

where op is some interesting FP op. You are guaranteed that the args to
the op are atoms so you don't need to look recursively inside them. Also
you should handle IRExpr_Unop.

Reading VEX/pub/libvex_ir.h would be useful.

J
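
The pattern match described in this message can be sketched without the real VEX headers. Everything below is a stand-in for illustration: the Mock* types, the OP_* constants and count_fp_ops() are hypothetical and do not match the actual definitions in VEX/pub/libvex_ir.h. The sketch only shows the shape of the scan (look at each temporary assignment, test whether its right-hand side is a binop or unop with an interesting FP op, and never recurse because the operands are atoms). It counts static occurrences in one block; a real tool would instead add instrumentation at each match so that the count is updated every time the statement executes.

```c
#include <stddef.h>

/* Stand-in types for illustration only; NOT the VEX definitions. */
typedef enum { OP_ADD_F64, OP_MUL_F64, OP_ADD_I32, OP_NEG_F64 } MockOp;
typedef enum { EX_ATOM, EX_UNOP, EX_BINOP } MockExprTag;
typedef enum { ST_TMP, ST_OTHER } MockStmtTag;

typedef struct {
   MockExprTag tag;
   MockOp      op;    /* meaningful only for EX_UNOP and EX_BINOP */
} MockExpr;

typedef struct {
   MockStmtTag tag;
   MockExpr    rhs;   /* meaningful only for ST_TMP */
} MockStmt;

typedef struct {
   unsigned long fp_adds;
   unsigned long fp_muls;
   unsigned long fp_unops;
} FpCounts;

/* Scan a flat list of statements and tally the FP operations found on the
   right-hand side of temporary assignments. Operands are assumed to be
   atoms, so a single non-recursive pass is enough. */
static void count_fp_ops(const MockStmt *stmts, size_t n, FpCounts *out)
{
   for (size_t i = 0; i < n; i++) {
      const MockStmt *st = &stmts[i];
      if (st->tag != ST_TMP)
         continue;                       /* only "tmp = expr" statements */
      if (st->rhs.tag == EX_BINOP) {
         if (st->rhs.op == OP_ADD_F64)
            out->fp_adds++;
         else if (st->rhs.op == OP_MUL_F64)
            out->fp_muls++;
      } else if (st->rhs.tag == EX_UNOP) {
         if (st->rhs.op == OP_NEG_F64)
            out->fp_unops++;
      }
   }
}
```

The caller zero-initialises an FpCounts, passes the statement array and its length, and reads the three tallies afterwards.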
|
From: Christian S. <sti...@tu...> - 2005-10-25 15:21:19
Dear Nicholas,

wow, that is a quick reaction. Kudos for such an active list :-)

Nicholas Nethercote wrote:
>> some years ago there was a modification patch available that
>> introduced an additional tool called "profgrind". This was, as I
>> understood it, a slightly faster version of cachegrind, as it counted
>> only the instructions but not the reads/writes.
>
> I haven't heard of this tool. There have been a couple of attempts at
> simpler coverage tools; Benoit Peccante wrote one a while back that he
> posted to the list.

I found what I had seen before; it's in bugzilla:
http://bugs.kde.org/show_bug.cgi?id=95261

That tool was quite usable. The patch, however, is only for 2.2, and I've contacted the reporter (Paolo Bonzini), who said he didn't continue and/or didn't improve that patch after submitting it to bugzilla. In case I adapt it to 2.4 or 3.0 I might add a new attachment to that report, but I don't know yet whether I am able to finish that.

> It's easy to change Cachegrind to do this. Look for the various "log"
> functions, that look like this: (...)

Thanks for explaining this. Will look at it in detail.

>> As an additional question, I was wondering whether it were possible
>> to count only the floating-point arithmetic instructions separately.
>
> It would not be a case of modifying Cachegrind so much as writing a
> completely new tool. This is quite feasible, but takes some work. The
> first thing you should do is think carefully how such a tool would work
> -- what analysis state (metadata) will it track? How will it use that
> metadata? What instrumentation must be added? In my experience there
> is a large gap between the initial idea of "a tool that does something
> like X" and an actual working implementation. The devil is in the
> details. Good luck :)

Ok, thanks for this explanation. I'll come back to this when I have the time available.

Christian Stimming