From: John R.

Nigel Horne wrote:
> John Reiser wrote:
>> Let the developers/maintainers of memcheck work on better detection
>> and reporting of memory access errors. Don't distract them with
>> trivialities that are already solved elsewhere.
>
> Where are they already solved? I've had one strace suggestion that
> wasn't up to the job of valgrind. What solution can you suggest?

Paul Pluzhnikov's suggested strace [this list, 2006-08-13] reveals whether there are any close(not_open_fd) calls. If there are, then rerun the application under gdb [not strace] with a conditional breakpoint to detect calling close() with the particular not_open_fd; ask for a backtrace when it occurs. Even this two-pass process is faster and smaller than memcheck, because strace intervenes only at system calls, not at every instruction.

With a little more knowledge, you can use gdb directly in one pass. Examine the code for close(), and put a breakpoint in the errno return path, conditioned on EBADF.

If you're truly having trouble with close(not_open_fd), then you also may be unaware of all the filenames the application uses. See the "-e trace=file" option to strace, and prepare to be astounded at the sheer number, as well as the "duplicated" work that many applications perform.

--

From: Hynek S. <hs+...@ox...> - 2006-08-14 12:39:48

Hello Josef,

Josef Weidendorfer <Jos...@gm...> writes:
>> Also, Callgrind says
>>
>> ==14861== I refs:        1,206,122,472
>> ==14861== I1 misses:             3,955
>> ==14861== L2i misses:            2,648
>> ==14861== I1 miss rate:           0.0%
>> ==14861== L2i miss rate:          0.0%
> Hmm... instruction fetches most often hit in the cache. What about
> the data accesses?

I measured with more realistic data and got:

==22798== Events    : Ir Dr Dw I1mr D1mr D1mw I2mr D2mr D2mw
==22798== Collected : 754730 253655 132391 3789 4150 911 2589 2094 802
==22798==
==22798== I refs:        754,730
==22798== I1 misses:       3,789
==22798== L2i misses:      2,589
==22798== I1 miss rate:     0.50%
==22798== L2i miss rate:    0.34%
==22798==
==22798== D refs:        386,046  (253,655 rd + 132,391 wr)
==22798== D1 misses:       5,061  (  4,150 rd +     911 wr)
==22798== L2d misses:      2,896  (  2,094 rd +     802 wr)
==22798== D1 miss rate:      1.3% (     1.6%   +     0.6%  )
==22798== L2d miss rate:     0.7% (     0.8%   +     0.6%  )
==22798==
==22798== L2 refs:         8,850  (  7,939 rd +     911 wr)
==22798== L2 misses:       5,485  (  4,683 rd +     802 wr)
==22798== L2 miss rate:      0.4% (     0.4%   +     0.6%  )

Still okay, isn't it?

>> Looks like it. But as said, it's also problematic, because of moving
>> additional code.
> You can subtract the overhead of the inserted rdtsc instruction, as
> you know this overhead (you can measure it beforehand). BTW, you also
> should be able to read performance counters (... I am not really sure
> if rdmsr is allowed in user space ...).

Hm, I guess I'll hack some general .a together to make that easy. I used to use just a preprocessor macro, but it turned out pretty verbose.

>> So I guess a combination of rdtsc (exact times), oprofile
>> (approx. runtime distribution) and callgrind (caches + callgraphs) is
>> the way to go. That's also what I expected.
> Yes. That's also a big TODO item for KCachegrind: to combine
> measurement results of different tools to come up with something
> better. VTune is supposed to support this (callgraph from
> instrumentation mode, time from sampling).

Sounds like a really huge TODO.

>> Hm, what would speak against rdtsc-instrumentation?
> Why? If you are doing it yourself, you can avoid unneeded
> instrumentation, you can control the overhead, and even subtract it
> from the result.

Some kind of automation would be nice, I'd say.

-hs

From: Nigel H. <nj...@ba...> - 2006-08-14 10:33:15

John Reiser wrote:
> Beorn Johnson wrote:
>> I feel this idea (tracking the closing of un-opened fds) is very good.
>>
>> It should be extremely easy to implement. ...
>>
>> It would also be extremely useful. ...
>>
>> I am not enough of an strace/gdb expert ...
>>
>> ... Nigel Horne's idea is really very good (and so simple!), and
>> should be encouraged.
>>
>> - Beorn Johnson
>
> You have already spent more time composing your polemic
> than would be necessary to learn "strace ... | grep EBADF" and
> conditional breakpoints in gdb. These are a far better solution
> than adding the proposed feature to memcheck.
>
> Let the developers/maintainers of memcheck work on better detection
> and reporting of memory access errors. Don't distract them with
> trivialities that are already solved elsewhere.

Where are they already solved? I've had one strace suggestion that wasn't up to the job of valgrind. What solution can you suggest?

--
Nigel Horne. Arranger, Adjudicator, Band Trainer, Composer, Tutor, Typesetter.
NJH Music, Barnsley, UK. ICQ#20252325 nj...@ba... http://www.bandsman.co.uk

From: Josef W. <Jos...@gm...> - 2006-08-14 09:37:09

On Monday 14 August 2006 10:02, Hynek Schlawack wrote:
> I thought my computer is fast:
>
> caches:
> level  size    linesize   miss-latency        replace-time
> 1      64 KB   32 bytes    5.99 ns =  13 cy    8.37 ns =  18 cy
> 2     512 KB  128 bytes   80.71 ns = 178 cy   78.33 ns = 172 cy
>
> :/ So I have to replace the 10 and 100 with 13 and 178? I'd say this
> should also go into the docs, or have I just overlooked it?

Yes and yes. Usually "fast" is associated with high peak performance, which says nothing about main memory speed. That's even the problem of the Top500 list (and no processor vendor is interested in changing this). But 80 ns is not really too bad either. You should look at the latencies of 4/8-socket Opteron systems...

> There are bars, but only approx. one pixel wide.

That's fine.

> Also, Callgrind says
>
> ==14861== I refs:        1,206,122,472
> ==14861== I1 misses:             3,955
> ==14861== L2i misses:            2,648
> ==14861== I1 miss rate:           0.0%
> ==14861== L2i miss rate:          0.0%

Hmm... instruction fetches most often hit in the cache. What about the data accesses?

>>> So, any hints for thorough measuring? Callgrind was kind of my last
>>> hope...
>> What do you really want to see? Are you interested in exact time
>> measurement (min/max/average time from point A to point B in your
>> source)? Then your best bet is to put rdtsc calls yourself at these
>> points.
>
> Looks like it. But as said, it's also problematic, because of moving
> additional code.

You can subtract the overhead of the inserted rdtsc instruction, as you know this overhead (you can measure it beforehand). BTW, you also should be able to read performance counters (... I am not really sure if rdmsr is allowed in user space ...).

> So I guess a combination of rdtsc (exact times), oprofile
> (approx. runtime distribution) and callgrind (caches + callgraphs) is
> the way to go. That's also what I expected.

Yes. That's also a big TODO item for KCachegrind: to combine measurement results of different tools to come up with something better. VTune is supposed to support this (callgraph from instrumentation mode, time from sampling).

> Hm, what would speak against rdtsc-instrumentation?

Why against it? If you are doing it yourself, you can avoid unneeded instrumentation, you can control the overhead, and even subtract it from the result.

Josef

From: Hynek S. <hs+...@ox...> - 2006-08-14 08:03:03

Hello Josef,

Josef Weidendorfer <Jos...@gm...> writes:

Thanks again!

>>> should adjust the formula for your machine.
>> To be honest, I have problems seeing it because I have no comparison.
> You can use calibrator (see http://monetdb.cwi.nl/Calibrator).

I thought my computer is fast:

caches:
level  size    linesize   miss-latency        replace-time
1      64 KB   32 bytes    5.99 ns =  13 cy    8.37 ns =  18 cy
2     512 KB  128 bytes   80.71 ns = 178 cy   78.33 ns = 172 cy

:/ So I have to replace the 10 and 100 with 13 and 178? I'd say this should also go into the docs, or have I just overlooked it?

>> When do I have cache problems? All my functions have cache miss sums
>> for both L1 and L2 < 0.5...
> Good cache behavior would be a hit ratio > 97%, depending on the
> latencies of your system (slower main memory wants a higher hit
> ratio). Unfortunately, KCachegrind currently does not show ratios
> explicitly. When you select the "Cycle estimation" and look at the
> colored bars, the red part will show the fraction of the time
> estimation which comes from L2 misses. If there is no red part, you
> are fine.

There are bars, but only approx. one pixel wide. Also, Callgrind says

==14861== I refs:        1,206,122,472
==14861== I1 misses:             3,955
==14861== L2i misses:            2,648
==14861== I1 miss rate:           0.0%
==14861== L2i miss rate:          0.0%

So I guess I have no cache problems? :)

>> I know - I found that even minimal instrumentation (i.e. rdtsc) can
>> have a huge impact on the results.
> It always depends on how often the instrumentation is executed itself.

I had funny effects (subfunctions seeming to take longer than the whole run) even when it was called only once. But it's also problematic due to its nature as a network application.

>> I'm profiling _thin_ network layers over gigabit ethernet whose
>> latencies are measured in < 100 µs.
> And still, it is enough to measure user level only?

Yes, my code is purely in user space.

> Note that callgrind or GProf cannot give you latency spent in the
> kernel part of the network stack.

I know, only OProfile can.

>> OProfile's lowest granularity is 3,000 cycles; if I'm not mistaken,
>> I'd need 2,200 on my 2.2 GHz CPU to have 1,000,000 samples / second.
> I do not know about these limits.

They are documented inside `opcontrol -l'. If you want callgraphs, you even have to multiply it by 15 (that isn't documented; I stumbled over 45,000 and asked John Levon).

> But OProfile's interrupt handler probably takes around 500-1000 cycles
> (only my rough estimation).

So you're suggesting it wouldn't make sense anyway... that's true indeed.

>> So, any hints for thorough measuring? Callgrind was kind of my last
>> hope...
> What do you really want to see? Are you interested in exact time
> measurement (min/max/average time from point A to point B in your
> source)? Then your best bet is to put rdtsc calls yourself at these
> points.

Looks like it. But as said, it's also problematic, because of moving additional code.

> But if you are interested in what code is touched between point A and
> B, and what instructions in the code path are taking most of the time,
> a statistical approach should be enough. And you do not need a very
> high sample resolution, but you have to sample long enough for the
> counts to be statistically relevant. When you get 1 million sample
> points between point A and B, the time distribution should really be
> precise enough (depending on the amount of code that can be touched in
> code paths from A to B).

So I guess a combination of rdtsc (exact times), oprofile (approx. runtime distribution) and callgrind (caches + callgraphs) is the way to go. That's also what I expected.

>>> GProf is doing sampling too, but only with timers, and with the
>>> handler in user land. OProfile really should be more exact, as it
>>> does sample handling in kernel space with lower latency.
>> Ok, this is a shock for me now... I always thought that gprof doesn't
>> sample. :( Why does it instrument then? Just for the callgraph?
> Yes. The instrumentation increments counters for call arcs among
> functions only. There is no rdtsc or similar. GProf does sampling, and
> sampling itself can only give self costs. These costs are propagated
> up along the call graph to get an estimation of the inclusive costs.
> And for this to be possible, the call graph has to be exact - which
> needs the instrumentation in the first place.

Hm, what would speak against rdtsc-instrumentation? In the case of gprof it's clear (portability), but for people like me gprof is useless. I'm glad I didn't use it for serious stuff.

-hs

From: Nigel H. <nj...@ba...> - 2006-08-14 07:02:59

Behdad Esfahbod wrote:
> On Mon, 2006-08-14 at 10:46 +1000, Nicholas Nethercote wrote:
>> On Sun, 13 Aug 2006, Paul Pluzhnikov wrote:
>>> Nigel Horne wrote:
>>>> Since file descriptors are being tracked, it would be useful to
>>>> warn of closing files that aren't open, perhaps --warn-close-fds=yes
>>> You don't need something as advanced as VG for this.
>>> Simple "strace -etrace=close ./a.out 2>&1 | grep EBADF" will do.
>>>
>>> Note however, that it is somewhat common UNIX practice to
>>> close FDs that aren't open, and you are likely to encounter
>>> such code in various libraries (it's often easier to just
>>> close FD again, rather than keep track of whether it's
>>> currently open or not).
>> Yes, many programs do something not much more sophisticated than this:
>>
>> for (i = 0; i < BIG_NUMBER; i++)
>>     close(i);
>>
>> Nick
>
> That's kind of a dumb case of it, still popular. A more sophisticated
> and common use is to do something like this in daemons:
>
> close (0);
> close (1);
> close (2);

No one would force those programmers to use the --warn-close-fds=yes option.

-Nigel

From: Behdad E. <bes...@re...> - 2006-08-14 03:59:17

On Mon, 2006-08-14 at 10:46 +1000, Nicholas Nethercote wrote:
> On Sun, 13 Aug 2006, Paul Pluzhnikov wrote:
>> Nigel Horne wrote:
>>> Since file descriptors are being tracked, it would be useful to warn
>>> of closing files that aren't open, perhaps --warn-close-fds=yes
>> You don't need something as advanced as VG for this.
>> Simple "strace -etrace=close ./a.out 2>&1 | grep EBADF" will do.
>>
>> Note however, that it is somewhat common UNIX practice to
>> close FDs that aren't open, and you are likely to encounter
>> such code in various libraries (it's often easier to just
>> close FD again, rather than keep track of whether it's
>> currently open or not).
>
> Yes, many programs do something not much more sophisticated than this:
>
> for (i = 0; i < BIG_NUMBER; i++)
>     close(i);
>
> Nick

That's kind of a dumb case of it, still popular. A more sophisticated and common use is to do something like this in daemons:

close (0);
close (1);
close (2);

--
behdad
http://behdad.org/

From: Behdad E. <bes...@re...> - 2006-08-14 01:06:56

On Sat, 2006-08-12 at 10:43 +1000, Nicholas Nethercote wrote:
> As for the usefulness, I suspect this kind of thing is not a common
> source of bugs. Have you ever had a bug caused by this kind of
> operation?

I've seen them a few times, e.g. a pointer to string A being compared to string B. This is one such bug that I found in glib a couple of years ago:

http://bugzilla.gnome.org/show_bug.cgi?id=126640

--
behdad
http://behdad.org/

From: Nicholas N. <nj...@cs...> - 2006-08-14 00:54:29
On Sun, 13 Aug 2006, Paul Pluzhnikov wrote:
> Nigel Horne wrote:
>
>> Since file descriptors are being tracked, it would be useful to warn of
>> closing files that aren't open, perhaps --warn-close-fds=yes
>
> You don't need something as advanced as VG for this.
> Simple "strace -etrace=close ./a.out 2>&1 | grep EBADF" will do.
>
> Note however, that it is somewhat common UNIX practice to
> close FDs that aren't open, and you are likely to encounter
> such code in various libraries (it's often easier to just
> close FD again, rather than keep track of whether it's
> currently open or not).
Yes, many programs do something not much more sophisticated than this:
for (i = 0; i < BIG_NUMBER; i++)
    close(i);
Nick
From: Josef W. <Jos...@gm...> - 2006-08-13 23:20:25

On Sunday 13 August 2006 23:38, Hynek Schlawack wrote:
>> Cachegrind/Callgrind is good to see whether your code has cache
>> problems and potential for cache optimizations. Together with average
>> L1/L2 cache latencies, you can come up with a rough time estimation
>> which is often quite good (there is a derived cost type "Cycle
>> Estimation" provided with KCachegrind which defaults to 10/100 cycles
>> latency for L1/L2. You

Typo: the default of that formula is 10 cycles for an L1 miss (= L2 access), and 100 for an L2 miss (= main memory access) ...

>> should adjust the formula for your machine.
>
> To be honest, I have problems seeing it because I have no comparison.

You can use calibrator (see http://monetdb.cwi.nl/Calibrator).

> When do I have cache problems? All my functions have cache miss sums
> for both L1 and L2 < 0.5...

Good cache behavior would be a hit ratio > 97%, depending on the latencies of your system (slower main memory wants a higher hit ratio). Unfortunately, KCachegrind currently does not show ratios explicitly. When you select the "Cycle estimation" and look at the colored bars, the red part will show the fraction of the time estimation which comes from L2 misses. If there is no red part, you are fine.

>> BTW, gprof is doing source instrumentation, and depending on the
>> application, overhead can be near 100%. This also disturbs the
>> measurement itself.
>
> I know - I found that even minimal instrumentation (i.e. rdtsc) can
> have a huge impact on the results.

It always depends on how often the instrumentation itself is executed.

>> Why is OProfile's granularity too low for you? In contrast to GProf,
>> you can even adjust the sample interval there to tune the overhead.
>
> I'm profiling _thin_ network layers over gigabit ethernet whose
> latencies are measured in < 100 µs.

And still, it is enough to measure user level only? Note that callgrind or GProf cannot give you latency spent in the kernel part of the network stack.

> OProfile's lowest granularity is 3,000 cycles; if I'm not mistaken,
> I'd need 2,200 on my 2.2 GHz CPU to have 1,000,000 samples / second.

I do not know about these limits. But OProfile's interrupt handler probably takes around 500-1000 cycles (only my rough estimation).

> So, any hints for thorough measuring? Callgrind was kind of my last
> hope...

What do you really want to see? Are you interested in exact time measurement (min/max/average time from point A to point B in your source)? Then your best bet is to put rdtsc calls yourself at these points.

But if you are interested in what code is touched between point A and B, and which instructions in the code path are taking most of the time, a statistical approach should be enough. And you do not need a very high sample resolution, but you have to sample long enough for the counts to be statistically relevant. When you get 1 million sample points between point A and B, the time distribution should really be precise enough (depending on the amount of code that can be touched in code paths from A to B).

>> GProf is doing sampling too, but only with timers, and with the
>> handler in user land. OProfile really should be more exact, as it
>> does sample handling in kernel space with lower latency.
>
> Ok, this is a shock for me now... I always thought that gprof doesn't
> sample. :( Why does it instrument then? Just for the callgraph?

Yes. The instrumentation increments counters for call arcs among functions only. There is no rdtsc or similar. GProf does sampling, and sampling itself can only give self costs. These costs are propagated up along the call graph to get an estimation of the inclusive costs. And for this to be possible, the call graph has to be exact - which needs the instrumentation in the first place.

>> This is becoming a FAQ. I will try to come up with something for the
>> Callgrind manual.
>
> I'm sorry. :( I guess I had the wrong search terms for Gmane.

No problem. The callgrind manual needs to be improved.

Josef

From: Hynek S. <hs+...@ox...> - 2006-08-13 21:38:57

Hello Josef,

Josef Weidendorfer <Jos...@gm...> writes:

First of all, many thanks for your detailed answer!

>> I'm profiling software using Callgrind and as I have some rather
>> strange results, I'd like to ask some questions to go for sure... I
>> understand that callgrind counts "instructions". Does it mean that it
>> simply adds up assembler instructions and uses that as the costs?
> Yes. "Instructions fetched" is one cost type provided by
> Cachegrind/Callgrind. You should not confuse this with any time cost.

I don't. :)

>> I ask because it seems like a strange metric to me, because some
>> instructions take longer than others.
> Probably. However, a time estimation using instruction latencies as
> factors is not better either. Relevant is the throughput of a given
> instruction stream, not single latencies. To estimate that, you need
> to simulate a CPU pipeline; and to match some reality, you need to
> know the branch prediction algorithm, the superscalar configuration of
> your processor and so on (which is not really documented, BTW). Even
> with these parameters, a simulator probably would be way too slow to
> be practical.

I don't doubt that; sorry if it sounded otherwise. In fact that was the thought behind my question - that it's impossible to compute time from instruction counts (on today's CPUs).

> Cachegrind/Callgrind is good to see whether your code has cache
> problems and potential for cache optimizations. Together with average
> L1/L2 cache latencies, you can come up with a rough time estimation
> which is often quite good (there is a derived cost type "Cycle
> Estimation" provided with KCachegrind which defaults to 10/100 cycles
> latency for L1/L2. You should adjust the formula for your machine.)

To be honest, I have problems seeing it because I have no comparison. When do I have cache problems? All my functions have cache miss sums for both L1 and L2 < 0.5...

> However, if you do not have cache problems, there probably is no good
> way to estimate the time by using the "instruction fetch" cost given
> by Cachegrind/Callgrind.

Ok. The docs at Valgrind sounded pretty general-purpose (`Callgrind: a heavyweight profiler'), so I've been trying to use it generally. :) The callgraph functions are excellent though.

>> This would explain why pthread_mutex_lock() seems to hog the most
>> costs and memcpy() (which gprof identified as the major hog) is only
>> negligible.
> This sounds like you only look at the instruction fetches even though
> you have a lot of L2 misses. Can it be that you run with the cache
> simulator switched off (which is the default with Callgrind)?
> Use "--simulate-cache=yes" and look at the cycle estimation cost (in
> KCachegrind).

I tried both, but, to my shame, I overlooked the "Cycle Estimation".

> BTW, gprof is doing source instrumentation, and depending on the
> application, overhead can be near 100%. This also disturbs the
> measurement itself.

I know - I found that even minimal instrumentation (i.e. rdtsc) can have a huge impact on the results.

> Why is OProfile's granularity too low for you? In contrast to GProf,
> you can even adjust the sample interval there to tune the overhead.

I'm profiling _thin_ network layers over gigabit ethernet whose latencies are measured in < 100 µs. OProfile's lowest granularity is 3,000 cycles; if I'm not mistaken, I'd need 2,200 on my 2.2 GHz CPU to have 1,000,000 samples / second.

So, any hints for thorough measuring? Callgrind was kind of my last hope...

> GProf is doing sampling too, but only with timers, and with the
> handler in user land. OProfile really should be more exact, as it does
> sample handling in kernel space with lower latency.

Ok, this is a shock for me now... I always thought that gprof doesn't sample. :( Why does it instrument then? Just for the callgraph? Got to look at its internals, I guess. Or just throw it in the trashcan.

>> I couldn't find anything about this in the Valgrind manual or on the
>> KCachegrind pages, so I hope that someone here can help me...
> This is becoming a FAQ. I will try to come up with something for the
> Callgrind manual.

I'm sorry. :( I guess I had the wrong search terms for Gmane.

-hs

From: John R.

Beorn Johnson wrote:
> I feel this idea (tracking the closing of un-opened fds) is very good.
>
> It should be extremely easy to implement. ...
>
> It would also be extremely useful. ...
>
> I am not enough of an strace/gdb expert ...
>
> ... Nigel Horne's idea is really very good (and so simple!), and
> should be encouraged.
>
> - Beorn Johnson

You have already spent more time composing your polemic than would be necessary to learn "strace ... | grep EBADF" and conditional breakpoints in gdb. These are a far better solution than adding the proposed feature to memcheck.

Let the developers/maintainers of memcheck work on better detection and reporting of memory access errors. Don't distract them with trivialities that are already solved elsewhere.

--

From: Beorn J. <beo...@ya...> - 2006-08-13 19:53:52

I feel this idea (tracking the closing of un-opened fds) is very good.

It should be extremely easy to implement. In fact, it would be a great starter/warm-up project for someone who wants to learn valgrind internals. Actually, I'm surprised I haven't seen a patch on the mailing list already; it should be what, about five lines? I'm perhaps more surprised that it isn't in there already.

It would also be extremely useful. Closing fds not known to be open may be fine in a single-threaded environment, but in a multi-threaded environment it is an absolute disaster. I am not enough of an strace/gdb expert to be able to easily track this sort of thing down in under half a day in a complex multi-threaded program. It would be a matter of seconds with valgrind's help. (Oh, and throw in user-level threading to make things even more complex.) I do have some experience with multi-threaded servers, and in fact this very problem of closing previously closed file descriptors was one of those ghost bugs that haunted us for a while.

Nigel Horne's idea is really very good (and so simple!), and should be encouraged.

- Beorn Johnson

From: Josef W. <Jos...@gm...> - 2006-08-13 19:52:06

On Sunday 13 August 2006 19:18, Hynek Schlawack wrote:
> Hi,
>
> I'm profiling software using Callgrind and as I have some rather
> strange results, I'd like to ask some questions to go for sure...
>
> I understand that callgrind counts "instructions". Does it mean that
> it simply adds up assembler instructions and uses that as the costs?

Yes. "Instructions fetched" is one cost type provided by Cachegrind/Callgrind. You should not confuse this with any time cost.

> I ask because it seems like a strange metric to me, because some
> instructions take longer than others.

Probably. However, a time estimation using instruction latencies as factors is not better either. Relevant is the throughput of a given instruction stream, not single latencies. To estimate that, you need to simulate a CPU pipeline; and to match some reality, you need to know the branch prediction algorithm, the superscalar configuration of your processor and so on (which is not really documented, BTW). Even with these parameters, a simulator probably would be way too slow to be practical.

Cachegrind/Callgrind is good to see whether your code has cache problems and potential for cache optimizations. Together with average L1/L2 cache latencies, you can come up with a rough time estimation which is often quite good (there is a derived cost type "Cycle Estimation" provided with KCachegrind which defaults to 10/100 cycles latency for L1/L2. You should adjust the formula for your machine.)

However, if you do not have cache problems, there probably is no good way to estimate the time by using the "instruction fetch" cost given by Cachegrind/Callgrind.

> This would explain why pthread_mutex_lock() seems to hog the most
> costs and memcpy() (which gprof identified as the major hog) is only
> negligible.

This sounds like you only look at the instruction fetches even though you have a lot of L2 misses. Can it be that you run with the cache simulator switched off (which is the default with Callgrind)? Use "--simulate-cache=yes" and look at the cycle estimation cost (in KCachegrind).

> Or am I missing something here? I'd really like to understand why
> Callgrind's and gprof's (and OProfile's, btw, but that's not useful
> for me as its granularity is too low) results differ.

I hope the above helps you understand your case. BTW, gprof is doing source instrumentation, and depending on the application, overhead can be near 100%. This also disturbs the measurement itself. Pure sampling (like oprofile) is more exact.

Why is OProfile's granularity too low for you? In contrast to GProf, you can even adjust the sample interval there to tune the overhead.

GProf is doing sampling too, but only with timers, and with the handler in user land. OProfile really should be more exact, as it does sample handling in kernel space with lower latency. Of course, pure sampling cannot give you a full call graph (you can make OProfile come up with some extracts of the real call graph by doing a stack backtrace at every sample point).

> I couldn't find anything about this in the Valgrind manual or on the
> KCachegrind pages, so I hope that someone here can help me...

This is becoming a FAQ. I will try to come up with something for the Callgrind manual.

Josef

> TIA,
> -hs
>
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users

From: Nigel H. <nj...@ba...> - 2006-08-13 19:24:09
|
Paul Pluzhnikov wrote:
> Nigel Horne wrote:
>> No it won't do, since it doesn't tell you where the errant close is;
>> all you get is:
>>
>> close(15) = -1 EBADF (Bad file descriptor)
>
> Once you know there is a "bad close", it is trivial to find where
> it comes from (just use 'gdb').
>
> VG gives you a huge advantage for bugs that are *hard* to find.
> The one in your example isn't (just MHO).

Of course; that's what an example is for: to explain the point, not to show buggy code.
|
|
From: Paul P. <ppl...@gm...> - 2006-08-13 17:50:03
|
Nigel Horne wrote:
> No it won't do, since it doesn't tell you where the errant close is;
> all you get is:
>
> close(15) = -1 EBADF (Bad file descriptor)

Once you know there is a "bad close", it is trivial to find where it comes from (just use 'gdb').

VG gives you a huge advantage for bugs that are *hard* to find. The one in your example isn't (just MHO).

Cheers,
|
|
From: Nigel H. <nj...@ba...> - 2006-08-13 17:43:20
|
Paul Pluzhnikov wrote:
> Nigel Horne wrote:
>> Since file descriptors are being tracked, it would be useful to warn of
>> closing files that aren't open, perhaps --warn-close-fds=yes
>
> You don't need something as advanced as VG for this.
> Simple "strace -etrace=close ./a.out 2>&1 | grep EBADF" will do.

No it won't do, since it doesn't tell you where the errant close is; all you get is:

close(15) = -1 EBADF (Bad file descriptor)

-Nigel
|
|
From: Paul P. <ppl...@gm...> - 2006-08-13 17:40:26
|
Nigel Horne wrote:
> Since file descriptors are being tracked, it would be useful to warn of
> closing files that aren't open, perhaps --warn-close-fds=yes

You don't need something as advanced as VG for this. A simple "strace -etrace=close ./a.out 2>&1 | grep EBADF" will do.

Note, however, that it is fairly common UNIX practice to close FDs that aren't open, and you are likely to encounter such code in various libraries (it's often easier to just close an FD again than to keep track of whether it's currently open or not).

Cheers,
|
|
From: Hynek S. <hs+...@ox...> - 2006-08-13 17:18:37
|
Hi,

I'm profiling software using Callgrind, and as I have some rather strange results, I'd like to ask some questions to be sure...

I understand that Callgrind counts "instructions". Does that mean it simply adds up assembler instructions and uses that as the cost? I ask because it seems like a strange metric to me, because some instructions take longer than others. This would explain why pthread_mutex_lock() seems to hog the most costs while memcpy() (which gprof identified as the major hog) is only negligible.

Or am I missing something here? I'd really like to understand why Callgrind's and gprof's (and OProfile's, by the way, but that's not useful for me as its granularity is too low) results differ.

I couldn't find anything about this in either the Valgrind manual or on the KCachegrind pages, so I hope that someone here can help me...

TIA,
-hs
|
|
From: Nigel H. <nj...@ba...> - 2006-08-13 17:07:54
|
1. create this file:
#include <unistd.h>

int main(void)
{
    close(15);
    return 0;
}
2. cc -g foo.c
3. valgrind --tool=memcheck --num-callers=20 --leak-check=yes
--track-fds=yes ./a.out
Since file descriptors are being tracked, it would be useful to warn of
closing files that aren't open, perhaps --warn-close-fds=yes
-Nigel
|
|
From: Nicholas N. <nj...@cs...> - 2006-08-12 00:43:17
|
On Sat, 12 Aug 2006 dom...@fr... wrote:
> Valgrind currently does not check for pointer arithmetic
> or pointer comparisons on unrelated pointers. I am not
> sure whether it would be possible to detect such bugs
> with valgrind.
>
> [...]
>
> -> would it be possible for valgrind to detect such bugs?

I suspect this would be both difficult to do and not that useful.

First, the difficulty. The hard part is working out what is a pointer and what is not. It's easy with pointers to heap blocks, because they're returned by malloc(). Addresses of stack and static variables are much harder to find; basically you have to rely on debug info to find them. Another difficulty is that programs often use pointer differencing that is undefined according to the C standard but almost always works. The classic example is something like memcpy or memcmp: you find the difference between block1 and block2, then increment a block1 pointer one byte at a time, and you can reach the corresponding byte in block2 by adding the difference to the block1 pointer. This works on any typical machine. There may be other examples; in my experience, real programs often do far stranger things than you might expect. Using the C spec as a guide for finding bugs is questionable anyway, since Valgrind really operates at the binary level, where programs may not have been written in C.

As for the usefulness, I suspect this kind of thing is not a common source of bugs. Have you ever had a bug caused by this kind of operation?

You might be interested in reading "Bounds-Checking Entire Programs Without Recompiling", a paper at http://www.valgrind.org/docs/pubs.html. It talks about a tool that does bounds-checking, and covers in more detail some issues related to this idea, such as finding pointers. One basic conclusion of that paper, although it doesn't say so in as many words, is that doing things that rely on pointers via binary instrumentation is pretty difficult, and would be better done via source-level instrumentation.

Nick
|
|
From: <dom...@fr...> - 2006-08-11 22:37:29
|
Hi,
Valgrind is great. Thanks for writing such a useful tool.
Valgrind currently does not check for pointer arithmetic
or pointer comparisons on unrelated pointers. I am not
sure whether it would be possible to detect such bugs
with valgrind.
Here is a short example to illustrate:
======================================================================
$ cat unrelated-ptr.c
/*
 * Test case to illustrate possible bugs: pointer arithmetics
 * or pointer comparisons on unrelated pointer (-> undefined
 * result)
 *
 * Valgrind/memcheck currently does not detect such bug.
 */
#include <malloc.h>
#include <stdio.h>

int main()
{
    int cmp1, cmp2;
    int diff1, diff2;
    char *buf1 = malloc(10);
    char *buf2 = malloc(10);
    const char *str = "hello";

    /* Bug: comparing unrelated pointer (-> undefined result) */
    cmp1 = (buf1 > str);   /* compare pointer in heap & in text/bss section */
    cmp2 = (buf1 > buf2);  /* compare 2 different malloc blocks -> undefined */

    /* Bug: pointer arithmetics on unrelated pointer (-> undefined result) */
    diff1 = buf1 - str;
    diff2 = buf1 - buf2;

    fprintf(stderr, "cmp1=%d cmp2=%d diff1=%d diff2=%d\n",
            cmp1, cmp2, diff1, diff2);
    free(buf1); free(buf2);
    return 0;
}
$ gcc -Wall unrelated-ptr.c
$ valgrind --tool=memcheck ./a.out
==5860== Memcheck, a memory error detector.
==5860== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==5860== Using LibVEX rev 1471, a library for dynamic binary translation.
==5860== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==5860== Using valgrind-3.1.0-Debian, a dynamic binary instrumentation framework.
==5860== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==5860== For more details, rerun with: -v
==5860==
cmp1=0 cmp2=0 diff1=-65971588 diff2=-64
==5860==
==5860== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 11 from 1)
==5860== malloc/free: in use at exit: 0 bytes in 0 blocks.
==5860== malloc/free: 2 allocs, 2 frees, 20 bytes allocated.
==5860== For counts of detected errors, rerun with: -v
==5860== No malloc'd blocks -- no leaks are possible.
======================================================================
The values it prints for cmp1, cmp2, diff1 and diff2 depend
on how the compiler & OS organise memory.
Notice that neither valgrind nor 'gcc -Wall' detects any bug.
-> would it be possible for valgrind to detect such bugs?
Cheers
/Dominique
|
|
From: <anu...@ce...> - 2006-08-11 11:04:37
|
Hi All,

I am trying to profile a GTK application. I'm running Valgrind in Scratchbox for i386. I've installed all the packages Valgrind depends on, like KCachegrind, in Scratchbox. However, when I run with the following command:

valgrind --tool=massif --depth=100 ./hello

the massif.pid.txt is created, but the massif.pid.ps isn't. What might I be doing incorrectly?
|
|
From: Nicholas N. <nj...@cs...> - 2006-08-11 00:14:56
|
On Thu, 10 Aug 2006, Antti Tuomi wrote:
> For some reason, when I run valgrind on a certain (normal) user account
> I get this feedback from the example program at
> http://valgrind.org/docs/manual/quick-start.html#quick-start.interpret
>
> The program has one modification: I'm trying to write to x[11].
>
> ==19046== Invalid write of size 4
> ==19046==    at 0x804839F: ???
> ==19046==    by 0x80483BB: ???
> ==19046==    by 0x1B938E35: __libc_start_main (in /lib/libc-2.3.2.so)
> ==19046==    by 0x80482E0: ???
> ==19046== Address 0x1BA5B054 is 4 bytes after a block of size 40 alloc'd
> ==19046==    at 0x1B90459D: malloc (vg_replace_malloc.c:130)
> ==19046==    by 0x8048395: ???
> ==19046==    by 0x80483BB: ???
> ==19046==    by 0x1B938E35: __libc_start_main (in /lib/libc-2.3.2.so)
> ==19046==    by 0x80482E0: ???
> [...]
>
> Valgrind version is 2.4.0.

No idea why; there's nothing that should cause that. You might like to try a more recent version, such as 3.2.0.

Nick
|