From: Jeremy F. <je...@go...> - 2002-10-19 01:50:01
|
On Fri, 2002-10-18 at 17:52, H. J. Lu wrote:
> > OK, but valgrind.so is already being linked with "-z initfirst"; what
> > happens if there are two .so files with initfirst? (It does seem to
> > work).
>
> Whichever comes first wins
So if the order of events is:
1. run executable A, with LD_PRELOAD=valgrind.so (which has initfirst set)
2. A uses function foo() which is defined in libc.so
3. A doesn't use libpthread, but after a while it dlopens libB.so, which does
4. libB.so pulls in Valgrind's libpthread.so.
5. Valgrind's libpthread.so wants to override foo(), now that this has become a multithreaded program.
If A uses foo() again, which definition will it get? libc's original one (which it was using before), or the new definition in libpthread.so?
I wonder if the solution is to always pull in Valgrind's libpthread.so, but only make it do special stuff once there's more than one thread...
Thanks,
J |
|
From: Jeremy F. <je...@go...> - 2002-10-19 00:56:19
|
New patches for today, fresh from http://www.goop.org/~jeremy/valgrind/:
14-hg-tid
HELGRIND: This fixes a bug in Helgrind in which all memory access by syscalls was being treated as if it were happening in thread 1. This is because the eraser_mem_read/write functions were using get_current_tid_1_if_root() to get the current tid. Unfortunately, during syscalls there is no current thread, so it was getting 1_if_root. This patch fixes this by using what thread ID information we're given, and only using get_current_tid() if we're recording a memory access performed by code (rather than by a syscall).
15-hg-datasym
HELGRIND: In conjunction with patch 13-data-syms, print symbolic information for addresses in error messages (if possible).
16-ld-nodelete
Add -Wl,-z,nodelete,-z,initfirst to link line for libpthread.so, because HJ says so. Also add soname.
J |
|
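The thread-ID fix described for 14-hg-tid can be sketched roughly as follows. This is only an illustration of the idea, not the actual patch: every `demo_` name, the access-kind enum, and the "0 means no running thread" convention are assumptions made for the example.

```c
/* Hypothetical sketch of the 14-hg-tid idea; none of these names are
 * Valgrind's real identifiers. */
typedef unsigned int demo_tid_t;

#define DEMO_INVALID_TID 0u   /* assumption: 0 means "no thread is running" */

/* Thread currently executing client code, or DEMO_INVALID_TID in a syscall. */
demo_tid_t demo_running_tid = DEMO_INVALID_TID;

demo_tid_t demo_get_current_tid(void)
{
    return demo_running_tid;
}

/* The problematic lookup: with no current thread it silently answers 1. */
demo_tid_t demo_get_current_tid_1_if_root(void)
{
    return demo_running_tid == DEMO_INVALID_TID ? 1u : demo_running_tid;
}

typedef enum { DEMO_ACCESS_BY_CODE, DEMO_ACCESS_BY_SYSCALL } demo_access_kind_t;

/* The fix: trust the thread ID supplied with the event, and only ask for
 * the current thread when the access was performed by client code. */
demo_tid_t demo_access_tid(demo_access_kind_t kind, demo_tid_t supplied_tid)
{
    if (kind == DEMO_ACCESS_BY_CODE)
        return demo_get_current_tid();
    return supplied_tid;
}
```

With no running thread, the old lookup attributes a syscall's memory access to thread 1, while `demo_access_tid(DEMO_ACCESS_BY_SYSCALL, 3)` keeps it with thread 3.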
From: H. J. L. <hj...@lu...> - 2002-10-19 00:52:39
|
On Fri, Oct 18, 2002 at 05:26:21PM -0700, Jeremy Fitzhardinge wrote:
> On Fri, 2002-10-18 at 16:58, H. J. Lu wrote:
> > 1. There are supposed to be no differences between weak and strong
> > symbols in DSOs. I submitted a patch to glibc:
> >
> > http://sources.redhat.com/ml/libc-alpha/2001-09/msg00109.html
>
> It looks from that thread that the patch wasn't applied to 2.2. Does
> that mean it still needs to be applied, or has it been applied since? I
> really don't understand the issues here; can you explain, or is there a
I am pushing for it again.
> reference? In particular, what's DT_FILTER? Is it a mechanism for
> interposing symbols, or is it something else?
>
See http://docs.sun.com/db/doc/806-0641
As I understand it, and I could be wrong, there may still be some problems with DT_FILTER and DT_AUXILIARY in glibc.
> > 2. Glibc will make sure libpthread.so will override libc.so, weak
> > or strong. Please file a bug if it doesn't do so. But please make
> > sure your libpthread has:
> >
> > # readelf -d /lib/libpthread.so.0
> > ...
> > 0x6ffffffb (FLAGS_1) Flags: NODELETE INITFIRST
> > ...
> >
> > by passing "-z nodelete -z initfirst" to ld.
>
> OK, but valgrind.so is already being linked with "-z initfirst"; what
> happens if there are two .so files with initfirst? (It does seem to
> work).
Whichever comes first wins
H.J. |
|
From: Jeremy F. <je...@go...> - 2002-10-19 00:26:20
|
On Fri, 2002-10-18 at 16:58, H. J. Lu wrote:
> 1. There are supposed to be no differences between weak and strong
> symbols in DSOs. I submitted a patch to glibc:
>
> http://sources.redhat.com/ml/libc-alpha/2001-09/msg00109.html
It looks from that thread that the patch wasn't applied to 2.2. Does that mean it still needs to be applied, or has it been applied since? I really don't understand the issues here; can you explain, or is there a reference? In particular, what's DT_FILTER? Is it a mechanism for interposing symbols, or is it something else?
> 2. Glibc will make sure libpthread.so will override libc.so, weak
> or strong. Please file a bug if it doesn't do so. But please make
> sure your libpthread has:
>
> # readelf -d /lib/libpthread.so.0
> ...
> 0x6ffffffb (FLAGS_1) Flags: NODELETE INITFIRST
> ...
>
> by passing "-z nodelete -z initfirst" to ld.
OK, but valgrind.so is already being linked with "-z initfirst"; what happens if there are two .so files with initfirst? (It does seem to work).
Thanks,
J |
|
From: H. J. L. <hj...@lu...> - 2002-10-18 23:58:17
|
On Fri, Oct 18, 2002 at 04:32:53PM -0700, Jeremy Fitzhardinge wrote:
> Hi HJ,
>
> Oops. I guess lucon.com is dead to you. Let's see if this works.
Is lucon.com available? I may switch to that. In the meantime, please email me at hj...@lu... :-). Jeremy, drop me a line if you want to make autofs 4.0.
Content-Description: Forwarded message - Overriding glibc functions
> Subject: Overriding glibc functions
> From: Jeremy Fitzhardinge <je...@go...>
> To: hj...@lu...
> X-Mailer: Ximian Evolution 1.0.8
> Date: 18 Oct 2002 16:12:56 -0700
>
> On Fri, 2002-10-18 at 13:53, Julian Seward wrote:
> > sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libc-2.2.4.so | grep msgsnd
> > 000e76c8 T msgsnd
> > sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libc-2.2.4.so | grep msgrcv
> > 000e771c T msgrcv
> >
> > sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libpthread-0.9.so | grep msgsnd
> > sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libpthread-0.9.so | grep msgrcv
> >
> > IOW, msgsnd/msgrcv are exported as strong text symbols from libc, and not
> > at all from libpthread. Your patch adds them to valgrind's libpthread.so.
> > So any threaded program running on V will have two definitions of
> > them to choose from, and it is unclear which it will get. If the libc
> > version was defined as a weak symbol, then V's implementation would
> > definitely override it (and many libc things are indeed exported weakly)
> > but that is not the case.
> >
> > Does that make sense? Is there some other behaviour of the dynamic
> > linker which might save the day and avoid this issue? I don't know much
> > about ld.so; perhaps you understand more about all this?
>
> Hm, no, but I know who does.
>
> Hey, HJ:
>
> We have the problem that Valgrind's implementation of libpthread.so
> needs to intercept various libc functions (in this instance
> msgsnd/msgrcv) so it can make sure they only block the thread rather
> than the whole process (since Valgrind completely re-implements the
> thread library without using kernel threads). The question is which
> definition will be picked up when glibc's function definition is not a
> weak reference? How can we make sure that our libpthread version is
> preferred over the glibc version?
>
> One thing occurs to me: the main Valgrind shared object, valgrind.so, is
> always LD_PRELOADED, and is linked with -Wl,-z -Wl,initfirst. I wonder
> if the solution is to put the intercept stub functions in valgrind.so,
> and have them call either glibc or libpthread, depending on the context.
>
There are 2 things you should know:
1. There are supposed to be no differences between weak and strong symbols in DSOs. I submitted a patch to glibc:
http://sources.redhat.com/ml/libc-alpha/2001-09/msg00109.html
2. Glibc will make sure libpthread.so will override libc.so, weak or strong. Please file a bug if it doesn't do so. But please make sure your libpthread has:
# readelf -d /lib/libpthread.so.0
...
0x6ffffffb (FLAGS_1) Flags: NODELETE INITFIRST
...
by passing "-z nodelete -z initfirst" to ld.
H.J. |
|
From: Jeremy F. <je...@go...> - 2002-10-18 23:34:52
|
Here's a small fix for helgrind as it currently stands in CVS head. I've got a number of other changes to improve reporting, but they're still too messy. This is the condensed bugfix patch. J |
|
From: Jeremy F. <je...@go...> - 2002-10-18 23:12:58
|
On Fri, 2002-10-18 at 13:53, Julian Seward wrote:
> sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libc-2.2.4.so | grep msgsnd
> 000e76c8 T msgsnd
> sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libc-2.2.4.so | grep msgrcv
> 000e771c T msgrcv
>
> sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libpthread-0.9.so | grep msgsnd
> sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libpthread-0.9.so | grep msgrcv
>
> IOW, msgsnd/msgrcv are exported as strong text symbols from libc, and not
> at all from libpthread. Your patch adds them to valgrind's libpthread.so.
> So any threaded program running on V will have two definitions of
> them to choose from, and it is unclear which it will get. If the libc
> version was defined as a weak symbol, then V's implementation would
> definitely override it (and many libc things are indeed exported weakly)
> but that is not the case.
>
> Does that make sense? Is there some other behaviour of the dynamic
> linker which might save the day and avoid this issue? I don't know much
> about ld.so; perhaps you understand more about all this?
Hm, no, but I know who does.
Hey, HJ:
We have the problem that Valgrind's implementation of libpthread.so needs to intercept various libc functions (in this instance msgsnd/msgrcv) so it can make sure they only block the thread rather than the whole process (since Valgrind completely re-implements the thread library without using kernel threads). The question is which definition will be picked up when glibc's function definition is not a weak reference? How can we make sure that our libpthread version is preferred over the glibc version?
One thing occurs to me: the main Valgrind shared object, valgrind.so, is always LD_PRELOADED, and is linked with -Wl,-z -Wl,initfirst. I wonder if the solution is to put the intercept stub functions in valgrind.so, and have them call either glibc or libpthread, depending on the context.
Thanks,
J |
|
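The stub idea floated in the message above (one intercept function that forwards to either the glibc or the libpthread implementation) might look something like this. It is a sketch under stated assumptions: the "context" is taken to be the number of threads, the two target functions are stand-ins that only report which one ran, and none of the `demo_` names exist in Valgrind or glibc.

```c
/* Hypothetical dispatch stub; the real msgsnd signature and the real
 * libraries are deliberately not used here. */
typedef long (*demo_msg_fn)(int qid, const void *buf, unsigned long len);

/* Stand-in for glibc's blocking version; returns 1 so callers can tell. */
long demo_libc_msgsnd(int qid, const void *buf, unsigned long len)
{
    (void)qid; (void)buf; (void)len;
    return 1;
}

/* Stand-in for the thread-aware version that blocks only one thread. */
long demo_threaded_msgsnd(int qid, const void *buf, unsigned long len)
{
    (void)qid; (void)buf; (void)len;
    return 2;
}

/* Assumption: the stub can see how many threads exist. */
int demo_thread_count = 1;

/* The stub lives in the always-preloaded object and picks a target at
 * call time, so it does not matter which definition the linker bound. */
long demo_msgsnd_stub(int qid, const void *buf, unsigned long len)
{
    demo_msg_fn target = (demo_thread_count > 1) ? demo_threaded_msgsnd
                                                 : demo_libc_msgsnd;
    return target(qid, buf, len);
}
```

Because the choice is made on every call, a program that becomes multithreaded later (for example after a dlopen) starts using the thread-aware path without any symbol being rebound.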
From: Julian S. <js...@ac...> - 2002-10-18 20:46:22
|
> I already put 03-poll-select and 04-lax-ioctls into 1.0.4 and will > merge the result to the head shortly. I considered 02-sysv-msg but > concluded it too risky for the stable branch > > Why? Clearly nobody is using SYSV messaging in threaded programs > because it is currently completely broken. That patch makes it much > better (and might even be problem-free). Ah, I remember now. This is the situation on my R H 7.2 box; YMMV. sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libc-2.2.4.so | grep msgsnd 000e76c8 T msgsnd sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libc-2.2.4.so | grep msgrcv 000e771c T msgrcv sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libpthread-0.9.so | grep msgsnd sewardj@phoenix:~/VgHEAD/valgrind$ nm /lib/libpthread-0.9.so | grep msgrcv IOW, msgsnd/msgrcv are exported as strong text symbols from libc, and not at all from libpthread. Your patch adds them to valgrind's libpthread.so. So any threaded program running on V will have two definitions of them to choose from, and it is unclear which it will get. If the libc version was defined as a weak symbol, then V's implementation would definitely overrride it (and many libc things are indeed exported weakly) but that is not the case. Does that make sense? Is there some other behaviour of the dynamic linker which might save the day and avoid this issue? I don't know much about ld.so; perhaps you understand more about all this? > My current plans for it are: > - beef up reporting (starting by stealing chunks of memcheck) > - suppress duplicate errors > - investigate the state of the stack on thread startup > - make the memory granularity a compile-time setting for experiments > - implement the thread lifetime segment stuff (which may help with the > thread stack state problem; not sure yet) > > I hope to make some substantial progress in the next week or so. That's great. You have my encouragement :) J |
|
From: Jeremy F. <je...@go...> - 2002-10-18 18:15:34
|
On Fri, 2002-10-18 at 02:13, Josef Weidendorfer wrote:
> I'm not so sure about this. Cachegrind does instrumentation for EVERY original
> x86 instruction.
>
> > The alternative would be to regenerate the code, but I think that would
> > be much more expensive.
>
> Yes.
Well, if you really have that much instrumentation it might be cheaper to invalidate the cache and regenerate with instrumentation.
Another possibility is to run on the real CPU for a while, and then make a breakpoint trap (or something) drop into the Valgrind virtual machine. No idea how easy that would be to do, but my immediate impression is that it isn't all that hard (though this stuff always has those devil-in-the-details gotchas).
J |
|
From: Nicholas N. <nj...@ca...> - 2002-10-18 09:35:09
|
On Fri, 18 Oct 2002, Josef Weidendorfer wrote:
> > if (!active)
> > return;
>
> I'm not so sure about this. Cachegrind does instrumentation for EVERY original
> x86 instruction. Even the overhead of calling a C function can make a
> difference, I suppose. Switching helper functions would be nice, but the C
> calling overhead (saving/restoring registers on the stack) is still there.
> OK, only benchmarking will help here.
It's easy to do -- make the log functions empty, time it, then remove the calls to the log functions, time it again. I've done this before and found the overhead of the C calls alone (argument passing, register saving, etc) to be significant for Cachegrind, eg. something like 20%.
C calls have been optimised since then with liveness analysis, the use of ((regparm(n))) attribute, etc, so I would be interested in seeing it re-measured.
N |
|
From: Josef W. <Jos...@gm...> - 2002-10-18 09:26:42
|
On Friday 18 October 2002 08:31, Jeremy Fitzhardinge wrote:
> On Thu, 2002-10-17 at 13:40, Josef Weidendorfer wrote:
> > Reason: When a user wants to profile only a short part of a run, but
> > this part is 10 CPU min. from the program start, it would be nice to go
> > on to that place with no tracing at all...
> > I know this will kill the cache simulation somehow... but that's the same
> > problem with the traced data at the very beginning of a profile run.
> >
> > Alternative approach: A new CCALL only jumping to the helper when a flag
> > is set...
>
> Well, one option would be to add an API for skins so that they can
> change their helper function pointers in the baseblock. That way you
> could point it to a simple no-op until you want it to do something. On
> the other hand, putting a:
>     if (!active)
>         return;
> at the top would get the same effect with very little additional
> performance hit.
I'm not so sure about this. Cachegrind does instrumentation for EVERY original x86 instruction. Even the overhead of calling a C function can make a difference, I suppose. Switching helper functions would be nice, but the C calling overhead (saving/restoring registers on the stack) is still there. OK, only benchmarking will help here.
> The alternative would be to regenerate the code, but I think that would
> be much more expensive.
Yes.
Josef |
|
From: Jeremy F. <je...@go...> - 2002-10-18 06:31:00
|
On Thu, 2002-10-17 at 13:40, Josef Weidendorfer wrote:
> Reason: When a user wants to profile only a short part of a run, but this
> part is 10 CPU min. from the program start, it would be nice to go on to that
> place with no tracing at all...
> I know this will kill the cache simulation somehow... but that's the same
> problem with the traced data at the very beginning of a profile run.
>
> Alternative approach: A new CCALL only jumping to the helper when a flag is
> set...
Well, one option would be to add an API for skins so that they can change their helper function pointers in the baseblock. That way you could point it to a simple no-op until you want it to do something. On the other hand, putting a:
    if (!active)
        return;
at the top would get the same effect with very little additional performance hit.
The alternative would be to regenerate the code, but I think that would be much more expensive.
J |
|
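The two options discussed in this thread (swapping the helper pointer for a no-op, or testing a flag at the top of the helper) can be put side by side in a small sketch. The names below are hypothetical and the "work" is just a counter; this says nothing about how the baseblock actually stores helper pointers.

```c
/* Hypothetical helper type and state; not the skin API. */
typedef void (*demo_log_helper_t)(unsigned long addr);

unsigned long demo_logged_accesses = 0;  /* stands in for simulation work */
int demo_active = 0;                     /* runtime switch for option 2 */

/* Option 2: the helper is always called but returns early when inactive.
 * The C call overhead is still paid on every instrumented instruction. */
void demo_log_access_flagged(unsigned long addr)
{
    if (!demo_active)
        return;
    (void)addr;
    demo_logged_accesses++;
}

/* Option 1: two helpers, selected through a pointer. */
void demo_log_access_real(unsigned long addr)
{
    (void)addr;
    demo_logged_accesses++;
}

void demo_log_access_noop(unsigned long addr)
{
    (void)addr;
}

demo_log_helper_t demo_current_helper = demo_log_access_noop;

/* Turn tracing on or off for both variants at once. */
void demo_set_tracing(int on)
{
    demo_active = on;
    demo_current_helper = on ? demo_log_access_real : demo_log_access_noop;
}
```

Both variants still make a call per instrumented instruction, which is the overhead Josef is worried about; only removing the calls from the generated code avoids it.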
From: Jeremy F. <je...@go...> - 2002-10-18 01:54:52
|
On Thu, 2002-10-17 at 16:47, Julian Seward wrote:
> I haven't said much lately -- been v. busy having started a new job.
> But thanks for your hacking (lest you think I don't appreciate it).
> Some of this stuff may well diffuse into the code base when we have
> time to do so.
Good. I was going to start pushing the smaller patches to you in chunks
to encourage merging.
I've been updating them as the CVS tree changes; the current snapshot is
always at http://www.goop.org/~jeremy/valgrind (it's a couple of days
behind at the moment, but nothing major has happened since then).
> I already put 03-poll-select and 04-lax-ioctls into 1.0.4 and will
> merge the result to the head shortly. I considered 02-sysv-msg but
> concluded it too risky for the stable branch
Why? Clearly nobody is using SYSV messaging in threaded programs
because it is currently completely broken. That patch makes it much
better (and might even be problem-free).
I'm also working on a patch to deal with open() blocking in the open of
a FIFO file. It is very ugly (attached for your amusement).
> ; I'll put it in the head.
That's fine.
> I think 05 is already in the head (N put it in). I've looked at 00
> and 01 and they seem possibilities for the head too.
I've been running them for a while without any problems (though I think
wider testing would do them good).
> What does 06 do? I haven't heard talk of it.
They're just dead simple implementations of VG_(memset) and VG_(memcpy)
for vg_mylibc.c because I was missing them.
I also think vgprof is useful enough to put into the source as well.
There's the slight complexity of needing a modified gprof program to
interpret the results, but I'm looking into how to get my changes back
into binutils (or simply use kcachegrind as the display tool).
> I'd like to branch off a 1.2.X (stable) branch from the head in the
> near future (< 1 month, possibly less than that). It would be nice
> to ship a half-decent Helgrind (race detector) in that, but I'm not
> sure what the prospects for this are; do you have any views on that?
Not sure yet. I've been hacking on it a fair bit, but nothing I want to
show off yet. Mostly I've been working on making the reports more
useful so it is easier to tell what's real, what's noise and what's
completely spurious. At the moment the _dl_num_relocations update is
the most obvious source of noise; I think I'm getting spurious stuff
caused by thread stacks being in a strange state when they start up, but
I haven't confirmed that yet.
My current plans for it are:
- beef up reporting (starting by stealing chunks of memcheck)
- suppress duplicate errors
- investigate the state of the stack on thread startup
- make the memory granularity a compile-time setting for experiments
- implement the thread lifetime segment stuff (which may help with the
  thread stack state problem; not sure yet)
I hope to make some substantial progress in the next week or so.
J
|
|
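For readers wondering what "dead simple" versions of memset and memcpy look like, here is a generic byte-at-a-time sketch. It is not the code from the 06-memops patch; the `demo_` names are made up, and the copy assumes the two regions do not overlap.

```c
#include <stddef.h>

/* Fill n bytes at dest with the byte value c; returns dest. */
void *demo_memset(void *dest, int c, size_t n)
{
    unsigned char *d = dest;
    while (n--)
        *d++ = (unsigned char)c;
    return dest;
}

/* Copy n bytes from src to dest (regions must not overlap); returns dest. */
void *demo_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}
```

A tool that cannot link against libc needs its own copies of routines like these, which is presumably why they were missing from vg_mylibc.c in the first place.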
From: Julian S. <js...@ac...> - 2002-10-17 23:39:54
|
Jeremy
I haven't said much lately -- been v. busy having started a new job. But thanks for your hacking (lest you think I don't appreciate it). Some of this stuff may well diffuse into the code base when we have time to do so.
I already put 03-poll-select and 04-lax-ioctls into 1.0.4 and will merge the result to the head shortly. I considered 02-sysv-msg but concluded it too risky for the stable branch; I'll put it in the head.
I think 05 is already in the head (N put it in). I've looked at 00 and 01 and they seem possibilities for the head too.
What does 06 do? I haven't heard talk of it.
I'd like to branch off a 1.2.X (stable) branch from the head in the near future (< 1 month, possibly less than that). It would be nice to ship a half-decent Helgrind (race detector) in that, but I'm not sure what the prospects for this are; do you have any views on that?
J
On Monday 14 October 2002 6:53 am, Jeremy Fitzhardinge wrote:
> Here's a series of my patches against the current Valgrind CVS head.
> They are:
>
> 00-lazy-fp
> This patch implements lazy FPU state save and restore, which improves
> the performance of FPU-intensive code by a factor of 15 or so.
> 01-partial-mul
> This creates a new UInstr for multiply. This is mainly so that
> memcheck can treat it like add and generate partially-defined results of
> multiply with partially defined arguments.
> 02-sysv-msg
> Support for threaded programs using msgsnd/msgrcv.
> 03-poll-select
> Bind poll and select properly to catch all references.
> 04-lax-ioctls
> Adds new "lax-ioctls" weird hack to make checking on ioctl arguments
> very weak (assume all inputs are defined and all outputs become
> defined).
> 05-skin-clo-ordering
> Reorder the baseBlock init with respect to calling the skin post_clo
> routine. Some skins can't register their baseblock helpers until
> after CLO parsing, so the skin's post_clo function must be called
> before BaseBlock init.
> 06-memops
> Implement memcpy/memset.
> 07-seginfo
> API for skins to extract information about mapped segments.
> 08-skin-clientreq
> Introduce a systematic way for skins to distinguish each other's
> client requests. Uses the de-facto standard two-letter identifiers in
> the top two bytes of the client request code. Also changes the
> interface to SK_(handle_client_request) so that a skin can say whether
> or not it handled the request, which allows correct setting of the
> default return value if the request was not handled.
> 09-rdtsc-calibration
> Spin rather than sleep during rdtsc calibration, in order to
> compensate for power-management modes where the TSC does not advance
> while the CPU is idle. Of course this makes the TSC generally
> unreliable as a timing mechanism, but at least it doesn't trigger an
> assert.
> 12-vgprof
> A skin which generates gprof-style profiling output files. This allows
> profiling of multithreaded programs using shared libraries. Requires a
> patched version of gprof to interpret the output files.
>
> (The missing patches are local ones which aren't generally useful.)
>
> Rather than attaching them all, they're available separately at
> http://www.goop.org/~jeremy/valgrind/ and rolled together at
> http://www.goop.org/~jeremy/valgrind/patches.tar.gz.
>
> More detail on vgprof is available at
> http://www.goop.org/~jeremy/valgrind/vgprof.html.
>
> J
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers |
|
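The scheme described for 08-skin-clientreq (a two-letter identifier in the top two bytes of the request code, plus a handler that reports whether it handled the request) can be illustrated as below. The macro and function names are hypothetical stand-ins written for this example, and 'P','F' is an arbitrary identifier chosen for the imaginary skin.

```c
#include <stdint.h>

/* Put two identifying letters in the top two bytes of a 32-bit code. */
#define DEMO_REQ_BASE(a, b) \
    ((((uint32_t)(a) & 0xffu) << 24) | (((uint32_t)(b) & 0xffu) << 16))

/* True when the code's top two bytes match the given letters. */
#define DEMO_REQ_OWNED_BY(a, b, code) \
    (((uint32_t)(code) & 0xffff0000u) == DEMO_REQ_BASE(a, b))

/* A request belonging to an imaginary skin identified by 'P','F'. */
#define DEMO_PF_DUMP (DEMO_REQ_BASE('P', 'F') + 1u)

/* Returns 1 and sets *ret if the request was handled, 0 otherwise. */
int demo_handle_client_request(uint32_t code, uint32_t *ret)
{
    if (!DEMO_REQ_OWNED_BY('P', 'F', code))
        return 0;               /* some other skin's request */
    switch (code) {
    case DEMO_PF_DUMP:
        *ret = 1;
        return 1;
    default:
        return 0;               /* ours by prefix, but unknown */
    }
}

/* Core-side dispatch: keep the default return value when unhandled. */
uint32_t demo_dispatch(uint32_t code, uint32_t default_ret)
{
    uint32_t ret = default_ret;
    if (!demo_handle_client_request(code, &ret))
        ret = default_ret;
    return ret;
}
```

The handled/unhandled return is what lets the caller fall back to the default value when a program built with requests for one skin is run under another.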
From: Josef W. <Jos...@gm...> - 2002-10-17 20:40:42
|
Hi,
I have a wish-list item... Do you think it's easy to implement runtime switching between different instrumentations for the same BB? Or is this already possible from within skins?
Reason: When a user wants to profile only a short part of a run, but this part is 10 CPU min. from the program start, it would be nice to go on to that place with no tracing at all...
I know this will kill the cache simulation somehow... but that's the same problem with the traced data at the very beginning of a profile run.
Alternative approach: A new CCALL only jumping to the helper when a flag is set...
I'm not quite sure it's worth the trouble...
Josef |
|
From: Jeremy F. <je...@go...> - 2002-10-15 15:53:52
|
On Tue, 2002-10-15 at 02:07, Nicholas Nethercote wrote:
> On 14 Oct 2002, Jeremy Fitzhardinge wrote:
> > Here's a patch which allows a skin to ask for data symbols to be
> > preserved.
> Excuse my ignorance, can someone explain what data symbols are, and why
> reading them fixes the si->offset mess?
They are independent, but the same machinery allows both problems to be
solved.
Data symbols are the symbols in the exe/.so symtabs referring to storage
in the data and bss sections, as opposed to text. Currently the symbol
table reading machinery only worries about function symbols, but
helgrind really needs to be able to print symbolic information about
variables too.
At present, the symtab reader filters out all symbols which refer to
memory outside the corresponding SegInfo. Since data and bss are
typically in a separate mapping from the text which doesn't have the 'x'
bit set, the /proc/self/maps scanner ignores them. My solution was to
make vg_read_lib_symbols() parse the ELF file's PHdr to work out where
it got mapped into the address space, and extend the text's SegInfo
mapping to include the adjacent data and bss segments. Reading data and
bss symbols then just becomes a matter of not ignoring them.
It turns out that reading the PHdr also gives all the information you
need to work out what the symbol value offset is. The first PT_LOAD
entry in the PHdr specifies the base vaddr where this ELF file wants to
be mapped, and therefore the base address from which the symbol tables
are offset. For executables, the vaddr is typically 0x8048000, and for
shared libraries it is 0. If you look at the actual mapped address
(from /proc/self/maps) and compare it with the ELF file's mapping
information, it gives you the symbol offset. That is:
    si->offset = si->start - phdr[0]->p_vaddr
J
|
|
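The offset rule in the message above, si->offset = si->start - p_vaddr of the first PT_LOAD entry, can be checked with a few lines of C. This is a sketch only: the struct is a cut-down stand-in for a real ELF program header, the `demo_` names are hypothetical, and the mapped addresses in the usage note are illustrative rather than taken from a real /proc/self/maps.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PT_LOAD 1u   /* PT_LOAD's value in the ELF specification */

/* Cut-down program header: only the two fields the rule needs. */
typedef struct {
    uint32_t p_type;
    uint32_t p_vaddr;
} demo_phdr_t;

/* Finds the first PT_LOAD entry and stores mapped_start - p_vaddr in
 * *offset_out. Returns 0 on success, -1 if there is no PT_LOAD entry. */
int demo_symbol_offset(const demo_phdr_t *phdr, size_t n,
                       uint32_t mapped_start, uint32_t *offset_out)
{
    for (size_t i = 0; i < n; i++) {
        if (phdr[i].p_type == DEMO_PT_LOAD) {
            *offset_out = mapped_start - phdr[i].p_vaddr;
            return 0;
        }
    }
    return -1;
}
```

For an executable linked at 0x08048000 and mapped at 0x08048000 the offset comes out as 0, while a shared library with p_vaddr 0 mapped at, say, 0x40015000 gets an offset of 0x40015000, matching the two cases Jeremy describes.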
From: Nicholas N. <nj...@ca...> - 2002-10-15 09:08:08
|
On 14 Oct 2002, Jeremy Fitzhardinge wrote:
> Here's a patch which allows a skin to ask for data symbols to be
> preserved.
Excuse my ignorance, can someone explain what data symbols are, and why reading them fixes the si->offset mess?
Thanks.
N
> It does 3 things:
> - it adds a new data_syms flag to VG_(needs), so that only skins which
> want them get them.
> - it adds some new logic to vg_read_lib_symbols, which
> 1. only expects to see segments with a 0 file offset
> 2. traverses the ELF Phdrs, and includes all the mapped segments
> into the SegInfo's address range
> - by happy convenience, this also completely cleans up the si->offset
> mess; there is no longer any dependence on knowing where the
> executables are loaded; it can be derived from the ELF file itself.
> The value of si->offset = si->start - phdr->p_vaddr. |
|
From: Jeremy F. <je...@go...> - 2002-10-14 22:53:33
|
Here's a patch which allows a skin to ask for data symbols to be preserved.
It does 3 things:
- it adds a new data_syms flag to VG_(needs), so that only skins which want them get them.
- it adds some new logic to vg_read_lib_symbols, which
  1. only expects to see segments with a 0 file offset
  2. traverses the ELF Phdrs, and includes all the mapped segments into the SegInfo's address range
- by happy convenience, this also completely cleans up the si->offset mess; there is no longer any dependence on knowing where the executables are loaded; it can be derived from the ELF file itself. The value of si->offset = si->start - phdr->p_vaddr.
J |
|
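Step 2 of the patch description above (walk the program headers and fold every mapped segment into the SegInfo's address range) might be sketched like this. It is an illustration under assumptions: the structs are simplified stand-ins, the `demo_` names are invented, and the sketch simply takes the lowest start and highest end, so any gap between text and data ends up inside the range.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SEG_PT_LOAD 1u   /* PT_LOAD's value in the ELF specification */

/* Simplified program header and segment-info records. */
typedef struct {
    uint32_t p_type;
    uint32_t p_vaddr;
    uint32_t p_memsz;
} demo_seg_phdr_t;

typedef struct {
    uint32_t start;   /* lowest mapped address covered */
    uint32_t size;    /* number of bytes covered */
} demo_seginfo_t;

/* Widen si so it covers every PT_LOAD segment once relocated by offset
 * (the difference between the mapped address and the file's p_vaddr). */
void demo_extend_seginfo(demo_seginfo_t *si, const demo_seg_phdr_t *phdr,
                         size_t n, uint32_t offset)
{
    uint32_t lo = si->start;
    uint32_t hi = si->start + si->size;

    for (size_t i = 0; i < n; i++) {
        if (phdr[i].p_type != DEMO_SEG_PT_LOAD)
            continue;
        uint32_t seg_lo = phdr[i].p_vaddr + offset;
        uint32_t seg_hi = seg_lo + phdr[i].p_memsz;
        if (seg_lo < lo)
            lo = seg_lo;
        if (seg_hi > hi)
            hi = seg_hi;
    }
    si->start = lo;
    si->size = hi - lo;
}
```

Once the range covers data and bss, a symbol filter of the form "address lies inside the SegInfo" stops discarding data symbols without any other change.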
From: Jeremy F. <je...@go...> - 2002-10-14 16:17:05
|
On Mon, 2002-10-14 at 04:15, Josef Weidendorfer wrote:
Hi Jeremy,
I just looked over your vgprof skin: Seems quite cool :-)
The client side requests (e.g. VALGRIND_DUMP_PROFILE) are quite
useful. But they need changing source and recompilation. Did you
already thought about alternate solutions?
I have the "--dumpat=" command line option (should be renamed to
--dump-at-entering=" and adding "--dump-at-leaving=").
The main reason I added them was for doing profiling of an interactive
application. I actually bound them to keystrokes so I can do things
like:
1. zero stats
2. move around the UI
3. grab profile
Static profile snapshots at particular function entry/exits wouldn't
have been that useful.
Additionally I allow interactive controlling a cachegrind run by creating
"cachegrind.cmd" files. At the moment, simply a dump is made when
detecting this file. But I want to add commands for cachegrind to read and
execute them, e.g. "DUMP NOW" or "DUMP AT ENTERING xxx" or
"DELETE DUMP AT ENTERING xxx".
It would be cool if we could come up with some kind of "standard" for this.
Especially as I would like to add a (v)gprof import filter for KCachegrind,
and you create trace parts, too: KCachegrind has a toolbar button "force
dump", creating a "cachegrind.cmd" file (This should be renamed to
"valgrind.cmd").
Well, it seems that there's some plan to add a mechanism so that
valgrind can report its results via a socket rather than simply writing
out to stderr. Such a socket seems like a better way of communicating
with a skin than polling on a file.
For actual configuration-type information, I was thinking of
implementing some suppression file keywords, so that I can include or
exclude particular parts of the program. It would make sense to add a
suppression keyword to trigger a profile dump as well.
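Whichever transport is used (a polled file or a socket), the three command strings proposed above would need a small parser. The following is a minimal sketch under assumptions: the `Cmd` type and `parse_cmd` are hypothetical names, each command is assumed to arrive as one line with the trailing newline already stripped, and "xxx" is taken to be a function name.

```c
#include <string.h>

typedef enum {
    CMD_UNKNOWN,
    CMD_DUMP_NOW,
    CMD_DUMP_AT_ENTERING,
    CMD_DELETE_DUMP_AT_ENTERING
} CmdKind;

typedef struct {
    CmdKind kind;
    const char *arg;   /* function name, or NULL; points into the input line */
} Cmd;

/* Returns the text after prefix if line starts with it, else NULL. */
static const char *after_prefix(const char *line, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(line, prefix, n) == 0 ? line + n : NULL;
}

/* Classify one command line; anything unrecognised is CMD_UNKNOWN. */
Cmd parse_cmd(const char *line)
{
    Cmd c = { CMD_UNKNOWN, NULL };
    const char *rest;

    if (strcmp(line, "DUMP NOW") == 0) {
        c.kind = CMD_DUMP_NOW;
    } else if ((rest = after_prefix(line, "DELETE DUMP AT ENTERING ")) && *rest) {
        c.kind = CMD_DELETE_DUMP_AT_ENTERING;
        c.arg = rest;
    } else if ((rest = after_prefix(line, "DUMP AT ENTERING ")) && *rest) {
        c.kind = CMD_DUMP_AT_ENTERING;
        c.arg = rest;
    }
    return c;
}
```

Keeping the parser independent of where the line came from means the same code would serve a "valgrind.cmd" file and a socket alike.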
> The idea of different "weights" for different instructions seems to be
> quite useful to add to cachegrind (as a new event type).
> How did you get your weights?
> They can be quite different for every different processor (AMD/Intel).
> And the best thing would be to get the values by measuring online,
> with an additional tool to put measured weights into a config file (like
> calibrator for the cache latencies).
That's all a complete hack at the moment. It doesn't even measure the
right thing; it adds weights based on the UInstrs, but doesn't take into
account the original x86 instruction's performance characteristics. The
alternative, which is to simply count x86 instructions, gives somewhat
misleading results because it assumes that all instructions take the
same amount of time to run.
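The difference between the two costing schemes mentioned here can be sketched as follows. The operation classes and weight values are invented for illustration only; they are not vgprof's actual weights and are not measured on any processor.

```c
#include <stddef.h>

/* Hypothetical operation classes with made-up relative weights. */
typedef enum { OP_ALU, OP_LOAD, OP_STORE, OP_FPU, OP_KINDS } OpKind;

static const unsigned op_weight[OP_KINDS] = { 1, 3, 2, 5 };

/* Cost of a block if every instruction is assumed to take the same time. */
unsigned long flat_cost(const OpKind *ops, size_t n)
{
    (void)ops;
    return (unsigned long)n;
}

/* Cost of a block as the sum of per-class weights. */
unsigned long weighted_cost(const OpKind *ops, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += op_weight[ops[i]];
    return total;
}
```

A block of one ALU op, one load, one FPU op and one store costs 4 under the flat scheme and 11 under these example weights, which is the kind of gap that makes plain instruction counts misleading.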
> As I understand, you have a new gmon format version. I would suggest
> the following for the new format:
> For each header, add a section length field to allow skipping this
> section if the reader doesn't know about the section type (You once said that
> this is a shortcoming of the current format yourself).
Yes, that's a problem with using gmon.out, but the main reason for using
it was to reuse the gprof code base, and with luck be able to get the
changes merged into the binutils mainline. It isn't a very nice format
on the whole, so it certainly isn't hard to come up with something
improved, but I don't think it's worth the effort to completely change
the gmon.out format.
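To make the section-length suggestion concrete, here is a sketch of a reader that steps over sections it does not recognise. The layout (1-byte tag, 4-byte little-endian payload length, payload) and the tag values are invented for illustration; this is not the gmon.out format, and `count_known_sections` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

enum { TAG_HIST = 1, TAG_ARCS = 2 };   /* hypothetical section tags */

/* Read a little-endian 32-bit value. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Walk sections laid out as: 1-byte tag, 4-byte payload length, payload.
   Counts the sections whose tag is known and steps over the rest.
   Returns -1 if a section runs past the end of the buffer. */
int count_known_sections(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    int known = 0;

    while (pos < len) {
        if (len - pos < 5)
            return -1;
        uint8_t tag = buf[pos];
        uint32_t plen = rd32(buf + pos + 1);
        pos += 5;
        if (plen > len - pos)
            return -1;
        if (tag == TAG_HIST || tag == TAG_ARCS)
            known++;
        pos += plen;   /* the length field is what makes unknown tags skippable */
    }
    return known;
}
```

Without the length field, a reader that meets an unknown tag has no way to find the start of the next section, which is the shortcoming being discussed.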
> Multiple history sections seem to be supported in the format. Can (v)gprof
> handle these, e.g. choosing them per option? Or does the gprof output have
> different columns for each event type?
The gprof program can only really deal with one unit at a time. And I
haven't really looked into recording other units yet, though elapsed
time and CPU-time seem like the most useful.
> I would like to add a gprof import filter to KCachegrind: Supporting an
> extensible format would be good for this. I suppose I will have to copy
> all the symbol reading stuff from binutils. The nice thing about the cachegrind
> format is that you already have symbol names...
Yes, or the stuff from valgrind itself. As far as I can tell, libbfd is
pretty inefficient at doing the things that gprof needs done (in
particular, mapping code addresses to file:lines). The advantage is
that it supports lots of different file formats.
> If I understand correctly, you associate the weights of a BB to the start
> address of this BB. Is this granularity enough for annotated source output of
> your vgprof?
I haven't used annotated source, because gprof is too inefficient at
reading lots of line-level symbol tables. But yes, I think accumulating
the BB's instructions to the first address doesn't upset things too much
(at the source level, you may see something later in a BB supposedly
taking no time, but it wouldn't be that hard to work out what's going
on).
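The effect on line-level output can be sketched by comparing the two attribution schemes. The `Insn` record and both function names are hypothetical; the sketch only models a single basic block and is not taken from vgprof or gprof.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t addr;   /* address of one instruction */
    unsigned cost;   /* cost attributed to it */
    unsigned line;   /* source line it came from */
} Insn;

/* Cost a line-level report would show for `line` when the whole block's
   cost is charged to the block's first instruction. */
unsigned line_cost_bb_start(const Insn *bb, size_t n, unsigned line)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += bb[i].cost;
    return (n > 0 && bb[0].line == line) ? total : 0;
}

/* The same query with per-instruction attribution, for comparison. */
unsigned line_cost_exact(const Insn *bb, size_t n, unsigned line)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        if (bb[i].line == line)
            total += bb[i].cost;
    return total;
}
```

For a block whose instructions span two source lines, the first scheme reports the whole cost on the first line and zero on the second, which is the "supposedly taking no time" effect described above.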
J
|
|
From: Josef W. <Jos...@gm...> - 2002-10-14 11:15:27
|
Hi Jeremy,
I just looked over your vgprof skin: Seems quite cool :-)
The client side requests (e.g. VALGRIND_DUMP_PROFILE) are quite
useful. But they need changing source and recompilation. Have you
already thought about alternate solutions?
I have the "--dumpat=" command line option (should be renamed to
"--dump-at-entering=", adding "--dump-at-leaving=").
Additionally I allow interactive control of a cachegrind run by creating
"cachegrind.cmd" files. At the moment, a dump is simply made when
this file is detected. But I want to add commands for cachegrind to read
and execute, e.g. "DUMP NOW" or "DUMP AT ENTERING xxx" or
"DELETE DUMP AT ENTERING xxx".
It would be cool if we could come up with some kind of "standard" for this.
Especially as I would like to add a (v)gprof import filter for KCachegrind,
and you create trace parts, too: KCachegrind has a toolbar button "force
dump", creating a "cachegrind.cmd" file (this should be renamed to
"valgrind.cmd").
The idea of different "weights" for different instructions seems to be
quite useful to add to cachegrind (as a new event type).
How did you get your weights?
They can be quite different for every different processor (AMD/Intel).
And the best thing would be to get the values by measuring online,
with an additional tool to put measured weights into a config file (like
calibrator for the cache latencies).
As I understand, you have a new gmon format version. I would suggest
the following for the new format:
For each header, add a section length field to allow skipping this
section if the reader doesn't know about the section type (You once said that
this is a shortcoming of the current format yourself).
Multiple history sections seem to be supported in the format. Can (v)gprof
handle these, e.g. choosing them per option? Or does the gprof output have
different columns for each event type?
I would like to add a gprof import filter to KCachegrind: Supporting an
extensible format would be good for this. I suppose I will have to copy
all the symbol reading stuff from binutils. The nice thing about the cachegrind
format is that you already have symbol names...
If I understand correctly, you associate the weights of a BB to the start
address of this BB. Is this granularity enough for annotated source output of
your vgprof?
I suppose if you have a series of non-branching statements on a few source
lines, the annotation would put all the weights on the source line of the
first statement...
On Monday 14 October 2002 07:53, Jeremy Fitzhardinge wrote:
> Here's a series of my patches against the current Valgrind CVS head.
> They are:
> [...]
> 12-vgprof
>   A skin which generates gprof-style profiling output files. This allows
>   profiling of multithreaded programs using shared libraries. Requires a
>   patched version of gprof to interpret the output files.
|
|
From: Jeremy F. <je...@go...> - 2002-10-14 05:53:55
|
Here's a series of my patches against the current Valgrind CVS head.
They are:
00-lazy-fp
This patch implements lazy FPU state save and restore, which improves
the performance of FPU-intensive code by a factor of 15 or so.
01-partial-mul
This creates a new UInstr for multiply. This is mainly so that memcheck
can treat it like add and generate partially-defined results of multiply
with partially defined arguments.
02-sysv-msg
Support for threaded programs using msgsnd/msgrcv.
03-poll-select
Bind poll and select properly to catch all references.
04-lax-ioctls
Adds new "lax-ioctls" weird hack to make checking on ioctl arguments
very weak (assume all inputs are defined and all outputs become defined).
05-skin-clo-ordering
Reorder the baseBlock init with respect to calling the skin post_clo
routine. Some skins can't register their baseblock helpers until
after CLO parsing, so the skin's post_clo function must be called
before BaseBlock init.
06-memops
Implement memcpy/memset.
07-seginfo
API for skins to extract information about mapped segments.
08-skin-clientreq
Introduce a systematic way for skins to distinguish each other's
client requests. Uses the de-facto standard two-letter identifiers in
the top two bytes of the client request code. Also changes the
interface to SK_(handle_client_request) so that a skin can say whether
or not it handled the request, which allows correct setting of the
default return value if the request was not handled.
09-rdtsc-calibration
Spin rather than sleep during rdtsc calibration, in order to
compensate for power-management modes where the TSC does not advance
while the CPU is idle. Of course this makes the TSC generally
unreliable as a timing mechanism, but at least it doesn't trigger an
assert.
12-vgprof
A skin which generates gprof-style profiling output files. This allows
profiling of multithreaded programs using shared libraries. Requires a
patched version of gprof to interpret the output files.
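For 01-partial-mul, one conservative way to propagate definedness through an add or a multiply is sketched below. This is an assumption about the general technique, not a description of what the patch does. A set shadow bit means "undefined". Bit k of a 32-bit sum or product depends only on bits 0..k of the operands, so every result bit at or above the lowest undefined input bit is marked undefined and the bits below it stay defined. The function names are hypothetical.

```c
#include <stdint.h>

/* Smear the lowest set bit upwards: every bit at or above it becomes set. */
uint32_t smear_left(uint32_t v)
{
    return v | (0u - v);
}

/* Shadow ("undefined") bits of a 32-bit a + b or a * b, given the
   operands' shadow bits. */
uint32_t arith_shadow(uint32_t shadow_a, uint32_t shadow_b)
{
    return smear_left(shadow_a | shadow_b);
}
```

For example, if only bit 8 of one operand is undefined, the low 8 bits of the result are still reported as defined, which is what "partially-defined results" buys over marking the whole word undefined.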
(The missing patches are local ones which aren't generally useful.)
Rather than attaching them all, they're available separately at
http://www.goop.org/~jeremy/valgrind/ and rolled together at
http://www.goop.org/~jeremy/valgrind/patches.tar.gz.
More detail on vgprof is available at
http://www.goop.org/~jeremy/valgrind/vgprof.html.
J
|
|
From: Julian S. <js...@ac...> - 2002-10-12 10:31:59
|
Thanks Josef, that's very excellent. I had heard many bug reports
saying that the DWARF2 reader does not work very well, and so it seems
like this fixes it. I will try and get it in a release soon.
If you make any more bug fixes to this reader, please let me know ASAP.
J
On Friday 11 October 2002 10:33 pm, Josef Weidendorfer wrote:
> Hi,
>
> I recently updated to Suse8.1 with GCC 3.2 and had a lot
> of problems with cachegrind regarding DWARF2 debug info.
>
> Attached patch is for the Dwarf2 source line info reader;
> For reading, a state machine is used reconstructing source line
> info while running and reading (see DWARF2 specification, ch. 6.2).
> The state machine was correct, but the calls to addLineInfo()
> were wrong: Most of the time it reported too small ranges
> for source code statements, because it used only the diff of the last
> state machine command instead of the diff to the last statement
> boundary. Effect: Around 1/3 of all addresses with source line info got
> unknown location.
> The patch adds a "last_address" to the state machine to remember the last
> statement boundary. On reset, it's initialised to the "invalid" address 0.
> I hope this is OK (or should we use "(Addr)-1" instead?).
> The patch now uses the "is_stmt" boolean correctly to only call
> addLineInfo() if there's a statement boundary (on x86, is_stmt most
> probably is always true...).
>
> Please apply both to 1.0.x branch and HEAD.
>
> Another problem:
> Symbols like "_GLOBAL__I__ZNK10DirTreeMap9classNameEv" aren't demangled.
> I need to look at this...
>
> Josef
|
|
From: Josef W. <Jos...@gm...> - 2002-10-11 21:34:07
|
Hi,
I recently updated to Suse8.1 with GCC 3.2 and had a lot
of problems with cachegrind regarding DWARF2 debug info.
Attached patch is for the Dwarf2 source line info reader;
For reading, a state machine is used reconstructing source line
info while running and reading (see DWARF2 specification, ch. 6.2).
The state machine was correct, but the calls to addLineInfo()
were wrong: Most of the time it reported too small ranges
for source code statements, because it used only the diff of the last
state machine command instead of the diff to the last statement
boundary. Effect: Around 1/3 of all addresses with source line info got
unknown location.
The patch adds a "last_address" to the state machine to remember the last
statement boundary. On reset, it's initialised to the "invalid" address 0.
I hope this is OK (or should we use "(Addr)-1" instead?).
The patch now uses the "is_stmt" boolean correctly to only call
addLineInfo() if there's a statement boundary (on x86, is_stmt most
probably is always true...).
Please apply both to 1.0.x branch and HEAD.
Another problem:
Symbols like "_GLOBAL__I__ZNK10DirTreeMap9classNameEv" aren't demangled.
I need to look at this...
Josef
|
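The range calculation described in this message can be modelled in a few lines. This is a simplified sketch, not the attached patch: `Row`, `Range` and `build_ranges` are hypothetical names, each `Row` stands for the state-machine registers after one command, and writing into `out` stands in for the call to addLineInfo().

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t addr; unsigned line; int is_stmt; } Row;       /* state after one command */
typedef struct { uint32_t start; uint32_t size; unsigned line; } Range;  /* one emitted line range */

/* Turn state-machine rows into address ranges. A range is closed only at a
   statement boundary and spans from the previous boundary (last_address),
   not from the previous row. Returns the number of ranges written. */
size_t build_ranges(const Row *rows, size_t n, Range *out, size_t max_out)
{
    uint32_t last_address = 0;   /* 0 means "no boundary seen yet" */
    unsigned last_line = 0;
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (!rows[i].is_stmt)
            continue;
        if (last_address != 0 && rows[i].addr > last_address && count < max_out) {
            out[count].start = last_address;
            out[count].size = rows[i].addr - last_address;
            out[count].line = last_line;
            count++;
        }
        last_address = rows[i].addr;
        last_line = rows[i].line;
    }
    return count;
}
```

With rows at 0x1000 (statement), 0x1004 (not a statement) and 0x1008 (statement), the first range is 8 bytes long. Measuring only from the previous row would have produced a 4-byte range and left 0x1000..0x1003 without a location.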
|
From: Nicholas N. <nj...@ca...> - 2002-10-11 11:20:26
|
On 10 Oct 2002, Jeremy Fitzhardinge wrote:
> At the moment, not many skins have client callbacks (only memcheck, I
> think). However, if other skins follow memcheck's example of starting
> callbacks at VG_USERREQ__FINAL_DUMMY_CLIENT_REQUEST + 1, they will all
> end up with overlapping numbers.
>
> This means that if a particular program under study has callbacks for
> different skins, they will end up doing the wrong thing if run with the
> wrong skin. It seems to me that there needs to be a systematic way for
> skins to distinguish their callback numbers from each other.
Yes, good point.
> Well, since there seems to be a de-facto standard for each skin to have
> a two letter code, I implemented a scheme to use it as a way for skins
> to distinguish their client requests.
>
> I also made it not an error for a callback to be unimplemented, since
> Valgrind is getting more capable with more skins, it will be common to
> have a program with client requests inserted for multiple skins. As
> part of this, I changed the interface to SK_(handle_client_request) so
> that a skin can indicate that it did not handle the request, so that the
> proper default return value can be returned.
>
> Patch against CVS head.
This seems like a good way to do it, although I think I'll implement it
slightly differently using a template function -- that must be defined if
the client_requests need is set -- that returns the two letter prefix.
Thanks for the good suggestion.
N
|
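A sketch of the numbering scheme under discussion, with the two-letter code packed into the top two bytes of a 32-bit request code. The macro names, the skin code "GP", the request number and both functions are hypothetical and chosen for illustration; they are not Valgrind's actual identifiers.

```c
#include <stdint.h>

/* Pack a two-letter skin code into the top two bytes of a 32-bit request. */
#define SKIN_REQ_BASE(a, b) \
    (((uint32_t)(unsigned char)(a) << 24) | ((uint32_t)(unsigned char)(b) << 16))
#define SKIN_REQ_MATCHES(a, b, code) \
    (((code) & 0xffff0000u) == SKIN_REQ_BASE(a, b))

enum { REQ_DUMP_PROFILE = 1 };   /* hypothetical request number */

/* Hypothetical handler for a skin whose code is "GP". Returns 1 and sets
   *ret if it handled the request, 0 if the request is not one it knows. */
int gp_handle_request(uint32_t code, uint32_t *ret)
{
    if (!SKIN_REQ_MATCHES('G', 'P', code))
        return 0;
    switch (code & 0xffffu) {
    case REQ_DUMP_PROFILE:
        *ret = 1;
        return 1;
    default:
        return 0;
    }
}

/* Core-side dispatch: fall back to a default value when nobody handled it. */
uint32_t dispatch_request(uint32_t code, uint32_t dflt)
{
    uint32_t ret;
    return gp_handle_request(code, &ret) ? ret : dflt;
}
```

A request built for a different skin, say with prefix 'M','C', falls through to the default value instead of being misinterpreted, which is the behaviour the "handled or not" return value is meant to enable.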
|
From: Nicholas N. <nj...@ca...> - 2002-10-11 09:48:08
|
On 10 Oct 2002, Jeremy Fitzhardinge wrote:
> My skin needs to register baseblock helpers after parsing its command
> line options (so that it knows what it needs to do). Unfortunately the
> baseblock is currently being set up before calling SK_(post_clo_init).
Hmm, fair enough. I'll fix it in the head.
N
|