You can subscribe to this list here.
| 2002 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
(1) |
Oct
(122) |
Nov
(152) |
Dec
(69) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2003 |
Jan
(6) |
Feb
(25) |
Mar
(73) |
Apr
(82) |
May
(24) |
Jun
(25) |
Jul
(10) |
Aug
(11) |
Sep
(10) |
Oct
(54) |
Nov
(203) |
Dec
(182) |
| 2004 |
Jan
(307) |
Feb
(305) |
Mar
(430) |
Apr
(312) |
May
(187) |
Jun
(342) |
Jul
(487) |
Aug
(637) |
Sep
(336) |
Oct
(373) |
Nov
(441) |
Dec
(210) |
| 2005 |
Jan
(385) |
Feb
(480) |
Mar
(636) |
Apr
(544) |
May
(679) |
Jun
(625) |
Jul
(810) |
Aug
(838) |
Sep
(634) |
Oct
(521) |
Nov
(965) |
Dec
(543) |
| 2006 |
Jan
(494) |
Feb
(431) |
Mar
(546) |
Apr
(411) |
May
(406) |
Jun
(322) |
Jul
(256) |
Aug
(401) |
Sep
(345) |
Oct
(542) |
Nov
(308) |
Dec
(481) |
| 2007 |
Jan
(427) |
Feb
(326) |
Mar
(367) |
Apr
(255) |
May
(244) |
Jun
(204) |
Jul
(223) |
Aug
(231) |
Sep
(354) |
Oct
(374) |
Nov
(497) |
Dec
(362) |
| 2008 |
Jan
(322) |
Feb
(482) |
Mar
(658) |
Apr
(422) |
May
(476) |
Jun
(396) |
Jul
(455) |
Aug
(267) |
Sep
(280) |
Oct
(253) |
Nov
(232) |
Dec
(304) |
| 2009 |
Jan
(486) |
Feb
(470) |
Mar
(458) |
Apr
(423) |
May
(696) |
Jun
(461) |
Jul
(551) |
Aug
(575) |
Sep
(134) |
Oct
(110) |
Nov
(157) |
Dec
(102) |
| 2010 |
Jan
(226) |
Feb
(86) |
Mar
(147) |
Apr
(117) |
May
(107) |
Jun
(203) |
Jul
(193) |
Aug
(238) |
Sep
(300) |
Oct
(246) |
Nov
(23) |
Dec
(75) |
| 2011 |
Jan
(133) |
Feb
(195) |
Mar
(315) |
Apr
(200) |
May
(267) |
Jun
(293) |
Jul
(353) |
Aug
(237) |
Sep
(278) |
Oct
(611) |
Nov
(274) |
Dec
(260) |
| 2012 |
Jan
(303) |
Feb
(391) |
Mar
(417) |
Apr
(441) |
May
(488) |
Jun
(655) |
Jul
(590) |
Aug
(610) |
Sep
(526) |
Oct
(478) |
Nov
(359) |
Dec
(372) |
| 2013 |
Jan
(467) |
Feb
(226) |
Mar
(391) |
Apr
(281) |
May
(299) |
Jun
(252) |
Jul
(311) |
Aug
(352) |
Sep
(481) |
Oct
(571) |
Nov
(222) |
Dec
(231) |
| 2014 |
Jan
(185) |
Feb
(329) |
Mar
(245) |
Apr
(238) |
May
(281) |
Jun
(399) |
Jul
(382) |
Aug
(500) |
Sep
(579) |
Oct
(435) |
Nov
(487) |
Dec
(256) |
| 2015 |
Jan
(338) |
Feb
(357) |
Mar
(330) |
Apr
(294) |
May
(191) |
Jun
(108) |
Jul
(142) |
Aug
(261) |
Sep
(190) |
Oct
(54) |
Nov
(83) |
Dec
(22) |
| 2016 |
Jan
(49) |
Feb
(89) |
Mar
(33) |
Apr
(50) |
May
(27) |
Jun
(34) |
Jul
(53) |
Aug
(53) |
Sep
(98) |
Oct
(206) |
Nov
(93) |
Dec
(53) |
| 2017 |
Jan
(65) |
Feb
(82) |
Mar
(102) |
Apr
(86) |
May
(187) |
Jun
(67) |
Jul
(23) |
Aug
(93) |
Sep
(65) |
Oct
(45) |
Nov
(35) |
Dec
(17) |
| 2018 |
Jan
(26) |
Feb
(35) |
Mar
(38) |
Apr
(32) |
May
(8) |
Jun
(43) |
Jul
(27) |
Aug
(30) |
Sep
(43) |
Oct
(42) |
Nov
(38) |
Dec
(67) |
| 2019 |
Jan
(32) |
Feb
(37) |
Mar
(53) |
Apr
(64) |
May
(49) |
Jun
(18) |
Jul
(14) |
Aug
(53) |
Sep
(25) |
Oct
(30) |
Nov
(49) |
Dec
(31) |
| 2020 |
Jan
(87) |
Feb
(45) |
Mar
(37) |
Apr
(51) |
May
(99) |
Jun
(36) |
Jul
(11) |
Aug
(14) |
Sep
(20) |
Oct
(24) |
Nov
(40) |
Dec
(23) |
| 2021 |
Jan
(14) |
Feb
(53) |
Mar
(85) |
Apr
(15) |
May
(19) |
Jun
(3) |
Jul
(14) |
Aug
(1) |
Sep
(57) |
Oct
(73) |
Nov
(56) |
Dec
(22) |
| 2022 |
Jan
(3) |
Feb
(22) |
Mar
(6) |
Apr
(55) |
May
(46) |
Jun
(39) |
Jul
(15) |
Aug
(9) |
Sep
(11) |
Oct
(34) |
Nov
(20) |
Dec
(36) |
| 2023 |
Jan
(79) |
Feb
(41) |
Mar
(99) |
Apr
(169) |
May
(48) |
Jun
(16) |
Jul
(16) |
Aug
(57) |
Sep
(19) |
Oct
|
Nov
|
Dec
|
|
From: Jeremy F. <je...@go...> - 2002-10-30 16:07:06
|
On Wed, 2002-10-30 at 02:31, Nicholas Nethercote wrote:
Thinking about this some more, seems to me like every skin is going to
identically do:
details->core_interface = VG_IF_VERSION;
In which case why not just put:
const Char* core_interface = VG_IF_VERSION;
[ Well, SK_(core_interface) ]. Yes, that would work.
in vg_skin.h, which every skin #includes. That way, the skin doesn't have
to mention anything about version numbers, and assuming we get the X.Y
binary (in)compatibility numbering right, everything will work as desired.
(Thinking some more: it would be better to have VG_MINOR_IF_VERSION and
VG_MAJOR_IF_VERSION integers rather than a string, just to make
comparisons simpler.)
Yes, then you can do CPP checking:
#if VG_MAJOR_IF_VERSION < 3
#error Need Valgrind version 3
#endif
And not expose the fields of the struct in vg_skin.h. That way even if the
structs were totally reordered in core, an old skin would still work fine
with it.
Yes, that would be better. A little more cumbersome, but it could be
prettied up with a macro perhaps.
J
|
|
From: Jeremy F. <je...@go...> - 2002-10-30 16:03:20
|
On Wed, 2002-10-30 at 07:38, Nicholas Nethercote wrote:
Hi,
Jeremy: I saw your patch #33 which adds a pre_mutex_lock event. I think
it's a good thing to add to core. Would you mind augmenting your patch so
that it adds a pre_mutex_unlock() event too? It would be good to have for
completeness but I'm not sure where the hooks need to go -- my
understanding of the scheduler and threads stuff is poor.
The main reason for pre_mutex_lock is that locking can block, so you may
want to do something before that happens. Unlock doesn't block, so pre_
or post_ doesn't matter much. I didn't bother with pre_mutex_unlock
because it would be redundant.
J
|
|
From: Nicholas N. <nj...@ca...> - 2002-10-30 15:38:56
|
Hi, Jeremy: I saw your patch #33 which adds a pre_mutex_lock event. I think it's a good thing to add to core. Would you mind augmenting your patch so that it adds a pre_mutex_unlock() event too? It would be good to have for completeness but I'm not sure where the hooks need to go -- my understanding of the scheduler and threads stuff is poor. Thanks. -- Nick Nethercote nj...@ca... |
|
From: Nicholas N. <nj...@ca...> - 2002-10-30 13:33:25
|
On 29 Oct 2002, Jeremy Fitzhardinge wrote: > One of the problems with the output from Helgrind (and memcheck) is that > the results are somewhat opaque. They talk about raw memory locations > rather than programmer-visible objects. If you're lucky, the location > is associated with a symbol, but mostly they're just offsets into > allocated memory, which isn't all that useful. > > I'm thinking the best way of making this useful is by getting more > information out of the debug info (stabs or dwarf). > [...] > So, does anyone have a feel for how hard this actually is? No idea. But it would be a very useful thing; I can see this facility would definitely go in the core, as it could be useful for many skins. N |
|
From: Nicholas N. <nj...@ca...> - 2002-10-30 10:31:57
|
On some days, Jeremy Fitzhardinge wrote:
> I think a better scheme would be to have the core export a preprocessor
> define called something like VG_IF_VERSION. The skins would then simply
> do:
>
> details->core_interface = VG_IF_VERSION;
>
> If the core makes an incompatible change to its skin interface, then
> increment the IF_VERSION. You could use a string in X.Y format, where
> changes in X are completely incompatible, but Y indicates a backwards
> compatible change.
>
> This way the association between skin binary and core is checked by the
> compiler and then built into the skin automatically.
Thinking about this some more, seems to me like every skin is going to
identically do:
details->core_interface = VG_IF_VERSION;
In which case why not just put:
const Char* core_interface = VG_IF_VERSION;
in vg_skin.h, which every skin #includes. That way, the skin doesn't have
to mention anything about version numbers, and assuming we get the X.Y
binary (in)compatibility numbering right, everything will work as desired.
(Thinking some more: it would be better to have VG_MINOR_IF_VERSION and
VG_MAJOR_IF_VERSION integers rather than a string, just to make
comparisons simpler.)
> Not exposing structures as part of the interface is a really good
> idea (OpenGL is a really nice example of an API which is extremely good
> at maintaining long-term binary compatibility, partly because it exposes
> no structures more complex than arrays of ints or floats).
Interesting. Maybe I should change the way `details', `needs' and
`trackable events' are set, using functions instead? Eg. change this:
details->name = "foo";
details->version = "1.1.1";
details->description = "blah blah blah";
details->copyright_author =
"Copyright (C) 2002, and GNU GPL'd, by Nicholas Nethercote.";
details->bug_reports_to = "nj...@ca...";
needs->basic_block_discards = True;
needs->command_line_options = True;
track->new_mem_heap = & my_new_mem_heap;
track->die_mem_heap = & my_die_mem_heap;
to this:
VG_(details_name)("foo");
VG_(details_version)("1.1.1");
...
VG_(set_needs_basic_block_discards)();
VG_(set_command_line_options)();
/* could have corresponding VG_(unset_command_line_options)() etc */
VG_(track_new_mem_heap)(& my_new_mem_heap);
VG_(track_die_mem_heap)(& my_die_mem_heap);
And not expose the fields of the struct in vg_skin.h. That way even if the
structs were totally reordered in core, an old skin would still work fine
with it.
The only complication with this is that a skin could call these functions
at any time... but that could possibly be desirable (eg. start/stop
tracking new_mem_heap events if something happens) and it's also currently
possible anyway since VG_(needs) etc. are all global anyway. By using
functions I could add some sanity checking, eg. complain if a skin tries
to set a need after startup which would screw things up (eg. shadow_regs,
which affects the baseBlock initialisation and thus can't be turned on
halfway through execution).
There are some other exposed structs in the interface... I might have a
look through and see how else it can be better future-proofed.
Thanks for the good suggestions.
N
|
|
From: Jeremy F. <je...@go...> - 2002-10-29 22:49:41
|
One of the problems with the output from Helgrind (and memcheck) is that
the results are somewhat opaque. They talk about raw memory locations
rather than programmer-visible objects. If you're lucky, the location
is associated with a symbol, but mostly they're just offsets into
allocated memory, which isn't all that useful.
I'm thinking the best way of making this useful is by getting more
information out of the debug info (stabs or dwarf). The algorithm I'm
thinking about is:
- if the address is at or near a static/global symbol, look up the type
information for that symbol and work out either an array index and
structure member (if appropriate). So, for example, if the variable is
static int foo[5]; // at 0x4000
and the address is 0x4004, this would be currently presented as foo+4,
but we should display foo[1]. If foo is a structure:
struct { int a, b } foo[5]; // at 0x4000
and the address is 0x4004, we'd display foo[0].b
- if the address is in the heap, look at all the currently in-scope
local variables and see if they point near the address; if so, make up a
description based on that variable: "local->a" or "local[3].a". If
there's no locals which seem to be in the right area, we could look at
all the global and in-scope statics as well, but that might get
painful. Similarly with recurring into deep types (so we can discover
if "p->a->b->foo" is a good description for the address).
This all seems reasonably easy, on the assumption its easy to extract
more debug info out of our objects. This seems tricky. The stabs
reader in gdb (dbxread.c) is 115k of nasty GNU source. It seems to deal
with lots of bugs and special cases from stabs generated by compilers we
never have to deal with, but it still looks daunting.
So, does anyone have a feel for how hard this actually is?
J
|
|
From: Jeremy F. <je...@go...> - 2002-10-29 22:36:56
|
On Tue, 2002-10-29 at 14:23, Julian Seward wrote: > Um, /me is pleased this is progressing, but I don't have any insights to > add here. Sorry. Oh well; I guess the thing to do is try it out on other things and see what happens. It's all ready for merging... J |
|
From: Julian S. <js...@ac...> - 2002-10-29 22:17:47
|
On Tuesday 29 October 2002 8:52 pm, Jeremy Fitzhardinge wrote: > I finally got around to implementing the thread lifetime segment (TLS) > algorithm described in "Runtime Checking of Multithreaded Applications > with Visual Threads". The results are interesting. > > For small test programs, the algorithm is clearly working, and it helps > in reducing spurious warnings. Great. > -4265 possible data races found; 0 lock order problems > +4335 possible data races found; 0 lock order problems > > I don't know if this is because there's fewer spurious errors and more > real errors, or whether deferring some early spurious errors allows many > more to appear later. Um, /me is pleased this is progressing, but I don't have any insights to add here. Sorry. J |
|
From: Jeremy F. <je...@go...> - 2002-10-29 20:52:05
|
I finally got around to implementing the thread lifetime segment (TLS) algorithm described in "Runtime Checking of Multithreaded Applications with Visual Threads". The results are interesting. For small test programs, the algorithm is clearly working, and it helps in reducing spurious warnings. For Mozilla it is also clearly working. This is an extract of the difference between early parts of the mozilla startup, showing a previously reported warning not being reported with TLS enabled because the location is changing ownership rather than changing state: Thread 2: -Possible data race writing variable at 0x43794CFC - at 0x403550A6: __pthread_mutex_destroy (vg_libpthread.c:965) - by 0x4033E5CD: PR_DestroyLock (in /usr/lib/libnspr4.so) - by 0x402D08A5: nsThread::WaitUntilReadyToStartMain(void) (in /usr/lib/libxpcom.so) - by 0x402D024D: nsThread::Main(void *) (in /usr/lib/libxpcom.so) - Previous state: exclusively owned by thread 1 - Address 0x43794CFC is 12 bytes inside a block of size 88 alloc'd by thread 1 at - at 0x4009D5ED: calloc (vg_clientfuncs.c:242) - by 0x40334662: PR_Calloc (in /usr/lib/libnspr4.so) - by 0x4033E58A: PR_NewLock (in /usr/lib/libnspr4.so) - by 0x402D063E: nsThread::Init(nsIRunnable *, unsigned int, PRThreadPriority, PRThreadScope, PRThreadState) (in /usr/lib/libxpcom.so) - Word at 0x43794CFC last accessed - at 0x40354E1C: __pthread_mutex_init+36 (vg_libpthread.c:882) - The counterintuitive result is that over the whole run, more errors are reported: -4265 possible data races found; 0 lock order problems +4335 possible data races found; 0 lock order problems I don't know if this is because there's fewer spurious errors and more real errors, or whether deferring some early spurious errors allows many more to appear later. J |
|
From: Jeremy F. <je...@go...> - 2002-10-29 17:10:25
|
On Tue, 2002-10-29 at 08:23, Nicholas Nethercote wrote:
Hmm, I like this. Decoupling the interface version and the Valgrind
version is a v. good idea, as is the X vs. Y version changes compared to
the set of compatible versions. It also means the skin writer doesn't
need to worry about tracking core versions because it's done
automatically.
Yep.
> So long as the interface doesn't expose any structures which can change
> shape, this should work well.
Not sure if I understand this fully... only problem I can think of is that
the `details' struct changed so that the `core_interface' field was in a
different place. The `details' struct is the first argument to
VG_(pre_clo_init) so changing the other structs shouldn't matter, I think.
And if I make `core_interface' the first field in `details' that should
even work if more fields are added to the end of `details' later on?
Well, I was thinking more of how to make sure that the compiler will
pick up as many incompatible changes for itself. An example of a
troublesome interface would be where the skin creates an instance of a
structure and passes it back to the core. Also, a function which
changes the meaning of its arguments without changing their type would
cause a silent incompatibility (in that the compiler wouldn't notice).
But you're right; so long as the version discriminator field is first,
it doesn't much matter what goes on elsewhere (if you're willing to
break interface compatibility).
BTW, my understanding of what constitutes a backward (in)compatible change
is hazy with shared objects... here's what I'm guessing is backward
compatible:
add a local function to core: yes
If by "local function" you mean something which isn't part of the skin
interface, then yes.
add a global function to core: yes
change a global function's args in core: no
change a data structure (eg. struct) seen by skins: no
Maybe. If the struct is allocated by the core, only has new elements
added to the end, and they're initialized to sane default values, then
that's compatible. details and needs are examples of such structures
(so long as good discipline in making sure that new elements always go
at the end). You'd have to code it something like:
struct thing_v1 {
/* 1.0 functions */
int (*foo)(void);
/* 1.1 functions */
int (*bar)(int, int, char *);
void (*clunk)(char *, void *, int ****[], void (*foo)(int));
/* 1.2 functions */
int (*__obsolete_1)(int, int, void *);
};
Also, enums: always add new elements at the end. In fact, its better to
make sure that enums are always explicitly enumerated to get good
interface stability: enum { foo=0, bar=1, biff=4, bunk=3 }
There are sure to be other things I haven't thought of here.
It seems like there are two levels of incompatibility, too:
a. changes which require a skin to be recompiled
b. changes which require a skin's code to be changed
An example of (a) would be adding a new (optional) field to the `needs'
struct.
This doesn't need a recompile if the order of existing fields is
unaffected.
An example of (a) would be adding a new (compulsory) field to the
`details' struct.
Am I on the right track? Any other comments, etc? Thanks.
Yes. Not exposing structures as part of the interface is a really good
idea (OpenGL is a really nice example of an API which is extremely good
at maintaining long-term binary compatibility, partly because it exposes
no structures more complex than arrays of ints or floats).
J
|
|
From: Nicholas N. <nj...@ca...> - 2002-10-29 16:23:33
|
On 29 Oct 2002, Jeremy Fitzhardinge wrote: > I think a better scheme would be to have the core export a preprocessor > define called something like VG_IF_VERSION. The skins would then simply > do: > > details->core_interface = VG_IF_VERSION; > > If the core makes an incompatible change to its skin interface, then > increment the IF_VERSION. You could use a string in X.Y format, where > changes in X are completely incompatible, but Y indicates a backwards > compatible change. > > This way the association between skin binary and core is checked by the > compiler and then built into the skin automatically. Hmm, I like this. Decoupling the interface version and the Valgrind version is a v. good idea, as is the X vs. Y version changes compared to the set of compatible versions. It also means the skin writer doesn't need to worry about tracking core versions because it's done automatically. > So long as the interface doesn't expose any structures which can change > shape, this should work well. Not sure if I understand this fully... only problem I can think of is that the `details' struct changed so that the `core_interface' field was in a different place. The `details' struct is the first argument to VG_(pre_clo_init) so changing the other structs shouldn't matter, I think. And if I make `core_interface' the first field in `details' that should even work if more fields are added to the end of `details' later on? > If you want to get fancy nope :) ---- BTW, my understanding of what constitutes a backward (in)compatible change is hazy with shared objects... here's what I'm guessing is backward compatible: add a local function to core: yes add a global function to core: yes change a global function's args in core: no change a data structure (eg. struct) seen by skins: no There are sure to be other things I haven't thought of here. It seems like there are two levels of incompatibility, too: a. changes which require a skin to be recompiled b. changes which require a skin's code to be changed An example of (a) would be adding a new (optional) field to the `needs' struct. An example of (a) would be adding a new (compulsory) field to the `details' struct. Am I on the right track? Any other comments, etc? Thanks. N |
|
From: Jeremy F. <je...@go...> - 2002-10-29 16:08:12
|
On Tue, 2002-10-29 at 01:12, Nicholas Nethercote wrote:
Hi,
I've been thinking about the problem of ensuring skins and the core work
together across different versions, etc.
Here's my best idea so far. Have two `details' fields that must be filled
in by a skin, core_versionv and core_versionc, a bit like argv and argc,
that define a set of core versions that a skin can work with. Eg:
details->core_versionc = 2;
details->core_versionv = { "1.1.0", "1.1.1" };
[ Strictly syntactic comment: this should be a NULL terminated array
rather than counted; manually maintaining the correspondence is one of
those rich sources of minor irritating bugs ]
The core version would always be a string of the form "X.Y.Z". The core
would bomb out if its version doesn't match any of the core_versionv
entries.
This allows a skin to work with more than one version of the core, which
seems like a good thing. Skins would have to be updated for new versions
of the core, even though this may not always be necessary, but this seems
ok; it's probably better to sometimes bomb out unnecessarily saying
"please get an updated version of the skin" than to not bomb out and then
get some mysterious crash.
This seems fragile and hard to maintain. The problem is that the skin
developer's tendency will to simply add the new version number onto the
end of the list without really checking to see if all the old versions
still work; this will be exacerbated if you require an exact match
between core and skin versions. Also, it will make stable skins rot
very quickly if a minor update to core requires changes to all skins.
I think a better scheme would be to have the core export a preprocessor
define called something like VG_IF_VERSION. The skins would then simply
do:
details->core_interface = VG_IF_VERSION;
If the core makes an incompatible change to its skin interface, then
increment the IF_VERSION. You could use a string in X.Y format, where
changes in X are completely incompatible, but Y indicates a backwards
compatible change.
This way the association between skin binary and core is checked by the
compiler and then built into the skin automatically. So long as the
interface doesn't expose any structures which can change shape, this
should work well.
If you want to get fancy and have the core expose multiple incompatible
interface versions, you'd need to do something like:
/* My skin */
#define VG_WANT_IF 2
#include "vg_skin.h"
//...
details->core_interface = VG_IF_VERSION; /* major = 2 */
But I don't think that's terribly necessary.
J
|
|
From: Josef W. <Jos...@gm...> - 2002-10-29 15:18:09
|
On Tuesday 29 October 2002 14:37, Nicholas Nethercote wrote: > ... > There's a conflict here between ease-of-use for core developers, skin > developers, and core/skin users; there's also the question of how gene= ral > and complex should this system be? Yes: =46rom a users view, the shared object versioning is the best: He can ins= tall a=20 few skins, no matter if they require different valgrind core major versio= ns,=20 as they could be installed simultaniously. =46rom a skin developers view, getting the "minimum required version" fro= m=20 skin.h and keeping the skin/core interface stable among minor releases, i= s=20 the best. > My idea is simple and somewhat selfish -- ie. it will make my life easy= =2E > Your idea is more general and powerful, but also more complex and makes= my But I hope you won't change the core/skin interface in minor versions? Otherwise it will be a nightmare for skin developers. At least the minor version check thus should be automatic. And let the sk= in=20 spezify only X.Y versions of the core it is compatible with. Or simpler: = Only=20 let the skin work with the X.Y version it was compiled with. > ... > To ruthlessly minimise the possibility of Valgrind crashing due to > core/skin version mismatches. Abortion-of-execution-with-explanation i= s > not good, but mysterious crashes are much worse. You could give out a warning if the minor versions don't match. > ... > I'm confused... is the bug for DWARF2 or stabs? Is your patch for DWAR= F2 > or stabs? GCC 2.95.x obviously generates bad line info for prologues, no matter if = STABS=20 or DWARF2 format is used. As STABS is the default for GCC 2.95.x, I only = used=20 this one in my GCC 2.95.3 days, and thus only have a patch for the STABS=20 reader. Josef |
|
From: Nicholas N. <nj...@ca...> - 2002-10-29 13:37:40
|
On Tue, 29 Oct 2002, Josef Weidendorfer wrote: > Another idea: What speaks against adopting the versioning scheme of shared > objects? A skin uses the core a little like a shared library. On compiling > the skin, the core version is in skin.h (say X1.Y1.Z1). This is the minimum > version of the core required. > > Another revision of the core X1.Y1.Z2 (Z1 != Z2) is "binary compatible" (bugs > fixed). > A new minor version of the core X1.Y2.Z2 (Y2>Y1) keeps the same > callbacks/structures but adds a few core functions; thus it's "upwards binary > compatible" to a skin compiled with core version X1.Y1.Z1. > For every thing else the core gets a new major version and you need to have a > new skin. > > This has the advantage that the skin programmer doesn't have to specify any > core versions explicitly. He makes sure the skin runs with the core version > he had the skin compiled with. And that's enough (Same as "ld" inserts the > shared object version numbers needed). > > At valgrind installation, we make the same links for valgrind.so as when a > shared lib is installed: I.e. valgrind.so.1 -> valgrind.so.1.1. This way > different versions of valgrind can be installed simultaniously. Either use a > dispatching "valgrind" script with a "--version=" option that loads the > correct valgrind.so. Or (IMHO better) each installed skin provides its > startup script and loads the "valgrind.so.<major>" it needs. > > Perhaps I'm overlooking something. > But I think with a somewhat stable skin/core interface there will be more > skins written; nobody wants to adjust his skin every few months. At least, a > skin should work with new core revisions with same minor/major number. > Bad thing with this approach: when the core will change rapidly in the future, > the major versions increase very fast. There's a conflict here between ease-of-use for core developers, skin developers, and core/skin users; there's also the question of how general and complex should this system be? My idea is simple and somewhat selfish -- ie. it will make my life easy. Your idea is more general and powerful, but also more complex and makes my life more difficult. Since I am trying to be a PhD student first and a free software developer second, I am strongly inclined towards the simple idea. I don't particularly want to have to worry about binary compatibility. Also, Valgrind's current version numbers already have a rough meaning (X.Y.Z, Z means minor release, odd Y means development version, even Y means stable version, a la Linux kernel), so overloading the version numbers would be a pain. I don't mean to sound belligerent; I'm just trying to avoid digging holes that I will fall into later :) > > Jeremy, Josef: how does this sound to you? Can you think of any flaws? > > What's the reason you want exact version dependency? To ruthlessly minimise the possibility of Valgrind crashing due to core/skin version mismatches. Abortion-of-execution-with-explanation is not good, but mysterious crashes are much worse. > PS: Nick, do you remember the workaround I have in vg_symtab2.c for wrong > generated STABS debugging line info of GCC 2.95.3 (the line numbers don't > cover the prolog of a function). > I recently looked around in GDB for the DWARF2 reader. And I saw they have the > same workaround (in gdb/dwarf2read.c). > > /* This function exists to work around a bug in certain compilers > (particularly GCC 2.95), in which the first line number marker of a > function does not show up until after the prologue, right before > the second line number marker. This function shifts ADDRESS down > to the beginning of the function if necessary, and is called on > addresses passed to record_line. */ > > static CORE_ADDR > check_cu_functions (CORE_ADDR address) > ... > > Obviously GCC 2.95 can generate DWARF2 if requested? And then has the same > bug. Therefore, I would like to have this workaround in HEAD (for STABS). > Valgrind never had a problem with this, because in a backtrace, no addresses > on the stack will ever point to function prologs. But I'm using the debug > info of the prolog address to construct the calltree :-( > What do you think on this issue? I'm confused... is the bug for DWARF2 or stabs? Is your patch for DWARF2 or stabs? If Valgrind has a bug/imperfection and you have a patch, then I guess using it would be good... > I can provide the patch. Unfortunately I don't have GCC 2.95.X installed any > more. Will have to install again to do the testing (perhaps even provide a > patch for DWARF2 if GCC 2.95.3 can produce DWARF2 info...) N |
|
From: Josef W. <Jos...@gm...> - 2002-10-29 13:17:10
|
On Tuesday 29 October 2002 10:12, Nicholas Nethercote wrote:
> Hi,
>
> I've been thinking about the problem of ensuring skins and the core wor=
k
> together across different versions, etc.
>
> Here's my best idea so far. Have two `details' fields that must be fil=
led
> in by a skin, core_versionv and core_versionc, a bit like argv and argc=
,
> that define a set of core versions that a skin can work with. Eg:
>
> details->core_versionc =3D 2;
> details->core_versionv =3D { "1.1.0", "1.1.1" };
>
> The core version would always be a string of the form "X.Y.Z". The cor=
e
> would bomb out if its version doesn't match any of the core_versionv
> entries.
>
> This allows a skin to work with more than one version of the core, whic=
h
> seems like a good thing. Skins would have to be updated for new versio=
ns
> of the core, even though this may not always be necessary, but this see=
ms
> ok; it's probably better to sometimes bomb out unnecessarily saying
> "please get an updated version of the skin" than to not bomb out and th=
en
> get some mysterious crash.
Another idea: What speaks against adopting the versioning scheme of share=
d=20
objects? A skin uses the core a little like a shared library. On compilin=
g=20
the skin, the core version is in skin.h (say X1.Y1.Z1). This is the minim=
um=20
version of the core required.
Another revision of the core X1.Y1.Z2 (Z1 !=3D Z2) is "binary compatible"=
(bugs=20
fixed).
A new minor version of the core X1.Y2.Z2 (Y2>Y1) keeps the same=20
callbacks/structures but adds a few core functions; thus it's "upwards bi=
nary=20
compatible" to a skin compiled with core version X1.Y1.Z1.
For every thing else the core gets a new major version and you need to ha=
ve a
new skin.
This has the advantage that the skin programmer doesn't have to specify a=
ny=20
core versions explicitly. He makes sure the skin runs with the core versi=
on=20
he had the skin compiled with. And that's enough (Same as "ld" inserts th=
e=20
shared object version numbers needed).
At valgrind installation, we make the same links for valgrind.so as when =
a=20
shared lib is installed: I.e. valgrind.so.1 -> valgrind.so.1.1. This way=20
different versions of valgrind can be installed simultaniously. Either us=
e a=20
dispatching "valgrind" script with a "--version=3D" option that loads the=
=20
correct valgrind.so. Or (IMHO better) each installed skin provides its=20
startup script and loads the "valgrind.so.<major>" it needs.
Perhaps I'm overlooking something.=20
But I think with a somewhat stable skin/core interface there will be more=
=20
skins written; nobody wants to adjust his skin every few months. At least=
, a=20
skin should work with new core revisions with same minor/major number.
Bad thing with this approach: when the core will change rapidly in the fu=
ture,=20
the major versions increase very fast.
> Jeremy, Josef: how does this sound to you? Can you think of any flaws=
?
What's the reason you want exact version dependency?
> It would be nice to get this right on the first try.
Definitly.
Josef
PS: Nick, do you remember the workaround I have in vg_symtab2.c for wrong=
=20
generated STABS debugging line info of GCC 2.95.3 (the line numbers don't=
=20
cover the prolog of a function).
I recently looked around in GDB for the DWARF2 reader. And I saw they hav=
e the=20
same workaround (in gdb/dwarf2read.c).
=09/* This function exists to work around a bug in certain compilers
=09 (particularly GCC 2.95), in which the first line number marker of a
=09 function does not show up until after the prologue, right before
=09 the second line number marker. This function shifts ADDRESS down
=09 to the beginning of the function if necessary, and is called on
=09 addresses passed to record_line. */
=09static CORE_ADDR
=09check_cu_functions (CORE_ADDR address)
=09...
Obviously GCC 2.95 can generate DWARF2 if requested? And then has the sam=
e=20
bug. Therefore, I would like to have this workaround in HEAD (for STABS).
Valgrind never had a problem with this, because in a backtrace, no addres=
ses=20
on the stack will ever point to function prologs. But I'm using the debug=
=20
info of the prolog address to construct the calltree :-(
What do you think on this issue?
I can provide the patch. Unfortunately I don't have GCC 2.95.X installed =
any=20
more. Will have to install again to do the testing (perhaps even provide =
a=20
patch for DWARF2 if GCC 2.95.3 can produce DWARF2 info...)
>
> N
>
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers
|
|
From: Nicholas N. <nj...@ca...> - 2002-10-29 09:12:11
|
Hi,
I've been thinking about the problem of ensuring skins and the core work
together across different versions, etc.
Here's my best idea so far. Have two `details' fields that must be filled
in by a skin, core_versionv and core_versionc, a bit like argv and argc,
that define a set of core versions that a skin can work with. Eg:
details->core_versionc = 2;
details->core_versionv = { "1.1.0", "1.1.1" };
The core version would always be a string of the form "X.Y.Z". The core
would bomb out if its version doesn't match any of the core_versionv
entries.
This allows a skin to work with more than one version of the core, which
seems like a good thing. Skins would have to be updated for new versions
of the core, even though this may not always be necessary, but this seems
ok; it's probably better to sometimes bomb out unnecessarily saying
"please get an updated version of the skin" than to not bomb out and then
get some mysterious crash.
Jeremy, Josef: how does this sound to you? Can you think of any flaws?
It would be nice to get this right on the first try.
N
|
|
From: Jeremy F. <je...@go...> - 2002-10-28 02:13:41
|
On Wed, 2002-10-23 at 15:58, Julian Seward wrote:
==12966== at 0x4009AAED: calloc (vg_clientfuncs.c:242)
==12966== by 0x4027C7A1: nsRecyclingAllocator::Malloc(unsigned int, int)
(in /mnt/globe/Apps/mozilla-1.0/libxpcom.so)
==12966== by 0x44FC5CAE: (within
/mnt/globe/Apps/mozilla-1.0/components/libjar50.so)
==12966== by 0x439BC812: inflate_blocks (in
/mnt/globe/Apps/mozilla-1.0/libmozz.so)
My question is: memory allocated by calloc(), what state should it start
out in? Surely it should be in some non-shared state (not shared RW as
claimed here) ?
I tried running Mozilla with the attached code LD_PRELOADED. This
reduces a mozilla startup from 11MB of warnings down to only 4.5MB. The
rest look plausible, and I think the thread lifetime segmentation stuff
will help with these.
Mozilla has just finished -- generating 13975 errors in 13924 contexts --
mostly duplicates of a small handful of errors. Incidentally, 1164 different
lock sets were required, so I raised M_LOCKSET_TABLE to 5000 (and committed
it). Most of the locksets were small (1, 2 or 3 elems), but some got quite
large; from a quick scan the biggest is
[1124] = { 0x4365A748 0x4365E424 0x436650F4 0x4366B6C8 0x4374088C
0x437F94E8 0x438214B4 0x43826378 0x4382696C 0x4384CF20
0x438A28E8 0x438A3424 0x45377038 0x45377130 0x45377228
0x45377394 0x462D8E10 0x46C2CB98 0x46C9F474 0x473A20FC
0x473A407C }
I'm not seeing sets anywhere this big, and I'm only seeing about 700
distinct sets. I wonder if there was something broken in the code you
used?
J
|
|
From: Jeremy F. <je...@go...> - 2002-10-25 06:58:43
|
New patches of note:
21-hg-hashed-lockset
HELGRIND: New implementation of LockSets. This changes the
lockset_table into a hash, and changes the representation of each
LockSet from a list into an array. This improves performance (mainly
because of the hash, but the arrays are more cache friendly too), and
simplifies the code.
22-mutex-destroy-unlock
It seems that glibc assumes in its internal use of pthreads that
destroying a lock implicity unlocks it. This patch handles this case so
that lock ownership tracking is accurate.
Also handles the case of the dyanmic linker wanting to do locking
before Valgrind has started up. vg_libpthread now implements toy
lock/unlock functions to keep it happy and leave the locks in a sensible
state. Implements some untested code to handle the case where a lock is
taken before Valgrind is running but released afterwards.
23-intercept-select-poll
Do the intercept thing for select and poll. At some point I should
sit down and do this systematically for all the functions that
vg_libpthread wants to define but normal libpthread doesn't.
24-sym-offset
When looking up a symbol, also attach an offset. Also changes to
make sprintf reentrant.
http://www.goop.org/~jeremy/valgrind
J
|
|
From: Jeremy F. <je...@go...> - 2002-10-23 23:57:17
|
On Wed, 2002-10-23 at 15:58, Julian Seward wrote: > A busy day at goop.org, I see. Where in the physical universe are you > located, just out of curiosity? Yeah, not quite sure what's been happening; maybe one of the lists went mad. I'm an Australian living in San Francisco (so the notion of a physical universe is a little ill-defined). > I've just merged Quick work! I was just about to upload a new patch. (Now uploaded: 20-hg-lockgraph-report). BTW, it it possible to set up a SourceForge list which has CVS checkins? It would be useful. > 13-fix-printf > 13-kill-1ifroot > 14-hg-mmap-magic-virgin > 02-sysv-msg (I assume that 16-function-intercept makes this safe) Well, it was never unsafe, but I think it should work reliably. > 16-function-intercept > 18-recv-nonblock > 19-hg-lockgraph > > I also peered (again) at 09-rdtsc-calibration and think I'll merge that > too, although not this evening. Any thoughts about 00-lazyfp or 01-partial-mul? And I'd quite like vgprof to be a core skin. > Thanks for this hackery. pth_threadpool (my canonical small threading test) > now runs with fewer and fewer errors. Yes, the only error I'm getting out of it now is the initial hit on _dl_num_relocations. > Am amused to see you had a go at weird_LockSet_equals(). Nick wrote the > first version; I found it buggy and completely rewrote it; and so on ... > Hopefully 3rd time lucky. Am a bit surprised; I spent ages trying to > convince myself my version was right :) Yes, I poked about the the CVS history and noticed it had already been beaten up a bit already. I hope you'll agree that my version is obviously correct (in the normal highly qualified sense of "obvious"). The two bugs were 1) it returned equal if 'a' and 'b' were the same and missing_mutex was greater than the last element in 'a' and 'b' and 2) the check which ignored missing_mutex if it were already in 'a' broke removal (though that doesn't sound right now that I describe it). > Anyway: > > I wonder if I can encourage you to use mozilla-1.0 (standard binary install > from mozilla.org) as a stress test? I'm doing > > valgrind -v --error-limit=no --skin=helgrind --trace-children=yes mozilla > > and just exiting it as soon as it comes up. I'll give it a go. > It generates thousands of errors, and I'm suspicious -- if I trust anyone to > make a large threaded app and do it right, it's the mozilla people. That's trusting. > One > very common thing is this (kmail apologises for making it ugly): > > ==12966== Possible data race writing variable at 0x438A868C > ==12966== at 0x439BCA35: inflate_blocks (in > /mnt/globe/Apps/mozilla-1.0/libmozz.so) > ==12966== by 0x439BBDF1: inflate (in > /mnt/globe/Apps/mozilla-1.0/libmozz.so) > ==12966== by 0x44FC6E49: (within > /mnt/globe/Apps/mozilla-1.0/components/libjar50.so) > ==12966== by 0x44FC5F5B: (within > /mnt/globe/Apps/mozilla-1.0/components/libjar50.so) > ==12966== Previous state: shared RW, locked by: 0x45540E90 > ==12966== Address 0x438A868C is 1044 bytes inside a block of size 1280 > alloc'd by thread 1 at > ==12966== at 0x4009AAED: calloc (vg_clientfuncs.c:242) > ==12966== by 0x4027C7A1: nsRecyclingAllocator::Malloc(unsigned int, int) > (in /mnt/globe/Apps/mozilla-1.0/libxpcom.so) > ==12966== by 0x44FC5CAE: (within > /mnt/globe/Apps/mozilla-1.0/components/libjar50.so) > ==12966== by 0x439BC812: inflate_blocks (in > /mnt/globe/Apps/mozilla-1.0/libmozz.so) > > My question is: memory allocated by calloc(), what state should it start > out in? Surely it should be in some non-shared state (not shared RW as > claimed here) ? No. I should be exclusive to the thread which did the allocation (or maybe magic virgin). That said, the name "nsRecyclingAllocator::Malloc" suggests that the memory has been used by someone else before, and has been released back into a pool before being used by inflate/inflate_blocks; that would need an client call to tell helgrind to reset the memory state. If a lot of Moz's memory has been got from that allocator, I'd expect a lot of spurious errors. > Mozilla has just finished -- generating 13975 errors in 13924 contexts -- > mostly duplicates of a small handful of errors. Incidentally, 1164 different > lock sets were required, so I raised M_LOCKSET_TABLE to 5000 (and committed > it). Most of the locksets were small (1, 2 or 3 elems), but some got quite > large; from a quick scan the biggest is > > [1124] = { 0x4365A748 0x4365E424 0x436650F4 0x4366B6C8 0x4374088C > 0x437F94E8 0x438214B4 0x43826378 0x4382696C 0x4384CF20 > 0x438A28E8 0x438A3424 0x45377038 0x45377130 0x45377228 > 0x45377394 0x462D8E10 0x46C2CB98 0x46C9F474 0x473A20FC > 0x473A407C } > > Dunno if that's remotely interesting or useful, but anyway. Yes, it is. A near-term plan I have is to reimplement the lockset stuff entirely: replace the table with a structural hash, and replace the lists with arrays. That should speed things up quite a bit (but I wouldn't have bothered if there weren't many sets, or if they don't get large). I think the lock cycle test makes quite large locksets, because it keeps computing the union of a thread's current lockset and the mutex's previous dependent locks. > Oh yes, one other thing. I can't figure out the numbering scheme for > your patches. At first I thought they were were a simple patch counter, > but then the appearance of (16-function-intercept.patch, 16-ld-nodelete.patch) > and (18-hg-err-reporting.patch, 18-recv-nonblock.patch) made me discard > that idea. Or are you reusing numbers for patches I've merged? Basically. For any given set of unmerged patches, I just number them to keep their lexical order the same as order they should be applied in. Generally when I start a new patch I pick the next number up, but occasionally I get started and decide I need a prerequisite patch, so I choose an available lower number (which may well recycle a number). J |
|
From: Julian S. <js...@ac...> - 2002-10-23 22:52:49
|
A busy day at goop.org, I see. Where in the physical universe are you located, just out of curiosity? I've just merged 13-fix-printf 13-kill-1ifroot 14-hg-mmap-magic-virgin 02-sysv-msg (I assume that 16-function-intercept makes this safe) 16-function-intercept 18-recv-nonblock 19-hg-lockgraph I also peered (again) at 09-rdtsc-calibration and think I'll merge that too, although not this evening. > The only ugly/controversial thing here is what I had to do to get glibc > intercepts actually working correctly. Unfortunately, it doesn't seem > to be true that libpthread's symbols are used in preference to glibc all > the time, even if valgrind.so itself is linked with libpthread. The > only reliable way I could find of making it work is by having > valgrind.so itself define the symbols we want to catch. Ah well. I can't think of any better solution. Thanks for this hackery. pth_threadpool (my canonical small threading test) now runs with fewer and fewer errors. Am amused to see you had a go at weird_LockSet_equals(). Nick wrote the first version; I found it buggy and completely rewrote it; and so on ... Hopefully 3rd time lucky. Am a bit surprised; I spent ages trying to convince myself my version was right :) Anyway: I wonder if I can encourage you to use mozilla-1.0 (standard binary install from mozilla.org) as a stress test? I'm doing valgrind -v --error-limit=no --skin=helgrind --trace-children=yes mozilla and just exiting it as soon as it comes up. It generates thousands of errors, and I'm suspicious -- if I trust anyone to make a large threaded app and do it right, it's the mozilla people. One very common thing is this (kmail apologises for making it ugly): ==12966== Possible data race writing variable at 0x438A868C ==12966== at 0x439BCA35: inflate_blocks (in /mnt/globe/Apps/mozilla-1.0/libmozz.so) ==12966== by 0x439BBDF1: inflate (in /mnt/globe/Apps/mozilla-1.0/libmozz.so) ==12966== by 0x44FC6E49: (within /mnt/globe/Apps/mozilla-1.0/components/libjar50.so) ==12966== by 0x44FC5F5B: (within /mnt/globe/Apps/mozilla-1.0/components/libjar50.so) ==12966== Previous state: shared RW, locked by: 0x45540E90 ==12966== Address 0x438A868C is 1044 bytes inside a block of size 1280 alloc'd by thread 1 at ==12966== at 0x4009AAED: calloc (vg_clientfuncs.c:242) ==12966== by 0x4027C7A1: nsRecyclingAllocator::Malloc(unsigned int, int) (in /mnt/globe/Apps/mozilla-1.0/libxpcom.so) ==12966== by 0x44FC5CAE: (within /mnt/globe/Apps/mozilla-1.0/components/libjar50.so) ==12966== by 0x439BC812: inflate_blocks (in /mnt/globe/Apps/mozilla-1.0/libmozz.so) My question is: memory allocated by calloc(), what state should it start out in? Surely it should be in some non-shared state (not shared RW as claimed here) ? Mozilla has just finished -- generating 13975 errors in 13924 contexts -- mostly duplicates of a small handful of errors. Incidentally, 1164 different lock sets were required, so I raised M_LOCKSET_TABLE to 5000 (and committed it). Most of the locksets were small (1, 2 or 3 elems), but some got quite large; from a quick scan the biggest is [1124] = { 0x4365A748 0x4365E424 0x436650F4 0x4366B6C8 0x4374088C 0x437F94E8 0x438214B4 0x43826378 0x4382696C 0x4384CF20 0x438A28E8 0x438A3424 0x45377038 0x45377130 0x45377228 0x45377394 0x462D8E10 0x46C2CB98 0x46C9F474 0x473A20FC 0x473A407C } Dunno if that's remotely interesting or useful, but anyway. Oh yes, one other thing. I can't figure out the numbering scheme for your patches. At first I thought they were were a simple patch counter, but then the appearance of (16-function-intercept.patch, 16-ld-nodelete.patch) and (18-hg-err-reporting.patch, 18-recv-nonblock.patch) made me discard that idea. Or are you reusing numbers for patches I've merged? J |
|
From: Jeremy F. <je...@go...> - 2002-10-23 20:47:52
|
A new set of patches, including a big bugfix in how Helgrind keeps track of a thread's current lock set (it does now). There's a description of the fix below in 19-hg-lockgraph. The only ugly/controversial thing here is what I had to do to get glibc intercepts actually working correctly. Unfortunately, it doesn't seem to be true that libpthread's symbols are used in preference to glibc all the time, even if valgrind.so itself is linked with libpthread. The only reliable way I could find of making it work is by having valgrind.so itself define the symbols we want to catch. 16-function-intercept is my implementation of this. Basically I added a new file, vg_intercept.c, which contains stubs for the functions we want to catch; the stub for x calls VGL_(x). vg_intercept.c contains weakly defined default implementations for VGL_(x), but vg_libpthread has strongly defined implementations for VGL_(x). This seems to be reliable in catching all libc references, regardless of symbol definition strength. It's also pretty ugly, and I don't really like it, but I can't see a nicer way. (A related alternative approach which would be even more reliable would be to have a set of pointers to functions, which the stubs call into. Initially it would point to the non-threaded versions of the functions, but when libpthread.so is loaded, it installs new pointers to the threaded functions. This allows rebinding on the fly, and doesn't need to trust the dynamic linker to get anything tricky right). J http://www.goop.org/~jeremy/valgrind: 13-fix-printf Fix stupid bug I introduced into printf with 14-sprintf. 13-kill-1ifroot Kill VG_(get_current_tid_1_if_root)() and replace it with the slightly more appetising (though still hackish) VG_(get_current_or_recent_tid)(). This is intended for use when there's no thread actually loaded into the baseblock, but we're doing work on behalf of the the thread that was last running (such as during a syscall). This probably fixes a bug with Helgrind mis-attributing memory created with mmap to thread 1 rather than the thread which called mmap (though the behaviour is still probably wrong: mmapped memory should be magically_inited). 14-hg-mmap-magic-virgin This does two things: 1. change the signatures of the new_mem_mmap and change_mem_mprotect functions to remove the pointless 'nn' argument. This makes them match the signature of new_mem_startup... 2. change helgrind to mark memory created by mmap as if it were the same as other magically pre-inited memory. Implement this by pointing helgrind's new_mem_mmap function at new_mem_startup. 16-function-intercept Implement a more reliable for vg_libpthread to intercept libc calls. Since the only reliable way of making sure that our code defines the symbol is by making sure that valgrind.so itself does it, this patch adds a new file, vg_intercept.so, which defines those symbols. They are then passed off to a weak local function if libpthread isn't present, or to the libpthread version if it is. 18-recv-nonblock Make recv() nonblocking 19-hg-lockgraph HELGRIND: large patch which does a big bugfix and adds some new instrumentation: 1. The bugfix is BIG. Previously the code which maintained the thread's current lockset would often (maybe always) fail to add new locks to the set, so it always looked like threads were holding one lock. The problem was in weird_LockSet_equals(); I rewrote it in a way which should be obviously correct. Fixing this exposed a bug in removing locks from a thread's lockset, which was also caused by another bug in weird_LockSet_equals(). This fix makes many spurious data race warnings go away (notably, stdio becomes silent). 2. The new feature is tracking of the order of lock usage. If threads are taking locks in an inconsistent order, that's a symptom of possible deadlock. Helgrind will now warn when it sees this happening (though the warnings themselves need to be improved). |
|
From: Jeremy F. <je...@go...> - 2002-10-23 15:45:40
|
On Wed, 2002-10-23 at 05:55, Nicholas Nethercote wrote:
On 21 Oct 2002, Jeremy Fitzhardinge wrote:
> > As I remember it, the root thread is #1, and if there's no tid in the base
> > block then it must be the root thread. This could be wrong.
>
> Yes, I don't think that assumption is good. If there's no thread in the
> baseBlock, then there's no current (virtual machine) thread running at
> all.
So then what do you do in that case?
A hack. I removed VG_(get_current_tid_1_if_root)() and replaced it with
VG_(get_current_or_recent_tid)(). It returns either the current tid, or
if there is none, the most recently current tid. This works for the
cases where we're acting on behalf of the most recently running thread,
such as when we're doing things in syscalls. A more correct solution
might be to always pass around a ThreadState, but that would require
changing the interface of almost all the tracking functions.
J
|
|
From: Nicholas N. <nj...@ca...> - 2002-10-23 12:55:36
|
On 21 Oct 2002, Jeremy Fitzhardinge wrote: > > As I remember it, the root thread is #1, and if there's no tid in the base > > block then it must be the root thread. This could be wrong. > > Yes, I don't think that assumption is good. If there's no thread in the > baseBlock, then there's no current (virtual machine) thread running at > all. So then what do you do in that case? N |
|
From: Julian S. <js...@ac...> - 2002-10-22 13:25:10
|
> All the complaints about dl_num_relocations have disappeared; > any idea how/why? (am not complaining; just curious). > > 18-hg-err-reporting adds a change which puts memory locations which have > had an error reported into an error state (Vg_Excl, with the magic tid > of TID_INDICATING_ALL; the intended meaning is "exclusively owned by > everyone"). It should report everything at least once. Ha; I understand. Good trick. J |
|
From: Julian S. <js...@ac...> - 2002-10-22 13:24:14
|
Ok, it was just a sanity check. I'm assuming you've got this all figured out and will clean up and/or nuke get_current_tid_1_if_root as you see fit. J On Tuesday 22 October 2002 8:35 am, Jeremy Fitzhardinge wrote: > On Mon, 2002-10-21 at 21:45, Julian Seward wrote: > Nick: I spotted the infamous (?) VG_(get_current_tid_1_if_root) > and specifically this: > > if (0 == vg_tid_currently_in_baseBlock) > return 1; /* root thread */ > > What's the meaning of the 0 here? Where is it set? AFAICS the only > valid values of vg_tid_currently_in_baseBlock are either 1 .. > VG_N_THREADS or VG_INVALID_THREADID, so I guess I'm missing something here? > > No, I don't think so. I don't think that function means anything at > all. I was very close to removing it altogether. > > I don't feel like I'm seeing a consistent story I'm happy with re > vg_tid_currently_in_baseBlock -- can you two clarify? > > I put the stronger assert into save_thread_state() because I couldn't > see any good reason why the tid argument should ever mismatch the value > of vg_tid_currently_in_baseBlock. The assertion failed on the first > thread create, but I think my fix makes sense. Thread creation copies > the parent thread's context into the child via the baseBlock. In a > sense the ownership of the baseBlock changes during the copy, which is > what the assignment to vg_tid_currently_in_baseBlock indicates: > > VG_(load_thread_state)(parent_tid); /* load parent thread state into > baseBlock */ + vg_tid_currently_in_baseBlock = tid; /* give ownership of > baseBlock to child */ VG_(save_thread_state)(tid); /* save new state into > new thread */ > > > It seems like, with that change, it will be impossible to > know from the value of vg_tid_currently_in_baseBlock whether or > not baseBlock is "full". > > No, because save_thread_state() assigns vg_tid_currently_in_baseBlock > withVG_INVALID_THREADID, so you know the baseBlock contains nothing > afterwards. It gets reassigned with a valid tid on load_thread_state(). > > J |