|
From: Jeremy F. <je...@go...> - 2002-10-23 23:57:17
|
On Wed, 2002-10-23 at 15:58, Julian Seward wrote:
> A busy day at goop.org, I see. Where in the physical universe are you
> located, just out of curiosity?

Yeah, not quite sure what's been happening; maybe one of the lists went mad. I'm an Australian living in San Francisco (so the notion of a physical universe is a little ill-defined).

> I've just merged

Quick work! I was just about to upload a new patch. (Now uploaded: 20-hg-lockgraph-report). BTW, is it possible to set up a SourceForge list which has CVS checkins? It would be useful.

> 13-fix-printf
> 13-kill-1ifroot
> 14-hg-mmap-magic-virgin
> 02-sysv-msg (I assume that 16-function-intercept makes this safe)

Well, it was never unsafe, but I think it should work reliably.

> 16-function-intercept
> 18-recv-nonblock
> 19-hg-lockgraph
>
> I also peered (again) at 09-rdtsc-calibration and think I'll merge that
> too, although not this evening.

Any thoughts about 00-lazyfp or 01-partial-mul? And I'd quite like vgprof to be a core skin.

> Thanks for this hackery. pth_threadpool (my canonical small threading test)
> now runs with fewer and fewer errors.

Yes, the only error I'm getting out of it now is the initial hit on _dl_num_relocations.

> Am amused to see you had a go at weird_LockSet_equals(). Nick wrote the
> first version; I found it buggy and completely rewrote it; and so on ...
> Hopefully 3rd time lucky. Am a bit surprised; I spent ages trying to
> convince myself my version was right :)

Yes, I poked about the CVS history and noticed it had already been beaten up a bit. I hope you'll agree that my version is obviously correct (in the normal highly qualified sense of "obvious"). The two bugs were 1) it returned equal if 'a' and 'b' were the same and missing_mutex was greater than the last element in 'a' and 'b' and 2) the check which ignored missing_mutex if it were already in 'a' broke removal (though that doesn't sound right now that I describe it).

> Anyway:
>
> I wonder if I can encourage you to use mozilla-1.0 (standard binary install
> from mozilla.org) as a stress test? I'm doing
>
> valgrind -v --error-limit=no --skin=helgrind --trace-children=yes mozilla
>
> and just exiting it as soon as it comes up.

I'll give it a go.

> It generates thousands of errors, and I'm suspicious -- if I trust anyone to
> make a large threaded app and do it right, it's the mozilla people.

That's trusting.

> One
> very common thing is this (kmail apologises for making it ugly):
>
> ==12966== Possible data race writing variable at 0x438A868C
> ==12966== at 0x439BCA35: inflate_blocks (in
> /mnt/globe/Apps/mozilla-1.0/libmozz.so)
> ==12966== by 0x439BBDF1: inflate (in
> /mnt/globe/Apps/mozilla-1.0/libmozz.so)
> ==12966== by 0x44FC6E49: (within
> /mnt/globe/Apps/mozilla-1.0/components/libjar50.so)
> ==12966== by 0x44FC5F5B: (within
> /mnt/globe/Apps/mozilla-1.0/components/libjar50.so)
> ==12966== Previous state: shared RW, locked by: 0x45540E90
> ==12966== Address 0x438A868C is 1044 bytes inside a block of size 1280
> alloc'd by thread 1 at
> ==12966== at 0x4009AAED: calloc (vg_clientfuncs.c:242)
> ==12966== by 0x4027C7A1: nsRecyclingAllocator::Malloc(unsigned int, int)
> (in /mnt/globe/Apps/mozilla-1.0/libxpcom.so)
> ==12966== by 0x44FC5CAE: (within
> /mnt/globe/Apps/mozilla-1.0/components/libjar50.so)
> ==12966== by 0x439BC812: inflate_blocks (in
> /mnt/globe/Apps/mozilla-1.0/libmozz.so)
>
> My question is: memory allocated by calloc(), what state should it start
> out in? Surely it should be in some non-shared state (not shared RW as
> claimed here) ?

No. It should be exclusive to the thread which did the allocation (or maybe magic virgin). That said, the name "nsRecyclingAllocator::Malloc" suggests that the memory has been used by someone else before, and has been released back into a pool before being used by inflate/inflate_blocks; that would need a client call to tell helgrind to reset the memory state. If a lot of Moz's memory has been got from that allocator, I'd expect a lot of spurious errors.

> Mozilla has just finished -- generating 13975 errors in 13924 contexts --
> mostly duplicates of a small handful of errors. Incidentally, 1164 different
> lock sets were required, so I raised M_LOCKSET_TABLE to 5000 (and committed
> it). Most of the locksets were small (1, 2 or 3 elems), but some got quite
> large; from a quick scan the biggest is
>
> [1124] = { 0x4365A748 0x4365E424 0x436650F4 0x4366B6C8 0x4374088C
> 0x437F94E8 0x438214B4 0x43826378 0x4382696C 0x4384CF20
> 0x438A28E8 0x438A3424 0x45377038 0x45377130 0x45377228
> 0x45377394 0x462D8E10 0x46C2CB98 0x46C9F474 0x473A20FC
> 0x473A407C }
>
> Dunno if that's remotely interesting or useful, but anyway.

Yes, it is. A near-term plan I have is to reimplement the lockset stuff entirely: replace the table with a structural hash, and replace the lists with arrays. That should speed things up quite a bit (but I wouldn't have bothered if there weren't many sets, or if they don't get large). I think the lock cycle test makes quite large locksets, because it keeps computing the union of a thread's current lockset and the mutex's previous dependent locks.

> Oh yes, one other thing. I can't figure out the numbering scheme for
> your patches. At first I thought they were a simple patch counter,
> but then the appearance of (16-function-intercept.patch, 16-ld-nodelete.patch)
> and (18-hg-err-reporting.patch, 18-recv-nonblock.patch) made me discard
> that idea. Or are you reusing numbers for patches I've merged?

Basically. For any given set of unmerged patches, I just number them to keep their lexical order the same as the order they should be applied in. Generally when I start a new patch I pick the next number up, but occasionally I get started and decide I need a prerequisite patch, so I choose an available lower number (which may well recycle a number).

J |
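The two weird_LockSet_equals() bugs described in this message are easier to see against a concrete comparison routine. The following is only a sketch, not the Helgrind source: it assumes the intended meaning is "set 'a' with missing_mutex added equals set 'b'", that locksets are sorted arrays of distinct mutex addresses, and the name lockset_equals_with_extra and its signature are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: returns true iff (a with `missing` added) == b.
   Both arrays must be sorted ascending and contain no duplicates. */
bool lockset_equals_with_extra(const uintptr_t *a, size_t na,
                               const uintptr_t *b, size_t nb,
                               uintptr_t missing)
{
    assert(a != NULL || na == 0);
    assert(b != NULL || nb == 0);

    size_t i = 0, j = 0;
    bool used = false;   /* has `missing` been accounted for in b yet? */

    while (j < nb) {
        if (i < na && a[i] == b[j]) {
            /* Same element in both sets. If it happens to be `missing`,
               then `missing` was already in a and adding it is a no-op. */
            if (a[i] == missing)
                used = true;
            i++;
            j++;
        } else if (!used && b[j] == missing) {
            /* b has `missing` where a does not: this is the added element. */
            used = true;
            j++;
        } else {
            return false;   /* b holds something a + missing does not */
        }
    }

    /* All of a must be consumed, and `missing` must really appear in b.
       The second condition covers the case where a == b and `missing`
       is larger than every element of both. */
    return i == na && used;
}
```

With a = {1, 3}, the call with b = {1, 2, 3} and missing = 2 reports equal, while comparing a against itself with missing = 5 reports not equal, which is the first bug's scenario.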
|
From: Julian S. <js...@ac...> - 2002-10-23 22:52:49
|
A busy day at goop.org, I see. Where in the physical universe are you located, just out of curiosity?

I've just merged

13-fix-printf
13-kill-1ifroot
14-hg-mmap-magic-virgin
02-sysv-msg (I assume that 16-function-intercept makes this safe)
16-function-intercept
18-recv-nonblock
19-hg-lockgraph

I also peered (again) at 09-rdtsc-calibration and think I'll merge that too, although not this evening.

> The only ugly/controversial thing here is what I had to do to get glibc
> intercepts actually working correctly. Unfortunately, it doesn't seem
> to be true that libpthread's symbols are used in preference to glibc all
> the time, even if valgrind.so itself is linked with libpthread. The
> only reliable way I could find of making it work is by having
> valgrind.so itself define the symbols we want to catch.

Ah well. I can't think of any better solution.

Thanks for this hackery. pth_threadpool (my canonical small threading test) now runs with fewer and fewer errors.

Am amused to see you had a go at weird_LockSet_equals(). Nick wrote the first version; I found it buggy and completely rewrote it; and so on ... Hopefully 3rd time lucky. Am a bit surprised; I spent ages trying to convince myself my version was right :)

Anyway:

I wonder if I can encourage you to use mozilla-1.0 (standard binary install from mozilla.org) as a stress test? I'm doing

valgrind -v --error-limit=no --skin=helgrind --trace-children=yes mozilla

and just exiting it as soon as it comes up.

It generates thousands of errors, and I'm suspicious -- if I trust anyone to make a large threaded app and do it right, it's the mozilla people. One very common thing is this (kmail apologises for making it ugly):

==12966== Possible data race writing variable at 0x438A868C
==12966== at 0x439BCA35: inflate_blocks (in /mnt/globe/Apps/mozilla-1.0/libmozz.so)
==12966== by 0x439BBDF1: inflate (in /mnt/globe/Apps/mozilla-1.0/libmozz.so)
==12966== by 0x44FC6E49: (within /mnt/globe/Apps/mozilla-1.0/components/libjar50.so)
==12966== by 0x44FC5F5B: (within /mnt/globe/Apps/mozilla-1.0/components/libjar50.so)
==12966== Previous state: shared RW, locked by: 0x45540E90
==12966== Address 0x438A868C is 1044 bytes inside a block of size 1280 alloc'd by thread 1 at
==12966== at 0x4009AAED: calloc (vg_clientfuncs.c:242)
==12966== by 0x4027C7A1: nsRecyclingAllocator::Malloc(unsigned int, int) (in /mnt/globe/Apps/mozilla-1.0/libxpcom.so)
==12966== by 0x44FC5CAE: (within /mnt/globe/Apps/mozilla-1.0/components/libjar50.so)
==12966== by 0x439BC812: inflate_blocks (in /mnt/globe/Apps/mozilla-1.0/libmozz.so)

My question is: memory allocated by calloc(), what state should it start out in? Surely it should be in some non-shared state (not shared RW as claimed here) ?

Mozilla has just finished -- generating 13975 errors in 13924 contexts -- mostly duplicates of a small handful of errors. Incidentally, 1164 different lock sets were required, so I raised M_LOCKSET_TABLE to 5000 (and committed it). Most of the locksets were small (1, 2 or 3 elems), but some got quite large; from a quick scan the biggest is

[1124] = { 0x4365A748 0x4365E424 0x436650F4 0x4366B6C8 0x4374088C
0x437F94E8 0x438214B4 0x43826378 0x4382696C 0x4384CF20
0x438A28E8 0x438A3424 0x45377038 0x45377130 0x45377228
0x45377394 0x462D8E10 0x46C2CB98 0x46C9F474 0x473A20FC
0x473A407C }

Dunno if that's remotely interesting or useful, but anyway.

Oh yes, one other thing. I can't figure out the numbering scheme for your patches. At first I thought they were a simple patch counter, but then the appearance of (16-function-intercept.patch, 16-ld-nodelete.patch) and (18-hg-err-reporting.patch, 18-recv-nonblock.patch) made me discard that idea. Or are you reusing numbers for patches I've merged?

J |
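The lockset counts above (over a thousand distinct sets, mostly tiny, a few with twenty or so members) hint at why a fixed-size table is awkward. One possible alternative, sketched here purely as an illustration and not taken from the Helgrind code, is to store each lockset as a sorted array and intern it in a hash table keyed on its contents, so that identical sets share a single entry. The names LockSet, lockset_intern and N_BUCKETS, the bucket count, and the choice of FNV-1a as the hash are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define N_BUCKETS 1024            /* assumed size; must be a power of two */

typedef struct LockSet {
    struct LockSet *next;         /* hash-chain link */
    uint32_t        hash;         /* hash of the contents */
    size_t          n;            /* number of locks */
    uintptr_t       locks[];      /* sorted ascending, no duplicates */
} LockSet;

static LockSet *buckets[N_BUCKETS];

/* FNV-1a over the raw bytes of the array: the hash depends only on
   the set's contents, not on where the set is stored. */
static uint32_t hash_locks(const uintptr_t *locks, size_t n)
{
    const unsigned char *p = (const unsigned char *)locks;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n * sizeof *locks; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the unique stored copy of the given sorted lockset, creating
   it on first use. Returns NULL only if allocation fails. */
const LockSet *lockset_intern(const uintptr_t *locks, size_t n)
{
    assert(locks != NULL || n == 0);

    uint32_t  h    = hash_locks(locks, n);
    LockSet **slot = &buckets[h & (N_BUCKETS - 1)];

    for (LockSet *ls = *slot; ls != NULL; ls = ls->next) {
        if (ls->hash == h && ls->n == n &&
            (n == 0 || memcmp(ls->locks, locks, n * sizeof *locks) == 0))
            return ls;            /* already interned */
    }

    LockSet *ls = malloc(sizeof *ls + n * sizeof *locks);
    if (ls == NULL)
        return NULL;
    ls->hash = h;
    ls->n    = n;
    if (n > 0)
        memcpy(ls->locks, locks, n * sizeof *locks);
    ls->next = *slot;             /* push onto the bucket's chain */
    *slot    = ls;
    return ls;
}
```

Because equal sets intern to the same pointer, set equality elsewhere reduces to a pointer comparison, and the number of distinct sets is bounded by memory rather than by a compile-time constant.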
|
From: Jeremy F. <je...@go...> - 2002-10-23 20:47:52
|
A new set of patches, including a big bugfix in how Helgrind keeps track of a thread's current lock set (it does now). There's a description of the fix below in 19-hg-lockgraph.

The only ugly/controversial thing here is what I had to do to get glibc intercepts actually working correctly. Unfortunately, it doesn't seem to be true that libpthread's symbols are used in preference to glibc all the time, even if valgrind.so itself is linked with libpthread. The only reliable way I could find of making it work is by having valgrind.so itself define the symbols we want to catch.

16-function-intercept is my implementation of this. Basically I added a new file, vg_intercept.c, which contains stubs for the functions we want to catch; the stub for x calls VGL_(x). vg_intercept.c contains weakly defined default implementations for VGL_(x), but vg_libpthread has strongly defined implementations for VGL_(x). This seems to be reliable in catching all libc references, regardless of symbol definition strength. It's also pretty ugly, and I don't really like it, but I can't see a nicer way.

(A related alternative approach which would be even more reliable would be to have a set of pointers to functions, which the stubs call into. Initially it would point to the non-threaded versions of the functions, but when libpthread.so is loaded, it installs new pointers to the threaded functions. This allows rebinding on the fly, and doesn't need to trust the dynamic linker to get anything tricky right).

J

http://www.goop.org/~jeremy/valgrind:

13-fix-printf
  Fix stupid bug I introduced into printf with 14-sprintf.

13-kill-1ifroot
  Kill VG_(get_current_tid_1_if_root)() and replace it with the slightly more appetising (though still hackish) VG_(get_current_or_recent_tid)(). This is intended for use when there's no thread actually loaded into the baseblock, but we're doing work on behalf of the thread that was last running (such as during a syscall). This probably fixes a bug with Helgrind mis-attributing memory created with mmap to thread 1 rather than the thread which called mmap (though the behaviour is still probably wrong: mmapped memory should be magically_inited).

14-hg-mmap-magic-virgin
  This does two things:
  1. change the signatures of the new_mem_mmap and change_mem_mprotect functions to remove the pointless 'nn' argument. This makes them match the signature of new_mem_startup...
  2. change helgrind to mark memory created by mmap as if it were the same as other magically pre-inited memory. Implement this by pointing helgrind's new_mem_mmap function at new_mem_startup.

16-function-intercept
  Implement a more reliable way for vg_libpthread to intercept libc calls. Since the only reliable way of making sure that our code defines the symbol is by making sure that valgrind.so itself does it, this patch adds a new file, vg_intercept.c, which defines those symbols. They are then passed off to a weak local function if libpthread isn't present, or to the libpthread version if it is.

18-recv-nonblock
  Make recv() nonblocking

19-hg-lockgraph
  HELGRIND: large patch which does a big bugfix and adds some new instrumentation:
  1. The bugfix is BIG. Previously the code which maintained the thread's current lockset would often (maybe always) fail to add new locks to the set, so it always looked like threads were holding one lock. The problem was in weird_LockSet_equals(); I rewrote it in a way which should be obviously correct. Fixing this exposed a bug in removing locks from a thread's lockset, which was also caused by another bug in weird_LockSet_equals(). This fix makes many spurious data race warnings go away (notably, stdio becomes silent).
  2. The new feature is tracking of the order of lock usage. If threads are taking locks in an inconsistent order, that's a symptom of possible deadlock. Helgrind will now warn when it sees this happening (though the warnings themselves need to be improved). |
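The lock-order tracking in item 2 of 19-hg-lockgraph can be pictured with a small model. This is a sketch under stated assumptions, not the patch itself: locks are identified by small integers below 64 so a set fits in one bitmask, and the names held_before and lock_order_acquire are hypothetical. For each lock it remembers which locks were held (directly or through earlier records) when that lock was taken, and reports an acquisition that reverses an order seen before.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LOCKS 64   /* assumed limit so a lock set fits in a uint64_t */

/* held_before[m]: bit h is set if lock h was recorded as held, directly
   or via an earlier record, at some point when lock m was acquired. */
static uint64_t held_before[MAX_LOCKS];

/* Record that a thread currently holding the locks in `held` acquires
   lock `m`. Returns false if this contradicts an order seen earlier,
   i.e. some currently held lock was previously taken after `m`. */
bool lock_order_acquire(uint64_t held, unsigned m)
{
    assert(m < MAX_LOCKS);

    const uint64_t m_bit = UINT64_C(1) << m;
    bool     consistent  = true;
    uint64_t deps        = held;

    for (unsigned h = 0; h < MAX_LOCKS; h++) {
        if (!(held & (UINT64_C(1) << h)))
            continue;                  /* h is not currently held */
        if (held_before[h] & m_bit)
            consistent = false;        /* earlier order was m, then h */
        deps |= held_before[h];        /* inherit h's predecessors */
    }

    /* Union of the current lockset and the predecessors of each held
       lock; m itself is masked out so a lock never precedes itself. */
    held_before[m] |= deps & ~m_bit;
    return consistent;
}
```

Taking lock 0 and then lock 1 records the order "0 before 1"; a later attempt to take lock 0 while holding lock 1 returns false, which is the inconsistent-order situation the patch description warns about.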
|
From: Jeremy F. <je...@go...> - 2002-10-23 15:45:40
|
On Wed, 2002-10-23 at 05:55, Nicholas Nethercote wrote:
> On 21 Oct 2002, Jeremy Fitzhardinge wrote:
> > > As I remember it, the root thread is #1, and if there's no tid in the base
> > > block then it must be the root thread. This could be wrong.
> >
> > Yes, I don't think that assumption is good. If there's no thread in the
> > baseBlock, then there's no current (virtual machine) thread running at
> > all.
> So then what do you do in that case?
A hack. I removed VG_(get_current_tid_1_if_root)() and replaced it with
VG_(get_current_or_recent_tid)(). It returns either the current tid, or
if there is none, the most recently current tid. This works for the
cases where we're acting on behalf of the most recently running thread,
such as when we're doing things in syscalls. A more correct solution
might be to always pass around a ThreadState, but that would require
changing the interface of almost all the tracking functions.
J
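The fallback described above can be modelled in a few lines. This is a simplified sketch rather than Valgrind's code: the ThreadId type, the INVALID_TID sentinel, and the load_thread/save_thread helpers are assumptions standing in for whatever loads a thread into, and saves it out of, the baseBlock.

```c
#include <assert.h>

typedef unsigned int ThreadId;
#define INVALID_TID 0u                       /* assumed "no thread" sentinel */

static ThreadId current_tid = INVALID_TID;   /* thread loaded in the baseBlock */
static ThreadId recent_tid  = INVALID_TID;   /* last thread that was loaded */

/* Called when the scheduler loads a thread into the baseBlock. */
void load_thread(ThreadId tid)
{
    assert(tid != INVALID_TID);
    current_tid = tid;
    recent_tid  = tid;
}

/* Called when the thread is saved out again, e.g. around a syscall. */
void save_thread(void)
{
    current_tid = INVALID_TID;
}

/* Current thread if one is loaded, otherwise the most recent one. */
ThreadId get_current_or_recent_tid(void)
{
    return current_tid != INVALID_TID ? current_tid : recent_tid;
}
```

After load_thread(3) followed by save_thread(), the lookup still answers 3, so work done on behalf of that thread during a syscall is attributed to it rather than to thread 1.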
|
|
From: Nicholas N. <nj...@ca...> - 2002-10-23 12:55:36
|
On 21 Oct 2002, Jeremy Fitzhardinge wrote:
> > As I remember it, the root thread is #1, and if there's no tid in the base
> > block then it must be the root thread. This could be wrong.
>
> Yes, I don't think that assumption is good. If there's no thread in the
> baseBlock, then there's no current (virtual machine) thread running at
> all.

So then what do you do in that case?

N |