From: Eyal L. <ey...@ey...> - 2005-01-20 22:38:20
|
Jeremy Fitzhardinge wrote:
[trimmed]
> I'll note that both FC2 and SUSE 9.2 2.6 kernels seem to show sporadic
> problems with delivering signals without proper siginfo information.
> That will cause your program to spontaneously SIGSEGV when it tries to
> grow the stack, which almost every program will need to do. The kernel
> will stay in this state for some indeterminate amount of time, but then
> will spontaneously start working again.
Some further observations. I ran
strace valgrind --tool=memcheck date >date.strace 2>&1
when a crash is reported, and then when one is not. Comparing the
two logs (attached) I note the point when the two diverge:
non-crashing
============
fstat(3, {st_mode=S_IFREG|0644, st_size=78233, ...}) = 0
readlink("/proc/self/fd/3", "/lib/tls/libpthread-0.60.so", 4096) = 27
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
gettid() = 31890
old_mmap(0x52bfd000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x52bfd000
crashing
========
fstat(3, {st_mode=S_IFREG|0644, st_size=78233, ...}) = 0
readlink("/proc/self/fd/3", "/lib/tls/libpthread-0.60.so", 4096) = 27
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
gettid() = 31744
gettid() = 31744
old_getrlimit(RLIMIT_CORE, {rlim_cur=0, rlim_max=2147483647}) = 0
getpid() = 31744
write(1016, "==31744== \n", 11==31744==
) = 11
getpid() = 31744
write(1016, "==31744== Process terminating wi"..., 73==31744== Process terminating with default action of signal 11 (SIGSEGV)
) = 73
I note that there is always a "SIGSEGV (Segmentation fault)"
present, even in a good run. It is the reaction to it that
differs. Is it possible that there is a special SIGSEGV
(in vg or glibc) that is overloaded (not a real segfault)
and should be handled specially?
BTW, The 'good' run was done after a fresh boot, to ensure
the kernel is not in any 'funny' state. Just in case.
--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
attach .zip as .dat
|
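The log comparison described in this message (finding the point where a good and a bad strace log part ways) can be sketched in a few lines. This is a minimal Python sketch, not the procedure actually used here: the helper names `normalize` and `first_divergence` are hypothetical, and masking the numeric result of `get*id()` calls and the `==PID==` prefix is an assumption, made so that per-run process IDs do not count as differences.

```python
import re
from itertools import zip_longest

# Assumption: only these two run-specific values need masking.
_ID_RESULT = re.compile(r"^(get[a-z]*id\(\)\s*=\s*)\d+")
_VG_PREFIX = re.compile(r"==\d+==")


def normalize(line):
    """Mask values that legitimately differ between two runs."""
    line = _ID_RESULT.sub(r"\g<1><id>", line.strip())
    return _VG_PREFIX.sub("==<pid>==", line)


def first_divergence(good, bad):
    """Return (index, good_line, bad_line) at the first differing line.

    Returns None when the two logs agree line for line. A log that simply
    ends early diverges at its end (the missing side is reported as None).
    """
    for index, (a, b) in enumerate(zip_longest(good, bad)):
        if a is None or b is None or normalize(a) != normalize(b):
            return index, a, b
    return None
```

Applied to the two excerpts quoted in this message, the first `gettid()` lines compare equal once the thread ID is masked, and the reported divergence is the `old_mmap(...)` line in the good run against the second `gettid()` in the crashing run.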
|
From: Johannes S. <Joh...@gm...> - 2005-01-20 14:17:51
|
Hi,
I once had the same problem as you: I updated from a CVS project, and the
program broke. As it had been a year since I had last updated, I decided to
do a binary search through the patches, but soon found out that I am not
good enough at book-keeping.
So I wrote the attached shell script. You just call it from the root
directory of valgrind, and it asks a few questions, then tries to check out
a version "in the middle between known good and known bad" until it can
compile cleanly. It then exits and asks you to run a test.
Then you call it again and tell it if the version is good or bad, and the
script again tries to check out a middle version, and so on.
Eventually it will pin down the patch (or a set of patches if compilation
fails between those patches) which broke the feature you need.
Maybe you will find this useful...
Ciao,
Dscho
|
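The attached script itself is not reproduced in the archive. As a rough illustration of the search it describes, here is a minimal Python sketch; it is not the script, and the names `bisect_first_bad`, `builds` and `is_good` are hypothetical. It assumes the first revision in the list is known good, the last is known bad, and that `builds` and `is_good` stand in for "compiles cleanly" and "the manual test passed".

```python
def bisect_first_bad(revisions, builds, is_good):
    """Narrow down the change that broke a feature.

    revisions: ordered list, revisions[0] known good, revisions[-1] known bad.
    builds(rev): True if that revision compiles cleanly.
    is_good(rev): True if the feature works at that revision.

    Returns (last_known_good, first_known_bad). When the two are not
    adjacent, every revision between them failed to build, so the culprit
    is one of a set of patches rather than a single one.
    """
    lo, hi = 0, len(revisions) - 1
    while True:
        mid = (lo + hi) // 2
        # Try revisions strictly between lo and hi, nearest the middle first.
        candidates = sorted(range(lo + 1, hi), key=lambda i: abs(i - mid))
        pick = next((i for i in candidates if builds(revisions[i])), None)
        if pick is None:
            return revisions[lo], revisions[hi]
        if is_good(revisions[pick]):
            lo = pick
        else:
            hi = pick
```

Each round either moves one end of the bracket inward or stops, so the loop terminates after roughly log2(n) tests when everything builds.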
|
From: Eyal L. <ey...@ey...> - 2005-01-20 12:39:38
|
Eyal Lebedinsky wrote:
> Eyal Lebedinsky wrote:
> Same thing with vanilla 2.6.10.
Same thing with vanilla 2.6.11-rc1-bk7.
FYI
--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
attach .zip as .dat
|
|
From: Tom H. <th...@cy...> - 2005-01-20 12:21:37
|
CVS commit by thughes:
Update .cvsignore files.
M +1 -0 memcheck/tests/.cvsignore 1.30
M +2 -0 none/tests/.cvsignore 1.25
--- valgrind/none/tests/.cvsignore #1.24:1.25
@@ -50,5 +50,7 @@
 shorts
 sigcontext
+sigstackgrowth
 smc1
+stackgrowth
 susphello
 syscall-restart1
--- valgrind/memcheck/tests/.cvsignore #1.29:1.30
@@ -46,4 +46,5 @@
 null_socket
 overlap
+post-syscall
 realloc1
 realloc2
|
|
From: Eyal L. <ey...@ey...> - 2005-01-20 08:58:06
|
Eyal Lebedinsky wrote:
> Jeremy Fitzhardinge wrote:
> [trimmed]
>
>> I have never seen this with stock kernel.org kernels. Is your kernel a
>> Debian-supplied one, or one you've built yourself?
>
> I build my own kernels, I now am on
>
> $ uname -a
> Linux e7 2.6.10-ac9 #1 SMP Fri Jan 14 08:56:38 EST 2005 i686 GNU/Linux
>
> Just to remove this worry I will now boot into 2.6.10.
Same thing with vanilla 2.6.10.
This time, when my tests hang and zz35 fails, I did
killall -9 valgrind ; sh zz35.sh
and zz35 succeeded. In other words, the killall (which killed my hanging
testsuite programs running under vg) immediately fixed the situation.
No delay.
FYI
--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Eyal L. <ey...@ey...> - 2005-01-20 08:21:19
|
Jeremy Fitzhardinge wrote:
[trimmed]
> I'll note that both FC2 and SUSE 9.2 2.6 kernels seem to show sporadic
> problems with delivering signals without proper siginfo information.
> That will cause your program to spontaneously SIGSEGV when it tries to
> grow the stack, which almost every program will need to do. The kernel
> will stay in this state for some indeterminate amount of time, but then
> will spontaneously start working again.
>
> J
In case it is the same thing, let me describe how I tested it just now.
I have a small test zz35.sh (attached) that simply creates an uninited
error, and which I use to see that I get a proper backtrace. I now use
it to investigate a different problem where my regression testsuite hangs
after a number of successful runs and will not proceed until I shut down
all my servers (thus stopping all valgrind instances).
- run my tests. It should do 11-12 of them before failing
- wait for my tests to hang
- run my zz35 which fails sig 11 (zz35-sig11.log). It fails consistently
  for as long as I want.
- 'killall -9 valgrind' to release my failed tests/servers
- without any waiting run my zz35 which works OK again (zz35-ok.log)
So, stopping the running valgrind instances allowed zz35 to run OK. There
does not seem to be a period where the kernel is in 'a mood', but rather
one needs to ensure all valgrind instances are stopped. Which suggests that
some sort of global resource (internal to vg) is associated with the failure.
Naturally, it could still be a kernel bug that does this.
This is with vanilla 2.6.10-ac9.
--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Eyal L. <ey...@ey...> - 2005-01-20 07:33:16
|
Jeremy Fitzhardinge wrote:
[trimmed]
> I have never seen this with stock kernel.org kernels. Is your kernel a
> Debian-supplied one, or one you've built yourself?
I build my own kernels, I now am on
$ uname -a
Linux e7 2.6.10-ac9 #1 SMP Fri Jan 14 08:56:38 EST 2005 i686 GNU/Linux
Just to remove this worry I will now boot into 2.6.10.
> J
--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Jeremy F. <je...@go...> - 2005-01-20 06:45:46
|
On Thu, 2005-01-20 at 13:06 +1100, Eyal Lebedinsky wrote:
> I will repeat - this was not a problem until recently. I am rather sure the stable 2.2.0
> gives good backtraces.
Oh, I believe you, but I don't think anything has changed recently which
would have affected this; at least not for calloc.
> I would like to offer another observation. I just created a simple program
> in an attempt to demonstrate the laconic report problem. Instead, it crashed
> (sig 11) on a return.
>
> After repeating it a few times, I noticed that my big test is hanging again.
> I killed it and deleted the semaphore it holds (somehow it is never released
> after a crash).
>
> The tiny test program now works (no sig 11).
>
> Is it possible that vg uses some semaphore that all instances share and it
> gets into trouble after a while? My test suite always fails after a number of
> tests finish successfully, and every program thereafter gets sig 11. Every
> single valgrind run. If I kill everything (and remove [ipcrm -s] my own
> semaphore that my tests use) then I can continue with the tests (well, at
> least for a while).
Valgrind doesn't use semaphores itself, and it should just be passing
your syscalls through to the kernel untouched. It also respects the
CLONE_SYSVSEM flag, so that should be OK.
I'll note that both FC2 and SUSE 9.2 2.6 kernels seem to show sporadic
problems with delivering signals without proper siginfo information.
That will cause your program to spontaneously SIGSEGV when it tries to
grow the stack, which almost every program will need to do. The kernel
will stay in this state for some indeterminate amount of time, but then
will spontaneously start working again.
You can test for this state by running none/tests/faultstatus (natively,
not under Valgrind). If it doesn't pass everything, then your kernel is
in a buggy state.
I have never seen this with stock kernel.org kernels. Is your kernel a
Debian-supplied one, or one you've built yourself?
J
|
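To make the siginfo point concrete: a handler can only treat a SIGSEGV as a request to grow the stack if it knows the faulting address. The following is a toy Python model of that decision, not Valgrind's actual code; the function name `handle_segv`, the parameters and the page-rounding policy are all assumptions chosen for illustration.

```python
PAGE = 4096  # assumed page size


def handle_segv(fault_addr, stack_low, stack_floor):
    """Decide what to do with a SIGSEGV (toy model).

    fault_addr:  faulting address from siginfo, or None if the kernel
                 delivered the signal without usable siginfo.
    stack_low:   lowest address currently mapped for the stack.
    stack_floor: lowest address the stack is allowed to grow down to.

    Returns ("grow", new_stack_low) or ("fatal", stack_low).
    """
    if fault_addr is None:
        # Without an address there is no way to tell stack growth from a
        # real fault, so the signal ends up being fatal.
        return ("fatal", stack_low)
    if stack_floor <= fault_addr < stack_low:
        # Fault just below the mapped stack: extend down to that page.
        return ("grow", fault_addr - (fault_addr % PAGE))
    return ("fatal", stack_low)
```

In this model a missing fault address turns every stack extension into a fatal signal, which is the failure pattern described above.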
|
From: Robert W. <rj...@du...> - 2005-01-20 06:27:06
|
I ran the two failing signal tests (corecheck/tests/sigkill and
none/tests/exec-sigmask) outside of Valgrind on my FC2 machine. The
results I got were the same as when I ran it under Valgrind. However,
the .exp files were different, so the tests were marked as failed.
Basically, signal 32 seems to be handled differently in both real life
and under Valgrind to how the .exp files expect it to be.
Is it time to update the .exp files, or is something else going on here?
FWIW, here's the .diff files I get:
*** sigkill.stderr.exp 2005-01-19 22:01:41.043293766 -0800
--- sigkill.stderr.out 2005-01-19 22:10:11.181185774 -0800
***************
*** 99,100 ****
! setting signal 32: Success
! getting signal 32: Success
--- 99,100 ----
! setting signal 32: Invalid argument
! getting signal 32: Invalid argument
*** exec-sigmask.stdout.exp 2005-01-19 22:01:46.761629837 -0800
--- exec-sigmask.stdout.out 2005-01-19 22:11:14.487860761 -0800
***************
*** 0 ****
--- 1 ----
+ full: signal 32 missing from mask
Regards,
Robert.
--
Robert Walsh
Amalgamated Durables, Inc. - "We don't make the things you buy."
Email: rj...@du...
|
|
From: Tom H. <to...@co...> - 2005-01-20 03:24:33
|
Nightly build on dunsmere ( Fedora Core 3 ) started at 2005-01-20 03:20:03 GMT
Checking out source tree ... done
Configuring ... done
Building ... done
Running regression tests ... done
Last 20 lines of log.verbose follow
seg_override: valgrind --num-callers=4 ./seg_override
-- Finished tests in none/tests/x86 ------------------------------------
yield: valgrind --num-callers=4 ./yield
-- Finished tests in none/tests ----------------------------------------
== 200 tests, 12 stderr failures, 0 stdout failures =================
helgrind/tests/allok (stderr)
helgrind/tests/deadlock (stderr)
helgrind/tests/inherit (stderr)
helgrind/tests/race (stderr)
helgrind/tests/race2 (stderr)
helgrind/tests/readshared (stderr)
massif/tests/toobig-allocs (stderr)
massif/tests/true_html (stderr)
massif/tests/true_text (stderr)
memcheck/tests/scalar (stderr)
memcheck/tests/scalar_supp (stderr)
memcheck/tests/vgtest_ume (stderr)
make: *** [regtest] Error 1
|
|
From: Tom H. <th...@cy...> - 2005-01-20 03:21:46
|
Nightly build on audi ( Red Hat 9 ) started at 2005-01-20 03:15:03 GMT
Checking out source tree ... done
Configuring ... done
Building ... done
Running regression tests ... done
Last 20 lines of log.verbose follow
helgrind/tests/allok (stderr)
helgrind/tests/deadlock (stderr)
helgrind/tests/inherit (stderr)
helgrind/tests/race (stderr)
helgrind/tests/race2 (stderr)
helgrind/tests/readshared (stderr)
massif/tests/toobig-allocs (stderr)
massif/tests/true_html (stderr)
massif/tests/true_text (stderr)
memcheck/tests/badpoll (stderr)
memcheck/tests/buflen_check (stderr)
memcheck/tests/execve (stderr)
memcheck/tests/execve2 (stderr)
memcheck/tests/scalar (stderr)
memcheck/tests/scalar_exit_group (stderr)
memcheck/tests/scalar_supp (stderr)
memcheck/tests/writev (stderr)
none/tests/tls (stdout)
make: *** [regtest] Error 1
|
|
From: Tom H. <th...@cy...> - 2005-01-20 03:14:52
|
Nightly build on ginetta ( Red Hat 8.0 ) started at 2005-01-20 03:10:05 GMT
Checking out source tree ... done
Configuring ... done
Building ... done
Running regression tests ... done
Last 20 lines of log.verbose follow
seg_override: valgrind --num-callers=4 ./seg_override
-- Finished tests in none/tests/x86 ------------------------------------
yield: valgrind --num-callers=4 ./yield
-- Finished tests in none/tests ----------------------------------------
== 198 tests, 12 stderr failures, 0 stdout failures =================
helgrind/tests/allok (stderr)
helgrind/tests/deadlock (stderr)
helgrind/tests/inherit (stderr)
helgrind/tests/race (stderr)
helgrind/tests/race2 (stderr)
helgrind/tests/readshared (stderr)
massif/tests/toobig-allocs (stderr)
massif/tests/true_html (stderr)
massif/tests/true_text (stderr)
memcheck/tests/pth_once (stderr)
memcheck/tests/scalar (stderr)
memcheck/tests/threadederrno (stderr)
make: *** [regtest] Error 1
|
|
From: Tom H. <th...@cy...> - 2005-01-20 03:09:17
|
Nightly build on alvis ( Red Hat 7.3 ) started at 2005-01-20 03:05:02 GMT
Checking out source tree ... done
Configuring ... done
Building ... done
Running regression tests ... done
Last 20 lines of log.verbose follow
yield: valgrind --num-callers=4 ./yield
-- Finished tests in none/tests ----------------------------------------
== 198 tests, 14 stderr failures, 0 stdout failures =================
helgrind/tests/allok (stderr)
helgrind/tests/deadlock (stderr)
helgrind/tests/inherit (stderr)
helgrind/tests/race (stderr)
helgrind/tests/race2 (stderr)
helgrind/tests/readshared (stderr)
massif/tests/toobig-allocs (stderr)
massif/tests/true_html (stderr)
massif/tests/true_text (stderr)
memcheck/tests/post-syscall (stderr)
memcheck/tests/pth_once (stderr)
memcheck/tests/scalar (stderr)
memcheck/tests/threadederrno (stderr)
memcheck/tests/vgtest_ume (stderr)
make: *** [regtest] Error 1
|
|
From: Tom H. <th...@cy...> - 2005-01-20 03:05:07
|
Nightly build on standard ( Red Hat 7.2 ) started at 2005-01-20 03:00:03 GMT
Checking out source tree ... done
Configuring ... done
Building ... done
Running regression tests ... done
Last 20 lines of log.verbose follow
-- Finished tests in none/tests/x86 ------------------------------------
yield: valgrind --num-callers=4 ./yield
-- Finished tests in none/tests ----------------------------------------
== 198 tests, 13 stderr failures, 0 stdout failures =================
helgrind/tests/allok (stderr)
helgrind/tests/deadlock (stderr)
helgrind/tests/inherit (stderr)
helgrind/tests/race (stderr)
helgrind/tests/race2 (stderr)
helgrind/tests/readshared (stderr)
massif/tests/toobig-allocs (stderr)
massif/tests/true_html (stderr)
massif/tests/true_text (stderr)
memcheck/tests/pth_once (stderr)
memcheck/tests/scalar (stderr)
memcheck/tests/threadederrno (stderr)
memcheck/tests/vgtest_ume (stderr)
make: *** [regtest] Error 1
|
|
From: Eyal L. <ey...@ey...> - 2005-01-20 02:07:10
|
Jeremy Fitzhardinge wrote:
> On Thu, 2005-01-20 at 11:32 +1100, Eyal Lebedinsky wrote:
>
>>For vg I do a different build than normal. I build with '-O0' and nothing
>>else (just some extra warn requests):
>> -W -Wall -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wconversion -Wredundant-decls -ansi -D_XOPEN_SOURCE=1 -D_GNU_SOURCE=1 -O0 -fno-inline -g
>
> No, I mean when building Valgrind itself.
I do
./autogen.sh || exit 1
./configure || exit 1
make || exit 1
make install || exit 1
If the defaults are unsuitable then I would have a bad build. Here is
a snippet of a build:
if gcc -DHAVE_CONFIG_H -I. -I. -I.. -I../coregrind -I../coregrind -I../coregrind/x86 \
-I../coregrind/linux -I../coregrind/x86-linux -I../include -I../include \
-I../include/x86 -I../include/linux -I../include/x86-linux \
-DVG_LIBDIR="\"/usr/local/lib/valgrind"\" -I./demangle -DKICKSTART_BASE=0xb0000000 \
-DVG_PLATFORM="\"x86-linux"\" -Winline -Wall -Wshadow -O -g -mpreferred-stack-boundary=2 \
-DELFSZ=32 -MT stage2-vg_dummy_profile.o -MD -MP -MF ".deps/stage2-vg_dummy_profile.Tpo" \
-c -o stage2-vg_dummy_profile.o `test -f 'vg_dummy_profile.c' || echo './'`vg_dummy_profile.c
then mv -f ".deps/stage2-vg_dummy_profile.Tpo" ".deps/stage2-vg_dummy_profile.Po"
else rm -f ".deps/stage2-vg_dummy_profile.Tpo"
exit 1
fi
I will repeat - this was not a problem until recently. I am rather sure the
stable 2.2.0 gives good backtraces.
> J
I would like to offer another observation. I just created a simple program
in an attempt to demonstrate the laconic report problem. Instead, it crashed
(sig 11) on a return.
After repeating it a few times, I noticed that my big test is hanging again.
I killed it and deleted the semaphore it holds (somehow it is never released
after a crash).
The tiny test program now works (no sig 11).
Is it possible that vg uses some semaphore that all instances share and it
gets into trouble after a while? My test suite always fails after a number of
tests finish successfully, and every program thereafter gets sig 11. Every
single valgrind run. If I kill everything (and remove [ipcrm -s] my own
semaphore that my tests use) then I can continue with the tests (well, at
least for a while).
--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Jeremy F. <je...@go...> - 2005-01-20 01:13:22
|
On Thu, 2005-01-20 at 11:32 +1100, Eyal Lebedinsky wrote:
> For vg I do a different build than normal. I build with '-O0' and nothing
> else (just some extra warn requests):
> -W -Wall -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wconversion -Wredundant-decls -ansi -D_XOPEN_SOURCE=1 -D_GNU_SOURCE=1 -O0 -fno-inline -g
No, I mean when building Valgrind itself.
> > Oh, and that you're not using --num-callers=1.
>
> I use '--num-callers=32' which I find good enough.
Yeah, I was pretty sure that wasn't it, but it's always worth checking...
J
|
|
From: Jeremy F. <je...@go...> - 2005-01-20 01:10:13
|
On Thu, 2005-01-20 at 00:44 +0000, Julian Seward wrote:
> > The only problem then is the longjmp/exception case.
>
> Do we even need to handle this case, for libpthread? For that matter,
> can we also ignore recursion?
We need to deal with taking a signal while blocked in a pthread
function; if the signal handler longjmps, it's as if the pthread function
did. Hm, and pthread_cancel ends up invoking gcc's exception unwinding
machinery, so it effectively looks like a C++ exception.
Don't know about recursion, but I think we've been burned enough to not
rule it out. Or pthreads functions calling each other.
J
|
|
From: Jeremy F. <je...@go...> - 2005-01-20 01:04:17
|
On Thu, 2005-01-20 at 00:00 +0000, Julian Seward wrote:
> Who writes wrap_before_func? That has to understand the baseblock layout
> and also the calling conventions to extract esp and retaddr, and so is going
> to be machine specific.
It's part of the core. It isn't a per-wrapper piece of code, it's a
helper (ie, called something like VG_(wrapper_before_helper), and would
be a pretty small piece of assembler). The actual wrapper functions are
ordinary-looking pieces of C.
> That will change drastically .. the new JIT (1) translates multiple BBs
> at a time,
Well, the BB's we're talking about here are 1) the first BB of a
function and 2) the BB at the return address. Under normal
circumstances, they're not going to get coalesced with other BBs anyway,
I would have thought. But we're going to need a mechanism to inhibit BBs
from being coalesced anyway, I think (for debugger support).
> and (2) actually doesn't do translation chaining as I could
> not think of a clean way to do this portably.
You're hoping that coalesced BBs will make up the performance
difference? What's the difficulty? Isn't it something which could be
implemented per-target?
> The proposal leaves me with a nasty feeling that it will introduce all
> sorts of complex inter-component dependencies and generally be a
> maintenance and portability problem later.
I actually think it's pretty clean that way. By keeping it all on the
real CPU rather than in virtual space, we avoid falling into a bunch of
ratholes we just escaped from.
> I would prefer a solution which didn't involve so much magic in the JIT.
Well, there's a little bit of magic in calling a helper for the before
wrapper, which doesn't really count. The patching-in of the exit
wrapper is a bit tricky, but in the worst case we can always generate a
space to patch into. Or regenerate the BBs.
> Why do we need general function wrapping? Currently all we care about
> is intercepting libpthread calls.
Well, that's the immediate concern. But I think there's a lot of other
things we could do with wrappers. For example, I think we should
consider wrapping client mallocs rather than replacing them outright.
We already have the problem that ld.so and glibc each have their own
copies of malloc() and friends, and assume that they can operate on each
other's pointers. I think we're OK in that case, but it's just one
instance where being functionally correct requires 100% coverage; with
wrapping, we could miss a few cases, and it wouldn't be the end of the
world. And Tools like massif don't need a special malloc at all; it
only cares about observing the mallocs a program does, with no further
checks.
> I would prefer to write, in C, a
> libpthread stub library, and use the existing intercept mechanism to
> route all calls there. The stub library emits events -- using the
> client request mechanism -- to those who want to know, and calls onwards
> to the real pthread functions (my hands wave here).
Right. I thought about that a lot, and it basically comes down to being
able to distinguish between an "outside" call to a wrapped function,
which needs to be directed to the wrapper, and an "inside" call, which
is from the wrapper to the real function, and making that work if the
wrapped function is recursive. I can think of a bunch of hacks (look at
the callsite, and see if it's within the wrapper), but it just seems
cleaner to me to keep all this out of the virtual space.
And I'm feeling a bit allergic to stub libraries and so on. We're still
depending on LD_PRELOAD/LD_LIBRARY_PATH tricks to get that code into the
client space, and I'd like to minimize, or even eliminate, that.
J
|
|
From: Julian S. <js...@ac...> - 2005-01-20 00:44:55
|
> That's more like how I had envisaged function wrapping working. Use
> the existing intercept machinery to redirect the original function
> call, somehow passing the original function address as we do so.
>
> The wrapper would then call the real function, ensuring that this
> time the address didn't get redirected during translation. It would
> then get control again when the real function returned.
Exactly. This is the point I arrived at. The only problem -- and one
I cannot immediately see a clean solution for -- is how to know what the
real (non-redirected) function address is.
> The only problem then is the longjmp/exception case.
Do we even need to handle this case, for libpthread? For that matter,
can we also ignore recursion?
J
|
|
From: Eyal L. <ey...@ey...> - 2005-01-20 00:32:47
|
Jeremy Fitzhardinge wrote:
> On Thu, 2005-01-20 at 10:07 +1100, Eyal Lebedinsky wrote:
>
>>I get this report from a run:
>>
>>==2005-01-20 08:04:14.204 32619== Thread 9:
>>==2005-01-20 08:04:14.220 32619== Syscall param socketcall.send(msg) points to uninitialised byte(s)
>>==2005-01-20 08:04:14.220 32619== at 0x1C043A8E: send (in /lib/tls/libpthread-0.60.so)
>>==2005-01-20 08:04:14.220 32619== Address 0x219C9749 is 57 bytes inside a block of size 12288 alloc'd
>>==2005-01-20 08:04:14.220 32619== at 0x1B906FE5: calloc (vg_replace_malloc.c:175)
>>
>>I know that I am sending uninitialised data, but in the past I got
>>a proper stack trace rather than just the 'send' message. Even the
>>'calloc' message, without a stack, is not so helpful.
>>
>>Am I missing a new option? or is there a reason for this change?
>
> I think libpthread is compiled with -fomit-frame-pointer, which makes it
> hard to get good stack traces. I'm thinking about experimenting with
> libunwind to see if we can use it for stack traces; it understands the
> unwind info that gcc puts into new .o files, which should make it
> possible to get good backtraces in these cases.
>
> I'm not sure why calloc isn't getting a bit more backtrace. Make sure
> there are no -fomit-frame-pointers in the Valgrind makefiles.
For vg I do a different build than normal. I build with '-O0' and nothing
else (just some extra warn requests):
-W -Wall -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wconversion -Wredundant-decls -ansi -D_XOPEN_SOURCE=1 -D_GNU_SOURCE=1 -O0 -fno-inline -g
I should say I used to get the trace, this laconic report is recent.
> Oh, and that you're not using --num-callers=1.
I use '--num-callers=32' which I find good enough.
> J
--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
If attaching .zip rename to .dat
|
|
From: Tom H. <th...@cy...> - 2005-01-20 00:10:28
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> Why do we need general function wrapping? Currently all we care about
> is intercepting libpthread calls. I would prefer to write, in C, a
> libpthread stub library, and use the existing intercept mechanism to
> route all calls there. The stub library emits events -- using the
> client request mechanism -- to those who want to know, and calls onwards
> to the real pthread functions (my hands wave here). No need to mess with
> calling conventions, guest state layout or magic run-time code modification.
That's more like how I had envisaged function wrapping working. Use
the existing intercept machinery to redirect the original function
call, somehow passing the original function address as we do so.
The wrapper would then call the real function, ensuring that this
time the address didn't get redirected during translation. It would
then get control again when the real function returned. The only
problem then is the longjmp/exception case.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
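The redirect-then-call-the-real-function scheme discussed in this message can be modelled in a few lines. This is only a Python sketch of the idea, not Valgrind's intercept machinery: `InterceptTable`, `make_logger` and the convention of handing the real function to the wrapper as its first argument are all hypothetical.

```python
class InterceptTable:
    """Toy model: calls by name are redirected to a wrapper, which is
    handed the real (non-redirected) function so it can call onwards."""

    def __init__(self):
        self._real = {}
        self._wrappers = {}

    def define(self, name, func):
        self._real[name] = func

    def wrap(self, name, wrapper):
        self._wrappers[name] = wrapper

    def call(self, name, *args):
        # An "outside" call: goes to the wrapper when one is registered.
        real = self._real[name]
        wrapper = self._wrappers.get(name)
        if wrapper is None:
            return real(*args)
        # The wrapper receives the real function directly, so its onward
        # call is an "inside" call that is not redirected again.
        return wrapper(real, *args)


def make_logger(events):
    """Build a wrapper that records entry and exit events."""
    def wrapper(real, *args):
        events.append(("enter", args))
        result = real(*args)
        events.append(("leave", result))
        return result
    return wrapper
```

The model sidesteps the open question in the thread: here the wrapper is simply given the real function, whereas in the real system the difficulty is exactly how to obtain the non-redirected address. It also does nothing about the longjmp/exception case, where the "leave" event would never be recorded.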
|
From: Julian S. <js...@ac...> - 2005-01-20 00:00:19
|
> I think, however, that it is a
> vast improvement over the outright functional bugs (and maintenance
> problems) which vg_libpthread had. And certainly better than not
> reporting anything as we do now.
I agree. We should make this work if we can.
> We could take advantage of the codegen. If we're generating code for
> the first basic block of a wrapped function, we could generate in the
> preamble:
> call wrap_before_func
> wrap_before_func would then be able to inspect %ESP and get both the
> args and the return address. The value of TID+ESP+RETADDR will give us
> a unique cookie key to match the call to the return.
Who writes wrap_before_func? That has to understand the baseblock layout
and also the calling conventions to extract esp and retaddr, and so is going
to be machine specific.
> Inserting the call to wrap_after_func at R is very easy; it doesn't even
> require regenerating the BB. Currently, the first 16 bytes of each BB
> is a preamble which is solely concerned with decrementing and testing
> VG_(dispatch_ctr); we can easily do this in wrap_after_func, so we can
> just patch over the preamble with the call to wrap_after_func (and nop
> out the rest).
That will change drastically .. the new JIT (1) translates multiple BBs
at a time, and (2) actually doesn't do translation chaining as I could
not think of a clean way to do this portably.
-----------
The proposal leaves me with a nasty feeling that it will introduce all
sorts of complex inter-component dependencies and generally be a
maintenance and portability problem later.
-----------
I would prefer a solution which didn't involve so much magic in the JIT.
Why do we need general function wrapping? Currently all we care about
is intercepting libpthread calls. I would prefer to write, in C, a
libpthread stub library, and use the existing intercept mechanism to
route all calls there. The stub library emits events -- using the
client request mechanism -- to those who want to know, and calls onwards
to the real pthread functions (my hands wave here). No need to mess with
calling conventions, guest state layout or magic run-time code modification.
------------
The cookie idea seems like the kernel of something useful -- that is,
a clean statement of the semantics of function wrapping in the presence
of recursion, threads, and functions which don't necessarily return.
J
|
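The cookie semantics can be written down as a small table keyed on (TID, ESP, RETADDR). The Python below is a sketch of those semantics only, not a proposed implementation: the class `WrapTable`, its method names and the `discard_unwound` policy for calls that never return are assumptions, as is the 4-byte word size (32-bit x86, where `ret` pops one return address and the stack grows downwards).

```python
WORD = 4  # assumption: 32-bit x86, `ret` pops a 4-byte return address


class WrapTable:
    """Match wrapped-function entries to their returns by cookie."""

    def __init__(self):
        self._pending = {}  # (tid, esp_at_entry, retaddr) -> context

    def on_call(self, tid, esp, retaddr, context):
        """Record entry to a wrapped function (the 'before' hook)."""
        self._pending[(tid, esp, retaddr)] = context

    def on_return_site(self, tid, esp, addr):
        """Called when `addr` is reached with stack pointer `esp`.

        Returns the saved context if this is the return matching an
        earlier on_call, otherwise None (for example an ordinary jump to
        the same address, or a different recursion depth).
        """
        return self._pending.pop((tid, esp - WORD, addr), None)

    def discard_unwound(self, tid, esp):
        """Drop entries for frames below `esp` in thread `tid`.

        Hypothetical policy for longjmp/exceptions: once the stack pointer
        is back above a recorded frame, that call can no longer return.
        Returns the number of entries dropped.
        """
        dead = [k for k in self._pending if k[0] == tid and k[1] < esp]
        for key in dead:
            del self._pending[key]
        return len(dead)
```

Recursive calls share a return address but differ in ESP, and calls in different threads differ in TID, so each entry gets its own key; only the non-returning case needs the extra clean-up step.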