|
From: Jeremy F. <je...@go...> - 2005-01-27 07:16:51
|
On Thu, 2005-01-27 at 17:23 +1100, Eyal Lebedinsky wrote:
> Jeremy Fitzhardinge wrote:
> > Could you try again with --trace-signals=yes.
>
> Done. '3' is a good run, '4' is a failed run.
OK, this explains what's going on. It is the same problem as before -
it's getting a SIGSEGV to grow the stack, but it's missing siginfo so it
doesn't realize it's a stack growth. Then it tries to report the
SIGSEGV, but dies while trying to print the backtrace.
So the question is, why are there lots of signals still being queued?
The code currently takes care to mop them up wherever the signals are
being sampled, but I guess that won't happen if the master is blocked in
a syscall... Hm.
Can you go into a bit of detail about the process/thread structure here?
Those two traces are from a single-threaded program, but for this bug to
occur (and be Valgrind's fault), there needs to be lots of thread
exiting going on. Is there a process where:
1. the main thread is blocked in a syscall, while
2. other thread(s) are creating lots of other short-lived threads?
J
|
|
From: Eyal L. <ey...@ey...> - 2005-01-27 08:35:20
|
Jeremy Fitzhardinge wrote:
> On Thu, 2005-01-27 at 17:23 +1100, Eyal Lebedinsky wrote:
>
>>Jeremy Fitzhardinge wrote:
>>
>>>Could you try again with --trace-signals=yes.
>>
>>Done. '3' is a good run, '4' is a failed run.
>
>
> OK, this explains what's going on. It is the same problem as before -
> it's getting a SIGSEGV to grow the stack, but it's missing siginfo so it
> doesn't realize it's a stack growth. Then it tries to report the
> SIGSEGV, but dies while trying to print the backtrace.
>
> So the question is, why are there lots of signals still being queued?

Can you ask VG to tell how many is "lots" in this case?

> The code currently takes care to mop them up wherever the signals are
> being sampled, but I guess that won't happen if the master is blocked in
> a syscall... Hm.
>
> Can you go into a bit of detail about the process/thread structure here?
> Those two traces are from a single-threaded program, but for this bug to
> occur (and be Valgrind's fault), there needs to be lots of thread
> exiting going on. Is there a process where:
> 1. the main thread is blocked in a syscall, while
> 2. other thread(s) are creating lots of other short-lived threads?

I don't think so. I can describe the scenario for the program in question.

We have a few servers running, and they stay up for the full test period.
A sequence of tests is running. Each test runs a few programs (maybe a
dozen) in order. Each connection to a server gets a thread which normally
serves it from start to finish. When no client is active there should only
be some housekeeping threads on the servers.

Between tests we run this 'ssashut' program, which is a very simple one.
It only issues one transaction, telling a server to release cached
resources. Naturally, it gets a thread on the server.

All of our programs run as a thread, meaning main() is practically empty,
it just launches a thread and then joins it. This way we do not need to
control mains, just threads.

In short, I do not expect too many threads to be active (or exiting) when
ssashut is run.

Is there a way to request VG to log a global status report on demand?
I can then run this status report before each ssashut.

I wonder if some resource is not released by VG itself. Note that some VG
processes (my servers) start early and never exit before the failure; does
VG delay some housekeeping until a process exits? Just a thought.

--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
attach .zip as .dat
|
From: Eyal L. <ey...@ey...> - 2005-01-24 13:47:52
|
Jeremy Fitzhardinge wrote:
> On Thu, 2005-01-20 at 13:06 +1100, Eyal Lebedinsky wrote:
>
>>I would like to offer another observation. I just created a simple program
>>in an attempt to demonstrate the laconic report problem. Instead, it crashed
>>(sig 11) on a return.
>
>
> OK, I just checked in a fix for this too.
>
> J

Further to my report that there still is a problem: I have now run
my tests a few times, and the process dies in exactly the same spot
each time, very consistently.

--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
attach .zip as .dat
|
From: Eyal L. <ey...@ey...> - 2005-01-20 12:39:38
|
Eyal Lebedinsky wrote:
> Eyal Lebedinsky wrote:
> Same thing with vanilla 2.6.10.

Same thing with vanilla 2.6.11-rc1-bk7. FYI

--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
attach .zip as .dat
|
From: Johannes S. <Joh...@gm...> - 2005-01-20 14:17:51
Attachments:
find_broken_version.sh
|
Hi,

I once had the same problem as you: I updated from a CVS project, and the
program broke. As it was a year since I had updated, I decided to do a
binary search through the patches, but soon found out that I am not good
enough at book-keeping.

So I wrote the attached shell script. You just call it from the root
directory of valgrind, and it asks a few questions, then tries to check
out a version "in the middle between known good and known bad" until it
can compile cleanly. It then exits and asks you to run a test. Then you
call it again and tell it if the version is good or bad, and the script
again tries to check out a middle version, and so on.

Eventually it will pin down the patch (or a set of patches if compilation
fails between those patches) which broke the feature you need.

Maybe you find this useful...

Ciao,
Dscho
|
From: Eyal L. <ey...@ey...> - 2005-01-23 12:34:30
Attachments:
zz36.sh
|
Please do not ignore the original report, the problem is still
there.

Attached is a program that shows how the backtrace is displayed
when in main but is missing from a thread.

==19059== Memcheck, a memory error detector for x86-linux.
==19059== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al.
==19059== Using valgrind-2.3.0.CVS, a program supervision framework for x86-linux.
==19059== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==19059== For more details, rerun with: -v
==19059==
/home/eyal/zz/zz36.c(75) testing in main
==19059== Conditional jump or move depends on uninitialised value(s)
==19059== at 0x1B905D32: memcmp (mac_replace_strmem.c:325) <<<<< good stack *****
==19059== by 0x8048636: func (zz36.c:23)
==19059== by 0x8048831: main (zz36.c:77)
0
/home/eyal/zz/zz36.c(80) testing in thread
/home/eyal/zz/zz36.c(38) will pthread_attr_init
/home/eyal/zz/zz36.c(41) pthread_attr_init=0
==19059==
==19059== Thread 2:
==19059== Conditional jump or move depends on uninitialised value(s)
==19059== at 0x1B905D32: memcmp (mac_replace_strmem.c:325) <<<<<< missing stack *****
0
/home/eyal/zz/zz36.c(60) joined
/home/eyal/zz/zz36.c(65) pthread_attr_destroy=0
/home/eyal/zz/zz36.c(85) test done
==19059==
==19059== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 13 from 1)
==19059== malloc/free: in use at exit: 68 bytes in 1 blocks.
==19059== malloc/free: 3 allocs, 2 frees, 70 bytes allocated.
==19059== For counts of detected errors, rerun with: -v
==19059== searching for pointers to 1 not-freed blocks.
==19059== checked 9860212 bytes.
==19059==
==19059==
==19059== 68 bytes in 1 blocks are possibly lost in loss record 1 of 1
==19059== at 0x1B904FE5: calloc (vg_replace_malloc.c:175)
==19059== by 0x1B8F25A8: (within /lib/ld-2.3.2.so)
==19059== by 0x1B8F287B: _dl_allocate_tls (in /lib/ld-2.3.2.so)
==19059== by 0x1B92424A: allocate_stack (in /lib/tls/libpthread-0.60.so)
==19059== by 0x1B923C54: pthread_create@@GLIBC_2.1 (in /lib/tls/libpthread-0.60.so)
==19059== by 0x80486DF: test (zz36.c:43)
==19059== by 0x8048868: main (zz36.c:82)
==19059==
==19059== LEAK SUMMARY:
==19059== definitely lost: 0 bytes in 0 blocks.
==19059== possibly lost: 68 bytes in 1 blocks.
==19059== still reachable: 0 bytes in 0 blocks.
==19059== suppressed: 0 bytes in 0 blocks.
==19059== Reachable blocks (those to which a pointer was found) are not shown.
==19059== To see them, rerun with: --show-reachable=yes

--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
attach .zip as .dat
|
From: Jeremy F. <je...@go...> - 2005-01-23 23:24:45
|
On Sun, 2005-01-23 at 23:34 +1100, Eyal Lebedinsky wrote:
> Please do not ignore the original report, the problem is still
> there.
>
> Attached is a program that shows how the backtrace is displayed
> when in main but is missing from a thread.

Valgrind has to guess where the thread's stack is, because it isn't
explicitly told. It assumes that the limits of the stack are between the
initial stack pointer and the base of that memory mapping; this may not
be a good assumption.

In coregrind/x86-linux/syscalls.c:do_clone(), could you set "debug" to
True and tell me what the guessed stack range is for the thread which is
reporting bad stack backtraces? Please also send the contents of
/proc/<pid>/maps for that process.

J
|
From: Eyal L. <ey...@ey...> - 2005-01-24 00:48:40
|
Jeremy Fitzhardinge wrote:
> On Sun, 2005-01-23 at 23:34 +1100, Eyal Lebedinsky wrote:
>
>>Please do not ignore the original report, the problem is still
>>there.
>>
>>Attached is a program that shows how the backtrace is displayed
>>when in main but is missing from a thread.
>
>
> Valgrind has to guess where the thread's stack is, because it isn't
> explicitly told. It assumes that the limits of the stack is between the
> initial stack pointer and the base of that memory mapping; this may not
> be a good assumption.
>
> In coregrind/x86-linux/syscalls.c:do_clone(), could you set "debug" to
> True and tell me what the guessed stack range is for the thread which is
> reporting bad stack backtraces? Also, the contents of /proc/<pid>/maps
> for that process.
>
> J
Should I assume that zz36 does not reproduce the problem for you?
Also, at what point do you want the /proc entry? I will need to
insert some code into VG to dump it. Here is what I inserted at
the top of do_clone():
    vki_sigset_t blockall, savedmask;
    {
        /* debug aid: dump this process's memory map on every clone */
        char cmd[256];
        VG_(sprintf) (cmd, "cat /proc/%d/maps", (Int)VG_(getpid ()));
        VG_(system) (cmd);
    }
I hope that getpid() gets the required value.
It took me a while to discover that '%ld' in VG means 'Long' which is
type 'long long' and I need to use '%d' and 'Int' which are 'long'...
==22241== Memcheck, a memory error detector for x86-linux.
==22241== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al.
==22241== Using valgrind-2.3.0.CVS, a program supervision framework for x86-linux.
==22241== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==22241== For more details, rerun with: -v
==22241==
/home/eyal/zz/zz36.c(75) testing in main
==22241== Conditional jump or move depends on uninitialised value(s)
==22241== at 0x1B905D32: memcmp (mac_replace_strmem.c:325)
==22241== by 0x8048636: func (zz36.c:23)
==22241== by 0x8048831: main (zz36.c:77)
/home/eyal/zz/zz36.c(80) testing in thread
/home/eyal/zz/zz36.c(38) will pthread_attr_init
/home/eyal/zz/zz36.c(41) pthread_attr_init=0
08048000-08049000 r-xp 00000000 03:06 344360 /home/eyal/zz/zz36
08049000-0804a000 rw-p 00000000 03:06 344360 /home/eyal/zz/zz36
1b8e4000-1b8fa000 r-xp 00000000 03:06 167200 /lib/ld-2.3.2.so
1b8fa000-1b8fb000 rw-p 00015000 03:06 167200 /lib/ld-2.3.2.so
1b8fc000-1b8fd000 rw-p 1b8fc000 00:00 0
1b8fe000-1b8ff000 r-xp 00000000 03:07 5767177 /data2/usr/local/lib/valgrind/vg_inject.so
1b8ff000-1b900000 rw-p 00000000 03:07 5767177 /data2/usr/local/lib/valgrind/vg_inject.so
1b901000-1b907000 r-xp 00000000 03:07 5767318 /data2/usr/local/lib/valgrind/vgpreload_memcheck.so
1b907000-1b908000 rw-p 00005000 03:07 5767318 /data2/usr/local/lib/valgrind/vgpreload_memcheck.so
1b909000-1b90a000 rw-p 1b909000 00:00 0
1b91d000-1b91e000 rw-p 1b91d000 00:00 0
1b91f000-1b92b000 r-xp 00000000 03:06 395375 /lib/tls/libpthread-0.60.so
1b92b000-1b92c000 rw-p 0000c000 03:06 395375 /lib/tls/libpthread-0.60.so
1b92c000-1b92e000 rw-p 1b92c000 00:00 0
1b92f000-1ba58000 r-xp 00000000 03:06 395242 /lib/tls/libc-2.3.2.so
1ba58000-1ba60000 rw-p 00129000 03:06 395242 /lib/tls/libc-2.3.2.so
1ba60000-1ba63000 rw-p 1ba60000 00:00 0
1ba64000-1ba65000 rw-p 1ba64000 00:00 0
1ba66000-1bb66000 rwxp 1ba66000 00:00 0
1bb67000-1bb68000 ---p 1bb67000 00:00 0
1bb68000-1c367000 rwxp 1bb68000 00:00 0
52bfb000-52bff000 rwxp 52bfb000 00:00 0
52bff000-52c00000 r-xp 52bff000 00:00 0
52c00000-52d00000 ---p 52c00000 00:00 0
52d00000-537f8000 rw-p 52d00000 00:00 0
537f8000-b0000000 ---p 537f8000 00:00 0
b0000000-b00af000 r-xp 00000000 03:07 5767176 /data2/usr/local/lib/valgrind/stage2
b00af000-b00b1000 rw-p 000ae000 03:07 5767176 /data2/usr/local/lib/valgrind/stage2
b00b1000-b0205000 rw-p b00b1000 00:00 0
b0206000-b0306000 rwxp b0206000 00:00 0
b0307000-b0407000 rwxp b0307000 00:00 0
b0408000-b0409000 ---p b0408000 00:00 0
b0409000-b0419000 rw-p b0409000 00:00 0
b041a000-b0422000 rwxp b041a000 00:00 0
b0423000-b0433000 rwxp b0423000 00:00 0
b0434000-b0444000 rwxp b0434000 00:00 0
b063b000-b073b000 rwxp b063b000 00:00 0
b073c000-b0986000 rwxp b073c000 00:00 0
b1000000-b1016000 r-xp 00000000 03:06 167200 /lib/ld-2.3.2.so
b1016000-b1017000 rw-p 00015000 03:06 167200 /lib/ld-2.3.2.so
b1018000-b1705000 rwxp b1018000 00:00 0
b7c88000-b7ca1000 r-xp 00000000 03:07 5767317 /data2/usr/local/lib/valgrind/vgskin_memcheck.so
b7ca1000-b7ca2000 rw-p 00018000 03:07 5767317 /data2/usr/local/lib/valgrind/vgskin_memcheck.so
b7ca2000-b7eb5000 rw-p b7ca2000 00:00 0
b7eb5000-b7fde000 r-xp 00000000 03:06 395242 /lib/tls/libc-2.3.2.so
b7fde000-b7fe6000 rw-p 00129000 03:06 395242 /lib/tls/libc-2.3.2.so
b7fe6000-b7fe9000 rw-p b7fe6000 00:00 0
b7fe9000-b7feb000 r-xp 00000000 03:06 395245 /lib/tls/libdl-2.3.2.so
b7feb000-b7fec000 rw-p 00002000 03:06 395245 /lib/tls/libdl-2.3.2.so
b7fff000-b8000000 rw-p b7fff000 00:00 0
bffeb000-c0000000 rw-p bffeb000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
guessed client stack range 0x1BB68000-0x1C367000
clone child has SETTLS: tls info at 0x52BFEB40: idx=6 base=0x1C366BB0 limit=FFFFF; esp=0x52BFEB00 fs=0 gs=33
==22241==
==22241== Thread 2:
==22241== Conditional jump or move depends on uninitialised value(s)
==22241== at 0x1B905D32: memcmp (mac_replace_strmem.c:325)
/home/eyal/zz/zz36.c(60) joined
/home/eyal/zz/zz36.c(65) pthread_attr_destroy=0
/home/eyal/zz/zz36.c(85) test done
0
0
==22241==
==22241== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 13 from 1)
==22241== malloc/free: in use at exit: 68 bytes in 1 blocks.
==22241== malloc/free: 3 allocs, 2 frees, 70 bytes allocated.
==22241== For counts of detected errors, rerun with: -v
==22241== searching for pointers to 1 not-freed blocks.
==22241== checked 9860356 bytes.
==22241==
==22241==
==22241== 68 bytes in 1 blocks are possibly lost in loss record 1 of 1
==22241== at 0x1B904FE5: calloc (vg_replace_malloc.c:175)
==22241== by 0x1B8F25A8: (within /lib/ld-2.3.2.so)
==22241== by 0x1B8F287B: _dl_allocate_tls (in /lib/ld-2.3.2.so)
==22241== by 0x1B92424A: allocate_stack (in /lib/tls/libpthread-0.60.so)
==22241== by 0x1B923C54: pthread_create@@GLIBC_2.1 (in /lib/tls/libpthread-0.60.so)
==22241== by 0x80486DF: test (zz36.c:43)
==22241== by 0x8048868: main (zz36.c:82)
==22241==
==22241== LEAK SUMMARY:
==22241== definitely lost: 0 bytes in 0 blocks.
==22241== possibly lost: 68 bytes in 1 blocks.
==22241== still reachable: 0 bytes in 0 blocks.
==22241== suppressed: 0 bytes in 0 blocks.
==22241== Reachable blocks (those to which a pointer was found) are not shown.
==22241== To see them, rerun with: --show-reachable=yes
--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
attach .zip as .dat
|
|
From: Jeremy F. <je...@go...> - 2005-01-24 00:52:46
|
On Mon, 2005-01-24 at 11:48 +1100, Eyal Lebedinsky wrote:
> Should I assume that zz36 does not reproduce the problem for you?

No, I can reproduce it. Could you file a bug with zz36 attached just to
make sure the issue gets tracked?

J
|
From: Jeremy F. <je...@go...> - 2005-01-24 01:25:06
|
On Sun, 2005-01-23 at 16:52 -0800, Jeremy Fitzhardinge wrote:
> No, I can reproduce it. Could you file a bug with zz36 attached just to
> make sure the issue gets tracked?

Actually, don't bother, I fixed it.

J
|
From: Eyal L. <ey...@ey...> - 2005-01-24 03:12:28
|
Jeremy Fitzhardinge wrote:
[trimmed]
> Could you file a bug with zz36 attached just to
> make sure the issue gets tracked?
>
> J

Logged as 1108082

--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
attach .zip as .dat
|
From: Jeremy F. <je...@go...> - 2005-01-24 07:46:35
|
On Mon, 2005-01-24 at 14:12 +1100, Eyal Lebedinsky wrote:
> Jeremy Fitzhardinge wrote:
> [trimmed]
> > Could you file a bug with zz36 attached just to
> > make sure the issue gets tracked?
> >
> > J
>
> Logged as 1108082

That doesn't look like a bugs.kde.org bug number (we're only up to
~97000). Did you submit it via
http://bugs.kde.org/enter_valgrind_bug.cgi ?

J
|
From: Eyal L. <ey...@ey...> - 2005-01-24 09:17:18
|
Jeremy Fitzhardinge wrote:
> On Mon, 2005-01-24 at 14:12 +1100, Eyal Lebedinsky wrote:
>
>>Jeremy Fitzhardinge wrote:
>>[trimmed]
>>
>>>Could you file a bug with zz36 attached just to
>>>make sure the issue gets tracked?
>>>
>>> J
>>
>>Logged as 1108082
>
>
> That doesn't look like a bugs.kde.org bug number (we're only up to
> ~97000). Did you submit it via
> http://bugs.kde.org/enter_valgrind_bug.cgi ?

No, through sourceforge.net.
https://sourceforge.net/tracker/?func=detail&atid=445586&aid=1108082&group_id=46268

> J

Sorry, the pasta sauce is just being made and requires my full attention,
I will be back in 15m...

--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
attach .zip as .dat
|
From: Eyal L. <ey...@ey...> - 2005-01-24 11:59:04
|
Jeremy Fitzhardinge wrote:
> On Mon, 2005-01-24 at 14:12 +1100, Eyal Lebedinsky wrote:
>
>>Jeremy Fitzhardinge wrote:
>>[trimmed]
>>
>>>Could you file a bug with zz36 attached just to
>>>make sure the issue gets tracked?
>>>
>>> J
>>
>>Logged as 1108082
>
>
> That doesn't look like a bugs.kde.org bug number (we're only up to
> ~97000). Did you submit it via
> http://bugs.kde.org/enter_valgrind_bug.cgi ?

Now logged as
http://bugs.kde.org/show_bug.cgi?id=97785

> J

--
Eyal Lebedinsky (ey...@ey...) <http://samba.org/eyal/>
attach .zip as .dat