From: Paul P. <ppl...@gm...> - 2006-01-19 05:01:46
|
On 19 Jan 2006 04:29:04 +0000, Bryan Henderson <br...@gi...> wrote:
> My personal strategy is to look at every "Invalid read of size 4" and if
> it just runs off the end of an odd-sized block, I write a suppression for
> it. I don't try to determine if there's a real problem; I just presume
> there isn't.
If the bad read comes from libc, your strategy is possibly ok, though
you may miss bugs where you pass a "too small" buffer into libc.
If it comes from your own code, it is almost certainly a bug in your
code (unless you also perform "ultra-optimizations"), and you should
fix it instead of suppressing.
> So why can't Valgrind do that for me?
Why not run with --tool=none ?
That way you wouldn't get any reports at all.
Cheers,
|
|
From: <br...@gi...> - 2006-01-19 04:44:56
|
I've seen a few references in the archives to the ultra-optimized glibc
strlen function (and various other functions that use the same
optimization). This must be quite familiar to all Valgrind users -- I
myself am getting tired of dealing with it.
But as a reminder: Some functions read a byte-granularity block of memory
one word at a time, for speed. That means they tend to overrun the block
of memory by a few bytes at the end, and Valgrind/Memcheck screams.
My personal strategy is to look at every "Invalid read of size 4" and if
it just runs off the end of an odd-sized block, I write a suppression for
it. I don't try to determine if there's a real problem; I just presume
there isn't.
So why can't Valgrind do that for me? I assume it can't or it would have
been mentioned in this context. But I'd love to be able to tell Valgrind
to ignore invalid reads of size one word that happen in the last word of
a block.
--
Bryan Henderson Phone 408-621-2000
San Jose, California
|
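The word-at-a-time reading described in the message above can be sketched in C. This is only an illustration and not glibc's actual strlen; the name wordwise_strlen and the fixed 4-byte word are assumptions made for the example. Once the pointer is 4-byte aligned the function loads four bytes at a time, so the last load can cover up to three bytes beyond the terminating NUL. When those bytes lie past the end of a heap block, that is the kind of access Memcheck reports as "Invalid read of size 4".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a word-at-a-time strlen (not glibc's code). */
static size_t wordwise_strlen(const char *s)
{
    const char *p = s;

    /* Step one byte at a time until p is 4-byte aligned. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        ++p;
    }

    for (;;) {
        uint32_t w;
        int i;

        /* Aligned 4-byte load; it may cover bytes after the NUL. */
        memcpy(&w, p, sizeof w);

        /* Nonzero exactly when at least one byte of w is zero. */
        if (((w - 0x01010101u) & ~w & 0x80808080u) != 0) {
            for (i = 0; i < 4; ++i)
                if (p[i] == '\0')
                    return (size_t)(p + i - s);
        }
        p += 4;
    }
}
```

An aligned word never straddles a page boundary, which is why such a load does not fault on real hardware even though it steps outside the object as far as the C language is concerned. The sketch is therefore only safe to call on buffers with a few bytes of slack after the terminator.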
|
From: Dalibor T. <ro...@ka...> - 2006-01-19 03:10:57
|
Richard Frith-Macdonald <richard <at> brainstorm.co.uk> writes:
> I haven't managed to get things to work with Kaffe ...
I am sorry to hear that.
> The only way I'm likely to find out for sure is to build a
> version from source and run under gdb to see exactly what's going on
> when it starts up
Yeah. You may want to choose jikes for the java compiler, since ecj crashes
building 1.1.6-91 (i.e. 1.1.7-rc1) on amd64 debian, according to the buildd
logs from today :/
cheers,
dalibor topic
|
|
From: Richard Frith-M. <ri...@br...> - 2006-01-18 17:11:28
|
On 18 Jan 2006, at 03:29, Dalibor Topic wrote:
> Richard Frith-Macdonald <richard <at> brainstorm.co.uk> writes:
>
>> I have a large body of code making complex use of JNI called from
>> servlets in tomcat.
>>
>> Because this uses a lot of JNI, and was developed on the Sun
>> implementation of java, I have been unable to find another
>> implementation which will run it ... SableVM and Kaffe lack a lot of
>> the JNI functions required.
>
> Have you tried with Kaffe 1.1.6 or 1.1.7-rc1? That version should
> implement all of the JNI 1.4 APIs, as far as I remember (don't have
> the list handy atm). Kaffe also works under valgrind, try
> KAFFE_DEBUG=valgrind kaffe yourClass.
I haven't managed to get things to work with Kaffe ... at first I
assumed it was a JNI problem, but I reduced things to a very simple
test program and it wasn't even getting as far as loading the shared
library but was actually failing to locate the java class which used
it ... despite having CLASSPATH set up to point to it (and trying the
-classpath command line argument).
I've been looking at lots of kaffe debug output, and it doesn't seem
to be initialising the internal classpath correctly (the latest debian
unstable package for the amd64), and my current best guess is that
kaffe has bugs in 64bit mode.
The only way I'm likely to find out for sure is to build a version
from source and run under gdb to see exactly what's going on when it
starts up :-(
|
|
From: Tom H. <to...@co...> - 2006-01-18 13:58:52
|
In message <200...@gm...>
Sigurd Schneider <sig...@go...> wrote:
> Running valgrind-3.1.0 on my system with any executable always lead up to the
> following error. I'm using the compiler option -fprefetch-loop-arrays. Maybe
> that is a problem?
> If you need more information feel free to mail me. If it is stupid to report
> this, then please let me know.
Please raise a bug for this on the tracker - the important thing
to include is the error message with the unhandled bytes listed.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Sigurd S. <sig...@go...> - 2006-01-18 13:06:25
|
Hi everyone!
Running valgrind-3.1.0 on my system with any executable always leads to the
following error. I'm using the compiler option -fprefetch-loop-arrays. Maybe
that is a problem?
If you need more information feel free to mail me. If it is stupid to report
this, then please let me know.
Sigurd Schneider
But see for yourself.
Valgrind outputs:
==3392== Memcheck, a memory error detector.
==3392== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==3392== Using LibVEX rev 1471, a library for dynamic binary translation.
==3392== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==3392== Using valgrind-3.1.0, a dynamic binary instrumentation framework.
==3392== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==3392== For more details, rerun with: -v
==3392==
vex x86->IR: unhandled instruction bytes: 0xF 0xD 0x48 0x4
==3392== Your program just tried to execute an instruction that Valgrind
==3392== did not recognise. There are two possible reasons for this.
==3392== 1. Your program has a bug and erroneously jumped to a non-code
==3392== location. If you are running Memcheck and you just saw a
==3392== warning about a bad jump, it's probably your program's fault.
==3392== 2. The instruction is legitimate but Valgrind doesn't handle it,
==3392== i.e. it's Valgrind's fault. If you think this is the case or
==3392== you are not sure, please let us know.
==3392== Either way, Valgrind will now raise a SIGILL signal which will
==3392== probably kill your program.
==3392==
==3392== Process terminating with default action of signal 4 (SIGILL)
==3392== Illegal opcode at address 0x400F7C2
==3392== at 0x400F7C2: _dl_important_hwcaps (in /lib/ld-2.3.6.so)
==3392== by 0x4004E16: _dl_init_paths (in /lib/ld-2.3.6.so)
==3392== by 0x40022EB: dl_main (in /lib/ld-2.3.6.so)
==3392== by 0x400F182: _dl_sysdep_start (in /lib/ld-2.3.6.so)
==3392== by 0x4001610: _dl_start (in /lib/ld-2.3.6.so)
==3392== by 0x40007D6: (within /lib/ld-2.3.6.so)
==3392==
==3392== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==3392== malloc/free: in use at exit: 0 bytes in 0 blocks.
==3392== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==3392== For counts of detected errors, rerun with: -v
==3392== No malloc'd blocks -- no leaks are possible.
Disassembling shows the following code:
0x0000f7b4 <_dl_important_hwcaps+612>: mov $0x1,%esi
0x0000f7b9 <_dl_important_hwcaps+617>: shl %cl,%esi
0x0000f7bb <_dl_important_hwcaps+619>: test %esi,%esi
0x0000f7bd <_dl_important_hwcaps+621>: je 0xf80e <_dl_important_hwcaps+702>
0x0000f7bf <_dl_important_hwcaps+623>: mov 0xffffffdc(%ebp),%eax
0x0000f7c2 <_dl_important_hwcaps+626>: prefetchw 0x4(%eax) <---- here
0x0000f7c6 <_dl_important_hwcaps+630>: prefetchw 0x44(%eax)
0x0000f7ca <_dl_important_hwcaps+634>: prefetchw 0x84(%eax)
0x0000f7d1 <_dl_important_hwcaps+641>: prefetchw 0xc4(%eax)
0x0000f7d8 <_dl_important_hwcaps+648>: prefetchw 0x104(%eax)
0x0000f7df <_dl_important_hwcaps+655>: prefetchw 0x144(%eax)
0x0000f7e6 <_dl_important_hwcaps+662>: dec %esi
0x0000f7e7 <_dl_important_hwcaps+663>: test %esi,%edx
0x0000f7e9 <_dl_important_hwcaps+665>: je 0xf806 <_dl_important_hwcaps+694>
This code is within the ld-2.3.6.so lib of glibc-2.3.6. I compiled glibc
myself using gcc-3.4.5 with decent optimizations:
-O2 -fprefetch-loop-arrays -march=athlon-xp -mpreferred-stack-boundary=2 -fPIC
I'm using tls and nptl only.
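The bytes Valgrind lists as unhandled (0xF 0xD 0x48 0x4) correspond to the prefetchw 0x4(%eax) instruction marked in the disassembly above: 0x0F 0x0D is the AMD 3DNow! prefetch opcode, and the following ModRM byte selects the variant and the addressing form. The sketch below only splits a ModRM byte into its three fields. The names struct modrm and decode_modrm are made up for this illustration and are not part of Valgrind or VEX.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of an x86 ModRM byte. */
struct modrm {
    unsigned mod;  /* bits 7-6: addressing mode */
    unsigned reg;  /* bits 5-3: register or opcode extension */
    unsigned rm;   /* bits 2-0: base register or memory form */
};

/* Split a ModRM byte into its mod, reg and rm fields. */
static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;

    m.mod = (byte >> 6) & 0x3u;
    m.reg = (byte >> 3) & 0x7u;
    m.rm  = byte & 0x7u;
    return m;
}
```

For the byte 0x48 this gives mod = 1 (an 8-bit displacement follows, here 0x04), reg = 1 (with opcode 0F 0D, /1 selects PREFETCHW while /0 selects PREFETCH) and rm = 0 (EAX as the base register).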
The function is compiled from the following source code:
(taken from glibc-2.3.6/sysdeps/generic/dl-sysdep.c)
/* Return an array of useful/necessary hardware capability names. */
const struct r_strlenpair *
internal_function
_dl_important_hwcaps (const char *platform, size_t platform_len, size_t *sz,
                      size_t *max_capstrlen)
{
  /* Determine how many important bits are set. */
  unsigned long int masked = GLRO(dl_hwcap) & GLRO(dl_hwcap_mask);
  size_t cnt = platform != NULL;
  size_t n, m;
  size_t total;
  struct r_strlenpair *temp;
  struct r_strlenpair *result;
  struct r_strlenpair *rp;
  char *cp;
  /* Count the number of bits set in the masked value. */
  for (n = 0; (~((1UL << n) - 1) & masked) != 0; ++n)
    if ((masked & (1UL << n)) != 0)
      ++cnt;
#ifdef USE_TLS
  /* For TLS enabled builds always add 'tls'. */
  ++cnt;
#else
  if (cnt == 0)
    {
      /* If we have platform name and no important capability we only have
         the base directory to search. */
      result = (struct r_strlenpair *) malloc (sizeof (*result));
      if (result == NULL)
        goto no_memory;
      result[0].str = (char *) result; /* Does not really matter. */
      result[0].len = 0;
      *sz = 1;
      return result;
    }
#endif
  /* Create temporary data structure to generate result table. */
  temp = (struct r_strlenpair *) alloca (cnt * sizeof (*temp));
  m = 0;
  for (n = 0; masked != 0; ++n)
    if ((masked & (1UL << n)) != 0)
      {
        temp[m].str = _dl_hwcap_string (n);
        temp[m].len = strlen (temp[m].str);
        masked ^= 1UL << n;
        ++m;
      }
  if (platform != NULL)
    {
      temp[m].str = platform;
      temp[m].len = platform_len;
      ++m;
    }
#ifdef USE_TLS
  temp[m].str = "tls";
  temp[m].len = 3;
  ++m;
#endif
  assert (m == cnt);
  /* Determine the total size of all strings together. */
  if (cnt == 1)
    total = temp[0].len + 1;
  else
    {
      total = (1UL << (cnt - 2)) * (temp[0].len + temp[cnt - 1].len + 2);
      for (n = 1; n + 1 < cnt; ++n)
        total += (1UL << (cnt - 3)) * (temp[n].len + 1);
    }
  /* The result structure: we use a very compressed way to store the
     various combinations of capability names. */
  *sz = 1 << cnt;
  result = (struct r_strlenpair *) malloc (*sz * sizeof (*result) + total);
  if (result == NULL)
    {
#ifndef USE_TLS
    no_memory:
#endif
      _dl_signal_error (ENOMEM, NULL, NULL,
                        N_("cannot create capability list"));
    }
  if (cnt == 1)
    {
      result[0].str = (char *) (result + *sz);
      result[0].len = temp[0].len + 1;
      result[1].str = (char *) (result + *sz);
      result[1].len = 0;
      cp = __mempcpy ((char *) (result + *sz), temp[0].str, temp[0].len);
      *cp = '/';
      *sz = 2;
      *max_capstrlen = result[0].len;
      return result;
    }
  /* Fill in the information. This follows the following scheme
     (indeces from TEMP for four strings):
        entry #0: 0, 1, 2, 3    binary: 1111
              #1: 0, 1, 3               1101
              #2: 0, 2, 3               1011
              #3: 0, 3                  1001
     This allows the representation of all possible combinations of
     capability names in the string. First generate the strings. */
  result[1].str = result[0].str = cp = (char *) (result + *sz);
#define add(idx) \
  cp = __mempcpy (__mempcpy (cp, temp[idx].str, temp[idx].len), "/", 1);
  if (cnt == 2)
    {
      add (1);
      add (0);
    }
  else
    {
      n = 1 << (cnt - 1);
      do
        {
          n -= 2;
          /* We always add the last string. */
          add (cnt - 1);
          /* Add the strings which have the bit set in N. */
          for (m = cnt - 2; m > 0; --m)
            if ((n & (1 << m)) != 0)
              add (m);
          /* Always add the first string. */
          add (0);
        }
      while (n != 0);
    }
#undef add
  /* Now we are ready to install the string pointers and length. */
  for (n = 0; n < (1UL << cnt); ++n)
    result[n].len = 0;
  n = cnt;
  do
    {
      size_t mask = 1 << --n;
      rp = result;
      for (m = 1 << cnt; m > 0; ++rp)
        if ((--m & mask) != 0)
          rp->len += temp[n].len + 1;
    }
  while (n != 0);
  /* The first half of the strings all include the first string. */
  n = (1 << cnt) - 2;
  rp = &result[2];
  while (n != (1UL << (cnt - 1)))
    {
      if ((--n & 1) != 0)
        rp[0].str = rp[-2].str + rp[-2].len;
      else
        rp[0].str = rp[-1].str;
      ++rp;
    }
  /* The second have starts right after the first part of the string of
     corresponding entry in the first half. */
  do
    {
      rp[0].str = rp[-(1 << (cnt - 1))].str + temp[cnt - 1].len + 1;
      ++rp;
    }
  while (--n != 0);
  /* The maximum string length. */
  *max_capstrlen = result[0].len;
  return result;
}
|
|
From: Dalibor T. <ro...@ka...> - 2006-01-18 03:41:11
|
Richard Frith-Macdonald <richard <at> brainstorm.co.uk> writes:
>
> I have a large body of code making complex use of JNI called from
> servlets in tomcat.
>
> Because this uses a lot of JNI, and was developed on the Sun
> implementation of java, I have been unable to find another
> implementation which will run it ... SableVM and Kaffe lack a lot of
> the JNI functions required.
Have you tried with Kaffe 1.1.6 or 1.1.7-rc1? That version should implement
all of the JNI 1.4 APIs, as far as I remember (don't have the list handy
atm). Kaffe also works under valgrind, try
KAFFE_DEBUG=valgrind kaffe yourClass.
cheers,
dalibor topic
|
|
From: Michael R. <mic...@gm...> - 2006-01-17 22:40:50
|
On Tuesday 17 January 2006 17:56, Tom Hughes wrote:
> In message <200...@gm...>
> Michael Reiher <re...@gm...> wrote:
> > I run the program via (The suppression options only when needed, of
> > course):
> >
> > valgrind --tool=memcheck --suppressions=valgrind.supp
> > --gen-suppressions=yes --leak-check=full --leak-resolution=high
> > --show-reachable=no <program> <args>
> >
> > Valgrind nicely prints loss records for lost blocks. Like this:
> >
> > ==7085== 157,866 (40 direct, 157,826 indirect) bytes in 1 blocks are
> > definitely lost in loss record 135 of 218
> >
> > First off, do I understand "block" correctly as some amount of memory
> > allocated at a time, and it doesn't say anything about the amount i.e. it
> > can be 2byte, 57Kb or whatever? If so, how can 1 block be directly and
> > indirectly leaked at the same time?
>
> The block count is the number of direct blocks lost I believe - so you
> have one 40 byte block lost which contains pointers in one way or
> another to another 157826 bytes of memory.
>
> It doesn't report the number of indirect blocks lost.
Ahh, I see, that makes sense...
> > Then I wonder which records are actually printed? Only 20something out of
> > those 218 are printed. So what about the rest? I suspected them to be
> > either duplicates of the printed ones or still reachable blocks, but
> > neither seems to be the case. They have different backtraces for
> > instance. Also the numbers of blocks in the summary are way above the
> > printed ones (counted together). So obviously the hidden ones add to the
> > lost blocks in the summary as well. But why are they hidden then? I fear,
> > actual leaks from my plugin might be hidden?
>
> Duplicates should already have been suppressed I think - that is done
> as the loss records are computed.
... Then the reports apparently cover only the "definitely lost" blocks. And
the hidden ones are probably the "indirectly lost" blocks? Is there a way to
see information about those, too?
> > When now generating suppressions, obviously I'm also asked for the not
> > printed records. I get lots of:
> >
> > ==7100== ---- Print suppression ? --- [Return/N/n/Y/y/C/c] ----
> >
> > but with no additional info. Which is not exactly much to base a decision
> > on ;) Can I make valgrind a bit more verbose?
>
> Sounds like you have loss records for which it has failed to record
> any location for some reason. Seems a bit odd.
Don't think so. The backtraces seem ok when printing the suppressions for
these cases.
Greets
Michael
|
|
From: Jeroen N. W. <jn...@xs...> - 2006-01-17 17:00:19
|
|
From: Tom H. <to...@co...> - 2006-01-17 16:56:25
|
In message <200...@gm...>
Michael Reiher <re...@gm...> wrote:
> I run the program via (The suppression options only when needed, of course):
>
> valgrind --tool=memcheck --suppressions=valgrind.supp --gen-suppressions=yes
> --leak-check=full --leak-resolution=high --show-reachable=no <program> <args>
>
> Valgrind nicely prints loss records for lost blocks. Like this:
>
> ==7085== 157,866 (40 direct, 157,826 indirect) bytes in 1 blocks are
> definitely lost in loss record 135 of 218
>
> First off, do I understand "block" correctly as some amount of memory allocated
> at a time, and it doesn't say anything about the amount i.e. it can be 2byte,
> 57Kb or whatever? If so, how can 1 block be directly and indirectly leaked at
> the same time?
The block count is the number of direct blocks lost I believe - so you
have one 40 byte block lost which contains pointers in one way or
another to another 157826 bytes of memory.
It doesn't report the number of indirect blocks lost.
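The distinction between direct and indirect loss can be illustrated with a small C program fragment. This is a sketch under assumptions: struct owner, make_owner and leak_owner are names invented for the example, and the exact wording of the resulting loss record is not shown here because it may differ between Valgrind versions. The idea is that once the only pointer to the owner block is dropped, the owner counts as directly lost, while the payload, which is still pointed to but only from inside the lost owner, would be expected to count as indirectly lost.

```c
#include <stdlib.h>

/* Hypothetical type: a small block that holds the only pointer
   to a larger block. */
struct owner {
    char *payload;
};

/* Allocate an owner together with a payload reachable only through it.
   Returns NULL if either allocation fails. */
static struct owner *make_owner(size_t payload_size)
{
    struct owner *o = malloc(sizeof *o);

    if (o == NULL)
        return NULL;
    o->payload = malloc(payload_size);
    if (o->payload == NULL) {
        free(o);
        return NULL;
    }
    return o;
}

/* Deliberately leak both blocks: the owner pointer is discarded, so
   nothing in the program refers to the owner any more, and the payload
   is referenced only from the leaked owner. */
static void leak_owner(size_t payload_size)
{
    struct owner *o = make_owner(payload_size);

    (void)o;
}
```

In terms of the record quoted above, the 40 direct bytes would play the role of the owner and the 157,826 indirect bytes the role of everything reachable only through it.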
> Then I wonder which records are actually printed? Only 20something out of those
> 218 are printed. So what about the rest? I suspected them to be either
> duplicates of the printed ones or still reachable blocks, but neither seems
> to be the case. They have different backtraces for instance. Also the numbers
> of blocks in the summary are way above the printed ones (counted together).
> So obviously the hidden ones add to the lost blocks in the summary as well.
> But why are they hidden then? I fear, actual leaks from my plugin might be
> hidden?
Duplicates should already have been suppressed I think - that is done
as the loss records are computed.
> When now generating suppressions, obviously I'm also asked for the not printed
> records. I get lots of:
>
> ==7100== ---- Print suppression ? --- [Return/N/n/Y/y/C/c] ----
>
> but with no additional info. Which is not exactly much to base a decision
> on ;) Can I make valgrind a bit more verbose?
Sounds like you have loss records for which it has failed to record
any location for some reason. Seems a bit odd.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Michael R. <re...@gm...> - 2006-01-17 16:30:16
|
Hi
I'm trying to find out the source of memleaks of a plugin I'm developing.
The server has quite a few leaks (mostly during init). I don't care about
most of them, so I would like to suppress them. But I'm a bit confused about
the output of the leak checker. Studying the manuals and searching the
archives didn't really enlighten me.
I run the program via (The suppression options only when needed, of course):
valgrind --tool=memcheck --suppressions=valgrind.supp --gen-suppressions=yes
--leak-check=full --leak-resolution=high --show-reachable=no <program> <args>
Valgrind nicely prints loss records for lost blocks. Like this:
==7085== 157,866 (40 direct, 157,826 indirect) bytes in 1 blocks are
definitely lost in loss record 135 of 218
First off, do I understand "block" correctly as some amount of memory
allocated at a time, and it doesn't say anything about the amount i.e. it can
be 2byte, 57Kb or whatever? If so, how can 1 block be directly and indirectly
leaked at the same time?
Then I wonder which records are actually printed? Only 20something out of
those 218 are printed. So what about the rest? I suspected them to be either
duplicates of the printed ones or still reachable blocks, but neither seems
to be the case. They have different backtraces for instance. Also the numbers
of blocks in the summary are way above the printed ones (counted together).
So obviously the hidden ones add to the lost blocks in the summary as well.
But why are they hidden then? I fear, actual leaks from my plugin might be
hidden?
When now generating suppressions, obviously I'm also asked for the not
printed records. I get lots of:
==7100== ---- Print suppression ? --- [Return/N/n/Y/y/C/c] ----
but with no additional info. Which is not exactly much to base a decision
on ;) Can I make valgrind a bit more verbose?
Some enlightenment would be great :)
Greets
Michael
|
|
From: Richard Frith-M. <ri...@br...> - 2006-01-17 15:49:58
|
On 17 Jan 2006, at 15:11, Julian Seward wrote:
> One thing that does occur to me is that by default the simulated
> registers may not be up to date at the exception point, and that
> could cause problems. What happens if you run with
> --vex-iropt-precise-memory-exns=yes ?
No difference ... it ends up looping handling a sigsegv
|
|
From: Julian S. <js...@ac...> - 2006-01-17 15:11:40
|
> OK. I can reproduce this. I think the problem is that the JVM is
> installing a SEGV handler which tries to resume at a different
> address by updating the value of RIP in the signal context.
Gaah. How horrible.
> Now valgrind doesn't handle that - it copies most of the register
> values back but not the RIP value. I've just tried making it copy
> the RIP value back but it didn't seem to help and it was still
> looping.
>
> I presume the reason for that is that although the RIP for the
> simulated CPU has been updated when the signal handler returns
> the real CPU will restart execution from in the JITed code that
> valgrind has generated and then fault again.
This is based on inadequate (zero) investigation on my part, so
could be completely bogus, but how I think it works is:
- vex-generated code gets a segfault
- Valgrind's signal handler runs. It sees this is a synch signal
  and longjmps back to the scheduler.
- scheduler sees a signal is pending for this thread, builds frame
  for thread, sets RIP to enter handler
- thread is rescheduled, handler runs (on sim'd cpu of course).
- handler messes with RIP in sigcontext.
- handler returns. This makes it jump to the stub code in the frame
  created by m_sigframe, which in turn causes it to do a
  sys_rt_sigreturn.
- So we're now in PRE(sys_rt_sigreturn) in syswrap-amd64-linux.c.
  That does some incomprehensible messing around (I never really
  understood it) but it does call VG_(sigframe_destroy)(tid, True).
  This copies the sigcontext back into the thread state.
- When the thread is rescheduled, it will continue at the new RIP
  value.
So it should work (ha ha ha). I'm sure Jeremy had it working, or
something very similar, so that he could do self-hosting on 2.4.X,
but that did get partially broken at the time the 3.0 code line was
created, and now we don't rely on signal cleverness to do self
hosting.
One thing that does occur to me is that by default the simulated
registers may not be up to date at the exception point, and that
could cause problems. What happens if you run with
--vex-iropt-precise-memory-exns=yes ?
J
|
|
From: Julian S. <js...@ac...> - 2006-01-17 13:36:33
|
Send the output somewhere else. Use --log-file=, --log-fd= or
--log-socket=.
J
On Tuesday 17 January 2006 12:52, Karim Bernardet wrote:
> Hi
>
> I am using valgrind 3.10 and I run valgrind like this :
>
> valgrind --tool=memcheck --leak-check=yes --trace-children=yes
> --num-callers=8 --show-reachable=yes `which athena.py` ReDoBtag.py RDB.p
>
> The (huge) program uses a filename which is changed by "==12176==
> Memcheck, a memory error detector." !
>
> Domain[ROOT_All]: level[Info] > Access DbDomain READ [ROOT_All]
> Domain[ROOT_All]: level[Info] > Deaccess DbDomain READ [ROOT_All]
> EventSelector ERROR (PersistencySvc)
> pool::PersistencySvc::UserDatabase::connectForRead: PFN "==12176==
> Memcheck, a memory error detector." is not existing
> ProxyProviderSvc ERROR
> ServiceLocatorHelper::createService: can not create service
> EventSelector of type EventSelector
> ProxyProviderSvc
>
> Thanks for any help !
>
> Karim
|
|
From: Karim B. <ber...@cp...> - 2006-01-17 12:52:39
|
Hi
I am using valgrind 3.10 and I run valgrind like this :
valgrind --tool=memcheck --leak-check=yes --trace-children=yes
--num-callers=8 --show-reachable=yes `which athena.py` ReDoBtag.py RDB.p
The (huge) program uses a filename which is changed by "==12176==
Memcheck, a memory error detector." !
Domain[ROOT_All]: level[Info] > Access DbDomain READ [ROOT_All]
Domain[ROOT_All]: level[Info] > Deaccess DbDomain READ [ROOT_All]
EventSelector ERROR (PersistencySvc)
pool::PersistencySvc::UserDatabase::connectForRead: PFN "==12176==
Memcheck, a memory error detector." is not existing
ProxyProviderSvc ERROR
ServiceLocatorHelper::createService: can not create service
EventSelector of type EventSelector
ProxyProviderSvc
Thanks for any help !
Karim
|
|
From: Krishna <kri...@ya...> - 2006-01-17 05:00:51
|
Hi,
I tried installing valgrind with the --prefix=path option.
However I get the following error when I try to execute it.
valgrind: failed to start tool 'memcheck' for platform 'x86-linux': No such file or directory
Would anyone know why this is happening and how I could correct it.
Thank you,
Krishna
|
|
From: Richard Frith-M. <ri...@br...> - 2006-01-16 17:01:11
|
On 16 Jan 2006, at 16:47, Tom Hughes wrote:
> In message <861...@br...>
> Richard Frith-Macdonald <ri...@br...> wrote:
>
>> On 16 Jan 2006, at 11:44, Tom Hughes wrote:
>>
>>> Looks like a signal handling issue then - add --trace-signals=yes
>>> and see what that gets you.
>>
>> OK ... segmentation violation ... but I don't know where/why.
>> Does the following (the last six lines are the ones that repeat
>> forever) make any sense?
>
> OK. I can reproduce this. I think the problem is that the JVM is
> installing a SEGV handler which tries to resume at a different
> address by updating the value of RIP in the signal context.
>
> Now valgrind doesn't handle that - it copies most of the register
> values back but not the RIP value. I've just tried making it copy
> the RIP value back but it didn't seem to help and it was still
> looping.
>
> I presume the reason for that is that although the RIP for the
> simulated CPU has been updated when the signal handler returns
> the real CPU will restart execution from in the JITed code that
> valgrind has generated and then fault again.
>
> The signal handler probably needs to jump back to the scheduler
> if RIP has changed so that the scheduler can restart execution
> from the right address but Julian may have a better idea how to
> fix this...
That's getting into the internals of valgrind and beyond my current
knowhow ... but I understand the gist of your reply.
I guess this is to be taken as a valgrind bug then ... would you like
me to file a bug report?
|
|
From: Tom H. <to...@co...> - 2006-01-16 16:47:42
|
In message <861...@br...>
Richard Frith-Macdonald <ri...@br...> wrote:
> On 16 Jan 2006, at 11:44, Tom Hughes wrote:
>
>> Looks like a signal handling issue then - add --trace-signals=yes
>> and see what that gets you.
>
> OK ... segmentation violation ... but I don't know where/why.
> Does the following (the last six lines are the ones that repeat
> forever) make any sense?
OK. I can reproduce this. I think the problem is that the JVM is
installing a SEGV handler which tries to resume at a different
address by updating the value of RIP in the signal context.
Now valgrind doesn't handle that - it copies most of the register
values back but not the RIP value. I've just tried making it copy
the RIP value back but it didn't seem to help and it was still
looping.
I presume the reason for that is that although the RIP for the
simulated CPU has been updated when the signal handler returns
the real CPU will restart execution from in the JITed code that
valgrind has generated and then fault again.
The signal handler probably needs to jump back to the scheduler
if RIP has changed so that the scheduler can restart execution
from the right address but Julian may have a better idea how to
fix this...
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
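For readers unfamiliar with the technique described in the message above, here is a minimal sketch of a SIGSEGV handler that resumes at a different address by rewriting RIP in the signal context. It is not the JVM's code. The names `skip_faulting_load`, `install_skip_handler` and `probe_read` are hypothetical, and the sketch assumes x86-64 Linux with glibc and GCC or Clang.

```c
#define _GNU_SOURCE            /* exposes REG_RIP via <ucontext.h> on glibc */
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

#if !defined(__linux__) || !defined(__x86_64__)
#error "This sketch assumes x86-64 Linux"
#endif

/* Length in bytes of the one instruction used in probe_read():
 * "movl (%rdi), %eax" encodes as 8B 07. */
#define PROBE_INSN_LEN 2

static volatile sig_atomic_t fault_count = 0;

/* Hypothetical handler: it does not repair the fault. It edits the saved
 * RIP so that, on return, execution resumes just past the faulting load.
 * Assumption: every SIGSEGV it sees comes from the load in probe_read(). */
static void skip_faulting_load(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = (ucontext_t *)ctx;
    (void)sig;
    (void)info;
    fault_count++;
    uc->uc_mcontext.gregs[REG_RIP] += PROBE_INSN_LEN;
}

/* Installs the handler with SA_SIGINFO so the context pointer is passed. */
static int install_skip_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = skip_faulting_load;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Returns *p, or -1 if the load faulted and the handler skipped it.
 * The registers are pinned so the instruction length is known. */
static int probe_read(const int *p)
{
    int value = -1;
    __asm__ volatile("movl (%%rdi), %%eax"
                     : "+a"(value)
                     : "D"(p)
                     : "memory");
    return value;
}
```

Run natively, a call such as `probe_read(NULL)` after `install_skip_handler()` faults once and then continues past the load. If the changed RIP were not honoured when the handler returns, the same load would be expected to fault again indefinitely, which is the pattern of repeated `rt_sigreturn` and `signal 11` entries reported in this thread.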
|
From: Richard Frith-M. <ri...@br...> - 2006-01-16 12:27:16
|
On 16 Jan 2006, at 11:44, Tom Hughes wrote:
> In message <BBC...@br...>
> Richard Frith-Macdonald <ri...@br...> wrote:
>
>> On 16 Jan 2006, at 10:52, Tom Hughes wrote:
>>
>>> In message <88F...@br...>
>>> Richard Frith-Macdonald <ri...@br...> wrote:
>>>
>>>> I just tried adding '--error-limit=no' to the command line to start
>>>> tomcat. It generates a bigger log file ... but the log file has
>>>> stopped growing, though the valgrind/tomcat process looks like it is
>>>> in a loop of some sort (ie a 'ps' shows it in an 'R' state).
>>>> The log file stopped growing at 10:13, but it's now 10:40 and 'ps'
>>>> shows the process has now used over 28 minutes of CPU time.
>>>
>>> Well what does strace say it is doing? What about --trace-syscalls=yes?
>>
>> I hadn't thought of that ... looks like a signal handling problem of
>> some sort ... after adding '--trace-syscalls=yes' I get
>> 'SYSCALL[8149,1]( 15) rt_sigreturn ( ) --> [pre-success] Success(0x0)'
>> repeated indefinitely at the end of the log file. The log file to
>> that point is about 900KB.
>
> Looks like a signal handling issue then - add --trace-signals=yes
> and see what that gets you.

OK ... segmentation violation ... but I don't know where/why. Does the following (the last six lines are the ones that repeat forever) make any sense?

SYSCALL[29972,1](202) sys_futex ( 0x141E5244, 1, 1, 0x508CB10, 0x141E5240 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x141E5240, 0, 2, 0x0, 0x141E4200 ) --> [async] ... SYSCALL[29972,1](202) ... [async] --> Success(0x1) SYSCALL[29972,1](202) sys_futex ( 0x141E5240, 1, 1, 0x508CB10, 0x141E5240 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x141E5240, 1, 1, 0x0, 0x141E4200 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x477FFE0, 0, 2, 0x0, 0x477FFE0 ) --> [async] ... SYSCALL[29972,1](202) ... 
[async] --> Success(0x1) SYSCALL[29972,1](202) sys_futex ( 0x477FFE0, 1, 1, 0x508CB10, 0x141E5240 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x477FFE0, 1, 1, 0x50C6070, 0x477FFE0 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6]( 96) sys_gettimeofday ( 0x147C6010, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,6]( 96) sys_gettimeofday ( 0x147C6010, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x141E5244, 0, 9, 0x0, 0x141E4200 ) --> [async] ... SYSCALL[29972,1](202) ... [async] --> Success(0x1) ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70E354CD: ??? ==29972== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC49: ??? ==29972== by 0x70E35503: ??? ==29972== Address 0x7FEFFCA60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC50: ??? ==29972== by 0x70E35503: ??? ==29972== Address 0x7FEFFBA60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC57: ??? ==29972== by 0x70E35503: ??? ==29972== Address 0x7FEFFAA60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC5E: ??? ==29972== by 0x70E35503: ??? ==29972== Address 0x7FEFF9A60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC65: ??? ==29972== by 0x70E35503: ??? ==29972== Address 0x7FEFF8A60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC6C: ??? ==29972== by 0x70E35503: ??? ==29972== Address 0x7FEFF7A60 is not stack'd, malloc'd or (recently) free'd SYSCALL[29972,1](202) sys_futex ( 0x141E5244, 1, 1, 0x508CB10, 0x141E5240 ) --> [async] ... SYSCALL[29972,10](202) ... 
[async] --> Failure(0x6E) SYSCALL[29972,10](202) sys_futex ( 0x14BC9E78, 1, 1, 0x14BC9DD8, 0x14BC9E78 ) --> [async] ... SYSCALL[29972,10](202) ... [async] --> Success(0x0) SYSCALL[29972,10]( 96) sys_gettimeofday ( 0x14BC9E90, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,10]( 96) sys_gettimeofday ( 0x14BC9E00, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,10](228) sys_clock_gettime( 0, 0x14BC9DD8 )[sync] --> Success(0x0) SYSCALL[29972,10](202) sys_futex ( 0x15415C64, 0, 1, 0x14BC9DD8, 0x14BC9E78 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x141E5240, 0, 2, 0x0, 0x141E4200 ) --> [async] ... SYSCALL[29972,1](202) ... [async] --> Success(0x1) SYSCALL[29972,1](202) sys_futex ( 0x141E5240, 1, 1, 0x508CB10, 0x141E5240 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x141E5240, 1, 1, 0x0, 0x141E4200 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x477FFE0, 0, 2, 0x0, 0x477FFE0 ) -- > [async] ... SYSCALL[29972,1](202) ... [async] --> Success(0x1) SYSCALL[29972,1](202) sys_futex ( 0x477FFE0, 1, 1, 0x508CB10, 0x141E5240 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x477FFE0, 1, 1, 0x50C6070, 0x477FFE0 ) --> [async] ... SYSCALL[29972,6](202) ... [async] --> Success(0x0) SYSCALL[29972,6]( 96) sys_gettimeofday ( 0x147C6010, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,6]( 96) sys_gettimeofday ( 0x147C6010, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,6](202) sys_futex ( 0x141E5244, 0, 11, 0x0, 0x141E4200 ) --> [async] ... SYSCALL[29972,1](202) ... [async] --> Success(0x1) ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70E3524D: ??? ==29972== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC49: ??? ==29972== by 0x70E3527B: ??? 
==29972== Address 0x7FEFFCA88 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC50: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFFBA88 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC57: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFFAA88 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC5E: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFF9A88 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC65: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFF8A88 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC6C: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFF7A88 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70E31840: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFF7A78 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70E31F80: ??? ==29972== by 0x70E3527B: ??? ==29972== Address 0x7FEFF7A48 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70E33B4D: ??? ==29972== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC49: ??? ==29972== by 0x70E33B7F: ??? ==29972== Address 0x7FEFFCA60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC50: ??? ==29972== by 0x70E33B7F: ??? ==29972== Address 0x7FEFFBA60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC57: ??? ==29972== by 0x70E33B7F: ??? 
==29972== Address 0x7FEFFAA60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC5E: ??? ==29972== by 0x70E33B7F: ??? ==29972== Address 0x7FEFF9A60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC65: ??? ==29972== by 0x70E33B7F: ??? ==29972== Address 0x7FEFF8A60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid write of size 4 ==29972== at 0x70DDDC6C: ??? ==29972== by 0x70E33B7F: ??? ==29972== Address 0x7FEFF7A60 is not stack'd, malloc'd or (recently) free'd ==29972== ==29972== Invalid read of size 8 ==29972== at 0x70E3F260: ??? ==29972== Address 0x8 is not stack'd, malloc'd or (recently) free'd --29972-- signal 11 arrived ... si_code=1, EIP=0x70E3F260, eip=0x405C4F34B --29972-- SIGSEGV: si_code=1 faultaddr=0x8 tid=1 ESP=0x7FEFFDB10 seg=0x0-0x3FFFFFF --29972-- delivering signal 11 (SIGSEGV):1 to thread 1 --29972-- push_signal_frame (thread 1): signal 11 ==29972== ==29972== Conditional jump or move depends on uninitialised value(s) ==29972== at 0x4E31AEB: SharedRuntime::continuation_for_implicit_exception(JavaThread*, unsigned char*, SharedRuntime::ImplicitExceptionKind) (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==29972== by 0x4DCA625: JVM_handle_linux_signal (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==29972== by 0x4DC812D: signalHandler(int, siginfo*, void*) (in / usr/local/jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==29972== by 0x432B79F: (within /lib/libpthread-2.3.5.so) SYSCALL[29972,1]( 15) rt_sigreturn ( )--29972-- VG_(signal_return) (thread 1): isRT=1 valid magic; RIP=0x70E3F260 --> [pre-success] Success(0x0) --29972-- signal 11 arrived ... 
si_code=1, EIP=0x70E3F260, eip=0x405C4F34B --29972-- SIGSEGV: si_code=1 faultaddr=0x8 tid=1 ESP=0x7FEFFDB10 seg=0x0-0x3FFFFFF --29972-- delivering signal 11 (SIGSEGV):1 to thread 1 --29972-- push_signal_frame (thread 1): signal 11 SYSCALL[29972,1]( 15) rt_sigreturn ( )--29972-- VG_(signal_return) (thread 1): isRT=1 valid magic; RIP=0x70E3F260 --> [pre-success] Success(0x0) --29972-- signal 11 arrived ... si_code=1, EIP=0x70E3F260, eip=0x405C4F34B --29972-- SIGSEGV: si_code=1 faultaddr=0x8 tid=1 ESP=0x7FEFFDB10 seg=0x0-0x3FFFFFF --29972-- delivering signal 11 (SIGSEGV):1 to thread 1 --29972-- push_signal_frame (thread 1): signal 11 SYSCALL[29972,10](202) ... [async] --> Failure(0x6E) SYSCALL[29972,10](202) sys_futex ( 0x14BC9E78, 1, 1, 0x14BC9DD8, 0x14BC9E78 ) --> [async] ... SYSCALL[29972,10](202) ... [async] --> Success(0x0) SYSCALL[29972,10]( 96) sys_gettimeofday ( 0x14BC9E90, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,10]( 96) sys_gettimeofday ( 0x14BC9E00, 0x0 )[sync] --> Success(0x0) SYSCALL[29972,10](228) sys_clock_gettime( 0, 0x14BC9DD8 )[sync] --> Success(0x0) SYSCALL[29972,10](202) sys_futex ( 0x48071D4, 0, 1, 0x14BC9DD8, 0x14BC9E78 ) --> [async] ... SYSCALL[29972,1]( 15) rt_sigreturn ( )--29972-- VG_(signal_return) (thread 1): isRT=1 valid magic; RIP=0x70E3F260 --> [pre-success] Success(0x0) --29972-- signal 11 arrived ... si_code=1, EIP=0x70E3F260, eip=0x405C4F34B --29972-- SIGSEGV: si_code=1 faultaddr=0x8 tid=1 ESP=0x7FEFFDB10 seg=0x0-0x3FFFFFF --29972-- delivering signal 11 (SIGSEGV):1 to thread 1 --29972-- push_signal_frame (thread 1): signal 11 |
|
From: Tom H. <to...@co...> - 2006-01-16 11:44:45
|
In message <BBC...@br...>
Richard Frith-Macdonald <ri...@br...> wrote:
> On 16 Jan 2006, at 10:52, Tom Hughes wrote:
>
>> In message <88F...@br...>
>> Richard Frith-Macdonald <ri...@br...> wrote:
>>
>>> I just tried adding '--error-limit=no' to the command line to start
>>> tomcat. It generates a bigger log file ... but the log file has
>>> stopped growing, though the valgrind/tomcat process looks like it is
>>> in a loop of some sort (ie a 'ps' shows it in an 'R' state).
>>> The log file stopped growing at 10:13, but it's now 10:40 and 'ps'
>>> shows the process has now used over 28 minutes of CPU time.
>>
>> Well what does strace say it is doing? What about --trace-syscalls=yes?
>
> I hadn't thought of that ... looks like a signal handling problem of
> some sort ... after adding '--trace-syscalls=yes' I get
> 'SYSCALL[8149,1]( 15) rt_sigreturn ( ) --> [pre-success] Success(0x0)'
> repeated indefinitely at the end of the log file. The log file to
> that point is about 900KB.
Looks like a signal handling issue then - add --trace-signals=yes
and see what that gets you.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Richard Frith-M. <ri...@br...> - 2006-01-16 11:18:38
|
On 16 Jan 2006, at 10:52, Tom Hughes wrote:
> In message <88F...@br...>
> Richard Frith-Macdonald <ri...@br...> wrote:
>
>> I just tried adding '--error-limit=no' to the command line to start
>> tomcat. It generates a bigger log file ... but the log file has
>> stopped growing, though the valgrind/tomcat process looks like it is
>> in a loop of some sort (ie a 'ps' shows it in an 'R' state).
>> The log file stopped growing at 10:13, but it's now 10:40 and 'ps'
>> shows the process has now used over 28 minutes of CPU time.
>
> Well what does strace say it is doing? What about --trace-syscalls=yes?

I hadn't thought of that ... looks like a signal handling problem of some sort ... after adding '--trace-syscalls=yes' I get 'SYSCALL[8149,1]( 15) rt_sigreturn ( ) --> [pre-success] Success(0x0)' repeated indefinitely at the end of the log file. The log file to that point is about 900KB.

> At the end of the day my experience of trying to look at these problems
> in the past is that the Sun JVM does truly disgusting things that are
> often very hard to support in valgrind. It tends to stress just about
> everything to the absolute limit especially things like threading and
> signal handling.

I suppose threading/signal handling are understandable given the nature of the beast. Sounds like you don't have much hope of it working though :-(

> My favourite one was where it seemed to be fiddling with a memory
> page (either munmap or mprotect, I can't remember which) which it
> didn't seem to have allocated... It seemed to just be making
> assumptions about where certain things would be in memory for some
> reason.
>
> I assume you are using --smc-check=all or have turned the JIT compiler
> off?

Yes ... '/usr/local/bin/valgrind -v --trace-children=yes --smc-check=all --trace-syscalls=yes --error-limit=no --logfile=/tmp/vg' followed by the normal tomcat startup command. |
|
From: Tom H. <to...@co...> - 2006-01-16 10:53:02
|
In message <88F...@br...>
Richard Frith-Macdonald <ri...@br...> wrote:
> On 16 Jan 2006, at 08:55, Tom Hughes wrote:
>
>> Possibly - if you use --trace-children=yes do you see a whole stream
>> of valgrind startup messages appearing?
>
> No ... I guess I misdiagnosed the problem ... what I actually got was
> 1. a valgrind log file saying that there were too many errors and
> logging had stopped.
> 2. a system where the tomcat log was empty, and attempts to connect
> to it by apache were failing.
> This made me assume it was looping, as starting tomcat normally (ie
> without valgrind) results in a working tomcat setup visible in my web
> browser.
The LD_LIBRARY_PATH loop means that Java is not really running much
at all so I wouldn't expect to see much in the way of valgrind error
reports in that case.
> I just tried adding '--error-limit=no' to the command line to start
> tomcat. It generates a bigger log file ... but the log file has
> stopped growing, though the valgrind/tomcat process looks like it is
> in a loop of some sort (ie a 'ps' shows it in an 'R' state).
> The log file stopped growing at 10:13, but it's now 10:40 and 'ps'
> shows the process has now used over 28 minutes of CPU time.
Well what does strace say it is doing? What about --trace-syscalls=yes?
At the end of the day my experience of trying to look at these problems
in the past is that the Sun JVM does truly disgusting things that are
often very hard to support in valgrind. It tends to stress just about
everything to the absolute limit especially things like threading and
signal handling.
My favourite one was where it seemed to be fiddling with a memory
page (either munmap or mprotect, I can't remember which) which it
didn't seem to have allocated... It seemed to just be making
assumptions about where certain things would be in memory for some
reason.
I assume you are using --smc-check=all or have turned the JIT compiler
off?
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Richard Frith-M. <ri...@br...> - 2006-01-16 10:51:27
|
On 16 Jan 2006, at 10:14, David Eriksson wrote:
> On Mon, 2006-01-16 at 07:12 +0000, Richard Frith-Macdonald wrote:
>> I have a large body of code making complex use of JNI called from
>> servlets in tomcat.
>>
>> Because this uses a lot of JNI, and was developed on the Sun
>> implementation of java, I have been unable to find another
>> implementation which will run it ... SableVM and Kaffe lack a lot of
>> the JNI functions required.
>>
>> I have infrequent crashes in Java which are almost certainly due to
>> errors in the JNI code ... most likely C library code writing to
>> memory it shouldn't, and corrupting the Java system ... so valgrind
>> seems the only realistic hope to catch that kind of thing.
>
> Just thinking out loud here... (It was a long time since I did JNI.)
>
> Would it be possible to create unit tests or similar that exercised the
> JNI code without any Java involved? Then valgrind could run on the unit
> tests...

Thanks for the suggestion, but the code is large, complex and involves callbacks into the java environment from the external libraries. Any non-java unit tests would need to simulate the presence of java and be hugely time consuming to write. It would probably be quicker to re-implement the system entirely in C/Objective-C. |
|
From: Richard Frith-M. <ri...@br...> - 2006-01-16 10:42:19
|
On 16 Jan 2006, at 08:55, Tom Hughes wrote:
> In message <ABD...@br...>
> Richard Frith-Macdonald <ri...@br...> wrote:
>
>> Unfortunately, valgrind doesn't seem to work with Sun's Java
>> implementation. After much searching the 'resolved' bug #69508 seems
>> to describe what is happening ... the last two comments in the bug
>> report talk about Java re-execing to put its own directory at the
>> front of LD_LIBRARY_PATH and valgrind doing the same ... getting into
>> an infinite loop.
>
> Possibly - if you use --trace-children=yes do you see a whole stream
> of valgrind startup messages appearing?

No ... I guess I misdiagnosed the problem ... what I actually got was
1. a valgrind log file saying that there were too many errors and logging had stopped.
2. a system where the tomcat log was empty, and attempts to connect to it by apache were failing.
This made me assume it was looping, as starting tomcat normally (ie without valgrind) results in a working tomcat setup visible in my web browser.

>> I guess the bug being marked as 'resolved' refers to the original
>> problem (a 'stack size too small' report) ... but it doesn't actually
>> explain how to get valgrind/java to work together. Has this actually
>> been done? Is there a known workaround?
>
> I'm not sure when the bug was resolved, but we no longer change
> the value of LD_LIBRARY_PATH as it was only ever needed to allow
> us to replace libpthread with our own one and we haven't done that
> since the 2.2.x releases.
>
> I think there was a delay in removing the LD_LIBRARY_PATH code, but
> the 3.x releases should all be fine - none of them will try and
> change it.
>
> What version are you using?

Valgrind 3.1.0 built from source, Java jdk1.5.0_06 from Sun, Tomcat 4.1.29 on a Debian AMD64 'sid' system.

I just tried adding '--error-limit=no' to the command line to start tomcat. It generates a bigger log file ... but the log file has stopped growing, though the valgrind/tomcat process looks like it is in a loop of some sort (ie a 'ps' shows it in an 'R' state). The log file stopped growing at 10:13, but it's now 10:40 and 'ps' shows the process has now used over 28 minutes of CPU time.

I don't know if it's any use at all, but the end of the log (the whole log is nearly 300KB ... too big to post to a list) looks like this ...

==7476== Invalid write of size 4 ==7476== at 0x70E3E710: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5DE0: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5DE0: ??? ==7476== by 0x70DD332C: ??? ==7476== by 0x4BF2BA4: JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*) (in /usr/local/jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4DC8EB8: os::os_exception_wrapper(void (*) (JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*) (in /usr/local/jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4BF29B4: JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*) (in /usr/local/jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4C1FF54: jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) (in /usr/local/jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== Address 0x7FEFF7B88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E332F0: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5DE0: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5EFD: ??? ==7476== by 0x70DD5DE0: ??? ==7476== by 0x70DD332C: ??? 
==7476== by 0x4BF2BA4: JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*) (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4DC8EB8: os::os_exception_wrapper(void (*) (JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*) (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4BF29B4: JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*) (in /usr/local/jdk1.5.0_06/jre/lib/amd64/ server/libjvm.so) ==7476== by 0x4C1FF54: jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) (in /usr/local/jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E3574D: ??? ==7476== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC49: ??? ==7476== by 0x70E3577F: ??? ==7476== Address 0x7FEFFCA88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC50: ??? ==7476== by 0x70E3577F: ??? ==7476== Address 0x7FEFFBA88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC57: ??? ==7476== by 0x70E3577F: ??? ==7476== Address 0x7FEFFAA88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC5E: ??? ==7476== by 0x70E3577F: ??? ==7476== Address 0x7FEFF9A88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC65: ??? ==7476== by 0x70E3577F: ??? ==7476== Address 0x7FEFF8A88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC6C: ??? ==7476== by 0x70E3577F: ??? 
==7476== Address 0x7FEFF7A88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E354CD: ??? ==7476== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC49: ??? ==7476== by 0x70E35503: ??? ==7476== Address 0x7FEFFCA60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC50: ??? ==7476== by 0x70E35503: ??? ==7476== Address 0x7FEFFBA60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC57: ??? ==7476== by 0x70E35503: ??? ==7476== Address 0x7FEFFAA60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC5E: ??? ==7476== by 0x70E35503: ??? ==7476== Address 0x7FEFF9A60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC65: ??? ==7476== by 0x70E35503: ??? ==7476== Address 0x7FEFF8A60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC6C: ??? ==7476== by 0x70E35503: ??? ==7476== Address 0x7FEFF7A60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E3524D: ??? ==7476== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC49: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFFCA88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC50: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFFBA88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC57: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFFAA88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC5E: ??? 
==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFF9A88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC65: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFF8A88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC6C: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFF7A88 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E31840: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFF7A78 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E31F80: ??? ==7476== by 0x70E3527B: ??? ==7476== Address 0x7FEFF7A48 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70E33B4D: ??? ==7476== Address 0x7FEFF7B08 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC49: ??? ==7476== by 0x70E33B7F: ??? ==7476== Address 0x7FEFFCA60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC50: ??? ==7476== by 0x70E33B7F: ??? ==7476== Address 0x7FEFFBA60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC57: ??? ==7476== by 0x70E33B7F: ??? ==7476== Address 0x7FEFFAA60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC5E: ??? ==7476== by 0x70E33B7F: ??? ==7476== Address 0x7FEFF9A60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC65: ??? ==7476== by 0x70E33B7F: ??? ==7476== Address 0x7FEFF8A60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid write of size 4 ==7476== at 0x70DDDC6C: ??? ==7476== by 0x70E33B7F: ??? 
==7476== Address 0x7FEFF7A60 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Invalid read of size 8 ==7476== at 0x70E3F260: ??? ==7476== Address 0x8 is not stack'd, malloc'd or (recently) free'd ==7476== ==7476== Conditional jump or move depends on uninitialised value(s) ==7476== at 0x4E31AEB: SharedRuntime::continuation_for_implicit_exception(JavaThread*, unsigned char*, SharedRuntime::ImplicitExceptionKind) (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4DCA625: JVM_handle_linux_signal (in /usr/local/ jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x4DC812D: signalHandler(int, siginfo*, void*) (in / usr/local/jdk1.5.0_06/jre/lib/amd64/server/libjvm.so) ==7476== by 0x432B79F: (within /lib/libpthread-2.3.5.so) |
|
From: David E. <tw...@us...> - 2006-01-16 10:14:57
|
On Mon, 2006-01-16 at 07:12 +0000, Richard Frith-Macdonald wrote:
> I have a large body of code making complex use of JNI called from
> servlets in tomcat.
>
> Because this uses a lot of JNI, and was developed on the Sun
> implementation of java, I have been unable to find another
> implementation which will run it ... SableVM and Kaffe lack a lot of
> the JNI functions required.
>
> I have infrequent crashes in Java which are almost certainly due to
> errors in the JNI code ... most likely C library code writing to
> memory it shouldn't, and corrupting the Java system ... so valgrind
> seems the only realistic hope to catch that kind of thing.
Just thinking out loud here... (It has been a long time since I did JNI.)
Would it be possible to create unit tests or similar that exercised the
JNI code without any Java involved? Then valgrind could run on the unit
tests...
--
Regards,
-\- David Eriksson -/-
SynCE - http://synce.sourceforge.net
ScummVM - http://scummvm.sourceforge.net
Desquirr - http://desquirr.sourceforge.net
|
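The unit-test idea in the message above can be sketched as follows, assuming the native logic can be separated from its JNI glue (which may not be practical when the native code calls back into Java). `join_path` is a hypothetical stand-in for a library routine and `run_join_path_test` is a hypothetical test driver; neither comes from the code under discussion.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical core routine. In a real project a thin JNI wrapper would
 * convert Java strings to C strings and then call this function, so the
 * logic itself never needs a JVM to run. Caller frees the result. */
static char *join_path(const char *dir, const char *name)
{
    size_t dir_len = strlen(dir);
    size_t name_len = strlen(name);
    /* room for dir, one '/', name and the terminating NUL */
    char *out = malloc(dir_len + 1 + name_len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, dir, dir_len);
    out[dir_len] = '/';
    memcpy(out + dir_len + 1, name, name_len + 1); /* copies the NUL too */
    return out;
}

/* Plain C test driver with no Java involved: returns 0 on success and
 * 1 on failure, and frees what it allocates so a leak check stays quiet. */
static int run_join_path_test(void)
{
    char *path = join_path("/tmp", "vg");
    int ok = (path != NULL && strcmp(path, "/tmp/vg") == 0);
    free(path);
    return ok ? 0 : 1;
}
```

A small `main` that returns `run_join_path_test()` could then be built as an ordinary executable and run under valgrind with options such as `--leak-check=yes`, so that invalid writes in the C code are reported without the JVM's own noise in the log.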