From: Nicholas N. <nj...@cs...> - 2008-03-22 00:38:08
|
On Fri, 21 Mar 2008, Jason Evans wrote:
> Firefox recently started embedding a custom memory allocator (jemalloc),
> but Firefox developers still want to be able to use valgrind, so I added
> the necessary VALGRIND_{MALLOC,FREE}LIKE_BLOCK() macro calls to make
> this work (sans redzones). This all went well except for peculiar
> errors that I suspect are indicative of a problem within valgrind.
>
> Early on during jemalloc initialization, a low level data structure is
> allocated internally. This data structure happens to start 64 bytes
> past the beginning of a page, and it is 3432 bytes (on amd64). I see
> many errors of the following nature, both 'invalid read' and 'invalid
> write' errors. These errors are always within the last 64 bytes of the
> allocated region (even when I turn off a compile-time flag, which
> shrinks the data structure by a few hundred bytes).
>
> ==16589== Invalid read of size 8
> ==16589== at 0x5CF3DF3: arena_malloc (in
> /home/jasone/mozilla/mozilla/obj_opt/toolkit/library/libxul.so)
> ==16589== by 0x5CF640F: malloc (in
> /home/jasone/mozilla/mozilla/obj_opt/toolkit/library/libxul.so)
> ==16589== by 0x64F5AC5: JS_ArenaAllocate (in
> /home/jasone/mozilla/mozilla/obj_opt/js/src/libmozjs.so)
>
> [...]
>
> I tried increasing the alignment of internal allocations in order to
> avoid this problem. If this large object is 64 or 128 bytes past the
> beginning of a page boundary, the problem persists. If it is any power
> of two in [256..4096] past a page boundary, the problem goes away.
>
> Does anyone familiar with valgrind's internals have some idea of what
> the problem is? My guess is that objects of this size are usually more
> strictly aligned, and that I'm hitting an edge case that other
> allocators happen to avoid.
I can't think of any problems off the top of my head. Valgrind's allocator
doesn't consider pages in any way, as far as I know, so I don't see why the
page boundary location would have an effect.
It could be a Valgrind bug. Can you make a small test case, eg. by using
jemalloc in a small C program? The {MALLOC,FREE}_LIKE_BLOCK stuff has
always been a little flaky.
Nick
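
A small test case along the lines Nick asks for might look like the sketch below. It is not jemalloc: `touch_tail_of_block` is a hypothetical stand-in that places one block at a chosen offset past a page boundary (64 bytes and 3432 bytes in Jason's report), announces it with VALGRIND_MALLOCLIKE_BLOCK without redzones, and then reads and writes the last 64 bytes. It assumes Linux; when valgrind.h is not installed the client requests are compiled out, so the program still builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Use the real client requests when valgrind.h is available; otherwise fall
// back to no-ops (assumption: a build without the Valgrind headers is fine).
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif
#ifndef VALGRIND_MALLOCLIKE_BLOCK
#  define VALGRIND_MALLOCLIKE_BLOCK(addr, sizeB, rzB, is_zeroed) ((void)0)
#  define VALGRIND_FREELIKE_BLOCK(addr, rzB) ((void)0)
#endif

// Hypothetical helper: carve one block out of a fresh two-page mapping at
// `offset_in_page` bytes past the page boundary, register it with Valgrind,
// and touch its last 64 bytes. Returns true if the bytes read back correctly.
bool touch_tail_of_block(std::size_t offset_in_page, std::size_t block_size) {
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return false;
    const std::size_t map_len = static_cast<std::size_t>(page) * 2;
    if (block_size == 0 || offset_in_page + block_size > map_len) return false;

    void* base = mmap(nullptr, map_len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return false;

    unsigned char* block = static_cast<unsigned char*>(base) + offset_in_page;
    VALGRIND_MALLOCLIKE_BLOCK(block, block_size, 0, 0);  // no redzones

    // Write, then read, the region where the reported errors occurred.
    const std::size_t tail = block_size < 64 ? block_size : 64;
    unsigned char* p = block + (block_size - tail);
    std::memset(p, 0xAB, tail);
    bool ok = true;
    for (std::size_t i = 0; i < tail; ++i) ok = ok && (p[i] == 0xAB);

    VALGRIND_FREELIKE_BLOCK(block, 0);
    munmap(base, map_len);
    return ok;
}
```

Calling `touch_tail_of_block(64, 3432)` from `main` and running the binary under valgrind would show whether invalid reads or writes are reported for a block with that placement; trying other offsets such as 128 or 256 mirrors the alignment experiment described above.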
|
|
From: Nicholas N. <nj...@cs...> - 2008-03-22 00:32:44
|
On Fri, 21 Mar 2008, Oswald, Michael wrote:
>> I would say first that in my view using MAP_FIXED for anything is a
>> bad idea. It silently replaces or truncates any existing mapping which
>> overlaps the requested range, but there is no easy way to know beforehand
>> if this will happen. The only way to use it safely is to have some
>> way to know what the process' address space layout is, like reading
>> /proc/self/maps, or in some very specialised situations, as ld.so does.
>
> I totally agree with that. The system I am working on was developed by
> many companies and we proposed a few times to drop POST and use something
> different, more portable and safe, but the proposal was never accepted.
So you run the program natively, write out a data structure from memory to a
file, and then try to read it in from a program running under Valgrind? And
it doesn't work because the reading-in expects the data structure to be at
exactly the same address as when it was written out?
Assuming that's right, I don't see how it's ever going to work -- Valgrind
provides an environment that is similar to native, but not identical, and
any program that relies so much on things such as memory layout is a
hopeless case for Valgrind, IMO.
Nick
|
|
From: Jason E. <ja...@ca...> - 2008-03-21 23:28:47
|
Firefox recently started embedding a custom memory allocator (jemalloc),
but Firefox developers still want to be able to use valgrind, so I added
the necessary VALGRIND_{MALLOC,FREE}LIKE_BLOCK() macro calls to make
this work (sans redzones). This all went well except for peculiar
errors that I suspect are indicative of a problem within valgrind.
Early on during jemalloc initialization, a low level data structure is
allocated internally. This data structure happens to start 64 bytes
past the beginning of a page, and it is 3432 bytes (on amd64). I see
many errors of the following nature, both 'invalid read' and 'invalid
write' errors. These errors are always within the last 64 bytes of the
allocated region (even when I turn off a compile-time flag, which
shrinks the data structure by a few hundred bytes).
==16589== Invalid read of size 8
==16589== at 0x5CF3DF3: arena_malloc (in
/home/jasone/mozilla/mozilla/obj_opt/toolkit/library/libxul.so)
==16589== by 0x5CF640F: malloc (in
/home/jasone/mozilla/mozilla/obj_opt/toolkit/library/libxul.so)
==16589== by 0x64F5AC5: JS_ArenaAllocate (in
/home/jasone/mozilla/mozilla/obj_opt/js/src/libmozjs.so)
==16589== by 0x6544511: NewRENode (in
/home/jasone/mozilla/mozilla/obj_opt/js/src/libmozjs.so)
==16589== by 0x654B6F9: ParseQuantifier (in
/home/jasone/mozilla/mozilla/obj_opt/js/src/libmozjs.so)
==16589== by 0x654C333: ParseRegExp (in
/home/jasone/mozilla/mozilla/obj_opt/js/src/libmozjs.so)
==16589== by 0x654C851: js_NewRegExp (in
/home/jasone/mozilla/mozilla/obj_opt/js/src/libmozjs.so)
==16589== by 0x654CF59: js_NewRegExpObject (in
/home/jasone/mozilla/mozilla/obj_opt/js/src/libmozjs.so)
==16589== by 0x653E87C: PrimaryExpr (in
/home/jasone/mozilla/mozilla/obj_opt/js/src/libmozjs.so)
==16589== by 0x653F6C1: MemberExpr (in
/home/jasone/mozilla/mozilla/obj_opt/js/src/libmozjs.so)
==16589== by 0x653FCFF: UnaryExpr (in
/home/jasone/mozilla/mozilla/obj_opt/js/src/libmozjs.so)
==16589== by 0x653FDD0: MulExpr (in
/home/jasone/mozilla/mozilla/obj_opt/js/src/libmozjs.so)
==16589== Address 0x421ED80 is 3,392 bytes inside a block of size 3,432
alloc'd
==16589== at 0x5CF1744: base_alloc (in
/home/jasone/mozilla/mozilla/obj_opt/toolkit/library/libxul.so)
==16589== by 0x5CF4D9A: arenas_extend (in
/home/jasone/mozilla/mozilla/obj_opt/toolkit/library/libxul.so)
==16589== by 0x5CF5348: malloc_init (in
/home/jasone/mozilla/mozilla/obj_opt/toolkit/library/libxul.so)
==16589== by 0x5CF62DD: calloc (in
/home/jasone/mozilla/mozilla/obj_opt/toolkit/library/libxul.so)
==16589== by 0x6FD954A: _dlerror_run (dlerror.c:142)
==16589== by 0x6FD8EF0: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==16589== by 0x6DB269F: _PR_InitZones (in
/home/jasone/mozilla/mozilla/obj_opt/nsprpub/pr/src/libnspr4.so)
==16589== by 0x6DB7501: _PR_ImplicitInitialization (in
/home/jasone/mozilla/mozilla/obj_opt/nsprpub/pr/src/libnspr4.so)
==16589== by 0x6DAD834: PR_NewLogModule (in
/home/jasone/mozilla/mozilla/obj_opt/nsprpub/pr/src/libnspr4.so)
==16589== by 0x5C3C665:
__static_initialization_and_destruction_0(int, int) (in
/home/jasone/mozilla/mozilla/obj_opt/toolkit/library/libxul.so)
==16589== by 0x5CF6585: (within
/home/jasone/mozilla/mozilla/obj_opt/toolkit/library/libxul.so)
==16589== by 0x546EADA: (within
/home/jasone/mozilla/mozilla/obj_opt/toolkit/library/libxul.so)
I tried increasing the alignment of internal allocations in order to
avoid this problem. If this large object is 64 or 128 bytes past the
beginning of a page boundary, the problem persists. If it is any power
of two in [256..4096] past a page boundary, the problem goes away.
Does anyone familiar with valgrind's internals have some idea of what
the problem is? My guess is that objects of this size are usually more
strictly aligned, and that I'm hitting an edge case that other
allocators happen to avoid.
Thanks,
Jason Evans
|
|
From: Oswald, M. <mic...@si...> - 2008-03-21 18:14:06
|
> I would say first that in my view using MAP_FIXED for anything is a
> bad idea. It silently replaces or truncates any existing mapping which
> overlaps the requested range, but there is no easy way to know beforehand
> if this will happen. The only way to use it safely is to have some
> way to know what the process' address space layout is, like reading
> /proc/self/maps, or in some very specialised situations, as ld.so does.
I totally agree with that. The system I am working on was developed by many companies and we proposed a few times to drop POST and use something different, more portable and safe, but the proposal was never accepted. The current approach is that there is a small test program which uses some kind of heuristic to determine an address, which is then fixed with an environment variable. Still a rather lousy approach.
> It respects MAP_FIXED if it can, but will reject calls which could
> overwrite Valgrind's code or data mappings. So it will likely fail
> vs succeed differently on Valgrind than natively. Note that Valgrind
> changes the process' address space layout a lot compared to natively, and
> so assumptions about what-is-where or what areas are free that might
> appear to work natively may not work when running in Valgrind.
That's what I was afraid of...
> Yes. But, uh, requiring the libraries to load in a particular order
> seems to me to be a sign of fragileness.
Yes, of course. I think you can imagine that you run into very funny crashes if you recompile some of the libraries, forget to import into POST, and then try to run the system afterwards... Or you add some new library and forget about the link order... Some people have already spent days debugging crashes caused by this...
> Best thing is to send a small test case which shows the problem. I
> read through the rest of the thread but can't see from that enough
> info to say anything much else.
I don't know if I am able to strip down the code to something like that. I will try. POST itself is free (http://www.ispras.ru/~knizhnik/post.html).
> How does POST deal with address space randomization that modern
> kernels commonly do? Even when not using Valgrind, wouldn't address
> space randomization cause it problems?
Yes. Normally this isn't a problem, since the system is only supported on older kernels. I myself did a port to SUSE Linux Enterprise Server 10, where I ran into exactly this problem. The solution was quite simple: we added the disable_rand_maps kernel parameter at startup, which disables this feature.
Thanks,
Michael
|
|
From: Julian S. <js...@ac...> - 2008-03-20 16:41:12
|
I would say first that in my view using MAP_FIXED for anything is a
bad idea. It silently replaces or truncates any existing mapping which
overlaps the requested range, but there is no easy way to know beforehand
if this will happen. The only way to use it safely is to have some
way to know what the process' address space layout is, like reading
/proc/self/maps, or in some very specialised situations, as ld.so does.
I worked for a while on a compiler runtime (http://haskell.org/ghc) that
used MAP_FIXED to place the heap at certain locations. This caused enough
portability and reliability problems that we eventually stopped using it.
> So a few questions:
> - How does valgrind handle mmap calls with MAP_FIXED?
It respects MAP_FIXED if it can, but will reject calls which could
overwrite Valgrind's code or data mappings. So it will likely fail
vs succeed differently on Valgrind than natively. Note that Valgrind
changes the process' address space layout a lot compared to natively, and
so assumptions about what-is-where or what areas are free that might
appear to work natively may not work when running in Valgrind.
> - Does valgrind respect the link order of the libraries when loading these
> (I would assume this)?
Yes. But, uh, requiring the libraries to load in a particular order
seems to me to be a sign of fragileness.
> - Does anybody have an idea how to get valgrind to work with such a
> process?
Best thing is to send a small test case which shows the problem. I
read through the rest of the thread but can't see from that enough
info to say anything much else.
How does POST deal with address space randomization that modern
kernels commonly do? Even when not using Valgrind, wouldn't address
space randomization cause it problems?
J
|
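
The difference Julian describes can be made concrete with a minimal sketch. It assumes Linux; `HintResult` and `map_at_hint` are hypothetical names for illustration and are not part of POST or Valgrind. Passing the wanted address to mmap without MAP_FIXED makes it only a hint, so the kernel never replaces an existing mapping, and the caller can detect that the wanted address was not available instead of silently clobbering whatever lived there.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

struct HintResult {
    void* addr;     // where the kernel actually placed the mapping (nullptr on failure)
    bool  at_hint;  // true only if addr equals the requested address
};

// Request an anonymous mapping at `wanted`, but only as a hint (no MAP_FIXED),
// so a mapping that already occupies that range is left untouched.
HintResult map_at_hint(void* wanted, std::size_t len) {
    HintResult r = {nullptr, false};
    void* p = mmap(wanted, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return r;
    r.addr = p;
    r.at_hint = (p == wanted);
    return r;
}
```

A caller that needs a fixed address can then treat `at_hint == false` as a clear failure (unmap and report) rather than continuing with objects at the wrong place. Whether the hint is honored under Valgrind may differ from a native run, as noted above.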
|
From: Oswald, M. <mic...@si...> - 2008-03-20 16:34:24
|
> You can look at the vptr in gdb. Just create an object of type CMDpktSlotDef
> and CMDpktParDef in your main and check that the vptr point to the same
> address in both cases.
Ok, I think we have a hit:
Local object in the beginning of main:
_vptr.CMDpktSlotDef = 0x401038e0
_vptr.CMDpktParDef = 0x401038c0
Objects in shared mem:
_vptr.CMDpktSlotDef = 0x40103d40
_vptr.CMDpktParDef = 0x401038c0
So the CMDpktParDef vptrs are identical, but the CMDpktSlotDef ones are not. The strange thing anyway is that the program runs through normally. Maybe it doesn't access critical virtual functions...
So I'll have to investigate where this mismatch of addresses comes from...
> > I use this small program, which simply loads the shared mem, loops over it
> > and dumps the classes on cout.
> You do not want to provide it?
It's simply not possible. Though it is a small program, it pulls in a lot of libraries which it depends on. And you would need the importer program too and its libraries, together with the data files to import and the configuration of a lot of variables which is needed by most libraries (last time it took me a week to correctly configure it for the first startup). So sadly, it is not feasible.
Thanks very much to all and a happy Easter!
Regards,
Michael
|
|
From: Christoph B. <bar...@or...> - 2008-03-20 15:33:32
|
On Thursday, 20 March 2008, Oswald, Michael wrote:
> > Is m_pktParam a variable of a class with a virtual table? If yes, is the
> > virtual table loaded correctly?
>
> Yes it is (CMDpktParDef). It inherits from a class "object" which has only
> compiler generated default constructor/destructor. It's only virtual
> function is the virtual destructor. No other classes inherit from it.
> Funny.
>
> I did a small test: removed the virtual from the destructor, so the class
> shouldn't be virtual anymore. The result was exactly the same.
>
> But CMDpktSlotDef has some virtual functions and the getLength() is one of
> them, so I would say we are going into the right direction.
>
> I can't tell, if the virtual table is loaded correctly, it should be part
> of the shared lib where CMDpktSlotDef is compiled in, right?
>
> So in principle it should be possible to get the vptr from an instance and
> dump the vtable somehow. Hm, maybe I should look, if gcc provides something
> which supports this.
You can look at the vptr in gdb. Just create an object of type CMDpktSlotDef
and CMDpktParDef in your main and check that the vptr points to the same
address in both cases.
> > Do you have a small testprogramm to look at the problem?
>
> I use this small program, which simply loads the shared mem, loops over it
> and dumps the classes on cout.
You do not want to provide it?
|
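
The same comparison can also be done in code rather than in gdb. The sketch below is an illustration only: it assumes the GCC (Itanium C++ ABI) object layout, where the vptr is the first pointer-sized word of a polymorphic object, and `SlotDef`, `vptr_at`, and `same_vtable` are hypothetical stand-ins, not names from the project discussed here. A freshly constructed local object supplies the vptr that is valid in the current process, and a raw object image (such as one loaded from a saved memory segment) is checked against it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a polymorphic class such as CMDpktSlotDef.
struct SlotDef {
    virtual ~SlotDef() {}
    virtual unsigned short getLength() const { return 0; }
};

// Read the first pointer-sized word of an object or raw object image.
// Assumption: GCC / Itanium C++ ABI layout, where that word is the vptr.
std::uintptr_t vptr_at(const void* object) {
    std::uintptr_t v = 0;
    std::memcpy(&v, object, sizeof v);
    return v;
}

// True if both objects (or images) refer to the same vtable address.
bool same_vtable(const void* a, const void* b) {
    return vptr_at(a) == vptr_at(b);
}
```

If `same_vtable(&local_object, stored_object)` is false for an object of the same dynamic type, the stored image was written when the vtable lived at a different address, for example because the shared library was loaded elsewhere, and any virtual call through it will jump to a stale location.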
|
From: Oswald, M. <mic...@si...> - 2008-03-20 15:23:50
|
> Is m_pktParam a variable of a class with a virtual table? If yes, is the
> virtual table loaded correctly?
Yes it is (CMDpktParDef). It inherits from a class "object" which has only a compiler-generated default constructor/destructor. Its only virtual function is the virtual destructor. No other classes inherit from it. Funny.
I did a small test: removed the virtual from the destructor, so the class shouldn't be virtual anymore. The result was exactly the same.
But CMDpktSlotDef has some virtual functions and getLength() is one of them, so I would say we are going in the right direction.
I can't tell if the virtual table is loaded correctly. It should be part of the shared lib where CMDpktSlotDef is compiled in, right?
So in principle it should be possible to get the vptr from an instance and dump the vtable somehow. Hm, maybe I should look if gcc provides something which supports this.
> Do you have a small testprogramm to look at the problem?
I use this small program, which simply loads the shared mem, loops over it and dumps the classes on cout.
Regards,
Michael
|
|
From: Oswald, M. <mic...@si...> - 2008-03-20 14:51:51
|
> Did you already try to add --trace-signals=yes to Valgrind's command
> line options ? This should tell you more about the cause of the crash.
OK, this produced the following output:
==21657== Invalid read of size 4
==21657== at 0x804EBCA: main (TESTmib.C:127)
==21657== Address 0x40103d48 is not stack'd, malloc'd or (recently) free'd
--21657-- signal 11 arrived ... si_code=1, EIP=0x804EBCA, eip=0x6883659C
--21657-- SIGSEGV: si_code=1 faultaddr=0x40103D48 tid=1 ESP=0xBEFFD270 seg=NULL
--21657-- delivering signal 11 (SIGSEGV):1 to thread 1
--21657-- push_signal_frame (thread 1): signal 11
==21657== at 0x804EBCA: main (TESTmib.C:127)
--21657-- Async handler got signal 6 for tid 2 info 0
--21657-- kill: sent signal 6 to pid 21657
--21657-- VG_(signal_return) (thread 1): isRT=0 valid magic; EIP=0x804EBCA
==21657==
==21657== Jump to the invalid address stated on the next line
==21657== at 0x40103D40: ???
==21657== by 0x694B20F: (below main) (in /lib/tls/libc.so.6)
==21657== Address 0x40103d40 is not stack'd, malloc'd or (recently) free'd
--21657-- translations not allowed here (0x40103d40) - throwing SEGV
--21657-- delivering signal 11 (SIGSEGV):1 to thread 1
--21657-- push_signal_frame (thread 1): signal 11
==21657== at 0x40103D40: ???
==21657== by 0x694B20F: (below main) (in /lib/tls/libc.so.6)
--21657-- Async handler got signal 6 for tid 3 info 0
--21657-- kill: sent signal 6 to pid 21657
--21657-- VG_(signal_return) (thread 1): isRT=0 valid magic; EIP=0x40103D40
--21657-- translations not allowed here (0x40103d40) - throwing SEGV
--21657-- delivering signal 11 (SIGSEGV):1 to thread 1
--21657-- push_signal_frame (thread 1): signal 11
==21657== at 0x40103D40: ???
==21657== by 0x694B20F: (below main) (in /lib/tls/libc.so.6)
--21657-- Async handler got signal 6 for tid 4 info 0
--21657-- kill: sent signal 6 to pid 21657
--21657-- VG_(signal_return) (thread 1): isRT=0 valid magic; EIP=0x40103D40
--21657-- translations not allowed here (0x40103d40) - throwing SEGV
--21657-- delivering signal 11 (SIGSEGV):1 to thread 1
--21657-- push_signal_frame (thread 1): signal 11
==21657== at 0x40103D40: ???
==21657== by 0x694B20F: (below main) (in /lib/tls/libc.so.6)
--21657-- Async handler got signal 6 for tid 5 info 0
--21657-- kill: sent signal 6 to pid 21657
--21657-- VG_(signal_return) (thread 1): isRT=0 valid magic; EIP=0x40103D40
--21657-- translations not allowed here (0x40103d40) - throwing SEGV
--21657-- delivering signal 11 (SIGSEGV):1 to thread 1
--21657-- push_signal_frame (thread 1): signal 11
==21657== at 0x40103D40: ???
==21657== by 0x694B20F: (below main) (in /lib/tls/libc.so.6)
--21657-- Async handler got signal 6 for tid 6 info 0
--21657-- kill: sent signal 6 to pid 21657
--21657-- VG_(signal_return) (thread 1): isRT=0 valid magic; EIP=0x40103D40
--21657-- translations not allowed here (0x40103d40) - throwing SEGV
--21657-- delivering signal 11 (SIGSEGV):1 to thread 1
--21657-- push_signal_frame (thread 1): signal 11
==21657== at 0x40103D40: ???
==21657== by 0x694B20F: (below main) (in /lib/tls/libc.so.6)
--21657-- kill: sent signal 6 to pid 21657
--21657-- poll_signals: got signal 6 for thread 1
--21657-- Polling found signal 6 for tid 1
--21657-- delivering signal 6 (SIGABRT):0 to thread 1
--21657-- push_signal_frame (thread 1): signal 6
==21657== at 0x695D8B6: kill (in /lib/tls/libc.so.6)
==21657== by 0x62EDA77: (within /lib/tls/libpthread.so.0)
==21657== by 0x694B20F: (below main) (in /lib/tls/libc.so.6)
--21657-- delivering signal 6 (SIGABRT):0 to thread 6
--21657-- push_signal_frame (thread 6): signal 6
==21657== at 0x69BEE66: (within /lib/tls/libc.so.6)
==21657== by 0x49F4FD9: APPLItaskManagerInterface::checkTaskManager() (APPLItaskManagerInterface.C:1043)
==21657== by 0x49F4419: APPLItaskManagerInterface::f_checkTaskManger(void*) (APPLItaskManagerInterface.C:942)
==21657== by 0x49F55A1: APPLItaskManagerInterface_f_checkTaskMangerWrapper (APPLItaskManagerInterface.C:1064)
==21657== by 0x62E7CF6: start_thread (in /lib/tls/libpthread.so.0)
==21657== by 0x69F02ED: clone (in /lib/tls/libc.so.6)
==21657== by 0xA85EBAF: ???
--21657-- delivering signal 6 (SIGABRT):0 to thread 2
--21657-- push_signal_frame (thread 2): signal 6
==21657== at 0x62E9F7C: pthread_cond_timedwait@@GLIBC_2.3.2 (in /lib/tls/libpthread.so.0)
--21657-- delivering signal 6 (SIGABRT):0 to thread 3
--21657-- push_signal_frame (thread 3): signal 6
==21657== at 0x62E9D06: pthread_cond_wait@@GLIBC_2.3.2 (in /lib/tls/libpthread.so.0)
==21657== by 0x65EFA1C: omniOrbORB::run() (in /opt/omniORB-4.1.0/lib/libomniORB4.so.1.0)
==21657== by 0x4889281: MISCcorba::startEventLoop(void (*)(void*), void*) (MISCcorba.C:1588)
==21657== by 0x488E63E: MISCcorbaLoopThread::threadMethod() (MISCcorba.C:94)
==21657== by 0x4892FE0: MISCthread::threadFunc(void*) (MISCthread.C:244)
==21657== by 0x48931A7: MISCthread_threadFuncWrapper (MISCthread.C:347)
==21657== by 0x62E7CF6: start_thread (in /lib/tls/libpthread.so.0)
==21657== by 0x69F02ED: clone (in /lib/tls/libc.so.6)
==21657== by 0x905BBAF: ???
--21657-- delivering signal 6 (SIGABRT):0 to thread 5
--21657-- push_signal_frame (thread 5): signal 6
==21657== at 0x62E9F7C: pthread_cond_timedwait@@GLIBC_2.3.2 (in /lib/tls/libpthread.so.0)
==21657== by 0x489125F: _CORBA_Sequence<unsigned char>::copybuffer(unsigned long) (seqTemplatedecls.h:296)
--21657-- delivering signal 6 (SIGABRT):0 to thread 4
--21657-- push_signal_frame (thread 4): signal 6
==21657== at 0x69E6E44: poll (in /lib/tls/libc.so.6)
==21657== by 0x667E5ED: omni::SocketCollection::Select() (in /opt/omniORB-4.1.0/lib/libomniORB4.so.1.0)
==21657== by 0x66A5002: omni::tcpEndpoint::AcceptAndMonitor(void (*)(void*, omni::giopConnection*), void*) (in /opt/omniORB-4.1.0/lib/libomniORB4.so.1.0)
==21657== by 0x666451F: omni::giopRendezvouser::execute() (in /opt/omniORB-4.1.0/lib/libomniORB4.so.1.0)
==21657== by 0x660AB7F: omniAsyncWorker::real_run() (in /opt/omniORB-4.1.0/lib/libomniORB4.so.1.0)
==21657== by 0x660A4CC: omniAsyncWorkerInfo::run() (in /opt/omniORB-4.1.0/lib/libomniORB4.so.1.0)
==21657== by 0x660ADE8: omniAsyncWorker::run(void*) (in /opt/omniORB-4.1.0/lib/libomniORB4.so.1.0)
==21657== by 0x6578982: omni_thread_wrapper (in /opt/omniORB-4.1.0/lib/libomnithread.so.3.3)
==21657== by 0x62E7CF6: start_thread (in /lib/tls/libpthread.so.0)
==21657== by 0x69F02ED: clone (in /lib/tls/libc.so.6)
Program catch signal 6.
Hm, so it initially throws a signal 11, but the Async handler gets a signal 6? Or does this mean, on signal 11 it sends signal 6 to abort the other threads?
Somehow confusing...
Regards,
Michael
|
|
From: Christoph B. <bar...@or...> - 2008-03-20 14:48:11
|
On Thursday, 20 March 2008, Oswald, Michael wrote:
> The code, where valgrind points to is like this:
>
> ....
>
> const CMDpktSlotDef* sd = (*(m_def->getPkt()->getPktParamSlots()))[i];
> if(!sd) continue;
> unsigned short length = sd->getLength(); <--- crash appears here
>
> ....
>
> The CMDpktSlotDef lies in the shared memory. The getLength() is like this:
>
> virtual const unsigned short getLength() const
> {return (m_pktParam) ? m_pktParam->getLength() : 0;};
>
> When I print the pointers, they are all valid (with addresses in the shared
> memory), still it crashes with the 0x40xxxxxx address, which seems to be in
> the code segment (the shared mem is mapped to 0x71000000).
>
Is m_pktParam a variable of a class with a virtual table? If yes, is the
virtual table loaded correctly?
Do you have a small test program to look at the problem?
Christoph
|
|
From: Oswald, M. <mic...@si...> - 2008-03-20 14:30:16
|
> Do you check that the mmap succeeds to load the objects at the desired
> address? I would suspect that this is not the case.
Yup, it is checked. The test program which generated the valgrind output I posted simply loops through the containers and dumps out all classes. It runs OK for the first half (the TM classes), but fails for the TC classes. I put some logging output into some of the classes to print out the addresses of *this and some members. With this logging it even runs over the first few TC classes without problems and then crashes. Somehow strange.
The code, where valgrind points to is like this:
....
const CMDpktSlotDef* sd = (*(m_def->getPkt()->getPktParamSlots()))[i];
if(!sd) continue;
unsigned short length = sd->getLength(); <--- crash appears here
....
The CMDpktSlotDef lies in the shared memory. The getLength() is like this:
virtual const unsigned short getLength() const
{return (m_pktParam) ? m_pktParam->getLength() : 0;};
When I print the pointers, they are all valid (with addresses in the shared memory), still it crashes with the 0x40xxxxxx address, which seems to be in the code segment (the shared mem is mapped to 0x71000000).
The only hint, that I have about what's going on, is the manual which told me about the link order of the libraries and that I can reproduce the error when I change the link order for the testprogram even without valgrind.
With an earlier version of the system, it was possible to use purify on Solaris. But you had to purify the importer, import the data into the shared mem and then purify the process to debug which then uses the (purified) memory image. Unfortunately, this doesn't even work with purify under Linux, so valgrind is some kind of last resort.
> Why isn't proper serialization of the containers used? I would bet that it is
> still fast enough but safe to use.
I am completely with you on this and I would be the first volunteer to change it, but this is not in my hands. This system has evolved from 1996 onwards and went through numerous changes, but sadly not in this area.
Regards,
Michael
|
|
From: Christoph B. <bar...@or...> - 2008-03-20 14:06:01
|
On Thursday, 20 March 2008, Oswald, Michael wrote:
> Yes, I did this. After the mmap call I declared the whole block with
> VALGRIND_MALLOCLIKE_BLOCK. And it works for some of the objects (e.g. I can
> access the TM objects whereas the crash appears on accessing the TC
> objects). Unfortunately, the requirements for using the persistent object
> store are the fixed addresses, so if for some reason the loading of the
> shared libraries is in a different order, they get allocated to a different
> address and the code doesn't work anymore. This is really annoying (and a
> really outdated behaviour for a system) but I have to live with it.
Do you check that the mmap succeeds in loading the objects at the desired
address? I would suspect that this is not the case.
Why isn't proper serialization of the containers used? I would bet that it is
still fast enough but safe to use.
Christoph
|
|
From: Bart V. A. <bar...@gm...> - 2008-03-20 13:46:24
|
On Thu, Mar 20, 2008 at 2:40 PM, Oswald, Michael <mic...@si...> wrote:
>
> > ==10251== Invalid read of size 4
> > ==10251== at 0x804EBCA: main (TESTmib.C:127)
> > ==10251== Address 0x40103d48 is not stack'd, malloc'd or (recently) free'd
> > ==10251==
> > ==10251== Jump to the invalid address stated on the next line
> > ==10251== at 0x40103D40: ???
> > ==10251== by 0x694B20F: (below main) (in /lib/tls/libc.so.6)
> > ==10251== Address 0x40103d40 is not stack'd, malloc'd or (recently) free'd
> > Program catch signal 6.
>
> >A signal 6 is an SIGABRT. Are you sure you're not bouncing your head
> >against an internal consistency check ?
>
> Well, normally it crashes with SIGSEGV. This was really the first time where it
> crashed with SIGABRT. In the indicated code, there is no assertion near the
> crash. Maybe because of the invalid jump, the code where it jumped to was
> interpreted as an abort? I really don't know.
Did you already try to add --trace-signals=yes to Valgrind's command
line options ? This should tell you more about the cause of the crash.
Bart.
|
|
From: Oswald, M. <mic...@si...> - 2008-03-20 13:41:19
|
> ==10251== Invalid read of size 4
> ==10251== at 0x804EBCA: main (TESTmib.C:127)
> ==10251== Address 0x40103d48 is not stack'd, malloc'd or (recently) free'd
> ==10251==
> ==10251== Jump to the invalid address stated on the next line
> ==10251== at 0x40103D40: ???
> ==10251== by 0x694B20F: (below main) (in /lib/tls/libc.so.6)
> ==10251== Address 0x40103d40 is not stack'd, malloc'd or (recently) free'd
> Program catch signal 6.
>A signal 6 is an SIGABRT. Are you sure you're not bouncing your head
>against an internal consistency check ?
Well, normally it crashes with SIGSEGV. This was really the first time it crashed with SIGABRT. In the indicated code, there is no assertion near the crash. Maybe because of the invalid jump, the code it jumped to was interpreted as an abort? I really don't know.
Regards,
Michael
|
|
From: Oswald, M. <mic...@si...> - 2008-03-20 13:33:27
|
>Hello Michael,
>If I understood your e-mail correctly, the memory of the process that
>you are analyzing with Valgrind has been initialized by another
>process ?
Yes. First an importer process is run, which generates the objects in the shared memory. The content of the shared memory is then stored into a file. Later, when the system is started and one of its processes needs the object information, it loads the file, mmaps it and then uses the objects like normal C++ code.
>In that case you will have to include the file valgrind.h in your
>program and declare the shared memory segment as initialized (see also
>http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs).
>The InfiniBand people are also doing this (InfiniBand is a networking
>technology that allows one process to write in the memory of a process
>on another server).
Yes, I did this. After the mmap call I declared the whole block with VALGRIND_MALLOCLIKE_BLOCK. And it works for some of the objects (e.g. I can access the TM objects, whereas the crash appears on accessing the TC objects). Unfortunately, the persistent object store requires fixed addresses, so if for some reason the shared libraries are loaded in a different order, they get allocated to a different address and the code doesn't work anymore. This is really annoying (and really outdated behaviour for a system) but I have to live with it.
Regards,
Michael
|
|
From: Bart V. A. <bar...@gm...> - 2008-03-20 13:04:58
|
On Thu, Mar 20, 2008 at 1:11 PM, Oswald, Michael <mic...@si...> wrote:
> - Does anybody have an idea how to get valgrind to work with such a process?
Hello Michael,
If I understood your e-mail correctly, the memory of the process that you are analyzing with Valgrind has been initialized by another process? In that case you will have to include the file valgrind.h in your program and declare the shared memory segment as initialized (see also http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs). The InfiniBand people are also doing this (InfiniBand is a networking technology that allows one process to write in the memory of a process on another server).
Bart.
|
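One way to follow this advice, assuming Memcheck is the tool in use, is sketched below. It maps an image file and marks the mapping as initialised with VALGRIND_MAKE_MEM_DEFINED, which lives in memcheck.h rather than valgrind.h. The helper name map_image is hypothetical, and the no-op fallback macro exists only so that the sketch builds on a machine without the Valgrind headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Use the real client request when the Valgrind headers are installed;
// otherwise fall back to a no-op (assumption: only needed to build the sketch).
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#  endif
#endif
#ifndef VALGRIND_MAKE_MEM_DEFINED
#  define VALGRIND_MAKE_MEM_DEFINED(addr, len) ((void)(addr), (void)(len), 0)
#endif

// Hypothetical helper: map an object-store image read-only and tell Memcheck
// that every byte of the mapping was initialised by another process.
// Returns nullptr on failure; on success stores the mapping length in *len_out.
void* map_image(const char* path, std::size_t* len_out) {
    assert(path != nullptr && len_out != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return nullptr;
    }
    std::size_t len = static_cast<std::size_t>(st.st_size);

    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return nullptr;

    // Has no effect when the program is not running under Valgrind.
    (void)VALGRIND_MAKE_MEM_DEFINED(p, len);
    *len_out = len;
    return p;
}
```

The caller releases the mapping with munmap(p, len). This only addresses definedness reports; it does not change where the mapping or the shared libraries end up in the address space.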
|
From: Igmar P. <mai...@jd...> - 2008-03-20 12:51:25
|
> ==10251== Invalid read of size 4
> ==10251== at 0x804EBCA: main (TESTmib.C:127)
> ==10251== Address 0x40103d48 is not stack'd, malloc'd or (recently) free'd
> ==10251==
> ==10251== Jump to the invalid address stated on the next line
> ==10251== at 0x40103D40: ???
> ==10251== by 0x694B20F: (below main) (in /lib/tls/libc.so.6)
> ==10251== Address 0x40103d40 is not stack'd, malloc'd or (recently) free'd
> Program catch signal 6.
Signal 6 is SIGABRT. Are you sure you're not bouncing your head against an internal consistency check?
Igmar
|
|
From: Oswald, M. <mic...@si...> - 2008-03-20 12:15:40
|
Hello,
I am using valgrind (ver. 3.3.0 on SuSE Linux Enterprise Server 9, gcc 3.3.3) on a large project which uses the POST++ persistent object library. In principle, it imports some data from files and creates a lot of (modified) STL containers of objects in a shared memory segment. The binary image of this segment is then saved and, when a process needs it, loaded and mmapped to a fixed address. The objects and containers can then be accessed normally.
When using valgrind on a process which uses POST (I added some valgrind client requests to tell valgrind about the shared memory), the program crashes when accessing a specific part of the shared memory. It doesn't do this when running the program without valgrind and most of the runs with valgrind are fine too (if they are in another range of the shared memory).
Valgrind reports something like this:
==10251== Invalid read of size 4
==10251== at 0x804EBCA: main (TESTmib.C:127)
==10251== Address 0x40103d48 is not stack'd, malloc'd or (recently) free'd
==10251==
==10251== Jump to the invalid address stated on the next line
==10251== at 0x40103D40: ???
==10251== by 0x694B20F: (below main) (in /lib/tls/libc.so.6)
==10251== Address 0x40103d40 is not stack'd, malloc'd or (recently) free'd
Program catch signal 6.
The reported problematic address (0x40103d48), however, seems to lie in the code segment.
After some research it turned out that I can get the same error with gdb (running the program without valgrind) when the link order of the libraries is changed. This means, for example, that I have to link a program with libPOST libA libB libC and so on in exactly this order, which has to be the same as in the process that generated the binary image. Only with the right link order do the addresses match when the code of the C++ objects in the shared memory is executed.
Now it seems that valgrind, since it provides a slightly different memory model, runs into problems: even when the link order of the libraries is the same, the addresses of some objects may not be the same, and the code of one library (say libB) then jumps into the void.
So a few questions:
- How does valgrind handle mmap calls with MAP_FIXED?
- Does valgrind respect the link order of the libraries when loading these (I would assume this)?
- Does anybody have an idea how to get valgrind to work with such a process?
lg,
Michael
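For readers unfamiliar with the mechanism behind the first question, here is a minimal sketch of a MAP_FIXED mapping. It is not POST++ code: the function name is hypothetical, and unlike the setup described above it first reserves an address range and then maps the file over that range, so that MAP_FIXED cannot silently replace a live mapping. A program that hard-codes the target address has no such protection, which is why anything that changes the address-space layout can matter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: returns true if the file behind `fd` can be mapped
// exactly at an address chosen in advance.
bool maps_at_fixed_address(int fd, std::size_t len) {
    assert(len > 0);

    // Reserve an address range first (assumption: done here only to make the
    // sketch safe to run; the thread's program uses a predetermined address).
    void* want = mmap(nullptr, len, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (want == MAP_FAILED) return false;

    // MAP_FIXED asks the kernel to place the mapping at `want` or fail;
    // it atomically replaces whatever was mapped there before.
    void* got = mmap(want, len, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0);
    if (got == MAP_FAILED) {
        munmap(want, len);
        return false;
    }

    bool same = (got == want);
    munmap(got, len);
    return same;
}
```

Pointers stored inside such an image stay valid only if the image, and any code or vtables its objects refer to, end up at the same addresses as in the process that wrote it.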
|
|
From: Julian S. <js...@ac...> - 2008-03-18 17:26:46
|
> The output debug format is, or at least was until recently, stabs. I don't know
> if valgrind requires dwarf, which is what GCC generates currently.
3.3.X should handle stabs OK, although we are thinking of dumping support for it in 3.4.X, since stabs is pretty ancient now.
J
|
|
From: Paul F. <pa...@fr...> - 2008-03-18 17:21:48
|
Quoting Aravind G S <Ara...@ib...>:
> Hi
>
> I am using valgrind to find the memory leakages of my application. I am
> using Sun compiler to compile the source code in Linux. But I am not able
> to find the line number/file where the leakage happens. It displays only
> the library where the problem occurs. Is there any method to display the
> file name while using Sun compilers?
Hi
Exactly which version of the Sun compilers are you using? The output debug format is, or at least was until recently, stabs. I don't know if valgrind requires dwarf, which is what GCC generates currently.
I suggest that you try either or both of the following:
- try using the latest Sun Studio Express
- try the -xdebugformat=dwarf option
A+
Paul
|
|
From: Paul W. <pa...@bl...> - 2008-03-18 14:37:05
|
On Tue, Mar 18, 2008 at 06:42:47PM +0530, Aravind G S wrote:
> to find the line number/file where the leakage happens. It displays only
> the library where the problem occurs. Is there any method to display the
> file name while using Sun compilers?
This is an obvious question, but I have to ask it - have you made sure to compile the program/libraries with debugging information? This is -g on gcc; I don't know what it would be for Sun's cc.
I could be asking completely the wrong question here; someone will come along shortly to correct me if I am. ;-)
--
Paul
"You stay here. I'll whistle if it's safe to follow me."
"What will you do if it isn't safe?"
"Scream."
-- Terry Pratchett, "Pyramids"
|
|
From: Bart V. A. <bar...@gm...> - 2008-03-18 14:33:30
|
On Tue, Mar 18, 2008 at 2:12 PM, Aravind G S <Ara...@ib...> wrote:
>
> I am using valgrind to find the memory leakages of my application. I am
> using Sun compiler to compile the source code in Linux. But I am not able to
> find the line number/file where the leakage happens. It displays only the
> library where the problem occurs. Is there any method to display the file
> name while using Sun compilers?
Hello Aravind,
Would it be feasible to recompile Valgrind and your application with gcc, or do you need particular Sun Studio features?
Bart.
|
|
From: Bart V. A. <bar...@gm...> - 2008-03-18 07:10:59
|
On Mon, Mar 17, 2008 at 11:10 PM, Susan Margulies <sma...@uc...> wrote:
> You are totally right! I am so used to only new-ing objects that I forgot
> about new[]-ing arrays! I am so sorry to have bothered you!!
>
> Best,
> Susan
Hello Susan,
new[] and delete[] are obsolete C++ features -- it is better to use the class std::vector<> instead.
Bart.
|
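To make the suggestion concrete, here is a small sketch comparing the two styles. The function names are made up for illustration, and nothing here is taken from Susan's code. Note that new[] and delete[] remain part of standard C++; the practical argument for std::vector is that it releases its storage itself, so an array cannot be leaked or freed with the wrong form of delete.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Manual array: every new[] needs a matching delete[].
int sum_with_new(std::size_t n) {
    int* a = new int[n];
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<int>(i);
        total += a[i];
    }
    // Writing `delete a;` here instead would be a mismatched deallocation,
    // and omitting the line entirely would leak the array.
    delete[] a;
    return total;
}

// Same computation with std::vector: no explicit deallocation at all.
int sum_with_vector(std::size_t n) {
    std::vector<int> a(n);
    int total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = static_cast<int>(i);
        total += a[i];
    }
    return total;  // the vector's destructor frees the storage
}
```

Both functions return the sum 0 + 1 + ... + (n - 1); only the ownership of the buffer differs.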
|
From: Julian S. <js...@ac...> - 2008-03-18 01:29:00
|
> We have custom allocators so I used the mempool
> interface as that matches our internals nicely - but I found the
> check_mempool_sane
> calls made things excessively slow so commented them out of alloc and
> free which works
> fine for me. These checks seemed overly paranoid for a well tested
> allocator
> to me and was wondering why they are there all the time (at least after
> my
> integration of the custom allocator is tested and working for us).
I don't know.
> The second issue is the user requests like VALGRIND_CHECK_MEM_IS_DEFINED
> are documented to return the address of the first bad byte - yet
> internally
> they use an unsigned int type, which does not seem correct on 64 bit
> machines.
> I would have expected that to be unsigned long or perhaps void *.
That sounds like a straightforward 32-vs-64-bit bug.
Can you fix both of these in a way that seems right to you, and send along a patch? Also, what version of V was this?
J
|
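The 32-vs-64-bit concern can be illustrated without Valgrind at all. The sketch below is not Valgrind's code; the function names are hypothetical, and it assumes an LP64 platform such as 64-bit Linux, where unsigned int is 32 bits and unsigned long is 64 bits. It shows what happens to an address that is passed through each type.

```cpp
#include <cassert>
#include <cstdint>

// Simulates returning an address through a 32-bit unsigned result:
// on a 64-bit machine the upper half of the address is silently dropped.
std::uintptr_t through_unsigned_int(std::uintptr_t addr) {
    unsigned int narrow = static_cast<unsigned int>(addr);
    return static_cast<std::uintptr_t>(narrow);
}

// Same round trip through unsigned long, which is pointer-sized on LP64
// (assumption: this does not hold on LLP64 platforms such as 64-bit Windows).
std::uintptr_t through_unsigned_long(std::uintptr_t addr) {
    unsigned long wide = static_cast<unsigned long>(addr);
    return static_cast<std::uintptr_t>(wide);
}
```

Under those assumptions, an address such as 0x100001234 comes back as 0x1234 from the first function and unchanged from the second, so a "first bad byte" address above 4 GiB would be misreported by the narrow type.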
|
From: Solomon, B. <be...@ug...> - 2008-03-17 22:29:52
|
I have been trying valgrind in earnest for the first time and have a couple of questions.
We have custom allocators, so I used the mempool interface, as that matches our internals nicely - but I found the check_mempool_sane calls made things excessively slow, so I commented them out of alloc and free, which works fine for me. These checks seemed overly paranoid to me for a well-tested allocator, and I was wondering why they are there all the time (at least after my integration of the custom allocator is tested and working for us).
The second issue is that user requests like VALGRIND_CHECK_MEM_IS_DEFINED are documented to return the address of the first bad byte - yet internally they use an unsigned int type, which does not seem correct on 64-bit machines. I would have expected that to be unsigned long or perhaps void *.
Thanks
Bernie Solomon
|
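For context, this is roughly how a custom allocator is described to Valgrind through the mempool client requests. It is a sketch only: the bump allocator, its Pool type and the pool_* functions are invented for illustration and are not Bernie's allocator. The fallback macros are no-ops that exist only so the sketch builds without the Valgrind headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Use the real client requests when the Valgrind headers are installed
// (memcheck.h also includes valgrind.h); otherwise define no-op stand-ins.
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#  endif
#endif
#ifndef VALGRIND_CREATE_MEMPOOL
#  define VALGRIND_CREATE_MEMPOOL(pool, rzB, is_zeroed) ((void)(pool))
#  define VALGRIND_DESTROY_MEMPOOL(pool) ((void)(pool))
#  define VALGRIND_MEMPOOL_ALLOC(pool, addr, size) ((void)(pool), (void)(addr), (void)(size))
#  define VALGRIND_MEMPOOL_FREE(pool, addr) ((void)(pool), (void)(addr))
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len) ((void)(addr), (void)(len), 0)
#endif

// Hypothetical bump allocator: hands out 16-byte-aligned chunks of one slab.
struct Pool {
    char* base;
    std::size_t cap;
    std::size_t used;
};

Pool pool_create(std::size_t cap) {
    Pool p{static_cast<char*>(std::malloc(cap)), cap, 0};
    assert(p.base != nullptr);
    // The slab is off limits until a chunk is handed out from it.
    (void)VALGRIND_MAKE_MEM_NOACCESS(p.base, cap);
    // The slab address doubles as the pool's identity ("anchor").
    VALGRIND_CREATE_MEMPOOL(p.base, 0, 0);
    return p;
}

void* pool_alloc(Pool& p, std::size_t n) {
    n = (n + 15) & ~static_cast<std::size_t>(15);  // round up to 16 bytes
    if (n > p.cap - p.used) return nullptr;
    void* out = p.base + p.used;
    p.used += n;
    VALGRIND_MEMPOOL_ALLOC(p.base, out, n);  // chunk becomes addressable
    return out;
}

void pool_free(Pool& p, void* ptr) {
    // A bump allocator does not reuse memory; just tell Valgrind the chunk is gone.
    VALGRIND_MEMPOOL_FREE(p.base, ptr);
}

void pool_destroy(Pool& p) {
    VALGRIND_DESTROY_MEMPOOL(p.base);
    std::free(p.base);
    p.base = nullptr;
}
```

Each pool_alloc and pool_free maps onto one client request, which is where any per-call checking inside Valgrind adds its cost.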