|
From: John R. <joh...@cr...> - 2004-03-16 23:15:50

Yes, it's a _big_ server comprising about 1.5 million lines of C++ compiled with debug on. :)

I upped that limit as you said and got the same failure.

So then I upped VALGRIND_HEAPSIZE to 512M, same failure.

Since I was there, I thought I'd see what else I could raise... :)

I also tried upping VALGRIND_MAPSIZE to 512M, same failure.

I also tried upping CLIENT_SIZE_MULTIPLE to 128M, same failure.

By the same failure, I mean the same source code lines in Valgrind failing in the reported stack traces.

You got me interested in estimating the number of symbols in my code. I did nm over the shared libraries it uses (my server is composed of 43 shared libraries, plus itself).

This yielded 264,645 symbols (didn't try to uniq them :).

John Roberts
Credence Systems Corporation

>Subject: Re: [Valgrind-users] memcheck in 2.1.1 gives INTERNAL ERROR
>From: Jeremy Fitzhardinge <je...@go...>
>To: John Roberts <joh...@cr...>
>Cc: Valgrind users <val...@li...>
>Mime-Version: 1.0
>Date: Tue, 16 Mar 2004 14:24:53 -0800
>Content-Transfer-Encoding: 7bit
>X-BigFish: pcvs-47(z60di17eK60eHz98dIQ1432W1805M122eHzzzzz)
>
>On Tue, 2004-03-16 at 13:06, John Roberts wrote:
>> Memcheck in Valgrind 2.1.1 doesn't work on my
>> program, while the 2.1.0 distro did.
>>
>> I'm running the 2.4.21 kernel on Redhat Enterprise
>> Linux 3. I made two "tweaks" to valgrind that might
>> have affected this. :)
>> I upped two values in coregrind/vg_include.h:
>>
>> #define M_PROCMAP_BUF 500000
>> (was 50000)
>>
>> #define VG_N_SEMAPHORES 250
>> (was 50)
>>
>> I upped those values because I ran into these limits
>> in some earlier version of Valgrind.
>>
>> The gory details are appended...
>
>Hm, looks like it's running out of heap. Is that a large number of .so
>files of C++ code compiled with -g?
>
>Try increasing the heap size by changing VALGRIND_HEAPSIZE in vg_main.c
>- try 256M or something.
>
> J
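Since the 264,645 figure above was not de-duplicated, a rough way to get a distinct count is to reduce the nm output to symbol names and run it through sort -u. The following is only a sketch: the function name count_unique_symbols and the paths in the usage comment are hypothetical, and it assumes plain (non-demangled) nm output, where the symbol name is the last field on each line.

```shell
#!/bin/sh
# Count distinct symbol names in nm-style output read from stdin.
# Defined symbols look like "08048000 T name", undefined ones like
# "         U name", so the name is taken as the last field.
# Per-file header lines ("libfoo.so:") and blank lines have fewer
# than two fields and are skipped.
count_unique_symbols() {
    awk 'NF >= 2 { print $NF }' | sort -u | wc -l | tr -d ' '
}

# Hypothetical usage (the library directory is an assumption):
#   nm ./lib/*.so ./.vserver 2>/dev/null | count_unique_symbols
```

A symbol that is defined in one library and referenced as undefined in another is counted once here, which is usually what you want when estimating how many distinct names a symbol reader has to store.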
|
From: John R. <joh...@cr...> - 2004-03-16 23:39:39

>Just for interest's sake, and to confirm the theory, can you strip the
>libraries and try again? If it still happens, then we need to look
>elsewhere.

Well, if I run the optimized (non "-g") version, which is a monolithic executable 38M in size (nm reports 101997 symbols), then memcheck does look like it's working! So you're onto something here.

It might be of interest that the unmodified vg_main.c distro also works on my non-debug, optimized server.

I've appended the "good" trace, in case it's of interest.

thanks,
John Roberts
Credence Systems Corporation

133 mexia(2.4.21-4.0.1.EL):jroberts:server> /export/jroberts/tmp/bin/valgrind --tool=memcheck -v .vserver
==29951== Memcheck, a memory error detector for x86-linux.
==29951== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward.
==29951== Using valgrind-2.1.1, a program supervision framework for x86-linux.
==29951== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward.
==29951== Valgrind library directory: /export/jroberts/tmp/lib/valgrind
==29951== Command line
==29951==    .vserver
==29951== Startup, with flags:
==29951==    --tool=memcheck
==29951==    -v
==29951== Reading syms from /export/jroberts/c/ServerApps/src/server/.vserver (0x8048000)
==29951== Reading syms from /lib/ld-2.3.2.so (0x30000000)
==29951==    object doesn't have any debug info
==29951== Reading syms from /lib/ld-2.3.2.so (0xB0000000)
==29951==    object doesn't have any debug info
==29951== Reading syms from /export/jroberts/tmp/lib/valgrind/vgskin_memcheck.so (0xB728D000)
==29951== Reading syms from /lib/tls/libc-2.3.2.so (0xB74B5000)
==29951==    object doesn't have any debug info
==29951== Reading syms from /lib/libdl-2.3.2.so (0xB75ED000)
==29951==    object doesn't have any debug info
==29951== Reading syms from /export/jroberts/tmp/lib/valgrind/stage2 (0xB8000000)
==29951== Reading suppressions file: /export/jroberts/tmp/lib/valgrind/default.supp
==29951== REDIRECT soname:libc.so.6(__GI___errno_location) to soname:libpthread.so.0(__errno_location)
==29951== REDIRECT soname:libc.so.6(__errno_location) to soname:libpthread.so.0(__errno_location)
==29951== REDIRECT soname:libc.so.6(__GI___h_errno_location) to soname:libpthread.so.0(__h_errno_location)
==29951== REDIRECT soname:libc.so.6(__h_errno_location) to soname:libpthread.so.0(__h_errno_location)
==29951== REDIRECT soname:libc.so.6(__GI___res_state) to soname:libpthread.so.0(__res_state)
==29951== REDIRECT soname:libc.so.6(__res_state) to soname:libpthread.so.0(__res_state)
==29951== REDIRECT soname:libc.so.6(stpcpy) to *vgpreload_memcheck.so*(stpcpy)
==29951== REDIRECT soname:libc.so.6(strnlen) to *vgpreload_memcheck.so*(strnlen)
==29951== REDIRECT soname:ld-linux.so.2(stpcpy) to *vgpreload_memcheck.so*(stpcpy)
==29951== REDIRECT soname:ld-linux.so.2(strchr) to *vgpreload_memcheck.so*(strchr)
==29951==
==29951== Reading syms from /export/jroberts/tmp/lib/valgrind/vg_inject.so (0x30019000)
==29951== Reading syms from /export/jroberts/tmp/lib/valgrind/vgpreload_memcheck.so (0x3001C000)
==29951== TRANSLATE: 0x30011E90 redirected to 0x3001DA00
==29951== Reading syms from /export/jroberts/tmp/lib/valgrind/libpthread.so (0x30022000)
==29951== Reading syms from /lib/libdl-2.3.2.so (0x30064000)
==29951==    object doesn't have any debug info
==29951== Reading syms from /lib/libnsl-2.3.2.so (0x30068000)
==29951==    object doesn't have any debug info
==29951== Reading syms from /lib/tls/libm-2.3.2.so (0x3007E000)
==29951==    object doesn't have any debug info
==29951== Reading syms from /lib/tls/libc-2.3.2.so (0x300A3000)
==29951==    object doesn't have any debug info
==29951== TRANSLATE: 0x3011C6F0 redirected to 0x3001DFFC
==29951== TRANSLATE: 0x3011ADB0 redirected to 0x3001DBC0
==29951== warning: Valgrind's pthread_attr_destroy does nothing
==29951==          your program may misbehave as a result
==29951== warning: Valgrind's pthread_attr_destroy does nothing
==29951==          your program may misbehave as a result
==29951== warning: Valgrind's pthread_attr_destroy does nothing
==29951==          your program may misbehave as a result
Inst file: /ims/cobalt/release/linux/cfg/inst_1_0_Build_17.cfg
Config file: /ims/cobalt/release/linux/cfg/config_1_0_Build_17.cfg
Project file: /ims/cobalt/release/linux/cfg/project_1_0_Build_17.cfg
Defaults file: /ims/cobalt/release/linux/cfg/defaults_1_0_Build_17.cfg
Protocol file: /ims/cobalt/release/linux/cfg/protocol_1_0_Build_17.cfg
Timing: 1 Data: 33 Pmu: 1 Power: 11
==29951== Conditional jump or move depends on uninitialised value(s)
==29951==    at 0x8854AA5: ImsTestStation_Server::setSaveNeeded(bool) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91C87E2: ImsSetupCollection_Server::createItem(ImsSaveableCollectionSelector const&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91C5D79: ImsSetupCollection_Server::ImsSetupCollection_Server(ImsServerDatabase*, Vtr&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x884AF11: ImsTestStation_Server::ImsTestStation_Server(ImsServerDatabase*, Vtr&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==
==29951== Conditional jump or move depends on uninitialised value(s)
==29951==    at 0x8854AA5: ImsTestStation_Server::setSaveNeeded(bool) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91C8827: ImsSetupCollection_Server::createItem(ImsSaveableCollectionSelector const&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91C5D79: ImsSetupCollection_Server::ImsSetupCollection_Server(ImsServerDatabase*, Vtr&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x884AF11: ImsTestStation_Server::ImsTestStation_Server(ImsServerDatabase*, Vtr&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==
==29951== Conditional jump or move depends on uninitialised value(s)
==29951==    at 0x94C5E53: SsiServerState::setServerStateFlags(unsigned long, bool) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x94C2CB4: SsiInterface::setServerStateFlags(unsigned long, bool) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x8854ABF: ImsTestStation_Server::setSaveNeeded(bool) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91C8827: ImsSetupCollection_Server::createItem(ImsSaveableCollectionSelector const&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==
==29951== Conditional jump or move depends on uninitialised value(s)
==29951==    at 0x8854AA5: ImsTestStation_Server::setSaveNeeded(bool) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x8854A88: ImsTestStation_Server::setChanged() (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91E2479: ImsSetupCollection_Server::moveCurrent(ImsSetupTypeEnum, _STL::basic_string<char, _STL::char_traits<char>, _STL::allocator<char> > const&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91186AF: ImsFixture_Server::makeCurrent() (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951== Reading syms from /ims/cobalt/release/linux/subp/Lite/1_0_Build_17/src/libLiteS.so (0x31526000)
Emulator Server: Version Cobalt 1.0 Build 17, "mexia:jroberts" ...Booted...

[then I typed control-C to terminate my server:]

Caught signal 2, SIGINT (interrupt)
==29951==
==29951== ERROR SUMMARY: 13 errors from 4 contexts (suppressed: 21 from 1)
==29951==
==29951== 1 errors in context 1 of 4:
==29951== Conditional jump or move depends on uninitialised value(s)
==29951==    at 0x8854AA5: ImsTestStation_Server::setSaveNeeded(bool) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x8854A88: ImsTestStation_Server::setChanged() (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91E2479: ImsSetupCollection_Server::moveCurrent(ImsSetupTypeEnum, _STL::basic_string<char, _STL::char_traits<char>, _STL::allocator<char> > const&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91186AF: ImsFixture_Server::makeCurrent() (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==
==29951== 4 errors in context 2 of 4:
==29951== Conditional jump or move depends on uninitialised value(s)
==29951==    at 0x94C5E53: SsiServerState::setServerStateFlags(unsigned long, bool) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x94C2CB4: SsiInterface::setServerStateFlags(unsigned long, bool) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x8854ABF: ImsTestStation_Server::setSaveNeeded(bool) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91C8827: ImsSetupCollection_Server::createItem(ImsSaveableCollectionSelector const&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==
==29951== 4 errors in context 3 of 4:
==29951== Conditional jump or move depends on uninitialised value(s)
==29951==    at 0x8854AA5: ImsTestStation_Server::setSaveNeeded(bool) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91C8827: ImsSetupCollection_Server::createItem(ImsSaveableCollectionSelector const&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91C5D79: ImsSetupCollection_Server::ImsSetupCollection_Server(ImsServerDatabase*, Vtr&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x884AF11: ImsTestStation_Server::ImsTestStation_Server(ImsServerDatabase*, Vtr&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==
==29951== 4 errors in context 4 of 4:
==29951== Conditional jump or move depends on uninitialised value(s)
==29951==    at 0x8854AA5: ImsTestStation_Server::setSaveNeeded(bool) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91C87E2: ImsSetupCollection_Server::createItem(ImsSaveableCollectionSelector const&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x91C5D79: ImsSetupCollection_Server::ImsSetupCollection_Server(ImsServerDatabase*, Vtr&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
==29951==    by 0x884AF11: ImsTestStation_Server::ImsTestStation_Server(ImsServerDatabase*, Vtr&) (in /export/jroberts/c/ServerApps/src/server/.vserver)
--29951--
--29951-- supp: 21 Ugly strchr error in /lib/ld-2.3.2.so
==29951==
==29951== IN SUMMARY: 13 errors from 4 contexts (suppressed: 21 from 1)
==29951==
==29951== malloc/free: in use at exit: 8916908 bytes in 3594 blocks.
==29951== malloc/free: 2722823 allocs, 2719229 frees, 115162604 bytes allocated.
==29951==
--29951-- TT/TC: 0 tc sectors discarded.
--29951--        43139 chainings, 2 unchainings.
--29951-- translate: new 91274 (3674776 -> 31269553; ratio 85:10)
--29951--            discard 1 (23 -> 320; ratio 139:10).
--29951-- dispatch: 429500000 jumps (bb entries), of which 91749180 (21%) were unchained.
--29951--           12177/17977640 major/minor sched events. 156039 tt_fast misses.
--29951-- reg-alloc: 16679 t-req-spill, 5304475+122891 orig+spill uis, 434637 total-reg-r.
--29951-- sanity: 11694 cheap, 468 expensive checks.
--29951-- ccalls: 693889 C calls, 50% saves+restores avoided (2048494 bytes)
--29951--         1146214 args, avg 0.78 setup instrs each (481646 bytes)
--29951--         0% clear the stack (2080986 bytes)
--29951--         147690 retvals, 38% of reg-reg movs avoided (110892 bytes)
|
From: John R. <joh...@cr...> - 2004-03-17 17:12:36

I tried this patch, but still get the same failure with my debug server. The stack traces have the same source code files and lines (different hex addresses though).

John R.

>Subject: Re: [Valgrind-users] memcheck in 2.1.1 gives INTERNAL ERROR
>From: Jeremy Fitzhardinge <je...@go...>
>To: Doug Rabson <df...@nl...>
>Cc: val...@li..., John Roberts <joh...@cr...>
>Mime-Version: 1.0
>Date: Wed, 17 Mar 2004 08:19:04 -0800
>X-BigFish: pcvs-72(z60di17eK60eHz98dIQfa7RedcR122eHzzzzz1IQ)
>
>On Wed, 2004-03-17 at 02:16, Doug Rabson wrote:
>> I have a tester with a similar problem (their application is apache + a
>> large number of custom C++ modules). I ended up hacking together a
>> 'replacement' for vg_symtab2.c which moved the symbol table storage and
>> lookups into another process. We are still testing the results but it
>> looks good so far.
>
>Hm, interesting idea.
>
>I put together a much simpler patch which just adds a
>--detailed-types=no CLO to ignore all the extra info in the debug
>output. It will make some error messages a bit less precise, but it
>should use a lot less memory in cases like these.
>
>I think the ultimate fix, at least for Linux, is to implement a proper
>DWARF2 reader which takes advantage of all the incremental loading stuff
>DWARF2 provides. This will minimise memory overhead while still
>providing full detail.
>
> J
|
From: Jeremy F. <je...@go...> - 2004-03-17 18:19:32
Attachments: skip-debug-typeinfo.patch

On Wed, 2004-03-17 at 09:12, John Roberts wrote:
> I tried this patch, but still get the same failure with my
> debug server. The stack traces have the same source code
> files and lines (different hex addresses though).

I'm surprised. When you ran with --detailed-types=no, it still ran out of memory? Did you try increasing the VALGRIND_HEAP as well? The plain symbol table and source+line info shouldn't take that much space. Hm, maybe they do.

Hm, the patch needs a bit more in it. Try this one.

 J
|
From: John R. <joh...@cr...> - 2004-03-17 18:54:43

This patch was unsuccessful as well, sorry. I did try increasing VALGRIND_HEAP to 256M, then 512M, still no joy.

I don't know if there's anything to be learned from comparing the runs of my optimized (non-debug) server and my debug server. I'm attaching both traces/outputs.

Is it of interest that the optimized server run emits some "TRANSLATE 0xXXX redirected to 0xYYY" lines, but the debug server doesn't?

John R.

>Subject: Re: [Valgrind-users] memcheck in 2.1.1 gives INTERNAL ERROR
>From: Jeremy Fitzhardinge <je...@go...>
>To: John Roberts <joh...@cr...>
>Cc: Valgrind users <val...@li...>
>Mime-Version: 1.0
>Date: Wed, 17 Mar 2004 10:19:56 -0800
>X-BigFish: pcvs-72(z60di17eK60eHz98dIQfa7RedcR122eHzzzzz1IQ)
>
>On Wed, 2004-03-17 at 09:12, John Roberts wrote:
>> I tried this patch, but still get the same failure with my
>> debug server. The stack traces have the same source code
>> files and lines (different hex addresses though).
>
>I'm surprised. When you ran with --detailed-types=no, it still ran out
>of memory? Did you try increasing the VALGRIND_HEAP as well? The plain
>symbol table and source+line info shouldn't take that much space. Hm,
>maybe they do.
>
>Hm, the patch needs a bit more in it. Try this one.
>
> J
|
From: Jeremy F. <je...@go...> - 2004-03-16 23:22:52

On Tue, 2004-03-16 at 15:15, John Roberts wrote:
> I upped that limit as you said and got the same failure.
>
> So then I upped VALGRIND_HEAPSIZE to 512M, same failure.
>
> Since I was there, I thought I'd see what else I could raise... :)
>
> I also tried upping VALGRIND_MAPSIZE to 512M, same failure.

That shouldn't matter; that just limits the size of a particular shared library (Valgrind maps in a shared library to read its symbol table, but then unmaps it, so there's only ever one at a time mapped).

> I also tried upping CLIENT_SIZE_MULTIPLE to 128M, same failure.

This just means that the client address space is made to be a multiple of this, mostly to keep things pretty.

> By the same failure, I mean the same source code lines in
> Valgrind failing in the reported stack traces.
>
> You got me interested in estimating the number of symbols
> in my code. I did nm over the shared libraries it uses
> (my server is composed of 43 shared libraries, plus itself).
>
> This yielded 264,645 symbols (didn't try to uniq them :).

Just for interest's sake, and to confirm the theory, can you strip the libraries and try again? If it still happens, then we need to look elsewhere.

 J
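The strip experiment suggested above can be tried without touching the original debug build by stripping copies instead. This is a hedged sketch, not something taken from the thread: the function name strip_debug_copies and the directory names are hypothetical, it assumes GNU binutils' strip (whose --strip-debug option removes debugging sections only), and pointing LD_LIBRARY_PATH at the copies is an assumption about how the server locates its libraries.

```shell
#!/bin/sh
# Copy every .so from a source directory into a scratch directory and
# remove the debug sections from the copies, leaving the originals intact.
strip_debug_copies() {
    src_dir=$1
    dst_dir=$2
    mkdir -p "$dst_dir" || return 1
    for lib in "$src_dir"/*.so; do
        # If the glob matched nothing it stays literal; skip it.
        [ -e "$lib" ] || continue
        cp "$lib" "$dst_dir/" || return 1
        strip --strip-debug "$dst_dir/$(basename "$lib")" || return 1
    done
}

# Hypothetical usage (paths are assumptions):
#   strip_debug_copies ./lib /tmp/stripped-libs
#   LD_LIBRARY_PATH=/tmp/stripped-libs valgrind --tool=memcheck -v ./.vserver
```

If memcheck gets through startup with the stripped copies but not with the originals, that points at the debug information rather than the plain symbol tables.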
|
From: Doug R. <df...@nl...> - 2004-03-17 10:17:21

On Tuesday 16 March 2004 23:21, Jeremy Fitzhardinge wrote:
> On Tue, 2004-03-16 at 15:15, John Roberts wrote:
> > I upped that limit as you said and got the same failure.
> >
> > So then I upped VALGRIND_HEAPSIZE to 512M, same failure.
> >
> > Since I was there, I thought I'd see what else I could raise... :)
> >
> > I also tried upping VALGRIND_MAPSIZE to 512M, same failure.
>
> That shouldn't matter; that just limits the size of a particular
> shared library (Valgrind maps in a shared library to read its symbol
> table, but then unmaps it, so there's only ever one at a time
> mapped).
>
> > I also tried upping CLIENT_SIZE_MULTIPLE to 128M, same failure.
>
> This just means that the client address space is made to be a
> multiple of this, mostly to keep things pretty.
>
> > By the same failure, I mean the same source code lines in
> > Valgrind failing in the reported stack traces.
> >
> > You got me interested in estimating the number of symbols
> > in my code. I did nm over the shared libraries it uses
> > (my server is composed of 43 shared libraries, plus itself).
> >
> > This yielded 264,645 symbols (didn't try to uniq them :).
>
> Just for interest's sake, and to confirm the theory, can you strip
> the libraries and try again? If it still happens, then we need to
> look elsewhere.

I have a tester with a similar problem (their application is apache + a large number of custom C++ modules). I ended up hacking together a 'replacement' for vg_symtab2.c which moved the symbol table storage and lookups into another process. We are still testing the results but it looks good so far.
|
From: Jeremy F. <je...@go...> - 2004-03-17 16:19:12
Attachments: skip-debug-typeinfo.patch

On Wed, 2004-03-17 at 02:16, Doug Rabson wrote:
> I have a tester with a similar problem (their application is apache + a
> large number of custom C++ modules). I ended up hacking together a
> 'replacement' for vg_symtab2.c which moved the symbol table storage and
> lookups into another process. We are still testing the results but it
> looks good so far.

Hm, interesting idea.

I put together a much simpler patch which just adds a --detailed-types=no CLO to ignore all the extra info in the debug output. It will make some error messages a bit less precise, but it should use a lot less memory in cases like these.

I think the ultimate fix, at least for Linux, is to implement a proper DWARF2 reader which takes advantage of all the incremental loading stuff DWARF2 provides. This will minimise memory overhead while still providing full detail.

 J
|
From: Doug R. <df...@nl...> - 2004-03-17 16:44:10

On Wed, 2004-03-17 at 16:19, Jeremy Fitzhardinge wrote:
> On Wed, 2004-03-17 at 02:16, Doug Rabson wrote:
> > I have a tester with a similar problem (their application is apache + a
> > large number of custom C++ modules). I ended up hacking together a
> > 'replacement' for vg_symtab2.c which moved the symbol table storage and
> > lookups into another process. We are still testing the results but it
> > looks good so far.
>
> Hm, interesting idea.
>
> I put together a much simpler patch which just adds a
> --detailed-types=no CLO to ignore all the extra info in the debug
> output. It will make some error messages a bit less precise, but it
> should use a lot less memory in cases like these.

We tried something like that (mainly just stubbing out the stabs type parser) but valgrind still wanted to use a lot of memory. Increasing valgrind's available virtual space wasn't feasible either because the application needed gobs of VM for its own massive shared memory requirements.

> I think the ultimate fix, at least for Linux, is to implement a proper
> DWARF2 reader which takes advantage of all the incremental loading stuff
> DWARF2 provides. This will minimise memory overhead while still
> providing full detail.

This would be good for people with modern toolchains but still doesn't help those who are stuck with ancient stabs toolchains, including all FreeBSD 4.x systems.