|
From: Nicholas N. <nj...@cs...> - 2005-03-02 17:00:29
|
Hi,

I've trawled through Bugzilla, and categorised all our bugs and wishlist
reports. Details are below. The aim was to identify which parts of Valgrind
are causing us grief, based on real user feedback... a bit like an indirect
survey. I've done some brief analysis, but I'm hoping to prompt some
discussion, basically.

An extremely brief summary is:

- The JIT suffers some fundamental problems that are worrying and hard to
  fix, even with Vex
- Our debug info reader sucks
- Suppressions, as implemented, suck, and could be improved

Happy reading.

N

=============================================================================
BUGS
=============================================================================

The status of those marked with a '?' is unclear, eg. they might have been
fixed but the original reporter hasn't confirmed it, or there's a patch but
it's unclear if the patch works/has been committed.

Those marked with a '*' are those that I think are the most important.

-----------------------------------------------------------------------------
JIT shortcomings
-----------------------------------------------------------------------------
* 69511 Valgrind can call wrong function
* 69530 we need to implement precise exception handling
* 69531 Some tools need a mechanism to save machine state before ...
* 81361 Can't distinguish large stack allocations from stack-swit...
  82654 prefetch instructions are ignored
  85756 x86 assembly prefix LOCK to guarantee atomicity has no ef...

-----------------------------------------------------------------------------
JIT bugs
-----------------------------------------------------------------------------
  87263 Assertion `seg_selector >= 6 && seg_selector < (6 + 3)' f...
  88116 "enter" instruction's nested variation not supported
  96542 Assertion vg_assert(sz == 4) failed in vg_to_ucode
  97231 Jump to the invalid address stated on the next line
 100486 memcheck reports "valgrind: the `impossible' happened: V...

-----------------------------------------------------------------------------
debug info reading
-----------------------------------------------------------------------------
 ?78520 bug parsing gstabs+ debug syms for c++ templates
 ?81262 valgrind crashes with "the `impossible' happened" <- don'...
  89914 seg fault analyzing programs compiled with gnu pascal 20...
 ?89973 seg fault parsing stabs debug information
  90901 Valgrind is very-very slow
 ?91633 dereference of null ptr in vgPlain_st_basetype
* 92071 Reading debugging info uses too much memory
 ?95867 get SIGSEGV when running my code under valgrind (compile ...
  96918 Hit maxsyms limit in VG_(get_scope_variables)

-----------------------------------------------------------------------------
resource management/conflicts
-----------------------------------------------------------------------------
  73146 Can't increase file descriptor space (valgrind reserved fds)
* 82301 FV memory layout too rigid
 ?89199 aio_write causes valgrind to hang
* 93818 couldn't allocate address space for shadow memory
* 98278 Infinite recursion possible when allocating memory
 100628 leak-check gets assertion failure when using VALGRIND_MALLOCLIKE...

-----------------------------------------------------------------------------
Massif
-----------------------------------------------------------------------------
  82871 Massif output function names too short
  89061 Massif: ms_main.c:485 (get_XCon): Assertion `xpt->max_chi...
  89928 Massif Aborting with failure get_XCon

-----------------------------------------------------------------------------
misc crashes/asserts
-----------------------------------------------------------------------------
  73133 old versions of glibc can't handle auxv types >=32
  80932 valgrind: vg_memory.c:757 (vgPlain_init_shadow_range): As...
 ?87645 ASSERT `vgPlain_is_valid_tid(vg_tid_last_in_baseBlock)' f...
  88192 dlsym(...,"errno") fails under valgrind on RH-9.0
  91601 INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV)
 ?96559 Received "INTERNAL ERROR: Valgrind received a signal 11 (...

-----------------------------------------------------------------------------
suppressions
-----------------------------------------------------------------------------
  89996 supp file for libguile
  91153 Possible new leak to suppress in glibc-2.?.supp
  91197 Annoying user confirmation required with --gen-suppressio...

-----------------------------------------------------------------------------
memcheck, Cachegrind, Helgrind
-----------------------------------------------------------------------------
  97042 valgrind problems with subl $0x80000000, %reg [Vex]
  90838 debugging aisexec: cannot get simulation results
  79844 Helgrind complains about race condition which does not exist

-----------------------------------------------------------------------------
GDB attaching
-----------------------------------------------------------------------------
  88842 --db-attach=yes problem with --log-file
 ?95896 enter gdb option doesn't work for my stty -raw application

-----------------------------------------------------------------------------
misc
-----------------------------------------------------------------------------
  79759 error reports when running programs copiled with g++ with...
 ?80946 lots of false errors due to incomplete str* intercepts
  85050 Full source path not logged
  88678 Empty stack trace for executables with a space in the path
  90348 vg_mylibc.c:668 (vgPlain_sprintf): Assertion `vgPlain_str...
  92923 lit_to_globvar broken
  96321 make check fails with hardened gcc
  97452 Valgrind doesn't report any pthreads problems
  98444 valgrind fails to run against user-mode-linux (UML) proce...
  98071 valgrind bombs out when sdl is linked
  98290 Valgrind internals need annotating
 100428 compile failure in coregrind vg_pthreadmodel

-----------------------------------------------------------------------------
8 most important, IMHO
-----------------------------------------------------------------------------
* 69511 Valgrind can call wrong function
* 69530 we need to implement precise exception handling
* 69531 Some tools need a mechanism to save machine state before ...
* 81361 Can't distinguish large stack allocations from stack-swit...
* 82301 FV memory layout too rigid
* 92071 Reading debugging info uses too much memory
* 93818 couldn't allocate address space for shadow memory
* 98278 Infinite recursion possible when allocating memory

Comments:
- Of the eight most important, 4 are in the JIT, and they're just hard
  problems. It's not easy to see how to fix these without hurting
  performance really badly. And the other two JIT shortcomings are arguably
  very important too.

  And the other four all have to do with memory layout. Hopefully Julian's
  ideas about 64MB blocks can fix these; that would be great.

- Our debug info reading is kind of sucky, although it's improved a lot
  thanks to Tom and Jeremy's work. But I think it's now ready for an
  overhaul: to be neater, and to support incremental reading, which would
  help with the memory layout issues.

- Massif needs a re-write. It's currently too fragile, the data structures
  are too complex and hard to understand, and it's hard to verify it's doing
  the right thing. It's on my long-term todo list, but I'm not even sure how
  to go about it yet.

- There are happily few bugs involving signals/pthreads/the scheduler.
  Before Jeremy fixed that stuff, there were heaps. So that's great. (We're
  down to about 57 bugs now, whereas at one point we were closer to 90.)

=============================================================================
WISHLIST
=============================================================================

-----------------------------------------------------------------------------
suppressions
-----------------------------------------------------------------------------
  73309 Ignore duplicate reports when prompting for generating su...
* 77922 Honour error suppression call lists longer than 4 functions.
* 79787 Suppresions files should be auto generated
  93376 Suppressions directory

-----------------------------------------------------------------------------
error messages
-----------------------------------------------------------------------------
  75104 Would like option to suppress output of stack addresses
  79311 malloc silly arg warning does not give stack trace
* 79362 Debug info is lost for .so files when they are dlclose'd
  82176 loss records with parameters
  92036 A wish for ANSI Colors in the output of valgrind
  98993 Option to show addresses of recently executed basic blocks

-----------------------------------------------------------------------------
ports
-----------------------------------------------------------------------------
* 75247 x86_64/amd64 support

-----------------------------------------------------------------------------
new tools/greatly extended tools
-----------------------------------------------------------------------------
  75999 Valgrind tool should also include functionality of code c...
  76510 way to measure working set size?
  81917 Request for Feature - maps of variable locations in cache...
  84303 How about a LockCheck tool?
  93498 Request for implementing SIDT instruction
  95261 provide a library to write cachegrind-like tools
  96109 Support for CMP architecture in Valgrind ( cachegrind )

-----------------------------------------------------------------------------
leak checking
-----------------------------------------------------------------------------
  74899 Tracking leaks in real time
  81079 Provide a macro to clear leak list

-----------------------------------------------------------------------------
massif
-----------------------------------------------------------------------------
  89707 alloc-fn appears not to work for C++ class member functions
  92615 Write output from Massif at crash
  95483 massif feature request: include peak allocation in report

-----------------------------------------------------------------------------
misc
-----------------------------------------------------------------------------
  84348 Support linuxthreads_db?
  87000 Macros to start/stop/restart logging
  92336 speedup re-build after few changes
  92456 Tracing the origin of uninitialised memory
  93673 request for memory limits
  97361 Better ~/.valgrindrc ( # comments, multi argument options)

-----------------------------------------------------------------------------
2 most important, IMHO
-----------------------------------------------------------------------------
* 75247 x86_64/amd64 support
* 77922 Honour error suppression call lists longer than 4 functions.
* 79362 Debug info is lost for .so files when they are dlclose'd
* 79787 Suppresions files should be auto generated

Comments:
- AMD64 support is going well, so that's not a problem. (Nb: that report
  has 440 votes associated with it; no other open report has more than 40!)

- Suppressions kind of suck in general: the syntax is too hard to get
  right, not flexible enough; the auto-generation is not convenient enough
  to use; and they only support a stack trace four deep. I plan to address
  the stack depth issue when I change --num-callers to a larger value.
  Fixing the other problems is on my long-term todo list.

- The debug info getting lost is a problem for the leak checker. I think
  the right way to fix this is to record code locations as either source
  locations (eg. file/fn/line) if possible, or as object code locations
  (eg. file/offset). Recording them as locations in memory is no good,
  since they can change over time. But I recall some argument about this in
  the past.

- Massif comes up again. I think people tend to project their wishes for a
  memory-measurement tool onto it, even when what they want isn't very close
  to what Massif currently does.
|
|
From: Jeremy F. <je...@go...> - 2005-03-02 19:10:05
|
Nicholas Nethercote wrote:

> -----------------------------------------------------------------------------
> JIT shortcomings
> -----------------------------------------------------------------------------
> * 69511 Valgrind can call wrong function

I thought of a pretty simple fix for this; see bug.

> * 69530 we need to implement precise exception handling
> * 69531 Some tools need a mechanism to save machine state before ...

I think these are basically the same. Vex has precise exceptions with
memory accesses, which will solve a large part of this problem; I don't
think it has precise (or any) FP exceptions.

> * 81361 Can't distinguish large stack allocations from stack-swit...

I've got a solution for this too. Not really a JIT problem as such.

>   82654 prefetch instructions are ignored
>   85756 x86 assembly prefix LOCK to guarantee atomicity has no ef...

This would be nice to have. Now that we're using native pthreads, people
are going to start trying to use system-scope mutexes, etc, which will
probably break badly for them.

> -----------------------------------------------------------------------------
> JIT bugs
> -----------------------------------------------------------------------------
>   87263 Assertion `seg_selector >= 6 && seg_selector < (6 + 3)' f...

This one is reasonably easy to fix, I think.

>   88116 "enter" instruction's nested variation not supported
>   96542 Assertion vg_assert(sz == 4) failed in vg_to_ucode

As are these.

>   97231 Jump to the invalid address stated on the next line
>  100486 memcheck reports "valgrind: the `impossible' happened: V...

I think these are the same bug, and probably fixed. I suspect they were
actually signal handling bugs.

> -----------------------------------------------------------------------------
> debug info reading
> -----------------------------------------------------------------------------
>  ?78520 bug parsing gstabs+ debug syms for c++ templates
>  ?81262 valgrind crashes with "the `impossible' happened" <- don'...
>   89914 seg fault analyzing programs compiled with gnu pascal 20...
>  ?89973 seg fault parsing stabs debug information
>   90901 Valgrind is very-very slow
>  ?91633 dereference of null ptr in vgPlain_st_basetype
> * 92071 Reading debugging info uses too much memory
>  ?95867 get SIGSEGV when running my code under valgrind (compile ...
>   96918 Hit maxsyms limit in VG_(get_scope_variables)

For 78520, 81262, 89973 and 95867 the basic problem is that the stabs
encoding for C++ types is not well defined, and is basically ambiguous.
There's stuff which simply isn't possible to represent in stabs, but
that doesn't mean the compiler won't try. The related problem is
that stabs parsing is pretty fragile; once you lose track of what's
going on, you pretty much have to give up. Clearly Valgrind needs to be
robust against this junk, but we're not actually going to be able to
close all these bugs (we should do what gdb does, and just silently
ignore bad debug, so the user need never know...).

For stabs, a streaming loader will definitely help cap memory use,
though of course our parsed structures will take the same amount of
space. For DWARF, I think we can get large time and memory savings by
using incremental loading.

> -----------------------------------------------------------------------------
> misc crashes/asserts
> -----------------------------------------------------------------------------
>   73133 old versions of glibc can't handle auxv types >=32
>   80932 valgrind: vg_memory.c:757 (vgPlain_init_shadow_range): As...

This is just a client-stomps-on-Valgrind problem, assuming --pointercheck
was off.

>  ?87645 ASSERT `vgPlain_is_valid_tid(vg_tid_last_in_baseBlock)' f...
>   88192 dlsym(...,"errno") fails under valgrind on RH-9.0
>   91601 INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV)
>  ?96559 Received "INTERNAL ERROR: Valgrind received a signal 11 (...
>
> -----------------------------------------------------------------------------
> suppressions
> -----------------------------------------------------------------------------
>   89996 supp file for libguile
>   91153 Possible new leak to suppress in glibc-2.?.supp
>   91197 Annoying user confirmation required with --gen-suppressio...

Also "93376 <http://bugs.kde.org/show_bug.cgi?id=93376>: Suppressions
directory" seems like a good idea to me.

> -----------------------------------------------------------------------------
> memcheck, Cachegrind, Helgrind
> -----------------------------------------------------------------------------
>   97042 valgrind problems with subl $0x80000000, %reg [Vex]
>   90838 debugging aisexec: cannot get simulation results
>   79844 Helgrind complains about race condition which does not exist
>
> -----------------------------------------------------------------------------
> GDB attaching
> -----------------------------------------------------------------------------
>   88842 --db-attach=yes problem with --log-file
>  ?95896 enter gdb option doesn't work for my stty -raw application

gdb stub support might be the best fix for these.

> -----------------------------------------------------------------------------
> misc
> -----------------------------------------------------------------------------
>   79759 error reports when running programs copiled with g++ with...
>  ?80946 lots of false errors due to incomplete str* intercepts
>   85050 Full source path not logged

This one is simply that the debug info doesn't always include a full
path, I think. It would be pretty messy to print it all the time as
well...

>   88678 Empty stack trace for executables with a space in the path
>   90348 vg_mylibc.c:668 (vgPlain_sprintf): Assertion `vgPlain_str...
>   92923 lit_to_globvar broken
>   96321 make check fails with hardened gcc
>   97452 Valgrind doesn't report any pthreads problems
>   98444 valgrind fails to run against user-mode-linux (UML) proce...

I think this just comes down to bug 81361.

>   98071 valgrind bombs out when sdl is linked
>   98290 Valgrind internals need annotating
>  100428 compile failure in coregrind vg_pthreadmodel
>
> -----------------------------------------------------------------------------
> 8 most important, IMHO
> -----------------------------------------------------------------------------
> * 69511 Valgrind can call wrong function
> * 69530 we need to implement precise exception handling
> * 69531 Some tools need a mechanism to save machine state before ...
> * 81361 Can't distinguish large stack allocations from stack-swit...
> * 82301 FV memory layout too rigid
> * 92071 Reading debugging info uses too much memory
> * 93818 couldn't allocate address space for shadow memory
> * 98278 Infinite recursion possible when allocating memory
>
> Comments:
> - Of the eight most important, 4 are in the JIT, and they're just hard
>   problems. It's not easy to see how to fix these without hurting
>   performance really badly. And the other two JIT shortcomings are
>   arguably very important too.

I added some comments to bugs 69511 and 81361; they both seem tractable.
I'm pretty sure 69530 and 69531 are the same bug (especially now that the
BB has been removed).

> -----------------------------------------------------------------------------
> suppressions
> -----------------------------------------------------------------------------
>   73309 Ignore duplicate reports when prompting for generating su...
> * 77922 Honour error suppression call lists longer than 4 functions.
> * 79787 Suppresions files should be auto generated
>   93376 Suppressions directory

93376 is worth a * I think.

> -----------------------------------------------------------------------------
> error messages
> -----------------------------------------------------------------------------
>   75104 Would like option to suppress output of stack addresses
>   79311 malloc silly arg warning does not give stack trace
> * 79362 Debug info is lost for .so files when they are dlclose'd
>   82176 loss records with parameters
>   92036 A wish for ANSI Colors in the output of valgrind
>   98993 Option to show addresses of recently executed basic blocks

98993 would allow a class of "wild jumping through pointer" bugs to be
easily fixable; at present they're very hard to debug by any means.

> -----------------------------------------------------------------------------
> leak checking
> -----------------------------------------------------------------------------
>   74899 Tracking leaks in real time
>   81079 Provide a macro to clear leak list

81079 is a fairly small extension to the leak checker. It just needs
another bit per allocated heap block.

> -----------------------------------------------------------------------------
> massif
> -----------------------------------------------------------------------------
>   89707 alloc-fn appears not to work for C++ class member functions

This is a more general wishlist item, which is "should be able to use
unmangled names when referring to C++ methods/functions".

>   92615 Write output from Massif at crash
>   95483 massif feature request: include peak allocation in report
>
> -----------------------------------------------------------------------------
> misc
> -----------------------------------------------------------------------------
>   84348 Support linuxthreads_db?

This is related to the other gdb-attach bugs, and is fixable by
implementing the gdb remote protocol (which I have a solid beginning of).

>   87000 Macros to start/stop/restart logging
>   92336 speedup re-build after few changes
>   92456 Tracing the origin of uninitialised memory
>   93673 request for memory limits
>   97361 Better ~/.valgrindrc ( # comments, multi argument options)
>
> -----------------------------------------------------------------------------
> 2 most important, IMHO
> -----------------------------------------------------------------------------
> * 75247 x86_64/amd64 support
> * 77922 Honour error suppression call lists longer than 4 functions.
> * 79362 Debug info is lost for .so files when they are dlclose'd
> * 79787 Suppresions files should be auto generated

(for large values of 2)

Yes, these look good. I think the 3 I mentioned, 93376, 98993 and 81079,
are pretty useful. 98993 (show recent BBs) will require codegen support,
so that's definitely post-merge.

The suppressions directory will become particularly useful as people use
more suppressions; I would imagine that a large project would have its
own suppressions directory (or even one per subsystem), and people would
put things into it/them as required. A lot nicer than editing a single
file, or having lots of individual suppression files listed on the
command line.

Great job Nick. This really helps put everything into perspective.

J
|
|
From: Tom H. <to...@co...> - 2005-03-02 19:54:46
|
In message <422...@go...>
Jeremy Fitzhardinge <je...@go...> wrote:
> Nicholas Nethercote wrote:
>
> > -----------------------------------------------------------------------------
> >
> > debug info reading
> > -----------------------------------------------------------------------------
> >
> > ?78520 bug parsing gstabs+ debug syms for c++ templates
> > ?81262 valgrind crashes with "the `impossible' happened" <- don'...
> > 89914 seg fault analyzing programs compiled with gnu pascal 20...
> > ?89973 seg fault parsing stabs debug information
> > 90901 Valgrind is very-very slow
> > ?91633 dereference of null ptr in vgPlain_st_basetype
> > * 92071 Reading debugging info uses too much memory
> > ?95867 get SIGSEGV when running my code under valgrind (compile ...
> > 96918 Hit maxsyms limit in VG_(get_scope_variables)
>
> For 78520, 81262, 89973 and 95867 the basic problem is that the stabs
> encoding for C++ types is not well defined, and is basically ambiguous.
I'm not sure that there is any ambiguity actually. I went through it
quite carefully and looked at what gdb was doing and more or less
convinced myself that you can always work it out, but you do need to
use different algorithms for each of the three cases.
As of my last change we should hopefully be doing the same as gdb
and all the test cases I had were fixed, but several of the bugs had
no test case. They're still open because none of the submitters came
back and confirmed if my patch had fixed the problem but I did commit
it anyway.
> There's stuff which simply isn't possible to represent in stabs, but
> that doesn't mean the compiler won't try. The correlated problem is
> that stabs parsing is pretty fragile; once you lose track of what's
> going on, you pretty much have to give up. Clearly Valgrind needs to be
> robust against this junk, but we're not actually going to be able to
> close all these bugs (we should do what gdb does, and just silently
> ignore bad debug, so the user need never know...).
Stabs is just horrible basically, so we should do our best and then
give up if necessary. If people want reliable debugging they should
be using DWARF anyway - it's the default from gcc 3 onwards on linux
so we should see less and less stabs I guess.
> For DWARF, I think we can get large time and memory savings by using
> incremental loading.
When everything is a bit more stable I'll probably look at whether
we can switch to using libdwarf which should make it a lot easier
to add the missing functionality to the DWARF reader and also give
us incremental loading I believe.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Jeremy F. <je...@go...> - 2005-03-02 22:52:09
|
Tom Hughes wrote:
>In message <422...@go...>
> Jeremy Fitzhardinge <je...@go...> wrote:
>
>
>
>>Nicholas Nethercote wrote:
>>
>>
>>
>>>-----------------------------------------------------------------------------
>>>
>>>debug info reading
>>>-----------------------------------------------------------------------------
>>>
>>> ?78520 bug parsing gstabs+ debug syms for c++ templates
>>> ?81262 valgrind crashes with "the `impossible' happened" <- don'...
>>> 89914 seg fault analyzing programs compiled with gnu pascal 20...
>>> ?89973 seg fault parsing stabs debug information
>>> 90901 Valgrind is very-very slow
>>> ?91633 dereference of null ptr in vgPlain_st_basetype
>>>* 92071 Reading debugging info uses too much memory
>>> ?95867 get SIGSEGV when running my code under valgrind (compile ...
>>> 96918 Hit maxsyms limit in VG_(get_scope_variables)
>>>
>>>
>>For 78520, 81262, 89973 and 95867 the basic problem is that the stabs
>>encoding for C++ types is not well defined, and is basically ambiguous.
>>
>>
>
>I'm not sure that there is any ambiguity actually. I went through it
>quite carefully and looked at what gdb was doing and more or less
>convinced myself that you can always work it out, but you do need to
>use different algorithms for each of the three cases.
>
>As of my last change we should hopefully be doing the same as gdb
>and all the test cases I had were fixed, but several of the bugs had
>no test case. They're still open because none of the submitters came
>back and confirmed if my patch had fixed the problem but I did commit
>it anyway.
>
>
I don't remember all the details, but I seem to remember that there was
ambiguity around the various meanings of ':', and whether they're
bracketed by '<>' or not.
Plus there's a definite bug in the handling of template classes which
have chars (and maybe strings?) as template parameters: it just plonks
the character literally in the stab info, even if it's something like \0
(ie, it puts a literal 0x0 in the stabs string).
>When everything is a bit more stable I'll probably look at whether
>we can switch to using libdwarf which should make it a lot easier
>to add the missing functionality to the DWARF reader and also give
>us incremental loading I believe.
>
Good.
J
|
|
From: Tom H. <to...@co...> - 2005-03-02 23:51:10
|
In message <422...@go...>
Jeremy Fitzhardinge <je...@go...> wrote:
> Tom Hughes wrote:
>
> >I'm not sure that there is any ambiguity actually. I went through it
> >quite carefully and looked at what gdb was doing and more or less
> >convinced myself that you can always work it out, but you do need to
> >use different algorithms for each of the three cases.
> >
> >As of my last change we should hopefully be doing the same as gdb
> >and all the test cases I had were fixed, but several of the bugs had
> >no test case. They're still open because none of the submitters came
> >back and confirmed if my patch had fixed the problem but I did commit
> >it anyway.
>
> I don't remember all the details, but I seem to remember that there was
> ambiguity around the various meanings of ':', and whether they're
> bracketed by '<>' or not.
That is the main problem, but if you analyse it carefully there are
some cases where :: can only occur inside <> so if you see : at the
top level you know it is the field separator. In other cases it can
occur at the top level but you can never get a colon at the start of
the next field so you know that a double colon doesn't end the field
but a single one does.
This is the commit log entry I wrote when I committed the patch to
try and explain the various cases:
: Try and improve the parsing of C++ stabs that contain :: sequences. This
: patch attempts to follow the same rules that gdb uses and is based on the
: fact that there appear to be three places where :: can appear:
:
: - In the name of an undefined struct/union/enum after an x type
: marker. In this case we follow a simplified version of the old
: rules and only allow :: inside <> characters.
:
: - In a method name. These are mangled so :: will never appear as
: part of the name but will always occur as the terminator. We
: handle this by stopping at the first :: sequence.
:
: - In a symbol/type name. This can include :: but can only be ended
: by a single colon so we simply carry on until we see that.
:
: I suspect this will resolve a number of bugs but I'm still waiting for
: the submitters to confirm exactly which ones it resolves.
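
For anyone following along, the three cases in that log entry can be sketched roughly as below. This is only an illustration of the rules as described, not the code that was committed; the names `NameCtx` and `find_name_end` are made up for the sketch, and the real parser has to deal with much more than the name terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Which of the three contexts from the commit log we are scanning in.
   These names are hypothetical, chosen for this sketch only. */
typedef enum {
    CTX_XREF_NAME,   /* undefined struct/union/enum name after an 'x' marker */
    CTX_METHOD_NAME, /* mangled method name, terminated by "::" */
    CTX_SYMBOL_NAME  /* symbol/type name: may contain "::", ends at a lone ':' */
} NameCtx;

/* Return a pointer to the ':' that terminates the name starting at s
   (for CTX_METHOD_NAME, the first ':' of the "::" pair), or NULL if the
   string ends before any terminator is seen. */
const char *find_name_end(const char *s, NameCtx ctx)
{
    int depth = 0;  /* '<' ... '>' nesting depth, used by CTX_XREF_NAME */

    assert(s != NULL);
    for (const char *p = s; *p != '\0'; p++) {
        switch (ctx) {
        case CTX_XREF_NAME:
            /* A colon only counts when it is outside all <> brackets. */
            if (*p == '<') depth++;
            else if (*p == '>' && depth > 0) depth--;
            else if (*p == ':' && depth == 0) return p;
            break;
        case CTX_METHOD_NAME:
            /* Stop at the first "::" sequence. */
            if (p[0] == ':' && p[1] == ':') return p;
            break;
        case CTX_SYMBOL_NAME:
            /* "::" is part of the name; a single ':' ends it. */
            if (*p == ':') {
                if (p[1] == ':') p++;   /* step over the pair */
                else return p;
            }
            break;
        }
    }
    return NULL;
}
```

So for a symbol name like `ns::Foo:t1` the scan steps over the `::` and stops at the colon before `t1`, while the same input treated as a method name would stop at the `::`.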
> Plus there's a definite bug in the handling of template classes which
> have chars (and maybe strings?) as template parameters: it just plonks
> the character literally in the stab info, even if it's something like \0
> (ie, it puts a literal 0x0 in the stabs string).
That I can believe but I haven't seen that one come up yet...
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-02 23:29:19
|
On Wed, 2 Mar 2005, Tom Hughes wrote:

> Stabs is just horrible basically, so we should do our best and then
> give up if necessary. If people want reliable debugging they should
> be using DWARF anyway - it's the default from gcc 3 onwards on linux
> so we should see less and less stabs I guess.

Doesn't MacOS X use stabs?

N
|
|
From: Tom H. <to...@co...> - 2005-03-02 23:54:37
|
In message <Pin...@ch...>
Nicholas Nethercote <nj...@cs...> wrote:
> On Wed, 2 Mar 2005, Tom Hughes wrote:
>
> > Stabs is just horrible basically, so we should do our best and then
> > give up if necessary. If people want reliable debugging they should
> > be using DWARF anyway - it's the default from gcc 3 onwards on linux
> > so we should see less and less stabs I guess.
>
> Doesn't MacOS X use stabs?
Oh it may be an issue on other systems, but we'll just have to do
the best that we can.
The main problem is that there was never a formal specification for
stabs in the first place and over the years each compiler vendor has
extended the undocumented format with undocumented extensions as
new language features have appeared.
The best documentation that I'm aware of is the reverse engineered
attempt at documenting all the various extensions at:
http://sources.redhat.com/cygwin/stabs.html
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-02 23:28:58
|
On Wed, 2 Mar 2005, Jeremy Fitzhardinge wrote:

> This one is reasonably easy to fix, I think.
>
>> 88116 "enter" instruction's nested variation not supported
>> 96542 Assertion vg_assert(sz == 4) failed in vg_to_ucode
>
> As are these.

I don't think 88116 is -- the nested variations are really awful. But
they're also incredibly rare, thankfully.

>> 80932 valgrind: vg_memory.c:757 (vgPlain_init_shadow_range): As...
>
> This is just a client-stomps-on-Valgrind problem, assuming --pointercheck was
> off.

But we don't yet know for sure that --pointercheck was off. Maybe
something weird is happening.

N
|
|
From: Jeremy F. <je...@go...> - 2005-03-03 00:00:26
|
Nicholas Nethercote wrote:
> I don't think 88116 is -- the nested variations are really awful. But
> they're also incredibly rare, thankfully.
Yeesh. I'd never looked at ENTER before. Is it ever used?
J
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-03 00:01:17
|
On Wed, 2 Mar 2005, Jeremy Fitzhardinge wrote:
>> I don't think 88116 is -- the nested variations are really awful. But
>> they're also incredibly rare, thankfully.
>
> Yeesh. I'd never looked at ENTER before. Is it ever used?
Well, we've received one complaint in 3 years...
N
|
|
From: Jeremy F. <je...@go...> - 2005-03-03 00:07:44
|
Nicholas Nethercote wrote:
>>
>> Yeesh. I'd never looked at ENTER before. Is it ever used?
>
>
> Well, we've received one complaint in 3 years...
What was the code which caused the problem?
I guess the immediate fix is to SIGILL it rather than assert.
J
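The "SIGILL it rather than assert" idea can be sketched as a decoder that reports an unsupported form to its caller instead of aborting. This is only an illustration: the names DecodeStatus and decode_enter are hypothetical and are not Valgrind's real decoder interface. The one hard fact relied on is the x86 encoding of ENTER (opcode C8, a 16-bit frame size, then an 8-bit nesting level of which the low 5 bits are used).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decode outcomes; not Valgrind's real types. */
typedef enum {
    DECODE_OK,           /* instruction handled */
    DECODE_UNSUPPORTED,  /* caller should deliver SIGILL to the client */
    DECODE_NOT_ENTER     /* some other instruction, or truncated input */
} DecodeStatus;

/* x86 ENTER is encoded as C8 iw ib: a 16-bit frame size followed by an
   8-bit nesting level, of which the CPU uses the low 5 bits. */
DecodeStatus decode_enter(const uint8_t *code, size_t len,
                          uint16_t *frame_size, uint8_t *nesting)
{
    if (len < 4 || code[0] != 0xC8)
        return DECODE_NOT_ENTER;

    *frame_size = (uint16_t)(code[1] | (code[2] << 8));
    *nesting    = code[3] & 0x1F;

    /* Only the common level-0 form is handled here; anything else is
       reported to the caller instead of tripping an assertion. */
    return (*nesting == 0) ? DECODE_OK : DECODE_UNSUPPORTED;
}
```

The caller decides what DECODE_UNSUPPORTED means; turning it into a SIGILL for the client keeps one rare instruction from taking the whole tool down.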
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-03 00:14:25
|
On Wed, 2 Mar 2005, Jeremy Fitzhardinge wrote:
> What was the code which caused the problem?
Can't remember.
> I guess the immediate fix is to SIGILL it rather than assert.
yeah, should do.
N
|
|
From: Julian S. <js...@ac...> - 2005-03-03 03:57:34
|
Very useful summary.
> * 69511 Valgrind can call wrong function
As Jeremy says, this is probably fixable by adding CRC checks for
selected translations obtained from near the stack pointer or from
other known-volatile code areas.
> * 69530 we need to implement precise exception handling
> * 69531 Some tools need a mechanism to save machine state before ...
Vex can (at a price) provide precise mem exceptions, but not any
kind of FP exception support. My question is, is there any sizeable
user group writing programs that actually need precise exceptions?
> * 81361 Can't distinguish large stack allocations from stack-swit...
Do we care about this? Is writing-your-own-thread-package
regarded as a sensible thing to do? In any case there's not
much we can do about this without the client telling us when
stack switches are happening.
> * 82301 FV memory layout too rigid
Will be fixed in the 3.0 series.
> * 92071 Reading debugging info uses too much memory
Hmm. Dunno. Possibly it is time for a cleanup of it. It's a good
candidate as a cleanly-defined subsystem which we can extract
from the coregrind/ swamp.
> * 93818 couldn't allocate address space for shadow memory
> * 98278 Infinite recursion possible when allocating memory
Also will be fixed in 3.0 -- both sound like low-level mem
management problems.
> - The debug info getting lost is a problem for the leak checker. I think
> the right way to fix this is to record code locations as either source
> locations (eg. file/fn/line) if possible, or as object code locations
> (eg. file/offset). Recording them as locations in memory is no good, since
> they can change over time. But I recall some argument about this in the
> past.
Yes .. recording them as file/fn/line locations might work, but it
seems expensive in that basically every stack snapshot has to be
converted right away into file/fn/line info. And a lot of
such snapshots get made (once per malloc for example). Also, the
error-commoning mechanism works by comparing stack snapshots, and
that can really get hammered. There just doesn't seem to be any
easy solution. Perhaps the best one is the idiot-solution which
is essentially to ignore requests to munmap executable areas
so their symbol tables never go away. Of course that has its
own dangers.
> - Massif comes up again. I think people tend to project their wishes for a
> memory-measurement tool onto it, even when what they want isn't very
> close to what Massif currently does.
I guess it might help to find some Real Live Massif Users and see
what they think. I agree that a lot of wishlist stuff for Massif
seems to derive from armchair users of it.
J
|
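The CRC idea for bug 69511 in the message above can be sketched roughly as follows: take a checksum of the client code bytes when a translation is made, and recompute it before the translation is reused. The Translation struct and both helper names are assumptions made up for this illustration, not Valgrind's actual translation-table code; the checksum itself is the ordinary bitwise CRC-32 (reflected polynomial 0xEDB88320).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t crc32_bytes(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical record kept for a translation of volatile code. */
typedef struct {
    const uint8_t *guest_code;  /* original client code bytes */
    size_t         len;         /* number of bytes translated */
    uint32_t       crc;         /* checksum taken at translation time */
} Translation;

/* Remember what the client code looked like when it was translated. */
Translation make_translation(const uint8_t *code, size_t len)
{
    Translation t = { code, len, crc32_bytes(code, len) };
    return t;
}

/* True if the client code is unchanged since translation, so the cached
   translation may be run; false means it must be discarded and redone. */
bool translation_still_valid(const Translation *t)
{
    return crc32_bytes(t->guest_code, t->len) == t->crc;
}
```

Restricting the check to translations taken from near the stack pointer or other known-volatile areas, as suggested above, keeps the cost off the common path.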
|
From: Jeremy F. <je...@go...> - 2005-03-03 07:34:45
|
Julian Seward wrote:
>Vex can (at a price) provide precise mem exceptions, but not any
>kind of FP exception support. My question is, is there any sizeable
>user group writing programs that actually need precise exceptions?
>
>
Yep, they're not uncommon. All the virtual machines use page protection
tricks, as do garbage collectors.
>Yes .. recording them as file/fn/line locations might work, but it
>seems expensive in that basically every stack snapshot has to be
>converted right away into file/fn/line info. And a lot of
>such snapshots get made (once per malloc for example). Also, the
>error-commoning mechanism works by comparing stack snapshots, and
>that can really get hammered. There just doesn't seem to be any
>easy solution. Perhaps the best one is the idiot-solution which
>is essentially to ignore requests to munmap executable areas
>so their symbol tables never go away. Of course that has its
>own dangers.
>
>
What would happen if you deferred the addr->symbol resolution until the
unload actually happens? When the .so is unloaded, you know that those
ExeContexts are essentially static (ie, you don't need to compare
against them, because nothing should match). And if the .so isn't
unloaded, then there's no need to do anything.
J
|
|
From: Robert W. <rj...@du...> - 2005-03-03 06:14:37
|
> > * 81361 Can't distinguish large stack allocations from stack-swit...
>
> Do we care about this? Is writing-your-own-thread-package
> regarded as a sensible thing to do?
Oh yeah. For example, we play all sorts of tricks in our OpenMP
implementation to get higher performance, including mixing pthreads and
a home-grown light-weight threads package. We'd be happy to augment
everything and anything so that our customers can Valgrind their OpenMP
applications without spurious errors.
> In any case there's not
> much we can do about this without the client telling us when
> stack switches are happening.
Yup - that's fine by us.
Regards,
Robert.
--
Robert Walsh
Amalgamated Durables, Inc. - "We don't make the things you buy."
Email: rj...@du...
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-03 16:13:06
|
On Wed, 2 Mar 2005, Robert Walsh wrote:
>>> * 81361 Can't distinguish large stack allocations from stack-swit...
>>
>> Do we care about this? Is writing-your-own-thread-package
>> regarded as a sensible thing to do?
>
> Oh yeah. For example, we play all sorts of tricks in our OpenMP
> implementation to get higher performance, including mixing pthreads and
> a home-grown light-weight threads package. We'd be happy to augment
> everything and anything so that our customers can Valgrind their OpenMP
> applications without spurious errors.
Arguably the bigger problem is when Joe Programmer allocates a 2MB array
on the stack and Memcheck gives him a zillion invalid read/write errors
because it thinks he switched stacks. I think we should slant things in
favour of him, rather than the person using stack-switching -- they
presumably know what they're doing, so making them use a client request
seems not unreasonable.
So I'd suggest implementing the client request as Jeremy says, and then
tweaking the heuristic so that the %esp-delta has to be substantially
bigger (say, 8MB) before Memcheck assumes it's a stack-switch. And add a
FAQ about it.
N
|
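The heuristic tweak proposed above amounts to comparing the size of a stack-pointer change against a threshold. A minimal sketch, assuming the 8MB figure floated in the message; the function and macro names are hypothetical and this is not Memcheck's actual code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative threshold, taken from the 8MB figure suggested above. */
#define STACK_SWITCH_THRESHOLD ((uintptr_t)8 * 1024 * 1024)

/* Returns true if a stack-pointer change is large enough to be treated as
   a switch to a different stack rather than an ordinary allocation. */
bool looks_like_stack_switch(uintptr_t old_sp, uintptr_t new_sp)
{
    uintptr_t delta = (old_sp > new_sp) ? old_sp - new_sp : new_sp - old_sp;
    return delta >= STACK_SWITCH_THRESHOLD;
}
```

With this threshold a 2MB on-stack array is treated as a normal allocation, while a jump of 16MB is still classified as a switch. Any fixed number stays a guess, which is why the client request is suggested alongside it.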
|
From: Julian S. <js...@ac...> - 2005-03-04 15:31:40
|
> Arguably the bigger problem is when Joe Programmer allocates a 2MB array
> on the stack and Memcheck gives him a zillion invalid read/write errors
> because it thinks he switched stacks. I think we should slant things in
> favour of him, rather than the person using stack-switching -- they
> presumably know what they're doing, so making them use a client request
> seems not unreasonable.
>
> So I'd suggest implementing the client request as Jeremy says, and then
> tweaking the heuristic so that the %esp-delta has to be substantially
> bigger (say, 8MB) before Memcheck assumes it's a stack-switch. And add a
> FAQ about it.
Yes, I agree. But (1) not for 2.4.0, and (2) how does this client
request work? The client request needs to happen atomically with
the new assignment to the stack pointer; I don't see how that can
happen.
J
|
|
From: Tom H. <to...@co...> - 2005-03-04 15:43:52
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
>> So I'd suggest implementing the client request as Jeremy says, and then
>> tweaking the heuristic so that the %esp-delta has to be substantially
>> bigger (say, 8MB) before Memcheck assumes it's a stack-switch. And add a
>> FAQ about it.
>
> Yes, I agree. But (1) not for 2.4.0, and (2) how does this client
> request work? The client request needs to happen atomically with
> the new assignment to the stack pointer; I don't see how that can
> happen.
I thought Jeremy's suggestion was to have a request that would
declare an area of memory as a stack, so that when the stack
pointer moved from one "stack area" to another, valgrind would
know it was a stack switch.
Obviously valgrind would have to mark the areas used by cloned
threads as stacks, as well as the initial stack.
I have no idea if that is all workable, but it is what I thought
had been suggested anyway, and it doesn't seem to require an atomic
request-and-update-stack-pointer operation.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Jeremy F. <je...@go...> - 2005-03-04 17:08:18
|
Tom Hughes wrote:
>Obviously valgrind would have to mark the areas used by cloned
>threads as stacks, as well as the initial stack.
>
>
Maybe, though possibly not. We don't have any trouble with those stacks
at the moment. There are quite a few details to be filled out.
J
|
|
From: Jeremy F. <je...@go...> - 2005-03-04 17:07:03
|
Julian Seward wrote:
>Yes, I agree. But (1) not for 2.4.0, and (2) how does this client
>request work? The client request needs to happen atomically with
>the new assignment to the stack pointer; I don't see how that can
>happen.
>
No, I was proposing a call which you'd use at stack-allocation time, to
tell Valgrind that this is a stack, and it is distinct from all other
stacks. This would allow update_unknown_ESP (or whatever it's called)
to know whether the ESP changed from one stack area to another. It also
has the advantage of not needing to instrument every place in which the
stack switch can happen.
J
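The registration scheme described above can be sketched as a small table of stack ranges plus a check on whether the stack pointer moved from one registered range to another. Everything here (StackRange, register_stack, find_stack, is_stack_switch, the fixed table size) is a hypothetical illustration of the proposal, not an existing Valgrind client request.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STACKS 16

/* One registered stack, covering addresses [lo, hi). A real version would
   also have to decide how to treat a stack pointer equal to hi. */
typedef struct { uintptr_t lo, hi; } StackRange;

static StackRange stacks[MAX_STACKS];
static size_t n_stacks = 0;

/* Hypothetical registration call, made once at stack-allocation time.
   Returns the stack's index, or -1 if the table is full or range empty. */
int register_stack(uintptr_t lo, uintptr_t hi)
{
    if (n_stacks == MAX_STACKS || lo >= hi)
        return -1;
    stacks[n_stacks].lo = lo;
    stacks[n_stacks].hi = hi;
    return (int)n_stacks++;
}

/* Index of the registered stack containing sp, or -1 if none does. */
int find_stack(uintptr_t sp)
{
    for (size_t i = 0; i < n_stacks; i++)
        if (sp >= stacks[i].lo && sp < stacks[i].hi)
            return (int)i;
    return -1;
}

/* A switch is a move between two different registered stacks. */
bool is_stack_switch(uintptr_t old_sp, uintptr_t new_sp)
{
    int from = find_stack(old_sp);
    int to   = find_stack(new_sp);
    return from >= 0 && to >= 0 && from != to;
}
```

Because registration happens when the stack is created, nothing has to be atomic with the stack-pointer assignment, and the places where switches occur need no instrumentation. Moves to or from unregistered memory are not classified as switches in this sketch and would still fall back to a size heuristic.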
|
|
From: Julian S. <js...@ac...> - 2005-03-03 12:38:24
|
> >Vex can (at a price) provide precise mem exceptions, but not any
> >kind of FP exception support. My question is, is there any sizeable
> >user group writing programs that actually need precise exceptions?
>
> Yep, they're not uncommon. All the virtual machines use page protection
> tricks, as do garbage collectors.
Fair enough. Well, we can have precise exns off by default but have
a flag --precise-mem-exns=yes for those that need it.
> What would happen if you deferred the addr->symbol resolution until the
> unload actually happens? When the .so is unloaded, you know that those
> ExeContexts are essentially static (ie, you don't need to compare
> against them, because nothing should match). And if the .so isn't
> unloaded, then there's no need to do anything.
So execontext becomes a union type, starting out as an array of
addresses but being converted into a human readable source location.
Hmm. Maybe. Depends what operations we need to do on them
after the conversion step.
J
|
|
From: Jeremy F. <je...@go...> - 2005-03-03 20:58:25
|
Julian Seward wrote:
>Fair enough. Well, we can have precise exns off by default but have
>a flag --precise-mem-exns=yes for those that need it.
>
>
How about a client request too, so that programs can request it if they
need it?
>So execontext becomes a union type, starting out as an array of
>addresses but being converted into a human readable source location.
>
>Hmm. Maybe. Depends what operations we need to do on them
>after the conversion step.
>
>
The conversion would need to be per-frame rather than per ExeContext.
If you have an ExeContext of someone calling a .so, and that .so calling
something else, and then you unload the .so, you would only need to
convert the .so's frames. It is possible the ExeContext could still be
used for matching leak check records (which may only use the top 2-4
frames).
J
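The per-frame conversion described above can be sketched as a frame that is either a raw address or a frozen text description, with the conversion applied at unload time only to frames inside the departing object. The types and functions below (Frame, Context, freeze_frames, frames_equal) are invented for illustration and do not reflect the real ExeContext layout. The frozen form is object name plus offset rather than file/fn/line, to keep the sketch free of any debug-info reader.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_FRAMES 8

/* A frame is either a live code address or a frozen description. */
typedef struct {
    bool      resolved;
    uintptr_t addr;      /* meaningful while !resolved */
    char      text[64];  /* meaningful once resolved   */
} Frame;

typedef struct {
    size_t n_frames;
    Frame  frames[MAX_FRAMES];
} Context;

/* Called when the object mapped at [lo, hi) is about to be unloaded:
   only frames inside that object are converted. Returns how many. */
size_t freeze_frames(Context *cx, const char *obj_name,
                     uintptr_t lo, uintptr_t hi)
{
    size_t converted = 0;
    for (size_t i = 0; i < cx->n_frames; i++) {
        Frame *f = &cx->frames[i];
        if (f->resolved || f->addr < lo || f->addr >= hi)
            continue;
        snprintf(f->text, sizeof f->text, "%s+0x%lx",
                 obj_name, (unsigned long)(f->addr - lo));
        f->resolved = true;
        converted++;
    }
    return converted;
}

/* Frames still compare cheaply by address until they are frozen; a frozen
   frame never matches a live one. */
bool frames_equal(const Frame *a, const Frame *b)
{
    if (a->resolved != b->resolved)
        return false;
    return a->resolved ? strcmp(a->text, b->text) == 0
                       : a->addr == b->addr;
}
```

Frames outside the unloaded object keep their addresses, so the leading frames of a context can still be matched against other records as before.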
|
|
From: Julian S. <js...@ac...> - 2005-03-04 00:03:15
|
> The conversion would need to be per-frame rather than per ExeContext.
> If you have an ExeContext of someone calling a .so, and that .so calling
> something else, and then you unload the .so, you would only need to
> convert the .so's frames. It is possible the ExeContext could still be
> used for matching leak check records (which may only use the top 2-4
> frames).
Urrrr. True. Getting more complicated by the moment. I'm not
going to think about this any more just at the mo.
J
|
|
From: Bryan O'S. <bo...@se...> - 2005-03-03 16:56:55
|
On Thu, 2005-03-03 at 10:12 -0600, Nicholas Nethercote wrote:
> Arguably the bigger problem is when Joe Programmer allocates a 2MB array
> on the stack and Memcheck gives him a zillion invalid read/write errors
> because it thinks he switched stacks.
This would affect most Fortran programs, for example, since many Fortran
implementations allocate arrays on the stack.
<b
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-03 17:38:54
|
On Thu, 3 Mar 2005, Bryan O'Sullivan wrote:
>> Arguably the bigger problem is when Joe Programmer allocates a 2MB array
>> on the stack and Memcheck gives him a zillion invalid read/write errors
>> because it thinks he switched stacks.
>
> This would affect most Fortran programs, for example, since many Fortran
> implementations allocate arrays on the stack.
I'm not sure what you're saying -- I described the current situation.
Are you saying my suggestion of tweaking the heuristic is bad?
N
|