|
From: Nicholas N. <nj...@ca...> - 2004-04-13 10:09:36
|
Hi,
I've been thinking about improving the suppression syntax. It really is
pretty awful at the moment, so many people don't understand it, even with
--gen-suppressions.
I've worked out pretty clearly the reasons why people find it confusing.
But I haven't come up with a new syntax that fixes all the problems; I
have some ideas, but they're not quite right yet. I thought I'd run it
past some other eyes.
Note that people have also requested better ways of generated the
suppressions, eg. dumping all --gen-suppressions output to a file to save
all the cutting and pasting; probably a good idea, but I'm not addressing
that here.
Comments appreciated.
N
The suppression syntax is generally confusing -- people just can't get
it right. Ideally, you should be able to write your own suppression
without reading the manual (nobody does) just by looking at existing
examples.
Specific problems with the current format:
- it's unclear that different lines have different roles:
- line 1: name
- line 2: tool:type
- lines 3+: stack trace
- it's unclear that the suppression name is arbitrary (the common a/b/c
form in the default .supp files might make people think the slashes
are meaningful?)
- it's unclear that you need "core" in front of the type for PThread errors
- supp types unclear, in two ways:
- unclear what type each error has, easy to mix up (eg. using
Value4 instead of Addr4)
- people sometimes get them completely wrong -- have no idea what
the type even means
- it's unclear that the trace must match exactly, ie. line-by-line
- sometimes a line is missed, esp. the first one
- sometimes people try to match multiple similar errors with one set of
lines that partially match
- sometimes we have an almost meaningless "fun:*" as the last line,
indicates likely confusion
- one person seemed to think that each line gave a different
suppression, ie. you could specify suppress all errors of a
single type in a single {} block
- it's unclear that obj: matches executables as well as .so files
- it's possibly unclear that you can use either the fun: or obj: form
to match a line like this:
==3643== by 0x804835A: main (in /auto/homes/njn25/grind/head5/a.out)
- The extra info for Param errors in Memcheck is a nasty kludge;
inconsistent and thus highly confusing
- unclear that C++ names must be mangled
[hard to address in syntax...]
Desired functionality:
- need much greater depth than 4 -- have seen examples where the
distinguishing part of the call chain was 11--14 deep. Should
support arbitrary length.
- want a wildcard to match a single entry, and one for multiple
entries. Shouldn't be easily confusible with * and ? as wildcards
within entries. (ie. clearly distinguish inter-entry and intra-entry
wildcards).
- sometimes want to match against "???:???" (could do with a
single-entry wildcard)
- sometimes want to match against a .c source file name?
- possibilities:
at 0x80483BF: really (malloc1.c:20)
at 0x80483BF: really (in /foo/a.out)
at 0x80483BF: (within /foo/a.out)
func:really should match the first two?
file:malloc1.c should match the first one?
obj:/foo/a.out should match the last two?
[should we really distinguish between "in" and "within"?]
- want to be able to partially specify call traces, using wildcards,
eg:
bad_func() ... -> ... malloc()
purify style:
# suppress abr GetNextFontList; _XmCvtStringToXmFontList; XtDirectConvert
But do wildcards match a single entry, or multiple? If multiple,
greedy or non-greedy matching?
- could be more flexible with whitespace
-----------------------------------------------------------------------------
possible new syntax
-----------------------------------------------------------------------------
Memcheck,Addrcheck:Addr4("this is the name of it")[optional part?]
[
fun:blahblah,
fil:foobar,
..., multi-matching wildcard?
obj:foo*,
_, single-matching wildcard?
obj:floo
]
- name as string gives a clue that it's arbitrary
- commas after trace lines indicate they're part of a sequence, as does
grouping them together within the brackets, as does (maybe) the
use of square brackets
- different roles of different parts much clearer
- ... gives more flexibility; the different meanings of "..." vs. '*'/'?'
wildcards should be clear XXX: really?
- lots of the complaints above not addressed, however
|
|
From: Madhu M K. <mm...@ya...> - 2004-04-13 19:30:16
|
Hi, Nicholas Nethercote <nj...@ca...> said on April 13,2004: > > suppressions, eg. dumping all --gen-suppressions output to a file to > save all the cutting and pasting; probably a good idea, but I'm not > addressing that here. > I was just about to resend my patch to do precisely that. It does the following: - create a suppfile / pid - determines frequency counts for suppressions seen - orders them This enables you to tackle the errors that occur once or twice and show up in suppressions, which most often I have found to be *real* errors. In some of our internal tests, valgrind with this patch produced a suppression file that had ~ 100 rules which then went on to suppress ~ 300K errors. And all of that was via openssl - Value4/Param "errors". Cheerio, M Madhu M Kurup /* Nemo Me Impune Lacessit */ mmk at yahoo-inc dt com |
|
From: Jeremy F. <je...@go...> - 2004-04-14 05:32:49
|
On Tue, 2004-04-13 at 20:09, Nicholas Nethercote wrote:
> -----------------------------------------------------------------------------
> possible new syntax
> -----------------------------------------------------------------------------
>
> Memcheck,Addrcheck:Addr4("this is the name of it")[optional part?]
> [
> fun:blahblah,
> fil:foobar,
> ..., multi-matching wildcard?
> obj:foo*,
> _, single-matching wildcard?
> obj:floo
> ]
>
> - name as string gives a clue that it's arbitrary
> - commas after trace lines indicate they're part of a sequence, as does
> grouping them together within the brackets, as does (maybe) the
> use of square brackets
> - different roles of different parts much clearer
> - ... gives more flexibility; the different meanings of "..." vs. '*'/'?'
> wildcards should be clear XXX: really?
I think _ and ... are reasonably clear as entry-matching operators,
while * and ? work within entries. It could also be useful to match
multiple entries with a |-like syntax:
fun:malloc,
fun:_myalloc_1 | fun:another_alloc | src:blah.c,
fun:common
(This suggests that we could also allow &:
fun:foobar & src:myfoo.c, // not the other foobar()
)
Also, I would prefer to avoid over-abbreviating things. Use "file:" and
"func:" rather than fil: and fun:, since they're a bit unclear. And
perhaps src: (and obj:) rather than file:, since file: is a bit generic
when we have two interesting file types.
Another nit: use {} for grouping, since that's what the languages do.
Other comments:
* Some way of enabling/disabling a suppression while keeping it in
the file and not commenting it out. This would make it a bit
easier for tools/programs to manage the file contents
* allow comments everywhere
* allow a more detailed description about the suppression, so it
can be shown in a GUI
* matching against the soname of a .so file, rather than just the
file name. The soname is canonical.
* Have a consistent quoting scheme to allow quoting anywhere
(otherwise how would we cope with someone with ',' in their
filenames - strange, but possible); also needed as symbol names
get more complex (ie, unmangled c++)
* Need to work out how to address mangling. Since there are
multiple mangling schemes, we would either need to specify which
one we want (ugly, compliler-dependent, but what we have now in
effect, only manual), or generate matching rules for all mangled
forms of the given c++ name. I guess some syntax like
"mangle:int foo(void)" would do it.
* How flexible should we be with pathname matching? Do we match
the whole pathname, or does "foo.c" mean */foo.c? i.e, should
we pay attention to whether the specified filename contains '/'
or not?
J
|
|
From: Nicholas N. <nj...@ca...> - 2004-04-14 14:43:38
|
On Wed, 14 Apr 2004, Jeremy Fitzhardinge wrote:
> I think _ and ... are reasonably clear as entry-matching operators,
Unfortunately, if you don't look at the manual, you won't know what they
are from context (unlike * and ? which people will likely be able to
guess... not that anyone ever uses ?). Maybe that's unavoidable.
> while * and ? work within entries. It could also be useful to match
> multiple entries with a |-like syntax:
>
> fun:malloc,
> fun:_myalloc_1 | fun:another_alloc | src:blah.c,
> fun:common
>
> (This suggests that we could also allow &:
>
> fun:foobar & src:myfoo.c, // not the other foobar()
> )
I'm wary about these -- have you seen particular cases where these would
be useful, or are you just hypothesizing? All the things I've mentioned
have come from user feedback, people saying either "I want to be able to
do this", or being confused about something. No point adding complexity
if no-one will use it.
> Also, I would prefer to avoid over-abbreviating things. Use "file:" and
> "func:" rather than fil: and fun:, since they're a bit unclear. And
> perhaps src: (and obj:) rather than file:, since file: is a bit generic
> when we have two interesting file types.
Sure.
> Another nit: use {} for grouping, since that's what the languages do.
I specifically didn't choose {} because it's used to groups statements in
procedural languages, which is not at all like a call trace. I
deliberately used [] because it's often used as list notation (eg. in
Haskell, Prolog), and a call trace can be considered exactly as a list.
> Other comments:
>
> * Some way of enabling/disabling a suppression while keeping it in
> the file and not commenting it out. This would make it a bit
> easier for tools/programs to manage the file contents
ok, any ideas how?
> * allow comments everywhere
yes
> * allow a more detailed description about the suppression, so it
> can be shown in a GUI
can you give an example?
> * matching against the soname of a .so file, rather than just the
> file name. The soname is canonical.
I don't understand what you mean by canonical here...
> * Have a consistent quoting scheme to allow quoting anywhere
> (otherwise how would we cope with someone with ',' in their
> filenames - strange, but possible); also needed as symbol names
> get more complex (ie, unmangled c++)
hmm, yes
> * Need to work out how to address mangling. Since there are
> multiple mangling schemes, we would either need to specify which
> one we want (ugly, compliler-dependent, but what we have now in
> effect, only manual), or generate matching rules for all mangled
> forms of the given c++ name. I guess some syntax like
> "mangle:int foo(void)" would do it.
I'm tempted to stick with the current system.
> * How flexible should we be with pathname matching? Do we match
> the whole pathname, or does "foo.c" mean */foo.c? i.e, should
> we pay attention to whether the specified filename contains '/'
> or not?
I think current system -- exact matches against entire paths -- seems ok.
N
|
|
From: Jeremy F. <je...@go...> - 2004-04-15 07:27:25
|
On Thu, 2004-04-15 at 00:43, Nicholas Nethercote wrote:
> On Wed, 14 Apr 2004, Jeremy Fitzhardinge wrote:
>
> > I think _ and ... are reasonably clear as entry-matching operators,
>
> Unfortunately, if you don't look at the manual, you won't know what they
> are from context (unlike * and ? which people will likely be able to
> guess... not that anyone ever uses ?). Maybe that's unavoidable.
Well, yes, but if you don't use them you don't need to know about them.
If you're reading someone else's suppressions you'd need to look it up.
> I'm wary about these -- have you seen particular cases where these would
> be useful, or are you just hypothesizing? All the things I've mentioned
> have come from user feedback, people saying either "I want to be able to
> do this", or being confused about something. No point adding complexity
> if no-one will use it.
Well, the | syntax is optional, because you can always have multiple
suppressions with the same effect. The & syntax adds something new.
Whether these are sufficiently useful or not I don't know, but I have
seen situations where one suppression could be applied to multiple call
paths where some of the middle call elements have a few alternatives.
So, dunno. We can reserve the syntax and decide to implement them
later.
> > Another nit: use {} for grouping, since that's what the languages do.
>
> I specifically didn't choose {} because it's used to groups statements in
> procedural languages, which is not at all like a call trace. I
> deliberately used [] because it's often used as list notation (eg. in
> Haskell, Prolog), and a call trace can be considered exactly as a list.
Yes, but C uses {} for lists, as in array and structure initializers.
It doesn't use [] in this way at all.
> > Other comments:
> >
> > * Some way of enabling/disabling a suppression while keeping it in
> > the file and not commenting it out. This would make it a bit
> > easier for tools/programs to manage the file contents
>
> ok, any ideas how?
Well, I'm thinking of a grammar something like this:
suppfile := suppression*
suppression := tool-list ':' SUPPTYPE '(' params? ')' trace
tool-list := TOOLNAME ( ',' TOOLNAME )*
params := param ( ',' param )*
param := "name" STRING
| "description" STRING
| "enable" BOOL
trace := '{' entry ( ',' entry )* ','? '}'
entry := match
| "_"
| "..."
| match ( '&' match )+
| match ( '|' match )+
entry := "obj:" FILENAME-PATTERN
| "src:" FILENAME-PATTERN
| "func:" SYMBOL-PATTERN
>
> > * allow comments everywhere
>
> yes
>
> > * allow a more detailed description about the suppression, so it
> > can be shown in a GUI
>
> can you give an example?
Well, given the grammar above, an entry could look like
Memcheck,Addrcheck: Addr4(
name "some badness",
description
"when the frobincator calls blattock without setting
quokka-mode first, then this error occurs deep inside
someone else's code",
enable true)
{
func:blattock,
func:twunk | soname:libgurple.so.1,
func:frobnicator
}
I think this syntax is still pretty cumbersome, but putting block text
which isn't a comment is a bit tricky. Python-like """ for block
strings might be a good idea.
>
>
> > * matching against the soname of a .so file, rather than just the
> > file name. The soname is canonical.
>
> I don't understand what you mean by canonical here...
Well, when you link with a shared object, the dependency is "I need a
file with an soname of (say) libc.so.6" rather than "I need a file
called libc.so.6". The filename of the shared library is irrelevant if
the soname is correct.
> > * Need to work out how to address mangling. Since there are
> > multiple mangling schemes, we would either need to specify which
> > one we want (ugly, compliler-dependent, but what we have now in
> > effect, only manual), or generate matching rules for all mangled
> > forms of the given c++ name. I guess some syntax like
> > "mangle:int foo(void)" would do it.
>
> I'm tempted to stick with the current system.
I would say that exposing mangled names to users is about the #1
usability problem with suppressions. Mangled names are all but useless
for any human-visible output, yet we require them for suppressions. It
also means that a generated suppression (or manual suppressions for that
matter) is essentially uninterpretable by a later reader. It also means
that if you change toolchains, you need separate suppressions for each
version of the compiler (consider supplying a suppression file with your
source code, with the intent that it be built on a wide range of target
machines).
The whole mangling issue is a big pain, but I don't think we can ignore
it.
> I think current system -- exact matches against entire paths -- seems ok.
OK.
|
|
From: Nicholas N. <nj...@ca...> - 2004-04-15 22:40:54
|
On Thu, 15 Apr 2004, Jeremy Fitzhardinge wrote:
> Memcheck,Addrcheck: Addr4(
> name "some badness",
> description
> "when the frobincator calls blattock without setting
> quokka-mode first, then this error occurs deep inside
> someone else's code",
> enable true)
> {
> func:blattock,
> func:twunk | soname:libgurple.so.1,
> func:frobnicator
> }
>
> I think this syntax is still pretty cumbersome, but putting block text
> which isn't a comment is a bit tricky. Python-like """ for block
> strings might be a good idea.
You say you want the long explanation for GUIs -- can you explain more why
that's useful? AFAICT people usually don't bother even giving meaningful
suppression names.
> Well, when you link with a shared object, the dependency is "I need a
> file with an soname of (say) libc.so.6" rather than "I need a file
> called libc.so.6". The filename of the shared library is irrelevant if
> the soname is correct.
Is the idea that you ignore the path before the filename?
N
|
|
From: Jeremy F. <je...@go...> - 2004-04-16 04:56:52
|
On Fri, 2004-04-16 at 08:40, Nicholas Nethercote wrote: > You say you want the long explanation for GUIs -- can you explain more why > that's useful? AFAICT people usually don't bother even giving meaningful > suppression names. Sure, but other people won't. In particular, I'm thinking about the standard suppressions we supply, and that library/software vendors might supply, which presumably will be put together with some care. Run of the mill suppressions will just be slapped together as always. A comment would do if you're directly editing/viewing the suppressions file, but I think we need to anticipate better interfaces. > > Well, when you link with a shared object, the dependency is "I need a > > file with an soname of (say) libc.so.6" rather than "I need a file > > called libc.so.6". The filename of the shared library is irrelevant if > > the soname is correct. > > Is the idea that you ignore the path before the filename? No, the SONAME is a specific property of a shared object which is unrelated to the filename (though it may be similar). For example: $ readelf -d /lib/libc.so.6 Dynamic segment at offset 0x1378b4 contains 23 entries: Tag Type Name/Value 0x00000001 (NEEDED) Shared library: [ld-linux.so.2] 0x0000000e (SONAME) Library soname: [libc.so.6] 0x0000000c (INIT) 0x15950 0x0000000d (FINI) 0x11b3f0 ... You set this with the --soname option on the linker. That's why VG_(add_redirect_sym) supports either matching the filename of a library, or the soname. J |