Thread: [Valgrind-users] New experimental skin: signal handler checker

Brought to you by: njn, sewardj, wielaard

valgrind-users

[Valgrind-users] New experimental skin: signal handler checker

From: Nicholas N. <nj...@ca...> - 2003-07-02 10:05:58

Hi,

I've written a new skin for Valgrind.  It's called Crocus, and it searches
for problems in signal handlers.

Because signal handlers can be called at any time, they shouldn't call any
functions that are non-reentrant, ie. functions that will screw up if they
are interrupted by another call to themselves, usually because some shared
or global data structure could be corrupted or overwritten.  Many
functions are non-reentrant, including lots of common ones like printf()
(most standard I/O functions, in fact) and malloc().  You also aren't
supposed to call any pthread functions from signal handlers, as this can
cause deadlock.

Lots of programs don't get this right;  Vim, for example.  Crocus
identifies when a program's interrupt handler calls one of a number of
common non-reentrant functions.  It also identifies when an interrupt
handler doesn't preserve errno as it should.  To run this skin,
download valgrind-CROCUS.tar.bz2 from:

  www.cl.cam.ac.uk/~njn25/valgrind.html

install as normal, and run it with the --skin=crocus option. It's a little
bit experimental, but has been tested reasonably well.   Thanks to Steve
Grubb for the original idea and testing it on a variety of programs.

I welcome any feedback.

N

RE: [Valgrind-users] New experimental skin: signal handler checker

From: Troels W. H. <tr...@th...> - 2003-07-03 09:39:33

Hi Nicholas,

I tried your skin on RH8 + all current updates.

It seems to work great and identified a potential deadlock (calling
pthread_mutex_unlock() in a SIGINT/SIGTERM handler) in one of the
applications I tried it with.

Didn't encounter any problems.

Definitely a nice addition to Valgrind's skin arsenal.

Troels

> -----Original Message-----
> From: val...@li... 
> [mailto:val...@li...] On Behalf 
> Of Nicholas Nethercote
> Sent: 2. juli 2003 12:06
> To: Valgrind Users
> Subject: [Valgrind-users] New experimental skin: signal 
> handler checker
> 
> 
> Hi,
> 
> I've written a new skin for Valgrind.  It's called Crocus, 
> and it searches
> for problems in signal handlers.
> 
> Because signal handlers can be called at any time, they 
> shouldn't call any
> functions that are non-reentrant, ie. functions that will 
> screw up if they
> are interrupted by another call to themselves, usually 
> because some shared
> or global data structure could be corrupted or overwritten.  Many
> functions are non-reentrant, including lots of common ones 
> like printf()
> (most standard I/O functions, in fact) and malloc().  You also aren't
> supposed to call any pthread functions from signal handlers, 
> as this can
> cause deadlock.
> 
> Lots of programs don't get this right;  Vim, for example.  Crocus
> identifies when a program's interrupt handler calls one of a number of
> common non-reentrant functions.  It also identifies when an interrupt
> handler doesn't preserve errno as it should.  To run this skin,
> download valgrind-CROCUS.tar.bz2 from:
> 
  www.cl.cam.ac.uk/~njn25/valgrind.html

install as normal, and run it with the --skin=crocus option. It's a
little
bit experimental, but has been tested reasonably well.   Thanks to Steve
Grubb for the original idea and testing it on a variety of programs.

I welcome any feedback.

N



-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100006ave/direct;at.asp_061203_01/01
_______________________________________________
Valgrind-users mailing list
Val...@li...
https://lists.sourceforge.net/lists/listinfo/valgrind-users

RE: [Valgrind-users] New experimental skin: signal handler checker

From: Nicholas N. <nj...@ca...> - 2003-07-03 10:37:16

On Thu, 3 Jul 2003, Troels Walsted Hansen wrote:

> I tried your skin on RH8 + all current updates.
>
> It seems to work great and identified a potential deadlock (calling
> pthread_mutex_unlock() in a SIGINT/SIGTERM handler) in one of the
> applications I tried it with.
>
> Didn't encounter any problems.
>
> Definitely a nice addition to Valgrind's skin arsenal.

Great, thanks for the feedback, I'm glad it worked well for you.

----

Warning: here be Technical mumbo-jumbo...

In practice, I don't know how likely deadlock is if you call
pthread_mutex_unlock() from a signal handler, but the LinuxThreads
pthread_mutex_lock() man page says this:

  ASYNC-SIGNAL SAFETY
     The mutex functions are not async-signal safe. What this means is  that
     they should not be called from a signal handler. In particular, calling
     pthread_mutex_lock or pthread_mutex_unlock from a  signal  handler  may
     deadlock the calling thread.

Similar advice is given at www.burningvoid.com/iaq/posix-signals.html:

     One important safety tip is that pthreads constructs like mutexes,
     condition variables, etc. do not have defined behavior inside signal
     handlers. So you cannot use pthreads calls or synchronize with other
     threads while inside a signal handler.

Currently the only pthread functions in the blacklist are
pthread_mutex_{lock,unlock,trylock}, maybe others (eg.  condition variable
ones) should be too.  The blacklist currently looks like this:

   Char* bad_funcs[] = {
      "_IO_printf", "_IO_puts",
      "syslog", "strdup",
      "__libc_malloc", "__libc_calloc", "__libc_realloc", "__libc_free",
      "inet_ntoa", "gethostbyname", "gethostbyaddr", "getservbynam",
      "readdir", "__readdir", "__readdir64", "strtok",
      "ttyname", "getttyname", "getlogin", "cuserid",
      "getpwent", "getpwnam", "getpwuid",
      "getgrent", "getgrnam", "getgrgid",
      "asctime", "ctime", "gmtime", "localtime",
      "fclose", "fflush", "fflush_unlocked", "fopen", "freopen",
      "fdopen", "fopencookie", "fmemopen", "open_memstream",
      "fprintf", "vfprintf", "fscanf", "vfscanf", "fgetc", "getc",
      "getc_unlocked", "fgetc_unlocked", "fputc", "putc", "fputc_unlocked",
      "putc_unlocked", "getw", "putw", "fgets", "fputs", "ungetc", "fseek",
      "rewind",
      "__pthread_mutex_lock", "__pthread_mutex_unlock",
      "__pthread_mutex_trylock",
      NULL
   };

I'd welcome suggestions for new ones, or any corrections.

Crocus also has a --strict option, which uses a whitelist instead of a
blacklist;  any function called from a signal handler that's not on the
whitelist causes a warning.  Unfortunately, user-defined functions can be
reentrant so that's not such a good approach.  For completeness, and in
case you're having trouble sleeping, here's the current whitelist:

   /* From "Advanced Programming in the UNIX Environment", Stevens, p278.
      Those marked with a "+" are not specified by POSIX.1 as reentrant,
      but are listed in the SVR4 SVID. */
   Char* good_funcs[] = {
      "_exit", "abort"/*+*/, "access", "alarm",
      "cfgetispeed", "cfgetospeed", "cfsetispeed", "cfsetospeed",
      "chdir", "chmod", "chown", "close", "creat", "dup", "dup2", "execle",
      "execve", "exit"/*+*/, "fcntl", "fork", "fstat",
      "getegid", "geteuid", "getgid", "getgroups", "getpgrp", "getpid",
      "getppid", "getuid",
      "kill", "link", "longjmp"/*+*/, "lseek", "mkdir", "mkfifo", "open",
      "pathconf", "pause", "pipe", "read", "rename", "rmdir",
      "setgid", "setpgid", "setsid", "setuid",
      "__sigaction", "__libc_sigaction",
      "sigaddset", "sigdelset", "sigemptyset", "sigfillset",
      "sigismember", "signal"/*+*/, "sigpending",
      "__sigprocmask",
      "sigsuspend",
      "sleep", "stat", "sysconf",
      "tcdrain", "tcflow", "tcflush", "tcgetattr", "tcgetpgrp", "tcsendbreak",
      "tcsetattr", "tcsetpgrp", "time", "times", "umask", "uname", "unlink",
      "utime", "wait", "waitpid", "write",
      NULL
   };

And maybe I should use the term "signal safe" when talking about functions
that can be called from signal handlers, rather than "reentrant"?  Or are
they the same thing?  I haven't quite worked out the exact differences
between "signal safe", "thread safe", and "reentrant".  AFAICT, they're
all pretty similar, but there might be some functions that satisfy one but
not all the definitions?

N

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Tom H. <th...@cy...> - 2003-07-03 11:29:16

In message <Pin...@gr...>
        Nicholas Nethercote <nj...@ca...> wrote:

>    Char* bad_funcs[] = {
>       "_IO_printf", "_IO_puts",
>       "syslog", "strdup",
>       "__libc_malloc", "__libc_calloc", "__libc_realloc", "__libc_free",
>       "inet_ntoa", "gethostbyname", "gethostbyaddr", "getservbynam",
>       "readdir", "__readdir", "__readdir64", "strtok",
>       "ttyname", "getttyname", "getlogin", "cuserid",
>       "getpwent", "getpwnam", "getpwuid",
>       "getgrent", "getgrnam", "getgrgid",
>       "asctime", "ctime", "gmtime", "localtime",
>       "fclose", "fflush", "fflush_unlocked", "fopen", "freopen",
>       "fdopen", "fopencookie", "fmemopen", "open_memstream",
>       "fprintf", "vfprintf", "fscanf", "vfscanf", "fgetc", "getc",
>       "getc_unlocked", "fgetc_unlocked", "fputc", "putc", "fputc_unlocked",
>       "putc_unlocked", "getw", "putw", "fgets", "fputs", "ungetc", "fseek",
>       "rewind",
>       "__pthread_mutex_lock", "__pthread_mutex_unlock",
>       "__pthread_mutex_trylock",
>       NULL
>    };
>
> I'd welcome suggestions for new ones, or any corrections.

Well "find /usr/include -name *.h -print0 | xargs -0 grep '_r *('" suggests
quite a lot, including:

gethostent
getnetent getnetbyname getnetbyaddr
getservent getservbyport
getprotoent getprotobyname getprotobynumber
getrpcent getrpcbyname getrpcbynumber
getmntent
fgetpwent fgetgrent
tmpnam
random srandom initstate setstate
drand48 erand48 lrand48 nrand48 mrand48 jrand48 srand48 seed48 lcong48
ecvt fcvt qecvt qfcvt
strerror
crypt
setkey encrypt
hcreate hsearch hdestroy

Tom

-- 
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Steve G <lin...@ya...> - 2003-07-03 12:00:46

>Well "find /usr/include -name *.h -print0 | xargs -0 grep
'_r
>*('" suggests quite a lot, including:

Yup. It also seems that the glibc documentation is not
complete either. Looking at the getnetent function man
page, it doesn't mention the word re-entrant anywhere. It
also doesn't mention where the returned data structure came
from and whether or not you are supposed to free it. Glibc
man pages needs patching, too.

-Steve Grubb

__________________________________
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Igmar P. <mai...@jd...> - 2003-07-08 09:46:37

> >Well "find /usr/include -name *.h -print0 | xargs -0 grep
> '_r
> >*('" suggests quite a lot, including:
> 
> Yup. It also seems that the glibc documentation is not
> complete either. Looking at the getnetent function man
> page, it doesn't mention the word re-entrant anywhere. It
> also doesn't mention where the returned data structure came
> from and whether or not you are supposed to free it. Glibc
> man pages needs patching, too.

I don't look at them anymore, I mainly use FreeBSD's manpage. There tons 
of functions (ttyname() is a recent one I ran into) that return static 
memory, but that isn't mentioned.



	Igmar

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Nicholas N. <nj...@ca...> - 2003-07-03 12:22:05

On Thu, 3 Jul 2003, Tom Hughes wrote:

> Well "find /usr/include -name *.h -print0 | xargs -0 grep '_r *('" suggests
> quite a lot, including:
>
> gethostent
> getnetent getnetbyname getnetbyaddr
> getservent getservbyport
> getprotoent getprotobyname getprotobynumber
> getrpcent getrpcbyname getrpcbynumber
> getmntent
> fgetpwent fgetgrent
> tmpnam
> random srandom initstate setstate
> drand48 erand48 lrand48 nrand48 mrand48 jrand48 srand48 seed48 lcong48
> ecvt fcvt qecvt qfcvt
> strerror
> crypt
> setkey encrypt
> hcreate hsearch hdestroy

Ok, I'll add them.

Doing the same on my system, I also got these ones (that weren't on the
original blacklist):

 ether_ntoa ether_aton
 nis_leaf_of nis_name_of nis_domain_of nis_sperror
 getaliasent getaliasbyname gethostbyname2 getservbyname
 getnetgrent getpwnam getspent getspnam
 sgetspent fgetspent
 rand
 ptsname getdate getutent getutid getutline
 BIO_gethostbyname

Should they go in too?

Thanks very much for the help.

N

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Tom H. <th...@cy...> - 2003-07-03 13:22:57

In message <Pin...@gr...>
        Nicholas Nethercote <nj...@ca...> wrote:

> Doing the same on my system, I also got these ones (that weren't on the
> original blacklist):
>
>  ether_ntoa ether_aton

These look like they need to go it.

>  nis_leaf_of nis_name_of nis_domain_of nis_sperror
>  getaliasent getaliasbyname

I can't see any documentation for these, so it's difficult to say.

>  gethostbyname2 getservbyname

These look like they need to go it.

>  getnetgrent

Probably, although it doesn't seem to have a manual page.

>  getpwnam

I think this is already in isn't it? If it isn't then it needs to go in.

>  getspent getspnam
>  sgetspent fgetspent

These are for the shadow file, and probably need to go in although
they don't seem to have manual pages...

>  rand

Yes, along with srand if it isn't in the other list.

>  ptsname

Probably, although it doesn't seem to have a manual page.

>  getdate getutent getutid getutline

Yes, along with pututline I think.

>  BIO_gethostbyname

That's an openssl routine. It may need to go in though, but I was
mainly looking at C library routines.

Tom

-- 
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Steve G <lin...@ya...> - 2003-07-03 13:46:16

> ether_ntoa ether_aton
> nis_leaf_of nis_name_of nis_domain_of nis_sperror
> getaliasent getaliasbyname gethostbyname2 getservbyname
> getnetgrent getpwnam getspent getspnam
> sgetspent fgetspent
> rand
> ptsname getdate getutent getutid getutline
> BIO_gethostbyname
>
>Should they go in too?

Some of these are esoteric and/or undocumented. For
example, getaliasent has no man page. What does it do? Is
it a public API? With no man page, I have no idea what its
doing or if its safe.

The ether functions page says it uses static buffers, so
adding this would be good.

nis_*, BIO_*, pts*, getut*, ?getspent, getsp* are
undocumented (no man page). I suppose we can err on the
side of caution by adding all of them and wait for someone
to prove it otherwise.

-Steve Grubb

__________________________________
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Tom H. <th...@cy...> - 2003-07-03 14:06:16

In message <200...@we...>
        Steve G. <lin...@ya...> wrote:

> nis_*, BIO_*, pts*, getut*, ?getspent, getsp* are
> undocumented (no man page). I suppose we can err on the
> side of caution by adding all of them and wait for someone
> to prove it otherwise.

ptsname is documented in SUS:

  http://www.opengroup.org/onlinepubs/007904975/functions/ptsname.html

Most of BIO_* has manual pages, just not that one routine as far 
as I can see. I haven't checked openssl.org to see if there is any
doco there for it.

Tom

-- 
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Dan K. <da...@ke...> - 2003-07-03 15:05:49

Nicholas Nethercote wrote:
> Doing the same on my system, I also got these ones (that weren't on the
> original blacklist):
> 
>  ether_ntoa ether_aton
>  nis_leaf_of nis_name_of nis_domain_of nis_sperror
>  getaliasent getaliasbyname gethostbyname2 getservbyname
>  getnetgrent getpwnam getspent getspnam
>  sgetspent fgetspent
>  rand
>  ptsname getdate getutent getutid getutline
>  BIO_gethostbyname
> 
> Should they go in too?

You can check SuSv3 for each standard function; it's pretty good about
telling you what's threadsafe/reentrant and what isn't.
e.g. http://www.opengroup.org/onlinepubs/007904975/functions/rand.html
says
   "[CX]  The rand() function need not be reentrant.
    A function that is not required to be reentrant is not required to be thread-safe. "

BTW, this tool of yours is pretty cool.  I bet a similar
test could be applied at compile time using smatch
http://smatch.sourceforge.net/
That'd be quite a combination -- catch a bunch of them statically,
and then use your dynamic tool to find some the static checker missed...
- Dan

-- 
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Nicholas N. <nj...@ca...> - 2003-07-03 15:47:51

On Thu, 3 Jul 2003, Dan Kegel wrote:

> BTW, this tool of yours is pretty cool.  I bet a similar
> test could be applied at compile time using smatch
> http://smatch.sourceforge.net/
> That'd be quite a combination -- catch a bunch of them statically,
> and then use your dynamic tool to find some the static checker missed...

Ah, I was aware of the Stanford checking project but not smatch.  That's
cool.  I had thought that this signal handler checking would be much
better done statically.  Crocus may be put out of business before it's
even started!  But that would be fine, a static checker would be better.

You mention using the dynamic tool for the ones the static tool misses,
have you used smatch and know that static misses are common, or is this
just a guess?  I imagine that signal handler checking would be pretty
straightforward to do statically.  And once people know they shouldn't
call these function from signal handlers, it's not a problem that's hard
to fix, it's more just a case of knowing about it in the first place.  As
opposed to eg. memory errors, where a dynamic automated tool is very
useful for finding obscure bugs.

N

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Dan K. <da...@ke...> - 2003-07-03 16:07:54

Nicholas Nethercote wrote:
> On Thu, 3 Jul 2003, Dan Kegel wrote:
>>BTW, this tool of yours is pretty cool.  I bet a similar
>>test could be applied at compile time using smatch
>>http://smatch.sourceforge.net/
>>That'd be quite a combination -- catch a bunch of them statically,
>>and then use your dynamic tool to find some the static checker missed...
> 
> 
> Ah, I was aware of the Stanford checking project but not smatch.  That's
> cool.  I had thought that this signal handler checking would be much
> better done statically.  Crocus may be put out of business before it's
> even started!  But that would be fine, a static checker would be better.

We need both!

> You mention using the dynamic tool for the ones the static tool misses,
> have you used smatch and know that static misses are common, or is this
> just a guess?  I imagine that signal handler checking would be pretty
> straightforward to do statically.  And once people know they shouldn't
> call these function from signal handlers, it's not a problem that's hard
> to fix, it's more just a case of knowing about it in the first place.  As
> opposed to eg. memory errors, where a dynamic automated tool is very
> useful for finding obscure bugs.

Consider the case where you're doing QA on an open source
project which is huge and hard to compile, and the developers haven't
seen fit to run the static checker on the whole source base yet.
Or consider a closed-source app.  In both cases, the dynamic
checker will work, and the static one won't (because the source is
hard to get or configure).

- Dan

-- 
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Nicholas N. <nj...@ca...> - 2003-07-03 16:20:03

On Thu, 3 Jul 2003, Dan Kegel wrote:

> > But that would be fine, a static checker would be better.
>
> We need both!
>
> Consider the case where you're doing QA on an open source
> project which is huge and hard to compile, and the developers haven't
> seen fit to run the static checker on the whole source base yet.
> Or consider a closed-source app.  In both cases, the dynamic
> checker will work, and the static one won't (because the source is
> hard to get or configure).

Good point.  Three cheers for dynamic binary translation! :)

N

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Steve G <lin...@ya...> - 2003-07-03 16:16:01

>I had thought that this signal handler checking would 
>be much better done statically.  Crocus may be put out
>of business before it's even started! 

Doing it at compile time would be
better...however...there's many times I'm helping someone
and out of curiousity we run valgrind, libsafe, or electric
fence to see what's going on with the binary.

There will always be a need for tools that do not require
getting the source code just to check out a misbehaving
application. You also may not have the source code if its a
commercial app. You could still check it out and send a
better bug report.

-Steve Grubb

__________________________________
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com

RE: [Valgrind-users] New experimental skin: signal handler checker

From: Steve G <lin...@ya...> - 2003-07-03 11:42:43

>And maybe I should use the term "signal safe" when talking

>about functions that can be called from signal handlers,
>rather than "reentrant"? Or are they the same thing?

They are different. A function that is re-entrant is one
that may be called by a different thread of execution
concurrently and it won't cause any ill effects. 

Signal safe means that the function is not using static
data buffers, not using library data structures that may be
in unsafe state, and only calling functions that are in the
whitelist or functions composed of those on the whitelist.

Signal handlers should be both signal-safe and re-entrant.
Common functions in a Pthreads program should be
re-entrant.

Glad to see other people finding bugs in signal handlers.
This was the area of most problems in a lot of code reviews
I've been doing. I'm reviewing some code from freeradius
right now and its sigchld signal handler does bad
things...which might explain why it crashes after 6-7 days
of heavy use. Fixing signal handlers is simple once you are
aware of a problem. It seems people were not aware.

-Steve Grubb

__________________________________
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com

RE: [Valgrind-users] New experimental skin: signal handler checker

From: Steve G <lin...@ya...> - 2003-07-05 23:50:28

Nicholas,

  I ran across a function that should be on the good_funcs
list, sem_post(). According to the man page, it has the
distinction of being the ONLY pthreads synchronization
function that is async-signal safe.

-Steve Grubb

__________________________________
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com

Re: [Valgrind-users] New experimental skin: signal handler checker

From: John R. <jr...@Bi...> - 2003-07-07 02:57:22

On Wed, 2003-07-02 at 03:05, Nicholas Nethercote wrote:
> Many
> functions are non-reentrant, including lots of common ones like printf()
> (most standard I/O functions, in fact) and malloc().

Yes, but in practice the glibc implementation of printf/fprintf can be
used safely from a signal handler under restricted circumstances.  The
first is that malloc() must not be called [indirectly] from the signal
handler.  The reason why printf/fprintf would call malloc() is to
allocate a buffer for the output characters.  If the stream defaults to
non-buffered, or if the buffer has been allocated already, then
printf/fprintf does not call malloc.  So, if there has been a
[completed] call to printf/fprintf before the signal handler is invoked
[this will allocate a buffer], or if the stream defaults to non-buffered
[such as for fprintf(stderr, ...)], or if the program calls setvbuf() on
the stream before the signal handler is invoked, then printf/fprintf
need not call malloc().  Secondly, the usage from the handler must not
collide with usage of the same stream from non-signal level.  The
easiest way is to segregate entirely the signal-level streams from the
non-signal level streams, although a protocol for sharing a stream
between signal level and non-signal level also is possible.  You can
also fdopen(dup(fileno(stream)), ...) [needs error checking] to obtain a
separate stream "that goes to the same place" for use by a signal
handler.  [Output from the two streams will be intermixed.]

Of course, this is not necessarily portable to non-glibc
implementations.  But it may be useful.

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Jeremy F. <je...@go...> - 2003-08-18 00:32:54

On Wed, 2003-07-02 at 03:05, Nicholas Nethercote wrote:
> I've written a new skin for Valgrind.  It's called Crocus, and it searches
> for problems in signal handlers.

Have you considered adding helgrind-like functionality to look for
signal races?  Any code which needs to do something moderately complex
in a signal handler needs to make sure that it uses sigprocmask() to
prevent signals from happening at the wrong time in the mainline code. 
I have a chunk of code (autofs4's automount daemon) which does quite a
lot of stuff from signal handlers, and it would be nice to check for
correct behaviour.

	J

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Nicholas N. <nj...@ca...> - 2003-08-18 07:27:55

On Sun, 17 Aug 2003, Jeremy Fitzhardinge wrote:

> > I've written a new skin for Valgrind.  It's called Crocus, and it searches
> > for problems in signal handlers.
>
> Have you considered adding helgrind-like functionality to look for
> signal races?  Any code which needs to do something moderately complex
> in a signal handler needs to make sure that it uses sigprocmask() to
> prevent signals from happening at the wrong time in the mainline code.

No, but I wasn't aware it was something to look out for.  Is it clear what
needs to be checked, ie. is there a firm definition of "moderately
complex"?

N

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Jeremy F. <je...@go...> - 2003-08-18 07:51:38

On Mon, 2003-08-18 at 00:13, Nicholas Nethercote wrote:
> No, but I wasn't aware it was something to look out for.  Is it clear what
> needs to be checked, ie. is there a firm definition of "moderately
> complex"?

In simple terms, modifying data-structures in a signal handler.  You
could probably modify helgrind itself, or at least its algorithm, to
handle this case.  In effect, there are two threads: the main-line code,
and the signal handlers (extra complication: signal handlers can be
recursive).  Signals, if delivered, cause the context switch.  The mutex
operator is sigprocmask, so you set a fixed set of NSIG locks.  

The general idea that if you're touching a piece of memory and the
handler for signal N also touches that memory, you'd better have signal
N blocked.

You can easily tell when you start running a signal handler, because a
signal is delivered.  The tricky part is knowing when you've stopped -
sigreturn is easy, but you can longjmp out of a handler.  On the other
hand, maybe it doesn't matter.

	J

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Steve G <lin...@ya...> - 2003-08-18 14:09:25

>The general idea that if you're touching a piece of 
>memory and the handler for signal N also touches that 
>memory, you'd better have signal N blocked.

I suspect if you see code that does this, it should be
redesigned. Some good examples of serialization are stunnel
4.04 or xinetd-2.3.12. Come to think of it, postfix may be
another example of proper serialization.

I suppose an approach might be something like this: Each
time a signal is about to be delivered, check the mask. If
any unblocked signal (besides the one about to be
delivered) has a user defined signal handler, fake
delivering a signal to it and only note the addresses it
wanted to read data from or write data to and what the
access type is (read or write). Don't let it actually
write, reading is ok since its non-destructive. Then after
faking a signal to all user defined handlers, check the
list each time the current signal handler accesses a memory
location. Only read/write or write/write collisions matter.
The list gets thrown away at the end of the signal handler.

I don't think performance is an issue since signals don't
get delivered very often in normal applications. The
operating system even delays delivery until a system call
is made. I think this is managable so long as preventing
writes from the other signal handlers is possible. I have
no idea if this is tricky in the virtual cpu.

I think finding these problems does matter and is
worthwhile. Does anyone see issues with the above
algorithm? Of course, I am not familiar with the inner
workings of valgrind and someone else may have to code it.

Best Regards,
-Steve Grubb

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Jeremy F. <je...@go...> - 2003-08-19 06:13:58

On Mon, 2003-08-18 at 06:47, Steve G wrote:
> I suppose an approach might be something like this: Each
> time a signal is about to be delivered, check the mask. If
> any unblocked signal (besides the one about to be
> delivered) has a user defined signal handler, fake
> delivering a signal to it and only note the addresses it
> wanted to read data from or write data to and what the
> access type is (read or write). Don't let it actually
> write, reading is ok since its non-destructive. Then after
> faking a signal to all user defined handlers, check the
> list each time the current signal handler accesses a memory
> location. Only read/write or write/write collisions matter.
> The list gets thrown away at the end of the signal handler.

So you're saying speculatively run the signal handler without side
effects?  I think that's pretty tricky.  For one, it may have a read
which depends on one of its earlier writes.  Even more tricky, it may
write to memory with a syscall, which would be hard to make
side-effect-free.

I think a helgrind-like algorithm would be more appropriate:

For each memory location, maintain a state.  There are 4 states:
     1. untouched
     2. private - only touched by mainline code or a handler for a
        particular signal, but not both
     3. read-shared - only ever read by mainline and signal handlers
     4. RW-shared - both read and written by mainline and signal
        handlers

In RW-shared state, it keeps a set of signal handlers which touch the
memory location.  If any code touches a location in RW-shared state, it
must have that set of signals blocked, otherwise it reports an error.

The tricky part is trying to determine if a given instruction is being
run as part of a signal handler or not.  Entering the signal handler is
pretty obvious, since Valgrind has to go to some effort to deliver a
signal to a thread.  Detecting when the application leaves the handler
is more tricky.  If the handler does a normal return it's easy enough,
but if it longjmps out of the handler it's harder to tell.  I think the
sure-fire way which works in both cases is to detect when ESP goes above
the signal stack frame - this will happen in both the longjmp and
sigreturn cases.

The other potentially tricky thing to handle is recursive signals.  I
need to think about it more - it may just fall out in the algorithm
above.  I think if you consider the mainline code to also be a
particular form of signal handler (SIGMAIN) then it comes out naturally.

	J

Re: [Valgrind-users] New experimental skin: signal handler checker

From: John R. <jr...@Bi...> - 2003-08-18 14:14:55

> The general idea that if you're touching a piece of memory and the
> handler for signal N also touches that memory, you'd better have signal
> N blocked.

But only for "complex" situations.  For "simple" cases the _recommended_
methodolgy violates this rule.  Namely, the signal handler sets  (or
increments) a global variable, and the non-signal handling code checks
this variable at convenient times (to see if it has changed from the
last time the non-handler code checked its value).

Even in more complicated situations, a circular buffer between the
signal handler as producer and the mainline code as consumer,
would violate the rule about "no touching" the control pointers
of the circular buffer.  What did you really mean?

Re: [Valgrind-users] New experimental skin: signal handler checker

From: Jeremy F. <je...@go...> - 2003-08-18 15:59:18

On Mon, 2003-08-18 at 06:59, John Reiser wrote:
> > The general idea that if you're touching a piece of memory and the
> > handler for signal N also touches that memory, you'd better have signal
> > N blocked.
> 
> But only for "complex" situations.  For "simple" cases the _recommended_
> methodolgy violates this rule.  Namely, the signal handler sets  (or
> increments) a global variable, and the non-signal handling code checks
> this variable at convenient times (to see if it has changed from the
> last time the non-handler code checked its value).

Well, you probably wouldn't bother running it under crocus to test its
correctness in that case.  

In my experience, however, programs which use such a simple scheme are
prone to signal race deadlocks.  If its a program which can truly work
with only a flag set in the signal handler, it would probably be much
better off to simply block all signals and poll for pending signals with
sigwait().  If it relies on the signal to interrupt a syscall, then it
almost certainly has deadlock bugs.

> Even in more complicated situations, a circular buffer between the
> signal handler as producer and the mainline code as consumer,
> would violate the rule about "no touching" the control pointers
> of the circular buffer.

Well, surely the buffer pointer updates need to be protected, unless
you're using strictly atomic operations?

>   What did you really mean?

Erm, what I said, I think.

I think the most common case of manipulating data structures from a
signal handler is in programs which keep track of their child processes:
they seem to do a lot of work in their SIGCHLD handlers.

	J

1 2 > >> (Page 1 of 2)