From: Nicholas N. <nj...@ca...> - 2003-07-02 10:05:58
|
Hi, I've written a new skin for Valgrind. It's called Crocus, and it searches for problems in signal handlers. Because signal handlers can be called at any time, they shouldn't call any functions that are non-reentrant, ie. functions that will screw up if they are interrupted by another call to themselves, usually because some shared or global data structure could be corrupted or overwritten. Many functions are non-reentrant, including lots of common ones like printf() (most standard I/O functions, in fact) and malloc(). You also aren't supposed to call any pthread functions from signal handlers, as this can cause deadlock. Lots of programs don't get this right; Vim, for example. Crocus identifies when a program's interrupt handler calls one of a number of common non-reentrant functions. It also identifies when an interrupt handler doesn't preserve errno as it should. To run this skin, download valgrind-CROCUS.tar.bz2 from: www.cl.cam.ac.uk/~njn25/valgrind.html install as normal, and run it with the --skin=crocus option. It's a little bit experimental, but has been tested reasonably well. Thanks to Steve Grubb for the original idea and testing it on a variety of programs. I welcome any feedback. N |
From: Troels W. H. <tr...@th...> - 2003-07-03 09:39:33
|
Hi Nicholas, I tried your skin on RH8 + all current updates. It seems to work great and identified a potential deadlock (calling pthread_mutex_unlock() in a SIGINT/SIGTERM handler) in one of the applications I tried it with. Didn't encounter any problems. Definitely a nice addition to Valgrind's skin arsenal. Troels > -----Original Message----- > From: val...@li... > [mailto:val...@li...] On Behalf > Of Nicholas Nethercote > Sent: 2. juli 2003 12:06 > To: Valgrind Users > Subject: [Valgrind-users] New experimental skin: signal > handler checker > > > Hi, > > I've written a new skin for Valgrind. It's called Crocus, > and it searches > for problems in signal handlers. > > Because signal handlers can be called at any time, they > shouldn't call any > functions that are non-reentrant, ie. functions that will > screw up if they > are interrupted by another call to themselves, usually > because some shared > or global data structure could be corrupted or overwritten. Many > functions are non-reentrant, including lots of common ones > like printf() > (most standard I/O functions, in fact) and malloc(). You also aren't > supposed to call any pthread functions from signal handlers, > as this can > cause deadlock. > > Lots of programs don't get this right; Vim, for example. Crocus > identifies when a program's interrupt handler calls one of a number of > common non-reentrant functions. It also identifies when an interrupt > handler doesn't preserve errno as it should. To run this skin, > download valgrind-CROCUS.tar.bz2 from: > www.cl.cam.ac.uk/~njn25/valgrind.html install as normal, and run it with the --skin=crocus option. It's a little bit experimental, but has been tested reasonably well. Thanks to Steve Grubb for the original idea and testing it on a variety of programs. I welcome any feedback. N ------------------------------------------------------- This SF.Net email sponsored by: Free pre-built ASP.NET sites including Data Reports, E-commerce, Portals, and Forums are available now. Download today and enter to win an XBOX or Visual Studio .NET. http://aspnet.click-url.com/go/psa00100006ave/direct;at.asp_061203_01/01 _______________________________________________ Valgrind-users mailing list Val...@li... https://lists.sourceforge.net/lists/listinfo/valgrind-users |
From: Nicholas N. <nj...@ca...> - 2003-07-03 10:37:16
|
On Thu, 3 Jul 2003, Troels Walsted Hansen wrote: > I tried your skin on RH8 + all current updates. > > It seems to work great and identified a potential deadlock (calling > pthread_mutex_unlock() in a SIGINT/SIGTERM handler) in one of the > applications I tried it with. > > Didn't encounter any problems. > > Definitely a nice addition to Valgrind's skin arsenal. Great, thanks for the feedback, I'm glad it worked well for you. ---- Warning: here be Technical mumbo-jumbo... In practice, I don't know how likely deadlock is if you call pthread_mutex_unlock() from a signal handler, but the LinuxThreads pthread_mutex_lock() man page says this: ASYNC-SIGNAL SAFETY The mutex functions are not async-signal safe. What this means is that they should not be called from a signal handler. In particular, calling pthread_mutex_lock or pthread_mutex_unlock from a signal handler may deadlock the calling thread. Similar advice is given at www.burningvoid.com/iaq/posix-signals.html: One important safety tip is that pthreads constructs like mutexes, condition variables, etc. do not have defined behavior inside signal handlers. So you cannot use pthreads calls or synchronize with other threads while inside a signal handler. Currently the only pthread functions in the blacklist are pthread_mutex_{lock,unlock,trylock}, maybe others (eg. condition variable ones) should be too. The blacklist currently looks like this: Char* bad_funcs[] = { "_IO_printf", "_IO_puts", "syslog", "strdup", "__libc_malloc", "__libc_calloc", "__libc_realloc", "__libc_free", "inet_ntoa", "gethostbyname", "gethostbyaddr", "getservbynam", "readdir", "__readdir", "__readdir64", "strtok", "ttyname", "getttyname", "getlogin", "cuserid", "getpwent", "getpwnam", "getpwuid", "getgrent", "getgrnam", "getgrgid", "asctime", "ctime", "gmtime", "localtime", "fclose", "fflush", "fflush_unlocked", "fopen", "freopen", "fdopen", "fopencookie", "fmemopen", "open_memstream", "fprintf", "vfprintf", "fscanf", "vfscanf", "fgetc", "getc", "getc_unlocked", "fgetc_unlocked", "fputc", "putc", "fputc_unlocked", "putc_unlocked", "getw", "putw", "fgets", "fputs", "ungetc", "fseek", "rewind", "__pthread_mutex_lock", "__pthread_mutex_unlock", "__pthread_mutex_trylock", NULL }; I'd welcome suggestions for new ones, or any corrections. Crocus also has a --strict option, which uses a whitelist instead of a blacklist; any function called from a signal handler that's not on the whitelist causes a warning. Unfortunately, user-defined functions can be reentrant so that's not such a good approach. For completeness, and in case you're having trouble sleeping, here's the current whitelist: /* From "Advanced Programming in the UNIX Environment", Stevens, p278. Those marked with a "+" are not specified by POSIX.1 as reentrant, but are listed in the SVR4 SVID. */ Char* good_funcs[] = { "_exit", "abort"/*+*/, "access", "alarm", "cfgetispeed", "cfgetospeed", "cfsetispeed", "cfsetospeed", "chdir", "chmod", "chown", "close", "creat", "dup", "dup2", "execle", "execve", "exit"/*+*/, "fcntl", "fork", "fstat", "getegid", "geteuid", "getgid", "getgroups", "getpgrp", "getpid", "getppid", "getuid", "kill", "link", "longjmp"/*+*/, "lseek", "mkdir", "mkfifo", "open", "pathconf", "pause", "pipe", "read", "rename", "rmdir", "setgid", "setpgid", "setsid", "setuid", "__sigaction", "__libc_sigaction", "sigaddset", "sigdelset", "sigemptyset", "sigfillset", "sigismember", "signal"/*+*/, "sigpending", "__sigprocmask", "sigsuspend", "sleep", "stat", "sysconf", "tcdrain", "tcflow", "tcflush", "tcgetattr", "tcgetpgrp", "tcsendbreak", "tcsetattr", "tcsetpgrp", "time", "times", "umask", "uname", "unlink", "utime", "wait", "waitpid", "write", NULL }; And maybe I should use the term "signal safe" when talking about functions that can be called from signal handlers, rather than "reentrant"? Or are they the same thing? I haven't quite worked out the exact differences between "signal safe", "thread safe", and "reentrant". AFAICT, they're all pretty similar, but there might be some functions that satisfy one but not all the definitions? N |
From: Tom H. <th...@cy...> - 2003-07-03 11:29:16
|
In message <Pin...@gr...> Nicholas Nethercote <nj...@ca...> wrote: > Char* bad_funcs[] = { > "_IO_printf", "_IO_puts", > "syslog", "strdup", > "__libc_malloc", "__libc_calloc", "__libc_realloc", "__libc_free", > "inet_ntoa", "gethostbyname", "gethostbyaddr", "getservbynam", > "readdir", "__readdir", "__readdir64", "strtok", > "ttyname", "getttyname", "getlogin", "cuserid", > "getpwent", "getpwnam", "getpwuid", > "getgrent", "getgrnam", "getgrgid", > "asctime", "ctime", "gmtime", "localtime", > "fclose", "fflush", "fflush_unlocked", "fopen", "freopen", > "fdopen", "fopencookie", "fmemopen", "open_memstream", > "fprintf", "vfprintf", "fscanf", "vfscanf", "fgetc", "getc", > "getc_unlocked", "fgetc_unlocked", "fputc", "putc", "fputc_unlocked", > "putc_unlocked", "getw", "putw", "fgets", "fputs", "ungetc", "fseek", > "rewind", > "__pthread_mutex_lock", "__pthread_mutex_unlock", > "__pthread_mutex_trylock", > NULL > }; > > I'd welcome suggestions for new ones, or any corrections. Well "find /usr/include -name *.h -print0 | xargs -0 grep '_r *('" suggests quite a lot, including: gethostent getnetent getnetbyname getnetbyaddr getservent getservbyport getprotoent getprotobyname getprotobynumber getrpcent getrpcbyname getrpcbynumber getmntent fgetpwent fgetgrent tmpnam random srandom initstate setstate drand48 erand48 lrand48 nrand48 mrand48 jrand48 srand48 seed48 lcong48 ecvt fcvt qecvt qfcvt strerror crypt setkey encrypt hcreate hsearch hdestroy Tom -- Tom Hughes (th...@cy...) Software Engineer, Cyberscience Corporation http://www.cyberscience.com/ |
From: Steve G <lin...@ya...> - 2003-07-03 12:00:46
|
>Well "find /usr/include -name *.h -print0 | xargs -0 grep '_r >*('" suggests quite a lot, including: Yup. It also seems that the glibc documentation is not complete either. Looking at the getnetent function man page, it doesn't mention the word re-entrant anywhere. It also doesn't mention where the returned data structure came from and whether or not you are supposed to free it. Glibc man pages needs patching, too. -Steve Grubb __________________________________ Do you Yahoo!? SBC Yahoo! DSL - Now only $29.95 per month! http://sbc.yahoo.com |
From: Igmar P. <mai...@jd...> - 2003-07-08 09:46:37
|
> >Well "find /usr/include -name *.h -print0 | xargs -0 grep > '_r > >*('" suggests quite a lot, including: > > Yup. It also seems that the glibc documentation is not > complete either. Looking at the getnetent function man > page, it doesn't mention the word re-entrant anywhere. It > also doesn't mention where the returned data structure came > from and whether or not you are supposed to free it. Glibc > man pages needs patching, too. I don't look at them anymore, I mainly use FreeBSD's manpage. There tons of functions (ttyname() is a recent one I ran into) that return static memory, but that isn't mentioned. Igmar |
From: Nicholas N. <nj...@ca...> - 2003-07-03 12:22:05
|
On Thu, 3 Jul 2003, Tom Hughes wrote: > Well "find /usr/include -name *.h -print0 | xargs -0 grep '_r *('" suggests > quite a lot, including: > > gethostent > getnetent getnetbyname getnetbyaddr > getservent getservbyport > getprotoent getprotobyname getprotobynumber > getrpcent getrpcbyname getrpcbynumber > getmntent > fgetpwent fgetgrent > tmpnam > random srandom initstate setstate > drand48 erand48 lrand48 nrand48 mrand48 jrand48 srand48 seed48 lcong48 > ecvt fcvt qecvt qfcvt > strerror > crypt > setkey encrypt > hcreate hsearch hdestroy Ok, I'll add them. Doing the same on my system, I also got these ones (that weren't on the original blacklist): ether_ntoa ether_aton nis_leaf_of nis_name_of nis_domain_of nis_sperror getaliasent getaliasbyname gethostbyname2 getservbyname getnetgrent getpwnam getspent getspnam sgetspent fgetspent rand ptsname getdate getutent getutid getutline BIO_gethostbyname Should they go in too? Thanks very much for the help. N |
From: Tom H. <th...@cy...> - 2003-07-03 13:22:57
|
In message <Pin...@gr...> Nicholas Nethercote <nj...@ca...> wrote: > Doing the same on my system, I also got these ones (that weren't on the > original blacklist): > > ether_ntoa ether_aton These look like they need to go it. > nis_leaf_of nis_name_of nis_domain_of nis_sperror > getaliasent getaliasbyname I can't see any documentation for these, so it's difficult to say. > gethostbyname2 getservbyname These look like they need to go it. > getnetgrent Probably, although it doesn't seem to have a manual page. > getpwnam I think this is already in isn't it? If it isn't then it needs to go in. > getspent getspnam > sgetspent fgetspent These are for the shadow file, and probably need to go in although they don't seem to have manual pages... > rand Yes, along with srand if it isn't in the other list. > ptsname Probably, although it doesn't seem to have a manual page. > getdate getutent getutid getutline Yes, along with pututline I think. > BIO_gethostbyname That's an openssl routine. It may need to go in though, but I was mainly looking at C library routines. Tom -- Tom Hughes (th...@cy...) Software Engineer, Cyberscience Corporation http://www.cyberscience.com/ |
From: Steve G <lin...@ya...> - 2003-07-03 13:46:16
|
> ether_ntoa ether_aton > nis_leaf_of nis_name_of nis_domain_of nis_sperror > getaliasent getaliasbyname gethostbyname2 getservbyname > getnetgrent getpwnam getspent getspnam > sgetspent fgetspent > rand > ptsname getdate getutent getutid getutline > BIO_gethostbyname > >Should they go in too? Some of these are esoteric and/or undocumented. For example, getaliasent has no man page. What does it do? Is it a public API? With no man page, I have no idea what its doing or if its safe. The ether functions page says it uses static buffers, so adding this would be good. nis_*, BIO_*, pts*, getut*, ?getspent, getsp* are undocumented (no man page). I suppose we can err on the side of caution by adding all of them and wait for someone to prove it otherwise. -Steve Grubb __________________________________ Do you Yahoo!? SBC Yahoo! DSL - Now only $29.95 per month! http://sbc.yahoo.com |
From: Tom H. <th...@cy...> - 2003-07-03 14:06:16
|
In message <200...@we...> Steve G. <lin...@ya...> wrote: > nis_*, BIO_*, pts*, getut*, ?getspent, getsp* are > undocumented (no man page). I suppose we can err on the > side of caution by adding all of them and wait for someone > to prove it otherwise. ptsname is documented in SUS: http://www.opengroup.org/onlinepubs/007904975/functions/ptsname.html Most of BIO_* has manual pages, just not that one routine as far as I can see. I haven't checked openssl.org to see if there is any doco there for it. Tom -- Tom Hughes (th...@cy...) Software Engineer, Cyberscience Corporation http://www.cyberscience.com/ |
From: Dan K. <da...@ke...> - 2003-07-03 15:05:49
|
Nicholas Nethercote wrote: > Doing the same on my system, I also got these ones (that weren't on the > original blacklist): > > ether_ntoa ether_aton > nis_leaf_of nis_name_of nis_domain_of nis_sperror > getaliasent getaliasbyname gethostbyname2 getservbyname > getnetgrent getpwnam getspent getspnam > sgetspent fgetspent > rand > ptsname getdate getutent getutid getutline > BIO_gethostbyname > > Should they go in too? You can check SuSv3 for each standard function; it's pretty good about telling you what's threadsafe/reentrant and what isn't. e.g. http://www.opengroup.org/onlinepubs/007904975/functions/rand.html says "[CX] The rand() function need not be reentrant. A function that is not required to be reentrant is not required to be thread-safe. " BTW, this tool of yours is pretty cool. I bet a similar test could be applied at compile time using smatch http://smatch.sourceforge.net/ That'd be quite a combination -- catch a bunch of them statically, and then use your dynamic tool to find some the static checker missed... - Dan -- Dan Kegel http://www.kegel.com http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045 |
From: Nicholas N. <nj...@ca...> - 2003-07-03 15:47:51
|
On Thu, 3 Jul 2003, Dan Kegel wrote: > BTW, this tool of yours is pretty cool. I bet a similar > test could be applied at compile time using smatch > http://smatch.sourceforge.net/ > That'd be quite a combination -- catch a bunch of them statically, > and then use your dynamic tool to find some the static checker missed... Ah, I was aware of the Stanford checking project but not smatch. That's cool. I had thought that this signal handler checking would be much better done statically. Crocus may be put out of business before it's even started! But that would be fine, a static checker would be better. You mention using the dynamic tool for the ones the static tool misses, have you used smatch and know that static misses are common, or is this just a guess? I imagine that signal handler checking would be pretty straightforward to do statically. And once people know they shouldn't call these function from signal handlers, it's not a problem that's hard to fix, it's more just a case of knowing about it in the first place. As opposed to eg. memory errors, where a dynamic automated tool is very useful for finding obscure bugs. N |
From: Dan K. <da...@ke...> - 2003-07-03 16:07:54
|
Nicholas Nethercote wrote: > On Thu, 3 Jul 2003, Dan Kegel wrote: >>BTW, this tool of yours is pretty cool. I bet a similar >>test could be applied at compile time using smatch >>http://smatch.sourceforge.net/ >>That'd be quite a combination -- catch a bunch of them statically, >>and then use your dynamic tool to find some the static checker missed... > > > Ah, I was aware of the Stanford checking project but not smatch. That's > cool. I had thought that this signal handler checking would be much > better done statically. Crocus may be put out of business before it's > even started! But that would be fine, a static checker would be better. We need both! > You mention using the dynamic tool for the ones the static tool misses, > have you used smatch and know that static misses are common, or is this > just a guess? I imagine that signal handler checking would be pretty > straightforward to do statically. And once people know they shouldn't > call these function from signal handlers, it's not a problem that's hard > to fix, it's more just a case of knowing about it in the first place. As > opposed to eg. memory errors, where a dynamic automated tool is very > useful for finding obscure bugs. Consider the case where you're doing QA on an open source project which is huge and hard to compile, and the developers haven't seen fit to run the static checker on the whole source base yet. Or consider a closed-source app. In both cases, the dynamic checker will work, and the static one won't (because the source is hard to get or configure). - Dan -- Dan Kegel http://www.kegel.com http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045 |
From: Nicholas N. <nj...@ca...> - 2003-07-03 16:20:03
|
On Thu, 3 Jul 2003, Dan Kegel wrote: > > But that would be fine, a static checker would be better. > > We need both! > > Consider the case where you're doing QA on an open source > project which is huge and hard to compile, and the developers haven't > seen fit to run the static checker on the whole source base yet. > Or consider a closed-source app. In both cases, the dynamic > checker will work, and the static one won't (because the source is > hard to get or configure). Good point. Three cheers for dynamic binary translation! :) N |
From: Steve G <lin...@ya...> - 2003-07-03 16:16:01
|
>I had thought that this signal handler checking would >be much better done statically. Crocus may be put out >of business before it's even started! Doing it at compile time would be better...however...there's many times I'm helping someone and out of curiousity we run valgrind, libsafe, or electric fence to see what's going on with the binary. There will always be a need for tools that do not require getting the source code just to check out a misbehaving application. You also may not have the source code if its a commercial app. You could still check it out and send a better bug report. -Steve Grubb __________________________________ Do you Yahoo!? SBC Yahoo! DSL - Now only $29.95 per month! http://sbc.yahoo.com |
From: Steve G <lin...@ya...> - 2003-07-03 11:42:43
|
>And maybe I should use the term "signal safe" when talking >about functions that can be called from signal handlers, >rather than "reentrant"? Or are they the same thing? They are different. A function that is re-entrant is one that may be called by a different thread of execution concurrently and it won't cause any ill effects. Signal safe means that the function is not using static data buffers, not using library data structures that may be in unsafe state, and only calling functions that are in the whitelist or functions composed of those on the whitelist. Signal handlers should be both signal-safe and re-entrant. Common functions in a Pthreads program should be re-entrant. Glad to see other people finding bugs in signal handlers. This was the area of most problems in a lot of code reviews I've been doing. I'm reviewing some code from freeradius right now and its sigchld signal handler does bad things...which might explain why it crashes after 6-7 days of heavy use. Fixing signal handlers is simple once you are aware of a problem. It seems people were not aware. -Steve Grubb __________________________________ Do you Yahoo!? SBC Yahoo! DSL - Now only $29.95 per month! http://sbc.yahoo.com |
From: Steve G <lin...@ya...> - 2003-07-05 23:50:28
|
Nicholas, I ran across a function that should be on the good_funcs list, sem_post(). According to the man page, it has the distinction of being the ONLY pthreads synchronization function that is async-signal safe. -Steve Grubb __________________________________ Do you Yahoo!? SBC Yahoo! DSL - Now only $29.95 per month! http://sbc.yahoo.com |
From: John R. <jr...@Bi...> - 2003-07-07 02:57:22
|
On Wed, 2003-07-02 at 03:05, Nicholas Nethercote wrote: > Many > functions are non-reentrant, including lots of common ones like printf() > (most standard I/O functions, in fact) and malloc(). Yes, but in practice the glibc implementation of printf/fprintf can be used safely from a signal handler under restricted circumstances. The first is that malloc() must not be called [indirectly] from the signal handler. The reason why printf/fprintf would call malloc() is to allocate a buffer for the output characters. If the stream defaults to non-buffered, or if the buffer has been allocated already, then printf/fprintf does not call malloc. So, if there has been a [completed] call to printf/fprintf before the signal handler is invoked [this will allocate a buffer], or if the stream defaults to non-buffered [such as for fprintf(stderr, ...)], or if the program calls setvbuf() on the stream before the signal handler is invoked, then printf/fprintf need not call malloc(). Secondly, the usage from the handler must not collide with usage of the same stream from non-signal level. The easiest way is to segregate entirely the signal-level streams from the non-signal level streams, although a protocol for sharing a stream between signal level and non-signal level also is possible. You can also fdopen(dup(fileno(stream)), ...) [needs error checking] to obtain a separate stream "that goes to the same place" for use by a signal handler. [Output from the two streams will be intermixed.] Of course, this is not necessarily portable to non-glibc implementations. But it may be useful. |
From: Jeremy F. <je...@go...> - 2003-08-18 00:32:54
|
On Wed, 2003-07-02 at 03:05, Nicholas Nethercote wrote: > I've written a new skin for Valgrind. It's called Crocus, and it searches > for problems in signal handlers. Have you considered adding helgrind-like functionality to look for signal races? Any code which needs to do something moderately complex in a signal handler needs to make sure that it uses sigprocmask() to prevent signals from happening at the wrong time in the mainline code. I have a chunk of code (autofs4's automount daemon) which does quite a lot of stuff from signal handlers, and it would be nice to check for correct behaviour. J |
From: Nicholas N. <nj...@ca...> - 2003-08-18 07:27:55
|
On Sun, 17 Aug 2003, Jeremy Fitzhardinge wrote: > > I've written a new skin for Valgrind. It's called Crocus, and it searches > > for problems in signal handlers. > > Have you considered adding helgrind-like functionality to look for > signal races? Any code which needs to do something moderately complex > in a signal handler needs to make sure that it uses sigprocmask() to > prevent signals from happening at the wrong time in the mainline code. No, but I wasn't aware it was something to look out for. Is it clear what needs to be checked, ie. is there a firm definition of "moderately complex"? N |
From: Jeremy F. <je...@go...> - 2003-08-18 07:51:38
|
On Mon, 2003-08-18 at 00:13, Nicholas Nethercote wrote: > No, but I wasn't aware it was something to look out for. Is it clear what > needs to be checked, ie. is there a firm definition of "moderately > complex"? In simple terms, modifying data-structures in a signal handler. You could probably modify helgrind itself, or at least its algorithm, to handle this case. In effect, there are two threads: the main-line code, and the signal handlers (extra complication: signal handlers can be recursive). Signals, if delivered, cause the context switch. The mutex operator is sigprocmask, so you set a fixed set of NSIG locks. The general idea that if you're touching a piece of memory and the handler for signal N also touches that memory, you'd better have signal N blocked. You can easily tell when you start running a signal handler, because a signal is delivered. The tricky part is knowing when you've stopped - sigreturn is easy, but you can longjmp out of a handler. On the other hand, maybe it doesn't matter. J |
From: Steve G <lin...@ya...> - 2003-08-18 14:09:25
|
>The general idea that if you're touching a piece of >memory and the handler for signal N also touches that >memory, you'd better have signal N blocked. I suspect if you see code that does this, it should be redesigned. Some good examples of serialization are stunnel 4.04 or xinetd-2.3.12. Come to think of it, postfix may be another example of proper serialization. I suppose an approach might be something like this: Each time a signal is about to be delivered, check the mask. If any unblocked signal (besides the one about to be delivered) has a user defined signal handler, fake delivering a signal to it and only note the addresses it wanted to read data from or write data to and what the access type is (read or write). Don't let it actually write, reading is ok since its non-destructive. Then after faking a signal to all user defined handlers, check the list each time the current signal handler accesses a memory location. Only read/write or write/write collisions matter. The list gets thrown away at the end of the signal handler. I don't think performance is an issue since signals don't get delivered very often in normal applications. The operating system even delays delivery until a system call is made. I think this is managable so long as preventing writes from the other signal handlers is possible. I have no idea if this is tricky in the virtual cpu. I think finding these problems does matter and is worthwhile. Does anyone see issues with the above algorithm? Of course, I am not familiar with the inner workings of valgrind and someone else may have to code it. Best Regards, -Steve Grubb __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com |
From: Jeremy F. <je...@go...> - 2003-08-19 06:13:58
|
On Mon, 2003-08-18 at 06:47, Steve G wrote: > I suppose an approach might be something like this: Each > time a signal is about to be delivered, check the mask. If > any unblocked signal (besides the one about to be > delivered) has a user defined signal handler, fake > delivering a signal to it and only note the addresses it > wanted to read data from or write data to and what the > access type is (read or write). Don't let it actually > write, reading is ok since its non-destructive. Then after > faking a signal to all user defined handlers, check the > list each time the current signal handler accesses a memory > location. Only read/write or write/write collisions matter. > The list gets thrown away at the end of the signal handler. So you're saying speculatively run the signal handler without side effects? I think that's pretty tricky. For one, it may have a read which depends on one of its earlier writes. Even more tricky, it may write to memory with a syscall, which would be hard to make side-effect-free. I think a helgrind-like algorithm would be more appropriate: For each memory location, maintain a state. There are 4 states: 1. untouched 2. private - only touched by mainline code or a handler for a particular signal, but not both 3. read-shared - only ever read by mainline and signal handlers 4. RW-shared - both read and written by mainline and signal handlers In RW-shared state, it keeps a set of signal handlers which touch the memory location. If any code touches a location in RW-shared state, it must have that set of signals blocked, otherwise it reports an error. The tricky part is trying to determine if a given instruction is being run as part of a signal handler or not. Entering the signal handler is pretty obvious, since Valgrind has to go to some effort to deliver a signal to a thread. Detecting when the application leaves the handler is more tricky. If the handler does a normal return it's easy enough, but if it longjmps out of the handler it's harder to tell. I think the sure-fire way which works in both cases is to detect when ESP goes above the signal stack frame - this will happen in both the longjmp and sigreturn cases. The other potentially tricky thing to handle is recursive signals. I need to think about it more - it may just fall out in the algorithm above. I think if you consider the mainline code to also be a particular form of signal handler (SIGMAIN) then it comes out naturally. J |
From: John R. <jr...@Bi...> - 2003-08-18 14:14:55
|
> The general idea that if you're touching a piece of memory and the > handler for signal N also touches that memory, you'd better have signal > N blocked. But only for "complex" situations. For "simple" cases the _recommended_ methodolgy violates this rule. Namely, the signal handler sets (or increments) a global variable, and the non-signal handling code checks this variable at convenient times (to see if it has changed from the last time the non-handler code checked its value). Even in more complicated situations, a circular buffer between the signal handler as producer and the mainline code as consumer, would violate the rule about "no touching" the control pointers of the circular buffer. What did you really mean? |
From: Jeremy F. <je...@go...> - 2003-08-18 15:59:18
|
On Mon, 2003-08-18 at 06:59, John Reiser wrote: > > The general idea that if you're touching a piece of memory and the > > handler for signal N also touches that memory, you'd better have signal > > N blocked. > > But only for "complex" situations. For "simple" cases the _recommended_ > methodolgy violates this rule. Namely, the signal handler sets (or > increments) a global variable, and the non-signal handling code checks > this variable at convenient times (to see if it has changed from the > last time the non-handler code checked its value). Well, you probably wouldn't bother running it under crocus to test its correctness in that case. In my experience, however, programs which use such a simple scheme are prone to signal race deadlocks. If its a program which can truly work with only a flag set in the signal handler, it would probably be much better off to simply block all signals and poll for pending signals with sigwait(). If it relies on the signal to interrupt a syscall, then it almost certainly has deadlock bugs. > Even in more complicated situations, a circular buffer between the > signal handler as producer and the mainline code as consumer, > would violate the rule about "no touching" the control pointers > of the circular buffer. Well, surely the buffer pointer updates need to be protected, unless you're using strictly atomic operations? > What did you really mean? Erm, what I said, I think. I think the most common case of manipulating data structures from a signal handler is in programs which keep track of their child processes: they seem to do a lot of work in their SIGCHLD handlers. J |