|
From: Dan K. <da...@ke...> - 2003-08-18 16:54:01
|
Jeremy Fitzhardinge wrote:
>> Even in more complicated situations, a circular buffer between the
>> signal handler as producer and the mainline code as consumer,
>> would violate the rule about "no touching" the control pointers
>> of the circular buffer.

Yep. In other words, there's something iffy about the whole idea of
signal handlers

> Well, surely the buffer pointer updates need to be protected, unless
> you're using strictly atomic operations?
>
>> What did you really mean?
>
> Erm, what I said, I think.
>
> I think the most common case of manipulating data structures from a
> signal handler is in programs which keep track of their child
> processes: they seem to do a lot of work in their SIGCHLD handlers.
>
> J

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045
|
From: Dan K. <da...@ke...> - 2003-08-18 18:11:26
|
Dan Kegel wrote:
> Jeremy Fitzhardinge wrote:
>>> Even in more complicated situations, a circular buffer between the
>>> signal handler as producer and the mainline code as consumer,
>>> would violate the rule about "no touching" the control pointers
>>> of the circular buffer.
>
> Yep. In other words, there's something iffy about the
> whole idea of signal handlers

Arrgh. Clicked 'send' by mistake. I meant to say "classic signal
handlers". The current Unix standard,
http://www.opengroup.org/onlinepubs/007904975/functions/sigaction.html
has this to say about the issue:

"In order to prevent errors arising from interrupting non-reentrant
function calls, applications should protect calls to these functions
either by blocking the appropriate signals or through the use of some
programmatic semaphore (see semget(), sem_init(), sem_open(), and so
on). ..."

- Dan
|
From: Steve G <lin...@ya...> - 2003-08-18 17:56:11
|
> In my experience, however, programs which use such a
> simple scheme are prone to signal race deadlocks.

Not really... it just depends on what your job is. I rewrote the
signal handlers in proftpd about a year and a half ago using flags.
The signal handlers set a flag when something of interest occurs. The
handler is called from the main loop, which periodically polls the
flag. The handlers reset the flag first, and then do the work. They
completely process everything pending. The worst case is that the
same signal is delivered while processing that flag, which causes a
call to the signal handler for which no work is waiting.

> If it relies on the signal to interrupt a syscall, then
> it almost certainly has deadlock bugs.

Almost all daemons depend on a signal to get them out of select().

> Well, surely the buffer pointer updates need to be
> protected, unless you're using strictly atomic
> operations?

There's no way to protect data structures from a signal handler using
process synchronization techniques. Either you do your homework up
front in the design, making sure nothing steps on anything else, or
you use signals as flags and not as hardware interrupt handlers.

In case you missed the way that stunnel works: it creates a signal
pipe. Whenever a signal comes in, the handler writes to the pipe. The
pipe is added to the fd_set that select() watches, so whenever a
signal occurs, the pipe becomes readable and the main loop goes off to
handle that event. write() is a legal function call from a signal
handler. What's written to the pipe is an integer of sig_atomic_t
size. Xinetd does the same thing, except it uses a different integer
for each signal. The only issue you have to watch when using the pipe
technique for serialization is to call fcntl(fd, F_SETFD, FD_CLOEXEC)
so you don't leak the pipe descriptor.

> I think the most common case of manipulating data
> structures from a signal handler is in programs which
> keep track of their child processes:

These are all fundamentally broken. See
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2002-1563 for a
good reason why stunnel moved to its current technique.

-Steve Grubb

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com
|
From: Jeremy F. <je...@go...> - 2003-08-18 18:01:05
|
On Mon, 2003-08-18 at 10:19, Steve G wrote:
>> In my experience, however, programs which use such a
>> simple scheme are prone to signal race deadlocks.
>
> Not really... it just depends on what your job is. I rewrote the
> signal handlers in proftpd about a year and a half ago using flags.
> The signal handlers set a flag when something of interest occurs.
> The handler is called from the main loop, which periodically polls
> the flag. The handlers reset the flag first, and then do the work.
> They completely process everything pending. The worst case is that
> the same signal is delivered while processing that flag. This causes
> a call to the signal handler for which no work is waiting.

Yes, this is the case where sigwait/sigtimedwait allows you to consume
pending signals without async delivery - which makes the whole issue
of calling handlers moot.

> There's no way to protect data structures from a signal handler
> using process synchronization techniques. Either you do your
> homework up front in the design, making sure nothing steps on
> anything else, or you use signals as flags and not as hardware
> interrupt handlers.

Well, that's not really true. You can use sigprocmask to defer
delivery of a signal for as long as you need to, so you can use it to
protect your critical sections.

> In case you missed the way that stunnel works: it creates a signal
> pipe. Whenever a signal comes in, the handler writes to the pipe.
> The pipe is added to the fd_set that select() watches, so whenever a
> signal occurs, the pipe becomes readable and the main loop goes off
> to handle that event. write() is a legal function call from a signal
> handler. What's written to the pipe is an integer of sig_atomic_t
> size.

Right, that's what I do in the autofs4 automount daemon as well. It
means that you're not relying on the signal to interrupt the syscall
directly; you're using the normal select mechanisms. There's a
theoretical risk of deadlock if the pipe fills up, but that just means
you need to select/poll often enough.

> Xinetd does the same thing, except it uses a different integer for
> each signal. The only issue you have to watch when using the pipe
> technique for serialization is to call fcntl(fd, F_SETFD,
> FD_CLOEXEC) so you don't leak the pipe descriptor.

If you're forking child processes (no exec) you also have to be
careful not to end up sharing your message pipe. (I'd really like
CLOFORK.)

>> I think the most common case of manipulating data
>> structures from a signal handler is in programs which
>> keep track of their child processes:
>
> These are all fundamentally broken. See
> http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2002-1563 for a
> good reason why stunnel moved to its current technique.

I disagree, because you can use sigprocmask to get it right. On the
other hand, it is error-prone (like any multi-threaded programming),
so it would be nice to have a tool to test for correctness. Hence this
thread on this list.

J
|
From: John R.
|
Jeremy Fitzhardinge wrote:
> On Mon, 2003-08-18 at 06:59, John Reiser wrote:
>> Even in more complicated situations, a circular buffer between the
>> signal handler as producer and the mainline code as consumer,
>> would violate the rule about "no touching" the control pointers
>> of the circular buffer.
>
> Well, surely the buffer pointer updates need to be protected, unless
> you're using strictly atomic operations?

No, the buffer pointer updates do not need to be "protected". As long
as the width of the pointer equals the width of the memory bus [or the
pointer is narrower and the memory has natural operations of that
width, and the code uses them], and the storage for the pointer itself
is naturally aligned [width divides address], then ordinary load and
store operations suffice. This is commonly the case; the significant
exceptions are on "8-bit" (or less) microcontrollers where a pointer
is wider than the memory bus.

It may be argued that this makes the ordinary read+write of the
control pointers "strictly atomic", but that is much of the point.
The ordinary operations are enough; you don't need privilege, assembly
code, a special compiler, or special tricks. This is a major point of
a [properly implemented] circular buffer. No bus interlock, no
compare-and-swap, no semaphore. Ordinary load and store are atomic
enough. The interlocks are implicit in the logic of the control
pointers.

For review: A circular buffer implements a one-way reliable channel
between two process[es|ors] having read+write access to a common
memory, without any hardware interlocks except ordinary read+write
memory operations that are monotonic in time-causality. The buffer
consists of a region of memory and four indices: FIRST, IN, OUT, LAST.
At all times FIRST <= IN <= LAST and FIRST <= OUT <= LAST. FIRST and
LAST are constant for the duration of the channel. Only the producer
writes IN, and only the consumer writes OUT. If OUT == IN then the
buffer is empty. If OUT < IN then the region beginning at memory[OUT]
and extending up to, but not including, memory[IN] has data from the
producer for the consumer. If IN < OUT then the region beginning at
memory[OUT] and continuing up to, but not including, memory[LAST] has
the first part of the data, and the region from memory[FIRST] up to,
but not including, memory[IN] has the second part of the data from the
producer for the consumer. Both the producer and consumer can operate
simultaneously and achieve reliable one-way communication with low
overhead.

An early commercial implementation appeared in the Control Data
Corporation (CDC) 6600/6500/6400 machines in the late 1960s. They had
11 or 12 processors: 1 or 2 CPUs, plus 10 PPUs (Peripheral Processor
Units). Each PPU had a 4K-word (18-bit) memory of its own, plus access
to the 60-bit-wide main memory. All I/O channels were connected to the
PPUs, and not to the main CPU memory. Ordinarily, user code ran only
on the CPU, and only the operating system ran on the PPUs. The only
hardware interlock in the machine was the power-up lock, which
disabled every processor except PPU0 for system initialization. All
I/O to/from main memory was conducted by a PPU using a circular buffer
in main memory. The protocol was such that it was possible to stream
a magnetic tape drive (10.5-inch reel of 0.5-inch tape, 2400 feet
long, 556/800/1600 chars per inch) and have a block size of 30
megabytes [the whole tape, with no inter-record gaps], with a
user-mode program in the CPU doing meaningful processing of the data
[not just copying]. The usual case with records of a few kilobytes or
less [and gaps of a fraction of an inch with no data on the tape,
which enabled the drive to start and stop if necessary without losing
data] was even easier to stream.
|
From: Dan K. <da...@ke...> - 2003-08-18 18:33:37
|
John Reiser wrote:
>> Well, surely the buffer pointer updates need to be protected,
>> unless you're using strictly atomic operations?
>
> No, the buffer pointer updates do not need to be "protected". As
> long as the width of the pointer equals the width of the memory bus
> [or the pointer is narrower and the memory has natural operations of
> that width, and the code uses them], and the storage for the pointer
> itself is naturally aligned [width divides address], then ordinary
> load and store operations suffice. This is commonly the case; the
> significant exceptions are on "8-bit" (or less) microcontrollers
> where a pointer is wider than the memory bus.

Don't forget the other significant exception, which is on SMP
machines. You may need memory barrier instructions to avoid caching
problems if variables are used by more than one processor.

Semaphores provide the needed memory barriers. I'm sure some pthread
stuff does, too. Volatile variables don't, I think (except maybe in
Java :-)

It's such a pain that I personally use synchronous signal delivery for
absolutely everything...

- Dan
|
From: Steve G <lin...@ya...> - 2003-08-18 19:35:47
|
> Semaphores provide the needed memory barriers.

You cannot make any semaphore calls from a signal handler.

> I'm sure some pthread stuff does, too.

You cannot make any pthread calls either. Earlier you posted a link to
the SUS, but if you look at the signal concepts page:

http://www.opengroup.org/onlinepubs/007904975/functions/xsh_chap02_04.html#tag_02_04

you will see that sem_post() is the only interprocess communication
call allowed in a signal handler. Signal handlers are somewhat like
multithreaded programming, but an entirely different animal. They have
to be much simpler and only call the "whitelisted" functions, if they
call anything at all.

-Steve Grubb
|
From: John R.
|
Dan Kegel wrote:
> John Reiser wrote:
>>> Well, surely the buffer pointer updates need to be protected,
>>> unless you're using strictly atomic operations?
>>
>> No, the buffer pointer updates do not need to be "protected".
> [snip]
>> the significant exceptions are on "8-bit"
>> (or less) microcontrollers where a pointer is wider than the memory
>> bus.
>
> Don't forget the other significant exception, which is on SMP
> machines. You may need memory barrier instructions to
> avoid caching problems if variables are used by more than
> one processor.

Cache coherency suffices. [If not cache coherent, then there is not
_a_ memory: each cache is a different memory.] In one sense, cache
coherency is stronger than necessary: as long as a write eventually
propagates to the other cache [and becomes the value fetched by a
read], then communication succeeds. The delay can affect throughput
and latency, but not correctness. Multiple writes to the same control
pointer [which are done by only one processor] may even be "optimized"
by suppressing all but the most recent one.

> Semaphores provide the needed memory barriers. I'm sure
> some pthread stuff does, too.
> Volatile variables don't, I think (except maybe in Java :-)
>
> It's such a pain that I personally use synchronous signal
> delivery for absolutely everything...
> - Dan
|
From: Erik C. <er...@ar...> - 2003-08-18 20:47:08
|
On Mon, Aug 18, 2003 at 01:08:11PM -0700, John Reiser wrote:
> Dan Kegel wrote:
>> John Reiser wrote:
>>>> Well, surely the buffer pointer updates need to be protected,
>>>> unless you're using strictly atomic operations?
>>>
>>> No, the buffer pointer updates do not need to be "protected".
> [snip]
>>> the significant exceptions are on "8-bit"
>>> (or less) microcontrollers where a pointer is wider than the
>>> memory bus.
>>
>> Don't forget the other significant exception, which is on SMP
>> machines. You may need memory barrier instructions to
>> avoid caching problems if variables are used by more than
>> one processor.
>
> Cache coherency suffices.

No, it's also necessary that the CPU not reorder writes. x86 never
does, but I think the Alpha, for example, does unless you put in a
memory barrier. See also
http://www-1.ibm.com/servers/esdd/articles/power4_mem.html

Also, you have to consider the risk that the compiler will reorder
writes. It can do this if there is no difference in a single-CPU
scenario. To avoid this you need a compiler-level memory barrier.
Linux uses:

    #define wmb() __asm__ __volatile__ ("": : :"memory")

which is an empty assembler statement that has "clobbering memory" as
a side effect. The effect is that gcc can't move a store across it.
On other compilers you can probably achieve the same effect, with
lower performance, by inserting a call to an extern function. See also
http://groups.google.com/groups?selm=linux.kernel.200007272355.QAA03148%40pizda.ninka.net

--
Erik Corry er...@ar...
|
From: John R.
|
Erik Corry wrote:
> On Mon, Aug 18, 2003 at 01:08:11PM -0700, John Reiser wrote:
>
>>Dan Kegel wrote:
>>
>>>John Reiser wrote:
>>>
>>>
>>>>>Well, surely the buffer pointer updates need to be protected, unless
>>>>>you're using strictly atomic operations?
>>>>
>>>>
>>>>No, the buffer pointer updates do not need to be "protected".
>>>
>>[snip]
>>>>the significant exceptions are on "8-bit"
>>>>(or less) microcontrollers where a pointer is wider than the memory bus.
>>>
>>>
>>>Don't forget the other significant exception, which is on SMP
>>>machines. You may need memory barrier instructions to
>>>avoid caching problems if variables are used by more than
>>>one processor.
>>
>>Cache coherency suffices.
>
>
> No, it's also necessary that the CPU not reorder writes. x86 never does, but
> I think the Alpha, for example, does unless you put in a memory barrier.
>
No, cache coherency suffices, really! At least for the control pointers;
I agree that a write ordering memory barrier is necessary after writing
the data and before writing the control pointer.
Remember, each processor writes only _one_ control pointer. None of
the memory re-ordering schemes allow re-ordering writes to the _same_
address. If one [the only] processor writes successive values 1, 2, 3, 4, 5
to the _same_ memory address (say, 0x89abc), then it is not legal
for any sequence of reads of that address 0x89abc, by _any_ [one]
processor (including the processor that does the writing), to return
1, 3, 2, 5, 4; or anything except a monotonic subsequence of the
ordered sequence {<some prior value>, 1, 2, 3, 4, 5}.
|
|
From: Steve G <lin...@ya...> - 2003-08-19 01:14:22
|
> No, cache coherency suffices, really!

You can't say that without having a particular problem in mind. For
example, cache coherency doesn't suffice when I/O is in the mix.

I think this thread has strayed way beyond Jeremy's original post. He
was asking if it was worthwhile to have something like a Helgrind race
detector. Rather than pontificating on the various ways a signal
handler can or cannot work and how to synchronize, why don't we
discuss the *merits* of a race detector?

I think the sum of this thread is that signal handlers can be tricky.
There are many that are bad. Should something be done to help locate
problems? In one of the first e-mails of this thread, I wrote an
algorithm that may help locate those problems. Does it work? Can
anyone see a better way to do it? Am I the only one who believes a
race detector would be helpful?

This is an interesting subject, but I think we are off track. Jeremy's
original question was thoughtful and deserves more consideration.

Best Regards,
-Steve Grubb
|
From: Nicholas N. <nj...@ca...> - 2003-08-19 07:38:42
|
On Mon, 18 Aug 2003, Steve G wrote:
> I think this thread has strayed way beyond Jeremy's
> original post. He was asking if it was worthwhile to have
> something like a Helgrind race detector.

Hear hear!

> Should something be done to help locate problems? In one of
> the first e-mails of this thread, I wrote an algorithm that
> may help locate those problems. Does it work? Can anyone
> see a better way to do it?

As Jeremy said, it sounds like it would be difficult to do. His
Helgrind/Eraser-like algorithm sounds more suitable. However, adding
it would make Crocus much more complicated, and I have enough things
on my plate as it is. Someone else is very welcome to have a go at
adding the functionality.

> Am I the only one that believes a race detector would be helpful?

I think you are much more aware of these things than most programmers,
due to your experience with daemons, etc.

N