From: Travis O. <oli...@ie...> - 2006-08-23 18:45:30
|
I'm working on some macros that will allow extensions to be "interruptable" (i.e. with Ctrl-C). The idea came from SAGE, but the implementation is complicated by the possibility of threads and by the need to handle clean-up code correctly when the interrupt returns.

I'd like to get this into 1.0 final. Nothing needed will require re-compilation of extension modules built for 1.0b2, however. This will be strictly "extra", and if an extension module doesn't use it there will be no problems.

Step 1:

Define the interface. Here are a couple of draft proposals. Please comment on them.

1) General purpose interface

    NPY_SIG_TRY {
        [code]
    }
    NPY_SIG_EXCEPT(signum) {
        [interrupt handling return]
    }
    NPY_SIG_ELSE
        [normal return]

The idea of signum is to hold the signal actually caught.

2) Simpler interface

    NPY_SIG_TRY {
        [code]
    }
    NPY_SIG_EXCEPT_GOTO(label)
    [normal return]

    label:
        [interrupt handling return]

C-extensions often use the notion of a label to handle failure code.

If anybody has any thoughts on this, they would be greatly appreciated.

Step 2:

Implementation. I have the idea to have a single interrupt handler (defined globally in NumPy) that basically uses longjmp to return to the section of code corresponding to the thread that is handling the interrupt. I had thought to use a global variable containing a linked list of jmp_buf structures with a thread-id attached (PyThread_get_thread_ident()) so that the interrupt handler can search it to see if the thread has registered a return location. If it has not, then the interrupt handler will just return normally. In this way a thread that calls setjmp will be sure to return to the correct place when it handles the interrupt.

Concern:

My thinking is that this mechanism should work whether or not the GIL is held, so that we only have to worry about the GIL in the interrupt handling case (when Python exceptions are to be set). But, honestly, this gets very confusing.

The sigsetjmp / siglongjmp mechanism for handling interrupts is not recommended under windows (not sure about mingw), but there we could possibly use Microsoft's __try and __except extension to implement it. Initially, it would be "un-implemented" on platforms where it didn't work.

Any comments are greatly appreciated

-Travis
|
From: Perry G. <pe...@st...> - 2006-08-23 21:41:12
|
I thought it might be useful to give a little more context on the problems involved in handling such interruptions. Basically, one doesn't want to exit out of places where data structures are incompletely set up, or where memory isn't properly handled, so that later references to these don't cause segfaults (or memory leaks). There may be more exotic cases, but typically many extensions are as simple as:

1) Figure out what inputs one has and the mode of computation needed
2) allocate and set up output arrays
3) do computation, possibly lengthy, over arrays
4) free temporary arrays and other data structures
5) return results

Typically, the interrupt handling is needed only for 3, the part that it may spend a very long time in. 1, 2, 4, and 5 are not worth interrupting, and are the areas that may cause the most trouble. I'd argue that many things could do with a very simple structure where section 3 is bracketed with macros. Something like:

    NPY_SIG_INTERRUPTABLE
    [long looping computational code that doesn't create or destroy objects]
    NPY_SIG_END_INTERRUPTABLE

followed by the normal code to do 4 and 5. What happens during an interrupt is that the computation code is exited and execution resumes right after the closing macro. Very often one doesn't care that the results in the arrays may be incomplete, or contain invalid numbers (presumably you know that since you just did control-C, but maybe I'm confused).

Any reason that most cases couldn't be handled with something this simple?

All cases can't be handled with this, but most should, I think.

Perry

On Aug 23, 2006, at 2:45 PM, Travis Oliphant wrote:

> I'm working on some macros that will allow extensions to be
> "interruptable" (i.e. with Ctrl-C). The idea came from SAGE but the
> implementation is complicated by the possibility of threads and making
> sure to handle clean-up code correctly when the interrupt returns.
|
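The bracketing structure described in the message above can be sketched as follows. This is only an illustration under stated assumptions: a single-threaded POSIX process, and hypothetical names throughout (HYP_SIG_INTERRUPTIBLE, HYP_SIG_END_INTERRUPTIBLE, hyp_map). It is not how the NumPy macros are defined.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; single-threaded, POSIX-only sketch. */
static sigjmp_buf hyp_jump;
static volatile sig_atomic_t hyp_armed = 0;

static void hyp_on_sigint(int signum)
{
    (void)signum;
    if (hyp_armed) {
        hyp_armed = 0;
        siglongjmp(hyp_jump, 1);   /* leave the bracketed section */
    }
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int hyp_install_handler(void)
{
    struct sigaction action;
    action.sa_handler = hyp_on_sigint;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    return sigaction(SIGINT, &action, NULL);
}

/* The bracketed code must not create or destroy objects. */
#define HYP_SIG_INTERRUPTIBLE \
    if (sigsetjmp(hyp_jump, 1) == 0) { hyp_armed = 1;
#define HYP_SIG_END_INTERRUPTIBLE \
    hyp_armed = 0; }

/* Applies f to in[0..n-1]. Returns 0, or -1 if allocation fails.
 * After an interrupt, the untouched tail of out[] is zero. */
int hyp_map(double (*f)(double), const double *in, double *out, size_t n)
{
    volatile size_t i;   /* volatile: modified between sigsetjmp and siglongjmp */
    double *scratch = calloc(n ? n : 1, sizeof *scratch);   /* step 2 */
    if (scratch == NULL) {
        return -1;
    }

    HYP_SIG_INTERRUPTIBLE                                   /* step 3 */
        for (i = 0; i < n; i++) {
            scratch[i] = f(in[i]);
        }
    HYP_SIG_END_INTERRUPTIBLE

    memcpy(out, scratch, n * sizeof *scratch);   /* results may be incomplete */
    free(scratch);                               /* step 4 always runs */
    return 0;                                    /* step 5 */
}
```

Because the allocation happens before the opening macro and the free after the closing one, an interrupt cannot skip the clean-up; the cost is that nothing inside the bracket may allocate.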
From: David M. C. <co...@ph...> - 2006-08-23 23:35:56
|
On Wed, 23 Aug 2006 11:45:29 -0700
Travis Oliphant <oli...@ie...> wrote:

> I'm working on some macros that will allow extensions to be
> "interruptable" (i.e. with Ctrl-C). The idea came from SAGE but the
> implementation is complicated by the possibility of threads and making
> sure to handle clean-up code correctly when the interrupt returns.

For writing clean-up code, here's some prior art on adding exceptions to C:

http://www.ossp.org/pkg/lib/ex/ (BSD license)
http://adomas.org/excc/ (GPL'd, so no good)
http://ldeniau.web.cern.ch/ldeniau/html/exception/exception.html (no license given)

The last one has functions that allow you to add pointers (and their deallocation functions) to a list so that they can be deallocated when an exception is thrown.

(You don't necessarily need something like these libraries, but I thought I'd throw it in here, because it's along the same lines.)

> Step 2:
>
> Implementation. I have the idea to have a single interrupt handler
> (defined globally in NumPy) that basically uses longjmp to return to the
> section of code corresponding to the thread that is handling the
> interrupt. I had thought to use a global variable containing a linked
> list of jmp_buf structures with a thread-id attached
> (PyThread_get_thread_ident()) so that the interrupt handler can search
> it to see if the thread has registered a return location. If it has
> not, then the interrupt handler will just return normally. In this way
> a thread that calls setjmp will be sure to return to the correct
> place when it handles the interrupt.

Signals and threads don't mix well at *all*. With POSIX semantics, synchronous signals (ones caused by the thread itself) should be sent to the handler for that thread. Asynchronous ones (like SIGINT for Ctrl-C) will be sent to an *arbitrary* thread. (Apple, for instance, doesn't make any guarantees on which thread gets it: http://developer.apple.com/qa/qa2001/qa1184.html)

The best way I can see to do this is to have a SIGINT handler installed that sets a global variable, and check that every so often. It's such a good way that Python already does this -- Parser/intrcheck.c sets the handler, and you can use PyOS_InterruptOccurred() to check if one happened. So something like

    while (long running loop) {
        if (PyOS_InterruptOccurred()) goto error;
        ... useful stuff ...
    }
    error:

This could be abstracted to a set of macros (with Perry's syntax):

    NPY_SIG_INTERRUPTABLE
    while (long loop) {
        NPY_CHECK_SIGINT;
        .. more stuff ..
    }
    NPY_SIG_END_INTERRUPTABLE

where NPY_CHECK_SIGINT would do a longjmp().

Or come up with a good (fast) way to run stuff in another process :-)

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|co...@ph...
|
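The flag-and-poll approach from the message above, sketched without the Python C API so that it stands alone. The flag, the handler, and every function name below are hypothetical stand-ins; in an actual extension the check would be the PyOS_InterruptOccurred() call the message mentions.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for the flag kept by the interpreter's handler. */
static volatile sig_atomic_t hyp_interrupted = 0;

static void hyp_flag_sigint(int signum)
{
    (void)signum;
    hyp_interrupted = 1;          /* the only thing the handler does */
}

/* Installs the flag-setting handler; returns 0 on success, -1 on failure. */
int hyp_install_flag_handler(void)
{
    struct sigaction action;
    action.sa_handler = hyp_flag_sigint;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    return sigaction(SIGINT, &action, NULL);
}

/* Test-and-clear check: returns 1 once per recorded interrupt, else 0. */
int hyp_interrupt_occurred(void)
{
    if (!hyp_interrupted) {
        return 0;
    }
    hyp_interrupted = 0;
    return 1;
}

/* Adds up data[0..n-1]. Returns 0 on success, -1 if interrupted.
 * *done receives the number of elements processed. */
int hyp_sum(const double *data, size_t n, double *total, size_t *done)
{
    size_t i;
    int status = 0;

    *total = 0.0;
    for (i = 0; i < n; i++) {
        if (hyp_interrupt_occurred()) {
            status = -1;
            goto error;
        }
        *total += data[i];        /* ... useful stuff ... */
    }
error:
    *done = i;                    /* clean-up shared by both paths */
    return status;
}
```

No jump leaves the loop behind the function's back here, so ordinary goto-based clean-up is enough; the price is one branch per iteration.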
From: David C. <da...@ar...> - 2006-08-24 03:04:50
|
David M. Cooke wrote:
> On Wed, 23 Aug 2006 11:45:29 -0700
> Travis Oliphant <oli...@ie...> wrote:
>
>> I'm working on some macros that will allow extensions to be
>> "interruptable" (i.e. with Ctrl-C). The idea came from SAGE but the
>> implementation is complicated by the possibility of threads and making
>> sure to handle clean-up code correctly when the interrupt returns.

This is funny, I was just thinking about that yesterday. This is a major problem when writing C extensions in matlab (the manual says to use the matlab allocator instead of malloc/new/whatever, but when you call a library, you cannot do that...).

> Best way I can see this is to have a SIGINT handler installed that sets a
> global variable, and check that every so often. It's such a good way that
> Python already does this -- Parser/intrcheck.c sets the handler, and you can
> use PyOS_InterruptOccurred() to check if one happened. So something like

This is the way I do it when writing extensions under matlab. I am by no means knowledgeable about this kind of thing, but this is the simplest solution I have come up with so far. I would guess that because it uses one global variable, it should not matter which thread receives the signal ?

>     while (long running loop) {
>         if (PyOS_InterruptOccurred()) goto error;
>         ... useful stuff ...
>     }
>     error:
>
> This could be abstracted to a set of macros (with Perry's syntax):
>
>     NPY_SIG_INTERRUPTABLE
>     while (long loop) {
>         NPY_CHECK_SIGINT;
>         .. more stuff ..
>     }
>     NPY_SIG_END_INTERRUPTABLE
>
> where NPY_CHECK_SIGINT would do a longjmp().

Is there really a need for a longjmp ? What I simply do in this case is check the global variable, and if its value has changed, goto the normal error handling. Let's say you already have good error handling in your function, as Travis described in his email:

    status = do_stuff();
    if (status < 0) {
        goto cleanup;
    }

Then, to handle sigint, you need a global variable got_sigint which is modified by the signal handler, and you check its value (the exact type of this variable is platform specific; on linux, I am using volatile sig_atomic_t, as recommended by the GNU C doc):

    /* status is 0 if everything is OK */
    status = do_stuff();
    if (status < 0) {
        goto cleanup;
    }
    sigprocmask (SIG_BLOCK, &block_sigint, NULL);
    if (got_sigint) {
        got_sigint = 0;
        goto cleanup;
    }
    sigprocmask (SIG_UNBLOCK, &block_sigint, NULL);

So the error handling does not need to be modified, and no longjmp is needed ? Or maybe I don't understand what you mean.

I think the case proposed by Perry is too restrictive: it is really common to use external libraries for which we do not know whether they allocate memory inside the processing, and there is a need to clean that up too.

> Or come up with a good (fast) way to run stuff in another process :-)

This sounds a bit overkill, and a pain to implement for different platforms ? The checking of signals should be fast, but it has a cost (you have to use a branch) which prevents it from being called too often inside a loop, for example.

David
|
From: Travis O. <oli...@ee...> - 2006-08-24 22:38:47
|
David Cournapeau wrote:

>>> I'm working on some macros that will allow extensions to be
>>> "interruptable" (i.e. with Ctrl-C). The idea came from SAGE but the
>>> implementation is complicated by the possibility of threads and making
>>> sure to handle clean-up code correctly when the interrupt returns.
>
> This is funny, I was just thinking about that yesterday. This is a major
> problem when writing C extensions in matlab (the manual says use the
> matlab allocator instead of malloc/new/whatever, but when you call a
> library, you cannot do that...).

I'm glad many people are thinking about it. There is no reason we can't have a few ways to handle the situation.

Currently in SVN, the simple

    NPY_SIGINT_ON
    [code]
    NPY_SIGINT_OFF

approach is implemented (for platforms with sigsetjmp/siglongjmp).

You can already use the approach suggested:

    if (PyOS_InterruptOccurred()) goto error

to handle interrupts. The drawback of this approach is that the loop executes more slowly because a check for the interrupt occurs many times in the loop, which costs time.

The advantage is that it may work with threads (I'm not clear on whether or not PyOS_InterruptOccurred can be called without the GIL, though).

> I think the case proposer by Perry is too restrictive: it is really
> common to use external libraries which we do not know whether they use
> memory allocation inside the processing, and there is a need to clean
> that too.

If nothing is known about memory allocation of the external library, then I don't see how it can be safely interrupted using any mechanism.

What is available now is sufficient. I played far too long with how to handle threads, but was not able to come up with a solution, so for now I've punted.

-Travis
|
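One common way to soften the per-iteration cost mentioned above is to test the flag once per chunk instead of once per element. The sketch below assumes a handler that only sets a flag; HYP_CHECK_EVERY, hyp_scale and the other names are hypothetical, and the chunk size of 1024 is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t hyp_flag = 0;

static void hyp_set_flag(int signum)
{
    (void)signum;
    hyp_flag = 1;
}

/* Installs the flag-setting handler; returns 0 on success, -1 on failure. */
int hyp_install_flag(void)
{
    struct sigaction action;
    action.sa_handler = hyp_set_flag;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    return sigaction(SIGINT, &action, NULL);
}

#define HYP_CHECK_EVERY 1024   /* arbitrary chunk size */

/* Scales x[0..n-1] in place. Returns the number of elements processed,
 * which is n unless an interrupt was seen at a chunk boundary. */
size_t hyp_scale(double *x, size_t n, double factor)
{
    size_t i = 0;

    while (i < n) {
        size_t stop = (n - i > HYP_CHECK_EVERY) ? i + HYP_CHECK_EVERY : n;

        if (hyp_flag) {        /* one check per chunk */
            hyp_flag = 0;
            break;
        }
        for (; i < stop; i++) {
            x[i] *= factor;    /* inner loop carries no check */
        }
    }
    return i;
}
```

The trade-off is latency: Ctrl-C takes effect only at the next chunk boundary, so the chunk size bounds how long the user waits.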
From: David M. C. <co...@ph...> - 2006-08-24 23:42:33
|
On Aug 24, 2006, at 18:38 , Travis Oliphant wrote:

> You can already use the approach suggested:
>
>     if (PyOS_InterruptOccurred()) goto error
>
> to handle interrupts. The drawback of this approach is that the loop
> executes more slowly because a check for the interrupt occurs many
> times in the loop which costs time.
>
> The advantage is that it may work with threads (I'm not clear on
> whether or not PyOS_InterruptOccurred can be called without the GIL,
> though).

It should be; it's pure C code:

    int
    PyOS_InterruptOccurred(void)
    {
        if (!interrupted)
            return 0;
        interrupted = 0;
        return 1;
    }

(where interrupted is a static int).

--
|>|\/|<
/------------------------------------------------------------------\
|David M. Cooke              http://arbutus.physics.mcmaster.ca/dmc/
|co...@ph...
|
From: Travis O. <oli...@ee...> - 2006-08-25 00:11:11
|
David M. Cooke wrote:

> On Aug 24, 2006, at 18:38 , Travis Oliphant wrote:
>
>> You can already use the approach suggested:
>>
>>     if (PyOS_InterruptOccurred()) goto error
>>
>> to handle interrupts. The drawback of this approach is that the loop
>> executes more slowly because a check for the interrupt occurs many
>> times in the loop which costs time.
>>
>> The advantage is that it may work with threads (I'm not clear on
>> whether or not PyOS_InterruptOccurred can be called without the GIL,
>> though).
>
> It should be; it's pure C code:
>
>     int
>     PyOS_InterruptOccurred(void)
>     {
>         if (!interrupted)
>             return 0;
>         interrupted = 0;
>         return 1;
>     }

I tried to test this with threads using the following program and it doesn't seem to respond to interrupts.

    import threading
    import numpy.core.multiarray as ncm

    class mythread(threading.Thread):
        def run(self):
            print "Starting thread", self.getName()
            ncm.test_interrupt(1)
            print "Ending thread", self.getName()

    m1 = mythread()
    m2 = mythread()

    m1.start()
    m2.start()
|
From: David C. <da...@ar...> - 2006-08-25 03:04:38
|
Travis Oliphant wrote:
> I'm glad many people are thinking about it. There is no reason we
> can't have a few ways to handle the situation.
>
> Currently in SVN, the simple
>
>     NPY_SIGINT_ON
>     [code]
>     NPY_SIGINT_OFF
>
> approach is implemented (for platforms with sigsetjmp/siglongjmp).
>
> You can already use the approach suggested:
>
>     if (PyOS_InterruptOccurred()) goto error
>
> to handle interrupts. The drawback of this approach is that the loop
> executes more slowly because a check for the interrupt occurs many times
> in the loop which costs time.

I am not sure whether there are other solutions... This is the way I saw signal handling done in common programs when I looked for a solution for my matlab extensions.

> The advantage is that it may work with threads (I'm not clear on whether
> or not PyOS_InterruptOccurred can be called without the GIL, though).
>
>> I think the case proposer by Perry is too restrictive: it is really
>> common to use external libraries which we do not know whether they use
>> memory allocation inside the processing, and there is a need to clean
>> that too.
>
> If nothing is known about memory allocation of the external library,
> then I don't see how it can be safely interrupted using any mechanism.

If the library does nothing w.r.t. signals, then you just have to clean up everything related to the library once you have caught a signal. This is no different from cleaning up your own code. Actually, cleaning up after libraries is the main reason why I implemented this signal scheme in matlab extensions: they cannot use the matlab memory allocator, and because they live in the same memory space, calling the same extension several times can very quickly corrupt most of matlab's memory space.

Maybe there are some problems I am not aware of ?

David
|
From: Travis O. <oli...@ie...> - 2006-08-25 03:20:42
|
David Cournapeau wrote:

>> If nothing is known about memory allocation of the external library,
>> then I don't see how it can be safely interrupted using any mechanism.
>
> If the library does nothing w.r.t signals, then you just have to clean
> all the things related to the library once
> you caught a signal. This is no different than cleaning your own code.

Right, as long as you know what to do you are O.K. I was just thinking about a hypothetical situation where the library allocated some temporary memory that it was going to free at the end of the subroutine but then an interrupt jumped out back to your code before it could finish.

In a case like this, you would have to use the "check if interrupt has occurred" approach before and after the library call. But, then that library call is not interruptable. I could also see wanting to be able to interrupt a library calculation when you know it isn't allocating memory. So, I like having both possibilities available.

So far we haven't actually put anything in the numpy code itself. I'm leaning to putting PyOS_InterruptOccurred-style checks in a few places at some point down the road.

-Travis
|
From: David C. <da...@ar...> - 2006-08-25 04:32:38
|
Travis Oliphant wrote:
>
> Right, as long as you know what to do you are O.K. I was just thinking
> about a hypothetical situation where the library allocated some
> temporary memory that it was going to free at the end of the subroutine
> but then an interrupt jumped out back to your code before it could
> finish. In a case like this, you would have to use the "check if
> interrupt has occurred" approach before and after the library call.

Indeed.

By the way, I tried something for python.thread + signals. This is posix specific, and it works as expected on linux:

- first, a C extension which implements the signal handling. It has a function called hello, which is the entry point of the C module, and calls the function process (which does random computation). It checks if it got a SIGINT signal, and returns -1 if caught. Returns 0 if no SIGINT called:
- extension compiled into python module (I used boost python because I am too lazy to find how to do it in C :) )
- python script which creates several threads running the hello function. They run in parallel, and ctrl+C is correctly handled. I think this is signal specific, and this needs to be improved (this is just meant as a toy example):

    import threading
    import hello
    import time

    class mythread(threading.Thread):
        def __init__(self):
            threading.Thread.__init__(self)

        def run(self):
            print "Starting thread", self.getName()
            st = 0
            while st == 0:
                st = hello.foo(self.getName())
                # sleep to force the python interpreter to run
                # other threads if available
                time.sleep(1)
            if st == -1:
                print self.getName() + " got signal"
            print "Ending thread", self.getName()

    nthread = 5
    t = [mythread() for i in range(nthread)]
    [i.start() for i in t]

Then, you have something like:

    Starting thread Thread-1
    Thread-1 processing... done
    clean called
    Starting thread Thread-5
    Thread-5 processing... done
    clean called
    Starting thread Thread-3
    Thread-3 processing... done
    clean called
    Starting thread Thread-2
    Thread-2 processing... done
    hello.c:hello signal caught line 56 for thread Thread-2
    clean called
    Thread-1 processing... done
    clean called
    Starting thread Thread-4
    Thread-4 processing... done
    clean called
    Thread-5 processing... done
    clean called
    Thread-3 processing... done
    hello.c:hello signal caught line 56 for thread Thread-3
    clean called
    Thread-2 got signal
    Ending thread Thread-2
    Thread-1 processing... done
    clean called
    Thread-4 processing... done
    clean called
    Thread-5 processing... done
    clean called
    Thread-3 got signal
    Ending thread Thread-3
    Thread-1 processing... done
    hello.c:hello signal caught line 56 for thread Thread-1
    clean called
    Thread-4 processing... done
    clean called
    Thread-5 processing... done
    hello.c:hello signal caught line 56 for thread Thread-5
    clean called
    Thread-1 got signal
    Ending thread Thread-1
    Thread-4 processing... done
    clean called
    Thread-5 got signal
    Ending thread Thread-5
    Thread-4 processing... done
    clean called
    Thread-4 processing... done
    clean called
    Thread-4 processing... done
    hello.c:hello signal caught line 56 for thread Thread-4
    clean called
    Thread-4 got signal
    Ending thread Thread-4

(SIGINT are received when Ctrl+C on linux)

You can find all sources here:

http://www.ar.media.kyoto-u.ac.jp/members/david/numpysig/

Please note that I know almost nothing about all this stuff, I just naively implemented it from the example in the GNU C library documentation, and it always worked for me on matlab on my machine. I do not know if this is portable, if this can work for other signals, etc...

David
|
From: Travis O. <oli...@ie...> - 2006-08-25 06:10:23
|
David Cournapeau wrote:
> Indeed.
>
> By the way, I tried something for python.thread + signals. This is posix
> specific, and it works as expected on linux:

Am I right that this could be accomplished simply by throwing away all the interrupt handling stuff in the code and checking for PyOS_InterruptOccurred() in the place where you check for the global variable that your signal handler uses? Your signal handler does essentially what Python's signal handler already does, if I'm not mistaken.

-Travis
|
From: David C. <da...@ar...> - 2006-08-25 10:06:12
|
Travis Oliphant wrote:
> David Cournapeau wrote:
>> Indeed.
>>
>> By the way, I tried something for python.thread + signals. This is posix
>> specific, and it works as expected on linux:
>
> Am I right that this could be accomplished simply by throwing away
> all the interrupt handling stuff in the code and checking for
> PyOS_InterruptOccurred() in the place where you check for the global
> variable that your signal handler uses? Your signal handler does
> essentially what Python's signal handler already does, if I'm not mistaken.

I don't know how the python signal handler works, but I believe it should do more or less the same, indeed. The key idea is that it is important to mask other signals related to interrupting. To get a relatively clear view of this, if you have not seen it, you may take a look at the gnu C doc on signal handling:

http://www.gnu.org/software/libc/manual/html_node/Defining-Handlers.html#Defining-Handlers

After having given it some thought, I am wondering what exactly we are trying to do:

- the main problem is to be able to interrupt some functions which may take a long time to compute, without corrupting the whole python process.
- for that, those functions need to be able to trap the usual signals corresponding to an interrupt (SIGINT, etc... on Unix, equivalents on windows).

There are two ways to handle a signal:

- check regularly some global (that is, global to the whole process) value, and change this value if a signal is trapped. That's the easier way, but it is not as thread safe as I first thought (I will code an example if I have time).
- the signal handler jumps to another point of the program where cleaning is done: this is more complicated, and I am not sure we need the complication (I have never used this scheme, so I may just miss the point totally). I don't even want to think about how it works in a multi-threading environment :)

Now, the threading issue came in, and I am not sure why we need to care: this is a problem if numpy is implemented in a multi-threaded way, but I don't believe that to be the case, right ?

Another solution, which is used I think in more sophisticated programs, is having one thread with high priority, whose only job is to detect signals, and masking all signals in all other threads. Again, this seems overkill (and highly non portable) ? And this should be the python interpreter's job, no ?

Actually, as this is a generic problem for any python extension code, other really smart people should have thought about that... If I am interpreting correctly what is said here http://docs.python.org/lib/module-signal.html, I believe that what you suggest (using PyOS_InterruptOccurred() at some points) is what should be done: the python interpreter makes sure that the signal is sent to the main thread, that is the thread where numpy is executed (that's my understanding of the way the python interpreter works, not a fact).

David
|