From: Arthur <ajs...@op...> - 2006-11-04 14:34:26
|
Hi folks - Bruce is tied up, I know, trying to get his book to the publisher and will not get to looking to what I am proposing re: numpy compatibility until he gets that behind him. In the meantime --- A side effect of getting myself involved in that process was getting a more serious look at Vpython4.0 beta, which I had been avoiding because of the known bugs. And one look at the lighting_and_texture.py does get one excited. The mouse.getclick() issue - which I first saw manifested in running the dipole.py demo - seems the only deal killer bug. So I have taken a shot at it. After gaining what - given my C++ (lack of) expertise - is only a vague understanding of what is going on in the relevant code, I took a shot - just changing line 185 of the mouseobject.cpp from: shared_ptr<event> ret = events.py_pop(); to shared_ptr<event> ret = events.pop(); which has the effect - in the context of the code - of short-circuiting the running of code that involves the direct manipulation of the Python gil (global interpreter lock) when processing the return value from a mouseclick. And - lo and behold - from the best I can presently determine, the mouse.getclick() crash issues go away. Problem is that I suspect Jonathan had his reasons for doing his Python gil thing, and I cannot determine what unwanted side effects this change might have, e.g. on platforms other than windows. Jonathan is probably the only person in a position to comment here. You there, Jonathan? Art |
From: Jonathan B. <jbr...@ea...> - 2006-11-04 20:53:04
|
On Sat, 2006-11-04 at 09:33 -0500, Arthur wrote: > Hi folks - > > Bruce is tied up, I know, trying to get his book to the publisher and > will not get to looking to what I am proposing re: numpy compatibility > until he gets that behind him. > > In the meantime --- > > A side effect of getting myself involved in that process was getting a > more serious look at Vpython4.0 beta, which I had been avoiding because > of the known bugs. And one look at the lighting_and_texture.py does get > one excited. > > The mouse.getclick() issue - which I first saw manifested in running the > dipole.py demo - seems the only deal killer bug. > > So I have taken a shot at it. > > After gaining what - given my C++ (lack of) expertise - is only a vague > understanding of what is going on in the relevant code, I took a shot - > just changing line 185 of the mouseobject.cpp from: > > shared_ptr<event> ret = events.py_pop(); > > to > > shared_ptr<event> ret = events.pop(); > > which has the effect - in the context of the code - of short-circuiting > the running of code that involves the direct manipulation of the Python > gil (global interpreter lock) when processing the return value from a > mouseclick. > > And - lo and behold - from the best I can presently determine, the > mouse.getclick() crash issues go away. Very interesting. The function of py_pop() vs pop() is related to shutdown. Keep the following rule in mind: The Python interpreter itself is not thread-safe. For any C/C++ code to invoke any python function, including Py_INCREF() and kin, that code must hold the Global Interpreter Lock. In native/plain/pure python multithreaded code, both threads run because each one periodically releases and re-acquires the lock. VPython's rendering thread directly accesses Python objects very rarely, and therefore tries hard to avoid holding the GIL. The rendering of any array-like object (curve, convex, faces) must access the elements of the Numeric array, and shutdown code must tell the interpreter to force any infinite loops to exit. In the case of array access, no Python functions are called because the numeric/numarray wrappers for those particular access functions dove below the published API and accessed the pointers directly. Scenario: - user program calls mouse.get_click() when no events are queued up. This eventually dispatches to events.pop(), which blocks indefinitely until user action through the GUI generates a click event. - Instead of generating a click event, the user is done with the program and clicks on the window-close button. - The window management code closes out the windows and exits the background/worker/renderer thread. As the last action of the outgoing thread, (see src/python/wrap_display_kernel.cpp), force_py_exit() is called. This function adds a "pending call" to the interpreter to call std::exit() (which is also called by sys.exit()). This is necessary to force an infinite loop in the user's code (a very common idiom) to terminate. But - in order to add the pending call, the shutdown code must make a call into the interpreter. Therefore, it must acquire the GIL. Since the interpreter did not release the lock when it called events.pop(), a deadlock occurred. The interpreter thread is waiting for a user event that will never arrive, and the gui thread is waiting for the interpreter to release a lock that it never will. My solution to this problem was to explicitly release the GIL when popping events off any of the event queues (which is what py_pop() does), and periodically wake up to call any pending events (see include/util/atomic_queue.hpp, src/core/util/atomic_queue.cpp). Apparently, the code that makes this happen has a bug in it. Your options are to either fix that code or come up with a new shutdown strategy and implement it. > Problem is that I suspect Jonathan had his reasons for doing his Python > gil thing, and I cannot determine what unwanted side effects this change > might have, e.g. on platforms other than windows. I think the "failure to exit" problem will persist on Windows, although IDLE may attempt to force program termination in a way that I'm not expecting. > Jonathan is probably the only person in a position to comment here. > > You there, Jonathan? Yep. Hope this information helps. -Jonathan |
From: Arthur <ajs...@op...> - 2006-11-05 02:42:55
|
Jonathan - Thanks for the brain dump. Lots to try to digest. But with the little experimentation I have just done, on Windows, I am not seeing a difference in the shutdown behavior when using the *.pyd with my patch that (seems to) avoid the mouse.getclick() crash, and the one in the regular distro of 4.0bets5. In fact when running the Hanoi.py demo, either from SciTE or from IDLE, and shutting the display window, the python process stays alive and I need to force a shutdown. I hadn't noticed this previously because I had been using Textpad using python.exe which in effect opens a DOS window to run python, and I have gotten used to closing the DOS window manually after the display window closes, which must force the python process to shutdown. There is a "Close DOS window on exit" option in Textpad that I generally don't use, since I want to see any error messages generated. But I just toggled it on, and the DOS window does *not* close when closing the visual display, again consistent with the fact that the python process is not exiting. Is anyone finding anything different than what I am finding? Frankly I consider the process end issue a nuisance, but nothing worse, whereas the mouse.getclick() crash is a deal killer for vpython 4.xxx on Windows. Seems to me that if my patch solves that, we should go with it and try to deal with the shutdown issue as a separate matter, which at the moment it does seem to be in any case. All being said from what I am seeing from a Windows-centric viewpoint. Guess I should see if anything changes about this position when I try my patch on Linux, hopefully tomorrow. Art Jonathan Brandmeyer wrote: >On Sat, 2006-11-04 at 09:33 -0500, Arthur wrote: > > >>Hi folks - >> >>Bruce is tied up, I know, trying to get his book to the publisher and >>will not get to looking to what I am proposing re: numpy compatibility >>until he gets that behind him. >> >>In the meantime --- >> >>A side effect of getting myself involved in that process was getting a >>more serious look at Vpython4.0 beta, which I had been avoiding because >>of the known bugs. And one look at the lighting_and_texture.py does get >>one excited. >> >>The mouse.getclick() issue - which I first saw manifested in running the >>dipole.py demo - seems the only deal killer bug. >> >>So I have taken a shot at it. >> >>After gaining what - given my C++ (lack of) expertise - is only a vague >>understanding of what is going on in the relevant code, I took a shot - >>just changing line 185 of the mouseobject.cpp from: >> >>shared_ptr<event> ret = events.py_pop(); >> >>to >> >>shared_ptr<event> ret = events.pop(); >> >>which has the effect - in the context of the code - of short-circuiting >>the running of code that involves the direct manipulation of the Python >>gil (global interpreter lock) when processing the return value from a >>mouseclick. >> >>And - lo and behold - from the best I can presently determine, the >>mouse.getclick() crash issues go away. >> >> > >Very interesting. The function of py_pop() vs pop() is related to >shutdown. > >Keep the following rule in mind: >The Python interpreter itself is not thread-safe. For any C/C++ code to >invoke any python function, including Py_INCREF() and kin, that code >must hold the Global Interpreter Lock. In native/plain/pure python >multithreaded code, both threads run because each one periodically >releases and re-acquires the lock. > >VPython's rendering thread directly accesses Python objects very rarely, >and therefore tries hard to avoid holding the GIL. The rendering of any >array-like object (curve, convex, faces) must access the elements of the >Numeric array, and shutdown code must tell the interpreter to force any >infinite loops to exit. In the case of array access, no Python >functions are called because the numeric/numarray wrappers for those >particular access functions dove below the published API and accessed >the pointers directly. > > >Scenario: >- user program calls mouse.get_click() when no events are queued up. >This eventually dispatches to events.pop(), which blocks indefinitely >until user action through the GUI generates a click event. >- Instead of generating a click event, the user is done with the program >and clicks on the window-close button. >- The window management code closes out the windows and exits the >background/worker/renderer thread. As the last action of the outgoing >thread, (see src/python/wrap_display_kernel.cpp), force_py_exit() is >called. This function adds a "pending call" to the interpreter to call >std::exit() (which is also called by sys.exit()). This is necessary to >force an infinite loop in the user's code (a very common idiom) to >terminate. > >But - in order to add the pending call, the shutdown code must make a >call into the interpreter. Therefore, it must acquire the GIL. Since >the interpreter did not release the lock when it called events.pop(), a >deadlock occurred. The interpreter thread is waiting for a user event >that will never arrive, and the gui thread is waiting for the >interpreter to release a lock that it never will. > >My solution to this problem was to explicitly release the GIL when >popping events off any of the event queues (which is what py_pop() >does), and periodically wake up to call any pending events (see >include/util/atomic_queue.hpp, src/core/util/atomic_queue.cpp). >Apparently, the code that makes this happen has a bug in it. Your >options are to either fix that code or come up with a new shutdown >strategy and implement it. > > > >>Problem is that I suspect Jonathan had his reasons for doing his Python >>gil thing, and I cannot determine what unwanted side effects this change >>might have, e.g. on platforms other than windows. >> >> > >I think the "failure to exit" problem will persist on Windows, although >IDLE may attempt to force program termination in a way that I'm not >expecting. > > > >>Jonathan is probably the only person in a position to comment here. >> >>You there, Jonathan? >> >> > >Yep. Hope this information helps. > >-Jonathan > > > > > |