From: M.-A. L. <ma...@eg...> - 2009-03-16 21:14:42
|
It appears as if the new thread lock support in pyOpenSSL 0.8.0 is causing serious problems with Python and threads. We have observed regular core dumps using pyOpenSSL 0.8 which did not happen with 0.7. There's a patch available on SF which appears to fix the problem: http://sourceforge.net/tracker/index.php?func=detail&aid=2543118&group_id=31249&atid=401760 Any chance of getting that into a new release ? As it stands, version 0.8.0 is not really usable in multi-threaded applications. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 16 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ |
From: Jean-Paul C. <ex...@di...> - 2009-03-16 23:43:39
|
On Mon, 16 Mar 2009 21:43:49 +0100, "M.-A. Lemburg" <ma...@eg...> wrote: >It appears as if the new thread lock support in pyOpenSSL 0.8.0 is >causing serious problems with Python and threads. We have observed >regular core dumps using pyOpenSSL 0.8 which did not happen with >0.7. Ouch. >There's a patch available on SF which appears to fix the problem: > >http://sourceforge.net/tracker/index.php?func=detail&aid=2543118&group_id=31249&atid=401760 > >Any chance of getting that into a new release ? A quick skim of the patch leads me to believe it largely reverts one of the bug fixes I made in 0.8.0 which was causing thread-related crashes in 0.7.0. https://bugs.launchpad.net/pyopenssl/+bug/262381 includes some discussion of that issue. In the light of my final comment on that ticket, it may be that the new failure is much worse than the old failure. What would make things much easier for me is if someone could provide a test case which demonstrates the current bug. The patch which "fixes" the current problem looks simple enough (it's mostly noise - it could be 10 lines long instead of a hundred), but since I think it re-introduces another thread bug, I probably would prefer /not/ to apply it exactly, but to work out a solution which preserves the old fix and eliminates the new bug. If a unit test is too hard, then something along the lines of leakcheck/thread-crash.py would also be useful. If that's too hard, an explanation of why the current code is wrong would help. >As it stands, version 0.8.0 is not really usable in multi-threaded >applications. It seems that 0.7.0 wasn't really either. And I suspect that without various vendor-supplied patches that didn't make it upstream until 0.8.0, neither were any of the releases before that one either. :/ Having to revisit this part of pyOpenSSL again, I'm tempted to go for a solution I previously ruled out for performance reasons - explicitly tag each OpenSSL object wrapper (for those OpenSSL structures which can only be used in one thread) with a thread id and before each operation, check to make sure the current thread matches that thread id. This is similar to the approach taken by SQLite3, I think. It should provide the most robust protection against thread issues, but at the cost of a thread check per method. I'm hoping to have 0.9.0 out soon. Before this issue was raised, I was only blocked on some issues building Windows packages. I suppose it would be good to include some resolution to this issue in the next release though. If anyone will be at PyCon and wants to help out, let me know. Jean-Paul |
From: M.-A. L. <ma...@eg...> - 2009-03-17 11:37:18
|
On 2009-03-17 00:28, Jean-Paul Calderone wrote: > On Mon, 16 Mar 2009 21:43:49 +0100, "M.-A. Lemburg" <ma...@eg...> wrote: >> It appears as if the new thread lock support in pyOpenSSL 0.8.0 is >> causing serious problems with Python and threads. We have observed >> regular core dumps using pyOpenSSL 0.8 which did not happen with >> 0.7. > > Ouch. > >> There's a patch available on SF which appears to fix the problem: >> >> http://sourceforge.net/tracker/index.php?func=detail&aid=2543118&group_id=31249&atid=401760 >> >> Any chance of getting that into a new release ? > > A quick skim of the patch leads me to believe it largely reverts one of > the bug fixes I made in 0.8.0 which was causing thread-related crashes > in 0.7.0. https://bugs.launchpad.net/pyopenssl/+bug/262381 includes > some discussion of that issue. In the light of my final comment on > that ticket, it may be that the new failure is much worse than the old > failure. The old code wasn't really protecting the programmer against core dumps caused by misusing OpenSSL (e.g. sharing SSL connections between threads), but it also did not get in the way of multi-threaded programs which did implement the correct way of not sharing connections. However, with the 0.8.0 approach, SSL connections started getting in the way of perfectly fine working applications. At least in our application we also had the problem of not being able to reproduce the problem easily - it occurred seemingly random and the core dumps did not relate to the SSL code either. > What would make things much easier for me is if someone could provide a > test case which demonstrates the current bug. The patch which "fixes" > the current problem looks simple enough (it's mostly noise - it could > be 10 lines long instead of a hundred), but since I think it re-introduces > another thread bug, I probably would prefer /not/ to apply it exactly, > but to work out a solution which preserves the old fix and eliminates > the new bug. > > If a unit test is too hard, then something along the lines of > leakcheck/thread-crash.py would also be useful. > > If that's too hard, an explanation of why the current code is wrong > would help. I think you should not try to unroll the Py_BEGIN_ALLOW_THREADS ...Do some blocking I/O operation... Py_END_ALLOW_THREADS blocks and just use those macros in the code. Storing the thread state in some application data object is not a good approach, since there's nothing preventing some other thread from restoring the thread state saved by the thread that just released the lock. By contrast, the above macros will add a local stack variable and store the thread state there, so it's not possible for another thread to accidentally restore a thread state that it doesn't own. See http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock IMHO, programmers should just be made aware of the fact that OpenSSL connections should not be shared among threads - just like the DB-API strongly suggests to not share database connections between threads, but doesn't try to prevent this. >> As it stands, version 0.8.0 is not really usable in multi-threaded >> applications. > > It seems that 0.7.0 wasn't really either. And I suspect that without > various vendor-supplied patches that didn't make it upstream until > 0.8.0, neither were any of the releases before that one either. :/ See above. 0.7 worked just fine for applications using the correct implementation approach. > Having to revisit this part of pyOpenSSL again, I'm tempted to go for > a solution I previously ruled out for performance reasons - explicitly > tag each OpenSSL object wrapper (for those OpenSSL structures which > can only be used in one thread) with a thread id and before each > operation, check to make sure the current thread matches that thread id. > This is similar to the approach taken by SQLite3, I think. It should > provide the most robust protection against thread issues, but at the > cost of a thread check per method. I wouldn't go down that road... most Python C extensions simply use the above macro pair around blocking operations and don't bother with any further tricks to work around issues in the underlying C library code. > I'm hoping to have 0.9.0 out soon. Before this issue was raised, I > was only blocked on some issues building Windows packages. I suppose > it would be good to include some resolution to this issue in the > next release though. If anyone will be at PyCon and wants to help out, > let me know. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 17 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ |
From: Jean-Paul C. <ex...@di...> - 2009-03-17 14:59:58
|
On Tue, 17 Mar 2009 12:36:48 +0100, "M.-A. Lemburg" <ma...@eg...> wrote: >On 2009-03-17 00:28, Jean-Paul Calderone wrote: >> On Mon, 16 Mar 2009 21:43:49 +0100, "M.-A. Lemburg" <ma...@eg...> wrote: >>> It appears as if the new thread lock support in pyOpenSSL 0.8.0 is >>> causing serious problems with Python and threads. We have observed >>> regular core dumps using pyOpenSSL 0.8 which did not happen with >>> 0.7. >> >> Ouch. >> >>> There's a patch available on SF which appears to fix the problem: >>> >>> http://sourceforge.net/tracker/index.php?func=detail&aid=2543118&group_id=31249&atid=401760 >>> >>> Any chance of getting that into a new release ? >> >> A quick skim of the patch leads me to believe it largely reverts one of >> the bug fixes I made in 0.8.0 which was causing thread-related crashes >> in 0.7.0. https://bugs.launchpad.net/pyopenssl/+bug/262381 includes >> some discussion of that issue. In the light of my final comment on >> that ticket, it may be that the new failure is much worse than the old >> failure. > >The old code wasn't really protecting the programmer against >core dumps caused by misusing OpenSSL (e.g. sharing SSL connections >between threads), but it also did not get in the way of multi-threaded >programs which did implement the correct way of not sharing connections. > >However, with the 0.8.0 approach, SSL connections started getting >in the way of perfectly fine working applications. At least in our >application we also had the problem of not being able to reproduce >the problem easily - it occurred seemingly random and the core dumps >did not relate to the SSL code either. Isn't this just how it goes with threading bugs? ;) You're right, though, it's worse now than it was. > [snip] > >I think you should not try to unroll the > >Py_BEGIN_ALLOW_THREADS >...Do some blocking I/O operation... >Py_END_ALLOW_THREADS > >blocks and just use those macros in the code. pyOpenSSL has always done this. It's necessary if Python-defined callbacks are going to be supported, because the GIL must be re-acquired in the callback. This means thread state structured needs to be available in the callback, which in turn means Py_BEGIN_ALLOW_THREADS isn't usable. It's possible that PyGILState_Ensure and PyGILState_Release would be useful here, but I'm not sure. I need to investigate further. >Storing the thread state in some application data object is not a >good approach, since there's nothing preventing some other thread >from restoring the thread state saved by the thread that just >released the lock. > >By contrast, the above macros will add a local stack variable and >store the thread state there, so it's not possible for another >thread to accidentally restore a thread state that it doesn't >own. Right. That's why I changed pyOpenSSL /from/ doing that (prior to 0.8) to using thread local storage instead (0.8). However, it seems there's some bug in the thread local storage version of the locking scheme. >See >http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock > >IMHO, programmers should just be made aware of the fact that OpenSSL >connections should not be shared among threads - just like the DB-API >strongly suggests to not share database connections between threads, >but doesn't try to prevent this. > >>> As it stands, version 0.8.0 is not really usable in multi-threaded >>> applications. >> >> It seems that 0.7.0 wasn't really either. And I suspect that without >> various vendor-supplied patches that didn't make it upstream until >> 0.8.0, neither were any of the releases before that one either. :/ > >See above. > >0.7 worked just fine for applications using the correct >implementation approach. Right. I meant that 0.6 and earlier would crash randomly when pyOpenSSL APIs were being used from more than one thread. 0.7 is probably the most stable version of pyOpenSSL wrt threading. >> Having to revisit this part of pyOpenSSL again, I'm tempted to go for >> a solution I previously ruled out for performance reasons - explicitly >> tag each OpenSSL object wrapper (for those OpenSSL structures which >> can only be used in one thread) with a thread id and before each >> operation, check to make sure the current thread matches that thread id. >> This is similar to the approach taken by SQLite3, I think. It should >> provide the most robust protection against thread issues, but at the >> cost of a thread check per method. > >I wouldn't go down that road... most Python C extensions simply use the >above macro pair around blocking operations and don't bother with >any further tricks to work around issues in the underlying C library >code. I'll think about that. I'd prefer for there to be no way to crash the interpreter using pyOpenSSL, though. Probably which decision I make will depend in part on what the actual problem with the thread local storage approach in 0.8 is. Jean-Paul |
From: M.-A. L. <ma...@eg...> - 2009-03-17 18:57:17
|
On 2009-03-17 15:59, Jean-Paul Calderone wrote: > On Tue, 17 Mar 2009 12:36:48 +0100, "M.-A. Lemburg" <ma...@eg...> wrote: >> I think you should not try to unroll the >> >> Py_BEGIN_ALLOW_THREADS >> ...Do some blocking I/O operation... >> Py_END_ALLOW_THREADS >> >> blocks and just use those macros in the code. > > pyOpenSSL has always done this. It's necessary if Python-defined > callbacks are going to be supported, because the GIL must be re-acquired > in the callback. This means thread state structured needs to be available > in the callback, which in turn means Py_BEGIN_ALLOW_THREADS isn't usable. > > It's possible that PyGILState_Ensure and PyGILState_Release would be > useful here, but I'm not sure. I need to investigate further. Ah, callbacks... evil stuff :-) You're right. Callbacks don't work with the above approach. >> Storing the thread state in some application data object is not a >> good approach, since there's nothing preventing some other thread >>from restoring the thread state saved by the thread that just >> released the lock. >> >> By contrast, the above macros will add a local stack variable and >> store the thread state there, so it's not possible for another >> thread to accidentally restore a thread state that it doesn't >> own. > > Right. That's why I changed pyOpenSSL /from/ doing that (prior to 0.8) > to using thread local storage instead (0.8). However, it seems there's > some bug in the thread local storage version of the locking scheme. Perhaps it's related to this notice of caution in thread.c: """ Use PyThread_set_key_value(thekey, value) to associate void* value with thekey in the current thread. Each thread has a distinct mapping of thekey to a void* value. Caution: if the current thread already has a mapping for thekey, value is ignored. """ If value is ignored, MY_BEGIN_ALLOW_THREADS() won't save the current thread state, but leave the old one in place. It's also possible for this API to fail in case of a memory issue, but the macro doesn't check for this, so the application may end up not storing anything at all. MY_END_ALLOW_THREADS() needs to remove the stored TLS value in order to make room for the next MY_BEGIN_ALLOW_THREADS() call. Note that I'm not sure using TLS is a good idea: the implementations are highly platform dependent and may have undocumented/unwanted side-effects. IMHO, it's safer to store the thread id together with the tstate itself in the context and then check whether the thread id matches before restoring the state. This has the additional benefit of making some form of error raising possible in order to inform the programmer of the obvious mistake in using the connections from multiple threads, even if it's just a plain fprintf(stderr, ...). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 17 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ |
From: Jean-Paul C. <ex...@di...> - 2009-03-17 19:08:06
|
On Tue, 17 Mar 2009 19:57:09 +0100, "M.-A. Lemburg" <ma...@eg...> wrote: >On 2009-03-17 15:59, Jean-Paul Calderone wrote: > [snip] > >Perhaps it's related to this notice of caution in thread.c: > >""" >Use PyThread_set_key_value(thekey, value) to associate void* value with >thekey in the current thread. Each thread has a distinct mapping of thekey >to a void* value. Caution: if the current thread already has a mapping >for thekey, value is ignored. >""" Oh, boy. Yes, that could be it. I assumed it was more like setting a key in a Python dictionary. >If value is ignored, MY_BEGIN_ALLOW_THREADS() won't save the current >thread state, but leave the old one in place. So only the first MY_BEGIN_ALLOW_THREADS per thread does anything. That could break things, I guess. :) > >It's also possible for this API to fail in case of a memory issue, >but the macro doesn't check for this, so the application may end >up not storing anything at all. So the macros should do error checking as well (assuming it makes sense to continue to use TLS at all). >MY_END_ALLOW_THREADS() needs to remove the stored TLS value in order >to make room for the next MY_BEGIN_ALLOW_THREADS() call. > >Note that I'm not sure using TLS is a good idea: the implementations >are highly platform dependent and may have undocumented/unwanted >side-effects. Can you give some examples? The only thing I'm aware of is that performance may differ wildly from platform to platform, but I thought it was pretty good on all mainstream platforms these days. >IMHO, it's safer to store the thread id together with the tstate >itself in the context and then check whether the thread id matches >before restoring the state. > >This has the additional benefit of making some form of error >raising possible in order to inform the programmer of the obvious >mistake in using the connections from multiple threads, >even if it's just a plain fprintf(stderr, ...). That sounds essentially like the idea I mentioned in my earlier reply (the one I compared to SQLite). I see how all of that logic could be wrapped up in the MY_BEGIN/END_ALLOW_THREADS macro now, so I'm even more inclined to go that route now. Tests or scripts that produce failures would still be very helpful. Clearly the test suite doesn't exercise the buggy codepaths currently, and it seems neither does thread-crash.py (probably because it chokes so quickly due to its misuse of Connections in multiple threads). If I can reproduce the problem others are seeing, I'll be able to tell if a particular change actually fixes it. Otherwise I'm just guessing. :) Jean-Paul |
From: Gustavo M. <gmo...@gm...> - 2009-03-17 21:20:24
Attachments:
test-crash-client.py
test-crash-server.py
|
I could reproduce the crash. This happens only (at least in my case) in Win32 platform, I tried it on linux and does not occur. Just add a couple of certificates in the test-crash-server.py, class handler_class, method __init__. Run test-crash-server.py and then test-crash-client.py, and in seconds python crashes!. Gustavo. On 03/17/2009, Jean-Paul Calderone <ex...@di...> wrote: > > On Tue, 17 Mar 2009 19:57:09 +0100, "M.-A. Lemburg" <ma...@eg...> > wrote: > >On 2009-03-17 15:59, Jean-Paul Calderone wrote: > > > [snip] > > > > >Perhaps it's related to this notice of caution in thread.c: > > > >""" > >Use PyThread_set_key_value(thekey, value) to associate void* value with > >thekey in the current thread. Each thread has a distinct mapping of > thekey > >to a void* value. Caution: if the current thread already has a mapping > >for thekey, value is ignored. > >""" > > > Oh, boy. Yes, that could be it. I assumed it was more like setting a key > in a Python dictionary. > > > >If value is ignored, MY_BEGIN_ALLOW_THREADS() won't save the current > >thread state, but leave the old one in place. > > > So only the first MY_BEGIN_ALLOW_THREADS per thread does anything. That > could break things, I guess. :) > > > > > >It's also possible for this API to fail in case of a memory issue, > >but the macro doesn't check for this, so the application may end > >up not storing anything at all. > > > So the macros should do error checking as well (assuming it makes sense > to continue to use TLS at all). > > > >MY_END_ALLOW_THREADS() needs to remove the stored TLS value in order > >to make room for the next MY_BEGIN_ALLOW_THREADS() call. > > > >Note that I'm not sure using TLS is a good idea: the implementations > >are highly platform dependent and may have undocumented/unwanted > >side-effects. > > > Can you give some examples? The only thing I'm aware of is that > performance may differ wildly from platform to platform, but I thought > it was pretty good on all mainstream platforms these days. > > > >IMHO, it's safer to store the thread id together with the tstate > >itself in the context and then check whether the thread id matches > >before restoring the state. > > > >This has the additional benefit of making some form of error > >raising possible in order to inform the programmer of the obvious > >mistake in using the connections from multiple threads, > >even if it's just a plain fprintf(stderr, ...). > > > That sounds essentially like the idea I mentioned in my earlier reply > (the one I compared to SQLite). I see how all of that logic could be > wrapped up in the MY_BEGIN/END_ALLOW_THREADS macro now, so I'm even > more inclined to go that route now. > > Tests or scripts that produce failures would still be very helpful. > Clearly the test suite doesn't exercise the buggy codepaths currently, > and it seems neither does thread-crash.py (probably because it chokes > so quickly due to its misuse of Connections in multiple threads). > > If I can reproduce the problem others are seeing, I'll be able to tell > if a particular change actually fixes it. Otherwise I'm just guessing. :) > > > Jean-Paul > > > > ------------------------------------------------------------------------------ > Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are > powering Web 2.0 with engaging, cross-platform capabilities. Quickly and > easily build your RIAs with Flex Builder, the Eclipse(TM)based development > software that enables intelligent coding and step-through debugging. > Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com > _______________________________________________ > pyopenssl-list mailing list > pyo...@li... > https://lists.sourceforge.net/lists/listinfo/pyopenssl-list > |
From: Jean-Paul C. <ex...@di...> - 2009-03-18 13:50:59
|
On Tue, 17 Mar 2009 18:19:54 -0300, Gustavo Moreira <gmo...@gm...> wrote: >I could reproduce the crash. This happens only (at least in my case) in >Win32 platform, I tried it on linux and does not occur. >Just add a couple of certificates in the test-crash-server.py, class >handler_class, method __init__. Run test-crash-server.py and then >test-crash-client.py, and in seconds python crashes!. > Hi Gustavo, This is great. I can reproduce the crash with this client/server pair. Thanks very much for providing them. :) I've filed a bug on Launchpad and attached them to it: https://bugs.launchpad.net/pyopenssl/+bug/344815 I'll try to resolve this soon and include it in the 0.9 release. Jean-Paul |
From: Adrian M. <ad...@co...> - 2009-03-18 14:52:59
|
Jean-Paul Calderone wrote: > On Tue, 17 Mar 2009 18:19:54 -0300, Gustavo Moreira <gmo...@gm...> wrote: > >> I could reproduce the crash. This happens only (at least in my case) in >> Win32 platform, I tried it on linux and does not occur. >> Just add a couple of certificates in the test-crash-server.py, class >> handler_class, method __init__. Run test-crash-server.py and then >> test-crash-client.py, and in seconds python crashes!. >> >> Hi again :) just for the record, we've found with Gustavo that this bug is sort of platform dependent. In our case, local thread dictionary is implemented with a linked list, using a key/threadId as key of the dictionary ( you can see it in thread.c from your python source code ). The problem appears when thread Ids are reused as it happens on win32 platform. The testcase Gustavo sent, creates a thread per connection on the server side, which produces at some point a thread with id used by some previous death thread. On linux platform thread ids have less probability of being reused, that's why the bug didn't trigger very quickly. cheers a/ |
From: M.-A. L. <ma...@eg...> - 2009-03-18 20:39:07
|
On 2009-03-17 19:57, M.-A. Lemburg wrote: >>> Storing the thread state in some application data object is not a >>> good approach, since there's nothing preventing some other thread >> >from restoring the thread state saved by the thread that just >>> released the lock. >>> >>> By contrast, the above macros will add a local stack variable and >>> store the thread state there, so it's not possible for another >>> thread to accidentally restore a thread state that it doesn't >>> own. >> Right. That's why I changed pyOpenSSL /from/ doing that (prior to 0.8) >> to using thread local storage instead (0.8). However, it seems there's >> some bug in the thread local storage version of the locking scheme. > > Perhaps it's related to this notice of caution in thread.c: > > """ > Use PyThread_set_key_value(thekey, value) to associate void* value with > thekey in the current thread. Each thread has a distinct mapping of thekey > to a void* value. Caution: if the current thread already has a mapping > for thekey, value is ignored. > """ > > If value is ignored, MY_BEGIN_ALLOW_THREADS() won't save the current > thread state, but leave the old one in place. > > It's also possible for this API to fail in case of a memory issue, > but the macro doesn't check for this, so the application may end > up not storing anything at all. > > MY_END_ALLOW_THREADS() needs to remove the stored TLS value in order > to make room for the next MY_BEGIN_ALLOW_THREADS() call. > > Note that I'm not sure using TLS is a good idea: the implementations > are highly platform dependent and may have undocumented/unwanted > side-effects. > > IMHO, it's safer to store the thread id together with the tstate > itself in the context and then check whether the thread id matches > before restoring the state. > > This has the additional benefit of making some form of error > raising possible in order to inform the programmer of the obvious > mistake in using the connections from multiple threads, > even if it's just a plain fprintf(stderr, ...). For better or worse, we have found that even the patch on SF doesn't fix the problem in all cases: While we no longer get segfaults on the second SSL request to a server application using the patched pyOpenSSL 0.8.0, we still get immediate segfaults if two clients try to access the server at the same time. This time around the problem shows up on Windows. Only going back to pyOpenSSL 0.7 solved both issues. I guess the reason for this is that pyOpenSSL 0.7 stores the thread state in the connection rather than the context as does the SF patch. Do you have any information on whether the context in OpenSSL is thread-safe ? The FAQ entry for OpenSSL only mentions connections: http://www.openssl.org/support/faq.html#PROG1 Note that it also mentions two callbacks that have to be initialized for multi-threaded applications: CRYPTO_set_locking_callback() and CRYPTO_set_id_callback() which pyOpenSSL does not set. Unfortunately, the man page explaining these is not very helpful: http://www.openssl.org/docs/crypto/threads.html -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 18 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ |
From: Jean-Paul C. <ex...@di...> - 2009-03-18 20:54:37
|
On Wed, 18 Mar 2009 21:38:49 +0100, "M.-A. Lemburg" <ma...@eg...> wrote: >On 2009-03-17 19:57, M.-A. Lemburg wrote: > > [snip] > >For better or worse, we have found that even the patch on SF doesn't >fix the problem in all cases: > >While we no longer get segfaults on the second SSL request to >a server application using the patched pyOpenSSL 0.8.0, we still >get immediate segfaults if two clients try to access the server >at the same time. This time around the problem shows up on Windows. > >Only going back to pyOpenSSL 0.7 solved both issues. > >I guess the reason for this is that pyOpenSSL 0.7 stores the >thread state in the connection rather than the context as does >the SF patch. > >Do you have any information on whether the context in OpenSSL is >thread-safe ? > >The FAQ entry for OpenSSL only mentions connections: >http://www.openssl.org/support/faq.html#PROG1 > >Note that it also mentions two callbacks that have to be initialized >for multi-threaded applications: CRYPTO_set_locking_callback() and >CRYPTO_set_id_callback() which pyOpenSSL does not set. > >Unfortunately, the man page explaining these is not very helpful: > >http://www.openssl.org/docs/crypto/threads.html You've linked to all the documentation I'm aware of. My interpretation of it has been that it means Context objects are threadsafe, as long as you initialize OpenSSL's threading layer (which pyOpenSSL does in 0.8.0 but not in any previous version!). Probably reading the implementation is the only way to be really sure (though, of course, a future release of OpenSSL might invalidate anything learned using that approach). Jean-Paul |
From: M.-A. L. <ma...@eg...> - 2009-03-18 21:28:43
|
On 2009-03-18 21:54, Jean-Paul Calderone wrote: > On Wed, 18 Mar 2009 21:38:49 +0100, "M.-A. Lemburg" <ma...@eg...> wrote: >> Do you have any information on whether the context in OpenSSL is >> thread-safe ? >> >> The FAQ entry for OpenSSL only mentions connections: >> http://www.openssl.org/support/faq.html#PROG1 >> >> Note that it also mentions two callbacks that have to be initialized >> for multi-threaded applications: CRYPTO_set_locking_callback() and >> CRYPTO_set_id_callback() which pyOpenSSL does not set. >> >> Unfortunately, the man page explaining these is not very helpful: >> >> http://www.openssl.org/docs/crypto/threads.html > > You've linked to all the documentation I'm aware of. My interpretation > of it has been that it means Context objects are threadsafe, as long as > you initialize OpenSSL's threading layer (which pyOpenSSL does in 0.8.0 > but not in any previous version!). Ah sorry, I grepped the wrong pyOpenSSL version. Good to know that 0.8.0 does setup the OpenSSL threading support ! > Probably reading the implementation > is the only way to be really sure (though, of course, a future release > of OpenSSL might invalidate anything learned using that approach). True. They do have a tendency to change internals frequently... even in sub-patch-level releases. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 18 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ |
From: Jean-Paul C. <ex...@di...> - 2009-03-22 16:32:29
|
Hello all, I just pushed a branch to launchpad which may address the 0.8 thread-related crashes. The bug, along with a reproduction script, is described here: https://bugs.launchpad.net/pyopenssl/+bug/344815 The branch, along with checkout instructions, which fixes the problem is here: https://code.launchpad.net/~exarkun/pyopenssl/thread-crash If you're not in to bzr, then you can get just the revision which includes the fix here: http://bazaar.launchpad.net/~exarkun/pyopenssl/thread-crash/revision/98 Note that the branch is from current trunk tip (ie, the latest development version). It is not a branch of the 0.8.0 release. In my testing, the reproduction scripts attached to the ticket produce a server-side segfault after running for between 2 and 10 seconds. It is likely that the bad behavior will trigger less frequently if run on a single core machine, but it may still trigger. With the branch linked above, the scripts run until they exhaust the available IP address space (each connection allocates a port which then goes into TIME_WAIT, so eventually there are no more ports free until the TIME_wAIT timeout expires). I can run the client multiple times without the server crashing. This is a good sign. Any additional testing anyone can do would be greatly appreciated. Thanks, Jean-Paul |
From: M.-A. L. <ma...@eg...> - 2009-03-24 23:24:38
|
On 2009-03-22 17:32, Jean-Paul Calderone wrote: > Hello all, > > I just pushed a branch to launchpad which may address the 0.8 thread-related > crashes. > > The bug, along with a reproduction script, is described here: > > https://bugs.launchpad.net/pyopenssl/+bug/344815 > > The branch, along with checkout instructions, which fixes the problem is here: > > https://code.launchpad.net/~exarkun/pyopenssl/thread-crash > > If you're not in to bzr, then you can get just the revision which includes > the fix here: > > http://bazaar.launchpad.net/~exarkun/pyopenssl/thread-crash/revision/98 This doesn't appear to cover all cases, esp. not if you have an application that uses recursion in callbacks. The patch causes an already existing thread state to get overwritten by another one. If the application starts returning from a deeper nesting, this will cause a wrong thread state to get restored (actually, the last one will get restored multiple times). To solve this, the thread state would have to be managed on a stack or recursive callbacks would have to be prohibited by checking whether there already is a saved thread state and then raising an exception instead of overwriting the state with a new one. The latter solution appears to be a lot easier to implement and much safer overall. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 25 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ |
From: Jean-Paul C. <ex...@di...> - 2009-04-01 20:26:49
|
On Wed, 25 Mar 2009 00:24:29 +0100, "M.-A. Lemburg" <ma...@eg...> wrote: >On 2009-03-22 17:32, Jean-Paul Calderone wrote: >> Hello all, >> >> I just pushed a branch to launchpad which may address the 0.8 thread-related >> crashes. >> >> The bug, along with a reproduction script, is described here: >> >> https://bugs.launchpad.net/pyopenssl/+bug/344815 >> >> The branch, along with checkout instructions, which fixes the problem is here: >> >> https://code.launchpad.net/~exarkun/pyopenssl/thread-crash >> >> If you're not in to bzr, then you can get just the revision which includes >> the fix here: >> >> http://bazaar.launchpad.net/~exarkun/pyopenssl/thread-crash/revision/98 > Hi Marc, Thanks for looking into this. Sorry for the delayed response, PyCon has been keeping me busy. >This doesn't appear to cover all cases, esp. not if you have an >application that uses recursion in callbacks. > >The patch causes an already existing thread state to get overwritten >by another one. If the application starts returning from a deeper >nesting, this will cause a wrong thread state to get restored (actually, >the last one will get restored multiple times). I think this will work out okay. The thread state is actually a TLS object anyway. Any given thread can only have one thread state object. So PyEval_SaveThread will always return the same PyInterpreterState*. At least, I think this is how it happens. Going into a bit more detail... Say someone calls Connection.send. MY_BEGIN_ALLOW_THREADS is invoked and PyEval_SaveThread swaps NULL in as the new PyInterpreterState and gives back the existing PyInterpreterState (call it S1). This gets saved with pyOpenSSL's _pyOpenSSL_tstate_key. Any key there already is deleted. Now say one of the pyOpenSSL global callbacks gets invoked. It starts by using MY_END_ALLOW_THREADS. This loads S1 from _pyOpenSSL_tstate_key and passes it to PyEval_RestoreThread which swaps S1 in as the new PyInterpreterState. Then it calls into some arbitrary Python code. Now say this arbitrary Python code calls another Connection's send method. MY_BEGIN_ALLOW_THREADS is invoked again. This deletes the current value of _pyOpenSSL_tstate_key and invokes PyEval_SaveThread again which swaps in NULL as the new PyInterpreterState and then saves the existing PyInterpreterState in _pyOpenSSL_tstate_key. Since the previous step saved S1 as the PyInterpreterState, this is S1 again. Eventually the send finishes, MY_END_ALLOW_THREADS is invoked. This the value of _pyOpenSSL_tstate_key which is S1, as described in the previous step. S1 is passed to PyEval_RestoreThread and then the send call is done. Now control returns to the global callback code which finishes up and gets ready to return back to OpenSSL. It does this by invoking MY_BEGIN_ALLOW_THREADS which clears _pyOpenSSL_tstate_key and then saves the result of PyEval_SaveThread (S1) in it. Then control returns to OpenSSL. Now the very top Connection.send finishes and the implementation of that method in pyOpenSSL gets ready to return to the calling Python code. It invokes MY_END_ALLOW_THREADS which loads S1 from _pyOpenSSL_tstate_key and passes it to PyEval_RestoreThread. Then it returns to the calling Python code. I wasn't actually very confident that the patch was correct, but now that I've actually written all that up and traced through the various levels of thread control APIs in CPython, I'm relatively confident that the patch is doing the right thing. I also learned that there are some APIs in CPython which do much of what pyOpenSSL is doing. They aren't available in Python 2.3 though, so I'm not quite ready to switch over to them. How does all this sound? Jean-Paul |
From: Jean-Paul C. <ex...@di...> - 2009-04-01 22:56:43
|
On Wed, 1 Apr 2009 15:26:43 -0500, Jean-Paul Calderone <ex...@di...> wrote: > > [snip] > >I wasn't actually very confident that the patch was correct, but now that >I've actually written all that up and traced through the various levels of >thread control APIs in CPython, I'm relatively confident that the patch is >doing the right thing. I also learned that there are some APIs in CPython >which do much of what pyOpenSSL is doing. They aren't available in Python >2.3 though, so I'm not quite ready to switch over to them. Shortly after sending this message, I realized that it is also a perfectly valid explanation for why the current trunk code also works correctly and does not segfault. However, the current trunk code does segfault, so something must be wrong. ;) After a few hours with gdb, I found the problem. Python threads identifiers are only unique for the lifetime of the thread. As threads are destroyed and others created, the identifiers may be re-used. Since thread identifiers are used as TLS keys, this means that thread A may set a TLS key then exit and a later thread B may then be able to see the value for that TLS key. This means that without the delete, thread B won't save the correct new PyThreadState*, thus causing the bug. Jean-Paul |
From: Adrian M. <ad...@co...> - 2009-04-02 02:40:03
|
Jean-Paul Calderone wrote: > On Wed, 1 Apr 2009 15:26:43 -0500, Jean-Paul Calderone <ex...@di...> wrote: > >> [snip] >> >> I wasn't actually very confident that the patch was correct, but now that >> I've actually written all that up and traced through the various levels of >> thread control APIs in CPython, I'm relatively confident that the patch is >> doing the right thing. I also learned that there are some APIs in CPython >> which do much of what pyOpenSSL is doing. They aren't available in Python >> 2.3 though, so I'm not quite ready to switch over to them. >> > > Shortly after sending this message, I realized that it is also a perfectly... :))) what you were doing 3/18/09 ?? :))) a/ |