#1027 python/std_map.i fails to lock correctly

closed-accepted
Olly Betts
python (259)
5
2009-07-29
2009-06-29
Paul Hampson
No

python/std_map.i does not have correct locking. In several functions, it takes the lock only for a PyErr_SetString and fails to protect a PyList_New or PyDict_New and related calls later on. In struct traits_asptr<std::map<K,T> >::asptr, no lock is taken at all.

I've attached a patch (against 1.3.38, but SVN indicates that this is the most recent version of this file) against Lib/python/std_map.i

Discussion

  • Paul Hampson
    Paul Hampson
    2009-06-29

    Patch for python/std_map.i to lock correctly.

     
    Attachments
  • Olly Betts
    Olly Betts
    2009-06-30

    Do you have a testcase which shows a problem with the current code?

    For a non-director function wrapper, we should be called with the GIL already locked, so there's no need to lock it, and trying to do so is actually harmful under mod_python as it can cause deadlocks (due to bugs in mod_python apparently).

    So my initial feeling is that actually the locks for PyErr_SetString(), etc are probably superfluous and should be removed rather than extended, but I might be wrong. A testcase which shows a problem would save a lot of time, and also provide a regression test for this if it does fix a problem.

     
  • Paul Hampson
    Paul Hampson
    2009-06-30

    I don't have a testcase I can post at this point, but the issue is that at least in the code I'm generating here, the only caller of the wrapper functions generated by this file is calling the SWIG_PYTHON_THREAD_BEGIN_ALLOW macro before calling them, which releases the GIL. It does this for a whole bunch of the methods, most of which it's fine for, because it then calls C++ functions between the ALLOWs.

    To block out an example from my current _wrap.cxx code, the code which is wrapping the values() method produced by the std_map.i looks like this:
    SWIGINTERN PyObject *_wrap_MyMap_values(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
    ...
    SWIG_PYTHON_THREAD_BEGIN_BLOCK
    // Parse args using Py* methods
    arg1 = reinterpret_cast< MyMap >* )(argp1);
    {
    SWIG_PYTHON_THREAD_BEGIN_ALLOW
    result = (PyObject *)mangled_myMap_values(arg1);
    SWIG_PYTHON_THREAD_END_ALLOW
    }
    ...
    SWIG_PYTHON_THREAD_END_BLOCK
    return
    }

    And this patch has definitely fixed random memory corruption which we chased down to finding two threads running inside the python interpreter, both apparently having gotten the same or at least overlapping blocks of RAM back from PyMalloc. Using the python debugging interpreter, it fails very quickly.

    I dread to think how mod_python manages to break the documented reentrancy behaviour of the GIL.

    The other option is to remove the wrapping SWIG_PYTHON_THREAD_BEGIN_ALLOW, although that's not provided by any of the .i files in my installation of SWIG, so I don't think it's an option I can really apply locally.

     
  • Paul Hampson
    Paul Hampson
    2009-06-30

    Test case for this bug report

     
    Attachments
  • Paul Hampson
    Paul Hampson
    2009-06-30

    OK, I've added a simple test case.

    "swig -c++ -python -threads maptest.i" produces maptest_wrap.cxx

    Look at the two instances of "std_map_Sl_int_Sc_int_Sg__values" in that file. The first is the method definition, generated by std_map.i, with no GIL protection for the calls to PyList_New and PyList_SET_ITEM. The second is the call to it, wrapped in SWIG_PYTHON_THREAD_BEGIN_ALLOW/SWIG_PYTHON_THREAD_END_ALLOW, which release the GIL.

     
  • Olly Betts
    Olly Betts
    2009-06-30

    • assigned_to: nobody --> olly
     
  • Olly Betts
    Olly Betts
    2009-06-30

    Thanks for the example.

    Note that GIL handling has changed in HEAD (it's changed not to try to acquire the GIL when we know we already have it), but this issue is still there - I see the GIL released as you say, so we should be reacquiring it (or not releasing it in the first place).

    The generation of SWIG_PYTHON_THREAD_BEGIN_ALLOW/SWIG_PYTHON_THREAD_END_ALLOW is controlled by %feature("nothreadallow") - this can be used to make us hold on to the GIL when wrapping a trivial method where the cost of the GIL handling might be more than the method call, but the methods in question here probably do warrant releasing it, at least for a larger map. So your patch seems plausible to me.

    For a regression test, is there a way to make maptest.i crash (or detect it misbehaving) for use as a regression test? I'd imagine it might be hard to do so fully repeatably given the nature of threads...

    I don't know the gory details of the mod_python issue, but I've observed it myself, and been assured it's the fault of mod_python.

     
  • Paul Hampson
    Paul Hampson
    2009-07-02

    As far as a regression test goes, simply having a program with two threads, one of which is doing Python stuff and the other is repeatedly accessing values() on the testmap (which could be prepopulated from C++ with lots of ints) should do it. If you run under the debugging build of python (on Debian it's already packaged in the python2.x-dbg packages) I always seemed to get a "Fatal UNREF" error or similar.

     
  • Olly Betts
    Olly Betts
    2009-07-29

    • status: open --> closed-accepted
     
  • Olly Betts
    Olly Betts
    2009-07-29

    Still failing to find time to sort out a test case, but it seems a pity to omit this fix from the next release just because of that...

    So applied to SVN trunk as r11465.