
#881 Multiple modules leads to unhandled exception

Milestone: None
Status: closed-works-for-me
Owner: nobody
Priority: 5
Updated: 2022-10-29
Created: 2008-01-03
Creator: Ben Webb
Private: No

When using methods from multiple modules, Python crashes with "terminate called after throwing an
instance of 'swig::stop_iteration'". Here is a simple example:

module1.i:
%module module1
%include "std_vector.i"
%template(vectori) std::vector<int>;

module2.i:
%module module2
%include "std_vector.i"
%template(vectorf) std::vector<float>;

Build with something like (this is with swig-1.3.33, Fedora 8):
swig -python -c++ module1.i
g++ -shared -I/usr/include/python2.5 -o _module1.so module1_wrap.cxx
swig -python -c++ module2.i
g++ -shared -I/usr/include/python2.5 -o _module2.so module2_wrap.cxx

Then run the following Python script:
import module1
import module2
a = module1.vectori()
for b in a: print b
a = module2.vectorf()
for b in a: print b

This crashes every time for me with:
terminate called after throwing an instance of 'swig::stop_iteration'

Incidentally, the following script always crashes for me too:
import module1
import module2
a = module1.vectori()
for b in a: print b

but the following always works:
import module2
import module1
a = module1.vectori()
for b in a: print b

So it looks to me as if one module is overriding the symbols of the other, and indeed, both modules export the same symbols:
$ nm -C _module1.so |grep stop
000242c8 V typeinfo for swig::stop_iteration
000242d0 V typeinfo name for swig::stop_iteration
$ nm -C _module2.so |grep stop
00024648 V typeinfo for swig::stop_iteration
00024650 V typeinfo name for swig::stop_iteration
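Rather than comparing the two nm listings by eye, a short Python 3 helper can parse `nm -C` output and report the weak symbols defined in both libraries. This is a minimal sketch assuming GNU nm's default output format (hex address, one-letter type, demangled name); the helper names are hypothetical, and undefined symbols (no address) are deliberately ignored.

```python
import re

# One line of GNU nm output in the default (BSD) format: an 8- or 16-digit
# hex address (absent for undefined symbols), a one-letter type, the name.
_NM_LINE = re.compile(r"^\s*([0-9a-fA-F]{8,16})?\s*(\S)\s+(.*\S)\s*$")

# GNU nm uses V/v for weak objects and W/w for other weak symbols.
_WEAK_TYPES = frozenset("VvWw")


def defined_weak_symbols(nm_output):
    """Return the names of weak symbols that have an address (are defined)."""
    names = set()
    for line in nm_output.splitlines():
        match = _NM_LINE.match(line)
        if match and match.group(1) and match.group(2) in _WEAK_TYPES:
            names.add(match.group(3))
    return names


def shared_weak_symbols(nm_a, nm_b):
    """Weak symbols defined in both listings, e.g. the stop_iteration typeinfo."""
    return defined_weak_symbols(nm_a) & defined_weak_symbols(nm_b)
```

The input can come from `subprocess.run(["nm", "-C", "_module1.so"], capture_output=True, text=True).stdout` (Python 3.7+), and likewise for `_module2.so`.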

On an Intel Mac (10.4) all of the examples work, but nm again shows that both dynamic libraries have the same symbols.

Discussion

  • Josh Cherry

    Josh Cherry - 2008-01-04

    Logged In: YES
    user_id=957678
    Originator: NO

    The symbols in one module are *not* overriding those of the other, which is why things don't work. These are weak symbols, as indicated by the 'V'. The intent is that just one typeinfo is used. That way if, for example, a stop_iteration from one shared library is thrown, and code in another library tries to catch it, the catch will succeed since the type information will compare equal. By default, Python loads shared libraries in such a way that they don't see each other's symbols. If you change that behavior, e.g., by doing

    import sys, dl
    sys.setdlopenflags(sys.getdlopenflags() | dl.RTLD_GLOBAL)

    before importing your modules, you won't get the crash (I'm not suggesting this as a practical solution, but if you do use it you probably want to restore the original settings afterwards).
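    The `dl` module used above exists only in Python 2 (it was deprecated in 2.6 and removed in Python 3); since Python 3.3 the same constant is available as `os.RTLD_GLOBAL`. A minimal sketch of the "restore the original settings afterwards" advice, assuming a Unix platform, wraps the flag change in a context manager; the name `global_dlopen_flags` is hypothetical, and this is a diagnostic aid rather than a recommended fix.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def global_dlopen_flags():
    """Temporarily add RTLD_GLOBAL to the flags used for extension imports.

    Unix only: sys.getdlopenflags()/sys.setdlopenflags() are not available
    on Windows. The previous flags are restored even if an import fails.
    """
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        sys.setdlopenflags(old_flags)
```

    The two imports would go inside a `with global_dlopen_flags():` block; modules imported after the block are loaded with the original flags again.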

    The issue arises because both modules wrap swig::PySwigIterator. The shadow object returned for the iterator is a module2.PySwigIterator, even though it comes from a module1 object (you can see this by printing iter(a)). Calling its methods thus leads to calls into _module2.so. Making the iterator class names different will fix the problem. For example, adding

    #define PySwigIterator module1_PySwigIterator

    as the second line of module1.i will work around the problem. A more elegant fix might involve file-specific namespaces.
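    The diagnosis above (printing iter(a)) can be made explicit by comparing the module that defines the container's class with the module that defines its iterator's class. A minimal Python 3 sketch; the function name is hypothetical, and it relies only on `type(obj).__module__`.

```python
def iterator_class_modules(container):
    """Return the modules defining the container's class and its iterator's class.

    For a SWIG proxy such as module1.vectori, a mismatch like
    ('module1', 'module2') would indicate the misrouted iterator described above.
    """
    iterator = iter(container)
    return type(container).__module__, type(iterator).__module__
```

    With the #define workaround in place, both names would be expected to match for each module's containers.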

     
  • Ben Webb

    Ben Webb - 2008-01-04

    Logged In: YES
    user_id=69439
    Originator: YES

    Thanks for the informative response; you are of course absolutely right. I wasn't aware that Python calls dlopen() with RTLD_LOCAL, so that indeed explains the behavior.

    The #define workaround works for us. Another possible "more elegant" fix would be to split the SWIG type table into two: one shared in the usual way with all other modules that have the same runtime version and SWIG_TYPE_TABLE define, and one not shared at all. PySwigIterator would obviously go in the second table. (It seems odd to give PySwigIterator a module-specific name and then share it globally.) But I don't know whether this is feasible given the SWIG design and the supported scripting languages.

     
  • Ben Webb

    Ben Webb - 2008-01-12

    Logged In: YES
    user_id=69439
    Originator: YES

    Unfortunately the #define workaround seems to break some part of the SWIG type system. I'll poke around a little, but maybe others can shed some light on it. So now the situation is as follows:

    module1.i:
    %module module1
    #define PySwigIterator mod1_PySwigIterator
    %include "std_vector.i"
    %template(vectori) std::vector<int>;

    Build with, e.g.
    swig -python -c++ module1.i
    g++ -g -shared -fPIC -I/usr/include/python2.5 -o _module1.so module1_wrap.cxx

    Then run test.py:
    import module1
    module1.vectori().begin()

    This gives the output:
    swig/python detected a memory leak of type '(null)', no destructor found.

    This is irritating on my Linux box, but a showstopper on my Solaris box, since its printf() tries to dereference the null pointer and segfaults. My guess is the mod1_PySwigIterator destructor isn't being called where it should, but I don't know why. Without the #define all proceeds normally (but of course then the original two-module problem occurs).

     
  • William Fulton

    William Fulton - 2008-01-12

    Logged In: YES
    user_id=242951
    Originator: NO

    I just fixed the segfault caused by the printf (the fix is in svn), but I am unable to shed light on the real problem.

     
  • Ben Webb

    Ben Webb - 2008-01-13

    Logged In: YES
    user_id=69439
    Originator: YES

    Actually, I'd argue for that NULL triggering an assertion failure somewhere earlier in the code, since as far as I can tell, SWIG_TypePrettyName shouldn't be returning a NULL anyway...

    At any rate, I found the underlying problem. pyiterators.swg contains the following implementation for PySwigIterator::descriptor():

    static swig_type_info* descriptor() {
        static int init = 0;
        static swig_type_info* desc = 0;
        if (!init) {
            desc = SWIG_TypeQuery("swig::PySwigIterator *");
            init = 1;
        }
        return desc;
    }

    With the #define in place, the class is now called mod1_PySwigIterator, but SWIG_TypeQuery is still getting a PySwigIterator, since #defines don't affect string constants, of course. Changing the string constant to "swig::mod1_PySwigIterator *" makes the memory leak go away.

    This does make the #define workaround a little harder to use, however. I can of course use sed or perl to "fix" that one string constant in the generated wrapper after SWIG runs, but I wonder if anybody can think of a cleaner solution? Perhaps the pyiterators.swg code can be modified to use SWIG's macro system, and get a module-specific prefix that way?
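    The sed/perl post-processing step mentioned above could equally be done in Python. A minimal sketch, assuming the generated wrapper contains the descriptor() call exactly as quoted above; the function names are hypothetical, and like any text substitution on generated code it is fragile across SWIG versions.

```python
# The call as it appears in PySwigIterator::descriptor() (pyiterators.swg).
OLD_CALL = 'SWIG_TypeQuery("swig::PySwigIterator *")'


def patch_iterator_descriptor(wrapper_source, class_name):
    """Make descriptor() query the renamed class, e.g. mod1_PySwigIterator.

    Raises ValueError if the expected call is missing, so a change in
    SWIG's generated code is not silently ignored.
    """
    if OLD_CALL not in wrapper_source:
        raise ValueError("descriptor() type query not found in wrapper")
    new_call = 'SWIG_TypeQuery("swig::%s *")' % class_name
    return wrapper_source.replace(OLD_CALL, new_call)


def patch_wrapper_file(path, class_name):
    """Rewrite a generated wrapper (e.g. module1_wrap.cxx) in place."""
    with open(path) as f:
        source = f.read()
    with open(path, "w") as f:
        f.write(patch_iterator_descriptor(source, class_name))
```

    It would run between the swig and g++ steps, e.g. `patch_wrapper_file("module1_wrap.cxx", "mod1_PySwigIterator")`.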

     
  • Amaury Forgeot d'Arc

    I ran into the same problem.
    The "#define PySwigIterator module1_PySwigIterator" workaround works for me,
    and works even better if I change PySwigIterator::descriptor() this way:

    static swig_type_info* descriptor() {
        static int init = 0;
        static swig_type_info* desc = 0;
        if (!init) {
            #define _PySwigIterator_STRINGIZE(name) #name
            #define _PySwigIterator_NAME(name) _PySwigIterator_STRINGIZE(name)
            desc = SWIG_TypeQuery("swig::" _PySwigIterator_NAME(PySwigIterator) " *");
            #undef _PySwigIterator_STRINGIZE
            #undef _PySwigIterator_NAME
            init = 1;
        }
        return desc;
    }

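    The "stringize" trick replaces the PySwigIterator token with its (possibly redefined) value between double quotes. Two macro levels are needed because the # operator stringizes its argument without macro-expanding it first; passing PySwigIterator through _PySwigIterator_NAME lets it expand to module1_PySwigIterator before _PySwigIterator_STRINGIZE applies #. The compiler then concatenates the adjacent string literals into "swig::module1_PySwigIterator *".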

    Looking at the code, I suspect that Ruby has the same problem.

     
  • Milos Jakubicek

    Milos Jakubicek - 2009-09-03

    Could someone please look at this problem again and try to fix it in SWIG? This is indeed very annoying. The only solution that worked for me was adding:
    import sys, dl
    sys.setdlopenflags(sys.getdlopenflags() | dl.RTLD_GLOBAL)
    into my script, which is, as mentioned, not a practical solution.

    Thank you very much in advance!

     
  • Seth Johnson

    Seth Johnson - 2009-11-25

    I am also having this problem on the Red Hat systems that I use, although it doesn't crop up on the same files under Snow Leopard. All three are using swig 1.3.40.

     
  • Milos Jakubicek

    Milos Jakubicek - 2010-02-14

    Any chance this will get fixed? It's quite a big issue, and hard to debug. E.g., it occurs on Ubuntu 8.04 but not on Fedora 12, although the generated .py files are exactly the same (so I guess there is some underlying trigger depending on the gcc/glibc version).
    Thank you in advance!!!

     
  • - 2010-08-02

    I agree... because the problem appears for me on some platforms and not on others, it was really a pain to track down. (And, after extensive googling, this page is the only place I've been able to find it referenced.) I'd greatly appreciate a fix.

     
  • Olly Betts

    Olly Betts - 2022-03-21

    The original example runs successfully for me on Linux using SWIG git master with both Python 2.7.18 and Python 3.9.10. However, the default dlopen flags Python uses still don't seem to include RTLD_GLOBAL, and nm shows the same symbols, so I'm hesitant to close this as fixed.

    Can anybody reproduce this with modern versions of SWIG and Python?

     
  • William Fulton

    William Fulton - 2022-10-29
    • status: open --> closed-works-for-me
    • Group: -->
     
  • William Fulton

    William Fulton - 2022-10-29

    I couldn't reproduce this using swig-1.3.33 or swig-4.1.0 with python-2.7.17 on Ubuntu 18.04.6 64-bit, nor using swig-1.3.33 with python-2.5.6 on Ubuntu 14.04.5 64-bit. Closing as can't reproduce. Please re-open if it is reproducible on a modern Linux distribution.

     
