many trajectories eventually segfault

Help
2013-10-08
2013-11-04
  • Matthew Egbert
    2013-10-08

    Hello,

    I have been working on integrating PyDSTool with the inspyred
    genetic algorithm package, so that I can use machine learning
    algorithms to tune dynamical neural networks to solve particular
    tasks. The Radau and Dopri integrators seem to work most of the
    time, but after generating many trajectories (an inconsistent,
    unpredictable number, sometimes ~1000, sometimes ~100000), the program
    just dies with a segfault and no information about its origin.

    I am using 64-bit Ubuntu, and have changed the -m32 flags to
    -m64 in Generators/Dopri_ODEsystem.py, Generators/Radau_ODEsystem.py
    and PyCont/ContClass.py.  (Simply removing the flags didn't seem to
    work… but because of the unpredictable nature of the segfaults, I'm
    not sure whether there is a difference between removing the -m32
    flags and replacing them with -m64 flags.)

    When I run the run_all_tests.py script, the final output says:

    Test scripts that failed:
            test_variable_traj.py
            poly_interp_test.py
            PyCont_Hopfield.py
            interp_dopri_test.py
            PyCont_MorrisLecar_TypeI.py
            PyCont_MorrisLecar_TypeII.py
            PyCont_HindmarshRose.py
    Summary:
    Basic PyDSTool functions: appears to be broken on your system
    Map related modules: appears to work on your system
    VODE related modules: appears to be broken on your system
    Symbolic differentiation module: appears to work on your system
    Parameter estimation module: appears to work on your system
    PyCont: appears to be broken on your system
    Dopri ODE systems: appears to be broken on your system
    Radau ODE systems: appears to work on your system
    Parameter estimation module with external compilers: appears to work on your system
    PyCont interface to AUTO: appears to be broken on your system
    

    But, when I rerun some of these tests, there is no problem. Specifically:

    The following tests fail (not by segfaulting, but because some
    libraries are not found).  I think I saw that these are known bugs?
    So I have not included the details of their output, but I am happy
    to do so if it would be helpful (just let me know).

            test_variable_traj.py
            poly_interp_test.py
            PyCont_Hopfield.py

    But the following test seems to pass without problem, generates a nice
    plot and exits without warnings or errors:

            interp_dopri_test.py

    The following tests all segfault when I run them with python, but not
    when I run them with ipython!

            PyCont_MorrisLecar_TypeI.py
            PyCont_MorrisLecar_TypeII.py
            PyCont_HindmarshRose.py

    They all seem to do so at the same point, close to the end of the
    test.  See the output at http://www.rhthm.com/pydstool, where I have
    indicated where the test fails.

    Running my program with ipython or python successfully generates
    thousands (if not tens or hundreds of thousands) of trajectories
    before segfaulting or reporting

    Fatal Python error: deallocating None
    Aborted (core dumped)
    

    I have confirmed that it is not the evolutionary optimisation of
    parameters that is causing the segfault, by replacing the system of
    equations with a set of very simple equations that have no parameters
    that are influenced by evolution.

    My questions are:

    1. What is likely to be causing the segfault? Is there a way to
    configure PyDSTool / the compilation of the C integrators to provide
    me with more information?

    2. Does anything stand out to anyone as a possible cause of the
    different behaviour between python and ipython?  Are there any library
    path variables that I should check?

    3. Is there anything that needs to be done when using the same
    generator to create many, many trajectories? I have tried recreating
    the generator between the production of every trajectory, but that
    does not help.  When I run top during execution, I have not noticed
    any memory leaks.

    Basically, I have tried everything that I can think of to debug the
    error. Source code in a .tz is available at the following URL.

    You will need to install the package inspyred, available through pip.

    http://www.rhthm.com/pydstool

    Any help you can provide would be much appreciated!  And if you need
    any further information from me, please don't hesitate to ask.

    Cheers,
    Matthew

     
  • Matthew Egbert
    2013-10-08

    I am using the latest version of PyDSTool:

    PyDSTool-0.88.121202.zip

    I have also tried the following, with no improvement:

    PyDSTool-0.88.120504.zip

    In : PyDSTool.__version__
    Out: '0.88'

    In : numpy.__version__
    Out: '1.7.1'

    In : scipy.__version__
    Out: '0.12.0'

     
  • Rob Clewley
    2013-10-08

    Hi,

    Sorry that this has been a problem. We have been aware of the slow memory leak causing problems for long integrations for some time, but have been unable to trace its origin (it's something either in the SWIG interfacing that is opaque to us, or otherwise inherent in the C behind the array libraries). We *might* have inadvertently fixed it in the minor update posted here, but we haven't had a way to test this for the kind of case you have:

    https://dl.dropboxusercontent.com/u/2816627/PyDSTool-0.88.130726.zip

    Please try this version and let us know here whether it has solved the problem. The other issue is that 64-bit systems have been troublesome: sometimes they've worked fine and other times not. We have not yet been able to trace that either, but someone is looking into it.

    I don't know of any simple fix except to break up your long runs into multiple pieces. Many short runs (which can also trigger the problem) can likewise be split up by starting separate processes, each running PyDSTool on a different group of initial conditions. That, at least, avoids the problem.

     
  • Matthew Egbert
    2013-10-10

    Hello Rob,

    Thanks for your response. Unfortunately, the latest version that you posted did not resolve the problem.  I only ran one test (more will follow), and it did seem to run for longer than before.  But just as soon as I got my hopes up…

    Fatal Python error: deallocating None
    Aborted (core dumped)

    Hit me up if you want me to run another test of some kind to help you debug what's going on.

    In terms of breaking the run into multiple pieces: is there a way to do that from within Python (i.e. without restarting the Python interpreter between the pieces)?

    Thanks again.
    All the best,
    Matthew

     
  • Rob Clewley
    2013-10-11

    Sorry that didn't work out. The only tests that would be more helpful require a Python installed with the debug symbols included, run through a recent gdb with the Python bindings. I don't presently have such a setup ready. If you happen to know enough about using gdb and would be willing to help us track down this elusive problem, it would be a great benefit to our project. You can email me privately if you'd like to get involved.
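
    In the meantime, a lighter-weight option (my suggestion, not anything built into PyDSTool) is the faulthandler module, which is in the standard library from Python 3.3 and available for 2.x via pip. It can't show the C-level frames, but it at least dumps the Python-level traceback when the fatal signal arrives, instead of dying silently:

    # faulthandler: stdlib in Python 3.3+; on Python 2.x, `pip install faulthandler`.
    import faulthandler

    # Keep this file object alive for the whole run; on SIGSEGV (and other
    # fatal signals) faulthandler writes the Python-level traceback here
    # before the process dies.
    crash_log = open('segfault_traceback.log', 'w')
    faulthandler.enable(file=crash_log)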

    You can start new processes from within Python using the os library (to run arbitrary system commands, e.g. a python script) or the subprocess library (to call python and wait on a return code). A rough sketch follows.
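
    Here is an untested sketch of the subprocess approach; 'worker.py' and the chunking scheme are placeholders for your own script and your own grouping of initial conditions:

    #!/usr/bin/env python
    # Run each chunk of initial conditions in a fresh interpreter, so any
    # memory leaked in the C integrator layer is reclaimed when that
    # process exits. 'worker.py' is a placeholder for a script that
    # integrates one chunk and saves its results to disk.
    import subprocess
    import sys

    N_CHUNKS = 100
    for chunk in range(N_CHUNKS):
        ret = subprocess.call([sys.executable, 'worker.py', str(chunk)])
        if ret != 0:
            sys.exit('chunk %d failed with return code %d' % (chunk, ret))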

     

  • Anonymous
    2013-10-16

    I wonder if this is related…

    If I run the following code, the memory usage keeps increasing indefinitely (I am losing ~2 MB/second).

    Could there be a memory leak even before calls to any of the integrators?

    I ran into this issue while trying to integrate a system, feeding in many (>10**4) initial conditions in a for loop.
    After about 1-2 hours of work my computer runs out of memory (~16 GB).

    #!/usr/bin/env python
    import PyDSTool as dst
    def code():
        """ Runs the simulation for a single day of growth. Returning all the dynamics. """
        icdict = {'x': 1}
        x_rhs = '-0.1 * x'
        vardict = {'x': x_rhs}
        DSargs = dst.args()                   # create an empty object instance of the args class, call it DSargs
        DSargs.name = 'SHM'               # name our model
        DSargs.ics = icdict               # assign the icdict to the ics attribute
        DSargs.tdata = [0, 20]            # declare how long we expect to integrate for
        DSargs.varspecs = vardict         # assign the vardict dictionary to the 'varspecs' attribute of DSargs
        DS = dst.Generator.Vode_ODEsystem(DSargs)
        del DS
        return
    def main():
        for i in range(10**8):
            code()
    if __name__ == '__main__':
        main()
    

    $ uname -a
    Linux marbles 3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

    In : import numpy

    In : numpy.__version__
    Out: '1.7.1'

    In : import scipy

    In : scipy.__version__
    Out: '0.11.0'

    This is true both for this link: https://dl.dropboxusercontent.com/u/2816627/PyDSTool-0.88.130726.zip
    and for the version available for download from SourceForge.

     
  • Rob Clewley
    2013-10-16

    Thanks for looking at this. I don't think it's related. The Vode generator wraps the scipy module and has nothing to do with our Dopri/Radau SWIG-interfaced code, that's for sure. As for the Python side: initializing the generator most likely commits memory to the pre-compiled Vode DLL that gets loaded. Creating the DS object repeatedly will probably assign new memory to the DLL, and I believe that Python does not actually unload DLLs. So that, I suspect, is the source of the memory increase. There's nothing really I can do about that, but repeatedly creating the same Generator is not recommended anyway (see the sketch below).
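
    To illustrate the pattern I'd recommend instead, here is a minimal sketch using the usual set/compute calls (the names and numbers are just placeholders): build the Generator once, then change only the initial conditions inside the loop:

    import PyDSTool as dst

    DSargs = dst.args(name='decay')
    DSargs.varspecs = {'x': '-0.1 * x'}
    DSargs.ics = {'x': 1}
    DSargs.tdata = [0, 20]

    # Build the Generator (and load its integrator) exactly once...
    ode = dst.Generator.Vode_ODEsystem(DSargs)

    # ...then reuse it, resetting only the initial condition for each run.
    for i in range(1000):
        ode.set(ics={'x': 1 + 0.01 * i})
        traj = ode.compute('run%d' % i)
        pts = traj.sample()   # extract the computed points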

    The segfault with Dopri comes from repeatedly calling into the integrator, not from recreating its Generator wrapper (i.e. not from reloading the DLL). That said, you'll probably get the same memory increase if you repeatedly recreate a Dopri-based Generator in your loop too…

    Let me know if my logic is off or you notice anything else.
    Thanks!