
many trajectories eventually segfault

  • Matthew Egbert

    Matthew Egbert - 2013-10-08

    Hello,

    I have been working on integrating PyDSTool with the inspyred
    genetic algorithm package, to allow me to use machine learning
    algorithms to tune dynamical neural networks to solve particular
    tasks. The Radau and Dopri integrators seem to work most of the
    time, but after generating many trajectories (an inconsistent,
    unpredictable number, sometimes ~1000, sometimes ~100000), the program
    just dies with a segfault and no information about the origin of the
    segfault.

    I am using 64 bit ubuntu, and have modified the -m32 flags to be
    -m64 in Generators/Dopri_ODEsystem.py, Generators/Radau_ODEsystem.py
    and PyCont/ContClass.py.  (Simply removing the flags didn't seem to
    work…but because of the unpredictable nature of the segfaults, I'm
    not sure if there is a difference between removing the -m32 flags and
    replacing them with -m64 flags).

    When I run the run_all_tests.py script, the final output says:

    Test scripts that failed:
            test_variable_traj.py
            poly_interp_test.py
            PyCont_Hopfield.py
            interp_dopri_test.py
            PyCont_MorrisLecar_TypeI.py
            PyCont_MorrisLecar_TypeII.py
            PyCont_HindmarshRose.py
    Summary:
    Basic PyDSTool functions: appears to be broken on your system
    Map related modules: appears to work on your system
    VODE related modules: appears to be broken on your system
    Symbolic differentiation module: appears to work on your system
    Parameter estimation module: appears to work on your system
    PyCont: appears to be broken on your system
    Dopri ODE systems: appears to be broken on your system
    Radau ODE systems: appears to work on your system
    Parameter estimation module with external compilers: appears to work on your system
    PyCont interface to AUTO: appears to be broken on your system
    

    But, when I rerun some of these tests, there is no problem. Specifically:

    The following tests fail (not from segfaults, but because they cannot find
    some libraries).  I think I saw that these are known bugs? If so, I have not
    included the details of their output, but if it would be helpful, I am happy
    to do so (just let me know).

            test_variable_traj.py
            poly_interp_test.py
            PyCont_Hopfield.py

    But the following test seems to pass without problem, generates a nice
    plot and exits without warnings or errors:

            interp_dopri_test.py

    The following tests all segfault when I run them with python, but not
    when I run them with ipython!

            PyCont_MorrisLecar_TypeI.py
            PyCont_MorrisLecar_TypeII.py
            PyCont_HindmarshRose.py

    They all seem to do so at the same point, close to the end of the
    test.  See the output at http://www.rhthm.com/pydstool, where I have
    indicated where the test fails.

    Running my program with ipython or python successfully generates
    thousands (if not 10s or 100s of thousands) of trajectories before
    segfaulting or reporting

    Fatal Python error: deallocating None
    Aborted (core dumped)
    

    I have confirmed that it is not the evolutionary optimisation of
    parameters that is causing the segfault, by replacing the system of
    equations with a set of very simple equations that have no parameters
    that are influenced by evolution.

    My questions are:

    1. What is likely to be causing the segfault? Is there a way to
    configure pydstool / the compilation of the C integrators to provide
    me with more information?

    2. Does anything stand out to anyone as a possible cause for the
    different behaviour between python and ipython?  Are there any library
    path variables that I should check?

    3. Is there anything that needs to be done when using the same
    generator to create many, many trajectories? I have tried recreating
    the generator between the production of every trajectory, but that does
    not help.  When I run top during execution, I have not noticed any
    memory leaks.

    Basically, I have tried everything that I can think of to debug the
    error. Source code in a .tz archive is available at the following URL.

    You will need to install the inspyred package, available through pip.

    http://www.rhthm.com/pydstool

    Any help you can provide would be much appreciated!  And if you need
    any further information from me, please don't hesitate to ask.

    Cheers,
    Matthew

     
  • Matthew Egbert

    Matthew Egbert - 2013-10-08

    I am using the latest version of PyDSTool

    PyDSTool-0.88.121202.zip

    I have also tried the following, with no improvement:

    PyDSTool-0.88.120504.zip

    In : PyDSTool.__version__
    Out: '0.88'

    In : numpy.__version__
    Out: '1.7.1'

    In : scipy.__version__
    Out: '0.12.0'

     
  • Rob Clewley

    Rob Clewley - 2013-10-08

    Hi,

    Sorry that this has been a problem. We have been aware of the slow memory leak causing problems for long integrations for some time but have been unable to trace its origin (it's something either in the SWIG interfacing that is opaque to us or otherwise inherent in the C behind the array libraries). We *might* have inadvertently fixed it in the minor update that is posted here, but we haven't had a way to test this for the kind of case you have:

    https://dl.dropboxusercontent.com/u/2816627/PyDSTool-0.88.130726.zip

    Please try it with this version and let us know here whether it has solved the problem. The other issue is that 64 bit systems have been troublesome - sometimes they've worked fine and other times not. We also have not yet been able to trace that but someone is looking into it.

    I don't know of any simple fix except to break up your long runs into multiple pieces. You can also split up many short runs (which can also cause it) by starting different threads each running PyDSTool for different groups of initial conditions. That, at least, avoids the problem.
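
    A minimal sketch of that idea (mine, not code from this thread; I use worker
    processes rather than threads, since memory leaked by the integrator is only
    reclaimed when a process exits, and the self-contained Vode generator stands
    in for a Dopri/Radau one):

    import multiprocessing as mp
    import PyDSTool as dst

    def run_group(ic_values):
        # Build a fresh generator inside this worker process and integrate
        # once per initial condition in the group.
        DSargs = dst.args(name='decay')
        DSargs.varspecs = {'x': '-0.1 * x'}
        DSargs.tdata = [0, 20]
        DSargs.ics = {'x': ic_values[0]}
        DS = dst.Generator.Vode_ODEsystem(DSargs)
        finals = []
        for x0 in ic_values:
            DS.set(ics={'x': x0})
            traj = DS.compute('run')
            finals.append(traj.sample()['x'][-1])   # final value of x
        return finals

    if __name__ == '__main__':
        # 8 groups of 100 initial conditions, spread over 4 worker processes
        groups = [[0.1 * k + 0.01 * j for j in range(100)] for k in range(8)]
        pool = mp.Pool(processes=4)
        results = pool.map(run_group, groups)
        pool.close()
        pool.join()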

     
  • Matthew Egbert

    Matthew Egbert - 2013-10-10

    Hello Rob,

    Thanks for your response. Unfortunately the latest version that you posted did not resolve the problem.  I only did one test (more will follow) and it did seem to run for longer than before.  But just as soon as I got my hopes up…

    Fatal Python error: deallocating None
    Aborted (core dumped)

    Hit me up if you want me to run another test of some kind to help you debug what's going on.

    In terms of breaking the run into multiple pieces: is there a way to do that from within python (i.e. without restarting the python interpreter between the pieces)?

    Thanks again.
    All the best,
    Matthew

     
  • Rob Clewley

    Rob Clewley - 2013-10-11

    Sorry that didn't work out. The only tests that could be more helpful would be if you had installed Python with the debug symbols included, and then ran this through the new gdb with the python bindings. I don't presently have such a setup ready to do such a test. If you happen to know enough about using gdb and would be willing to help us track down this elusive problem then it would be a great benefit to our project. You can email me privately if you'd like to get involved.

    You can start new processes within python by using the os library to run arbitrary system commands (e.g. to run a python script), or the subprocess library to call python and wait on a return code.
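
    For example, a minimal sketch of the subprocess route (assuming a hypothetical
    worker script run_chunk.py that computes one piece of the run and saves its
    final state to disk for the next piece to load):

    import subprocess
    import sys

    for chunk in range(10):
        # Each piece runs in a fresh python process, so any memory leaked by
        # the integrator is released when that process exits.
        ret = subprocess.call([sys.executable, 'run_chunk.py', str(chunk)])
        if ret != 0:
            raise RuntimeError('chunk %d failed with return code %d' % (chunk, ret))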

     
  • Anonymous

    Anonymous - 2013-10-16

    I wonder if this is related…

    If I run the following code, memory usage keeps increasing indefinitely (I am losing about 2 MB/second).

    Could there be a memory leak even before calls to any of the integrators?

    I ran into this issue while trying to integrate a system, feeding in many (>10**4) initial conditions in a for loop.
    After about 1-2 hours of work my computer runs out of memory (~16 GB).

    #!/usr/bin/env python
    import PyDSTool as dst
    def code():
        """ Runs the simulation for a single day of growth. Returning all the dynamics. """
        icdict = {'x': 1}
        x_rhs = '-0.1 * x'
        vardict = {'x': x_rhs}
        DSargs = dst.args()                   # create an empty object instance of the args class, call it DSargs
        DSargs.name = 'SHM'               # name our model
        DSargs.ics = icdict               # assign the icdict to the ics attribute
        DSargs.tdata = [0, 20]            # declare how long we expect to integrate for
        DSargs.varspecs = vardict         # assign the vardict dictionary to the 'varspecs' attribute of DSargs
        DS = dst.Generator.Vode_ODEsystem(DSargs)
        del DS
        return
    def main():
        for i in range(10**8):
            code()
    if __name__ == '__main__':
        main()
    

    $ uname -a
    Linux marbles 3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

    In : import numpy

    In : numpy.__version__
    Out: '1.7.1'

    In : import scipy

    In : scipy.__version__
    Out: '0.11.0'

    This is true both for this link: https://dl.dropboxusercontent.com/u/2816627/PyDSTool-0.88.130726.zip
    and the version available for download from sourceforge.

     
  • Rob Clewley

    Rob Clewley - 2013-10-16

    Thanks for looking at this. I don't think it's related. The Vode generator wraps the scipy module and has nothing to do with our Dopri/Radau SWIG-interfaced code, that's for sure. On the python side, the initialization of the generator most likely commits memory to the pre-compiled Vode DLL that is loaded. Creating the DS object repeatedly will probably assign new memory to the DLL, and I believe that Python does not actually unload DLLs. So that, I suspect, is the source of that memory increase. There's nothing really I can do about that, but repeatedly creating the same Generator is obviously not recommended anyway.

    The segfault with Dopri comes from repeatedly calling into the integrator, not recreating its Generator wrapper (i.e. not reloading the DLL). Although, you'll probably get the same memory increase if you repeatedly recreate a Dopri-based Generator in your loop too…

    Let me know if my logic is off or you notice anything else.
    Thanks!

     
  • Alexandre Foncelle

    Hello,

    I was wondering if you have made any progress on this memory leak. Indeed, when I re-create a Radau-based Generator in my loop, memory starts growing.

    Thank you !

    Alexandre

     
  • Rob Clewley

    Rob Clewley - 2016-02-25

    I don't think there has been any progress on the full rewrite of the interfaces to Radau and Dopri, alas. I think the work has begun, though. As an interim measure, you could break a large trajectory calculation into smaller pieces and see whether the memory still grows when the integrator is restarted many times instead. If it still does, a small script could manage a loop that starts and kills a second python process to compute each part of the trajectory.

     
    • Evgenij Gr.

      Evgenij Gr. - 2017-06-23

      Excuse me, what way of restarting the integrator did you mean exactly? I've encountered the same memory leak problem with Dopri/Radau, and that approach could be extremely useful to me. Somehow the Generator's cleanupMemory method doesn't work at all (from the description I got the idea that it might solve the problem, but no success), and I can't recompile the RHS and recreate the generator in a loop within a single script (the first iteration is OK; at the second iteration something strange happens with the paths and compilation fails).

       

      Last edit: Evgenij Gr. 2017-06-23
      • Rob Clewley

        Rob Clewley - 2017-06-26

        Sorry for the difficulty with this. cleanupMemory was our early attempt to
        mitigate the problem but I agree it doesn't seem to have helped. You
        definitely can't recreate the generator object -- the DLL will remain the
        same under the hood.

        I simply meant that, in my experience, there are fewer segfaults if one
        splits the time domain into chunks and computes trajectories over those
        restricted domains consecutively. That way, the same blocks of memory will
        be overwritten with each new call to compute, rather than running out of
        memory with one giant allocation for the entire domain.
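
        As a rough illustration of this chunking (a sketch of mine, not code from
        the thread; it uses the self-contained Vode generator, but the same pattern
        applies to a Dopri/Radau generator), the last point of each chunk is fed
        back in as the initial condition of the next:

        import PyDSTool as dst

        DSargs = dst.args(name='chunked_decay')
        DSargs.varspecs = {'x': '-0.1 * x'}
        DSargs.ics = {'x': 1.0}
        DSargs.tdata = [0, 10]
        DS = dst.Generator.Vode_ODEsystem(DSargs)

        t0, chunk_len, n_chunks = 0.0, 10.0, 20
        ics = {'x': 1.0}
        for k in range(n_chunks):
            # Restrict the time domain to one chunk and restart from the end of
            # the previous chunk, so the same memory is reused on every call.
            DS.set(tdata=[t0, t0 + chunk_len], ics=ics)
            traj = DS.compute('chunk%d' % k)
            pts = traj.sample()
            ics = {'x': pts['x'][-1]}    # end point becomes the next IC
            t0 += chunk_len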

        You can avoid the segfaults altogether if you use a control script with
        os.popen to call separate python scripts, passing each one the final IC from
        the last domain chunk. It's a kludge but it's also not hard to put together,
        and we never got to the bottom of where the leak originates, despite having
        put some hours in with valgrind back in the day.


         
        • Maurizio De Pitta'

          Hi Rob,
          just in case it helps: in the meantime I have recoded both the Radau and Dopri integrators in C++ from Hairer's original sources. They do not use the PyDSTool interface and are standalone integrators to be wired to your own model, but they run smoothly without segfaults. So, back in the day, I came to believe that the segfault is somehow subtle and must be generated by a memory leak or some wrong array assignment in the python interface, or maybe in the fortran code itself.

          If you deem it helpful, I can share my code. Just ping me privately.

          Cheers,
          M

           
        • Evgenij Gr.

          Evgenij Gr. - 2017-06-29

          Thank you for your answer! Yeah, calling a separate script is the workaround that I currently have. I'll check your suggestion about splitting the domain. I don't know whether it will work, because mostly I integrate until the trajectory returns to a cross-section (I use a terminal event for that). Maybe the return time is not as tame as I expected and your suggestion can fix this.

          Can I ask about recreating the generator here? What I meant is that I delete the folders with the generated source code and library, recompile them from scratch and recreate the generator after that. This was my first workaround; I was going to do this at each iteration. Could this work (theoretically, at least), or is it a dead end?

          Update: I've checked the return time to the cross-section and it's pretty consistent, about 40 units of PyDSTool time. So what I am usually doing is computing a lot (>10000) of these short trajectories, each ending at some terminal event.

           

          Last edit: Evgenij Gr. 2017-06-29
          • Rob Clewley

            Rob Clewley - 2017-06-29

            Well, you can't unload and reload the same named module (i.e. the DLL
            created from the C code). So, you have to start at least a whole new
            python process to be able to make that visible after a delete/recreate.
            At the python level, merely recreating the generator is not sufficient
            to do a deep-level restart of the DLL and its memory allocation.

            You can time-domain split based on state-dependent events just as easily as
            by literal time. Just spawn subprocesses that run the next part until
            either a time limit is hit or the next event is hit, and so on.
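
            As a rough sketch of the event-based variant (my own example; it uses
            makeZeroCrossEvent as in the online tutorials, with an arbitrary
            threshold and the Vode generator for self-containment), a terminal
            event ends each piece when the trajectory reaches the cross-section,
            and the event point would seed the next piece or subprocess:

            import PyDSTool as dst

            # Terminal event: stop integration when x crosses 0.5 from above.
            ev = dst.makeZeroCrossEvent('x - 0.5', -1,
                                        {'name': 'cross_section',
                                         'eventtol': 1e-6,
                                         'term': True},
                                        varnames=['x'],
                                        targetlang='python')

            DSargs = dst.args(name='event_chunked')
            DSargs.varspecs = {'x': '-0.1 * x'}
            DSargs.ics = {'x': 1.0}
            DSargs.tdata = [0, 100]          # generous time limit per piece
            DSargs.events = [ev]
            DS = dst.Generator.Vode_ODEsystem(DSargs)

            traj = DS.compute('to_section')
            pts = traj.sample()
            # The last sampled point is (approximately) where the terminal event
            # fired; it would become the initial condition for the next piece.
            next_ic = {'x': pts['x'][-1]}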


             
            • Evgenij Gr.

              Evgenij Gr. - 2017-07-02

              Thanks for the reply! I had some hopes for that way of handling the problem, but okay, so much for that idea.

               
  • Maurizio De Pitta'

    Hi folks,
    I am also currently using Dopri and Radau within the framework of a minimization problem, which requires the computation of a potentially very large number of trajectories. Unfortunately the problem got to a halting point, insofar as after say 10 trajectories, memory resources saturate and error 137 (SIGKILL) is issued. So it does not exactly confirm this post, but I am afraid it adds support to the possibility that, somewhere, there is a memory leak in the integrators...

    M

     
