I have been working on integrating PyDSTool with the inspyred
genetic algorithm package, to allow me to use machine learning
algorithms to tune dynamical neural networks to solve particular
tasks. The Radau and Dopri integrators seem to work most of the time, but after generating many trajectories (an inconsistent, unpredictable number, sometimes ~1000, sometimes ~100000), the program simply dies with a segfault and gives no information about its origin.
I am using 64-bit Ubuntu, and have changed the -m32 flags to -m64 in Generators/Dopri_ODEsystem.py, Generators/Radau_ODEsystem.py and PyCont/ContClass.py. (Simply removing the flags didn't seem to work… but because of the unpredictable nature of the segfaults, I'm not sure whether there is any real difference between removing the -m32 flags and replacing them with -m64.)
When I run the run_all_tests.py script, the final output says:
Test scripts that failed:
test_variable_traj.py
poly_interp_test.py
PyCont_Hopfield.py
interp_dopri_test.py
PyCont_MorrisLecar_TypeI.py
PyCont_MorrisLecar_TypeII.py
PyCont_HindmarshRose.py
Summary:
Basic PyDSTool functions: appears to be broken on your system
Map related modules: appears to work on your system
VODE related modules: appears to be broken on your system
Symbolic differentiation module: appears to work on your system
Parameter estimation module: appears to work on your system
PyCont: appears to be broken on your system
Dopri ODE systems: appears to be broken on your system
Radau ODE systems: appears to work on your system
Parameter estimation module with external compilers: appears to work on your system
PyCont interface to AUTO: appears to be broken on your system
But when I rerun some of these tests, there is no problem. Specifically:
The following tests fail (not segfaulting, but because some libraries are not found). I think I saw that these are known bugs? I have not included the details of the output of these failures, but I am happy to provide them if that would help (just let me know):
test_variable_traj.py
poly_interp_test.py
PyCont_Hopfield.py
But the following test seems to pass without problem, generates a nice plot and exits without warnings or errors:
interp_dopri_test.py
The following tests all segfault when I run them with python, but not when I run them with ipython!
PyCont_MorrisLecar_TypeI.py
PyCont_MorrisLecar_TypeII.py
PyCont_HindmarshRose.py
They all seem to segfault at the same point, close to the end of the test. See the output at http://www.rhthm.com/pydstool, where I have indicated where each test fails.
Running my program with ipython or python successfully generates thousands (if not tens or hundreds of thousands) of trajectories before segfaulting or reporting a fatal Python error.
I have confirmed that it is not the evolutionary optimisation of
parameters that is causing the segfault, by replacing the system of
equations with a set of very simple equations that have no parameters
that are influenced by evolution.
My questions are:
1. What is likely to be causing the segfault? Is there a way to configure PyDSTool / the compilation of the C integrators to provide me with more information?
2. Does anything stand out to anyone as a possible cause for the different behaviour between python and ipython? Are there any library path variables that I should check?
3. Is there anything that needs to be done when using the same generator to create many, many trajectories? I have tried recreating the generator between the production of every trajectory, but that does not help. When I run top during execution, I have not noticed any memory leaks.
Basically, I have tried everything that I can think of to debug the error. Source code (a .tz archive) is available at the following URL; you will need to install the package inspyred, available through pip:
http://www.rhthm.com/pydstool
Any help you can provide would be much appreciated! And if you need any further information from me, please don't hesitate to ask.
Cheers,
Matthew
I am using the latest version of PyDSTool (PyDSTool-0.88.121202.zip). I have also tried PyDSTool-0.88.120504.zip, with no improvement.
In : PyDSTool.__version__
Out: '0.88'
In : numpy.__version__
Out: '1.7.1'
In : scipy.__version__
Out: '0.12.0'
Hi,
Sorry that this has been a problem. We have been aware of a slow memory leak causing problems for long integrations for some time, but have been unable to trace its origin (it's something either in the SWIG interfacing that is opaque to us, or otherwise inherent in the C behind the array libraries). We *might* have inadvertently fixed it in the minor update that is posted here, but we haven't had a way to test this for the kind of case you have:
https://dl.dropboxusercontent.com/u/2816627/PyDSTool-0.88.130726.zip
Please try it with this version and let us know here whether it has solved the problem. The other issue is that 64-bit systems have been troublesome - sometimes they've worked fine and sometimes not. We also have not yet been able to trace that, but someone is looking into it.
I don't know of any simple fix except to break up your long runs into multiple pieces. You can also split up many short runs (which can also trigger the problem) by starting different threads, each running PyDSTool for a different group of initial conditions. That, at least, avoids the problem.
Hello Rob,
Thanks for your response. Unfortunately the latest version that you posted did not resolve the problem. I only did one test (more will follow) and it did seem to run for longer than before. But just as soon as I got my hopes up…
Fatal Python error: deallocating None
Aborted (core dumped)
Hit me up if you want me to run another test of some kind to help you debug what's going on.
In terms of breaking the run into multiple pieces: is there a way to do that from within python (i.e. without restarting the python interpreter between the pieces)?
Thanks again.
All the best,
Matthew
Sorry that didn't work out. The only tests that could be more helpful would be if you had installed Python with the debug symbols included, and then ran this through the new gdb with the python bindings. I don't presently have such a setup ready to do such a test. If you happen to know enough about using gdb and would be willing to help us track down this elusive problem then it would be a great benefit to our project. You can email me privately if you'd like to get involved.
You can do this from within python by using the os library to spawn independent processes that run arbitrary system commands (e.g. to run a python script), or the subprocess library to call python and wait on a return code.
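For what it's worth, here is a minimal sketch of the subprocess variant (the worker script run_one_ic.py and its command-line interface are hypothetical, just for illustration; they are not part of PyDSTool):

import json
import subprocess
import sys

# Hypothetical worker script: it reads one initial condition from argv as JSON,
# builds its own PyDSTool generator, computes the trajectory, saves the result,
# and exits. Each run gets a fresh process, so any leaked memory is reclaimed
# by the OS when the worker exits.
WORKER = 'run_one_ic.py'

initial_conditions = [{'x': 0.1 * i} for i in range(100)]

for ic in initial_conditions:
    ret = subprocess.call([sys.executable, WORKER, json.dumps(ic)])
    if ret != 0:
        # a segfault in the worker shows up here as a nonzero return code
        print('worker failed for IC %r (return code %d)' % (ic, ret))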
Anonymous - 2013-10-16
I wonder if this is related…
If I run the following code, the memory usage keeps increasing indefinitely (I am losing ~2 MB/second).
Could there be a memory leak even before calls to any of the integrators?
I ran into this issue while trying to integrate a system, feeding in many (>10**4) initial conditions in a for loop.
After about 1-2 hours of work my computer runs out of memory (~16 GB).
#!/usr/bin/env python
import PyDSTool as dst

def code():
    """Runs the simulation for a single day of growth, returning all the dynamics."""
    icdict = {'x': 1}
    x_rhs = '-0.1 * x'
    vardict = {'x': x_rhs}
    DSargs = dst.args()            # create an empty object instance of the args class, call it DSargs
    DSargs.name = 'SHM'            # name our model
    DSargs.ics = icdict            # assign the icdict to the ics attribute
    DSargs.tdata = [0, 20]         # declare how long we expect to integrate for
    DSargs.varspecs = vardict      # assign the vardict dictionary to the 'varspecs' attribute of DSargs
    DS = dst.Generator.Vode_ODEsystem(DSargs)
    del DS
    return

def main():
    for i in range(10**8):
        code()

if __name__ == '__main__':
    main()
$ uname -a
Linux marbles 3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
In : numpy.__version__
Out: '1.7.1'
In : scipy.__version__
Out: '0.11.0'
This is true both for this link: https://dl.dropboxusercontent.com/u/2816627/PyDSTool-0.88.130726.zip
and for the version available for download from sourceforge.
Thanks for looking at this. I don't think it's related. The Vode generator wraps the scipy module and has nothing to do with our Dopri/Radau SWIG-interfaced code, that's for sure. As for the python side: initializing the generator most likely commits memory to the pre-compiled Vode DLL that gets loaded. Creating the DS object repeatedly will probably assign new memory to the DLL, and I believe that Python does not actually unload DLLs. So that, I suspect, is the source of that memory increase. There's nothing I can really do about that, but repeatedly creating the same Generator is obviously not recommended anyway.
The segfault with Dopri comes from repeatedly calling into the integrator, not recreating its Generator wrapper (i.e. not reloading the DLL). Although, you'll probably get the same memory increase if you repeatedly recreate a Dopri-based Generator in your loop too…
Let me know if my logic is off or you notice anything else.
Thanks!
Hello,
I was wondering whether you have made any progress on this memory leak. Indeed, when I re-create a Radau-based Generator in my loop, memory starts growing.
Thank you!
Alexandre
I don't think there has been any progress on the full rewrite of the interfaces to Radau and Dopri, alas. I think the work has begun, though. As an interim measure, you could break a large trajectory calculation into smaller pieces and see whether the memory still grows when the integrator is restarted many times instead. If it still does, a small script could manage a loop that starts and kills a second python process to compute each part of the trajectory.
Excuse me, what way of restarting the integrator did you mean exactly? I've encountered the same memory leak problem with Dopri/Radau, and such a workaround could be extremely useful to me. Somehow the Generator's cleanupMemory method doesn't work at all (from its description I got the idea that it might solve the problem, but no success), and I can't recompile the RHS and recreate the generator in a loop within a single script (the first iteration is OK; at the second iteration something strange happens with the paths and compilation fails).
Last edit: Evgenij Gr. 2017-06-23
Sorry for the difficulty with this. cleanupMemory was our early attempt to mitigate the problem, but I agree it doesn't seem to have helped. You definitely can't recreate the generator object -- the DLL will remain the same under the hood.
I simply meant that, in my experience, there are fewer segfaults if one splits the time domain into chunks and computes trajectories over those restricted domains consecutively. That way, the same blocks of memory will be overwritten with each new call to compute, rather than running out of memory with one giant allocation for the entire domain.
You can avoid the segfaults altogether if you use a control script with os.popen to call separate python scripts, passing along the IC from the end of each consecutive domain chunk. It's a kludge, but it's also not hard to put together, and we never got to the bottom of where the leak originates, despite having put some hours in with valgrind back in the day.
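To illustrate the chunking idea, here is a minimal sketch (the model, variable names and chunk size are made up; Vode is used only because it needs no compiler, and the same set/compute pattern applies to a Dopri or Radau generator):

import PyDSTool as dst

# toy decaying system used purely for illustration
DSargs = dst.args(name='decay')
DSargs.varspecs = {'x': '-0.1 * x'}
DSargs.ics = {'x': 1.0}
DSargs.tdata = [0, 10]
ode = dst.Generator.Vode_ODEsystem(DSargs)

t_end = 1000.0
chunk = 10.0          # integrate 10 time units per call instead of 1000 at once
t0 = 0.0
ics = {'x': 1.0}
pieces = []

while t0 < t_end:
    ode.set(tdata=[t0, t0 + chunk], ics=ics)
    traj = ode.compute('piece_%d' % int(t0))
    pieces.append(traj)
    end_point = traj(t0 + chunk)      # end state of this chunk...
    ics = {'x': end_point['x']}       # ...becomes the IC of the next one
    t0 += chunk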
Hi Rob,
just in case it helps: in the meantime I have recoded both the Radau and Dopri integrators in C++ from Hairer's original source. They do not use the PyDSTool interface and are standalone integrators for use with your own model, but they run smoothly without segfaulting. So, back in the day, I concluded that the segfault is somehow subtle and must be generated by a memory leak or a wrong array assignment in the python interface, or maybe in the fortran code itself.
If you think it would be helpful, I can share my code. Just ping me privately.
Cheers,
M
Thank you for your answer! Yeah, calling a separate script is the workaround I currently have. I'll check your suggestion about splitting the domain. I don't know whether it will work, because mostly I integrate until the trajectory returns to a cross-section (I use a terminal event for that). Maybe the return time is not as tame as I expected, and your suggestion can fix this.
Can I ask about recreating the generator here? What I meant is that I delete the folders with the generated source code and library, recompile them from scratch, and recreate the generator after that. This was my first workaround; I was going to do this at each iteration. Could this work (theoretically, at least), or is it a dead end?
Update: I've checked the return time to the cross-section and it's pretty consistent, about 40 units of PyDSTool time. So what I am usually doing is computing a lot (>10000) of these short trajectories, each ended by a terminal event.
Last edit: Evgenij Gr. 2017-06-29
Well, you can't unload and reload the same named module (i.e. the DLL created from the C code). So you have to start at least a whole new python process to make the delete/recreate actually take effect. At the python level, merely recreating the generator is not sufficient to do a deep-level restart of the DLL and its memory allocation.
You can split the time domain based on state-dependent events just as easily as by literal time. Just spawn subprocesses that run the next part until either a time limit is hit or the next event is hit, and so on.
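As a rough illustration of that control-loop pattern (purely a sketch: integrate_piece.py, its JSON argument, and the state file are hypothetical and not part of PyDSTool; the worker is assumed to build its own generator, integrate until its terminal event or a time limit, and dump the final state before exiting):

import json
import subprocess
import sys

STATE_FILE = 'state.json'       # hypothetical handoff file written by the worker
ic = {'x': 1.0}                 # initial condition for the first piece

for piece in range(10000):
    ret = subprocess.call([sys.executable, 'integrate_piece.py', json.dumps(ic)])
    if ret != 0:
        print('piece %d failed with return code %d' % (piece, ret))
        break
    with open(STATE_FILE) as f:
        ic = json.load(f)       # end state of this piece seeds the next one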
Thanks for the reply! I had some hopes for that way of handling the problem, but okay, so much for that idea.
Hi folks,
I am also currently using Dopri and Radau within the framework of a minimization problem, which requires computation of a potentially very large number of trajectories. Unfortunately the work has reached a halting point: after say 10 trajectories, memory is saturated and error 137 (SIGKILL) is issued. So it does not confirm this post exactly, but I am afraid it adds support to the possibility that, somewhere, there is a memory leak in the integrators...
M