I've been writing a program to genetically evolve Robocode bots; this requires conducting many series of battles over a long period of time. During this I seem to have discovered an issue with multiple instances of RobocodeEngine not being properly independent.
My first tip-off was that every time I created a new instance and added a listener to it, I'd get more and more results; each new instance somehow got created with all the previous listeners attached. I worked around this by removing each listener immediately after finishing with it, even though the garbage collector should have taken care of destroying the whole RobocodeEngine instance.
Now I'm running into problems with the performance deteriorating; each set of battles appears to take longer than the last in a roughly linear fashion. This seems like it is probably linked to the above issue, with something hanging around from each instance and bogging things down.
Is this a bug, or have I missed something? As far as I'm aware these instances should be entirely separate and unable to influence each other, which is clearly not the case.
Thanks,
Mike
This sounds like a bug. The intention is for them to be independent.
(Another) problem may be the Java security (untrusted code execution) sandbox, but that is just a guess.
Another thing to consider for parallel processing is skipped turns, because the engine is not designed to measure robot-turn time in parallel.
Over the years there have been lots of such issues with the RobocodeEngine and in Robocode in general, and lots of memory leaks have been removed. Perhaps new ones are emerging, or it could be some other problem. I will definitely have a closer look into this issue soon.
@Mike: Do you have an easy way to reproduce this behavior? Perhaps some code / files I could get to reproduce this issue when debugging. It will make it easier for me to isolate the problem - and fix it. :-)
The code is all up on github here: https://github.com/MikeWorth/RoboNucleicAcid
I've run it for a few populations and, as the graph shows, the time to run has a strong linear trend. The two lines are for an 'a' population and a 'b' population; a was run for 1000 generations, then b for 1000, then back to a, and so on. When unfolded into the order in which they were run, the lines join up into one roughly straight line. The fact that two different populations show the same trend suggests that it isn't caused by the ongoing evolution in my code (although that isn't entirely out of the question).
This particular issue can be seen in RobotLeague; line 54 has the removeBattleListener that shouldn't be needed.
Parallel processing is not currently part of my system - possibly I'll write it eventually, but at the moment the old RobocodeEngine can be deallocated before the new one is created. I tried triggering this by setting it to null, without success; not a great solution anyway.
Thanks,
Mike
Nice work, and thank you for sharing your code on GitHub. This will be a great way to reproduce the issue, and I am sure there are other people out there who will benefit from your work, and/or even give some good suggestions for how it can be improved etc. :-)
I will have a look at this ASAP. Stay tuned. It might take a while to figure out what is going on.
Hi Mike,
I have now tracked down the memory leak with RobocodeEngine. I made a new Alpha version of Robocode, where the leak has been removed and where you don't need to call removeBattleListener() anymore. The problem was dangling listeners that were not automatically removed when finalizing RobocodeEngine instances. Now these are always removed. :-)
You can try out the 1.8.0.1 Alpha 1 here:
http://robocode.sourceforge.net/files/robocode-1.8.0.1-Alpha-1-setup.jar
Please provide me with feedback if this version fixes the problem you have seen. Thanks in advance! :-)
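To illustrate the kind of leak described above: if listeners are held in state shared by all engine instances, each new instance appears to start with every earlier listener attached, and each battle delivers one more callback than the last. The sketch below is a self-contained model of that symptom only. `LeakyEngine`, `Listener` and `callbacksPerRun` are hypothetical names invented for illustration, and the static list is an assumption, not Robocode's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the symptom; this is NOT Robocode's real code.
final class ListenerLeakDemo {

    interface Listener {
        void onBattleCompleted(String result);
    }

    static final class LeakyEngine {
        // Assumption for illustration: the listener list is static,
        // so it is shared by every LeakyEngine instance.
        private static final List<Listener> LISTENERS = new ArrayList<>();

        void addListener(Listener listener) {
            LISTENERS.add(listener);
        }

        void removeListener(Listener listener) {
            LISTENERS.remove(listener);
        }

        void runBattle() {
            // Notify every registered listener, including stale ones.
            for (Listener listener : new ArrayList<>(LISTENERS)) {
                listener.onBattleCompleted("battle finished");
            }
        }
    }

    // Runs one battle per fresh engine and returns how many callbacks
    // each battle delivered.
    static int[] callbacksPerRun(int runs, boolean removeAfterUse) {
        LeakyEngine.LISTENERS.clear(); // clean slate for the demo
        int[] total = {0};
        int[] perRun = new int[runs];
        for (int i = 0; i < runs; i++) {
            LeakyEngine engine = new LeakyEngine();
            Listener listener = result -> total[0]++;
            engine.addListener(listener);
            int before = total[0];
            engine.runBattle();
            perRun[i] = total[0] - before;
            if (removeAfterUse) {
                engine.removeListener(listener); // the workaround
            }
        }
        return perRun;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(callbacksPerRun(3, false))); // prints [1, 2, 3]
        System.out.println(Arrays.toString(callbacksPerRun(3, true)));  // prints [1, 1, 1]
    }
}
```

Removing the listener after each battle, as in the RobotLeague workaround, keeps the callback count flat in this model.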
Thanks for taking the time to look at this; what you've done seems to have fixed the issue with listeners hanging around improperly. However, the speed still declines in a roughly linear way.
From where I'm sitting it looks like there is something static that must be accumulating within the RobocodeEngine class, but I'm not sure what. One crude way around this would be to destroy and re-initialise the whole class; is this possible in Java at runtime?
Thanks for testing the Alpha 1 version. At least we solved a fraction of the problem by removing a memory leak. :-)
I have not seen the CPU speed decreasing yet on any of my machines and systems, so I need to reproduce this scenario first. Perhaps the problem only occurs with a specific system or machine architecture. I have tested this under Windows 8 64-bit with an Intel Core i7. Perhaps this system has "too many" cores to see this problem easily.
I will try to dig deeper into the speed problem. I have no clue what is causing it, so it will require much better CPU monitoring. A better alternative to re-initializing a class could be to use reflection to sweep out static fields etc. Robocode already does such "magic" with robots "forgetting" to clean static variables etc. But I guess it could be a single object somewhere that accumulates over time, like you propose. :-) If so, we only need to clean this single object/field.
Last edit: Flemming N. Larsen 2013-02-20
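As a rough idea of what "using reflection to sweep out static fields" can look like in plain Java, here is a minimal sketch. It is not Robocode's own cleanup code; `StaticFieldSweeper`, `Accumulator` and `sweep` are hypothetical names, and the policy of nulling every non-final reference field is an assumption chosen to keep the example short.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class StaticFieldSweeper {

    // Hypothetical class that accumulates state in a static field.
    static final class Accumulator {
        static List<String> history = new ArrayList<>();
        static int battlesRun = 0;                 // primitive: not swept
        static final String NAME = "accumulator";  // final: not swept
    }

    // Sets every non-final, non-primitive static field declared directly
    // on the given class to null and returns how many fields were cleared.
    static int sweep(Class<?> type) {
        int cleared = 0;
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            if (!Modifier.isStatic(modifiers)
                    || Modifier.isFinal(modifiers)
                    || field.isSynthetic()
                    || field.getType().isPrimitive()) {
                continue;
            }
            try {
                field.setAccessible(true);
                field.set(null, null); // static field: the target object is null
                cleared++;
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot clear " + field, e);
            }
        }
        return cleared;
    }

    public static void main(String[] args) {
        Accumulator.history.add("battle 1");
        Accumulator.battlesRun = 1;
        System.out.println(sweep(Accumulator.class));     // prints 1
        System.out.println(Accumulator.history == null);  // prints true
    }
}
```

Note that nulling a field only releases what it referenced; code that later expects the field to be initialised will fail, so a targeted fix for the one accumulating object is usually the safer route.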
Hi Mike,
I have now created a bug report for it here https://sourceforge.net/p/robocode/bugs/349/, as this issue is a bug.
I'll keep an eye on that, and post any further progress I make there.
Thanks a lot
Actually, I am monitoring the performance of rna.RobotBreeder right now in jvisualvm, and the memory usage, thread usage and CPU usage seem extremely stable to me - even after several hours.
Can you tell me more about your system (OS, CPU type)? E.g. I am running on Windows 8 Pro x64 with an Intel Core i7.
Perhaps the problem you see is related to the system Robocode is running on. Are you using the Oracle/Sun JDK or OpenJDK? If you are using the OpenJDK, I would like you to test it on the Oracle JDK (version 7), as this is much more stable than the OpenJDK.
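For anyone wanting to gather comparable numbers without attaching jvisualvm, the following sketch logs elapsed time, used heap and live thread count per generation using only the standard `java.lang.management` API, and prints the OS/JDK details asked about above. `JvmSnapshot`, `timeAndSample` and the empty placeholder workload are hypothetical; in a real run the `Runnable` would wrap one set of battles.

```java
import java.lang.management.ManagementFactory;

final class JvmSnapshot {
    final long heapUsedBytes;
    final int liveThreads;
    final long elapsedNanos;

    JvmSnapshot(long heapUsedBytes, int liveThreads, long elapsedNanos) {
        this.heapUsedBytes = heapUsedBytes;
        this.liveThreads = liveThreads;
        this.elapsedNanos = elapsedNanos;
    }

    // Runs the given work, then samples heap usage and thread count.
    static JvmSnapshot timeAndSample(Runnable work) {
        long start = System.nanoTime();
        work.run();
        long elapsed = System.nanoTime() - start;
        long heapUsed = ManagementFactory.getMemoryMXBean()
                .getHeapMemoryUsage().getUsed();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        return new JvmSnapshot(heapUsed, threads, elapsed);
    }

    // One CSV row: generation, elapsed ms, heap used in bytes, live threads.
    String toCsv(int generation) {
        return generation + "," + (elapsedNanos / 1_000_000) + ","
                + heapUsedBytes + "," + liveThreads;
    }

    // OS, CPU and JDK details taken from standard system properties.
    static String describeSystem() {
        return System.getProperty("os.name") + " "
                + System.getProperty("os.arch") + ", "
                + Runtime.getRuntime().availableProcessors() + " cores, "
                + System.getProperty("java.vendor") + " "
                + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println(describeSystem());
        System.out.println("generation,elapsed_ms,heap_used_bytes,live_threads");
        for (int generation = 0; generation < 5; generation++) {
            // Placeholder workload; replace with one set of battles.
            JvmSnapshot snapshot = timeAndSample(() -> { });
            System.out.println(snapshot.toCsv(generation));
        }
    }
}
```

A steadily rising thread count or heap figure alongside the rising elapsed time would point at something accumulating inside the JVM rather than at the host system.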
@MikeWorth: I would like you to test my newest 1.8.1.0 Alpha 4 version here:
http://robocode.sourceforge.net/files/robocode-1.8.1.0-Alpha-4-setup.jar
This version contains two independent fixes, and I have done lots of testing for battles running for a really looooong time.
If you still have problems, then please provide me with more information about your system.
If everything works perfectly, then don't hesitate to write back so I can close the issue and mark it as solved. :-)
Thanks in advance!