#231 Lockup on start if too many bots in robots dir (cont'd)

1.7.2
closed
RoboRumble (47)
5
2012-09-15
2009-09-26
Julian Kent
No

The previous bug was closed so I was unable to comment. This has not been resolved for me in 1.7.4. Perhaps this stack trace will help:

file:/home/jk/Desktop/robocode1.7.4/robots/darkcanuck.Gaff_1.32.jar is probably corrupted (java.io.FileNotFoundException /home/jk/Desktop/robocode1.7.4/robots/darkcanuck.Gaff_1.32.jar (Too many open files))
java.lang.NullPointerException
at net.sf.robocode.repository.root.ClassPathRoot.visitDirectory(ClassPathRoot.java:82)
at net.sf.robocode.repository.root.ClassPathRoot.update(ClassPathRoot.java:51)
at net.sf.robocode.repository.root.handlers.ClassPathHandler.visitDirectory(ClassPathHandler.java:36)
at net.sf.robocode.repository.root.handlers.RootHandler.visitDirectories(RootHandler.java:35)
at net.sf.robocode.repository.Database.update(Database.java:46)
at net.sf.robocode.repository.RepositoryManager.refresh(RepositoryManager.java:97)
at net.sf.robocode.repository.RepositoryManager.reload(RepositoryManager.java:121)
at net.sf.robocode.ui.WindowManager.showSplashScreen(WindowManager.java:358)
at net.sf.robocode.core.RobocodeMain.run(RobocodeMain.java:135)
at java.lang.Thread.run(Thread.java:619)

(At this point robocode freezes with the init screen still showing)

This situation may be unique to me as I have over 1300 unique jar files in my robots directory. Sorry for not responding to the last bug, I forgot to subscribe to the bug for email updates.

Discussion

  • Nat Pavasant

    Nat Pavasant - 2009-09-26

    You can always contact me, Flemming or Pavel to re-open it for you.

    I have no problem with 1191 jar files in my test, which already covers more than enough for RoboRumble. But we should investigate this further.

     
  • Flemming N. Larsen

    It is okay to open a new one. Some of the problem might have been solved with the old defect, but at least not all of it.

    Minifly, I should like you to attach the participant list(s) you have with all the robots. This way we could (hopefully) download all the robots and reproduce the problem.

    Thanks in advance!

     
  • Flemming N. Larsen

    Hi Minifly and Nat,

    I should like both of you to try out a fix I made on Robocode with closing JarFile file handles. I should like to know if it breaks anything and/or has a performance impact or not.

    You can download the patched version of 1.7.1.4 with the fix from here:

    http://robocode.sf.net/files/robocode-1.7.1.4_fix1-setup.jar

     
  • Nat Pavasant

    Nat Pavasant - 2009-09-28

    I don't have a problem with the fix1 version, and I don't have one on 1.7.1.4 either. But the fix comes with a noticeable slowdown.

     
  • Julian Kent

    Julian Kent - 2009-09-28

    Fnl, this hasn't fixed the problem. Perhaps it would be possible to build a 'debug' version which provides a better stack trace, instead of just printing that the file is "probably corrupted"?

    Unfortunately I cannot provide a participants list for all these bots, as many are old versions of bots which have been retired. I did not notice any performance difference between this and the regular 1.7.1.4.

     
  • Flemming N. Larsen

    Hi Minifly,

    The stack trace is already there, and IS telling the problem. Robocode concludes that a jar file is corrupted due to the FileNotFoundException with "Too many open files". The problem is that your Java VM has run out of file descriptors. The limit is typically 1024 per process under Unix/Linux, and descriptors are used by both physical files and sockets.

    This is not a typical problem under Windows, so I guess you are running under Linux. If this is true, then you should raise the ulimit for open file descriptors to e.g. 10240 like this:

    ulimit -n 65535

    More info here: http://www.faqs.org/docs/securing/x4733.html

    There is no easy way to work around this problem inside Robocode, but I will try to do it anyways.
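To see how close a JVM is to the descriptor limit described above, the counts can be read from inside the process. This is a minimal sketch, not part of Robocode: the class name `FdUsage` is hypothetical, and it assumes a HotSpot/OpenJDK-style JVM that exposes `com.sun.management.UnixOperatingSystemMXBean`. On other platforms the method simply returns null.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdUsage {

    /** Returns {open, max} descriptor counts, or null if this JVM does not expose them. */
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(), // descriptors currently held by this process
                unix.getMaxFileDescriptorCount()   // the per-process limit (what ulimit -n controls)
            };
        }
        return null; // e.g. Windows, or a JVM without this extension
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        if (counts == null) {
            System.out.println("Descriptor counts are not available on this JVM");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```

Printing these numbers while the robot database is being rebuilt would show whether the open count climbs with each jar file that is scanned.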

    Nat: How much slower is the fix1 compared to the version without? Since the fix does not work for Minifly, I probably won't commit the fix to SVN.

     
  • Flemming N. Larsen

    Sorry, if the ulimit should be set to 10240 (10 KiB), then the command is like this:

    ulimit -n 10240

     
  • Nat Pavasant

    Nat Pavasant - 2009-09-30

    The total time to rebuild the robot database (when I delete robot.database myself) has doubled.

    Not to be rude, but in the old bug report he already said that he uses OpenJDK 6 under Linux (Ubuntu, IIRC), and the stack trace above shows the path '/home/jk/Desktop/robocode1.7.4/robots/darkcanuck.Gaff_1.32.jar'.

     
  • Julian Kent

    Julian Kent - 2009-10-01

    Unfortunately ulimit isn't something that can be changed at runtime, only at startup, because there are always files that are already open.
    Also, it seems to me that for Robocode to be holding onto file descriptors even when it is not using them is a design flaw. Surely it should load the data and release the OS resources before attempting to load more data? Just because Windows allows us to open more file sockets doesn't mean it is good programming practice to do so. I can't think of any practical need to open more than 1024 sockets simultaneously from a single process.

    A custom file descriptor which is constructed with the data that may need to be retrieved from the Java file descriptor can be made, and these can be kept in memory rather than the Java ones, so that system resources are freed. To me this seems to be the simplest solution, which would require the least restructuring.
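The idea in the post above (copy what is needed out of the jar, then release the OS handle) can be sketched as follows. This is only an illustration under assumptions: `JarSnapshot` is a hypothetical class, not anything in the Robocode sources, and it records just the entry names, whereas a real repository would presumably keep more.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Plain in-memory record of a jar's contents; holds no file handle. */
public class JarSnapshot {
    final String path;
    final List<String> entryNames;

    private JarSnapshot(String path, List<String> entryNames) {
        this.path = path;
        this.entryNames = entryNames;
    }

    /** Opens the jar, copies the entry names, and closes the jar before returning. */
    static JarSnapshot read(File jar) throws IOException {
        // try-with-resources closes the JarFile (and its descriptor) even on error
        try (JarFile jarFile = new JarFile(jar)) {
            List<String> names = new ArrayList<>();
            for (Enumeration<JarEntry> e = jarFile.entries(); e.hasMoreElements();) {
                names.add(e.nextElement().getName());
            }
            return new JarSnapshot(jar.getPath(), Collections.unmodifiableList(names));
        }
    }
}
```

With this shape, scanning 1300 jars needs only one descriptor at a time, because each `JarFile` is closed before the next one is opened.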

     
  • Flemming N. Larsen

    In short, you are right Minifly. Changing the ulimit (e.g. ulimit -S -n 10240) should be seen as a work-around for now.

    Robocode makes use of the URLConnection cache, which requires the file & socket handles to be open in order to provide fast access to the robots. This is one of the reasons why we are able to rebuild the robot database much faster now than in older versions of Robocode.

    It will require a new design in order to avoid this issue. I am thinking of creating a fix (perhaps only temporary) so that e.g. only up to 500 file handles will be put in the cache.
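A cap such as the 500 handles mentioned above could be sketched with a small least-recently-used map that closes whatever it evicts. This is a hedged illustration only: `BoundedHandleCache` is a hypothetical name, and the thread does not say how the temporary fix was actually implemented.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** LRU cache that never keeps more than maxOpen handles and closes evicted ones. */
public class BoundedHandleCache<K, V extends Closeable> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxOpen;

    public BoundedHandleCache(int maxOpen) {
        super(16, 0.75f, true); // true = access order, so the eldest entry is the least recently used
        this.maxOpen = maxOpen;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() <= maxOpen) {
            return false; // still under the cap, keep everything
        }
        try {
            eldest.getValue().close(); // release the OS handle before dropping the entry
        } catch (IOException ignored) {
            // nothing useful can be done if closing fails; the entry is dropped anyway
        }
        return true;
    }
}
```

A caller would create it as `new BoundedHandleCache<String, JarFile>(500)` and re-open a jar on a cache miss. The class is not thread-safe, so concurrent use would need external synchronization.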

     
  • Flemming N. Larsen

    Hi Minifly,

    I have made a new version of Robocode, where the cache is disabled for UNIX based systems including Linux and Mac OS X. With this version I made a temporary fix, but the real version coming later will require a redesign of the URL/file caching.

    You can download the new version (Robocode 1.7.1.5 Alpha-1) from here:

    http://robocode.sf.net/files/robocode-1.7.1.5-Alpha-1-setup.jar

    I should like to know how big an impact this has on the loading performance, if it has an impact at all.

     
  • Julian Kent

    Julian Kent - 2009-10-11

    Fnl, this still doesn't fix the problem. It still crashes after 1024 bots are processed. Additionally, it loads MUCH slower than the previous version. Could you perhaps give me a link to the code where the error is occurring? Perhaps I could provide additional insight.

     
  • Flemming N. Larsen

    Hi Minifly and Nat,

    I finally managed to fix this annoying bug. I got a lot of insights with this article:

    "Pitfalls of executing Java code from remote JAR files"
    http://www.szegedi.org/articles/remotejars.html

    Now the cached JarFiles are closed after loading the robot into the robot.database, and the temporary jar_cache entries are closed as well, meaning that the number of open file sockets/handles is kept to a bare minimum.

    The good thing about my fix is that building up the robot.database is very close to the fast performance we are used to. :-)
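The pitfall discussed in the linked article is that a `jar:` URL connection with caching enabled hands out a shared `JarFile` that stays open. A minimal sketch of the uncached pattern follows, assuming a local jar file. `UncachedJarRead` is a hypothetical class and this is not the actual Robocode fix, whose sources are not shown in this thread.

```java
import java.io.File;
import java.io.IOException;
import java.net.JarURLConnection;
import java.net.URI;
import java.net.URL;
import java.util.jar.JarFile;

public class UncachedJarRead {

    /** Counts the entries of a jar through a jar: URL without leaving a handle open. */
    static int countEntries(File jar) throws IOException {
        // Builds e.g. jar:file:/path/to/bot.jar!/ which refers to the jar as a whole
        URL url = URI.create("jar:" + jar.toURI() + "!/").toURL();
        JarURLConnection connection = (JarURLConnection) url.openConnection();
        connection.setUseCaches(false); // ask for a private JarFile instead of the shared cached one
        JarFile jarFile = connection.getJarFile();
        try {
            return jarFile.size();
        } finally {
            jarFile.close(); // with caching off, closing the JarFile is the caller's job
        }
    }
}
```

Closing a `JarFile` that came from the cache would break other users of the same URL, which is why the sketch disables caching before asking for it.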

    I should like you to test the newest 1.7.1.6 Alpha containing the fix, which you can download from here:

    http://robocode.sf.net/files/robocode-1.7.1.6-Alpha-setup.jar

    I have not committed the sources yet. I will do so if the fix works.

     
  • Julian Kent

    Julian Kent - 2009-12-08

    Hi fnl,
    Unfortunately I'm not at my home computer right now, and won't have access to it until late January. However, from what you describe it is now fully fixed. If anybody else runs linux and would test this vs. earlier 1.7.1.x (I believe Rednaxela runs linux) then that would be great. Otherwise I can only test this in late January.

    Thanks so much for your continued work on robocode =)

     
  • Flemming N. Larsen

    Okay. I will test this on Ubuntu myself in the near future. But I should still like you to test it late January, when you get access again. :-)

     
  • Flemming N. Larsen

    Please verify that this has been fixed properly. :-)

     
  • Flemming N. Larsen

    Today, I have tested the fix on Ubuntu 9.10 (64-bit), and it works like a charm with 1200 .jar files. Hence, I consider this bug to be solved.

     
  • Nat Pavasant

    Nat Pavasant - 2009-12-11

    So far I have noticed no performance loss in the release. Actually I think it is a little faster, but that may be because my hard disk load is probably lower now than when I last tested.

     
  • Julian Kent

    Julian Kent - 2010-01-09

    Hi Fnl, I have returned sooner than expected, and just tested the new 1.7.1.6. It works perfectly. Thank you for resolving this =)

     
  • Flemming N. Larsen

    Thank you for verifying this fix. It was definitely one of the hard bugs to fix, so I am really happy that it does not exist anymore!

    I will close this bug now. :-)

     
  • Julian Kent

    Julian Kent - 2010-05-29

    Hi Fnl
    This was fixed for 1.7.2beta, but in the final release it is broken again :-(

     
  • Flemming N. Larsen

    Thanks for reporting this issue Minifly.

    I have been very careful not to break this, but for some reason one of the recent fixes for the other bugs has broken it. I am sorry.

    I will fix this with the coming version 1.7.2.1 Beta.

     
  • Flemming N. Larsen

    I have now made more fixes for this bug, and tested them on both Windows and especially Ubuntu 10.04 LTS.

    It seems to work now.

    You can download 1.7.2.1 Alpha 1, which contains this fix:
    http://robocode.sf.net/files/robocode-1.7.2.1-Alpha-1-setup.jar

    Please let me know if it works for you now. :-)

     
  • Julian Kent

    Julian Kent - 2010-06-07

    That works perfectly =) Thank you Fnl

     