Parser Performance issue

Help
2006-06-30
2013-04-27
  • Herve Menage

    Herve Menage - 2006-06-30

    Hello,

    We are experiencing performance issues with the parser. Could you please help?

    We are running version v1_6_20050925

    Currently, the largest page we parse is 600 KB.

    We noticed severe performance problems and out-of-memory errors.

    By analysing the JVM GC logs and the heap dump files, we noticed a large number of "NodeList" objects. The application requests up to 6 MB of memory to allocate each of them.

    We are using the parser in servlets. We extended the Parser class to add support for custom tags, and we open a URLConnection to load the source HTML. We instantiate the parser once per page, and we mainly use the getAttribute() and setAttribute() methods.

    Thank you.
    Herve

     
    • Derrick Oswald

      Derrick Oswald - 2006-07-01

      NodeList objects are the lists of child nodes within each CompositeTag. It's unlikely that each of them takes 6 MB, as they only contain references to the parsed nodes within a tag. Maybe the top-level NodeList object (usually the HTML node) references 6 MB worth of Java objects indirectly through other NodeList objects. That is the nature of the parse, being a nested representation of the linear HTML stream.
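
      To illustrate the difference between what a list holds directly and what it keeps reachable, here is a minimal sketch. It does not use the HTML Parser classes: SketchNode, directChildren(), reachableChars() and filler() are hypothetical names made up for this example, and character counts stand in for heap bytes. The top-level node holds a single reference, yet everything below it is reachable through the nested lists, which is roughly how a heap dump tool can attribute a large retained size to one small list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for parsed nodes; NOT the HTML Parser library's classes.
class RetainedSizeSketch {

    /** A node with a text payload and a list of references to its children. */
    static final class SketchNode {
        final String text;
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(String text) {
            this.text = text;
        }

        /** Appends a child reference and returns this node for chaining. */
        SketchNode add(SketchNode child) {
            children.add(child);
            return this;
        }
    }

    /** Builds a string of n characters to simulate page content. */
    static String filler(int n) {
        return new String(new char[n]);
    }

    /** Shallow view: how many references this node's own list holds. */
    static int directChildren(SketchNode node) {
        return node.children.size();
    }

    /** Deep view: total characters reachable through the nested lists. */
    static long reachableChars(SketchNode node) {
        long total = node.text.length();
        for (SketchNode child : node.children) {
            total += reachableChars(child);
        }
        return total;
    }

    public static void main(String[] args) {
        SketchNode body = new SketchNode("body")
            .add(new SketchNode("p").add(new SketchNode(filler(1000))))
            .add(new SketchNode("p").add(new SketchNode(filler(2000))));
        SketchNode html = new SketchNode("html").add(body);

        System.out.println("direct children of html: " + directChildren(html));
        System.out.println("reachable characters:    " + reachableChars(html));
    }
}
```

      In a heap dump the same distinction usually appears as shallow size versus retained size, so it may help to check which of the two figures the 6 MB refers to.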

       
