Hello,
We are experiencing a performance issue with the parser; could you please help?
We are running version v1_6_20050925.
Currently, the largest page we parse is 600 KB.
We have seen severe performance problems and out-of-memory errors.
By analysing the JVM GC logs and the heap dump files, we noticed that there are a lot of "NodeList" objects. The application appears to request up to 6 MB of memory to allocate each of them.
We are using the parser in servlets. We extended the Parser class to add support for custom tags, and we open a URLConnection to load the source HTML. We instantiate the parser once per page, and we mainly use the getAttribute() and setAttribute() methods.
Thank you.
Herve
NodeList objects are the lists of child nodes within each CompositeTag. It's unlikely that each of them takes 6 MB, as they only contain references to the parsed nodes within a tag. More likely, the top-level NodeList object (usually the HTML node) references 6 MB worth of Java objects indirectly through other NodeList objects. That is the nature of the parse: it is a nested representation of the linear HTML stream.
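To make the distinction between what a list holds directly and what it references indirectly more concrete, here is a minimal, self-contained sketch. It does not use the parser library at all: SketchNode, RetainedSizeSketch and buildPage are hypothetical names invented for this illustration, and the child list only plays the role that a NodeList plays in the real parse tree. Heap analysis tools often report a "retained" size that includes everything reachable from an object, which may be what the 6 MB figure reflects, but that is an assumption about the tool used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parsed node; this is NOT the parser's API. */
class SketchNode {
    final String name;
    // Plays the role of a NodeList: it stores references only, not copies.
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name) {
        this.name = name;
    }

    /** Appends a child and returns this node so calls can be chained. */
    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }

    /** Number of references held directly by this node's child list. */
    int directChildren() {
        return children.size();
    }

    /** Number of nodes reachable through this node's child list, at any depth. */
    int reachableNodes() {
        int total = 0;
        for (SketchNode child : children) {
            total += 1 + child.reachableNodes();
        }
        return total;
    }
}

class RetainedSizeSketch {
    /** Builds html > body > table > rows > cells, a shape assumed for illustration. */
    static SketchNode buildPage(int rows, int cellsPerRow) {
        SketchNode table = new SketchNode("table");
        for (int r = 0; r < rows; r++) {
            SketchNode row = new SketchNode("tr");
            for (int c = 0; c < cellsPerRow; c++) {
                row.add(new SketchNode("td"));
            }
            table.add(row);
        }
        SketchNode body = new SketchNode("body").add(table);
        return new SketchNode("html").add(body);
    }

    public static void main(String[] args) {
        SketchNode html = buildPage(100, 10);
        System.out.println("direct children of html: " + html.directChildren());
        System.out.println("nodes reachable from html: " + html.reachableNodes());
    }
}
```

With 100 rows of 10 cells, the top-level list holds a single reference (the body node), yet 1102 nodes are reachable through it. A tool that attributes all reachable objects to the top-level list would therefore show it as very large even though the list itself is tiny.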