
Re-using Parser object

Help
alan113696
2005-03-14
2013-04-27

    alan113696 - 2005-03-14

    I need to parse several HTML pages sequentially. The straightforward way is to create a new Parser object per page. Will this be expensive in terms of memory consumption and performance? Another way is to call the setURL() method on a single, reused Parser object. Will that perform much better? Are there cleaner ways to parse many HTML pages?


      Derrick Oswald - 2005-03-15

      The overhead of creating a new parser, as opposed to calling setURL(), is the construction of the PrototypicalNodeFactory with all its prototypes: not huge, but probably noticeable over hundreds or thousands of pages.
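      To make the cost argument concrete, here is a minimal, self-contained sketch in plain Java. It does not use HTMLParser at all: PrototypeTable is a hypothetical stand-in for the kind of work a prototype-based node factory does at construction time (registering one prototype object per known tag), and a counter records how many tables get built. Building one table per page repeats that setup for every page; building it once does not.

```java
import java.util.HashMap;
import java.util.Map;

public class PrototypeCostSketch {

    // Hypothetical stand-in for a node factory that registers one
    // prototype per known tag name when it is constructed.
    static final class PrototypeTable {
        static int constructed = 0; // number of tables built so far

        private static final String[] TAGS = {
            "a", "body", "div", "form", "head", "html", "img",
            "input", "li", "meta", "p", "script", "span", "table",
            "td", "title", "tr", "ul"
        };

        private final Map<String, StringBuilder> prototypes = new HashMap<>();

        PrototypeTable() {
            constructed++;
            for (String tag : TAGS) {
                // Allocate a placeholder prototype object for each tag.
                prototypes.put(tag, new StringBuilder("<" + tag + ">"));
            }
        }

        int size() {
            return prototypes.size();
        }
    }

    // A fresh table for every page (analogous to a new parser per page).
    static int buildPerPage(int pages) {
        PrototypeTable.constructed = 0;
        for (int i = 0; i < pages; i++) {
            new PrototypeTable();
        }
        return PrototypeTable.constructed;
    }

    // One table, reused for every page.
    static int buildOnce(int pages) {
        PrototypeTable.constructed = 0;
        PrototypeTable shared = new PrototypeTable();
        for (int i = 0; i < pages; i++) {
            shared.size(); // stand-in for per-page work that reads the table
        }
        return PrototypeTable.constructed;
    }

    public static void main(String[] args) {
        System.out.println("per page: " + buildPerPage(1000)); // per page: 1000
        System.out.println("reused: " + buildOnce(1000));      // reused: 1
    }
}
```

      The counts only show how often the setup runs; the actual time saved depends on how expensive the real factory's construction is.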

      The setURL() method is about the only reuse that has been designed into it. By creating your own Source class and using Parser.reset(), you may be able to avoid some other allocations, but it's probably not worth it.
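      The reuse pattern described above (one parser, new input per page) can be sketched the same way. ReusableParser, setInput() and reset() below are hypothetical plain-Java analogues of Parser.setURL() and Parser.reset(), not HTMLParser's actual API: the costly tag table is built once in the constructor, and switching pages only replaces the input and clears per-document state. The "parse" step just counts recognised opening tags with a regular expression, purely so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReuseSketch {

    // Hypothetical parser whose costly setup happens once, in the constructor.
    static final class ReusableParser {
        private final Set<String> knownTags; // built once, shared across pages
        private final Pattern tagPattern = Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)");
        private String input = "";           // per-document state

        ReusableParser() {
            knownTags = Set.of("a", "body", "div", "html", "p", "span", "title");
        }

        // Analogue of pointing an existing parser at a new page.
        void setInput(String html) {
            reset();
            input = html;
        }

        // Clear per-document state but keep the shared tag table.
        void reset() {
            input = "";
        }

        // Count opening tags (closing tags like </p> do not match) that are known.
        int countKnownTags() {
            Matcher m = tagPattern.matcher(input);
            int count = 0;
            while (m.find()) {
                if (knownTags.contains(m.group(1).toLowerCase(Locale.ROOT))) {
                    count++;
                }
            }
            return count;
        }
    }

    // Parse every page with a single parser instance.
    static List<Integer> parseAll(List<String> pages) {
        ReusableParser parser = new ReusableParser();
        List<Integer> counts = new ArrayList<>();
        for (String page : pages) {
            parser.setInput(page);
            counts.add(parser.countKnownTags());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
            "<html><body><p>one</p></body></html>",
            "<div><span>two</span><custom>x</custom></div>");
        System.out.println(parseAll(pages)); // [3, 2]
    }
}
```

      With the real library, the equivalent would be constructing one Parser and calling setURL() for each page inside the loop, rather than constructing a new Parser each time.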

