[Htmlparser-user] Problem when performing parse in a loop
From: Ben S. <li...@bs...> - 2004-10-23 14:59:56
Hi,
I've come across a bit of a problem with the HTML parser, and couldn't find
the answer in the documentation pages. I was hoping someone could help me.
I'm parsing a number of pages, and extracting the links from them. These
links go into a list of links to parse (as long as they haven't been visited
already).
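The bookkeeping described above (a list of links still to parse, plus a check that a link hasn't been visited already) could be sketched roughly as follows. This is only an illustration using the standard Java collections: the class name LinkFrontier and its methods are hypothetical, not part of the HTML parser library, the URLs in main are placeholders, and the sketch uses generics, which a 2004-era JDK would not have had.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: a queue of links still to parse plus a set of links already seen. */
class LinkFrontier {
    private final Deque<String> toParse = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    /** Queues the link unless it was offered before; returns true if it was queued. */
    boolean offer(String url) {
        if (url == null || !seen.add(url)) {
            return false; // null, or already queued/visited
        }
        toParse.addLast(url);
        return true;
    }

    /** Offers every link in the list; returns how many were new. */
    int offerAll(List<String> urls) {
        int added = 0;
        for (String url : urls) {
            if (offer(url)) {
                added++;
            }
        }
        return added;
    }

    boolean hasNext() {
        return !toParse.isEmpty();
    }

    /** Removes and returns the next link to parse; throws NoSuchElementException if empty. */
    String next() {
        return toParse.removeFirst();
    }

    public static void main(String[] args) {
        LinkFrontier frontier = new LinkFrontier();
        frontier.offer("http://example.com/");
        while (frontier.hasNext()) {
            String url = frontier.next();
            System.out.println("would parse: " + url);
            // Placeholder for the links a real visitor would collect from the page.
            frontier.offerAll(Arrays.asList("http://example.com/", "http://example.com/about"));
        }
    }
}
```

In a real crawl, each URL returned by next() would be handed to the parser, and whatever links the visitor collected would go back in through offerAll(); the set ensures that no page is queued twice.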
Just to test the idea, I came up with the following code:
int i = 0;
while (i < 50)
{
    // A fresh parser and visitor are created on every pass.
    Parser parser = new Parser ("http://www.<siteinhere>.com");
    MyCustomizedVisitor visitor = new MyCustomizedVisitor ();
    parser.visitAllNodesWith (visitor);
    i++;
}
The code works fine for six iterations, but then gives the following error:
org.htmlparser.util.ParserException: Unexpected Exception occurred while reading http://www.<site>.com, in nextNode;
java.lang.NullPointerException
    at org.htmlparser.PrototypicalNodeFactory.createTagNode(PrototypicalNodeFactory.java:445)
    at org.htmlparser.lexer.Lexer.makeTag(Lexer.java:776)
    at org.htmlparser.lexer.Lexer.parseTag(Lexer.java:752)
    at org.htmlparser.lexer.Lexer.nextNode(Lexer.java:278)
    at org.htmlparser.scanners.CompositeTagScanner.scan(CompositeTagScanner.java:111)
    at org.htmlparser.util.IteratorImpl.nextNode(IteratorImpl.java:92)
    at org.htmlparser.Parser.visitAllNodesWith(Parser.java:751)
    at LinkDemo3.main(LinkDemo3.java:56)
Exception in thread "main"
If I use a 'small' page, such as Google's home page, it fails on the
16th iteration instead. This suggests to me that something isn't being
cleaned up, but I thought that garbage collection would sort all this out.
I've tried adding "parser = null;" and "parser.reset();" but these don't
seem to help. I also tried using the
parser.extractAllNodesThatAre(LinkTag.class); method, but this fails in the
same way :(
Any help gratefully received,
Ben Smith
li...@bs...