#295 Parser crashes after parser.reset()

Status: open
Owner: nobody
Priority: 5
Updated: 2013-03-05
Created: 2013-03-05
Creator: Anonymous
Private: No

The parser fails to process the DOM after parser.reset() for these two URLs:

http://www.petco.com/
http://www.petco.com/petco_Page_PC_greeniescoupon0213.aspx

The code is now:

urlRoot = createURLRoot(url);
parser = new Parser(url);
scrapeImages();
try {
    // Some web sites do not permit the parser to reset.
    parser.reset();
    scrapeMeta();
} catch (Exception e) {
    // Re-read the URL and parse for meta data.
    parser = new Parser(url);
    scrapeMeta();
}

I used to call scrapeMeta() directly after parser.reset(), but it failed on this line:
NodeList list = parser.parse(new TagNameFilter("meta"));

I know that I SHOULD write a recursive routine that walks all the way down the DOM and does this in one pass. I saw your sample code for that once, but I didn't bookmark it and can't find it again.
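
For reference, the one-pass idea might look roughly like the sketch below. It is not the sample code mentioned above and it does not use the HTMLParser classes: SimpleNode, OnePassScraper and OnePassDemo are hypothetical stand-ins so the example runs on its own. The point is only that a single recursive walk can fill both the image list and the meta list, so the page is read once and parser.reset() is never needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed DOM node (NOT the HTMLParser API).
class SimpleNode {
    final String tagName;
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String tagName, SimpleNode... kids) {
        this.tagName = tagName;
        for (SimpleNode kid : kids) {
            children.add(kid);
        }
    }
}

// Collects <img> and <meta> nodes in one recursive traversal.
class OnePassScraper {
    final List<SimpleNode> images = new ArrayList<>();
    final List<SimpleNode> metas = new ArrayList<>();

    // Classify this node, then descend into its children.
    void walk(SimpleNode node) {
        if ("img".equalsIgnoreCase(node.tagName)) {
            images.add(node);
        } else if ("meta".equalsIgnoreCase(node.tagName)) {
            metas.add(node);
        }
        for (SimpleNode child : node.children) {
            walk(child);
        }
    }
}

class OnePassDemo {
    public static void main(String[] args) {
        // Tiny hand-built tree standing in for a parsed page.
        SimpleNode root = new SimpleNode("html",
            new SimpleNode("head", new SimpleNode("meta"), new SimpleNode("meta")),
            new SimpleNode("body", new SimpleNode("div", new SimpleNode("img"))));

        OnePassScraper scraper = new OnePassScraper();
        scraper.walk(root);
        System.out.println(scraper.images.size() + " images, "
            + scraper.metas.size() + " meta tags");
    }
}
```

In the real program the walk would start from the nodes returned by a single parse of the page, and the tag-name checks would be replaced by whatever scrapeImages() and scrapeMeta() currently look for.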

This is a very nice program.

Thanks.
