[Htmlparser-user] how to improve link extraction speed
From: Wen <log...@gm...> - 2006-03-21 04:27:16
Hi,

I'm using HTMLParser to extract links that point to a specific file type, e.g. PDF files. It works fine, but it takes around 20 seconds to parse 20 websites. I noticed that, besides NodeFilter, LinkExtractor or LinkRegexFilter may be able to achieve the same goal. Are there other ways to make the extraction faster than what I'm doing now?

Here is my code:

    for (20 documents) {
        parser = new Parser(url);
        NodeFilter filter = new NodeClassFilter(LinkTag.class);
        NodeList links = new NodeList();
        for (NodeIterator e = parser.elements(); e.hasMoreNodes(); )
            e.nextNode().collectInto(links, filter);
        for (int in = 0; in < links.size(); in++) {
            LinkTag linkTag = (LinkTag) links.elementAt(in);
            if (linkTag.getLink().endsWith(".PDF")) {
                doSomething;
            }
        }
    }

Thanks in advance.
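One direction that may be worth testing, sketched here under the unverified assumption that most of the 20 seconds is spent waiting on the network rather than in the parser: run the per-page work on a small thread pool instead of one page at a time. The sketch uses only the Java standard library. The names `ParallelPdfLinks`, `collectPdfLinks`, `isPdfLink` and the `extractLinks` callback are hypothetical and not part of HTMLParser; in the loop above, `extractLinks` would wrap the Parser / NodeClassFilter code for a single URL. The suffix check is also made case-insensitive, since `endsWith(".PDF")` skips links that end in `.pdf`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper class; none of these names come from HTMLParser.
class ParallelPdfLinks {

    // Case-insensitive suffix test, so both ".pdf" and ".PDF" match.
    static boolean isPdfLink(String link) {
        return link != null && link.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    // Calls extractLinks once per URL on a fixed-size thread pool and
    // returns the PDF links, grouped in the same order as the input URLs.
    // extractLinks is an assumed callback that returns all links of one page.
    static List<String> collectPdfLinks(List<String> urls,
                                        Function<String, List<String>> extractLinks,
                                        int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every page first so the downloads can overlap.
            List<Future<List<String>>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<List<String>> task = () -> extractLinks.apply(url);
                pending.add(pool.submit(task));
            }
            // Then collect the results in submission order.
            List<String> pdfLinks = new ArrayList<>();
            for (Future<List<String>> future : pending) {
                for (String link : future.get()) {
                    if (isPdfLink(link)) {
                        pdfLinks.add(link);
                    }
                }
            }
            return pdfLinks;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for pages", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("link extraction failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stub extractor standing in for the real per-page parsing code.
        Function<String, List<String>> stub =
            url -> Arrays.asList(url + "/paper.PDF", url + "/index.html");
        List<String> urls = Arrays.asList("http://a.example", "http://b.example");
        System.out.println(collectPdfLinks(urls, stub, 4));
    }
}
```

Whether this helps depends on where the time actually goes; timing a single page with and without the network fetch would show if the parser or the download dominates. If one Parser instance is created per task, as in the original loop, the tasks share no parser state.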