[Htmlparser-user] how to improve link extraction speed
From: Wen <log...@gm...> - 2006-03-21 04:27:16
Hi,
I'm using HTMLParser to extract links that point to a specific file type, e.g.
PDF files.
It works fine, but it takes around 20 seconds to parse 20 websites.
I noticed that, besides NodeFilter, LinkExtractor or LinkRegexFilter may be able to
achieve the same goal.
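For example, if I read the LinkRegexFilter javadoc right (I haven't tried it yet,
so this is only a sketch, and I'm not sure about the two-argument constructor),
the PDF check could be folded into the filter itself:

    import org.htmlparser.NodeFilter;
    import org.htmlparser.filters.LinkRegexFilter;

    // assumption: the filter regex-searches the link URL, and the second
    // argument false makes the match case-insensitive
    NodeFilter pdfFilter = new LinkRegexFilter("\\.pdf$", false);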
Are there other ways to make the extraction process faster than the approach I'm
using now?
Here is my code:
    import org.htmlparser.NodeFilter;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.NodeClassFilter;
    import org.htmlparser.tags.LinkTag;
    import org.htmlparser.util.NodeIterator;
    import org.htmlparser.util.NodeList;

    for (String url : urls) {  // the 20 documents
        Parser parser = new Parser(url);
        NodeFilter filter = new NodeClassFilter(LinkTag.class);
        NodeList links = new NodeList();
        // walk the parsed document and collect every LinkTag into the list
        for (NodeIterator e = parser.elements(); e.hasMoreNodes(); )
            e.nextNode().collectInto(links, filter);
        for (int i = 0; i < links.size(); i++) {
            LinkTag linkTag = (LinkTag) links.elementAt(i);
            if (linkTag.getLink().endsWith(".PDF")) {
                doSomething(linkTag);  // process the PDF link
            }
        }
    }
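Would it also make any difference to replace the NodeIterator loop with a single
call, something like this (untested sketch, assuming extractAllNodesThatMatch does
the same traversal internally)?

    // collect all LinkTag nodes in one pass instead of iterating manually
    NodeList links = parser.extractAllNodesThatMatch(
            new NodeClassFilter(LinkTag.class));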
Thanks in advance.