[Htmlparser-user] how to improve link extraction speed
From: Wen <log...@gm...> - 2006-03-21 04:27:16
Hi,
I'm using HTMLParser to extract links that point to a specific file type, e.g.
PDF files.
It works fine, but it takes around 20 seconds to parse 20 websites.
I noticed that, besides NodeFilter, LinkExtractor or LinkRegexFilter may be able to
achieve the same goal.
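For example, if I read the LinkRegexFilter javadoc right (I haven't tried it yet,
so this is only a sketch, and I'm not sure about the two-argument constructor),
the PDF check could be folded into the filter itself:

    import org.htmlparser.NodeFilter;
    import org.htmlparser.filters.LinkRegexFilter;

    // assumption: the filter regex-searches the link URL, and the second
    // argument false makes the match case-insensitive
    NodeFilter pdfFilter = new LinkRegexFilter("\\.pdf$", false);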
Are there other ways to make the extraction process faster than the approach I'm
using now?
Here is my code:
    import org.htmlparser.NodeFilter;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.NodeClassFilter;
    import org.htmlparser.tags.LinkTag;
    import org.htmlparser.util.NodeIterator;
    import org.htmlparser.util.NodeList;

    for (String url : urls) {  // the 20 documents
        Parser parser = new Parser(url);
        NodeFilter filter = new NodeClassFilter(LinkTag.class);
        NodeList links = new NodeList();
        // walk the parsed document and collect every LinkTag into the list
        for (NodeIterator e = parser.elements(); e.hasMoreNodes(); )
            e.nextNode().collectInto(links, filter);
        for (int i = 0; i < links.size(); i++) {
            LinkTag linkTag = (LinkTag) links.elementAt(i);
            if (linkTag.getLink().endsWith(".PDF")) {
                doSomething(linkTag);  // process the PDF link
            }
        }
    }
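Would it also make any difference to replace the NodeIterator loop with a single
call, something like this (untested sketch, assuming extractAllNodesThatMatch does
the same traversal internally)?

    // collect all LinkTag nodes in one pass instead of iterating manually
    NodeList links = parser.extractAllNodesThatMatch(
            new NodeClassFilter(LinkTag.class));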
Thanks in advance.