Re: [Htmlparser-user] how to improve link extraction speed
From: Wen <log...@ya...> - 2006-03-22 07:59:51
Hi Derrick,
Thank you for your reply. It does improve the speed. Thanks a lot.
wen
--- Derrick Oswald <Der...@Ro...> wrote:
> Wen,
>
> I'm not sure it would be faster but...
> If you don't care about nesting or other types of nodes, you can supply
> the LinkTag as the only prototype for the node factory:
>
> PrototypicalNodeFactory factory = new PrototypicalNodeFactory (new LinkTag ());
> Parser parser = new Parser ();
> parser.setNodeFactory (factory);
> NodeFilter filter = new NodeClassFilter (LinkTag.class);
> for (String url : urls) // the 20 document URLs
> {
>     parser.setURL (url);
>     NodeList links = parser.extractAllNodesThatMatch (filter);
>     for (int in = 0; in < links.size (); in++)
>         ...
> }
>
> In this way there will be no attempt at nesting the tags, so it should
> be faster.
> You also don't need to allocate a parser and filter within your loop.
>
> Derrick
>
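Derrick's advice to create the parser and filter once, outside the per-document loop, applies equally to the per-link test. The sketch below is plain Java with no HTMLParser types. It assumes the href strings of each document have already been extracted (for example with `LinkTag.getLink()`), and the names `PdfLinkSelector`, `select` and the sample URLs are hypothetical. The regular expression is compiled once into a static field and reused for every document rather than being rebuilt inside the loop. The same pattern string is the kind one might pass to the `LinkRegexFilter` that Wen mentions, assuming that filter tests links with `Matcher.find()`. This usage is not confirmed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: selects the PDF links from the hrefs of one document.
class PdfLinkSelector {
    // Compiled once, when the class is loaded, and shared by every call:
    // the same "allocate outside the loop" idea as reusing one Parser and
    // one NodeFilter. Matches ".pdf" in any case at the end of the link,
    // optionally followed by a query string or a fragment.
    static final Pattern PDF_LINK =
        Pattern.compile("\\.pdf(?:[?#].*)?$", Pattern.CASE_INSENSITIVE);

    static List<String> select(List<String> hrefs) {
        List<String> result = new ArrayList<>();
        for (String href : hrefs) {
            if (PDF_LINK.matcher(href).find()) {
                result.add(href);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up hrefs standing in for the links of two documents.
        List<List<String>> documents = Arrays.asList(
            Arrays.asList("http://example.com/a.pdf", "http://example.com/index.html"),
            Arrays.asList("http://example.com/REPORT.PDF?v=2", "http://example.com/pdf/"));
        for (List<String> hrefs : documents) {
            // No Pattern is created here; PDF_LINK is reused for every document.
            System.out.println(select(hrefs));
        }
    }
}
```

Running `main` prints `[http://example.com/a.pdf]` and then `[http://example.com/REPORT.PDF?v=2]`. Compiling the pattern once matters more as the number of links grows, since `Pattern.compile` is far more expensive than `matcher(...)`.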
> Wen wrote:
>
> > Hi,
> >
> > I'm using HTMLParser to extract links to a specific file type,
> > e.g. PDF files. It works fine, but it takes around 20 seconds to
> > parse 20 websites.
> > I noticed that, besides NodeFilter, LinkExtractor or LinkRegexFilter
> > may also be able to achieve the same goal.
> >
> > Are there other ways to make the extraction faster than the way
> > I'm using now?
> >
> > Here is my code:
> > for (String url : urls) { // the 20 documents
> >     parser = new Parser (url);
> >     NodeFilter filter = new NodeClassFilter (LinkTag.class);
> >     NodeList links = new NodeList ();
> >
> >     for (NodeIterator e = parser.elements (); e.hasMoreNodes (); )
> >         e.nextNode ().collectInto (links, filter);
> >     for (int in = 0; in < links.size (); in++)
> >     {
> >         LinkTag linkTag = (LinkTag) links.elementAt (in);
> >         if (linkTag.getLink ().endsWith (".PDF")) {
> >             doSomething ();
> >         }
> >     }
> > }
> >
> > Thanks in advance.
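Aside from speed, the test `getLink().endsWith(".PDF")` in Wen's loop is case-sensitive. A link to `report.pdf` is therefore skipped, and so is `report.PDF?download=1`. Below is a minimal sketch of a more forgiving check in plain Java. The names `PdfLinkCheck` and `isPdfLink` are hypothetical. It assumes a PDF link is recognised only by the suffix of its path, so the server's Content-Type is never consulted.

```java
import java.util.Locale;

// Hypothetical helper: a case-insensitive stand-in for
// linkTag.getLink().endsWith(".PDF").
class PdfLinkCheck {
    static boolean isPdfLink(String link) {
        if (link == null) {
            return false;
        }
        // Cut the link at the first '?' or '#', whichever comes first,
        // so "report.pdf?download=1" is judged by its path only.
        int cut = link.length();
        int query = link.indexOf('?');
        if (query >= 0) {
            cut = query;
        }
        int hash = link.indexOf('#');
        if (hash >= 0 && hash < cut) {
            cut = hash;
        }
        String path = link.substring(0, cut);
        // Locale.ROOT keeps lower-casing independent of the default locale.
        return path.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.com/files/report.pdf",
            "http://example.com/files/REPORT.PDF",
            "http://example.com/get.pdf?id=7",
            "http://example.com/index.html"
        };
        for (String s : samples) {
            System.out.println(isPdfLink(s) + "  " + s);
        }
    }
}
```

With these samples, the first three lines print `true` and the last prints `false`. In the loop above, the call would replace the condition as `if (PdfLinkCheck.isPdfLink (linkTag.getLink ()))`.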
>
>
>
>
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>