Re: [Htmlparser-user] how to improve link extraction speed
From: Wen <log...@ya...> - 2006-03-22 07:59:51
Hi Derrick,

Thank you for your reply. It does improve the speed.
Thanks a lot.

wen

--- Derrick Oswald <Der...@Ro...> wrote:

> Wen,
>
> I'm not sure it would be faster but...
> If you don't care about nesting or other types of nodes, you can supply
> the LinkTag as the only prototype for the node factory:
>
>     PrototypicalNodeFactory factory = new PrototypicalNodeFactory (new LinkTag ());
>     Parser parser = new Parser ();
>     parser.setNodeFactory (factory);
>     NodeFilter filter = new NodeClassFilter (LinkTag.class);
>     for (20 documents)
>     {
>         parser.setURL (url);
>         NodeList links = parser.extractAllNodesThatMatch (filter);
>         for (int in = 0; in < links.size (); in++)
>             ...
>
> In this way there will be no attempt at nesting the tags, so it should
> be faster.
> You also don't need to allocate a parser and filter within your loop.
>
> Derrick
>
> Wen wrote:
>
> > Hi,
> >
> > I'm using HTMLParser to parse a link that contains specific file type.
> > ex. pdf files.
> > It works fine but takes around 20 seconds to parse 20 websites.
> > I noticed except NodeFilter, LinkExtractor or LinkRegexFilter may be
> > able to achieve the same goal.
> >
> > Is there other ways to make the extraction process faster than the way
> > I'm using now?
> >
> > Here is my code:
> > for( 20 documents){
> >     parser = new Parser(url);
> >     NodeFilter filter = new NodeClassFilter (LinkTag.class);
> >     NodeList links = new NodeList ();
> >
> >     for (NodeIterator e = parser.elements (); e.hasMoreNodes (); )
> >         e.nextNode ().collectInto (links, filter);
> >     for (int in = 0; in < links.size (); in++)
> >     {
> >         LinkTag linkTag = (LinkTag)links.elementAt (in);
> >         if(linkTag.getLink().endsWith(".PDF")){
> >             doSomething;
> >         }
> >     }
> >
> > Thanks in advanced.
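
A side note on the check used in the quoted loop: `linkTag.getLink().endsWith(".PDF")` is case-sensitive, so it skips links ending in `.pdf`, and it also misses links that carry a query string or fragment after the file name. Below is a minimal sketch of a more forgiving test that uses only the JDK. The names `PdfLinks` and `isPdfLink` are made up for illustration and are not part of HTMLParser.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of HTMLParser.
final class PdfLinks {

    private PdfLinks() {
    }

    /** Returns true when the path part of the link ends in ".pdf", ignoring case. */
    static boolean isPdfLink(String link) {
        if (link == null) {
            return false;
        }
        String path;
        try {
            // Look at the path only, so "?query" and "#fragment" do not hide the extension.
            path = new URI(link.trim()).getPath();
        } catch (URISyntaxException e) {
            // Links found in real pages are often not valid URIs (e.g. unescaped spaces);
            // fall back to the raw string.
            path = link;
        }
        if (path == null) {
            // Opaque URIs such as "mailto:..." have no path component.
            path = link;
        }
        return path.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }
}
```

With this in place, the condition inside the loop could read `if (PdfLinks.isPdfLink(linkTag.getLink()))`.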