Re: [Htmlparser-user] extracting only certain links
From: Raghavender S. <kin...@ho...> - 2002-04-30 19:16:16
Hi Somik,

I have tried the URLs www.nba.com and www.yahoo.com, which both seem to have a lot of links; yahoo.com had 191 links when I tried it. I wrote a small class, Parser.java, which I am mailing as an attachment. Every time I run my project from JBuilder, it throws an OutOfMemoryError after a series of parses. Since I am using JBuilder, I haven't set any -ms or -mx parameters to run Parser.java, so you might want to try it out.

The other thing I noticed while running Parser.java is in the extractLink() method of LinkScanner: for a particular <a tag I get relativeLink as "null", and then when we do

"return (new HTMLLinkProcessor()).extract(relativeLink,url);"

it throws a NullPointerException in that method, since relativeLink is null. The exact place where it throws the NullPointerException is

"if (link.indexOf("http://")==-1 && link.indexOf("mailto:")==-1 && url != null)"

in the "checkIfLinkIsRelative" method of HTMLLinkProcessor. This can be fixed (I have fixed it), but the OutOfMemoryError seems to be potentially dangerous.

Thanks,
Raghav

>From: "Somik Raha" <so...@ya...>
>Reply-To: htm...@li...
>To: <htm...@li...>
>Subject: Re: [Htmlparser-user] extracting only certain links
>Date: Tue, 30 Apr 2002 11:44:17 +0900
>
>Semantic analysis...
>Write a conditional to process the tag contents. You will have code like
>this :
>
>if (node instanceof HTMLLinkTag) {
>    HTMLLinkTag linkTag = (HTMLLinkTag) node;
>    if (linkTag.getLink().indexOf("http://rd.yahoo.com")==0) {
>        // print the tag or display it however you want
>    }
>}
>
>Regards
>Somik
>----- Original Message -----
>From: "Sodergren, M.G." <mg...@le...>
>To: <htm...@li...>
>Sent: Tuesday, April 30, 2002 2:19 AM
>Subject: [Htmlparser-user] extracting only certain links
>
>
>Hello.
>When I enter a URL like http://search.yahoo.com/bin/search?p=SEARCHENTERED
>(the Yahoo result page for SEARCHENTERED), the program extracts all the links
>from the HTML page, but I just want it to extract the links that Yahoo
>returns as the result of my search. So, for example (with Yahoo), all the
>links beginning with <a href="http://srd.yahoo.com
>but not the links beginning with <a href="http://rd.yahoo.com/
>In other words, all the links with srd and not rd.
>How would I solve this problem? What code do I put, and where?
>
>Thanks
>Mats
>
>_______________________________________________
>Htmlparser-user mailing list
>Htm...@li...
>https://lists.sourceforge.net/lists/listinfo/htmlparser-user
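
For anyone following the thread, here is a minimal sketch of the two checks discussed above: a null guard in front of the relative-link condition quoted from checkIfLinkIsRelative, and a prefix filter that keeps the srd links and drops the rd ones. The class LinkChecks and its two methods are hypothetical names made up for illustration; they are not part of htmlparser, and this is not necessarily how Raghav fixed the NullPointerException. It works on plain strings so it can be run without the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the htmlparser library. */
class LinkChecks {

    /**
     * Same condition as the one quoted from checkIfLinkIsRelative,
     * with an assumed null guard added in front of it.
     */
    static boolean isRelative(String link, String url) {
        if (link == null) {
            return false; // nothing to resolve, so never call indexOf on null
        }
        return link.indexOf("http://") == -1
                && link.indexOf("mailto:") == -1
                && url != null;
    }

    /** Keeps only the links that start with the given prefix, skipping nulls. */
    static List<String> filterByPrefix(List<String> links, String prefix) {
        List<String> kept = new ArrayList<>();
        for (String link : links) {
            if (link != null && link.startsWith(prefix)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList(
                "http://srd.yahoo.com/result1",
                "http://rd.yahoo.com/nav",
                null);
        // prints [http://srd.yahoo.com/result1]
        System.out.println(filterByPrefix(links, "http://srd.yahoo.com"));
        // prints false
        System.out.println(isRelative(null, "http://www.yahoo.com"));
    }
}
```

Returning false for a null link is only one possible choice; skipping the tag altogether in the scanner would work as well. With the real parser, the same startsWith test could presumably be applied to linkTag.getLink() inside the instanceof block from Somik's reply.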