Thread: Re: [Htmlparser-user] extracting only certain links
From: Raghavender S. <kin...@ho...> - 2002-04-30 19:16:16
Hi Somik,

I have tried the URLs www.nba.com and www.yahoo.com, which seem to have a lot of links; yahoo.com had 191 links when I tried it. I wrote a small class, Parser.java, which I am mailing as an attachment. Every time I run my project in JBuilder, it throws an OutOfMemoryError after a series of parses. Since I am using JBuilder, I haven't set any -ms or -mx parameters to run Parser.java, so you might want to try it out.

The other thing I noticed while running Parser.java is in the extractLink() method of LinkScanner: for a particular <a tag I get relativeLink as null, and then

"return (new HTMLLinkProcessor()).extract(relativeLink,url);"

throws a NullPointerException inside that method, since relativeLink is null. The exact place it throws the NullPointerException is

"if (link.indexOf("http://")==-1 && link.indexOf("mailto:")==-1 && url != null)"

in the checkIfLinkIsRelative method of HTMLLinkProcessor. This could be fixed. I fixed it, but the OutOfMemoryError seems to be potentially dangerous.

Thanks,
Raghav

>From: "Somik Raha" <so...@ya...>
>Reply-To: htm...@li...
>To: <htm...@li...>
>Subject: Re: [Htmlparser-user] extracting only certain links
>Date: Tue, 30 Apr 2002 11:44:17 +0900
>
>Semantic analysis...
>Write a conditional to process the tag contents. You will have code like
>this :
>
>if (node instanceof HTMLLinkTag) {
>    HTMLLinkTag linkTag = (HTMLLinkTag) node;
>    if (linkTag.getLink().indexOf("http://rd.yahoo.com")==0) {
>        // print the tag or display it however you want
>    }
>}
>
>Regards
>Somik
>----- Original Message -----
>From: "Sodergren, M.G." <mg...@le...>
>To: <htm...@li...>
>Sent: Tuesday, April 30, 2002 2:19 AM
>Subject: [Htmlparser-user] extracting only certain links
>
>
>Hello.
>When i enter a url like http://search.yahoo.com/bin/search?p=SEARCHENTERED
>(yahoo result page for SEARCHENTERED),the program extracts all the links
>from the html page but i just want it to extract the links that are
>returned
>as the result of my search by yahoo, so for example (with yahoo), all the
>links beginning with <a href="http://srd.yahoo.com
>but not the links beginning with <a href="http://rd.yahoo.com/
>so in other words all the links with srd and not rd.
>How would i solve this problem? What code do i put and where?
>
>Thanks
>Mats
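A note on the quoted snippet: it tests for "http://rd.yahoo.com" at index 0, which selects the rd links, while the original question asks for the srd links and not the rd ones. The sketch below shows the same prefix test with the srd prefix. It is only an illustration and works on plain strings: the class SearchResultLinkFilter and the method keepResultLinks are hypothetical names, not part of htmlparser, and in a real parser loop each string would presumably come from linkTag.getLink() as in the quoted code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of htmlparser: filters link strings
// that have already been extracted from a page.
class SearchResultLinkFilter {
    // Prefix taken from the question: keep srd links, skip rd links.
    static final String RESULT_PREFIX = "http://srd.yahoo.com";

    static List<String> keepResultLinks(List<String> links) {
        List<String> kept = new ArrayList<>();
        for (String link : links) {
            // Skip null links defensively; startsWith matches the prefix
            // only at the beginning of the string, like indexOf(...) == 0.
            if (link != null && link.startsWith(RESULT_PREFIX)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = new ArrayList<>();
        links.add("http://srd.yahoo.com/S=1/*http://example.org/");
        links.add("http://rd.yahoo.com/footer/");
        links.add(null);
        // Only the srd link survives the filter.
        System.out.println(keepResultLinks(links));
    }
}
```

The null check is there because, as reported in this thread, a link value can turn out to be null for some <a tags.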
From: Raghavender S. <kin...@ho...> - 2002-05-01 00:15:32
Hi Somik,

Sorry about the false alarm; there was a bug in my code. But the second issue, the LinkScanner throwing a NullPointerException, is still there.

Raghav
From: Somik R. <so...@ya...> - 2002-05-01 06:07:52
The second issue that you mentioned is already fixed. The fix is in release 1.1. Do you have the latest release?

Regards,
Somik
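For readers following the NullPointerException discussion: the thread does not show how release 1.1 fixes it, so the following is only a minimal sketch of the kind of guard that avoids the crash. The class RelativeLinkCheck and the method isRelative are hypothetical stand-ins, not the library's source; the condition itself is the one quoted earlier in the thread, and treating a null link as "not relative" is an assumption made for illustration.

```java
// Hypothetical stand-in for the check described in the thread;
// this is not the actual HTMLLinkProcessor code.
class RelativeLinkCheck {
    static boolean isRelative(String link, String url) {
        // Guard first: without it, link.indexOf(...) throws a
        // NullPointerException when the <a tag yields no link.
        if (link == null) {
            return false; // assumption: a missing link is not relative
        }
        // Same condition as the line quoted in the thread.
        return link.indexOf("http://") == -1
                && link.indexOf("mailto:") == -1
                && url != null;
    }

    public static void main(String[] args) {
        System.out.println(isRelative(null, "http://www.yahoo.com"));
        System.out.println(isRelative("news/index.html", "http://www.yahoo.com"));
        System.out.println(isRelative("http://www.nba.com", "http://www.yahoo.com"));
    }
}
```

An alternative would be to check for null in the caller, before the link is passed to the extract step; which place is better depends on how the rest of the scanner handles tags without a usable link.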