Re: [Htmlparser-user] extracting only certain links
From: Somik R. <so...@ya...> - 2002-05-01 06:07:52
The second issue that you mentioned is already fixed. It is in release 1.1 - have you got the latest release?

Regards,
Somik

----- Original Message -----
From: "Raghavender Srimantula" <kin...@ho...>
To: <htm...@li...>
Sent: Wednesday, May 01, 2002 4:15 AM
Subject: Re: [Htmlparser-user] extracting only certain links

> Hi Somik,
> I have tried the URLs www.nba.com and www.yahoo.com, which seem to have a
> lot of links; yahoo.com had 191 links when I tried. I wrote a small class,
> Parser.java, which I am mailing as an attachment. Every time I run my project
> using JBuilder, after a series of parses it throws an OutOfMemoryError. Since
> I am using JBuilder, I haven't set any -ms or -mx parameters to run my
> Parser.java, so you might want to try it out.
>
> The other thing I noticed while running Parser.java was in the
> LinkScanner, in the extractLink() method: for a particular <a tag I get
> relativeLink as "null", and then when we do
> "return (new HTMLLinkProcessor()).extract(relativeLink,url);"
> it throws a NullPointerException in that method, since relativeLink is null.
> The exact place it throws the NullPointerException is
> "if (link.indexOf("http://")==-1 && link.indexOf("mailto:")==-1 && url !=
> null)" in the "checkIfLinkIsRelative" method of HTMLLinkProcessor. This could
> be fixed. I fixed it, but the OutOfMemoryError seems to be potentially
> dangerous.
>
> Thanks,
> Raghav
>
> >From: "Somik Raha" <so...@ya...>
> >Reply-To: htm...@li...
> >To: <htm...@li...>
> >Subject: Re: [Htmlparser-user] extracting only certain links
> >Date: Tue, 30 Apr 2002 11:44:17 +0900
> >
> >Semantic analysis...
> >Write a conditional to process the tag contents. You will have code like
> >this:
> >
> >if (node instanceof HTMLLinkTag) {
> >    HTMLLinkTag linkTag = (HTMLLinkTag) node;
> >    if (linkTag.getLink().indexOf("http://rd.yahoo.com")==0) {
> >        // print the tag or display it however you want
> >    }
> >}
> >
> >Regards
> >Somik
> >----- Original Message -----
> >From: "Sodergren, M.G." <mg...@le...>
> >To: <htm...@li...>
> >Sent: Tuesday, April 30, 2002 2:19 AM
> >Subject: [Htmlparser-user] extracting only certain links
> >
> >
> >Hello.
> >When I enter a URL like http://search.yahoo.com/bin/search?p=SEARCHENTERED
> >(the Yahoo result page for SEARCHENTERED), the program extracts all the
> >links from the HTML page, but I just want it to extract the links that are
> >returned as the result of my search by Yahoo; for example (with Yahoo), all
> >the links beginning with <a href="http://srd.yahoo.com
> >but not the links beginning with <a href="http://rd.yahoo.com/
> >so in other words all the links with srd and not rd.
> >How would I solve this problem? What code do I put and where?
> >
> >Thanks
> >Mats
> >
> >_______________________________________________
> >Htmlparser-user mailing list
> >Htm...@li...
> >https://lists.sourceforge.net/lists/listinfo/htmlparser-user
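
For anyone reading this thread later: the original question asked for the links starting with http://srd.yahoo.com and not the ones starting with http://rd.yahoo.com, and the follow-up reported a NullPointerException when a link comes back as null. A minimal sketch of a prefix test that covers both points is below. The class LinkFilter and its method names are hypothetical and not part of htmlparser; the prefix is the one from the question and is an assumption about how the result links looked at the time.

```java
// Hypothetical helper, NOT part of htmlparser. It only works on plain strings,
// so it can be tried without the library on the classpath.
final class LinkFilter {
    // Prefix taken from the question in this thread (an assumption, not an API constant).
    static final String RESULT_PREFIX = "http://srd.yahoo.com";

    private LinkFilter() {}

    /** Returns true only for non-null links that start with RESULT_PREFIX. */
    static boolean isSearchResultLink(String link) {
        // Null guard first: a tag without a usable href may yield a null link.
        return link != null && link.startsWith(RESULT_PREFIX);
    }

    /** Counts the entries in the array that pass isSearchResultLink. */
    static int countSearchResultLinks(String[] links) {
        int count = 0;
        for (String link : links) {
            if (isSearchResultLink(link)) {
                count++;
            }
        }
        return count;
    }
}
```

Inside the conditional quoted above, the test would presumably become LinkFilter.isSearchResultLink(linkTag.getLink()). Using startsWith instead of indexOf(...)==0 states the intent directly, and the null check keeps one bad tag from aborting the whole parse.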