Thread: Re: [Htmlparser-user] extracting only certain links

Hi Somik,
I have tried the URLs www.nba.com and www.yahoo.com, which both seem to have a lot of
links; yahoo.com had 191 links when I tried it. I wrote a small class,
Parser.java, which I am mailing as an attachment. Every time I run my project
in JBuilder, it throws an OutOfMemoryError after a series of parses. Since
I am using JBuilder, I haven't set any -ms or -mx parameters to run
Parser.java, so you might want to try it out.
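Since no heap parameters were set, it may help to print what heap the JVM actually gets when launched from the IDE. The following is only a minimal sketch using the standard java.lang.Runtime calls; the class and method names (HeapReportSketch, maxHeapMegabytes) are made up for illustration and are not part of Parser.java or of htmlparser.

```java
// Hypothetical helper, not part of Parser.java or htmlparser.
class HeapReportSketch {

    // Upper limit the JVM will try to use for the heap, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Heap currently in use (allocated minus free), in megabytes.
    static long usedHeapMegabytes() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB):  " + maxHeapMegabytes());
        System.out.println("used heap (MB): " + usedHeapMegabytes());
    }
}
```

Printing the used figure after each page is parsed would show whether memory keeps growing from one parse to the next, or whether the default limit is simply too small for these pages.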
The other thing I noticed while running Parser.java is that, in the
extractLink() method of LinkScanner, for a particular <a> tag I get
relativeLink as null. Then when we do
"return (new HTMLLinkProcessor()).extract(relativeLink,url);"
it throws a NullPointerException in that method, since relativeLink is null.
The exact place where it throws the NullPointerException is
"if (link.indexOf("http://")==-1 && link.indexOf("mailto:")==-1 && url !=
null)" in the checkIfLinkIsRelative method of HTMLLinkProcessor. This could be
fixed, and I have fixed it in my copy, but the OutOfMemoryError seems to be
potentially dangerous.
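
For reference, the kind of null guard I mean looks roughly like the sketch below. It is not the actual HTMLLinkProcessor source: the class LinkGuardSketch, the two method bodies, and the naive way of joining a relative link to the base URL are all assumptions for illustration. Only the condition inside isRelative is copied from the line quoted above.

```java
// Hypothetical stand-ins for the methods discussed above; not the library's own code.
class LinkGuardSketch {

    // Same condition as the quoted line, but a null link is checked first
    // so that link.indexOf(...) can no longer throw.
    static boolean isRelative(String link, String url) {
        if (link == null) {
            return false; // no link text at all, so nothing to resolve
        }
        return link.indexOf("http://") == -1
                && link.indexOf("mailto:") == -1
                && url != null;
    }

    // Returns null for a missing link instead of throwing a NullPointerException.
    static String extract(String link, String url) {
        if (link == null) {
            return null;
        }
        if (!isRelative(link, url)) {
            return link; // already absolute (or no base URL to resolve against)
        }
        // Naive join, for illustration only; real resolution has more cases.
        String base = url.endsWith("/") ? url : url + "/";
        String rel = link.startsWith("/") ? link.substring(1) : link;
        return base + rel;
    }

    public static void main(String[] args) {
        System.out.println(extract(null, "http://www.yahoo.com"));
        System.out.println(extract("sports/index.html", "http://www.yahoo.com"));
        System.out.println(extract("http://www.nba.com", "http://www.yahoo.com"));
    }
}
```

With a guard like this the caller has to be prepared for a null result, which seems preferable to the scanner dying on one bad tag.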

Thanks,
Raghav

>From: "Somik Raha" <so...@ya...>
>Reply-To: htm...@li...
>To: <htm...@li...>
>Subject: Re: [Htmlparser-user] extracting only certain links
>Date: Tue, 30 Apr 2002 11:44:17 +0900
>
>Semantic analysis...
>Write a conditional to process the tag contents. You will have code like
>this:
>
>if (node instanceof HTMLLinkTag) {
>     HTMLLinkTag linkTag = (HTMLLinkTag) node;
>     // Match the srd links asked about below, not the rd ones
>     if (linkTag.getLink().startsWith("http://srd.yahoo.com")) {
>         // print the tag or display it however you want
>     }
>}
>
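The same prefix test can be tried without the parser at all. The sketch below works on plain strings, so it makes no assumptions about the htmlparser classes; the class and method names (LinkPrefixFilterSketch, keepWithPrefix) and the sample URLs are hypothetical. It keeps links that start with the srd prefix from the original question and drops the rd ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that works on plain link strings, independent of htmlparser.
class LinkPrefixFilterSketch {

    // Returns only the links that begin with the given prefix; null entries are skipped.
    static List<String> keepWithPrefix(List<String> links, String prefix) {
        List<String> kept = new ArrayList<String>();
        for (String link : links) {
            if (link != null && link.startsWith(prefix)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList(
                "http://srd.yahoo.com/result1",  // sample search-result style link
                "http://rd.yahoo.com/other",     // sample link that should be dropped
                null);                           // a tag whose link could not be read
        System.out.println(keepWithPrefix(links, "http://srd.yahoo.com"));
    }
}
```

Because startsWith compares from the first character, "http://rd.yahoo.com/..." cannot match the "http://srd.yahoo.com" prefix, which is the distinction the question needs.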
>Regards
>Somik
>----- Original Message -----
>From: "Sodergren, M.G." <mg...@le...>
>To: <htm...@li...>
>Sent: Tuesday, April 30, 2002 2:19 AM
>Subject: [Htmlparser-user] extracting only certain links
>
>
>Hello.
>When I enter a URL like http://search.yahoo.com/bin/search?p=SEARCHENTERED
>(the Yahoo result page for SEARCHENTERED), the program extracts all the links
>from the HTML page, but I just want it to extract the links that Yahoo
>returns as the results of my search. So, for example (with Yahoo), all the
>links beginning with <a href="http://srd.yahoo.com
>but not the links beginning with <a href="http://rd.yahoo.com/
>In other words, all the links with srd and not rd.
>How would I solve this problem? What code do I put, and where?
>
>Thanks
>Mats
>
>_______________________________________________
>Htmlparser-user mailing list
>Htm...@li...
>https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>