Thread: [Htmlparser-developer] htmlparser 1.0

I tried the example applications in htmlparser 1.0 using the .bat files,
without much success.

1)
runCrawler http://www.google.com 1
This gives a list of the links on the above-mentioned page, I assume.

2) (Finnish broadcasting company)
runCrawler http://www.yle.fi 1
This throws
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 27

3) (Finnish commercial TV station)
runCrawler http://www.mtv3.fi 1
This throws
Exception in thread "main" java.lang.OutOfMemoryError
         <<no stack trace available>>

4) My own simple homepage

After a rather long time, this throws:
Crawling to http://www.microsoft.com/ContentRedirect.asp?prd=iis&sbp=&pver=5.0&pid=&ID=404&cat=web&os=&over=&hrd=&Opt1=&Opt2=&Opt3= crawlDepth = 0
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 23
         at java.lang.String.substring(Unknown Source)
........
I don't think I have any such Microsoft links on my page. It probably has
something to do with activeisp.com, which provides me with disk space?

I get a similar result from my software page at www.kk-software.fi
--------------------
After these experiments, I still do not understand what the Robot is
trying to do.

Is there any explanation for this?
Regards,
Kaarle

---------------------------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...
tel: +358 50 3725844
