Re: [Htmlparser-developer] htmlparser 1.0
From: Somik R. <so...@ya...> - 2002-01-08 15:46:16
Hi Kaarle,

To answer your basic question: the crawler crawls through a URL (like websnake and similar robot crawlers). It picks up the links on a page, visits those links, and so on recursively, down to the depth you define.

The bugs you see are not caused by the crawler code but by some parser bugs. The scanner bugs came in when I tried to fix the case where the style tags are on one big line with other stuff. Obviously, there were not enough test cases. Thankfully, you are htmlparser's best tester :)

Your site and http://www.yle.fi are working fine now. mtv3 is giving the weird out-of-memory exception, and I am fixing that now. As soon as that's done, maintenance release 1.01 will be out.

Cheers,
Somik

----- Original Message -----
From: "Kaarle Kaila" <kaa...@ik...>
To: <htm...@li...>
Sent: Tuesday, January 08, 2002 3:34 AM
Subject: [Htmlparser-developer] htmlparser 1.0

> I tried the example applications using the bat-files
> with htmlparser 1.0 with not very good success.
>
> 1)
> runCrawler http://www.google.com 1
> This gives a list of links on the abovementioned page I assume
>
> 2) (finnish broadcasting company)
> runCrawler http://www.yle.fi 1
> This throws
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException:
> String index out of range: 27
>
> 3) (finnish commercial tvstation)
> runCrawler http://www.mtv3.fi 1
> this throws
> Exception in thread "main" java.lang.OutOfMemoryError
> <<no stack trace available>>
>
> 4) my own simple homepage
>
> After a rather long time throws:
> Crawling to
> http://www.microsoft.com/ContentRedirect.asp?prd=iis&sbp=&pver=5.0&pid=&ID=404&cat=web&os=&over=&hrd=&Opt1=&Opt2=&Opt3= crawlDepth = 0
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException:
> String index out of range: 23
> at java.lang.String.substring(Unknown Source)
> ........
> I don't think I have such microsoft links on my page. Probably something to
> do with the activeisp.com that provides me with diskspace??
>
> Similar result from my software page at www.kk-software.fi
> --------------------
> As a result of these experiments I did not understand what the Robot tries
> to do??
>
> Any explanations to this?
> regards
> Kaarle
>
> ---------------------------------------------
> Kaarle Kaila
> http://www.iki.fi/kaila
> mailto:kaa...@ik...
> tel: +358 50 3725844
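
For readers following the thread, the crawler behaviour described in the reply (pick up the links on a page, visit them, and recurse down to a chosen depth) can be sketched roughly as below. This is only an illustration under stated assumptions, not the htmlparser crawler code: the class name CrawlSketch, the linkSource function, and the in-memory "site" map are all hypothetical, and a real crawler would fetch and parse each page where this sketch looks links up in a map. It assumes Java 9 or later for Map.of and List.of.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of a depth-limited recursive crawl.
// Not the htmlparser implementation; names here are made up for illustration.
public class CrawlSketch {
    // Maps a URL to the links found on that page. A real crawler would
    // download the page and run an HTML parser over it here.
    private final Function<String, List<String>> linkSource;

    // Pages seen so far, in visit order, so no page is visited twice.
    private final Set<String> visited = new LinkedHashSet<>();

    public CrawlSketch(Function<String, List<String>> linkSource) {
        this.linkSource = linkSource;
    }

    // Visit url, then follow its links while depth remains.
    // Simplification: a page first reached at depth 0 is not expanded
    // later, even if another path reaches it with depth to spare.
    public void crawl(String url, int depth) {
        if (!visited.add(url)) {
            return; // already visited
        }
        if (depth <= 0) {
            return; // depth budget used up: record the page, follow nothing
        }
        for (String link : linkSource.apply(url)) {
            crawl(link, depth - 1);
        }
    }

    public List<String> visited() {
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        // Tiny in-memory "web" standing in for real pages (assumption).
        Map<String, List<String>> site = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"));
        CrawlSketch crawler =
                new CrawlSketch(url -> site.getOrDefault(url, List.of()));
        crawler.crawl("a", 1);
        System.out.println(crawler.visited());
    }
}
```

With a depth of 1, the sketch visits the start page and the pages it links to directly, and stops there; raising the depth lets it follow links found on those pages as well.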