[Htmlparser-user] Bad formed web page
From: R. <ced...@fr...> - 2002-06-26 15:48:45
Re Somik,

First, thanks for your patch. I'll download it as soon as possible.

I've just tested your program with a web page that contains errors. I'm
programming a search engine, and some pages may contain errors.

I attached a copy of a bad page example: the problem is that the page is
truncated before its end (because of a download error, for example). It is
missing a ">" (the page ends with "<br"), which causes the program to crash
with a null pointer exception ...

Can you fix this problem, or tell me where in the sources I can look to
patch it?

Thanks in advance for your good support.

Cedric.

At 20:28 26/06/2002 +0900, you wrote:
>Hi Cedric,
>    This has been fixed. These two scanners (meta and title tag scanners)
>were not being associated with their tags. Reproduced with a test case
>and fixed. Code on CVS has been updated. This bug fix will make it into the
>next integration release (hopefully this weekend).
>    Thanks for the bug report.
>Cheers,
>Somik
>>----- Original Message -----
>>From: Somik Raha <so...@ya...>
>>To: htm...@li...urceforge.net
>>Sent: Wednesday, June 26, 2002 8:13 PM
>>Subject: Re: [Htmlparser-user] -m option doesn't work ?
>>
>>It does look like a bug - you could probably open a BugZilla report (from
>>http://htmlparser.sourceforge.net), and describe your fix. I will also
>>try to take a deeper look as soon as I find some time.
>>
>>Regards,
>>Somik
>>>----- Original Message -----
>>>From: Cédric Rosa <ced...@fr...>
>>>To: htm...@li...urceforge.net
>>>Sent: Wednesday, June 26, 2002 8:14 PM
>>>Subject: Re: [Htmlparser-user] -m option doesn't work ?
>>>
>>>I've tried with many URLs and it's the same problem, but you can check with:
>>>"http://www.cybergeo.presse.fr/actualit/nouvparu/crendus/irstcr3.htm"
>>>
>>>I've just modified the source code to make it work (and now it works fine)
>>>... so maybe it's a bug?
>>>
>>>Thanks for your help.
>>>
>>>Cedric.
>>>
>>>At 20:02 26/06/2002 +0900, you wrote:
>>> >Hi Cedric,
>>> >    Can you give us the url, or send the page over?
>>> >
>>> >Regards
>>> >Somik
>>> >>----- Original Message -----
>>> >>From: Cédric Rosa <ced...@fr...>
>>> >>To: htmlparser...@li...
>>> >>Sent: Wednesday, June 26, 2002 5:40 PM
>>> >>Subject: [Htmlparser-user] -m option doesn't work ?
>>> >>
>>> >>Hello,
>>> >>
>>> >>When I try to parse a web page with htmlparser using this code:
>>> >>
>>> >>HTMLParser parser = new HTMLParser("foo.html");
>>> >>parser.registerScanners();
>>> >>parser.parse(null);
>>> >>
>>> >>everything is OK, but when I tried to parse the page with:
>>> >>
>>> >>parser.parse("-m");
>>> >>or
>>> >>parser.parse("-t");
>>> >>
>>> >>I received no answer from the software, even if the page contains a
>>> >>meta tag or title.
>>> >>
>>> >>What's wrong?
>>> >>
>>> >>Thanks in advance for your answers.
>>> >>
>>> >>Cedric.
>>> >>
>>> >>-------------------------------------------------------
>>> >>This sf.net email is sponsored by: Jabber Inc.
>>> >>Don't miss the IM event of the season | Special offer for OSDN members!
>>> >>JabConf 2002, Aug. 20-22, Keystone, CO
>>> >>http://www.jabberconf.com/osdn
>>> >>_______________________________________________
>>> >>Htmlparser-user mailing list
>>> >>Htmlparser...@li...
>>> >>https://lists.sourceforge.net/lists/listinfo/htmlparser-user
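
For readers hitting the same failure reported at the top of this thread (a page cut off after "<br" with no closing ">"), here is a minimal, self-contained sketch of a tag scanner that tolerates truncated input instead of failing. It does not use the htmlparser API and is not the library's actual fix; the class name TruncatedTagDemo and the method tagNames are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, NOT part of htmlparser: collects tag names and
// treats a tag that is cut off before its '>' as a normal (flagged) result.
final class TruncatedTagDemo {

    /** Returns tag names in document order; a tag with no closing '>' is marked. */
    static List<String> tagNames(String html) {
        List<String> names = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            int open = html.indexOf('<', pos);
            if (open < 0) {
                break; // no more tags in the remaining text
            }
            int close = html.indexOf('>', open + 1);
            // Truncated page: there is no '>' left, so read up to the end of
            // the input instead of propagating a missing (null) tag.
            boolean truncated = close < 0;
            int end = truncated ? html.length() : close;
            String body = html.substring(open + 1, end).trim();
            // The tag name is everything before the first whitespace.
            String name = body.split("\\s+", 2)[0];
            if (!name.isEmpty()) {
                names.add(truncated ? name + " (unterminated)" : name);
            }
            pos = truncated ? html.length() : close + 1;
        }
        return names;
    }

    public static void main(String[] args) {
        // A page that ends mid-tag, like the attached bad page example.
        System.out.println(tagNames("<html><body>Hello<br"));
    }
}
```

The point of the sketch is the design choice, not the parsing itself: end of input inside a tag is handled as an ordinary case, so callers never receive a null tag to dereference.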