Thread: [Htmlparser-developer] Re: [Htmlparser-user] Bad formed web page
From: Somik R. <so...@ya...> - 2002-06-27 02:17:08
Hi Cedric,

Thanks for the bug report. This has been reproduced in HTMLTagTest.testBrokenTag(), and has been fixed. The parser now runs without failing on the same HTML file you provided. This fix will make it into the next integration release.

Regarding your earlier bug report: although the bug has been fixed, I am thinking I should introduce a template method, so that new scanner writers don't have to bother about registering the tags with their respective scanners. Hopefully this refactoring will be in soon, enabling scanners to be written safely. I also need to get cracking on Claude's refactoring suggestions.

Regards,
Somik

----- Original Message -----
From: Cédric Rosa
To: htm...@li...
Sent: Thursday, June 27, 2002 12:48 AM
Subject: [Htmlparser-user] Bad formed web page

Hi again, Somik,

First, thanks for your patch; I'll download it as soon as possible.

I've just tested your program with a web page which contains errors. I'm programming a search engine, and some pages may contain errors.

I attached a copy of a bad page as an example: the problem is that the page is truncated before its end (because of a download error, for example). It is missing a ">" ("<br"), which causes the program to crash with a null pointer exception...

Can you fix this problem, or tell me where in the sources I should look to patch it?

Thanks in advance for your good support.

Cedric.

At 20:28 26/06/2002 +0900, you wrote:
>Hi Cedric,
>  This has been fixed. These two scanners (meta and title tag scanners) were not being associated with their tags. Reproduced with a test case and fixed. Code on CVS has been updated. This bug fix will make it into the next integration release (hopefully this weekend).
>  Thanks for the bug report.
>Cheers,
>Somik
>>----- Original Message -----
>>From: Somik Raha <so...@ya...>
>>To: htm...@li...
>>Sent: Wednesday, June 26, 2002 8:13 PM
>>Subject: Re: [Htmlparser-user] -m option doesn't work ?
>>
>>It does look like a bug; you could probably open a BugZilla report (from http://htmlparser.sourceforge.net) and describe your fix. I will also try to take a deeper look as soon as I find some time.
>>
>>Regards,
>>Somik
>>>----- Original Message -----
>>>From: Cédric Rosa <ced...@fr...>
>>>To: htm...@li...
>>>Sent: Wednesday, June 26, 2002 8:14 PM
>>>Subject: Re: [Htmlparser-user] -m option doesn't work ?
>>>
>>>I've tried many URLs and it's the same problem, but you can check with:
>>>"http://www.cybergeo.presse.fr/actualit/nouvparu/crendus/irstcr3.htm"
>>>
>>>I've just modified the source code to make it work (and now it works fine)... so maybe it's a bug?
>>>
>>>Thanks for your help.
>>>
>>>Cedric.
>>>
>>>At 20:02 26/06/2002 +0900, you wrote:
>>> >Hi Cedric,
>>> >  Can you give us the URL, or send the page over?
>>> >
>>> >Regards
>>> >Somik
>>> >>----- Original Message -----
>>> >>From: Cédric Rosa <ced...@fr...>
>>> >>To: htm...@li...
>>> >>Sent: Wednesday, June 26, 2002 5:40 PM
>>> >>Subject: [Htmlparser-user] -m option doesn't work ?
>>> >>
>>> >>Hello,
>>> >>
>>> >>When I try to parse a web page with htmlparser using this code:
>>> >>
>>> >>HTMLParser parser = new HTMLParser("foo.html");
>>> >>parser.registerScanners();
>>> >>parser.parse(null);
>>> >>
>>> >>everything is OK, but when I try to parse the page with:
>>> >>
>>> >>parser.parse("-m");
>>> >>or
>>> >>parser.parse("-t");
>>> >>
>>> >>I get no output from the software, even though the page contains a meta tag and a title.
>>> >>
>>> >>What's wrong?
>>> >>
>>> >>Thanks in advance for your answers.
>>> >>
>>> >>Cedric.
>>> >>
>>> >>_______________________________________________
>>> >>Htmlparser-user mailing list
>>> >>Htm...@li...
>>> >>https://lists.sourceforge.net/lists/listinfo/htmlparser-user
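The thread does not show the actual change behind HTMLTagTest.testBrokenTag(). As a minimal sketch of the general idea only, assuming a hypothetical standalone tokenizer (`TolerantTagReader` and `tokenize` are illustrative names, not HTMLParser's API), a tag truncated at the end of the input, such as "<br", can be closed synthetically instead of yielding a null that crashes the caller later:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, self-contained sketch (not HTMLParser's API): split HTML text
 * into tag and text tokens, tolerating a tag that is cut off before its '>'.
 */
class TolerantTagReader {

    /** Returns the tokens in order; tags keep their angle brackets. */
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            int open = html.indexOf('<', pos);
            if (open < 0) {
                // No more tags: the rest of the input is plain text.
                tokens.add(html.substring(pos));
                break;
            }
            if (open > pos) {
                // Text between the previous tag and this one.
                tokens.add(html.substring(pos, open));
            }
            int close = html.indexOf('>', open + 1);
            if (close < 0) {
                // Truncated tag such as "<br" at end of input: close it
                // synthetically rather than returning null.
                tokens.add(html.substring(open) + ">");
                break;
            }
            tokens.add(html.substring(open, close + 1));
            pos = close + 1;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A page cut off mid-tag, as in the bug report.
        System.out.println(tokenize("<html><body>Hello<br"));
    }
}
```

Closing the fragment keeps the partial tag visible to later scanners; an alternative is to discard it, which is what the WHATWG HTML tokenizer does on an eof-in-tag parse error. The sketch also treats every "<" as the start of a tag, so a stray "a < b" in body text would be misread.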
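Somik's proposed refactoring (a template method, so that scanner writers don't have to register tags with their scanners themselves) might look roughly like the sketch below. All class and method names (`AbstractScanner`, `getIds`, `registerWith`, `MetaTagScanner`, `TitleTagScanner`, `ScannerRegistryDemo`) are hypothetical and not taken from HTMLParser. The point is only that the base class owns registration in a final method while subclasses supply their tag names through a hook, so a scanner cannot be left unassociated with its tags, as happened with the meta and title scanners.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the template-method idea; names are illustrative, not HTMLParser's API. */
abstract class AbstractScanner {

    /** Hook: the tag names this scanner handles, e.g. "META". */
    protected abstract String[] getIds();

    /** Hook: process one tag (kept trivial for the sketch). */
    public abstract String scan(String tagText);

    /**
     * Template method: registration happens here, once, so a subclass
     * cannot forget to associate its tags with itself.
     */
    public final void registerWith(Map<String, AbstractScanner> registry) {
        for (String id : getIds()) {
            registry.put(id.toUpperCase(Locale.ROOT), this);
        }
    }
}

class MetaTagScanner extends AbstractScanner {
    @Override
    protected String[] getIds() { return new String[] {"META"}; }

    @Override
    public String scan(String tagText) { return "meta: " + tagText; }
}

class TitleTagScanner extends AbstractScanner {
    @Override
    protected String[] getIds() { return new String[] {"TITLE"}; }

    @Override
    public String scan(String tagText) { return "title: " + tagText; }
}

class ScannerRegistryDemo {
    public static void main(String[] args) {
        Map<String, AbstractScanner> registry = new HashMap<>();
        new MetaTagScanner().registerWith(registry);
        new TitleTagScanner().registerWith(registry);
        // Look up by tag name, as a parser would on meeting a <meta ...> tag.
        System.out.println(registry.get("META").scan("<meta name=\"x\">"));
    }
}
```

A new scanner then only overrides the two hooks; because `registerWith` is final, the association step cannot be skipped or altered by a subclass.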