[Htmlparser-developer] Re: [Htmlparser-user] Bad formed web page
From: Somik R. <so...@ya...> - 2002-06-27 02:17:08
Hi Cedric,
Thanks for the bug report. This has been reproduced in
HTMLTagTest.testBrokenTag() and fixed. The parser now runs
without failing on the HTML file you provided.
This fix will make it into the next integration release.
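The report quoted below describes a page truncated mid-tag, so the input ends at "<br" with no closing ">". The actual fix is not shown in this thread. What follows is only a hypothetical sketch of one defensive approach: it assumes a scanner that searches for the closing '>' and treats end-of-input as the end of the tag instead of returning null. `TagReader`, `Tag` and `readTag` are illustrative names, not part of htmlparser.

```java
// Hypothetical sketch of tolerant tag reading; not htmlparser's actual code.
final class TagReader {
    private TagReader() {}

    // Raw text of one tag, whether a closing '>' was found, and where to resume.
    static final class Tag {
        final String text;
        final boolean terminated;
        final int nextPos;

        Tag(String text, boolean terminated, int nextPos) {
            this.text = text;
            this.terminated = terminated;
            this.nextPos = nextPos;
        }
    }

    // Reads the tag whose '<' is at index start. If the input ends before a '>'
    // (e.g. a download cut off at "<br"), the rest of the input is returned as an
    // unterminated tag instead of null, so callers never dereference null.
    // Simplification: a '>' inside a quoted attribute value is not handled.
    static Tag readTag(String html, int start) {
        if (start < 0 || start >= html.length() || html.charAt(start) != '<') {
            throw new IllegalArgumentException("no tag starts at index " + start);
        }
        int end = html.indexOf('>', start + 1);
        if (end < 0) {
            return new Tag(html.substring(start + 1), false, html.length());
        }
        return new Tag(html.substring(start + 1, end), true, end + 1);
    }
}
```

For example, readTag("text<br", 4) yields a Tag with text "br" and terminated == false, which a caller can still turn into a tag node rather than crashing.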
Regarding your earlier bug report: although that bug has been fixed,
I am thinking I should introduce a template method, so that new scanner
writers don't have to bother about registering the tags with their
respective scanners.
Hopefully this refactoring will be in soon, enabling scanners to be
written safely. I also need to get cracking on Claude's refactoring
suggestions.
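The proposed refactoring could look roughly like the following template-method sketch. It is a hypothetical illustration, not htmlparser's code: `TagScanner`, `getIds()`, `registerWith()` and `ScannerRegistry` are assumed names. The base class fixes the registration step in a final method, so a scanner writer only supplies the tag names it handles, and a scanner (like the meta and title scanners mentioned below) cannot end up unassociated with its tags.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: these class and method names are illustrative only,
// not the real htmlparser API.
abstract class TagScanner {
    // Template method: final, so no subclass can skip the registration step.
    final void registerWith(ScannerRegistry registry) {
        for (String id : getIds()) {
            registry.associate(id, this);
        }
    }

    // Hook: each concrete scanner only declares which tag names it handles.
    protected abstract String[] getIds();
}

class MetaTagScanner extends TagScanner {
    @Override
    protected String[] getIds() {
        return new String[] {"META"};
    }
}

class TitleScanner extends TagScanner {
    @Override
    protected String[] getIds() {
        return new String[] {"TITLE"};
    }
}

class ScannerRegistry {
    private final Map<String, TagScanner> byTagName = new HashMap<>();

    // Adding a scanner always goes through the template method.
    void add(TagScanner scanner) {
        scanner.registerWith(this);
    }

    void associate(String tagName, TagScanner scanner) {
        byTagName.put(tagName.toUpperCase(Locale.ROOT), scanner);
    }

    // Returns the scanner for a tag name (case-insensitive), or null if none.
    TagScanner lookup(String tagName) {
        return byTagName.get(tagName.toUpperCase(Locale.ROOT));
    }
}
```

With this shape, supporting another tag means writing only getIds(); the association with the registry happens in one place for every scanner.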
Regards,
Somik
----- Original Message -----
From: Cédric Rosa
To: htm...@li...
Sent: Thursday, June 27, 2002 12:48 AM
Subject: [Htmlparser-user] Bad formed web page
Hi again, Somik,
First, thanks for your patch; I'll download it as soon as possible.
I've just tested your program with a web page that contains errors. I'm
programming a search engine, and some pages may contain errors.
I've attached an example of a bad page: the problem is that the page is
truncated before its end (because of a download error, for example).
It is missing a ">" ("<br"), which causes the program to crash with a null
pointer exception...
Can you fix this problem, or tell me where in the sources I should look to
patch it?
Thanks in advance for your good support.
Cedric.
At 20:28 26/06/2002 +0900, you wrote:
>Hi Cedric,
> This has been fixed. These two scanners (the meta and title tag scanners)
> were not being associated with their tags. I reproduced this with a test
> case and fixed it. The code in CVS has been updated. This bug fix will make
> it into the next integration release (hopefully this weekend).
> Thanks for the bug report.
>Cheers,
>Somik
>>----- Original Message -----
>>From: Somik Raha <so...@ya...>
>>To: htm...@li...
>>
>>Sent: Wednesday, June 26, 2002 8:13 PM
>>Subject: Re: [Htmlparser-user] -m option doesn't work ?
>>
>>It does look like a bug. You could open a Bugzilla report (from
>>http://htmlparser.sourceforge.net) and describe your fix. I will also try
>>to take a deeper look as soon as I find some time.
>>
>>Regards,
>>Somik
>>>----- Original Message -----
>>>From: Cédric Rosa <ced...@fr...>
>>>To: htm...@li...
>>>
>>>Sent: Wednesday, June 26, 2002 8:14 PM
>>>Subject: Re: [Htmlparser-user] -m option doesn't work ?
>>>
>>>I've tried with many URLs and get the same problem, but you can check with:
>>>http://www.cybergeo.presse.fr/actualit/nouvparu/crendus/irstcr3.htm
>>>
>>>I've just modified the source code to make it work (and now it works fine),
>>>so maybe it's a bug?
>>>
>>>Thanks for your help.
>>>
>>>Cedric.
>>>
>>>At 20:02 26/06/2002 +0900, you wrote:
>>> >Hi Cedric,
>>> > Can you give us the url, or send the page over?
>>> >
>>> >Regards
>>> >Somik
>>> >>----- Original Message -----
>>> >>From: Cédric Rosa <ced...@fr...>
>>> >>To: htmlparser-user@lists.sourceforge.net
>>> >>
>>> >>Sent: Wednesday, June 26, 2002 5:40 PM
>>> >>Subject: [Htmlparser-user] -m option doesn't work ?
>>> >>
>>> >>Hello,
>>> >>
>>> >>When I try to parse a web page with htmlparser using this code:
>>> >>
>>> >>HTMLParser parser = new HTMLParser("foo.html");
>>> >>parser.registerScanners();
>>> >>parser.parse(null);
>>> >>
>>> >>everything is OK, but when I try to parse the page with:
>>> >>
>>> >>parser.parse("-m");
>>> >>or
>>> >>parser.parse("-t");
>>> >>
>>> >>I get no output from the software, even though the page contains a meta
>>> >>tag or a title.
>>> >>
>>> >>What's wrong?
>>> >>
>>> >>Thanks in advance for your answers.
>>> >>
>>> >>Cedric.
>>> >>
>>> >>-------------------------------------------------------
>>> >>This sf.net email is sponsored by: Jabber Inc.
>>> >>Don't miss the IM event of the season | Special offer for OSDN members!
>>> >>JabConf 2002, Aug. 20-22, Keystone, CO
>>> >>http://www.jabberconf.com/osdn
>>> >>_______________________________________________
>>> >>Htmlparser-user mailing list
>>> >>Htmlparser-user@lists.sourceforge.net
>>> >>https://lists.sourceforge.net/lists/listinfo/htmlparser-user