Thread: Re: Fw: [Htmlparser-user] Bad formed web page

Hello Somik,

Thanks for this fix. But when I download the CVS version of HTMLParser and
try to parse the page again, I get this error:
"java.lang.OutOfMemoryError
         <<no stack trace available>>
Exception in thread "main" "

Is this normal? Should I catch this error and write my own code around it?
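
On the question of catching the error: a minimal sketch of what that could look like, assuming the parse call is wrapped in a Runnable. The class SafeParse and the method parseSafely are hypothetical names for illustration, not part of HTMLParser. Catching OutOfMemoryError is only a stopgap so that one bad page does not stop a whole crawl; it does not address whatever makes the parser exhaust the heap.

```java
public class SafeParse {

    /**
     * Runs one parse job and reports whether it completed.
     * The Runnable stands in for the real parser call (hypothetical wrapper,
     * not an HTMLParser API).
     */
    public static boolean parseSafely(Runnable parseJob) {
        try {
            parseJob.run();
            return true;
        } catch (OutOfMemoryError e) {
            // The heap was exhausted while parsing: report it and give up on
            // this page so the caller can move on to the next one.
            System.err.println("Skipping page, out of memory: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulated jobs; in real use the lambda would call the parser.
        boolean first = parseSafely(() -> { /* parses fine */ });
        boolean second = parseSafely(() -> {
            throw new OutOfMemoryError("simulated");
        });
        System.out.println(first + " " + second);
    }
}
```

If the page is simply large, raising the JVM heap limit with the standard -Xmx option may be enough; if memory grows without bound on a small page, the cause is more likely in the parsing logic itself.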

Another question: I can't run the software with two options. Is this normal?
Why not put the options before the name of the file to parse?

Last, a friend (Tarik Mokhtari) wrote a "little" normalizer to convert
"&*". Maybe it would be a good idea to add it to the project?

It can be used like this:
public HTMLStringNode(String text,int textBegin,int textEnd)
{
   NormalizeHtmlCode normalizer = new NormalizeHtmlCode();
   this.text = normalizer.html2text(text);
   this.textBegin = textBegin;
   this.textEnd = textEnd;
}
You can implement it with the meta-tags, ...
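
The source of NormalizeHtmlCode is not included in this message, so the following is only a sketch of what such a normalizer might do, assuming "&*" refers to HTML character references such as &amp; or &#233;. The class name EntityNormalizerSketch, the small entity table, and the choice to map &nbsp; to a plain space are all assumptions made for illustration; only the method name html2text comes from the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

public class EntityNormalizerSketch {

    // A deliberately tiny table of named entities (illustrative, not complete).
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("nbsp", " ");        // assumption: treat as an ordinary space
        NAMED.put("eacute", "\u00e9");
    }

    /** Replaces recognized "&name;" and "&#number;" references; leaves the rest untouched. */
    public static String html2text(String html) {
        StringBuilder out = new StringBuilder(html.length());
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            int semi = (c == '&') ? html.indexOf(';', i + 1) : -1;
            // Only consider short, non-empty candidates between '&' and ';'.
            if (semi > i + 1 && semi - i <= 10) {
                String decoded = decode(html.substring(i + 1, semi));
                if (decoded != null) {
                    out.append(decoded);
                    i = semi + 1;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Returns the decoded text, or null if the reference is not recognized.
    private static String decode(String name) {
        if (name.startsWith("#")) {
            try {
                boolean hex = name.startsWith("#x") || name.startsWith("#X");
                int code = hex ? Integer.parseInt(name.substring(2), 16)
                               : Integer.parseInt(name.substring(1));
                return Character.isValidCodePoint(code)
                        ? new String(Character.toChars(code)) : null;
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return NAMED.get(name);
    }

    public static void main(String[] args) {
        System.out.println(html2text("Caf&eacute; &amp; th&#233; &lt;br&gt;"));
    }
}
```

Unrecognized references are copied through unchanged, so a stray "&" in ordinary text is not lost.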

Regards,

Cedric.

At 08:23 27/06/2002 +0200, you wrote:
>
>----- Original Message -----
>From: <mailto:so...@ya...>Somik Raha
>To: htm...@li...urceforge.net
>Cc: htmlparser-developer@lists.sourceforge.net
>
>Sent: Thursday, June 27, 2002 4:11 AM
>Subject: Re: [Htmlparser-user] Bad formed web page
>
>Hi Cedric,
>     Thanks for the bug report. This has been reproduced in
> HTMLTagTest.testBrokenTag(), and has been fixed. The parser now runs
> without failing on the same html file provided.
>     This fix will make it into the next integration release.
>
>     Regarding your earlier bug report, although the bug has been fixed, I
> am thinking I should introduce a template method, so that new scanner
> writers don't have to bother about registering the tags with their
> respective scanners.
>
>     Hopefully this refactoring will be in soon, enabling scanners to be
> written safely. I also need to get cracking on Claude's refactoring
> suggestions.
>
>Regards,
>Somik
>----- Original Message -----
>From: <mailto:ced...@fr...>Cédric Rosa
>To: htm...@li...urceforge.net
>
>Sent: Thursday, June 27, 2002 12:48 AM
>Subject: [Htmlparser-user] Bad formed web page
>
>Hello again Somik,
>
>First, thanks for your patch; I'll download it as soon as possible.
>
>I've just tested your program with a web page that contains errors. I'm
>programming a search engine, and some pages may contain errors.
>I attached a copy of a bad page example: the problem is that the page is
>truncated before its end (because of a download error, for example).
>It is missing a ">" ("<br"), which causes the program to crash with a null
>pointer exception ...
>Can you fix this problem, or tell me where (in the sources) I can look to
>patch it?
>
>Thanks in advance for your good support.
>
>Cedric.
>
>
>
>
>
>At 20:28 26/06/2002 +0900, you wrote:
> >Hi Cedric,
> >     This has been fixed. These two scanners (meta and title tag scanners)
> > were not being associated with their tags. Reproduced with a test case
> > and fixed. Code on CVS has been updated. This bug fix will make it in the
> > next integration release (hopefully this weekend).
> >     Thanks for the bug report.
> >Cheers,
> >Somik
> >>----- Original Message -----
> >>From: <mailto:so...@ya...>Somik Raha
> >>To: htm...@li...urceforge.net
> >>
> >>Sent: Wednesday, June 26, 2002 8:13 PM
> >>Subject: Re: [Htmlparser-user] -m option doesn't work ?
> >>
> >>It does look like a bug - you could probably open a BugZilla report (from
> >>http://htmlparser.sourceforge.net),
> >>and describe your fix. I will also try to take a deeper look as soon as I
> >>find some time.
> >>
> >>Regards,
> >>Somik
> >>>----- Original Message -----
> >>>From: <mailto:ced...@fr...>Cédric Rosa
> >>>To: htm...@li...urceforge.net
> >>>
> >>>Sent: Wednesday, June 26, 2002 8:14 PM
> >>>Subject: Re: [Htmlparser-user] -m option doesn't work ?
> >>>
> >>>I've tried with many urls, it's the same problem, but you can check with:
> >>>"http://www.cybergeo.presse.fr/actualit/nouvparu/crendus/irstcr3.htm"
> >>>
> >>>I've just modified the source code to make it work (and now it works
> >>>fine) ... so maybe it's a bug?
> >>>
> >>>Thanks for your help.
> >>>
> >>>Cedric.
> >>>
> >>>At 20:02 26/06/2002 +0900, you wrote:
> >>> >Hi Cedric,
> >>> >     Can you give us the url, or send the page over?
> >>> >
> >>> >Regards
> >>> >Somik
> >>> >>----- Original Message -----
> >>> >>From: <mailto:ced...@fr...>Cédric Rosa
> >>> >>To: htmlparser-user@lists.sourceforge.net
> >>>
> >>> >>
> >>> >>Sent: Wednesday, June 26, 2002 5:40 PM
> >>> >>Subject: [Htmlparser-user] -m option doesn't work ?
> >>> >>
> >>> >>Hello,
> >>> >>
> >>> >>When I try to parse a web page with htmlparser using this code:
> >>> >>
> >>> >>HTMLParser parser = new HTMLParser("foo.html");
> >>> >>parser.registerScanners();
> >>> >>parser.parse(null);
> >>> >>
> >>> >>everything is OK, but when I tried to parse the page with:
> >>> >>
> >>> >>parser.parse("-m");
> >>> >>or
> >>> >>parser.parse("-t");
> >>> >>
> >>> >>I received no output from the software, even though the page contains
> >>> >>a meta tag or title.
> >>> >>
> >>> >>What's wrong?
> >>> >>
> >>> >>Thanks in advance for your answers.
> >>> >>
> >>> >>Cedric.
> >>> >>
> >>> >>
> >>> >>
> >>> >>-------------------------------------------------------
> >>> >>This sf.net email is sponsored by: Jabber Inc.
> >>> >>Don't miss the IM event of the season | Special offer for OSDN members!
> >>> >>JabConf 2002, Aug. 20-22, Keystone, CO
> >>> >>http://www.jabberconf.com/osdn
> >>> >>_______________________________________________
> >>> >>Htmlparser-user mailing list
> >>> >>Htmlparser-user@lists.sourceforge.net
> >>> >>https://lists.sourceforge.net/lists/listinfo/htmlparser-user