Thread: [Htmlparser-developer] Bug Report
From: Claude D. <CD...@ar...> - 2002-07-30 20:31:17
Attachments:
tech_chat_archives.html
|
We've found a number of documents, all from the same site, that use a convention which browsers seem to handle well enough but which HTMLParser hangs on. While it's arguable that this is not valid HTML, these documents should probably not cause a hang. The "<!-->" sequence (not including quotes) is apparently at fault. If the parser recognized and ignored these sequences, I think this would help. I've attached a document that causes this hanging behavior.
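
A minimal sketch of the kind of guard being asked for, assuming a simple index-based scan over the page text. The class and method names (CommentSkipper, skipComment, stripComments) are hypothetical and are not HTMLParser's actual API; the point is only that "<!-->" is consumed as an empty comment and that the loop always advances, so it cannot hang.

```java
// Hypothetical helper for illustration; not part of HTMLParser.
class CommentSkipper {
    /** Returns the index just past the comment starting at pos, or pos if no comment starts there. */
    static int skipComment(String html, int pos) {
        if (!html.startsWith("<!--", pos)) {
            return pos; // not a comment opener
        }
        // Degenerate form "<!-->": treat it as a complete, empty comment.
        if (html.startsWith("<!-->", pos)) {
            return pos + 5;
        }
        int end = html.indexOf("-->", pos + 4);
        // Unterminated comment: consume the rest of the input rather than scanning forever.
        return end < 0 ? html.length() : end + 3;
    }

    /** Copies html without its comments; every iteration moves i forward. */
    static String stripComments(String html) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            int next = skipComment(html, i);
            if (next > i) {
                i = next;                    // skipped a comment
            } else {
                out.append(html.charAt(i));  // ordinary character
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("a<!-->b<!-- note -->c")); // prints "abc"
    }
}
```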
|
From: Somik R. <so...@ya...> - 2002-07-31 00:53:37
|
Hi Claude,
I will take a look at this as soon as I get some time. One request: could you open a bug report from http://htmlparser.sourceforge.net?
Cheers,
Somik
|
|
From: Somik R. <so...@ya...> - 2002-08-04 01:07:43
|
Hi Claude
I've fixed this bug, but I found another on the page you sent which I don't know how to fix:
<img src"/images/spacer.gif" width="1" height="1" alt="">
This one is driving me crazy - how can a browser accept this!!
Anyway, I am throwing exceptions now. I need to think and see if it's possible to accept this as well.
Regards,
Somik
|
|
From: Kaarle K. <kaa...@ik...> - 2002-08-04 04:02:34
|
On Sunday 04 August 2002 04:07, Somik Raha wrote:
> Hi Claude
> I've fixed this bug, but I found another on the page you sent which I
> dont know how to fix : <img src"/images/spacer.gif" width="1" height="1"
> alt="">
I would say there is no reason to accept it as src="/images/spacer.gif",
but maybe it could be accepted as 'src/images/spacer.gif'
or "/images/spacer.gif" or something similar, i.e. as just a bad parameter
name without value. I don't know how parseParameters would take it,
but it should probably do something like that.
regards
Kaarle
>
> This one is driving me crazy - how can a browser accept this!!
> Anyway, I am throwing exceptions now.. I need to think and see if its
> possible to accept this as well.
>
> Regards,
> Somik
--
-------------------------------------------
Kaarle Kaila
mailto:kaa...@ik...
http://www.iki.fi/kaila
|
From: Somik R. <so...@ya...> - 2002-08-04 06:09:36
|
Hi Kaarle,
I was also thinking the fix might be done in parseParameters().
But the point is - as humans, we can easily tell that it should be taken as src=.
So this correction should be possible... I found that the current crash is happening in the HTMLImageScanner class, which means parseParameters can be left as is, and we could try to add this intelligence (correction) from the scanner end, and perhaps fix the tag and call for it to be parsed again.

A second reason is that this kind of smart logic makes sense only in a particular context, and it might not be good to clutter parseParameters(), which has to stay as optimal as possible. I will try to work along these lines and see if a fix is possible.
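
The "fix the tag and parse it again" idea could look roughly like the sketch below. It assumes the repair is a plain string rewrite applied before re-parsing; the class name MissingEqualsRepair and its regular expression are hypothetical and are not the code that went into HTMLImageScanner. Well-formed name="value" pairs are matched too, so their quoted contents are consumed whole and never mistaken for a name followed by a quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-parse repair step for illustration; not HTMLParser code.
class MissingEqualsRepair {
    // group 1: attribute name, group 2: optional '=' (with surrounding spaces),
    // group 3: a double- or single-quoted value that follows immediately.
    private static final Pattern ATTRIBUTE = Pattern.compile(
        "([A-Za-z][A-Za-z0-9_:-]*)(\\s*=\\s*)?(\"[^\"]*\"|'[^']*')");

    /** Inserts '=' where a name is directly followed by a quoted value. */
    static String repair(String tag) {
        Matcher m = ATTRIBUTE.matcher(tag);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(tag, last, m.start());                  // text between attributes
            out.append(m.group(1));                            // name
            out.append(m.group(2) != null ? m.group(2) : "="); // keep or supply '='
            out.append(m.group(3));                            // quoted value
            last = m.end();
        }
        out.append(tag, last, tag.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "<img src\"/images/spacer.gif\" width=\"1\" height=\"1\" alt=\"\">";
        System.out.println(repair(broken));
    }
}
```

The repaired string can then be handed back to the normal parameter parsing, which keeps the correction local to the one scanner that wants it. The sketch does not try to handle unquoted values followed by a quote.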
Cheers,
Somik
-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
Htmlparser-developer mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
|
|
From: Somik R. <so...@ya...> - 2002-08-04 06:31:22
|
Hi Kaarle,
I've managed to fix this bug in HTMLImageScanner. Meanwhile, there is a small issue - it seems that parseParameters() cannot handle:
<tag name="">
I'd expect to have an empty string in the hashtable, but the testcase breaks (HTMLImageScannerTest.testMissingEqualTo()). For this release, though, we can go without this fix. I will put in a report soon.
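
To make the expected behaviour concrete, here is a small sketch of an attribute parser that stores an empty string for name="". It is an assumption-laden stand-in, not parseParameters() itself: the class name AttributeTable and the regular expression are hypothetical, and only java.util.Hashtable is taken from the description above.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not HTMLParser's parseParameters().
class AttributeTable {
    // name = "double quoted" | 'single quoted' | unquoted
    private static final Pattern ATTRIBUTE = Pattern.compile(
        "([A-Za-z][A-Za-z0-9_:-]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))");

    /** Collects name/value pairs; an empty quoted value is stored as "". */
    static Map<String, String> parse(String tag) {
        Map<String, String> table = new Hashtable<>();
        Matcher m = ATTRIBUTE.matcher(tag);
        while (m.find()) {
            // Exactly one of groups 2-4 took part in the match, so value is never null.
            String value = m.group(2) != null ? m.group(2)
                         : m.group(3) != null ? m.group(3)
                         : m.group(4);
            table.put(m.group(1), value);
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = parse("<tag name=\"\">");
        System.out.println(table.containsKey("name") && table.get("name").isEmpty()); // prints "true"
    }
}
```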
Cheers,
Somik
|