Thread: [Htmlparser-developer] Bug Report
From: Claude D. <CD...@ar...> - 2002-07-30 20:31:17
Attachments:
tech_chat_archives.html
|
We've found a number of documents, all from the same site, that use a convention which browsers seem to handle well enough but which HTMLParser hangs on. While this is arguably not valid HTML, these documents should probably not cause the parser to hang. The "<!-->" sequence (not including the quotes) is apparently at fault. If the parser recognized and ignored these sequences, I think this would help. I've attached a document that triggers the hang. |
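A minimal sketch of the kind of guard the report asks for, assuming a scanner that works on a string and an index. The class and method names are hypothetical and are not part of HTMLParser; the actual fix in the library may look quite different. The sketch treats "<!-->" as an empty comment and never loops when the closing "-->" is missing.

```java
public class CommentSkipper {
    /**
     * Hypothetical helper (not HTMLParser's API): returns the index just past
     * the comment that begins at 'start', or 'start' itself if no comment
     * begins there.
     */
    public static int skipComment(String html, int start) {
        if (!html.startsWith("<!--", start)) {
            return start; // not a comment opener
        }
        // Degenerate form from the report: "<!-->" is treated as an empty comment.
        if (html.startsWith("<!-->", start)) {
            return start + 5;
        }
        // Normal form: look for the terminator after the opener.
        int end = html.indexOf("-->", start + 4);
        // If the comment is never closed, consume the rest of the input
        // instead of scanning forever.
        return end < 0 ? html.length() : end + 3;
    }

    public static void main(String[] args) {
        String html = "a<!-->b";
        int next = skipComment(html, 1);
        System.out.println(html.substring(next)); // prints "b"
    }
}
```

Because every branch returns an index at or beyond the starting position and the method has no loop of its own, a caller that advances to the returned index always makes progress on a comment opener.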
From: Somik R. <so...@ya...> - 2002-07-31 00:53:37
|
Hi Claude,

I will take a look at this as soon as I get some time. One request: could you open a bug report from http://htmlparser.sourceforge.net

Cheers,
Somik |
From: Somik R. <so...@ya...> - 2002-08-04 01:07:43
|
Hi Claude,

I've fixed this bug, but I found another on the page you sent which I don't know how to fix:

<img src"/images/spacer.gif" width="1" height="1" alt="">

This one is driving me crazy - how can a browser accept this!! Anyway, I am throwing exceptions now. I need to think and see if it's possible to accept this as well.

Regards,
Somik |
From: Kaarle K. <kaa...@ik...> - 2002-08-04 04:02:34
|
On Sunday 04 August 2002 04:07, Somik Raha wrote:
> Hi Claude
> I've fixed this bug, but I found another on the page you sent which I
> don't know how to fix : <img src"/images/spacer.gif" width="1" height="1"
> alt="">

I would say there is no reason to accept it as src="/images/spacer.gif", but maybe it could be accepted as 'src/images/spacer.gif' or "/images/spacer.gif" or something similar, i.e. as just a bad parameter name without a value. I don't know how parseParameters would take it, but it should probably do something like that.

regards
Kaarle

> This one is driving me crazy - how can a browser accept this!!
> Anyway, I am throwing exceptions now.. I need to think and see if it's
> possible to accept this as well.
>
> Regards,
> Somik

--
Kaarle Kaila
mailto:kaa...@ik...
http://www.iki.fi/kaila |
From: Somik R. <so...@ya...> - 2002-08-04 06:09:36
|
Hi Kaarle,

I was also thinking the fix might be done in parseParameters(). But the point is that, as humans, we can easily tell that it should be taken as src=. So this correction should be possible. I found that the current crash is happening in the HTMLImageScanner class, which means parseParameters() can be left as is, and we could try to add this intelligence (correction) from the scanner end, and perhaps fix the tag and call for it to be parsed again.

A second reason is that this kind of smart logic makes sense only in a particular context, and it might not be good to clutter parseParameters(), which has to stay as optimal as possible. I will try to work along these lines and see if a fix is possible.

Cheers,
Somik

_______________________________________________
Htmlparser-developer mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-developer |
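The "fix the tag and parse it again" idea discussed in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the repair is done with a regular expression on the raw tag text, and nothing here reflects how HTMLImageScanner actually implements its correction.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MissingEqualsRepair {
    /**
     * Hypothetical helper (not HTMLParser's API): inserts the missing '='
     * when 'attrName' is immediately followed by a quoted value,
     * e.g. src"/x.gif" becomes src="/x.gif".
     */
    public static String repair(String tagText, String attrName) {
        // The attribute name must follow whitespace and be directly
        // followed by an opening quote; a well-formed name="value" pair
        // does not match and is left untouched.
        Pattern broken = Pattern.compile(
                "(?<=\\s)" + Pattern.quote(attrName) + "\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE);
        String replacement =
                Matcher.quoteReplacement(attrName + "=\"") + "$1\"";
        return broken.matcher(tagText).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String tag = "<img src\"/images/spacer.gif\" width=\"1\" height=\"1\" alt=\"\">";
        // The repaired text could then be handed back to the normal
        // attribute parser, which stays free of this special case.
        System.out.println(repair(tag, "src"));
    }
}
```

Keeping the repair in a context-specific step matches the reasoning above: only a scanner that knows an image tag needs a src attribute applies it, so the general attribute parser is not burdened with guesswork. The pattern is deliberately naive and could misfire on unusual attribute values, such as a quoted value that ends with whitespace followed by the attribute name.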
From: Somik R. <so...@ya...> - 2002-08-04 06:31:22
|
Hi Kaarle,

I've managed to fix this bug in HTMLImageScanner. Meanwhile, there is a small issue: it seems that parseParameters() cannot handle

<tag name="">

I'd expect to have an empty string in the hashtable, but the testcase breaks (HTMLImageScannerTest.testMissingEqualTo()). For this release, though, we can go without this fix. I will put in a report soon.

Cheers,
Somik |
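The expectation stated in this message, that an attribute written as name="" ends up in the table with an empty string as its value, can be illustrated with a small stand-alone sketch. It is not parseParameters() and makes no claim about its internals; the class name, the regular expression, and the choice to lower-case attribute names are all assumptions made for the example.

```java
import java.util.Hashtable;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeSketch {
    // Matches name="value" pairs; the value may be empty.
    private static final Pattern ATTR =
            Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    /**
     * Hypothetical stand-in for an attribute parser: collects every
     * double-quoted attribute of a tag into a Hashtable.
     */
    public static Hashtable<String, String> parse(String tagText) {
        Hashtable<String, String> table = new Hashtable<>();
        Matcher m = ATTR.matcher(tagText);
        while (m.find()) {
            // group(2) is "" (not null) for name="", so Hashtable,
            // which rejects null values, accepts it.
            table.put(m.group(1).toLowerCase(), m.group(2));
        }
        return table;
    }

    public static void main(String[] args) {
        Hashtable<String, String> attrs = parse("<tag name=\"\">");
        System.out.println(attrs.containsKey("name")); // prints "true"
        System.out.println(attrs.get("name").isEmpty()); // prints "true"
    }
}
```

The point of the sketch is the distinction between a missing key and a key mapped to an empty string: a caller can tell that the attribute was present but empty, which is what the failing test case appears to expect.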