RE: [Htmlparser-developer] Update (Claude - ur feedback needed)
From: Claude D. <CD...@ar...> - 2002-08-06 16:19:06
My own expectations are fairly simple.

1) If the page is unparsable because it is ill-formed, the parser should throw an exception. This is a priority behavior in that it is better for the parser to report problems than it is for it to hang because the internal logic to handle ill-formed documents has gotten too complicated or unpredictable.

2) If it is possible for the parser to handle certain types of ill-formed documents, this should be considered a desirable feature, but never at the expense of handling properly formed documents or notifying the library user that something went wrong if it couldn't.

It may be best to consider these separate issues. Since item 1 is imperative and item 2 is a feature, you may want to consider making item 2 a selectable feature. That is to say, there may be a need to have a 'strict' mode that never handles ill-formed documents (which has plenty of value in and of itself, given that some folks actually want to recognize bad HTML), and another 'liberal' mode that does its best to compensate for flaws in the document.

The problem with compensating for ill-formed documents will always be that to handle it one way may interfere with an alternate interpretation, which in some cases may also be correct. In cases where there is no alternate interpretation, the solution is simple. In cases where an alternate interpretation is possible, the code is inevitably wrong to someone who wanted to see the alternate behavior. It's probably best, then, to further separate the compensation criteria to handle ONLY those cases where the interpretation is unambiguous.

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Tuesday, August 06, 2002 12:11 AM
To: htm...@li...
Subject: Re: [Htmlparser-developer] Update (Claude - ur feedback needed)

Hi Kaarle,

It seems like we may have acted hastily in correcting this (even in HTMLImageScanner).
I just tried Claude's page again, and I find that the image is not parsed. Amit also mentioned some time back that we ought to flag some kind of error.

Of course IE does not collapse - it continues parsing.

So - I think you should not put in this fix to parseParameters(). I should also roll back my fix and throw an error (?) - or probably throw a bad image tag, where you cannot retrieve the data.

OTOH - the other side of the coin is - if someday people decide to kick IE out, and write a new browser with this parser, such pages would work fine. In which case, it would be good to have fixes like this.

I find myself tilting to the former argument, however attractive the latter may sound.

Amit, Claude --> what are your comments? Claude - as this bug was reported by you - I'd like to ask what you expect?

Regards,
Somik

----- Original Message -----
From: Kaarle Kaila <mailto:kaa...@kk...>
To: so...@ya... ; htm...@li...
Sent: Tuesday, August 06, 2002 4:07 PM
Subject: Re: [Htmlparser-developer] Update

I still had a look at the code and made a small addition that would accept <a b"c"> as <a b="c">

Would it be useful to have it inserted into CVS? Or is it OK as it is?

regards
Kaarle

PS! I can't access CVS until the evening

---- Original Message ----
From: so...@ya...
To: htm...@li...
Subject: Re: [Htmlparser-developer] Update
Date: Tue, 6 Aug 2002 15:42:29 +0900

>Hi Kaarle,
>  Thanks for the clarification.
>
>Regards,
>Somik
>
> >I did not really do that I think. I just made a testcase that seems
> >to verify
> >that <a b"c"> will be assumed to be <a b> , same as <a b="">
> >
> >Oh - then what happens to c, is it ignored?
> >
>
> Yes! That's what seems to happen. As I said I only added a testcase
> to verify what happens. I did not change the code for this purpose.
>
> regards
> Kaarle
>
> >
> >Cheers,
> >Somik
> >
> -----------------------------
> Kaarle Kaila
> http://www.iki.fi/kaila
> mailto:kaa...@ik...
-----------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...
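
For illustration only, the 'strict' versus 'liberal' split suggested at the top of this thread could look roughly like the sketch below. It is not code from the htmlparser library: the class AttributeSketch, the Mode enum and the parse() method are hypothetical names made up for this example, and it does not touch parseParameters() or HTMLImageScanner. It uses the <a b"c"> case from the thread as the one compensation the liberal mode performs; whether that case is unambiguous enough to compensate for is exactly the open question above.

```java
import java.text.ParseException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a strict/liberal attribute parser; not the htmlparser API. */
final class AttributeSketch {

    /** STRICT rejects ill-formed input; LIBERAL compensates for a missing '='. */
    enum Mode { STRICT, LIBERAL }

    /**
     * Parses the text between '<' and '>' of a tag, e.g. a b="c" d.
     * The first token is taken to be the tag name and is skipped.
     * Problems are always reported by exception rather than ignored.
     */
    static Map<String, String> parse(String tagBody, Mode mode) throws ParseException {
        Map<String, String> attrs = new LinkedHashMap<>();
        int n = tagBody.length();
        int i = 0;

        // Skip the tag name.
        while (i < n && !Character.isWhitespace(tagBody.charAt(i))) i++;

        while (true) {
            while (i < n && Character.isWhitespace(tagBody.charAt(i))) i++;
            if (i >= n) break;

            // Read the attribute name up to whitespace, '=' or '"'.
            int start = i;
            while (i < n && !Character.isWhitespace(tagBody.charAt(i))
                    && tagBody.charAt(i) != '=' && tagBody.charAt(i) != '"') i++;
            String name = tagBody.substring(start, i);
            if (name.isEmpty()) {
                throw new ParseException("attribute name expected", i);
            }

            if (i < n && tagBody.charAt(i) == '=') {
                i++; // well-formed name=value
            } else if (i < n && tagBody.charAt(i) == '"') {
                // The b"c" case: a quoted value with no '=' in front of it.
                if (mode == Mode.STRICT) {
                    throw new ParseException("missing '=' after attribute " + name, i);
                }
                // LIBERAL: treat b"c" as b="c" and read the quoted value below.
            } else {
                attrs.put(name, ""); // attribute without a value
                continue;
            }

            if (i < n && tagBody.charAt(i) == '"') {
                int close = tagBody.indexOf('"', i + 1);
                if (close < 0) {
                    // Reported in both modes: there is no safe guess here.
                    throw new ParseException("unterminated quoted value", i);
                }
                attrs.put(name, tagBody.substring(i + 1, close));
                i = close + 1;
            } else {
                int valueStart = i;
                while (i < n && !Character.isWhitespace(tagBody.charAt(i))) i++;
                attrs.put(name, tagBody.substring(valueStart, i));
            }
        }
        return attrs;
    }
}
```

In this sketch, parse("a b\"c\"", Mode.LIBERAL) yields b mapped to c, while the same input in Mode.STRICT throws a ParseException, so a caller who wants to detect bad HTML can still do so. An unterminated quote throws in both modes, on the view that reporting a problem is better than guessing.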