RE: [Htmlparser-developer] Update (Claude - ur feedback needed)

My own expectations are fairly simple.

1) If the page is unparsable because it is ill-formed, the parser should
throw an exception. This is a priority behavior in that it is better for
the parser to report problems than it is for it to hang because the
internal logic to handle ill-formed documents has gotten too complicated
or unpredictable.

2) If it is possible for the parser to handle certain types of
ill-formed documents, this should be considered a desirable feature, but
never at the expense of handling properly formed documents or notifying
the library user that something went wrong if it couldn't.

It may be best to consider these separate issues. Since item 1 is
imperative and item 2 is a feature, you may want to consider making item
2 a selectable feature. That is to say, there may be a need to have a
'strict' mode that never handles ill-formed documents (which has plenty
of value in and of itself, given that some folks actually want to
recognize bad HTML), and another 'liberal' mode that does its best to
compensate for flaws in the document.

The problem with compensating for ill-formed documents will always be
that handling them one way may interfere with an alternate
interpretation, which in some cases may also be correct. In cases where
there is no alternate interpretation, the solution is simple. In cases
where an alternate interpretation is possible, the code is inevitably
wrong to someone who wanted to see the alternate behavior. It's probably
best, then, to further separate the compensation criteria to handle ONLY
those cases where the interpretation is unambiguous.
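
To make the above concrete, here is a minimal Java sketch of a
strict/liberal switch in which the liberal mode compensates only when
exactly one interpretation exists. All names (CompensationSketch, Mode,
IllFormedException, resolve) are hypothetical and are not part of the
htmlparser API. The list of candidate interpretations is assumed to be
supplied by whatever scanner detected the problem.

```java
import java.util.List;

// Hypothetical sketch; none of these names exist in htmlparser.
class CompensationSketch {
    enum Mode { STRICT, LIBERAL }

    // Unchecked only to keep the sketch short; a checked exception
    // would serve the same purpose.
    static class IllFormedException extends RuntimeException {
        IllFormedException(String message) { super(message); }
    }

    /**
     * Decides what to do with an ill-formed tag.
     *
     * @param rawTag          the tag text as it appeared in the page
     * @param interpretations candidate repaired forms (assumed to come
     *                        from the scanner that found the problem)
     * @param mode            STRICT never compensates; LIBERAL does so
     *                        only when there is exactly one candidate
     */
    static String resolve(String rawTag, List<String> interpretations, Mode mode) {
        if (mode == Mode.LIBERAL && interpretations.size() == 1) {
            return interpretations.get(0); // unambiguous: safe to repair
        }
        // Strict mode, no candidate, or several candidates: report it
        // to the library user rather than guess.
        throw new IllFormedException("ill-formed tag, not compensated: " + rawTag);
    }
}
```

With Mode.STRICT the call always throws. With Mode.LIBERAL it throws
whenever the candidate list holds zero or several entries, so the
library user is still told that something went wrong.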
-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Tuesday, August 06, 2002 12:11 AM
To: htm...@li...
Subject: Re: [Htmlparser-developer] Update (Claude - ur feedback needed)

Hi Kaarle,
    It seems like we may have acted hastily in correcting this (even in
HTMLImageScanner). I just tried Claude's page again, and I find that the
image is not parsed. Amit also mentioned sometime back that we ought to
flag some kind of error.
    Of course IE does not collapse; it continues parsing.
    So - I think you should not put in this fix to parseParameters(). I
should also rollback my fix and throw an error (?) - or probably throw a
bad image tag, where you cannot retrieve the data.
    OTOH - the other side of the coin is - if someday people decide to
kick IE out, and write a new browser with this parser, such pages would
work fine. In which case, it would be good to have fixes like this.

    I find myself tilting to the former argument, however attractive the
latter may sound. Amit, Claude: what are your comments?
    Claude, as this bug was reported by you, I'd like to ask: what do
you expect?

Regards,
Somik

----- Original Message -----
From: Kaarle Kaila <mailto:kaa...@kk...>
To: so...@ya... ; htm...@li...
Sent: Tuesday, August 06, 2002 4:07 PM
Subject: Re: [Htmlparser-developer] Update

I had another look at the code and made a small addition
that would accept <a b"c"> as <a b="c">.
Would it be useful to have it inserted into CVS?
or is it OK as it is?
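
For discussion, a rough Java sketch of what such an addition could look
like. This is not the actual change to parseParameters();
LenientAttributeSketch and parse are made-up names, and the regular
expression is only an assumption about which attribute forms to accept.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the htmlparser implementation.
class LenientAttributeSketch {
    // An attribute name followed by one of:
    //   - an optional '=' and a double-quoted value  (b="c" or b"c")
    //   - '=' and an unquoted value                  (b=c)
    //   - nothing at all                             (b)
    private static final Pattern ATTR = Pattern.compile(
            "([A-Za-z][A-Za-z0-9_-]*)\\s*(?:=?\\s*\"([^\"]*)\"|=\\s*([^\\s\">]+))?");

    /** Parses the text between '<' and '>', for example: a b"c" */
    static Map<String, String> parse(String tagBody) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int firstSpace = tagBody.indexOf(' ');
        if (firstSpace < 0) {
            return attrs; // tag name only, no attributes
        }
        // Skip the tag name, then collect name/value pairs in order.
        Matcher m = ATTR.matcher(tagBody.substring(firstSpace));
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2)
                         : m.group(3) != null ? m.group(3)
                         : ""; // bare attribute, same as b=""
            attrs.put(m.group(1), value);
        }
        return attrs;
    }
}
```

Under these assumptions, parse("a b\"c\"") and parse("a b=\"c\"") both
map b to c, while parse("a b") maps b to the empty string.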

regards
Kaarle

PS! I can't access CVS until the evening.

---- Original Message ----
From: so...@ya...
To: htm...@li...
Subject: Re: [Htmlparser-developer] Update
Date: Tue, 6 Aug 2002 15:42:29 +0900

>Hi Kaarle,
>    Thanks for the clarification.
>
>Regards,
>Somik
>
>  >I did not really do that I think. I just made a testcase that seems
>  >to verify that <a b"c"> will be assumed to be <a b>, same as <a b="">
>  >
>  >Oh - then what happens to c, is it ignored?
>  >
>
>  Yes! That's what seems to happen. As I said I only added a testcase
>  to verify what happens. I did not change the code for this purpose.
>
>  regards
>  Kaarle
>
>
>
>  >Cheers,
>  >Somik
>  >
>  -----------------------------
>  Kaarle Kaila
>  http://www.iki.fi/kaila
>  mailto:kaa...@ik...
-----------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...