Hi Folks,
We seem to have a heroic parser now...
You can check out the latest code from CVS.
Here's the fix. As you know, if a tag contains an extra, erroneous inverted comma, the parser cannot tell whether to treat it as an error or as a valid character. The parser now has some intelligence: if it encounters an inverted comma and a close-tag character, it checks whether to treat this as an error or as a valid character.
This decision is made with the help of a strictVector, which holds the tags for which the parser should not make allowances. Currently there is only one, "INPUT" (should we have any more?). If the tag being parsed is not a strict tag like INPUT, the parser assumes the tag is erroneous and corrects it.
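Here is a minimal sketch of how such a decision could work, in Java. It is not the parser's actual code. Only strictVector and INPUT come from the description above; the class name StrictTagSketch and the methods isStrict, tagName and findTagEnd are hypothetical. The sketch assumes the scanner sees the tag text just after '<'. For a non-strict tag, a '>' inside an open quote is taken to mean the quote was stray, so the tag ends there. For a strict tag such as INPUT, the '>' is kept as part of the quoted value.

```java
import java.util.Vector;

public class StrictTagSketch {
    // Hypothetical stand-in for the parser's strictVector:
    // tags for which no allowance is made for a stray inverted comma.
    private static final Vector<String> strictVector = new Vector<>();
    static {
        strictVector.add("INPUT");
    }

    static boolean isStrict(String tagName) {
        return strictVector.contains(tagName.toUpperCase());
    }

    /** Reads the tag name from text that starts just after '<'. */
    static String tagName(String tag) {
        int end = 0;
        while (end < tag.length()
                && !Character.isWhitespace(tag.charAt(end))
                && tag.charAt(end) != '>') {
            end++;
        }
        return tag.substring(0, end);
    }

    /**
     * Returns the index of the '>' that closes the tag (tag text starts
     * just after '<'), or -1 if no closing '>' is found.
     */
    static int findTagEnd(String tag) {
        boolean strict = isStrict(tagName(tag));
        boolean inQuotes = false;
        for (int i = 0; i < tag.length(); i++) {
            char c = tag.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == '>') {
                // Outside quotes, '>' always ends the tag. Inside quotes,
                // a non-strict tag assumes the open quote was erroneous.
                if (!inQuotes || !strict) {
                    return i;
                }
                // Strict tag: '>' is treated as a valid character in the value.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Extra inverted comma in a non-strict tag: the tag still closes.
        System.out.println(findTagEnd("A HREF=\"x.html\"\">link</A>")); // 16
        // Strict tag: the '>' inside the quoted value is kept.
        System.out.println(findTagEnd("INPUT VALUE=\"a>b\">")); // 17
    }
}
```

The trade-off is visible in the sketch: a non-strict tag with a legitimate '>' inside a quoted value would be cut short, which is why tags where that is plausible belong in the strict list.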
The correction process is validated by some test cases in HTMLTag, particularly testStrictParsing. If you go through that test case, you will see that the attributes are also retrieved correctly.
This solution doesn't break anything else: all 82 test cases pass.
I'd be grateful if folks could test this version and let me know whether this solution is acceptable.
Also, a general question: would you prefer something like nightly drop packages for download, or is a request to check out from CVS fine?
Thanks and Regards,
Somik