[Htmlparser-developer] Annette's bug (Dirty HTML) fixed - Some intelligence added to the parser
From: Somik R. <so...@ya...> - 2002-05-03 08:15:23
Hi Folks,

We seem to have a heroic parser now... You can check out the latest code from CVS. Here's the fix.

As you know, if a tag contains an additional, erroneous inverted comma, the parser cannot judge whether to treat it as erroneous or valid. Now the parser has some intelligence: if it encounters an inverted comma and then a close-tag character (>), it checks whether it should treat the inverted comma as an error or as a valid character. This decision is made with a strictVector, which holds the tags for which the parser should not make allowances. Currently there is only one, "INPUT" (should we have any more?). If the tag being parsed is not a strict tag like INPUT, the parser assumes the tag is erroneous and corrects it.

The correction process is validated with some test cases in HTMLTag, particularly testStrictParsing. If you go through that test case, you will see that the attributes are also retrieved correctly.

This solution doesn't break anything else: we have 82 test cases, all passing. I'd be grateful if folks could test this version and let me know whether this solution is acceptable.

Also, a general question: would you prefer something like nightly drop packages for downloading, or is a request to check out from CVS fine?

Thanks and Regards,
Somik
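For readers who want to see the idea in isolation, here is a minimal sketch of the strict-tag decision described above. It is not the HTMLParser code from CVS. The class name `DirtyTagScanner`, the method `findTagEnd`, and the use of a `Set` in place of the strictVector are all hypothetical. It also assumes only double quotes delimit attribute values. When the scanner reaches a `>` while still inside a quote, a non-strict tag treats the quote as stray and ends the tag there. A strict tag such as INPUT keeps the `>` as part of the quoted value.

```java
import java.util.Set;

// Hypothetical sketch, not the actual HTMLParser implementation.
final class DirtyTagScanner {
    // Stands in for the post's strictVector: tags whose quoted values
    // may legitimately contain '>' (only INPUT, as in the post).
    static final Set<String> STRICT_TAGS = Set.of("INPUT");

    static boolean isStrict(String tagName) {
        return STRICT_TAGS.contains(tagName.toUpperCase());
    }

    /**
     * Returns the index just past the '>' that ends the tag whose '<' is at
     * position start, or -1 if no end is found. Assumes only double quotes
     * delimit attribute values.
     */
    static int findTagEnd(String html, int start) {
        // Read the tag name so we can decide whether this tag is strict.
        int i = start + 1;
        while (i < html.length() && Character.isLetterOrDigit(html.charAt(i))) {
            i++;
        }
        boolean strict = isStrict(html.substring(start + 1, i));
        boolean inQuote = false;
        for (; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '"') {
                inQuote = !inQuote;
            } else if (c == '>') {
                if (!inQuote || !strict) {
                    // Outside quotes this is the normal end of the tag; for a
                    // non-strict tag inside quotes, assume the quote was stray.
                    return i + 1;
                }
                // Strict tag: keep '>' as a valid character of the quoted value.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String dirty = "<A HREF=\"a.html\"\">Link</A>";
        String input = "<INPUT VALUE=\"a>b\">";
        System.out.println(dirty.substring(0, findTagEnd(dirty, 0)));
        System.out.println(input.substring(0, findTagEnd(input, 0)));
    }
}
```

Running `main` prints the dirty anchor tag cut off at its first `>` (the extra quote is ignored), and the INPUT tag whole, with `a>b` kept intact. Adding more names to `STRICT_TAGS` is how the "should we have any more?" question would play out in this sketch.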