[Htmlparser-developer] Integration Release 1.2-2002_07_28 is out
From: Somik R. <so...@ya...> - 2002-07-28 07:26:51
Hi Folks,

This week's integration release is out: 1.2-2002_07_28. It contains some major bug fixes:

[1] Fixed a bug in HTMLParser.openConnection(), which mistook files for URLs if their names contained "http" or "www" anywhere.
[2] Updated HTMLEndTag, which was accidentally left out of the previous release.
[3] Fixed Bug 586062 - relative links bug - if the first character of a link is a slash, the subdirectories of the URL need to be ignored.
[4] Fixed Bug 586222 - HTMLRemarkNode bug - if a line with a remark node contained a string before the remark, the string was ignored.
[5] Fixed a major bug, allowing auto-correction of malformed tags. The current code is very robust. The fix allowed removal of the strictness vector concept, making the design simpler.
[6] Fixed Bug 586756 - in HTMLRemarkNode, if a remark contained only empty lines, the finite state machine would crash.

My thanks to John Zook and Cedric Rosa for bug reports and suggestions.

By the way, the strictness vector concept has been removed, as I mentioned in point [5]. This is probably the most important fix in this release. The parser now begins to show some intelligence: it can auto-correct tags and put inverted commas in the right places. All test cases are passing, and I have done intensive testing. Tags like:

[1] <Meta name="sdsd" value="sdsds"">
[2] <Meta name="sdsd" value="sdsd"sds">
[3] <Meta name="sadd" value="sdsd " sdsd sds ">

can be handled now. In cases 2 and 3, the parser corrects them to <Meta name="sdsd" value="sdsdsds"> and <Meta name="sadd" value="sdsd sdsd sds "> respectively.

We can also handle tags of a fourth kind:

[4] <crazy tag="</I>" dfkdlkfld=dfdf>

The criterion now is: if there is a begin tag within the inverted commas, then we expect an end tag and do not treat it as an error. This is a fundamental change in the parsing automaton in HTMLTag.java.

Regards,
Somik
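
P.S. To illustrate the relative-link rule behind Bug 586062, here is a minimal sketch. It is not the parser's actual code: the class RelativeLinkSketch, the method resolve, and the example.com URLs are all hypothetical names chosen for illustration. It assumes the page URL has the form scheme://host/path and that the link is not already absolute.

```java
class RelativeLinkSketch {
    // Hypothetical helper, not part of the HTMLParser API.
    // Resolves a link found on the page at pageUrl.
    static String resolve(String pageUrl, String link) {
        // Locate the end of "scheme://host" in the page URL.
        int schemeEnd = pageUrl.indexOf("://");
        int hostStart = schemeEnd < 0 ? 0 : schemeEnd + 3;
        int pathStart = pageUrl.indexOf('/', hostStart);
        String root = pathStart < 0 ? pageUrl : pageUrl.substring(0, pathStart);

        if (link.startsWith("/")) {
            // Leading slash: ignore the subdirectories of the page URL
            // and resolve against the site root.
            return root + link;
        }
        // Otherwise resolve against the directory that contains the page.
        String dir = pathStart < 0
                ? pageUrl + "/"
                : pageUrl.substring(0, pageUrl.lastIndexOf('/') + 1);
        return dir + link;
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/docs/guide/index.html";
        // prints http://www.example.com/images/logo.gif
        System.out.println(resolve(page, "/images/logo.gif"));
        // prints http://www.example.com/docs/guide/next.html
        System.out.println(resolve(page, "next.html"));
    }
}
```

The standard library's java.net.URI.resolve applies the same leading-slash rule along with the other cases (such as ".." segments) that this sketch leaves out.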