Thread: [Htmlparser-developer] Integration Release 1.2-2002_07_28 is out


Hi Folks,
    This week's integration release is out - 1.2-2002_07_28.

    It contains some major bug fixes:
[1] Fixed a bug in HTMLParser.openConnection(), which mistook files for URLs
if their names contained "http" or "www" anywhere.
[2] Updated HTMLEndTag, this was accidentally left out in the previous
release.
[3] Fixed Bug 586062 - relative links bug - if first char is a slash, then
the subdirectories of the url need to be ignored.
[4] Fixed Bug 586222 - HTMLRemarkNode bug - if a line with a remark node
contains a string before it, the string is ignored.
[5] Fixed a major bug - the parser can now auto-correct malformed tags. The
current code is very robust. The fix also allowed the strictness vector
concept to be removed, making the design simpler.
[6] Fixed Bug 586756 - in HTMLRemarkNode, if there are only empty lines, the
finite state machine would crash.
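
To illustrate points [1] and [3], here is a minimal sketch. It is not the actual parser code: the class and method names are hypothetical, the "starts with" rule for recognising URLs is an assumption about how such a check could be made, and link resolution is delegated to the JDK's java.net.URI rather than to anything in HTMLParser.

```java
import java.net.URI;

// Hypothetical helper class for illustration; not part of the htmlparser API.
class LinkSketch {

    // Point [1]: treat a resource as a URL only when it *starts* like one
    // (an assumed rule), not when "http" or "www" appears anywhere in it.
    static boolean looksLikeUrl(String resource) {
        String s = resource.toLowerCase();
        return s.startsWith("http://") || s.startsWith("https://") || s.startsWith("www.");
    }

    // Point [3]: resolve a link found in a page against the page's URL.
    // A link whose first character is a slash replaces the whole path,
    // so the subdirectories of the page URL are ignored.
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/docs/guide/index.html";
        System.out.println(looksLikeUrl("C:\\temp\\http_notes.html"));
        System.out.println(resolve(page, "/images/logo.gif"));
        System.out.println(resolve(page, "images/logo.gif"));
    }
}
```

With the example page URL above, the link "/images/logo.gif" resolves directly under the host, while "images/logo.gif" stays under /docs/guide/.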

My thanks to John Zook and Cedric Rosa for bug reports and suggestions.
By the way, the strictness vector concept has been removed, as I mentioned in
point [5] - this is probably the most important fix in this release. The
parser now begins to show some intelligence - it can auto-correct tags and
put inverted commas in the right places. All test cases are passing, and I
have done a good deal of intensive testing.

Tags like :
[1] <Meta name="sdsd" value="sdsds"">
[2] <Meta name="sdsd" value="sdsd"sds">
[3] <Meta name="sadd" value="sdsd " sdsd  sds ">

can now be handled. In cases 2 and 3, the parser corrects them to
<Meta name="sdsd" value="sdsdsds"> and
<Meta name="sadd" value="sdsd  sdsd  sds "> respectively.
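
The corrections above can be reproduced with a simple heuristic, sketched below. This is only an assumption about one way to get the same results, not the automaton in HTMLTag.java: a quote inside a value is taken as the real closing quote only if the tag ends right after it or another name= pair follows, and any other quote is dropped as stray. The class name and the regular expression are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not the logic used in HTMLTag.java.
class QuoteRepairSketch {

    // What may legitimately follow a closing quote: optional whitespace,
    // then either the end of the tag or the next attribute name and '='.
    private static final Pattern AFTER_CLOSING_QUOTE =
        Pattern.compile("\\s*(?:/?>|[^\\s\"=<>]+\\s*=)");

    static String repair(String tag) {
        StringBuilder out = new StringBuilder();
        boolean inValue = false;
        for (int i = 0; i < tag.length(); i++) {
            char c = tag.charAt(i);
            if (c != '"') {
                out.append(c);                 // ordinary character, copied as-is
            } else if (!inValue) {
                inValue = true;                // opening quote of a value
                out.append(c);
            } else if (AFTER_CLOSING_QUOTE.matcher(tag.substring(i + 1)).lookingAt()) {
                inValue = false;               // genuine closing quote
                out.append(c);
            }
            // Otherwise: a stray quote inside the value, silently dropped.
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(repair("<Meta name=\"sdsd\" value=\"sdsds\"\">"));
        System.out.println(repair("<Meta name=\"sdsd\" value=\"sdsd\"sds\">"));
        System.out.println(repair("<Meta name=\"sadd\" value=\"sdsd \" sdsd  sds \">"));
    }
}
```

For cases 2 and 3 this sketch yields the corrected tags quoted above. For case 1 it drops the extra quote, which is an assumed outcome since the corrected form of case 1 is not spelled out here.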

We can also handle tags of a fourth kind :
[4] <crazy tag="</I>" dfkdlkfld=dfdf>

The criterion now is: if a begin tag appears within the inverted commas,
then we expect an end tag and do not treat it as an error. This is a
fundamental change in the parsing automaton in HTMLTag.java.
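
As a rough sketch of this idea (a simplification, not the real automaton in HTMLTag.java), a scanner can track whether it is inside inverted commas and only accept a '>' as the end of the tag when it is outside them. The class and method names are hypothetical.

```java
// Hypothetical illustration; a much simplified stand-in for the real automaton.
class TagEndSketch {

    // Returns the index of the '>' that closes the tag starting at 'start',
    // skipping any '<' or '>' that appears inside inverted commas.
    // Returns -1 if no closing '>' is found.
    static int findTagEnd(String html, int start) {
        boolean inQuotes = false;
        for (int i = start; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // entering or leaving a quoted value
            } else if (c == '>' && !inQuotes) {
                return i;                      // real end of the tag
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String html = "<crazy tag=\"</I>\" dfkdlkfld=dfdf>some text";
        int end = findTagEnd(html, 0);
        System.out.println(html.substring(0, end + 1));
    }
}
```

Run on the tag from case 4, the sketch skips the '>' of </I> inside the quoted value and stops at the final '>' of the tag.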

Regards,
Somik
