Thread: [Htmlparser-developer] Fixed major bug

Hi Folks,
A major bug fix has been done. I had previously reported that the
parser crashes when encountering very dirty HTML of the form:
<A HREF="http://www.somelink.com">SomeText<A>

Here a begin tag was written by mistake in place of the end tag, and the
parser promptly crashes. Fixing this called for a modification to the
evaluate() method, because until now the scanners had only local
information about the parsing process. I've introduced a parameter that
takes in the scanner. So if a tag is being parsed and, in the process,
another tag starts being parsed, the second tag will now know that a
scanner is already running.

This enables the HTMLLinkScanner to conclude that it is currently
parsing a dirty HTML tag, and hence to take the appropriate action: flag
the scanner into a dirty mode and return an HTMLEndTag, which is what
the previous scanner expects.
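
To illustrate the idea, here is a minimal Java sketch. It is not the
actual HTMLParser code: the class DirtyLinkSketch, the Scanner interface,
the Kind enum and the exact shape of evaluate() are all hypothetical
names made up for this example. It only shows how passing the running
scanner into evaluate() lets a link scanner recognise a stray <A> as a
mistyped end tag.

```java
// Hypothetical sketch: none of these names are the real HTMLParser API.
class DirtyLinkSketch {
    /** The two results this sketch can produce for a tag. */
    enum Kind { LINK_TAG, END_TAG }

    /** Assumed scanner contract: evaluate() is told which scanner, if any, is already running. */
    interface Scanner {
        Kind evaluate(String tagText, Scanner previousOpenScanner);
    }

    static class LinkScanner implements Scanner {
        private boolean dirty = false;

        boolean isDirty() {
            return dirty;
        }

        @Override
        public Kind evaluate(String tagText, Scanner previousOpenScanner) {
            // A link is already being scanned, so a second <A> is treated
            // as a mistyped </A>: flag dirty mode and hand back an end tag.
            if (previousOpenScanner instanceof LinkScanner) {
                dirty = true;
                return Kind.END_TAG;
            }
            // No link scan in progress: this is an ordinary opening link tag.
            return Kind.LINK_TAG;
        }
    }

    public static void main(String[] args) {
        LinkScanner scanner = new LinkScanner();
        // First <A ...>: nothing is running yet, so null is passed.
        Kind first = scanner.evaluate("<A HREF=\"http://www.somelink.com\">", null);
        // Second <A>: the link scanner itself is passed as the running scanner.
        Kind second = scanner.evaluate("<A>", scanner);
        // prints "LINK_TAG END_TAG dirty=true"
        System.out.println(first + " " + second + " dirty=" + scanner.isDirty());
    }
}
```

Without the extra parameter, the second call would have no way to tell
that a link was already open, which is the situation that led to the
crash.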

This solves the bug, and finally we can handle some really crazy
pages...
This fix and some others, along with some additions (META and TITLE),
will make it into release 1.1 (coming soon). Currently, the latest code
is available through CVS.

In case any of you have written your own scanners, you will need to
modify the evaluate method signature to be compatible with the new
HTMLTagScanner.

Regards,
Somik
