Hi Folks,
A major bug has been fixed. I had previously reported that the parser
crashes when it encounters very dirty HTML of the form:
<A HREF="http://www.somelink.com">SomeText<A>
Here a begin tag was written by mistake where the end tag should be, and
the parser promptly crashed. Fixing this called for a change to the
evaluate() method, because until now a scanner had only its own local
information about the parsing process. I've introduced a parameter that
takes in the scanner. So if a tag is being parsed and, in the course of
that parsing, another tag starts being parsed, the second tag will now
know that a scanner is already running.
This enables the HTMLLinkScanner to conclude that it is currently parsing
a dirty HTML tag, and to take the appropriate action: flag the scanner
into a dirty mode and return an HTMLEndTag, which is what the previous
scanner expects.
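To make the idea concrete, here is a minimal sketch of the mechanism
described above. It is not the code in CVS: TagKind, TagScannerSketch,
LinkScannerSketch, isDirty() and the parameter name activeScanner are all
made up for illustration, and the real evaluate() signature may differ.

```java
// Illustrative sketch only; none of these names are the library's real API.
enum TagKind { START, END }

interface TagScannerSketch {
    // activeScanner is the scanner already in the middle of a tag, or null if none.
    TagKind evaluate(String tagContents, TagScannerSketch activeScanner);
}

class LinkScannerSketch implements TagScannerSketch {
    private boolean dirty = false;

    boolean isDirty() {
        return dirty;
    }

    @Override
    public TagKind evaluate(String tagContents, TagScannerSketch activeScanner) {
        String t = tagContents.trim();
        if (t.startsWith("/")) {
            return TagKind.END; // a genuine end tag such as </A>
        }
        // Case-insensitive check for "A" or "A ..." (a link begin tag).
        boolean opensLink = t.equalsIgnoreCase("A") || t.regionMatches(true, 0, "A ", 0, 2);
        if (opensLink && activeScanner instanceof LinkScannerSketch) {
            // A link is already open, so this begin tag is a mistyped end tag:
            // switch to dirty mode and hand back the end tag the caller expects.
            dirty = true;
            return TagKind.END;
        }
        return TagKind.START;
    }
}
```

With this sketch, evaluating the first <A HREF=...> with no active scanner
yields START, while evaluating the stray <A> with a link scanner already
active yields END and sets the dirty flag.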
This solves the bug, and we can finally handle some really crazy
pages...
This fix and some others, along with some additions (META and TITLE),
will make it into release 1.1 (coming soon). For now, the latest code is
available through CVS.
If any of you have written your own scanners, you will need to modify
the evaluate() method signature to be compatible with the new
HTMLTagScanner.
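As a rough guide to what that change looks like, here is a small sketch.
BaseScannerSketch stands in for HTMLTagScanner and ImageScannerSketch for
a custom scanner; both names, the boolean return type and the parameter
name previousOpenScanner are assumptions, so please check the actual
signature in CVS before copying anything.

```java
// Hypothetical stand-in for the base scanner; the real signature may differ.
abstract class BaseScannerSketch {
    // The second parameter is the new one: the scanner that is already running, or null.
    abstract boolean evaluate(String tagContents, BaseScannerSketch previousOpenScanner);
}

// A custom scanner that does not care about dirty HTML can accept the
// extra parameter and ignore it; its matching logic stays as it was.
class ImageScannerSketch extends BaseScannerSketch {
    @Override
    boolean evaluate(String tagContents, BaseScannerSketch previousOpenScanner) {
        // Case-insensitive check that the tag starts with "IMG".
        return tagContents.trim().regionMatches(true, 0, "IMG", 0, 3);
    }
}
```

A scanner that only needs the old behaviour should need no more than the
signature change shown here.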
Regards,
Somik