[Htmlparser-developer] Fixed major bug
From: Somik R. <so...@ya...> - 2002-03-31 09:19:42
Hi Folks,

A major bug fix has been done. I had previously reported that the parser crashes when it encounters very dirty HTML of the form:

<A HREF="http://www.somelink.com">SomeText<A>

Here the page has a begin tag, put in by mistake, where the end tag (</A>) should be, and the parser promptly crashes. Fixing this called for a modification to the evaluate() method, because the current scanners have no information about the parsing process beyond their own local state. So I've introduced a new parameter that takes in the scanner that is already running. If a tag is being parsed and, in the middle of that parsing, another tag starts being parsed, the second tag now knows that a scanner process is already running.

This lets the HTMLLinkScanner conclude that its current parsing activity concerns a dirty HTML tag and take the appropriate action. It flags the scanner into a dirty mode and returns an HTMLEndTag, which is what the previous scanner expects. This solves the bug, and finally we can handle some really crazy pages...

This fix and some others, along with some additions (META and TITLE), will make it into release 1.1 (coming soon). Currently, the latest code is available through CVS.

In case any of you have written your own scanners, you will need to modify the evaluate method signature to be compatible with the new HTMLTagScanner.

Regards,
Somik
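The dirty-link fix described in this post can be sketched in Java. This is a minimal, self-contained illustration under stated assumptions, not HTMLParser's actual code. The TagScanner interface, the LinkScanner class, the linkTexts() helper and the regex tokenizer are all hypothetical stand-ins, and the real two-argument HTMLTagScanner evaluate() signature in release 1.1 may differ. What the sketch shows is the role of the new argument: evaluate() receives the scanner already in progress (or null). A link scanner that sees <A> while another link is still open can therefore switch into dirty mode and hand back an end tag, which closes the outer link.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the dirty-link fix; not the real HTMLParser API. */
public class DirtyLinkDemo {

    // Hypothetical contract: evaluate() also receives the scanner that is
    // already running (null if none), mirroring the changed signature.
    interface TagScanner {
        boolean evaluate(String tagText, TagScanner previousOpenScanner);
    }

    static class LinkScanner implements TagScanner {
        boolean dirty = false;

        @Override
        public boolean evaluate(String tagText, TagScanner previousOpenScanner) {
            // Matches "A" or "A <attributes>", case-insensitively.
            boolean isLinkTag = tagText.matches("(?i)a(\\s.*)?");
            // A link begin tag while another link scan is still open means the
            // page used <A> where </A> was intended: switch into dirty mode.
            if (isLinkTag && previousOpenScanner instanceof LinkScanner) {
                dirty = true;
            }
            return isLinkTag;
        }

        // In dirty mode, hand back an end tag so the outer scan can finish.
        String result(String tagText) {
            return dirty ? "/A" : tagText;
        }
    }

    // Hypothetical tokenizer: group 1 = tag contents, group 2 = plain text.
    private static final Pattern TOKEN = Pattern.compile("<([^>]*)>|([^<]+)");

    /** Collects each link's text, closing a link at </A> or at a stray <A>. */
    public static List<String> linkTexts(String html) {
        List<String> links = new ArrayList<>();
        LinkScanner open = null;                 // scanner for the link being parsed
        StringBuilder text = new StringBuilder();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String tag = m.group(1);
            if (tag == null) {                   // plain text token
                if (open != null) {
                    text.append(m.group(2));
                }
                continue;
            }
            LinkScanner candidate = new LinkScanner();
            boolean isLink = candidate.evaluate(tag, open);
            String effective = isLink ? candidate.result(tag) : tag;
            if (effective.equalsIgnoreCase("/A")) {
                // A real or substituted end tag closes the open link, if any.
                if (open != null) {
                    links.add(text.toString());
                    open = null;
                    text.setLength(0);
                }
            } else if (isLink) {
                open = candidate;                // a clean begin tag starts a link
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String dirty = "<A HREF=\"http://www.somelink.com\">SomeText<A>";
        System.out.println(linkTexts(dirty));
    }
}
```

Running main() on the dirty fragment from this post prints [SomeText]. Well-formed input such as <a href="x">One</a> never enters dirty mode and is closed by the ordinary </A> branch.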