[Htmlparser-developer] Big Architecture Overhaul

Hi Folks,
    I have finally fulfilled my promise: a major overhaul of the design is
done, and all test cases are passing. I have committed the latest code to
CVS. I've tried to keep the interface consistent so that user applications
won't break; the changes are mainly internal. There is one big change,
however: you now need to call registerScanners() on the parser object.
    No more confusing anonymous scanner registration. You can register a
scanner by calling parser.addScanner(some scanner object), and you can
also remove it again.
    I was able to do all this within an hour (thanks to the test cases).
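
    To illustrate the explicit registration described above, here is a
rough, self-contained sketch. It is not the actual HTMLParser source:
SketchParser, TagScanner, removeScanner() and scannerCount() are
hypothetical names used only for illustration, and the two default
scanners are assumptions. Only registerScanners() and addScanner() are
named in this mail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a scanner; the real scanner classes differ.
interface TagScanner {
    String name();
}

// Hypothetical stand-in for the parser, showing explicit registration only.
class SketchParser {
    private final List<TagScanner> scanners = new ArrayList<>();

    // Register a single scanner explicitly.
    void addScanner(TagScanner scanner) {
        scanners.add(scanner);
    }

    // Remove a previously registered scanner (method name is an assumption).
    boolean removeScanner(TagScanner scanner) {
        return scanners.remove(scanner);
    }

    // Register a default set of scanners in one call (the set is an assumption).
    void registerScanners() {
        addScanner(() -> "link");
        addScanner(() -> "image");
    }

    // Helper for inspection only; not part of the API described in this mail.
    int scannerCount() {
        return scanners.size();
    }
}
```

    With this shape, a caller either takes the defaults through
registerScanners() or adds and removes individual scanner objects, so
nothing gets registered behind the caller's back.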

    Bad news, though: I discovered two bugs (which I verified already
existed before the overhaul):

[1] When scanning yahoo.com, the parser goes into an infinite loop.
[2] In extractImageLocn(), there seems to be a problem parsing dynamic
links when constructing relative paths.

Also, extractImageLocn() is badly in need of refactoring.

I think we can look forward to a release of HTMLParser 1.0 pretty soon,
with these two bugs fixed and parseParameters incorporated into the
Scanners' logic. Looking forward to your comments (and bug findings) and
help.

Cheers,
Somik