Hi Folks,
I have finally fulfilled my promise: a major overhaul of the design is
done, and all test cases are passing. I have checked the latest code into
CVS. I've tried to keep the interface consistent, so user applications
won't break; the changes are mainly internal. The one big change is that
you now need to call registerScanners() on the parser object.
No more confusing anonymous scanner registration. You can register a
scanner by calling parser.addScanner(some scanner object), and you can
remove it again as well.
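
To illustrate the idea of explicit registration, here is a minimal,
self-contained sketch. It is not the HTMLParser code: the Scanner
interface, the NamedScanner class, the removeScanner() name, the helper
methods and the two default scanner ids are all hypothetical stand-ins.
Only registerScanners() and addScanner() are names taken from this mail.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScannerRegistrySketch {

    /** Hypothetical stand-in for a tag scanner; not the HTMLParser class. */
    public interface Scanner {
        String id();
    }

    /** Hypothetical scanner identified only by the tag name it handles. */
    public static final class NamedScanner implements Scanner {
        private final String id;

        public NamedScanner(String id) {
            this.id = id;
        }

        @Override
        public String id() {
            return id;
        }
    }

    /** Hypothetical parser that keeps its scanners in an explicit registry. */
    public static final class Parser {
        // Insertion-ordered, so scanners are consulted in registration order.
        private final Map<String, Scanner> scanners = new LinkedHashMap<>();

        /** Registers an assumed default set; the real defaults may differ. */
        public void registerScanners() {
            addScanner(new NamedScanner("A"));
            addScanner(new NamedScanner("IMG"));
        }

        /** Adds a scanner, replacing any scanner with the same id. */
        public void addScanner(Scanner scanner) {
            scanners.put(scanner.id(), scanner);
        }

        /** Removes a scanner by its id (hypothetical method name). */
        public void removeScanner(Scanner scanner) {
            scanners.remove(scanner.id());
        }

        public boolean hasScanner(String id) {
            return scanners.containsKey(id);
        }

        public int scannerCount() {
            return scanners.size();
        }
    }

    public static void main(String[] args) {
        Parser parser = new Parser();
        parser.registerScanners();

        Scanner custom = new NamedScanner("FORM");
        parser.addScanner(custom);
        System.out.println("after add: " + parser.scannerCount());

        parser.removeScanner(custom);
        System.out.println("after remove: " + parser.scannerCount());
    }
}
```

The point of the sketch is only that the caller holds a reference to the
scanner object, so the same object can be passed back to remove it.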
I was able to do all this within an hour (thanks to the test cases).
Bad news though - I discovered two bugs (both of which, I verified,
already existed before the overhaul):
[1] When scanning yahoo.com, the parser goes into an infinite loop
[2] In extractImageLocn(), there seems to be a problem with parsing
dynamic links when constructing relative paths.
Also, extractImageLocn() is badly in need of refactoring.
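
For reference on bug [2], here is a small sketch of resolving a relative
or dynamic link against the URL of the page it came from, using the
standard java.net.URI class. This is not what extractImageLocn()
currently does, and it may not be the fix we end up with; the class
name, method name and example URLs are made up for illustration.

```java
import java.net.URI;

public class RelativeLinkSketch {

    /**
     * Resolves a link found in a page against that page's URL.
     * Absolute links are returned unchanged; relative ones are resolved
     * against the page's directory. URI.create() throws
     * IllegalArgumentException if either string is not a valid URI
     * (for example, if it contains unescaped spaces).
     */
    public static String resolve(String pageUrl, String link) {
        URI base = URI.create(pageUrl);
        return base.resolve(link.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/news/index.jsp?id=7";

        // Relative path that climbs out of the page's directory.
        System.out.println(resolve(page, "../images/logo.gif"));

        // Dynamic link that carries its own query string.
        System.out.println(resolve(page, "view.php?img=3"));

        // Link that is already absolute.
        System.out.println(resolve(page, "http://img.example.org/a.gif"));
    }
}
```

Note that the query string of the page URL is dropped during
resolution, and a query string on the link itself is kept.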
I think we can look forward to a release of HTMLParser 1.0 pretty soon,
with these two bugs fixed and with parseParameters incorporated into
the Scanners' logic. Looking forward to your comments (bug findings) and
help.
Cheers,
Somik