[Htmlparser-developer] Big Architecture Overhaul
From: Somik R. <so...@ya...> - 2001-12-24 09:35:04
Hi Folks,

I have finally fulfilled my promise: a major overhaul of the design is done, and all test cases are passing. I have updated the latest code on CVS. I've tried to keep the interface consistent, so user applications won't break. The changes are mainly internal. However, there is one big change: you need to call registerScanners() on the parser object.

No more confusing anonymous scanner registration. You can register a scanner by calling parser.addScanner(some scanner object), and remove it again too. I was able to do all this within an hour (thanks to the test cases).

Bad news though: I discovered two bugs (which I verified existed earlier):

[1] When scanning yahoo.com, the parser goes into an infinite loop.
[2] In extractImageLocn(), there seems to be a problem in parsing dynamic links when constructing relative paths.

extractImageLocn() is also badly in need of refactoring.

I think we can look forward to a release of HTMLParser 1.0 pretty soon, with these two bugs fixed and with parseParameters incorporated into the Scanners' logic. Looking forward to your comments (bug findings) and help.

Cheers,
Somik
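
For anyone trying to picture the registration change described above, here is a rough, self-contained sketch of explicit scanner registration. It is not the code on CVS: SketchParser, SketchScanner, PrefixScanner, removeScanner(), scannerCount() and scannerFor() are all made-up stand-ins, and only the names registerScanners() and addScanner() are taken from the message. The real signatures may well differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a tag scanner; NOT the real HTMLParser type.
interface SketchScanner {
    String id();                      // key under which the scanner is registered
    boolean accepts(String tagText);  // true if this scanner handles the tag
}

// Hypothetical scanner that claims any tag starting with a given prefix.
final class PrefixScanner implements SketchScanner {
    private final String id;
    private final String prefix;

    PrefixScanner(String id, String prefix) {
        this.id = id;
        this.prefix = prefix;
    }

    public String id() { return id; }

    public boolean accepts(String tagText) {
        // Case-insensitive prefix match, e.g. "img src=..." matches "IMG ".
        return tagText.regionMatches(true, 0, prefix, 0, prefix.length());
    }
}

// Hypothetical parser that only knows the scanners explicitly given to it.
final class SketchParser {
    private final Map<String, SketchScanner> scanners = new LinkedHashMap<>();

    void addScanner(SketchScanner scanner) { scanners.put(scanner.id(), scanner); }

    // Assumed counterpart to addScanner(); the message does not name it.
    void removeScanner(SketchScanner scanner) { scanners.remove(scanner.id()); }

    // Registers an assumed default set; nothing is registered until this is called.
    void registerScanners() {
        addScanner(new PrefixScanner("link", "A "));
        addScanner(new PrefixScanner("image", "IMG "));
    }

    int scannerCount() { return scanners.size(); }

    // Returns the first registered scanner that accepts the tag, or null.
    SketchScanner scannerFor(String tagText) {
        for (SketchScanner scanner : scanners.values()) {
            if (scanner.accepts(tagText)) {
                return scanner;
            }
        }
        return null;
    }
}

class ScannerRegistrationSketch {
    public static void main(String[] args) {
        SketchParser parser = new SketchParser();
        parser.registerScanners();                      // explicit opt-in to defaults

        SketchScanner form = new PrefixScanner("form", "FORM ");
        parser.addScanner(form);                        // add a custom scanner
        System.out.println(parser.scannerCount());
        parser.removeScanner(form);                     // and take it out again
        System.out.println(parser.scannerCount());
    }
}
```

The point of the sketch is only that the set of active scanners becomes visible state on the parser object, which a caller can inspect and change, rather than something set up anonymously behind the scenes.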