Re: [Htmlparser-developer] toPlainTextString() feedback requested

Hi Dhaval, Sam,
> I agree with Sam when he says that "don't fix it when its not broken".

I quite disagree with both of you on this philosophy. Looking at the code with
Joshua has made me realize it's actually ghastly. A serious code cleanup is
needed; there is just too much duplication! Most of the scanner code is
unreadable, while the tag constructors are a nightmare. I don't think I will
be at peace till a couple of rounds of refactoring have been completed.

I do not think the panic over the visitor is warranted. Like I said before,
the current access methods will continue to be present. However, having a
visitor will make life simpler: there's so much code now that repeats the
same loop over and over again. We can replace code like:

Vector links = new Vector();
for (Enumeration e = parser.elements();e.hasMoreElements();) {
    HTMLNode node = (HTMLNode)e.nextElement();
    if (node instanceof HTMLLinkTag) {
        links.add(node);
    }
}

with :

HTMLLinkVisitor linkVisitor = new HTMLLinkVisitor();
collectNodesWith(linkVisitor);
Vector links = linkVisitor.getResult();

This looks so much more readable and simple. Of course, you could still use
the old way - it's just that you would have the option of making life easier.
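
To make the idea concrete, here is a minimal, self-contained sketch of how such
a visitor could work. Every name in it (Node, TextNode, LinkTag, NodeVisitor,
LinkCollector, VisitorSketch) is a hypothetical stand-in invented for
illustration, not the actual htmlparser classes, and collectNodesWith is shown
taking the node list explicitly, which is an assumption about its shape.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the parser's node types (not the real htmlparser API).
abstract class Node {
    // Each node type dispatches to the matching visitor callback.
    abstract void accept(NodeVisitor visitor);
}

class TextNode extends Node {
    final String text;
    TextNode(String text) { this.text = text; }
    @Override void accept(NodeVisitor visitor) { visitor.visitText(this); }
}

class LinkTag extends Node {
    final String href;
    LinkTag(String href) { this.href = href; }
    @Override void accept(NodeVisitor visitor) { visitor.visitLink(this); }
}

// Callbacks default to no-ops, so a visitor overrides only what it cares about.
abstract class NodeVisitor {
    void visitText(TextNode node) { }
    void visitLink(LinkTag node) { }
}

// Collects every link it is shown; all other node types are ignored.
class LinkCollector extends NodeVisitor {
    private final List<LinkTag> result = new ArrayList<>();
    @Override void visitLink(LinkTag node) { result.add(node); }
    List<LinkTag> getResult() { return result; }
}

class VisitorSketch {
    // The traversal loop lives in exactly one place.
    static void collectNodesWith(List<Node> nodes, NodeVisitor visitor) {
        for (Node node : nodes) {
            node.accept(visitor);
        }
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new TextNode("hello"));
        nodes.add(new LinkTag("http://example.com/a"));
        nodes.add(new LinkTag("http://example.com/b"));

        LinkCollector collector = new LinkCollector();
        collectNodesWith(nodes, collector);
        System.out.println(collector.getResult().size()); // prints 2
    }
}
```

Note that the instanceof check and the cast disappear from the calling code;
collecting some other kind of node would only need another small visitor class.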

In the latest round of refactorings, we've put in an HTMLCompositeTag, from
which the link, form and select tags inherit. The toPlainTextString() and
toHTML() methods are now in the parent class. All tests are passing (except
the charset tests).
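
Roughly, the shape of that refactoring can be sketched as below. This is only
an illustration under assumptions: the Sketch* class names, the constructor
arguments and the addChild method are made up here and do not reflect the real
HTMLCompositeTag code. The point is just that the two methods are written once
in the parent and inherited by the concrete tags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node base type (not the real HTMLNode).
abstract class SketchNode {
    abstract String toPlainTextString();
    abstract String toHTML();
}

// Plain text renders identically in both forms.
class SketchText extends SketchNode {
    private final String text;
    SketchText(String text) { this.text = text; }
    @Override String toPlainTextString() { return text; }
    @Override String toHTML() { return text; }
}

// Parent for tags that contain child nodes; both methods are defined here once.
abstract class SketchCompositeTag extends SketchNode {
    private final String startTag;
    private final String endTag;
    private final List<SketchNode> children = new ArrayList<>();

    SketchCompositeTag(String startTag, String endTag) {
        this.startTag = startTag;
        this.endTag = endTag;
    }

    void addChild(SketchNode child) { children.add(child); }

    // Concatenates the plain text of all children, dropping the markup.
    @Override String toPlainTextString() {
        StringBuilder sb = new StringBuilder();
        for (SketchNode child : children) {
            sb.append(child.toPlainTextString());
        }
        return sb.toString();
    }

    // Wraps the children's HTML in this tag's start and end markup.
    @Override String toHTML() {
        StringBuilder sb = new StringBuilder(startTag);
        for (SketchNode child : children) {
            sb.append(child.toHTML());
        }
        return sb.append(endTag).toString();
    }
}

// Concrete tags only supply their own markup; no duplicated rendering code.
class SketchLinkTag extends SketchCompositeTag {
    SketchLinkTag(String href) { super("<a href=\"" + href + "\">", "</a>"); }
}

class SketchFormTag extends SketchCompositeTag {
    SketchFormTag(String action) { super("<form action=\"" + action + "\">", "</form>"); }
}
```

With this layout, a form containing a link containing text yields just the
text from toPlainTextString(), and the fully nested markup from toHTML().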

Regards,
Somik