Re: [Htmlparser-developer] toPlainTextString() feedback requested


Hi Sam,

> I guess my surprise at your (or perhaps Somik's) focus on refactoring
> comes from the fact that while the htmlparser is a great piece of
> software, the javadocs and other documentation could use some attention.
>  For example, I can't find any explanation in the javadocs or otherwise
> of how the filters are supposed to work with the different scanners, or
> what values they are allowed to take.

For the last few weeks, I've been doing nothing but adding docs...
The Sample Programs are new; you will find an example of using filters here:
http://htmlparser.sourceforge.net/samples/linksEmbedded.html

From the javadoc:
http://htmlparser.sourceforge.net/javadoc/org/htmlparser/HTMLNode.html
(check collectInto)

I guess more javadoc should be written about this, but I didn't bother to
write it because the filters were mostly used for demonstration from the
command line. When you type java -jar htmlparser.jar, you'll see a help
menu listing each filter.

It's only with the introduction of the Collection Parameter that we've been
using filter strings to actually collect data, and that has been documented.
If it's not important, it's not documented :).
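For readers new to the idea, here is a minimal sketch of how a filter string and a collecting parameter can work together: the caller passes in an empty list, every node whose type matches the filter adds itself, and each node then recurses into its children. The classes (`Node`, `LinkNode`, `ImageNode`, `GenericNode`) and the filter values `"LINK"` and `"IMG"` are hypothetical simplifications, not the actual org.htmlparser classes or filter constants; check the `collectInto` javadoc linked above for the real signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified node model; NOT the real org.htmlparser classes.
abstract class Node {
    final List<Node> children = new ArrayList<>();

    // Filter string this node type answers to (hypothetical values, e.g. "LINK").
    abstract String filterName();

    Node add(Node child) {
        children.add(child);
        return this;
    }

    // Collecting parameter: the caller supplies the list; every node that
    // matches the filter appends itself, then recurses into its children.
    void collectInto(List<Node> collection, String filter) {
        if (filterName().equals(filter)) {
            collection.add(this);
        }
        for (Node child : children) {
            child.collectInto(collection, filter);
        }
    }
}

class LinkNode extends Node {
    final String href;
    LinkNode(String href) { this.href = href; }
    String filterName() { return "LINK"; }
}

class ImageNode extends Node {
    final String src;
    ImageNode(String src) { this.src = src; }
    String filterName() { return "IMG"; }
}

class GenericNode extends Node {
    String filterName() { return "GENERIC"; }
}

public class CollectDemo {
    public static void main(String[] args) {
        // A link that wraps an image, next to a plain link.
        Node root = new GenericNode()
            .add(new LinkNode("a.html").add(new ImageNode("logo.png")))
            .add(new LinkNode("b.html"));

        List<Node> links = new ArrayList<>();
        root.collectInto(links, "LINK");
        System.out.println(links.size()); // 2
    }
}
```

Because the caller owns the list, one traversal can fill it without each node building and merging its own result list, and the same tree can be queried again with a different filter string (e.g. `"IMG"`).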

> I generally work to "if it's not broken don't fix it", but I often add
> "before you start fixing it, make sure your documentation is up to date".
> Using the Visitor pattern may make it easier for clients to get the data
> they need, but given that the htmlparser is "working" (well it works for
> me), I would say that the more urgent issue here is making sure all the
> documentation is up to date.   I have a lot of positive things to say
> about htmlparser, so don't take it the wrong way, when I say that the
> biggest problem I've had in using it in the last few weeks is inadequate
> javadocs.

Sam - feel free to post as often as you like on the list. I'd be glad to
help you out. The lack of adequate documentation is of course my
responsibility and in light of my explanation above, if you can suggest
other areas of inadequate docs, I'll look into it.

> >Duplication removal is reason #1.
> >
> As I mentioned above, one should be careful about removing duplication
> for its own sake.
>
> > Removal of hard-coded logic is reason #2.
> >
> This is a good reason.  However I get the feeling that introduction of
> these Visitor classes will make the system conceptually more difficult
> to use rather than easier.  I would feel better if the current set up
> was more fully documented before more complexity was added.
>
> And even if the Visitor pattern is used, I would recommend leaving
> methods like toPlainTextString() etc in place, but just making them
> short cut implementations to certain kinds of visitor-using methods.
>  This will give people who have yet to grasp the Visitor pattern
> something to work with.

Don't worry, we'll leave toPlainTextString() alone; in fact, we've already
begun writing the shortcut implementations. :)
The purpose of my initial mail was not to alarm you, but to learn more about
realistic "customer" stories.
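To make "shortcut implementation" concrete, here is a minimal sketch of toPlainTextString() kept as a convenience method that simply runs a visitor over the node tree. The `NodeVisitor` interface, its `visitText`/`visitTag` methods, and the `PlainTextVisitor`, `TextNode` and `TagNode` classes are hypothetical names chosen for illustration; this is not the htmlparser code that is actually being written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor interface; names are illustrative, not htmlparser's API.
interface NodeVisitor {
    void visitText(TextNode text);
    void visitTag(TagNode tag);
}

abstract class Node {
    abstract void accept(NodeVisitor visitor);

    // Shortcut kept for existing callers: it just runs a visitor.
    String toPlainTextString() {
        PlainTextVisitor visitor = new PlainTextVisitor();
        accept(visitor);
        return visitor.result();
    }
}

class TextNode extends Node {
    final String text;
    TextNode(String text) { this.text = text; }
    void accept(NodeVisitor visitor) { visitor.visitText(this); }
}

class TagNode extends Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    TagNode(String name, Node... kids) {
        this.name = name;
        for (Node kid : kids) {
            children.add(kid);
        }
    }

    // Visit the tag itself, then walk its children in document order.
    void accept(NodeVisitor visitor) {
        visitor.visitTag(this);
        for (Node child : children) {
            child.accept(visitor);
        }
    }
}

// Collects text content and ignores the markup itself.
class PlainTextVisitor implements NodeVisitor {
    private final StringBuilder buffer = new StringBuilder();
    public void visitText(TextNode text) { buffer.append(text.text); }
    public void visitTag(TagNode tag) { /* tags contribute no text */ }
    String result() { return buffer.toString(); }
}

public class VisitorDemo {
    public static void main(String[] args) {
        Node doc = new TagNode("p",
            new TextNode("Hello, "),
            new TagNode("b", new TextNode("world")),
            new TextNode("!"));
        System.out.println(doc.toPlainTextString()); // Hello, world!
    }
}
```

With this split, a new kind of extraction becomes one new visitor class instead of a new method on every node type, while callers who have not yet picked up the Visitor pattern keep calling toPlainTextString() exactly as before.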

> If you are keen to see lots of people using htmlparser, I think that you
> don't want people to have to come to terms with too many new concepts at
> once.  You say yourself that the Visitor pattern takes some getting used
> to.  I think the whole scanner concept takes some getting used to ....
>

By the way, have you gone through the Sample Programs? I would have thought
this new addition would make life much simpler. If not, we probably need
more docs.

Regards,
Somik