Re: [Htmlparser-developer] RE: question about using HTMLParser in Apache JMeter

Derrick Oswald wrote:

> Yes.  In the transition from using a straight Lexer to get basic nodes 
> (lexer.nodes package), to using the Parser to get nodes that can be 
> visited (htmlparser package), the Lexer needs to generate nodes it was 
> not compiled with.  Hence the Parser replaces the Lexer as the 
> NodeFactory that the Lexer calls when it needs to create a Node.

IMO, the NodeFactory is better off as its own object.  The Parser can 
use a default instance of it.  Clients can configure the Parser to use a 
specific NodeFactory.  This is important for decorating nodes and tags.  
In addition, we don't want to give the Parser too many responsibilities, 
as it complicates its design.
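
To make the suggestion concrete, here is a minimal sketch of a standalone 
NodeFactory with a default instance and a decorating one.  All the names 
here (SketchParser, SketchNode, UpperCaseNodeFactory and so on) are made up 
for illustration and are not the actual htmlparser classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified node type (not the real htmlparser Node).
interface SketchNode {
    String getText();
}

// The factory is its own object, separate from the parser.
interface SketchNodeFactory {
    SketchNode createStringNode(String text);
}

// Default factory: creates plain nodes.
class DefaultNodeFactory implements SketchNodeFactory {
    public SketchNode createStringNode(final String text) {
        return () -> text;
    }
}

// Decorating factory: wraps whatever another factory creates.
class UpperCaseNodeFactory implements SketchNodeFactory {
    private final SketchNodeFactory delegate;

    UpperCaseNodeFactory(SketchNodeFactory delegate) {
        this.delegate = delegate;
    }

    public SketchNode createStringNode(String text) {
        final SketchNode inner = delegate.createStringNode(text);
        return () -> inner.getText().toUpperCase();
    }
}

// The parser only holds a factory; it does not act as one itself.
class SketchParser {
    private SketchNodeFactory factory = new DefaultNodeFactory();

    void setNodeFactory(SketchNodeFactory factory) {
        this.factory = factory;
    }

    // Stand-in for real parsing: one node per chunk of text.
    List<SketchNode> parse(String... chunks) {
        List<SketchNode> nodes = new ArrayList<>();
        for (String chunk : chunks) {
            nodes.add(factory.createStringNode(chunk));
        }
        return nodes;
    }
}
```

A client that wants decorated nodes would call something like 
setNodeFactory(new UpperCaseNodeFactory(new DefaultNodeFactory())) and 
leave the parser itself untouched.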

At present, we've made some choices about which tags are visitable - 
i.e. visitable nodes and tags are hard-coded into our NodeVisitor 
class.  So I'm not sure what you mean above by "using the Parser to 
get nodes that can be visited".

> I'm thinking this concept should be augmented in the Parser's 
> createTagNode to look up the name of the node (from the attribute list 
> provided), and create specific types of tags (FormTag, TableTag etc.) 
> by cloning empty tags from a Hashtable of possible tag types (possibly 
> called mBlastocyst in reference to undifferentiated stem cells).

Sounds like the Prototype pattern.  The trouble with this approach is 
getting the right data into the node/tag.  You can clone a tag that has 
no data, but then you have to get the right data into the clone.  Since 
different tags have different data needs, it gets complicated.  Have you 
considered these issues?
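
To show what I mean, here is a rough sketch of a Prototype-style registry.  
The class names (SketchTag, SketchFormTag, TagPrototypes) are hypothetical, 
and it assumes every tag can be populated from a generic attribute map, 
which is exactly the part that may not hold for tags with richer data needs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base tag: the only data it holds is an attribute map.
class SketchTag implements Cloneable {
    protected Map<String, String> attributes = new HashMap<>();

    void setAttributes(Map<String, String> attrs) {
        attributes = new HashMap<>(attrs);
    }

    String getAttribute(String name) {
        return attributes.get(name);
    }

    @Override
    public SketchTag clone() {
        try {
            // Object.clone() preserves the runtime class of the prototype.
            SketchTag copy = (SketchTag) super.clone();
            copy.attributes = new HashMap<>(attributes);
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}

// A specific tag type that derives its data from the attributes.
class SketchFormTag extends SketchTag {
    String getAction() {
        return getAttribute("ACTION");
    }
}

// Registry of empty prototypes, keyed by upper-cased tag name.
class TagPrototypes {
    private final Map<String, SketchTag> prototypes = new HashMap<>();

    void setTagFor(String name, SketchTag prototype) {
        prototypes.put(name.toUpperCase(), prototype);
    }

    // Step 1: clone the empty prototype (or fall back to a generic tag).
    // Step 2: push the data in. This only works because the data is generic.
    SketchTag createTag(String name, Map<String, String> attrs) {
        SketchTag prototype = prototypes.get(name.toUpperCase());
        SketchTag tag = (prototype != null) ? prototype.clone() : new SketchTag();
        tag.setAttributes(attrs);
        return tag;
    }
}
```

The clone step is the easy half.  The second step only stays simple while 
a tag's data can be derived from its attributes; anything a tag needs 
beyond that (children, link text, etc.) would have to be supplied some 
other way after cloning.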

> This would provide a concrete implementation of createTag in 
> CompositeTagScanner, removing a lot of near duplicate code from the 
> scanners, and allow end users to plug in their own tags via a call like
>   setTagFor ("BODY", new myBodyTag())
> on the Parser. Details on interaction with the scanners have to be 
> worked out, but it seems the end user wouldn't have to replace the 
> scanner to get their own tags out.

When you say "this would provide a concrete ...." I don't follow.  Why 
is a Prototype-based createTagNode method a prerequisite for removing 
near-duplicate code in the scanners?  Couldn't that be done regardless 
of whether a Prototype solution is used?  What am I missing?

best regards
jk