[Htmlparser-developer] factories and prototypes
From: Derrick O. <Der...@Ro...> - 2003-10-06 10:24:25
was subject: Re: [Htmlparser-developer] RE: question about using HTMLParser in Apache JMeter

Joshua,

The parser can be a NodeFactory with just three additional methods. It's still replaceable because the factory is set on the Lexer, i.e. clients can still create and set their own NodeFactory, even using the parser as a delegate for methods they don't want to handle. A major benefit of interface design is to avoid spurious trivial classes.

A node that's visitable has a signature:

    void accept (NodeVisitor visitor)

If the low level nodes incorporated that signature, the low level Lexer jar file would have to drag in a whole lot of other stuff, because the NodeVisitor class knows about specific high level composite node types (why only Image, Link and Title?). So currently the low level tags only implement (vacuously):

    void accept (Object visitor)

and then the high level Tag class thunks up to the more specific signature with a down-cast. If NodeVisitor were to only handle base types (String, Remark and Tag) this could be avoided. The fact that the NodeVisitor class knows about ImageTag, LinkTag and TitleTag makes it less useful in the presence of user supplied node types; but that's its inherent flaw.

Getting data into user supplied nodes is easy: each tag is presented with the attributes and children found by the scanner, what else is there? The current implementation does it the other way: each scanner is the one that figures out the special data and then creates a new specialized tag through some byzantine constructor taking arguments that only it can understand. The tag is reduced to regurgitating the simple strings it was given. Typical example: FrameScanner has extractFrameLocn() and extractFrameName(), whose results it passes into the FrameTag constructor. Why not have FrameTag figure this stuff out?
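To make the FrameTag question concrete, here is a minimal sketch of a tag that works out its own location and name from the attributes it is handed. The classes SketchTag and SketchFrameTag and their method names are hypothetical stand-ins, not the htmlparser classes, and the use of the SRC and NAME attributes is an assumption about what extractFrameLocn() and extractFrameName() look at.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical generic tag: it only stores what the scanner found.
class SketchTag {
    private final Map<String, String> attributes = new LinkedHashMap<>();

    // Replace the stored attributes; keys are upper-cased because
    // HTML attribute names are case-insensitive.
    void setAttributes(Map<String, String> attrs) {
        attributes.clear();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            attributes.put(e.getKey().toUpperCase(Locale.ROOT), e.getValue());
        }
    }

    // Returns the attribute value, or null if the attribute is absent.
    String getAttribute(String name) {
        return attributes.get(name.toUpperCase(Locale.ROOT));
    }
}

// Hypothetical frame tag: the tag, not the scanner, derives its data.
class SketchFrameTag extends SketchTag {
    // Assumption: the frame location is the trimmed SRC attribute.
    String getFrameLocation() {
        String src = getAttribute("SRC");
        return src == null ? "" : src.trim();
    }

    // Assumption: the frame name is the NAME attribute (may be null).
    String getFrameName() {
        return getAttribute("NAME");
    }
}
```

With this shape the scanner only needs to call setAttributes(); it no longer needs a constructor that takes pre-digested strings.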
The TagScanner class is abstract, partly because of the signature:

    protected abstract Tag createTag(TagData tagData, Tag tag, String url) throws ParserException;

Each scanner has code like:

    public Tag createTag(TagData tagData, CompositeTagData compositeTagData) throws ParserException {
        return new BulletList(tagData, compositeTagData);
    }

With a 'Prototype' solution, the TagScanner class could implement:

    public Tag createTag(TagData tagData, CompositeTagData compositeTagData) throws ParserException {
        // look up the registered prototype for this tag name
        Tag tag = (Tag)mBlastocyst.get (tagData.getTagName ());
        if (null == tag)
            tag = new Tag (tagData, compositeTagData); // should use the NodeFactory
        else
        {
            // copy the prototype, then fill the copy with this tag's data
            tag = (Tag)tag.clone ();
            tag.setData (tagData, compositeTagData);
        }
        return (tag);
    }

which would remove the need for each class to implement it. How would you remove the createTag() code from all the scanners without prototypes?

The above is couched in current TagData format, but in reality it would be more like:

    tag = (Tag)tag.clone ();
    tag.setAttributes (attributes);
    tag.setChildren (children);

Derrick

Joshua Kerievsky wrote:

> Derrick Oswald wrote:
>
>> Yes. In the transition from using a straight Lexer to get basic
>> nodes (lexer.nodes package), to using the Parser to get nodes that
>> can be visited (htmlparser package), the Lexer needs to generate
>> nodes it was not compiled with. Hence the Parser replaces the Lexer
>> as the NodeFactory that the Lexer calls when it needs to create a Node.
>
> IMO, the NodeFactory is better off as its own object. The Parser can
> use a default instance of it. Clients can configure the Parser to use
> a specific NodeFactory. This is important for decorating nodes and
> tags. In addition, we don't want to give the Parser too many
> responsibilities, as it complicates its design.
>
> At present, we've made some choices about which tags are visitable -
> i.e. visitable nodes and tags are hard-coded into our NodeVisitor
> class. I'm not sure what you mean above when you write "using the
> Parser to get nodes that can be visited"?
>
>> I'm thinking this concept should be augmented in the Parser's
>> createTagNode to look up the name of the node (from the attribute
>> list provided), and create specific types of tags (FormTag, TableTag
>> etc.) by cloning empty tags from a Hashtable of possible tag types
>> (possibly called mBlastocyst in reference to undifferentiated stem
>> cells).
>
> Sounds like the Prototype pattern. The trouble with this approach is
> getting the right data into the node/tag. You can clone a tag that
> has no data, then you've got to get the right data into the tag. Since
> different tags have different data needs, it gets complicated. Have
> you considered these issues?
>
>> This would provide a concrete implementation of createTag in
>> CompositeTagScanner, removing a lot of near duplicate code from the
>> scanners, and allow end users to plug in their own tags via a call like
>>     setTagFor ("BODY", new myBodyTag())
>> on the Parser. Details on interaction with the scanners have to be
>> worked out, but it seems the end user wouldn't have to replace the
>> scanner to get their own tags out.
>
> When you say "this would provide a concrete ...." I don't follow. Why
> is a Prototype-based createTagNode method a prerequisite for removing
> near duplicate code in the scanners? i.e. couldn't that be done
> regardless of whether a Prototype solution is used? What am I missing?
>
> best regards
> jk
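For readers following the thread, the prototype lookup discussed above can be sketched end to end as below. This is only an illustration under stated assumptions: ProtoTag, TagPrototypes and MyBodyTag are hypothetical names, generics are used for brevity even though the thread predates them, and upper-casing tag names for the lookup is an assumption rather than something the thread specifies. Only the ideas of a name-keyed table of empty tags, clone-then-fill, and a setTagFor() registration call come from the messages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical minimal tag; not the htmlparser Tag class.
class ProtoTag implements Cloneable {
    private Map<String, String> attributes = new HashMap<>();
    private List<ProtoTag> children = new ArrayList<>();

    // Both setters store fresh copies, so a clone never keeps
    // sharing state with its prototype once it has been filled.
    void setAttributes(Map<String, String> attrs) { attributes = new HashMap<>(attrs); }
    void setChildren(List<ProtoTag> kids) { children = new ArrayList<>(kids); }
    Map<String, String> getAttributes() { return attributes; }
    List<ProtoTag> getChildren() { return children; }

    @Override
    public ProtoTag clone() {
        try {
            // Object.clone() preserves the runtime class, so subclasses
            // registered as prototypes come back as that subclass.
            return (ProtoTag) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we are Cloneable
        }
    }
}

// Hypothetical user supplied tag type.
class MyBodyTag extends ProtoTag { }

// Hypothetical registry playing the role of the 'mBlastocyst' table.
class TagPrototypes {
    private final Map<String, ProtoTag> blastocyst = new HashMap<>();

    // Register an empty tag to be cloned whenever 'name' is encountered.
    void setTagFor(String name, ProtoTag prototype) {
        blastocyst.put(name.toUpperCase(Locale.ROOT), prototype);
    }

    // Clone the registered prototype (or fall back to a generic tag),
    // then hand it the attributes and children found by the scanner.
    ProtoTag createTag(String name, Map<String, String> attributes, List<ProtoTag> children) {
        ProtoTag prototype = blastocyst.get(name.toUpperCase(Locale.ROOT));
        ProtoTag tag = (prototype == null) ? new ProtoTag() : prototype.clone();
        tag.setAttributes(attributes);
        tag.setChildren(children);
        return tag;
    }
}
```

After registry.setTagFor("BODY", new MyBodyTag()), a call to createTag("body", ...) returns a fresh MyBodyTag, while unregistered names fall back to the generic ProtoTag. Because the data arrives through the same two setters for every tag type, no per-scanner createTag() is needed in this sketch.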