Re: [Htmlparser-developer] factories and prototypes
Brought to you by:
derrickoswald
From: Joshua K. <jo...@in...> - 2003-10-06 18:56:49
|
Derrick Oswald wrote: > The parser can be a NodeFactory with just three additional methods. > It's still replaceable because the factory is set on the Lexer, i.e. > clients can still create and set their own NodeFactory, even using the > parser as a delegate for methods they don't want to handle. A major > benefit of interface design is to avoid spurious trivial classes. Let me see if I can understand your design. You want a user of the parser to first get access to the Lexar to then set which NodeFactory to use? I must be misunderstanding something. Most users of the parser shouldn't even know the Lexar exists, right? It's a low-level detail to average parser users. A NodeFactory encapsulates data and methods used in node/tag creation - nothing spurious or trivial about it. In fact, small classes (such as NodeFactory) which have one responsibility are easier to understand, extend and maintain. Furthermore, one method on the parser is all it takes to let parser users set a NodeFactory instance. On the other hand, the current implementation has three separate methods to handle node/tag creation. I dislike that design because: * it bloats the Parser interface, which is already heavily bloated with too many methods * it gives the Parser a new responsibility which it has no business having: node/tag creation * it adds code to an already fat Parser class that's overburdened with responsibilities. I'm sensing that you prefer to build and work with Large Classes. Is that correct? If so, are you aware that Large Class is a smell? See Refactoring: Improving the Design of Existing Code, by Martin Fowler. The chapter on smells was co-written by Kent Beck and Martin Fowler. > A node that's visitable has a signature: > void accept (NodeVisitor visitor) Yeah, I'm the guy who popularized the use of Visitors in the parser - remember? You were against their usage. Have you come around to the dark side? > By incorporating that signature, because the NodeVisitor class knows > about specific high level composite node types (why only Image, Link > and Title?), the low level Lexer jar file would have to drag in a > whole lot of other stuff. So currently the low level tags only > implement (vacuously): > void accept (Object visitor) > and then the high level Tag class thunks up to the more specific > signature with an up-cast. If NodeVisitor were to only handle base > types (String, Remark and Tag) this could be avoided. The fact that > the NodeVisitor class knows about ImageTag, LinkTag and TitleTag makes > it less useful in the presence of user supplied node types; but that's > it's inherent flaw. When I wrote NodeVisitor, I deliberately avoided making it aware of nodes or tags beyond StringNode, Tag and EndTag. The reason? To be able to visit other node/tag types, scanners must be registered and a Visitor, being separate from the whole scanner mechanism, cannot guarantee that a given scanner is registered. Over time, people started adding visitXYZ methods to the NodeVisitor interface, such as visitLink, etc. Was that necessary? I don't think so. If one needs information about Links, Images, etc., one doesn't need to use a Visitor. If we use reflection, we can likely make a NodeVisitor that could visit any node/tag type. That would perhaps be slow, since reflection is slow, but it would be a useful experiment. In addition, those who use a reflection-based Visitor may not care about speed. > Getting data into user supplied nodes is easy: each tag is presented > with the attributes and children found by the scanner, what else is > there? The current implementation does it the other way, each scanner > is the one that figures out the special data and then creates a new > specialized tag by some byzantine constructor taking arguments that > only it can understand. The tag is reduced to regurgitating the simple > strings it was given. Typical example; FrameScanner has > extractFrameLocn() and extractFrameName() which it passes into the > FrameTag constructor. Why not have FrameTag figure this stuff out? > > The TagScanner class is abstract, partly because of the signature: > protected abstract Tag createTag(TagData tagData, Tag tag, String > url) throws ParserException; > Each scanner has code like: > public Tag createTag(TagData tagData, CompositeTagData > compositeTagData) throws ParserException > { > return new BulletList(tagData,compositeTagData); > } > With a 'Prototype' solution, the TagScanner class could implement: > public Tag createTag(TagData tagData, CompositeTagData > compositeTagData) throws ParserException > { > Tag tag = mBlastocyst.get (tagData.getTagName ()); > if (null == tag) > tag = new Tag (tagData, compositeTagData); // should use > the NodeFactory > else > { > tag = (Tag)tag.clone (); > tag.setData (tagData, compositeTagData); > } > return (tag); > } > which would remove the need for each class to implement it. How would > you remove the createTag() code from all the scanners without prototypes? How would a prototype approach account for the stack in the following code, from the OptionTagScanner: public Tag createTag( TagData tagData, CompositeTagData compositeTagData) { if (!stack.empty () && (this == stack.peek ())) stack.pop (); return new OptionTag(tagData,compositeTagData); } BTW, FormTagScanner has a similar stack. Believe it or not, Derrick, I like the Prototype pattern and have even considered using it within StringNodeFactory - I didn't proceed because I didn't find a genuine need. Now you've uncovered a possible real need for Prototype in the parser -- I'm all for exploring it. I just want to be clear about what we're doing. You say we could remove a lot of duplicated code in the scanners - I can see lots of code that creates specific tag instances and yes, Prototype can help make that code go away. However the scanners also appear to do useful work (like usage of a stack or implementing the evaluate method) and I'm not seeing how that would easily transfer to the node/tag classes without making those classes overly complex. best regards, jk |