Thread: [Htmlparser-cvs] htmlparser/src/org/htmlparser Node.java,1.49,1.50 PrototypicalNodeFactory.java,1.7,
From: Derrick O. <der...@us...> - 2004-06-14 00:07:02
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3132 Modified Files: Node.java PrototypicalNodeFactory.java package.html Log Message: Rework PrototypicalNodeFactory to use interfaces. Index: package.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/package.html,v retrieving revision 1.20 retrieving revision 1.21 diff -C2 -d -r1.20 -r1.21 *** package.html 2 Jan 2004 16:24:52 -0000 1.20 --- package.html 14 Jun 2004 00:06:51 -0000 1.21 *************** *** 29,44 **** --> </head> ! <body bgcolor="white"> ! The basic API classes which will be used by most users when working with the html parser (the Parser class is the most important one in this). ! ! <h2>Related Documentation</h2> ! ! For overviews, tutorials, examples, guides, and tool documentation, please see: ! <ul> ! <li><a href="http://htmlparser.sourceforge.net">HTML Parser Home Page</a> ! </ul> ! ! <!-- Put @see and @since tags down here. --> ! </body> </html> --- 29,59 ---- --> </head> ! <body> ! The basic API classes which will be used by most developers when working with ! the HTML Parser. ! <p>The {@link org.htmlparser.Parser} class is the main high level class that ! provides simplified access to the contents of an HTML page. The page can be ! specified as either a URLConnection or a String. In the case of a String, an ! attempt is made to open it as a URL, and if that fails it assumes it is a local ! disk file. ! A wide range of methods is available to customize the operation of the Parser, ! as well as access specific pieces of the page as ! {@link org.htmlparser.Node Nodes}.</p> ! <p>The {@link org.htmlparser.NodeFactory} interface specifies the requirements ! for a developer to have the Parser or Lexer generate nodes. Three types of ! nodes are required: {@link org.htmlparser.Text}, {@link org.htmlparser.Remark} ! and {@link org.htmlparser.Tag Tags}. 
Tags contain lists ! of child nodes and {@link org.htmlparser.Attribute attributes}.</p> ! <p>The only provided implementation of the NodeFactory interface ! is the {@link org.htmlparser.PrototypicalNodeFactory} which ! operates by holding example nodes and cloning them as needed to satisfy the ! requests for nodes by the Parser. The Lexer is it's own NodeFactory, returning ! new {@link org.htmlparser.nodes.TextNode}, ! {@link org.htmlparser.nodes.RemarkNode} and undifferentiated ! {@link org.htmlparser.nodes.TagNode Tagnodes} (see the ! {@link org.htmlparser.nodes nodes} package).</p> ! <p>The {@link org.htmlparser.NodeFilter} interface is used by the filtering ! code to determine if a node meets a certain criteria. Some generic examples of ! filters can be found in the {@link org.htmlparser.filters filters} package. </body> </html> Index: Node.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Node.java,v retrieving revision 1.49 retrieving revision 1.50 diff -C2 -d -r1.49 -r1.50 *** Node.java 24 May 2004 00:38:15 -0000 1.49 --- Node.java 14 Jun 2004 00:06:51 -0000 1.50 *************** *** 31,67 **** import org.htmlparser.visitors.NodeVisitor; public interface Node { /** ! * Returns a string representation of the node. This is an important method, it allows a simple string transformation ! * of a web page, regardless of a node.<br> ! * Typical application code (for extracting only the text from a web page) would then be simplified to :<br> * <pre> ! * Node node; ! * for (Enumeration e = parser.elements();e.hasMoreElements();) { ! * node = (Node)e.nextElement(); ! * System.out.println(node.toPlainTextString()); // Or do whatever processing you wish with the plain text string ! * } * </pre> */ ! public abstract String toPlainTextString(); /** ! * This method will make it easier when using html parser to reproduce html pages (with or without modifications) ! 
* Applications reproducing html can use this method on nodes which are to be used or transferred as they were ! * recieved, with the original html */ ! public abstract String toHtml(); /** * Return the string representation of the node. ! * Subclasses must define this method, and this is typically to be used in the manner<br> ! * <pre>System.out.println(node)</pre> ! * @return java.lang.String */ ! public abstract String toString(); /** ! * Collect this node and its child nodes (if-applicable) into the collectionList parameter, provided the node * satisfies the filtering criteria.<P> * --- 31,98 ---- import org.htmlparser.visitors.NodeVisitor; + /** + * Specifies the minimum requirements for nodes returned by the Lexer or Parser. + * There are three types of nodes in HTML: text, remarks and tags. You may wish + * to define your own nodes to be returned by the + * {@link org.htmlparser.lexer.Lexer} or {@link Parser}, but each of the types + * must support this interface. + * More specific interface requirements for each of the node types are specified + * by the {@link Text}, {@link Remark} and {@link Tag} interfaces. + */ public interface Node + extends + Cloneable { /** ! * A string representation of the node. ! * This is an important method, it allows a simple string transformation ! * of a web page, regardless of a node. For a Text node this is obviously ! * the textual contents itself. For a Remark node this is the remark ! * contents (sic). For tags this is the text contents of it's children ! * (if any). Because multiple nodes are combined when presenting ! * a page in a browser, this will not reflect what a user would see. ! * See HTML specification section 9.1 White space ! * <a href="http://www.w3.org/TR/html4/struct/text.html#h-9.1"> ! * http://www.w3.org/TR/html4/struct/text.html#h-9.1</a>.<br> ! * Typical application code (for extracting only the text from a web page) ! * would be:<br> * <pre> ! 
* for (Enumeration e = parser.elements (); e.hasMoreElements ();) ! * // or do whatever processing you wish with the plain text string ! * System.out.println ((Node)e.nextElement ()).toPlainTextString ()); * </pre> + * @return The text of this node including it's children. */ ! public abstract String toPlainTextString (); /** ! * Return the HTML for this node. ! * This should be the exact sequence of characters that were encountered by ! * the parser that caused this node to be created. Where this breaks down is ! * where broken nodes (tags and remarks) have been encountered and fixed. ! * Applications reproducing html can use this method on nodes which are to ! * be used or transferred as they were received or created. ! * @return The (exact) sequence of characters that would cause this node ! * to be returned by the parser or lexer. */ ! public abstract String toHtml (); /** * Return the string representation of the node. ! * The return value may not be the entire contents of the node, and non- ! * printable characters may be translated in order to make them visible. ! * This is typically to be used in ! * the manner<br> ! * <pre> ! * System.out.println (node); ! * </pre> ! * or within a debugging environment. ! * @return A string representation of this node suitable for printing, ! * that isn't too large. */ ! public abstract String toString (); /** ! * Collect this node and its child nodes (if applicable) into a list, provided the node * satisfies the filtering criteria.<P> * *************** *** 71,99 **** * get it at the top-level, as many tags (like form tags), can contain * links embedded in them. We could get the links out by checking if the ! * current node is a {@link org.htmlparser.tags.CompositeTag}, and going through its children. ! * So this method provides a convenient way to do this.<P> * * Using collectInto(), programs get a lot shorter. Now, the code to * extract all links from a page would look like: * <pre> ! 
* NodeList collectionList = new NodeList(); * NodeFilter filter = new TagNameFilter ("A"); ! * for (NodeIterator e = parser.elements(); e.hasMoreNodes();) ! * e.nextNode().collectInto(collectionList, filter); * </pre> ! * Thus, collectionList will hold all the link nodes, irrespective of how * deep the links are embedded.<P> * * Another way to accomplish the same objective is: * <pre> ! * NodeList collectionList = new NodeList(); * NodeFilter filter = new TagClassFilter (LinkTag.class); ! * for (NodeIterator e = parser.elements(); e.hasMoreNodes();) ! * e.nextNode().collectInto(collectionList, filter); * </pre> * This is slightly less specific because the LinkTag class may be * registered for more than one node name, e.g. <LINK> tags too. */ ! public abstract void collectInto(NodeList collectionList, NodeFilter filter); /** --- 102,133 ---- * get it at the top-level, as many tags (like form tags), can contain * links embedded in them. We could get the links out by checking if the ! * current node is a {@link org.htmlparser.tags.CompositeTag}, and going ! * through its children. So this method provides a convenient way to do this.<P> * * Using collectInto(), programs get a lot shorter. Now, the code to * extract all links from a page would look like: * <pre> ! * NodeList list = new NodeList (); * NodeFilter filter = new TagNameFilter ("A"); ! * for (NodeIterator e = parser.elements (); e.hasMoreNodes ();) ! * e.nextNode ().collectInto (list, filter); * </pre> ! * Thus, <code>list</code> will hold all the link nodes, irrespective of how * deep the links are embedded.<P> * * Another way to accomplish the same objective is: * <pre> ! * NodeList list = new NodeList (); * NodeFilter filter = new TagClassFilter (LinkTag.class); ! * for (NodeIterator e = parser.elements (); e.hasMoreNodes ();) ! * e.nextNode ().collectInto (list, filter); * </pre> * This is slightly less specific because the LinkTag class may be * registered for more than one node name, e.g. 
<LINK> tags too. + * @param list The list to collect nodes into. + * @param filter The criteria to use when deciding if a node should + * be added to the list. */ ! public abstract void collectInto (NodeList list, NodeFilter filter); /** *************** *** 101,105 **** * <br>deprecated Use {@link #getStartPosition} */ ! public abstract int elementBegin(); /** --- 135,139 ---- * <br>deprecated Use {@link #getStartPosition} */ ! public abstract int elementBegin (); /** *************** *** 107,114 **** * <br>deprecated Use {@link #getEndPosition} */ ! public abstract int elementEnd(); /** * Gets the starting position of the node. * @return The start position. */ --- 141,149 ---- * <br>deprecated Use {@link #getEndPosition} */ ! public abstract int elementEnd (); /** * Gets the starting position of the node. + * This is the character (not byte) offset of this node in the page. * @return The start position. */ *************** *** 123,126 **** --- 158,163 ---- /** * Gets the ending position of the node. + * This is the character (not byte) offset of the character following this + * node in the page. * @return The end position. */ *************** *** 134,138 **** /** ! * Apply the visitor object (of type NodeVisitor) to this node. */ public abstract void accept (NodeVisitor visitor); --- 171,176 ---- /** ! * Apply the visitor to this node. ! * @param visitor The visitor to this node. */ public abstract void accept (NodeVisitor visitor); *************** *** 140,147 **** /** * Get the parent of this node. ! * This will always return null when parsing without scanners, ! * i.e. if semantic parsing was not performed. ! * The object returned from this method can be safely cast to a <code>CompositeTag</code>. ! * @return The parent of this node, if it's been set, <code>null</code> otherwise. */ public abstract Node getParent (); --- 178,188 ---- /** * Get the parent of this node. ! * This will always return null when parsing with the ! * {@link org.htmlparser.lexer.Lexer}. ! 
* Currently, the object returned from this method can be safely cast to a ! * {@link org.htmlparser.tags.CompositeTag}, but this behaviour should not ! * be expected in the future. ! * @return The parent of this node, if it's been set, <code>null</code> ! * otherwise. */ public abstract Node getParent (); *************** *** 149,153 **** /** * Sets the parent of this node. ! * @param node The node that contains this node. Must be a <code>CompositeTag</code>. */ public abstract void setParent (Node node); --- 190,194 ---- /** * Sets the parent of this node. ! * @param node The node that contains this node. */ public abstract void setParent (Node node); *************** *** 155,159 **** /** * Get the children of this node. ! * @return The list of children contained by this node, if it's been set, <code>null</code> otherwise. */ public abstract NodeList getChildren (); --- 196,201 ---- /** * Get the children of this node. ! * @return The list of children contained by this node, if it's been set, ! * <code>null</code> otherwise. */ public abstract NodeList getChildren (); *************** *** 167,172 **** /** * Returns the text of the node. */ ! public String getText(); /** --- 209,216 ---- /** * Returns the text of the node. + * @return The contents of the string or remark node, and in the case of + * a tag, the contents of the tag less the enclosing angle brackets. */ ! public String getText (); /** *************** *** 174,178 **** * @param text The new text for the node. */ ! public void setText(String text); /** --- 218,222 ---- * @param text The new text for the node. */ ! public void setText (String text); /** *************** *** 181,188 **** * bold text on and off. * Only a few tags have semantic meaning to the parser. These have to do ! * with the character set to use (<META>), the base URL to use * (<BASE>). Other than that, the semantic meaning is up to the ! * application and it's custom nodes. */ ! 
public void doSemanticAction () throws ParserException; } --- 225,304 ---- * bold text on and off. * Only a few tags have semantic meaning to the parser. These have to do ! * with the character set to use (<META>) and the base URL to use * (<BASE>). Other than that, the semantic meaning is up to the ! * application and it's custom nodes.<br> ! * The semantic action is performed when the node has been parsed. For ! * composite nodes (those that contain other nodes), the children will have ! * already been parsed and will be available via {@link #getChildren}. */ ! public void doSemanticAction () ! throws ! ParserException; ! ! // ! // Cloneable interface ! // ! ! /** ! * Allow cloning of nodes. ! * Creates and returns a copy of this object. The precise meaning ! * of "copy" may depend on the class of the object. The general ! * intent is that, for any object <tt>x</tt>, the expression: ! * <blockquote> ! * <pre> ! * x.clone() != x</pre></blockquote> ! * will be true, and that the expression: ! * <blockquote> ! * <pre> ! * x.clone().getClass() == x.getClass()</pre></blockquote> ! * will be <tt>true</tt>, but these are not absolute requirements. ! * While it is typically the case that: ! * <blockquote> ! * <pre> ! * x.clone().equals(x)</pre></blockquote> ! * will be <tt>true</tt>, this is not an absolute requirement. ! * <p> ! * By convention, the returned object should be obtained by calling ! * <tt>super.clone</tt>. If a class and all of its superclasses (except ! * <tt>Object</tt>) obey this convention, it will be the case that ! * <tt>x.clone().getClass() == x.getClass()</tt>. ! * <p> ! * By convention, the object returned by this method should be independent ! * of this object (which is being cloned). To achieve this independence, ! * it may be necessary to modify one or more fields of the object returned ! * by <tt>super.clone</tt> before returning it. Typically, this means ! * copying any mutable objects that comprise the internal "deep structure" ! 
* of the object being cloned and replacing the references to these ! * objects with references to the copies. If a class contains only ! * primitive fields or references to immutable objects, then it is usually ! * the case that no fields in the object returned by <tt>super.clone</tt> ! * need to be modified. ! * <p> ! * The method <tt>clone</tt> for class <tt>Object</tt> performs a ! * specific cloning operation. First, if the class of this object does ! * not implement the interface <tt>Cloneable</tt>, then a ! * <tt>CloneNotSupportedException</tt> is thrown. Note that all arrays ! * are considered to implement the interface <tt>Cloneable</tt>. ! * Otherwise, this method creates a new instance of the class of this ! * object and initializes all its fields with exactly the contents of ! * the corresponding fields of this object, as if by assignment; the ! * contents of the fields are not themselves cloned. Thus, this method ! * performs a "shallow copy" of this object, not a "deep copy" operation. ! * <p> ! * The class <tt>Object</tt> does not itself implement the interface ! * <tt>Cloneable</tt>, so calling the <tt>clone</tt> method on an object ! * whose class is <tt>Object</tt> will result in throwing an ! * exception at run time. ! * ! * @return a clone of this instance. ! * @exception CloneNotSupportedException if the object's class does not ! * support the <code>Cloneable</code> interface. Subclasses ! * that override the <code>clone</code> method can also ! * throw this exception to indicate that an instance cannot ! * be cloned. ! * @see java.lang.Cloneable ! */ ! public Object clone () ! throws ! 
CloneNotSupportedException;
  }

Index: PrototypicalNodeFactory.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/PrototypicalNodeFactory.java,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** PrototypicalNodeFactory.java 24 May 2004 16:18:12 -0000 1.7
--- PrototypicalNodeFactory.java 14 Jun 2004 00:06:51 -0000 1.8
***************
*** 31,34 ****
--- 31,35 ----
  import java.util.Locale;
  import java.util.Map;
+ import java.util.Set;
  import java.util.Vector;
***************
*** 39,42 ****
--- 40,44 ----
  import org.htmlparser.Text;
  import org.htmlparser.lexer.Page;
+ import org.htmlparser.nodes.AbstractNode;
  import org.htmlparser.nodes.TextNode;
  import org.htmlparser.nodes.RemarkNode;
***************
*** 74,80 ****
  /**
   * A node factory based on the prototype pattern.
!  * This factory uses the prototype pattern to generate new Tag nodes.
   * Prototype tags, in the form of undifferentiated tags are held in a hash
!  * table. On a
   */
  public class PrototypicalNodeFactory
--- 76,91 ----
  /**
   * A node factory based on the prototype pattern.
!  * This factory uses the prototype pattern to generate new nodes.
!  * It generates generic text and remark nodes from prototypes accessed
!  * via the textPrototype and remarkPrototype properties respectively.
!  * These are cloned as needed to form new {@link Text} and {@link Remark} nodes.
   * Prototype tags, in the form of undifferentiated tags are held in a hash
!  * table. On a request for a tag, the attributes are examined for the name
!  * of the tag and if a prototype of that name is registered, it is cloned
!  * and the clone is given the characteristics
!  * {@link Attribute Attributes}, start and end position) of the requested tag.
!  * If no tag is registered under the needed name, a generic tag is created.
!  * Note that in all casses, the {@link Page} property is only set if the node
!  * is a subclass of {@link AbstractNode}.
   */
  public class PrototypicalNodeFactory
***************
*** 84,88 ****
  {
      /**
!      * The list of tags to return at the top level.
       * The list is keyed by tag name.
       */
--- 95,109 ----
  {
      /**
!      * The prototypical text node.
!      */
!     protected Text mText;
!
!     /**
!      * The prototypical remark node.
!      */
!     protected Remark mRemark;
!
!     /**
!      * The list of tags to return.
       * The list is keyed by tag name.
       */
***************
*** 90,94 ****
      /**
!      * Create a new factory with all but DOM tags registered.
       */
      public PrototypicalNodeFactory ()
--- 111,115 ----
      /**
!      * Create a new factory with all tags registered.
       */
      public PrototypicalNodeFactory ()
***************
*** 99,106 ****
--- 120,131 ----
      /**
       * Create a new factory with no registered tags.
+      * @param empty If <code>true</code>, creates an empty factory,
+      * otherwise is equivalent to {@link #PrototypicalNodeFactory()}.
       */
      public PrototypicalNodeFactory (boolean empty)
      {
          clear ();
+         mText = new TextNode (null, 0, 0);
+         mRemark = new RemarkNode (null, 0, 0);
          if (!empty)
              registerTags ();
***************
*** 108,112 ****
      /**
!      * Create a new factory with the given tag as the only one registered.
       */
      public PrototypicalNodeFactory (org.htmlparser.tags.Tag tag)
--- 133,138 ----
      /**
!      * Create a new factory with the given tag as the only registered tag.
!      * @param tag The single tag to register in the otherwise empty factory.
       */
      public PrototypicalNodeFactory (org.htmlparser.tags.Tag tag)
***************
*** 118,121 ****
--- 144,148 ----
      /**
       * Create a new factory with the given tags registered.
+      * @param tags The tags to register in the otherwise empty factory.
       */
      public PrototypicalNodeFactory (org.htmlparser.tags.Tag[] tags)
***************
*** 129,137 ****
       * Adds a tag to the registry.
       * @param id The name under which to register the tag.
!      * @param tag The tag to be returned from a createTag(id) call.
!      * @return The tag previously registered with that id,
       * or <code>null</code> if none.
       */
!     public Tag put (String id, org.htmlparser.tags.Tag tag)
      {
          return ((Tag)mBlastocyst.put (id, tag));
--- 156,164 ----
       * Adds a tag to the registry.
       * @param id The name under which to register the tag.
!      * @param tag The tag to be returned from a {@link #createTagNode} call.
!      * @return The tag previously registered with that id if any,
       * or <code>null</code> if none.
       */
!     public Tag put (String id, Tag tag)
      {
          return ((Tag)mBlastocyst.put (id, tag));
***************
*** 141,149 ****
       * Gets a tag from the registry.
       * @param id The name of the tag to return.
!      * @return The tag registered under the id name or <code>null</code> if none.
       */
!     public org.htmlparser.tags.Tag get (String id)
      {
!         return ((org.htmlparser.tags.Tag)mBlastocyst.get (id));
      }
--- 168,176 ----
       * Gets a tag from the registry.
       * @param id The name of the tag to return.
!      * @return The tag registered under the <code>id</code> name or <code>null</code> if none.
       */
!     public Tag get (String id)
      {
!         return ((Tag)mBlastocyst.get (id));
      }
***************
*** 151,159 ****
       * Remove a tag from the registry.
       * @param id The name of the tag to remove.
!      * @return The tag that was registered with that id.
       */
!     public org.htmlparser.tags.Tag remove (String id)
      {
!         return ((org.htmlparser.tags.Tag)mBlastocyst.remove (id));
      }
--- 178,186 ----
       * Remove a tag from the registry.
       * @param id The name of the tag to remove.
!      * @return The tag that was registered with that <code>id</code>.
       */
!     public Tag remove (String id)
      {
!         return ((Tag)mBlastocyst.remove (id));
      }
***************
*** 166,170 ****
--- 193,211 ----
      }
+     /**
+      * Get the list of tag names.
+      * @return The names of the tags currently registered.
+      */
+     public Set getTagNames ()
+     {
+         return (mBlastocyst.keySet ());
+     }
+     /**
+      * Register a tag.
+      * Registers the given tag under every id the tag has.
+      * @param tag The tag to register (subclass of
+      * {@link org.htmlparser.tags.Tag}).
+      */
      public void registerTag (org.htmlparser.tags.Tag tag)
      {
***************
*** 176,179 ****
--- 217,226 ----
      }
+     /**
+      * Unregister a tag.
+      * Unregisters the given tag from every id the tag has.
+      * @param tag The tag to unregister (subclass of
+      * {@link org.htmlparser.tags.Tag}).
+      */
      public void unregisterTag (org.htmlparser.tags.Tag tag)
      {
***************
*** 185,188 ****
--- 232,261 ----
      }
+     /**
+      * Register a tag.
+      * Registers the given tag under the tag {@link Tag#getTagName() name}.
+      * @param tag The tag to register (implements {@link org.htmlparser.Tag}).
+      */
+     public void registerTag (Tag tag)
+     {
+         put (tag.getTagName (), tag);
+     }
+
+     /**
+      * Unregister a tag.
+      * Unregisters the given tag from the tag {@link Tag#getTagName() name}.
+      * @param tag The tag to unregister (implements {@link org.htmlparser.Tag}).
+      */
+     public void unregisterTag (Tag tag)
+     {
+         remove (tag.getTagName ());
+     }
+
+     /**
+      * Register all known tags in the tag package.
+      * Registers tags from the {@link org.htmlparser.tags tag package} by
+      * calling {@link #registerTag(org.htmlparser.tags.Tag) registerTag()}.
+      * @return 'this' nodefactory as a convenience.
+      */
      public PrototypicalNodeFactory registerTags ()
      {
***************
*** 220,223 ****
--- 293,338 ----
      }
+     /**
+      * Get the object being used to generate text nodes.
+      * @return The prototype for {@link Text} nodes.
+      */
+     public Text getTextPrototype ()
+     {
+         return (mText);
+     }
+
+     /**
+      * Set the object to be used to generate text nodes.
+      * @param text The prototype for {@link Text} nodes.
+      */
+     public void setTextPrototype (Text text)
+     {
+         if (null == text)
+             throw new IllegalArgumentException ("text prototype node cannot be null");
+         else
+             mText = text;
+     }
+
+     /**
+      * Get the object being used to generate remark nodes.
+      * @return The prototype for {@link Remark} nodes.
+      */
+     public Remark getRemarkPrototype ()
+     {
+         return (mRemark);
+     }
+
+     /**
+      * Set the object to be used to generate remark nodes.
+      * @param remark The prototype for {@link Remark} nodes.
+      */
+     public void setRemarkPrototype (Remark remark)
+     {
+         if (null == remark)
+             throw new IllegalArgumentException ("remark prototype node cannot be null");
+         else
+             mRemark = remark;
+     }
+
      //
      // NodeFactory interface
***************
*** 228,236 ****
       * @param page The page the node is on.
       * @param start The beginning position of the string.
!      * @param end The ending positiong of the string.
       */
      public Text createStringNode (Page page, int start, int end)
      {
!         return (new TextNode (page, start, end));
      }
--- 343,368 ----
       * @param page The page the node is on.
       * @param start The beginning position of the string.
!      * @param end The ending position of the string.
       */
      public Text createStringNode (Page page, int start, int end)
      {
!         Text ret;
!
!         try
!         {
!             ret = (Text)(getTextPrototype ().clone ());
!             if (ret instanceof AbstractNode)
!                 ((AbstractNode)ret).setPage (page);
!             else
!                 ret.setText (page.getText (start, end));
!             ret.setStartPosition (start);
!             ret.setEndPosition (end);
!         }
!         catch (CloneNotSupportedException cnse)
!         {
!             ret = new TextNode (page, start, end);
!         }
!
!         return (ret);
      }
***************
*** 243,247 ****
      public Remark createRemarkNode (Page page, int start, int end)
      {
!         return (new RemarkNode (page, start, end));
      }
--- 375,405 ----
      public Remark createRemarkNode (Page page, int start, int end)
      {
!         int first;
!         int last;
!         Remark ret;
!
!         try
!         {
!             ret = (Remark)(getRemarkPrototype ().clone ());
!             // if (ret instanceof AbstractNode)
!             //     ((AbstractNode)ret).setPage (page);
!             // else
!             {
!                 first = start + 4; // <!--
!                 last = end - 3; // -->
!                 if (first >= last)
!                     ret.setText ("");
!                 else
!                     ret.setText (page.getText (first, last));
!             }
!             ret.setStartPosition (start);
!             ret.setEndPosition (end);
!         }
!         catch (CloneNotSupportedException cnse)
!         {
!             ret = new RemarkNode (page, start, end);
!         }
!
!         return (ret);
      }
***************
*** 263,268 ****
          Attribute attribute;
          String id;
!         org.htmlparser.tags.Tag prototype;
!         org.htmlparser.tags.Tag ret;
          ret = null;
--- 421,426 ----
          Attribute attribute;
          String id;
!         Tag prototype;
!         Tag ret;
          ret = null;
***************
*** 281,289 ****
                  if (id.endsWith ("/"))
                      id = id.substring (0, id.length () - 1);
!                 prototype = (org.htmlparser.tags.Tag)mBlastocyst.get (id);
                  if (null != prototype)
                  {
!                     ret = (org.htmlparser.tags.Tag)prototype.clone ();
!                     ret.setPage (page);
                      ret.setStartPosition (start);
                      ret.setEndPosition (end);
--- 439,448 ----
                  if (id.endsWith ("/"))
                      id = id.substring (0, id.length () - 1);
!                 prototype = (Tag)mBlastocyst.get (id);
                  if (null != prototype)
                  {
!                     ret = (Tag)prototype.clone ();
!                     if (ret instanceof AbstractNode)
!                         ((AbstractNode)ret).setPage (page);
                      ret.setStartPosition (start);
                      ret.setEndPosition (end);
***************
*** 299,302 ****
--- 458,462 ----
          }
          if (null == ret)
+             // generate a generic node
              ret = new org.htmlparser.tags.Tag (page, start, end, attributes);
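
For readers skimming the diff, the clone-a-prototype flow it introduces can be sketched roughly as follows. This is a standalone illustration, not the committed code: SketchNode, LinkSketch and SketchFactory are hypothetical stand-ins, a plain String stands in for the Page, and none of the org.htmlparser types are used. It mirrors three ideas visible in the hunks above: clone the registered prototype, fall back to a freshly built node when no prototype is registered or cloning is refused, and strip the four-character opener and three-character closer from a remark.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parser node; not the org.htmlparser.Node interface.
class SketchNode implements Cloneable {
    String kind;      // "text", "remark", or a tag name
    String text = "";
    int start;
    int end;

    SketchNode(String kind) { this.kind = kind; }

    @Override
    public SketchNode clone() throws CloneNotSupportedException {
        return (SketchNode) super.clone(); // shallow copy keeps the runtime class
    }
}

// Hypothetical specialised tag, to show that cloning preserves the subclass.
class LinkSketch extends SketchNode {
    LinkSketch() { super("A"); }
}

// Hypothetical factory; the page is a plain String here (assumption).
class SketchFactory {
    private SketchNode textPrototype = new SketchNode("text");
    private SketchNode remarkPrototype = new SketchNode("remark");
    private final Map<String, SketchNode> tagPrototypes = new HashMap<>();

    void setTextPrototype(SketchNode prototype) {
        if (prototype == null)
            throw new IllegalArgumentException("text prototype node cannot be null");
        textPrototype = prototype;
    }

    void registerTag(SketchNode prototype) { tagPrototypes.put(prototype.kind, prototype); }

    SketchNode createText(String page, int start, int end) {
        SketchNode ret = copyOrNew(textPrototype, "text");
        ret.text = page.substring(start, end);
        return position(ret, start, end);
    }

    SketchNode createRemark(String page, int start, int end) {
        SketchNode ret = copyOrNew(remarkPrototype, "remark");
        int first = start + 4; // skip the "<!--" opener
        int last = end - 3;    // stop before the "-->" closer
        ret.text = (first >= last) ? "" : page.substring(first, last);
        return position(ret, start, end);
    }

    SketchNode createTag(String name, int start, int end) {
        // a null prototype means no tag is registered under this name
        return position(copyOrNew(tagPrototypes.get(name), name), start, end);
    }

    // Clone the prototype if there is one; otherwise build a generic node.
    private static SketchNode copyOrNew(SketchNode prototype, String kind) {
        if (prototype != null) {
            try {
                return prototype.clone();
            } catch (CloneNotSupportedException e) {
                // fall through to the generic node
            }
        }
        return new SketchNode(kind);
    }

    private static SketchNode position(SketchNode node, int start, int end) {
        node.start = start;
        node.end = end;
        return node;
    }
}

class PrototypeFactorySketch {
    public static void main(String[] args) {
        SketchFactory factory = new SketchFactory();
        factory.registerTag(new LinkSketch());
        String page = "<!-- hi -->text";
        System.out.println("[" + factory.createRemark(page, 0, 11).text + "]");
        System.out.println(factory.createText(page, 11, 15).text);
        System.out.println(factory.createTag("A", 0, 0).getClass().getSimpleName());
    }
}
```

Because each request clones whatever prototype is registered, an application can swap in its own node class (here LinkSketch) without changing the code that asks the factory for nodes, which appears to be the motivation for reworking the factory to use interfaces.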