Thread: [Htmlparser-cvs] htmlparser/src/org/htmlparser Node.java,1.49,1.50 PrototypicalNodeFactory.java,1.7,

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3132

Modified Files:
	Node.java PrototypicalNodeFactory.java package.html 
Log Message:
Rework PrototypicalNodeFactory to use interfaces.


Index: package.html
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/package.html,v
retrieving revision 1.20
retrieving revision 1.21
diff -C2 -d -r1.20 -r1.21
*** package.html	2 Jan 2004 16:24:52 -0000	1.20
--- package.html	14 Jun 2004 00:06:51 -0000	1.21
***************
*** 29,44 ****
  -->
  </head>
! <body bgcolor="white">
! The basic API classes which will be used by most users when working with the html parser (the Parser class is the most important one in this).
! 
! <h2>Related Documentation</h2>
! 
! For overviews, tutorials, examples, guides, and tool documentation, please see:
! <ul>
!     <li><a href="http://htmlparser.sourceforge.net">HTML Parser Home Page</a>
! </ul>
! 
! <!-- Put @see and @since tags down here. -->
! 
  </body>
  </html>
--- 29,59 ----
  -->
  </head>
! <body>
! The basic API classes which will be used by most developers when working with
! the HTML Parser.
! <p>The {@link org.htmlparser.Parser} class is the main high level class that
! provides simplified access to the contents of an HTML page. The page can be
! specified as either a URLConnection or a String. In the case of a String, an
! attempt is made to open it as a URL, and if that fails it assumes it is a local
! disk file.
! A wide range of methods is available to customize the operation of the Parser,
! as well as access specific pieces of the page as
! {@link org.htmlparser.Node Nodes}.</p>
! <p>The {@link org.htmlparser.NodeFactory} interface specifies the requirements
! for a developer to have the Parser or Lexer generate nodes. Three types of
! nodes are required: {@link org.htmlparser.Text}, {@link org.htmlparser.Remark}
! and {@link org.htmlparser.Tag Tags}. Tags contain lists
! of child nodes and {@link org.htmlparser.Attribute attributes}.</p>
! <p>The only provided implementation of the NodeFactory interface
! is the {@link org.htmlparser.PrototypicalNodeFactory} which
! operates by holding example nodes and cloning them as needed to satisfy the
! requests for nodes by the Parser. The Lexer is its own NodeFactory, returning
! new {@link org.htmlparser.nodes.TextNode},
! {@link org.htmlparser.nodes.RemarkNode} and undifferentiated
! {@link org.htmlparser.nodes.TagNode Tagnodes} (see the
! {@link org.htmlparser.nodes nodes} package).</p>
! <p>The {@link org.htmlparser.NodeFilter} interface is used by the filtering
! code to determine if a node meets certain criteria. Some generic examples of
! filters can be found in the {@link org.htmlparser.filters filters} package.</p>
  </body>
  </html>
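
The new package.html text describes NodeFilter as the hook the filtering code uses to decide whether a node meets certain criteria. The idea can be sketched roughly as below. This is a minimal illustration only: SketchNode, SketchFilter and CollectSketch are hypothetical stand-ins written for this note, not the org.htmlparser types, and the real collectInto implementations may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a filter; NOT org.htmlparser.NodeFilter.
interface SketchFilter
{
    boolean accept (SketchNode node);
}

// Hypothetical stand-in for a node that may contain child nodes.
class SketchNode
{
    final String name;
    final List<SketchNode> children = new ArrayList<> ();

    SketchNode (String name, SketchNode... kids)
    {
        this.name = name;
        for (SketchNode kid : kids)
            children.add (kid);
    }

    // Add this node if it passes the filter, then recurse into the children,
    // so matches are found no matter how deeply they are nested.
    void collectInto (List<SketchNode> list, SketchFilter filter)
    {
        if (filter.accept (this))
            list.add (this);
        for (SketchNode child : children)
            child.collectInto (list, filter);
    }
}

class CollectSketch
{
    // Collect every node named "A" (a link) below and including root.
    static List<SketchNode> links (SketchNode root)
    {
        List<SketchNode> list = new ArrayList<> ();
        root.collectInto (list, node -> "A".equals (node.name));
        return list;
    }

    public static void main (String[] args)
    {
        SketchNode page = new SketchNode ("HTML",
            new SketchNode ("A"),
            new SketchNode ("FORM", new SketchNode ("A"), new SketchNode ("INPUT")));
        System.out.println (links (page).size ()); // prints 2
    }
}
```

The link nested inside the FORM node is found alongside the top-level one, which is the convenience the collectInto documentation in Node.java describes.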

Index: Node.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Node.java,v
retrieving revision 1.49
retrieving revision 1.50
diff -C2 -d -r1.49 -r1.50
*** Node.java	24 May 2004 00:38:15 -0000	1.49
--- Node.java	14 Jun 2004 00:06:51 -0000	1.50
***************
*** 31,67 ****
  import org.htmlparser.visitors.NodeVisitor;
  
  public interface Node
  {
      /**
!      * Returns a string representation of the node. This is an important method, it allows a simple string transformation
!      * of a web page, regardless of a node.<br>
!      * Typical application code (for extracting only the text from a web page) would then be simplified to  :<br>
       * <pre>
!      * Node node;
!      * for (Enumeration e = parser.elements();e.hasMoreElements();) {
!      *    node = (Node)e.nextElement();
!      *    System.out.println(node.toPlainTextString()); // Or do whatever processing you wish with the plain text string
!      * }
       * </pre>
       */
!     public abstract String toPlainTextString();
  
      /**
!      * This method will make it easier when using html parser to reproduce html pages (with or without modifications)
!      * Applications reproducing html can use this method on nodes which are to be used or transferred as they were
!      * recieved, with the original html
       */
!     public abstract String toHtml();
  
      /**
       * Return the string representation of the node.
!      * Subclasses must define this method, and this is typically to be used in the manner<br>
!      * <pre>System.out.println(node)</pre>
!      * @return java.lang.String
       */
!     public abstract String toString();
  
      /**
!      * Collect this node and its child nodes (if-applicable) into the collectionList parameter, provided the node
       * satisfies the filtering criteria.<P>
       *
--- 31,98 ----
  import org.htmlparser.visitors.NodeVisitor;
  
+ /**
+  * Specifies the minimum requirements for nodes returned by the Lexer or Parser.
+  * There are three types of nodes in HTML: text, remarks and tags. You may wish
+  * to define your own nodes to be returned by the
+  * {@link org.htmlparser.lexer.Lexer} or {@link Parser}, but each of the types
+  * must support this interface. 
+  * More specific interface requirements for each of the node types are specified
+  * by the {@link Text}, {@link Remark} and {@link Tag} interfaces.
+  */
  public interface Node
+     extends
+         Cloneable
  {
      /**
!      * A string representation of the node.
!      * This is an important method: it allows a simple string transformation
!      * of a web page, regardless of node type. For a Text node this is
!      * the textual content itself. For a Remark node this is the remark
!      * contents (sic). For tags this is the text contents of its children
!      * (if any). Because multiple nodes are combined when presenting
!      * a page in a browser, this will not reflect what a user would see.
!      * See HTML specification section 9.1 White space
!      * <a href="http://www.w3.org/TR/html4/struct/text.html#h-9.1">
!      * http://www.w3.org/TR/html4/struct/text.html#h-9.1</a>.<br>
!      * Typical application code (for extracting only the text from a web page)
!      * would be:<br>
       * <pre>
!      * for (Enumeration e = parser.elements (); e.hasMoreElements ();)
!      *     // or do whatever processing you wish with the plain text string
!      *     System.out.println (((Node)e.nextElement ()).toPlainTextString ());
       * </pre>
+      * @return The text of this node including its children.
       */
!     public abstract String toPlainTextString ();
  
      /**
!      * Return the HTML for this node.
!      * This should be the exact sequence of characters that were encountered by
!      * the parser that caused this node to be created. Where this breaks down is
!      * where broken nodes (tags and remarks) have been encountered and fixed. 
!      * Applications reproducing html can use this method on nodes which are to
!      * be used or transferred as they were received or created.
!      * @return The (exact) sequence of characters that would cause this node
!      * to be returned by the parser or lexer.
       */
!     public abstract String toHtml ();
  
      /**
       * Return the string representation of the node.
!      * The return value may not be the entire contents of the node, and
!      * non-printable characters may be translated to make them visible.
!      * This is typically to be used in
!      * the manner<br>
!      * <pre>
!      * System.out.println (node);
!      * </pre>
!      * or within a debugging environment.
!      * @return A string representation of this node suitable for printing,
!      * that isn't too large.
       */
!     public abstract String toString ();
  
      /**
!      * Collect this node and its child nodes (if applicable) into a list, provided the node
       * satisfies the filtering criteria.<P>
       *
***************
*** 71,99 ****
       * get it at the top-level, as many tags (like form tags), can contain
       * links embedded in them. We could get the links out by checking if the
!      * current node is a {@link org.htmlparser.tags.CompositeTag}, and going through its children.
!      * So this method provides a convenient way to do this.<P>
       *
       * Using collectInto(), programs get a lot shorter. Now, the code to
       * extract all links from a page would look like:
       * <pre>
!      * NodeList collectionList = new NodeList();
       * NodeFilter filter = new TagNameFilter ("A");
!      * for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
!      *      e.nextNode().collectInto(collectionList, filter);
       * </pre>
!      * Thus, collectionList will hold all the link nodes, irrespective of how
       * deep the links are embedded.<P>
       *
       * Another way to accomplish the same objective is:
       * <pre>
!      * NodeList collectionList = new NodeList();
       * NodeFilter filter = new TagClassFilter (LinkTag.class);
!      * for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
!      *      e.nextNode().collectInto(collectionList, filter);
       * </pre>
       * This is slightly less specific because the LinkTag class may be
       * registered for more than one node name, e.g. &lt;LINK&gt; tags too.
       */
!     public abstract void collectInto(NodeList collectionList, NodeFilter filter);
  
      /**
--- 102,133 ----
       * get it at the top-level, as many tags (like form tags), can contain
       * links embedded in them. We could get the links out by checking if the
!      * current node is a {@link org.htmlparser.tags.CompositeTag}, and going
!      * through its children. So this method provides a convenient way to do this.<P>
       *
       * Using collectInto(), programs get a lot shorter. Now, the code to
       * extract all links from a page would look like:
       * <pre>
!      * NodeList list = new NodeList ();
       * NodeFilter filter = new TagNameFilter ("A");
!      * for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
!      *      e.nextNode ().collectInto (list, filter);
       * </pre>
!      * Thus, <code>list</code> will hold all the link nodes, irrespective of how
       * deep the links are embedded.<P>
       *
       * Another way to accomplish the same objective is:
       * <pre>
!      * NodeList list = new NodeList ();
       * NodeFilter filter = new TagClassFilter (LinkTag.class);
!      * for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
!      *      e.nextNode ().collectInto (list, filter);
       * </pre>
       * This is slightly less specific because the LinkTag class may be
       * registered for more than one node name, e.g. &lt;LINK&gt; tags too.
+      * @param list The list to collect nodes into.
+      * @param filter The criteria to use when deciding if a node should
+      * be added to the list.
       */
!     public abstract void collectInto (NodeList list, NodeFilter filter);
  
      /**
***************
*** 101,105 ****
       * <br>deprecated Use {@link #getStartPosition}
       */
!     public abstract int elementBegin();
  
      /**
--- 135,139 ----
       * <br>deprecated Use {@link #getStartPosition}
       */
!     public abstract int elementBegin ();
  
      /**
***************
*** 107,114 ****
       * <br>deprecated Use {@link #getEndPosition}
       */
!     public abstract int elementEnd();
  
      /**
       * Gets the starting position of the node.
       * @return The start position.
       */
--- 141,149 ----
       * <br>deprecated Use {@link #getEndPosition}
       */
!     public abstract int elementEnd ();
  
      /**
       * Gets the starting position of the node.
+      * This is the character (not byte) offset of this node in the page.
       * @return The start position.
       */
***************
*** 123,126 ****
--- 158,163 ----
      /**
       * Gets the ending position of the node.
+      * This is the character (not byte) offset of the character following this
+      * node in the page.
       * @return The end position.
       */
***************
*** 134,138 ****
  
      /**
!      * Apply the visitor object (of type NodeVisitor) to this node.
       */
      public abstract void accept (NodeVisitor visitor);
--- 171,176 ----
  
      /**
!      * Apply the visitor to this node.
!      * @param visitor The visitor to this node.
       */
      public abstract void accept (NodeVisitor visitor);
***************
*** 140,147 ****
      /**
       * Get the parent of this node.
!      * This will always return null when parsing without scanners,
!      * i.e. if semantic parsing was not performed.
!      * The object returned from this method can be safely cast to a <code>CompositeTag</code>.
!      * @return The parent of this node, if it's been set, <code>null</code> otherwise.
       */
      public abstract Node getParent ();
--- 178,188 ----
      /**
       * Get the parent of this node.
!      * This will always return null when parsing with the
!      * {@link org.htmlparser.lexer.Lexer}.
!      * Currently, the object returned from this method can be safely cast to a
!      * {@link org.htmlparser.tags.CompositeTag}, but this behaviour should not
!      * be expected in the future.
!      * @return The parent of this node, if it's been set, <code>null</code>
!      * otherwise.
       */
      public abstract Node getParent ();
***************
*** 149,153 ****
      /**
       * Sets the parent of this node.
!      * @param node The node that contains this node. Must be a <code>CompositeTag</code>.
       */
      public abstract void setParent (Node node);
--- 190,194 ----
      /**
       * Sets the parent of this node.
!      * @param node The node that contains this node.
       */
      public abstract void setParent (Node node);
***************
*** 155,159 ****
      /**
       * Get the children of this node.
!      * @return The list of children contained by this node, if it's been set, <code>null</code> otherwise.
       */
      public abstract NodeList getChildren ();
--- 196,201 ----
      /**
       * Get the children of this node.
!      * @return The list of children contained by this node, if it's been set,
!      * <code>null</code> otherwise.
       */
      public abstract NodeList getChildren ();
***************
*** 167,172 ****
      /**
       * Returns the text of the node.
       */
!     public String getText();
  
      /**
--- 209,216 ----
      /**
       * Returns the text of the node.
+      * @return The contents of the string or remark node, and in the case of
+      * a tag, the contents of the tag less the enclosing angle brackets.
       */
!     public String getText ();
  
      /**
***************
*** 174,178 ****
       * @param text The new text for the node.
       */
!     public void setText(String text);
  
      /**
--- 218,222 ----
       * @param text The new text for the node.
       */
!     public void setText (String text);
  
      /**
***************
*** 181,188 ****
       * bold text on and off.
       * Only a few tags have semantic meaning to the parser. These have to do
!      * with the character set to use (&lt;META&gt;), the base URL to use
       * (&lt;BASE&gt;). Other than that, the semantic meaning is up to the
!      * application and it's custom nodes.
       */
!     public void doSemanticAction () throws ParserException;
  }
--- 225,304 ----
       * bold text on and off.
       * Only a few tags have semantic meaning to the parser. These have to do
!      * with the character set to use (&lt;META&gt;) and the base URL to use
       * (&lt;BASE&gt;). Other than that, the semantic meaning is up to the
!      * application and its custom nodes.<br>
!      * The semantic action is performed when the node has been parsed. For
!      * composite nodes (those that contain other nodes), the children will have
!      * already been parsed and will be available via {@link #getChildren}.
       */
!     public void doSemanticAction ()
!         throws
!             ParserException;
! 
!     //
!     // Cloneable interface
!     //
! 
!     /**
!      * Allow cloning of nodes.
!      * Creates and returns a copy of this object.  The precise meaning 
!      * of "copy" may depend on the class of the object. The general 
!      * intent is that, for any object <tt>x</tt>, the expression:
!      * <blockquote>
!      * <pre>
!      * x.clone() != x</pre></blockquote>
!      * will be true, and that the expression:
!      * <blockquote>
!      * <pre>
!      * x.clone().getClass() == x.getClass()</pre></blockquote>
!      * will be <tt>true</tt>, but these are not absolute requirements. 
!      * While it is typically the case that:
!      * <blockquote>
!      * <pre>
!      * x.clone().equals(x)</pre></blockquote>
!      * will be <tt>true</tt>, this is not an absolute requirement. 
!      * <p>
!      * By convention, the returned object should be obtained by calling
!      * <tt>super.clone</tt>.  If a class and all of its superclasses (except
!      * <tt>Object</tt>) obey this convention, it will be the case that
!      * <tt>x.clone().getClass() == x.getClass()</tt>.
!      * <p>
!      * By convention, the object returned by this method should be independent
!      * of this object (which is being cloned).  To achieve this independence,
!      * it may be necessary to modify one or more fields of the object returned
!      * by <tt>super.clone</tt> before returning it.  Typically, this means
!      * copying any mutable objects that comprise the internal "deep structure"
!      * of the object being cloned and replacing the references to these
!      * objects with references to the copies.  If a class contains only
!      * primitive fields or references to immutable objects, then it is usually
!      * the case that no fields in the object returned by <tt>super.clone</tt>
!      * need to be modified.
!      * <p>
!      * The method <tt>clone</tt> for class <tt>Object</tt> performs a 
!      * specific cloning operation. First, if the class of this object does 
!      * not implement the interface <tt>Cloneable</tt>, then a 
!      * <tt>CloneNotSupportedException</tt> is thrown. Note that all arrays 
!      * are considered to implement the interface <tt>Cloneable</tt>. 
!      * Otherwise, this method creates a new instance of the class of this 
!      * object and initializes all its fields with exactly the contents of 
!      * the corresponding fields of this object, as if by assignment; the
!      * contents of the fields are not themselves cloned. Thus, this method 
!      * performs a "shallow copy" of this object, not a "deep copy" operation.
!      * <p>
!      * The class <tt>Object</tt> does not itself implement the interface 
!      * <tt>Cloneable</tt>, so calling the <tt>clone</tt> method on an object 
!      * whose class is <tt>Object</tt> will result in throwing an
!      * exception at run time.
!      *
!      * @return     a clone of this instance.
!      * @exception  CloneNotSupportedException  if the object's class does not
!      *               support the <code>Cloneable</code> interface. Subclasses
!      *               that override the <code>clone</code> method can also
!      *               throw this exception to indicate that an instance cannot
!      *               be cloned.
!      * @see java.lang.Cloneable
!      */
!     public Object clone ()
!         throws
!             CloneNotSupportedException;
  }
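
The clone() documentation added to Node.java distinguishes the shallow copy made by Object.clone from a copy that is independent of the original. A small sketch of that difference follows. CloneSketch is a hypothetical class written for this note and is not part of org.htmlparser; how the real node classes copy their fields is not shown in this commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration; not part of org.htmlparser.
class CloneSketch implements Cloneable
{
    List<String> children = new ArrayList<> ();

    // Shallow copy: super.clone () copies the field reference only,
    // so the clone and the original share one list.
    CloneSketch shallow ()
    {
        try
        {
            return (CloneSketch) super.clone ();
        }
        catch (CloneNotSupportedException cnse)
        {
            throw new IllegalStateException (cnse);
        }
    }

    // Independent copy: replace the mutable field with its own copy.
    CloneSketch deep ()
    {
        CloneSketch ret = shallow ();
        ret.children = new ArrayList<> (children);
        return ret;
    }

    public static void main (String[] args)
    {
        CloneSketch x = new CloneSketch ();
        x.children.add ("text");
        CloneSketch s = x.shallow ();
        CloneSketch d = x.deep ();
        System.out.println (s != x);                          // true
        System.out.println (s.getClass () == x.getClass ()); // true
        System.out.println (s.children == x.children);        // true, shared list
        System.out.println (d.children == x.children);        // false, own list
    }
}
```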

Index: PrototypicalNodeFactory.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/PrototypicalNodeFactory.java,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** PrototypicalNodeFactory.java	24 May 2004 16:18:12 -0000	1.7
--- PrototypicalNodeFactory.java	14 Jun 2004 00:06:51 -0000	1.8
***************
*** 31,34 ****
--- 31,35 ----
  import java.util.Locale;
  import java.util.Map;
+ import java.util.Set;
  import java.util.Vector;
  
***************
*** 39,42 ****
--- 40,44 ----
  import org.htmlparser.Text;
  import org.htmlparser.lexer.Page;
+ import org.htmlparser.nodes.AbstractNode;
  import org.htmlparser.nodes.TextNode;
  import org.htmlparser.nodes.RemarkNode;
***************
*** 74,80 ****
  /**
   * A node factory based on the prototype pattern.
!  * This factory uses the prototype pattern to generate new Tag nodes.
   * Prototype tags, in the form of undifferentiated tags are held in a hash
!  * table. On a 
   */
  public class PrototypicalNodeFactory
--- 76,91 ----
  /**
   * A node factory based on the prototype pattern.
!  * This factory uses the prototype pattern to generate new nodes.
!  * It generates generic text and remark nodes from prototypes accessed
!  * via the textPrototype and remarkPrototype properties respectively.
!  * These are cloned as needed to form new {@link Text} and {@link Remark} nodes.
   * Prototype tags, in the form of undifferentiated tags are held in a hash
!  * table. On a request for a tag, the attributes are examined for the name
!  * of the tag and if a prototype of that name is registered, it is cloned
!  * and the clone is given the characteristics
!  * ({@link Attribute Attributes}, start and end position) of the requested tag.
!  * If no tag is registered under the needed name, a generic tag is created.
!  * Note that in all cases, the {@link Page} property is only set if the node
!  * is a subclass of {@link AbstractNode}.
   */
  public class PrototypicalNodeFactory
***************
*** 84,88 ****
  {
      /**
!      * The list of tags to return at the top level.
       * The list is keyed by tag name.
       */
--- 95,109 ----
  {
      /**
!      * The prototypical text node.
!      */
!     protected Text mText;
! 
!     /**
!      * The prototypical remark node.
!      */
!     protected Remark mRemark;
! 
!     /**
!      * The list of tags to return.
       * The list is keyed by tag name.
       */
***************
*** 90,94 ****
  
      /**
!      * Create a new factory with all but DOM tags registered.
       */
      public PrototypicalNodeFactory ()
--- 111,115 ----
  
      /**
!      * Create a new factory with all tags registered.
       */
      public PrototypicalNodeFactory ()
***************
*** 99,106 ****
--- 120,131 ----
      /**
       * Create a new factory with no registered tags.
+      * @param empty If <code>true</code>, creates an empty factory,
+      * otherwise is equivalent to {@link #PrototypicalNodeFactory()}.
       */
      public PrototypicalNodeFactory (boolean empty)
      {
          clear ();
+         mText = new TextNode (null, 0, 0);
+         mRemark = new RemarkNode (null, 0, 0);
          if (!empty)
              registerTags ();
***************
*** 108,112 ****
  
      /**
!      * Create a new factory with the given tag as the only one registered.
       */
      public PrototypicalNodeFactory (org.htmlparser.tags.Tag tag)
--- 133,138 ----
  
      /**
!      * Create a new factory with the given tag as the only registered tag.
!      * @param tag The single tag to register in the otherwise empty factory.
       */
      public PrototypicalNodeFactory (org.htmlparser.tags.Tag tag)
***************
*** 118,121 ****
--- 144,148 ----
      /**
       * Create a new factory with the given tags registered.
+      * @param tags The tags to register in the otherwise empty factory.
       */
      public PrototypicalNodeFactory (org.htmlparser.tags.Tag[] tags)
***************
*** 129,137 ****
       * Adds a tag to the registry.
       * @param id The name under which to register the tag.
!      * @param tag The tag to be returned from a createTag(id) call.
!      * @return The tag previously registered with that id,
       * or <code>null</code> if none.
       */
!     public Tag put (String id, org.htmlparser.tags.Tag tag)
      {
          return ((Tag)mBlastocyst.put (id, tag));
--- 156,164 ----
       * Adds a tag to the registry.
       * @param id The name under which to register the tag.
!      * @param tag The tag to be returned from a {@link #createTagNode} call.
!      * @return The tag previously registered with that id if any,
       * or <code>null</code> if none.
       */
!     public Tag put (String id, Tag tag)
      {
          return ((Tag)mBlastocyst.put (id, tag));
***************
*** 141,149 ****
       * Gets a tag from the registry.
       * @param id The name of the tag to return.
!      * @return The tag registered under the id name or <code>null</code> if none.
       */
!     public org.htmlparser.tags.Tag get (String id)
      {
!         return ((org.htmlparser.tags.Tag)mBlastocyst.get (id));
      }
  
--- 168,176 ----
       * Gets a tag from the registry.
       * @param id The name of the tag to return.
!      * @return The tag registered under the <code>id</code> name or <code>null</code> if none.
       */
!     public Tag get (String id)
      {
!         return ((Tag)mBlastocyst.get (id));
      }
  
***************
*** 151,159 ****
       * Remove a tag from the registry.
       * @param id The name of the tag to remove.
!      * @return The tag that was registered with that id.
       */
!     public org.htmlparser.tags.Tag remove (String id)
      {
!         return ((org.htmlparser.tags.Tag)mBlastocyst.remove (id));
      }
  
--- 178,186 ----
       * Remove a tag from the registry.
       * @param id The name of the tag to remove.
!      * @return The tag that was registered with that <code>id</code>.
       */
!     public Tag remove (String id)
      {
!         return ((Tag)mBlastocyst.remove (id));
      }
  
***************
*** 166,170 ****
--- 193,211 ----
      }
  
+     /**
+      * Get the list of tag names.
+      * @return The names of the tags currently registered.
+      */
+     public Set getTagNames ()
+     {
+         return (mBlastocyst.keySet ());
+     }
  
+     /**
+      * Register a tag.
+      * Registers the given tag under every id the tag has.
+      * @param tag The tag to register (subclass of
+      * {@link org.htmlparser.tags.Tag}).
+      */
      public void registerTag (org.htmlparser.tags.Tag tag)
      {
***************
*** 176,179 ****
--- 217,226 ----
      }
  
+     /**
+      * Unregister a tag.
+      * Unregisters the given tag from every id the tag has.
+      * @param tag The tag to unregister (subclass of
+      * {@link org.htmlparser.tags.Tag}).
+      */
      public void unregisterTag (org.htmlparser.tags.Tag tag)
      {
***************
*** 185,188 ****
--- 232,261 ----
      }
  
+     /**
+      * Register a tag.
+      * Registers the given tag under the tag {@link Tag#getTagName() name}.
+      * @param tag The tag to register (implements {@link org.htmlparser.Tag}).
+      */
+     public void registerTag (Tag tag)
+     {
+         put (tag.getTagName (), tag);
+     }
+ 
+     /**
+      * Unregister a tag.
+      * Unregisters the given tag from the tag {@link Tag#getTagName() name}.
+      * @param tag The tag to unregister (implements {@link org.htmlparser.Tag}).
+      */
+     public void unregisterTag (Tag tag)
+     {
+         remove (tag.getTagName ());
+     }
+ 
+     /**
+      * Register all known tags in the tag package.
+      * Registers tags from the {@link org.htmlparser.tags tag package} by
+      * calling {@link #registerTag(org.htmlparser.tags.Tag) registerTag()}.
+      * @return 'this' nodefactory as a convenience.
+      */
      public PrototypicalNodeFactory registerTags ()
      {
***************
*** 220,223 ****
--- 293,338 ----
      }
  
+     /**
+      * Get the object being used to generate text nodes.
+      * @return The prototype for {@link Text} nodes.
+      */
+     public Text getTextPrototype ()
+     {
+         return (mText);
+     }
+ 
+     /**
+      * Set the object to be used to generate text nodes.
+      * @param text The prototype for {@link Text} nodes.
+      */
+     public void setTextPrototype (Text text)
+     {
+         if (null == text)
+             throw new IllegalArgumentException ("text prototype node cannot be null");
+         else
+             mText = text;
+     }
+ 
+     /**
+      * Get the object being used to generate remark nodes.
+      * @return The prototype for {@link Remark} nodes.
+      */
+     public Remark getRemarkPrototype ()
+     {
+         return (mRemark);
+     }
+ 
+     /**
+      * Set the object to be used to generate remark nodes.
+      * @param remark The prototype for {@link Remark} nodes.
+      */
+     public void setRemarkPrototype (Remark remark)
+     {
+         if (null == remark)
+             throw new IllegalArgumentException ("remark prototype node cannot be null");
+         else
+             mRemark = remark;
+     }
+ 
      //
      // NodeFactory interface
***************
*** 228,236 ****
       * @param page The page the node is on.
       * @param start The beginning position of the string.
!      * @param end The ending positiong of the string.
       */
      public Text createStringNode (Page page, int start, int end)
      {
!         return (new TextNode (page, start, end));
      }
  
--- 343,368 ----
       * @param page The page the node is on.
       * @param start The beginning position of the string.
!      * @param end The ending position of the string.
       */
      public Text createStringNode (Page page, int start, int end)
      {
!         Text ret;
! 
!         try
!         {
!             ret = (Text)(getTextPrototype ().clone ());
!             if (ret instanceof AbstractNode)
!                 ((AbstractNode)ret).setPage (page);
!             else
!                 ret.setText (page.getText (start, end));
!             ret.setStartPosition (start);
!             ret.setEndPosition (end);
!         }
!         catch (CloneNotSupportedException cnse)
!         {
!             ret = new TextNode (page, start, end);
!         }
! 
!         return (ret);
      }
  
***************
*** 243,247 ****
      public Remark createRemarkNode (Page page, int start, int end)
      {
!         return (new RemarkNode (page, start, end));
      }
  
--- 375,405 ----
      public Remark createRemarkNode (Page page, int start, int end)
      {
!         int first;
!         int last;
!         Remark ret;
!         
!         try
!         {
!             ret = (Remark)(getRemarkPrototype ().clone ());
! //            if (ret instanceof AbstractNode)
! //                ((AbstractNode)ret).setPage (page);
! //            else
!             {
!                 first = start + 4; // <!--
!                 last = end - 3; // -->
!                 if (first >= last)
!                     ret.setText ("");
!                 else
!                     ret.setText (page.getText (first, last));
!             }
!             ret.setStartPosition (start);
!             ret.setEndPosition (end);
!         }
!         catch (CloneNotSupportedException cnse)
!         {
!             ret = new RemarkNode (page, start, end);
!         }
! 
!         return (ret);
      }
  
***************
*** 263,268 ****
          Attribute attribute;
          String id;
!         org.htmlparser.tags.Tag prototype;
!         org.htmlparser.tags.Tag ret;
  
          ret = null;
--- 421,426 ----
          Attribute attribute;
          String id;
!         Tag prototype;
!         Tag ret;
  
          ret = null;
***************
*** 281,289 ****
                          if (id.endsWith ("/"))
                              id = id.substring (0, id.length () - 1);
!                         prototype = (org.htmlparser.tags.Tag)mBlastocyst.get (id);
                          if (null != prototype)
                          {
!                             ret = (org.htmlparser.tags.Tag)prototype.clone ();
!                             ret.setPage (page);
                              ret.setStartPosition (start);
                              ret.setEndPosition (end);
--- 439,448 ----
                          if (id.endsWith ("/"))
                              id = id.substring (0, id.length () - 1);
!                         prototype = (Tag)mBlastocyst.get (id);
                          if (null != prototype)
                          {
!                             ret = (Tag)prototype.clone ();
!                             if (ret instanceof AbstractNode)
!                                 ((AbstractNode)ret).setPage (page);
                              ret.setStartPosition (start);
                              ret.setEndPosition (end);
***************
*** 299,302 ****
--- 458,462 ----
          }
          if (null == ret)
+             // generate a generic node
              ret = new org.htmlparser.tags.Tag (page, start, end, attributes);
