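[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer/nodes AbstractNode.java,NONE,1.1 Attribute.java,NONE,1.1 RemarkNode.java,NONE,1.1 StringNode.java,NONE,1.1 TagNode.java,NONE,1.1 package.html,NONE,1.1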

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1:/tmp/cvs-serv9123/lexer/nodes

Added Files:
	AbstractNode.java Attribute.java RemarkNode.java 
	StringNode.java TagNode.java package.html 
Log Message:
Third drop for new i/o subsystem.

--- NEW FILE: AbstractNode.java ---
// HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
// Lesser General Public License for more details.
// 
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
//
// For any questions or suggestions, you can write to me at :
// Email :so...@in...
// 
// Postal Address : 
// Somik Raha
// Extreme Programmer & Coach
// Industrial Logic Corporation
// 2583 Cedar Street, Berkeley, 
// CA 94708, USA
// Website : http://www.industriallogic.com
// 
// This class was contributed by 
// Derrick Oswald
//

package org.htmlparser.lexer.nodes;

import org.htmlparser.lexer.Page;

/**
 * Extend org.htmlparser.AbstractNode temporarily to add the Page.
 * <em>This will be folded into org.htmlparser.AbstractNode eventually.</em>
 */
public abstract class AbstractNode extends org.htmlparser.AbstractNode
{
    /**
     * The page this node came from.
     */
    protected Page mPage;

    /**
     * Create a lexeme.
     * Remember the page and start & end cursor positions.
     * @param page The page this node was read from.
     * @param start The starting offset of this node within the page.
     * @param end The ending offset of this node within the page.
     */
    public AbstractNode (Page page, int start, int end)
    {
        super (start, end);
        mPage = page;
    }

    /**
     * Get the page this node came from.
     * @return The page that supplied this node.
     */
    public Page getPage ()
    {
        return (mPage);
    }
}

--- NEW FILE: Attribute.java ---
// HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
// Lesser General Public License for more details.
// 
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
//
// For any questions or suggestions, you can write to me at :
// Email :so...@in...
// 
// Postal Address : 
// Somik Raha
// Extreme Programmer & Coach
// Industrial Logic Corporation
// 2583 Cedar Street, Berkeley, 
// CA 94708, USA
// Website : http://www.industriallogic.com
// 
// This class was contributed by 
// Derrick Oswald
//

package org.htmlparser.lexer.nodes;

/**
 * An attribute within a tag.
 * <p>If Name is null, it's whitespace and Value has the text.
 * <p>If Name is not null, and Value is null it's a standalone attribute.
 * <p>If Name is not null, and Value is "", and Quote is zero it's an empty attribute.
 * <p>If Name is not null, and Value is "", and Quote is ' it's an empty single quoted attribute.
 * <p>If Name is not null, and Value is "", and Quote is " it's an empty double quoted attribute.
 * <p>If Name is not null, and Value is something, and Quote is zero it's a naked attribute.
 * <p>If Name is not null, and Value is something, and Quote is ' it's a single quoted attribute.
 * <p>If Name is not null, and Value is something, and Quote is " it's a double quoted attribute.
 */
public class Attribute
{
    /**
     * The name of this attribute.
     * The part before the equals sign, or the stand-alone attribute.
     */
    String mName;

    /**
     * The value of the attribute.
     * The part after the equals sign.
     */
    String mValue;

    /**
     * The quote, if any, surrounding the value of the attribute.
     */
    char mQuote;

    /**
     * Create an attribute with the name, value and quote character given.
     * @param name The name of this attribute, or null if it's just whitespace.
     * @param value The value of the attribute or null if it's a stand-alone.
     * @param quote The quote, if any, surrounding the value of the attribute,
     * (i.e. ' or "), or zero if none.
     */
    public Attribute (String name, String value, char quote)
    {
        mName = name;
        mValue = value;
        mQuote = quote;
    }

    /**
     * Get the name of this attribute.
     * The part before the equals sign, or the stand-alone attribute.
     * @return The name, or <code>null</code> if it's just a whitespace 'attribute'.
     */
    public String getName ()
    {
        return (mName);
    }

    /**
     * Get the value of the attribute.
     * The part after the equals sign, or the text if it's just a whitespace 'attribute'.
     * @return The value, or <code>null</code> if it's a stand-alone attribute,
     * or the text if it's just a whitespace 'attribute'.
     */
    public String getValue ()
    {
        return (mValue);
    }

    /**
     * Get the quote, if any, surrounding the value of the attribute.
     * @return Either ' or " if the attribute value was quoted, or zero
     * if there are no quotes around it.
     */
    public char getQuote ()
    {
        return (mQuote);
    }

    /**
     * Get a text representation of this attribute.
     * Suitable for insertion into a start tag, the output is one of
     * the forms:
     * <code>
     * <pre>
     * value
     * name
     * name=value
     * name='value'
     * name="value"
     * </pre>
     * </code>
     * @param buffer The accumulator for placing the text into.
     */
    public void toString (StringBuffer buffer)
    {
        String value;
        String name;

        value = getValue ();
        name = getName ();
        if (null == name)
        {
            if (value != null)
                buffer.append (value);
        }
        else
        {
            buffer.append (name);
            if (null != value)
            {
                buffer.append ("=");
                if (0 != getQuote ())
                    buffer.append (getQuote ());
                buffer.append (value);
                if (0 != getQuote ())
                    buffer.append (getQuote ());
            }
        }
    }

    /**
     * Get a text representation of this attribute.
     * @return A string that can be used within a start tag.
     * @see #toString(StringBuffer)
     */
    public String toString ()
    {
        String value;
        String name;
        int length;
        StringBuffer ret;

        // calculate the size we'll need to avoid extra StringBuffer allocations
        length = 0;
        value = getValue ();
        name = getName ();
        if (null == name)
        {
            if (value != null)
                length += value.length ();
        }
        else
        {
            length += name.length ();
            if (null != value)
            {
                length += 1;
                length += value.length ();
                if (0 != getQuote ())
                    length += 2;
            }
        }
        ret = new StringBuffer (length);
        toString (ret);

        return (ret.toString ());
    }
}
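
The forms listed in the Attribute class comment can be tried out with a small stand-alone sketch. It assumes nothing beyond the branching visible in Attribute.toString (StringBuffer) above; the class name AttributeFormsSketch and its static render helper are hypothetical and are not part of this commit.

```java
// Hypothetical stand-alone sketch; not part of the committed sources.
final class AttributeFormsSketch
{
    /**
     * Mirrors the branching of Attribute.toString (StringBuffer) above.
     * A quote of zero means the value is not quoted.
     */
    static String render (String name, String value, char quote)
    {
        StringBuilder ret = new StringBuilder ();

        if (null == name)
        {
            // whitespace 'attribute': only the text is emitted
            if (null != value)
                ret.append (value);
        }
        else
        {
            ret.append (name);
            if (null != value)
            {
                ret.append ('=');
                if (0 != quote)
                    ret.append (quote);
                ret.append (value);
                if (0 != quote)
                    ret.append (quote);
            }
        }

        return (ret.toString ());
    }

    public static void main (String[] args)
    {
        System.out.println (render ("checked", null, (char)0)); // stand-alone
        System.out.println (render ("alt", "", (char)0));       // empty
        System.out.println (render ("alt", "", '\''));          // empty, single quoted
        System.out.println (render ("width", "10", (char)0));   // naked
        System.out.println (render ("href", "x", '"'));         // double quoted
    }
}
```

Run as written, the five calls print checked, alt=, alt='', width=10 and href="x", one per line.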

--- NEW FILE: RemarkNode.java ---
// HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
// Lesser General Public License for more details.
// 
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
//
// For any questions or suggestions, you can write to me at :
// Email :so...@in...
// 
// Postal Address : 
// Somik Raha
// Extreme Programmer & Coach
// Industrial Logic Corporation
// 2583 Cedar Street, Berkeley, 
// CA 94708, USA
// Website : http://www.industriallogic.com

package org.htmlparser.lexer.nodes;

import org.htmlparser.lexer.Cursor;
import org.htmlparser.lexer.Page;
import org.htmlparser.util.NodeList;
import org.htmlparser.visitors.NodeVisitor;

/**
 * The remark tag is identified and represented by this class.
 */
public class RemarkNode extends AbstractNode
{
	public final static String REMARK_NODE_FILTER="-r";

	/** 
	 * Constructor takes in the page and the beginning and ending positions.
	 * @param page The page this remark is on.
	 * @param start The beginning position of the remark.
	 * @param end The ending position of the remark.
	 */
	public RemarkNode (Page page, int start, int end)
	{
		super (page, start, end);
	}

    /** 
	 * Returns the text contents of the comment tag.
     * todo: this only works for the usual case, a remark delimited
     * by &lt;!-- and --&gt;.
	 */
	public String getText()
	{
		return (mPage.getText (elementBegin () + 4, elementEnd () - 3));
	}

    public String toPlainTextString()
    {
		return (getText());
	}
	public String toHtml() {
		return (mPage.getText (elementBegin (), elementEnd ()));
	}
	/**
	 * Print the contents of the remark tag.
	 */
	public String toString()
	{
        Cursor start;
        Cursor end;

        start = new Cursor (getPage (), elementBegin ());
        end = new Cursor (getPage (), elementEnd ());
		return ("Rem (" + start.toString () + "," + end.toString () + "): " + getText ());
	}

	public void collectInto(NodeList collectionList, String filter) {
		if (filter==REMARK_NODE_FILTER) collectionList.add(this);
	}

	public void accept(NodeVisitor visitor) {
// todo: fix this
//		visitor.visitRemarkNode(this);
	}
}

--- NEW FILE: StringNode.java ---
// HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
// Lesser General Public License for more details.
// 
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
//
// For any questions or suggestions, you can write to me at :
// Email :so...@in...
// 
// Postal Address : 
// Somik Raha
// Extreme Programmer & Coach
// Industrial Logic Corporation
// 2583 Cedar Street, Berkeley, 
// CA 94708, USA
// Website : http://www.industriallogic.com

package org.htmlparser.lexer.nodes;

import org.htmlparser.lexer.Cursor;
import org.htmlparser.lexer.Page;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.NodeVisitor;

/**
 * Normal text in the HTML document is represented by this class.
 */
public class StringNode extends AbstractNode
{
	public static final String STRING_FILTER = "-string";

	/** 
	 * Constructor takes in the page and the beginning and ending positions.
	 * @param page The page this string is on.
	 * @param start The beginning position of the string.
	 * @param end The ending position of the string.
	 */
	public StringNode (Page page, int start, int end)
	{
		super (page, start, end);
	}

    /**
     * Returns the text of the string node.
     */
    public String getText ()
    {
        return (toHtml ());
    }

    /**
     * Sets the string contents of the node.
     * @param text The new text for the node.
     */
    public void setText (String text)
    {
        try
        {
            mPage = new Page (text);
            nodeBegin = 0;
            nodeEnd = text.length ();
        }
        catch (ParserException pe)
        {
        }
    }

    public String toPlainTextString ()
    {
        return (toHtml ());
    }

    public String toHtml ()
    {
        return (mPage.getText (elementBegin (), elementEnd ()));
    }

    public String toString ()
	{
        Cursor start;
        Cursor end;

        start = new Cursor (getPage (), elementBegin ());
        end = new Cursor (getPage (), elementEnd ());
		return ("Txt (" + start.toString () + "," + end.toString () + "): " + getText ());
	}

    public void collectInto (NodeList collectionList, String filter)
    {
        if (STRING_FILTER == filter)
            collectionList.add (this);
    }

    public void accept (NodeVisitor visitor)
    {
// todo: fix this
//        visitor.visitStringNode (this);
    }
}
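
StringNode.collectInto above compares the filter with ==, which tests object identity rather than contents. The following stand-alone sketch shows the consequence; the class FilterIdentitySketch and its matches helper are hypothetical names used only for illustration, and only the constant value "-string" is taken from the source.

```java
// Hypothetical stand-alone sketch; not part of the committed sources.
final class FilterIdentitySketch
{
    /** Same value as StringNode.STRING_FILTER above. */
    static final String STRING_FILTER = "-string";

    /** Identity comparison, in the style of collectInto above. */
    static boolean matches (String filter)
    {
        return (STRING_FILTER == filter);
    }

    public static void main (String[] args)
    {
        // the constant itself is the same object, so it matches
        String sameObject = STRING_FILTER;
        // a freshly constructed string has equal contents but is a distinct object
        String equalCopy = new String ("-string");

        System.out.println (matches (sameObject));
        System.out.println (matches (equalCopy));
        System.out.println (STRING_FILTER.equals (equalCopy));
    }
}
```

A filter string built at run time (read from a file, concatenated, or copied) would therefore not match, which is why callers are expected to pass the static final constants themselves.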

--- NEW FILE: TagNode.java ---
// HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
//
// For any questions or suggestions, you can write to me at :
// Email :so...@in...
//
// Postal Address :
// Somik Raha
// Extreme Programmer & Coach
// Industrial Logic Corporation
// 2583 Cedar Street, Berkeley,
// CA 94708, USA
// Website : http://www.industriallogic.com

package org.htmlparser.lexer.nodes;

import java.util.Enumeration;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Map;
import java.util.Vector;

import org.htmlparser.lexer.Cursor;
import org.htmlparser.lexer.Page;
import org.htmlparser.parserHelper.SpecialHashtable;
import org.htmlparser.parserHelper.TagParser;
import org.htmlparser.scanners.TagScanner;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.NodeVisitor;
/**
 * TagNode represents a generic tag. This class allows users to register specific
 * tag scanners, which can identify links or image references. The tag asks the
 * scanners to run over its text and identify it. It can be used to dynamically
 * configure a parser.
 * @author Kaarle Kaila 23.10.2001
 */
public class TagNode extends AbstractNode
{
	public static final String TYPE = "TAG";
	/**
	 * Constant used as value for the value of the tag name
	 * in parseParameters  (Kaarle Kaila 3.8.2001)
	 */
	public final static String TAGNAME = "$<TAGNAME>$";
	public final static String EMPTYTAG = "$<EMPTYTAG>$";
    public final static String NULLVALUE = "$<NULL>$";
    public final static String NOTHING = "$<NOTHING>$";
	private final static String EMPTY_STRING="";

	private static TagParser tagParser;
	private boolean emptyXmlTag = false;

    /**
	 * The tag attributes.
     * Objects of type Attribute.
	 */
	protected Vector mAttributes;

	/**
	 * Scanner associated with this tag (useful for extraction of filtering data from a
	 * HTML node)
	 */
	protected TagScanner thisScanner = null;

    /**
     * Set of tags that breaks the flow.
     */
    protected static HashSet breakTags;
    static
    {
        breakTags = new HashSet (30);
        breakTags.add ("BLOCKQUOTE");
        breakTags.add ("BODY");
        breakTags.add ("BR");
        breakTags.add ("CENTER");
        breakTags.add ("DD");
        breakTags.add ("DIR");
        breakTags.add ("DIV");
        breakTags.add ("DL");
        breakTags.add ("DT");
        breakTags.add ("FORM");
        breakTags.add ("H1");
        breakTags.add ("H2");
        breakTags.add ("H3");
        breakTags.add ("H4");
        breakTags.add ("H5");
        breakTags.add ("H6");
        breakTags.add ("HEAD");
        breakTags.add ("HR");
        breakTags.add ("HTML");
        breakTags.add ("ISINDEX");
        breakTags.add ("LI");
        breakTags.add ("MENU");
        breakTags.add ("NOFRAMES");
        breakTags.add ("OL");
        breakTags.add ("P");
        breakTags.add ("PRE");
        breakTags.add ("TD");
        breakTags.add ("TH");
        breakTags.add ("TITLE");
        breakTags.add ("UL");
    }

	/**
	 * Create a tag with the location and attributes provided
	 * @param page The page this tag was read from.
     * @param start The starting offset of this node within the page.
     * @param end The ending offset of this node within the page.
     * @param attributes The list of attributes that were parsed in this tag.
     * @see Attribute
	 */
	public TagNode (Page page, int start, int end, Vector attributes)
	{
		super (page, start, end);
        mAttributes = attributes;
	}

	/**
	 * Locate the tag within the input string, by parsing from the given position
	 * @param reader HTML reader to be provided so as to allow reading of next line
	 * @param input Input String
	 * @param position Position to start parsing from
	 */
//	public static Tag find(NodeReader reader,String input,int position) {
//		return tagParser.find(reader,input,position);
//	}

	/**
	 * Get the value of the named attribute.
	 * The name is converted to uppercase before the lookup.
	 * @param name The name of the attribute.
	 * @return The value found for that name in the table from getAttributes().
	 */
	public String getAttribute (String name)
    {
	    return ((String)getAttributes().get(name.toUpperCase()));
	}

	/**
	 * Set attribute with given key, value pair.
	 * @param key The name of the attribute.
	 * @param value The value of the attribute.
	 */
	public void setAttribute(String key, String value)
    {
		getAttributes ().put(key,value);
	}

	/**
	 * Get the value of the named attribute.
	 * The name is converted to uppercase before the lookup.
	 * @param name The name of the attribute.
	 * @return The value found for that name in the table from getAttributes().
	 * @deprecated use getAttribute instead
	 */
	public String getParameter(String name)
    {
		return (String)getAttributes().get (name.toUpperCase());
	}

	/**
	 * Gets the attributes in the tag.
     * NOTE: Elements of the vector are objects of type Attribute,
     * the first of which holds the tag name.
	 * @return Returns the Vector of Attribute objects in this tag.
	 */
	public Vector getAttributesEx()
    {
		return mAttributes;
	}

	/**
	 * Gets the attributes in the tag.
	 * @return Returns a Hashtable of attributes
	 */
	public Hashtable getAttributes()
    {
        Vector attributes;
        Attribute attribute;
        String value;
        Hashtable ret;

        ret = new SpecialHashtable ();
        attributes = getAttributesEx ();
        if (0 < attributes.size ())
        {
            // special handling for the node name
            attribute = (Attribute)attributes.elementAt (0);
            ret.put (org.htmlparser.tags.Tag.TAGNAME, attribute.getName ().toUpperCase ());
            // the rest
            for (int i = 1; i < attributes.size (); i++)
            {
                attribute = (Attribute)attributes.elementAt (i);
                if (null != attribute.getName ())
                {
                    value = attribute.getValue ();
                    if ('\'' == attribute.getQuote ())
                        value = "'" + value + "'";
                    else if ('"' == attribute.getQuote ())
                        value = "\"" + value + "\"";
                    else if ((null != value) && value.equals (""))
                        value = NOTHING;
                    if (null == value)
                        value = NULLVALUE;
                    ret.put (attribute.getName (), value);
                }
            }
        }
        else
            ret.put (org.htmlparser.tags.Tag.TAGNAME, "");

        return (ret);
	}

    public String getTagName(){
	    return getParameter(TAGNAME);
	}

    /**
	 * Return the text contained in this tag
	 */
	public String getText()
    {
		return (mPage.getText (elementBegin () + 1, elementEnd () - 1));
	}

	/**
	 * Return the scanner associated with this tag.
	 */
	public TagScanner getThisScanner()
	{
		return thisScanner;
	}

    /**
     * Extract the first word from the given string.
     * Words are delimited by whitespace or equals signs.
     * @param s The string to get the word from.
     * @return The first word.
     */
//    public static String extractWord (String s)
//    {
//        int length;
//        boolean parse;
//        char ch;
//        StringBuffer ret;
//
//        length = s.length ();
//        ret = new StringBuffer (length);
//		parse = true;
//		for (int i = 0; i < length && parse; i++)
//        {
//			ch = s.charAt (i);
//			if (Character.isWhitespace (ch) || ch == '=')
//                parse = false;
//            else
//                ret.append (Character.toUpperCase (ch));
//		}
//
//		return (ret.toString ());
//	}

	/**
	 * Scan the tag using the registered scanners, and attempt identification.
	 * @param url URL at which HTML page is located
	 * @param reader The NodeReader that is to be used for reading the url
	 */
//	public AbstractNode scan(Map scanners,String url,NodeReader reader) throws ParserException
//	{
//		if (tagContents.length()==0) return this;
//		try {
//			boolean found=false;
//			AbstractNode retVal=null;
//			// Find the first word in the scanners
//			String firstWord = extractWord(tagContents.toString());
//			// Now, get the scanner associated with this.
//			TagScanner scanner = (TagScanner)scanners.get(firstWord);
//			
//			// Now do a deep check
//			if (scanner != null &&
//					scanner.evaluate(
//						tagContents.toString(),
//						reader.getPreviousOpenScanner()
//					)
//				)
//			{
//				found=true;
//                TagScanner save;
//                save = reader.getPreviousOpenScanner ();
//				reader.setPreviousOpenScanner(scanner);
//				retVal=scanner.createScannedNode(this,url,reader,tagLine);
//				reader.setPreviousOpenScanner(save);
//			}
//
//			if (!found) return this;
//			else {
//			    return retVal;
//			}
//		}
//		catch (Exception e) {
//			String errorMsg;
//			if (tagContents!=null) errorMsg = tagContents.toString(); else errorMsg="null";
//			throw new ParserException("Tag.scan() : Error while scanning tag, tag contents = "+errorMsg+", tagLine = "+tagLine,e);
//		}
//	}

	/**
	 * Sets the attributes.
	 * @param attributes The attribute collection to set.
	 */
	public void setAttributes (Hashtable attributes)
    {
        Vector att;
        String key;
        String value;
        char quote;
        Attribute attribute;

        att = new Vector ();
        for (Enumeration e = attributes.keys (); e.hasMoreElements (); )
        {
            key = (String)e.nextElement ();
            value = (String)attributes.get (key);
            if (value.startsWith ("'") && value.endsWith ("'") && (2 <= value.length ()))
            {
                quote = '\'';
                value = value.substring (1, value.length () - 1);
            }
            else if (value.startsWith ("\"") && value.endsWith ("\"") && (2 <= value.length ()))
            {
                quote = '"';
                value = value.substring (1, value.length () - 1);
            }
            else
                quote = (char)0;
            attribute = new Attribute (key, value, quote);
			att.addElement (attribute);
        }
		this.mAttributes = att;
	}

	/**
	 * Sets the attributes.
     * NOTE: Elements of the vector are objects of type Attribute,
     * the first of which holds the tag name.
	 * @param attribs The attribute collection to set.
	 */
    public void setAttributesEx (Vector attribs)
    {
        mAttributes = attribs;
    }

	/**
	 * Sets the nodeBegin.
	 * @param tagBegin The nodeBegin to set
	 */
	public void setTagBegin(int tagBegin) {
		this.nodeBegin = tagBegin;
	}

	/**
	 * Gets the nodeBegin.
	 * @return The nodeBegin value.
	 */
	public int getTagBegin() {
		return (nodeBegin);
	}

	/**
	 * Sets the nodeEnd.
	 * @param tagEnd The nodeEnd to set
	 */
	public void setTagEnd(int tagEnd) {
		this.nodeEnd = tagEnd;
	}

	/**
	 * Gets the nodeEnd.
	 * @return The nodeEnd value.
	 */
	public int getTagEnd() {
		return (nodeEnd);
	}

    public void setText (String text)
    {
        try
        {
            mPage = new Page (text);
            nodeBegin = 0;
            nodeEnd = text.length ();
        }
        catch (ParserException pe)
        {
        }
    }
	public void setThisScanner(TagScanner scanner)
	{
		thisScanner = scanner;
	}

	public String toPlainTextString() {
		return EMPTY_STRING;
	}

	/**
	 * A call to a tag's toHtml() method will render it in HTML.
	 * Most tags that do not have children and inherit from Tag
	 * do not need to override toHtml().
	 * @see org.htmlparser.Node#toHtml()
	 */
	public String toHtml()
    {
		StringBuffer ret;
        Vector attributes;
        Attribute attribute;
        String value;

        ret = new StringBuffer ();
        attributes = getAttributesEx ();
		ret.append ("<");
        if (0 < attributes.size ())
        {
            // special handling for the node name
            attribute = (Attribute)attributes.elementAt (0);
            ret.append (attribute.getName ());
            // the rest
            for (int i = 1; i < attributes.size (); i++)
            {
                attribute = (Attribute)attributes.elementAt (i);
                attribute.toString (ret);
            }
        }
		if (isEmptyXmlTag ())
            ret.append ("/");
		ret.append (">");

		return (ret.toString ());
    }

	/**
	 * Print the contents of the tag
	 */
	public String toString()
	{
        String tag;
        Cursor start;
        Cursor end;

        tag = getTagName ();
        if (tag.startsWith ("/"))
            tag = "End";
        else
            tag = "Tag";
        start = new Cursor (getPage (), elementBegin ());
        end = new Cursor (getPage (), elementEnd ());
		return (tag + " (" + start.toString () + "," + end.toString () + "): " + getText ());
	}

    /**
	 * Sets the tagParser.
	 * @param tagParser The tagParser to set
	 */
	public static void setTagParser(TagParser tagParser) {
//todo: fix this		Tag.tagParser = tagParser;
	}

    /**
     * Determines if the given tag breaks the flow of text.
     * @return <code>true</code> if following text would start on a new line,
     * <code>false</code> otherwise.
     */
    public boolean breaksFlow ()
    {
        return (breakTags.contains (getText ().toUpperCase ()));
    }

    /**
     * This method verifies that the current tag matches the provided
     * filter. The match is based on the string object and not its contents,
     * so ensure that you are using static final filter strings provided
     * in the tag classes.
     * @see org.htmlparser.Node#collectInto(NodeList, String)
     */
	public void collectInto(NodeList collectionList, String filter) {
		if (thisScanner!=null && thisScanner.getFilter()==filter) 
			collectionList.add(this);
	}

	/**
	 * Returns table of attributes in the tag
	 * @return Hashtable
	 * @deprecated This method is deprecated. Use getAttributes() instead.
	 */
	public Hashtable getParsed() {
		return getAttributes ();
	}

	/**
	 * Sometimes, a scanner may need to request a re-evaluation of the
	 * attributes in a tag. This may happen when there is some correction
	 * activity. An example of its usage can be found in ImageTag.
	 * <br>
	 * <B>Note:</B> This is an intensive task, hence call only when
	 * really necessary.
	 * @return Hashtable
	 */
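	public Hashtable redoParseAttributes()
    {
        // todo: fix this; re-parsing is not implemented here, and resetting
        // mAttributes to null would make getAttributes() throw a
        // NullPointerException
		return (getAttributes ());
	}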

	public void accept(NodeVisitor visitor) {
// todo: fix this		visitor.visitTag(this);
	}

	public String getType() {
		return TYPE;
	}

	/**
	 * Is this an empty xml tag of the form<br>
	 * &lt;tag/&gt; 
	 * @return boolean
	 */
	public boolean isEmptyXmlTag() {
		return emptyXmlTag;
	}

	public void setEmptyXmlTag(boolean emptyXmlTag) {
		this.emptyXmlTag = emptyXmlTag;
	}

}
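
The quote handling in TagNode.getAttributes and TagNode.setAttributes above can be sketched in isolation. The sketch below models only what is visible in this file: the sentinel strings and the wrapping and unwrapping of quotes. The class QuoteEncodingSketch and the helpers encode, quoteOf and strip are hypothetical names, and any translation that SpecialHashtable may perform is not modeled.

```java
// Hypothetical stand-alone sketch; not part of the committed sources.
final class QuoteEncodingSketch
{
    // same values as TagNode.NULLVALUE and TagNode.NOTHING above
    static final String NULLVALUE = "$<NULL>$";
    static final String NOTHING = "$<NOTHING>$";

    /** Mirrors how getAttributes() turns an Attribute into a table value. */
    static String encode (String value, char quote)
    {
        if ('\'' == quote)
            value = "'" + value + "'";
        else if ('"' == quote)
            value = "\"" + value + "\"";
        else if ((null != value) && value.equals (""))
            value = NOTHING;
        if (null == value)
            value = NULLVALUE;

        return (value);
    }

    /** Mirrors how setAttributes() detects the quote around a table value. */
    static char quoteOf (String stored)
    {
        if (2 <= stored.length ())
        {
            if (stored.startsWith ("'") && stored.endsWith ("'"))
                return ('\'');
            if (stored.startsWith ("\"") && stored.endsWith ("\""))
                return ('"');
        }

        return ((char)0);
    }

    /** Removes the surrounding quotes, if any, as setAttributes() does. */
    static String strip (String stored)
    {
        if (0 == quoteOf (stored))
            return (stored);

        return (stored.substring (1, stored.length () - 1));
    }

    public static void main (String[] args)
    {
        String stored = encode ("index.html", '"');
        System.out.println (stored);
        System.out.println (strip (stored));
        System.out.println (encode ("", (char)0));
        System.out.println (encode (null, (char)0));
    }
}
```

As in setAttributes above, this sketch does not translate the two sentinel strings back into an empty or null value, so only quoted and naked values survive a round trip unchanged.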

--- NEW FILE: package.html ---
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<HTML>
<HEAD>
<!--

  @(#)package.html	1.60 98/01/27

 HTMLParser Library v1_4_20030810 - A java-based parser for HTML
 Copyright (C) Dec 31, 2000 Somik Raha

 This library is free software; you can redistribute it and/or
 modify it under the terms of the GNU Lesser General Public
 License as published by the Free Software Foundation; either
 version 2.1 of the License, or (at your option) any later version.

 This library is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 Lesser General Public License for more details.

 You should have received a copy of the GNU Lesser General Public
 License along with this library; if not, write to the Free Software
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

 For any questions or suggestions, you can write to me at :
 Email :so...@in...

 Postal Address : 
 Somik Raha
 Extreme Programmer & Coach
 Industrial Logic Corporation
 2583 Cedar Street, Berkeley, 
 CA 94708, USA
 Website : http://www.industriallogic.com

-->
<TITLE>Nodes Package</TITLE>
</HEAD>
<BODY>
The nodes package will eventually contain the lexemes returned by the base level I/O subsystem.
<EM>It is currently under development.</EM>
There are three types of lexemes so far: <code>RemarkNode</code>, <code>StringNode</code> and
<code>TagNode</code>. Each <code>TagNode</code> object holds a list of
<code>Attribute</code> objects.<p>
The <code>Lexer</code> parses the HTML stream into a contiguous stream of these
tokens. They all implement the <code>Node</code> interface and are derived from the
<code>AbstractNode</code> class.
</BODY>
</HTML>
