htmlparser-cvs Mailing List for HTML Parser (Page 14)
Brought to you by:
derrickoswald
From: Derrick O. <der...@us...> - 2004-07-14 01:58:15
Update of /cvsroot/htmlparser/htmlparser/lib
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11717/lib

Added Files:
	sax2.jar
Log Message:
Implement rudimentary sax parser.
Currently exposes DOM parser via sax project (http://sourceforge.net/projects/sax) interfaces.

--- NEW FILE: sax2.jar ---
(This appears to be a binary file; contents omitted.)
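The log message above describes a parser that reports HTML through the SAX 2 interfaces. As a rough sketch of what a client of such a reader looks like, the Java below registers a ContentHandler and records the callbacks it receives. It is an illustration, not code from this commit: the names SaxSketch, RecordingHandler and parse are hypothetical, and the JDK's built-in SAX parser stands in for the htmlparser driver, so the input has to be well-formed markup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of the htmlparser sources.
class SaxSketch {

    /** Records each SAX callback it receives as a short string. */
    static class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            events.add("end:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Skip whitespace-only runs so the event list stays short.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }
    }

    /** Parses the markup and returns the recorded events in document order. */
    static List<String> parse(String markup) throws Exception {
        // Assumption: the JDK's default (non-namespace-aware) SAX parser is
        // used here in place of the reader added by this commit.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        RecordingHandler handler = new RecordingHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : parse("<html><body><p>Hello</p></body></html>")) {
            System.out.println(event);
        }
    }
}
```

Because the handler depends only on org.xml.sax types, the same handler could presumably be registered on any other XMLReader implementation.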
From: Derrick O. <der...@us...> - 2004-07-14 01:58:15
Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11717

Modified Files:
	build.xml
Log Message:
Implement rudimentary sax parser.
Currently exposes DOM parser via sax project (http://sourceforge.net/projects/sax) interfaces.

Index: build.xml
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v
retrieving revision 1.69
retrieving revision 1.70
diff -C2 -d -r1.69 -r1.70
*** build.xml 2 Jul 2004 00:49:26 -0000 1.69
--- build.xml 14 Jul 2004 01:58:02 -0000 1.70
***************
*** 121,124 ****
--- 121,125 ----
      <property name="junit.jar" value="${lib}/junit.jar"/>
      <property name="commons-logging.jar" value="${lib}/commons-logging.jar"/>
+     <property name="sax2.jar" value="${lib}/sax2.jar"/>
  
      <taskdef resource="checkstyletask.properties"
***************
*** 222,226 ****
  
    <target name="compile" description="compile all java files">
!     <javac srcdir="${src}" includes="org/htmlparser/**" excludes="org/htmlparser/tests/**,org/htmlparser/util/Generate.java" debug="on" classpath="src:${commons-logging.jar}"/>
    </target>
  
--- 223,227 ----
  
    <target name="compile" description="compile all java files">
!     <javac srcdir="${src}" includes="org/htmlparser/**" excludes="org/htmlparser/tests/**" debug="on" classpath="src:${commons-logging.jar}"/>
    </target>
  
***************
*** 246,253 ****
  
    <target name="compileparser" depends="compilelexer" description="compile parser java files">
!     <javac srcdir="${src}" debug="on" classpath="src:${commons-logging.jar}">
        <include name="org/htmlparser/**/*.java"/>
        <exclude name="org/htmlparser/tests/**"/>
-       <exclude name="org/htmlparser/util/Generate.java"/>
        <exclude name="org/htmlparser/lexerapplications/**/*.java"/>
      </javac>
--- 247,253 ----
  
    <target name="compileparser" depends="compilelexer" description="compile parser java files">
!     <javac srcdir="${src}" debug="on" classpath="src:${commons-logging.jar}:${sax2.jar}">
        <include name="org/htmlparser/**/*.java"/>
        <exclude name="org/htmlparser/tests/**"/>
        <exclude name="org/htmlparser/lexerapplications/**/*.java"/>
      </javac>
***************
*** 294,298 ****
          basedir="${src}"
          includes="**/*.class **/*.gif"
!         excludes="org/htmlparser/tests/**/*.class,org/htmlparser/util/Generate.class">
          <manifest>
            <attribute name="Main-Class" value="org.htmlparser.Parser"/>
--- 294,298 ----
          basedir="${src}"
          includes="**/*.class **/*.gif"
!         excludes="org/htmlparser/tests/**/*.class">
          <manifest>
            <attribute name="Main-Class" value="org.htmlparser.Parser"/>
***************
*** 342,345 ****
--- 342,346 ----
          <pathelement location="${junit.jar}"/>
          <pathelement location="${commons-logging.jar}"/>
+         <pathelement location="${sax2.jar}"/>
          <pathelement location="${java.home}/../lib/tools.jar"/>
        </classpath>
***************
*** 351,354 ****
--- 352,356 ----
          <pathelement location="${junit.jar}"/>
          <pathelement location="${commons-logging.jar}"/>
+         <pathelement location="${sax2.jar}"/>
          <pathelement location="${java.home}/../lib/tools.jar"/>
        </classpath>
***************
*** 418,423 ****
--- 420,427 ----
        <group title="Beans" packages="org.htmlparser.beans"/>
        <group title="Patterns" packages="org.htmlparser.visitors,org.htmlparser.nodeDecorators,org.htmlparser.filters"/>
+       <group title="Sax" packages="org.htmlparser.sax"/>
        <group title="Utility" packages="org.htmlparser.util,org.htmlparser.util.sort"/>
        <link href="http://java.sun.com/j2se/1.4.2/docs/api/"/>
+       <link href="http://www.saxproject.org/apidoc/"/>
      </javadoc>
      <copy file="${resources}/inherit.gif" tofile="${docs}/javadoc/resources/inherit.gif" overwrite="true"/>
From: Derrick O. <der...@us...> - 2004-07-14 01:58:15
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/sax
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11717/src/org/htmlparser/sax

Added Files:
	Attributes.java Feedback.java Locator.java XMLReader.java
	package.html
Log Message:
Implement rudimentary sax parser.
Currently exposes DOM parser via sax project (http://sourceforge.net/projects/sax) interfaces.

--- NEW FILE: Locator.java ---
// HTMLParser Library $Name: $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2004 Derrick Oswald
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/sax/Locator.java,v $
// $Author: derrickoswald $
// $Date: 2004/07/14 01:58:02 $
// $Revision: 1.1 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//

package org.htmlparser.sax;

import org.htmlparser.Parser;
import org.htmlparser.lexer.Lexer;

/**
 * Transforms character offsets into line and column in the HTML file.
 */
public class Locator implements org.xml.sax.Locator
{
    /**
     * Underlying parser object.
     */
    protected Parser mParser;

    /**
     * Creates a locator for the given parser.
     * @param parser The parser with the {@link org.htmlparser.lexer.Page Page} being accessed.
     */
    public Locator (Parser parser)
    {
        mParser = parser;
    }

    /**
     * Return the public identifier for the current document event.
     *
     * <p>The return value is the public identifier of the document
     * entity or of the external parsed entity in which the markup
     * triggering the event appears.</p>
     *
     * @return A string containing the public identifier, or
     * null if none is available.
     * @see #getSystemId
     */
    public String getPublicId ()
    {
        return (null); // I assume this would be <title></title>
    }

    /**
     * Return the system identifier for the current document event.
     *
     * <p>The return value is the system identifier of the document
     * entity or of the external parsed entity in which the markup
     * triggering the event appears.</p>
     *
     * <p>If the system identifier is a URL, the parser must resolve it
     * fully before passing it to the application. For example, a file
     * name must always be provided as a <em>file:...</em> URL, and other
     * kinds of relative URI are also resolved against their bases.</p>
     *
     * @return A string containing the system identifier, or null
     * if none is available.
     * @see #getPublicId
     */
    public String getSystemId ()
    {
        return (mParser.getURL ());
    }

    /**
     * Return the line number where the current document event ends.
     * Lines are delimited by line ends, which are defined in
     * the XML specification.
     *
     * <p><strong>Warning:</strong> The return value from the method
     * is intended only as an approximation for the sake of diagnostics;
     * it is not intended to provide sufficient information
     * to edit the character content of the original XML document.
     * In some cases, these "line" numbers match what would be displayed
     * as columns, and in others they may not match the source text
     * due to internal entity expansion. </p>
     *
     * <p>The return value is an approximation of the line number
     * in the document entity or external parsed entity where the
     * markup triggering the event appears.</p>
     *
     * <p>If possible, the SAX driver should provide the line position
     * of the first character after the text associated with the document
     * event. The first line is line 1.</p>
     *
     * @return The line number, or -1 if none is available.
     * @see #getColumnNumber
     */
    public int getLineNumber ()
    {
        Lexer lexer;

        lexer = mParser.getLexer ();
        return (lexer.getPage ().row (lexer.getCursor ()));
    }

    /**
     * Return the column number where the current document event ends.
     * This is one-based number of Java <code>char</code> values since
     * the last line end.
     *
     * <p><strong>Warning:</strong> The return value from the method
     * is intended only as an approximation for the sake of diagnostics;
     * it is not intended to provide sufficient information
     * to edit the character content of the original XML document.
     * For example, when lines contain combining character sequences, wide
     * characters, surrogate pairs, or bi-directional text, the value may
     * not correspond to the column in a text editor's display. </p>
     *
     * <p>The return value is an approximation of the column number
     * in the document entity or external parsed entity where the
     * markup triggering the event appears.</p>
     *
     * <p>If possible, the SAX driver should provide the line position
     * of the first character after the text associated with the document
     * event. The first column in each line is column 1.</p>
     *
     * @return The column number, or -1 if none is available.
     * @see #getLineNumber
     */
    public int getColumnNumber ()
    {
        Lexer lexer;

        lexer = mParser.getLexer ();
        return (lexer.getPage ().column (lexer.getCursor ()));
    }
}

--- NEW FILE: package.html ---
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<!--
    HTMLParser Library $Name: $ - A java-based parser for HTML
    http://sourceforge.org/projects/htmlparser
    Copyright (C) 2004 Derrick Oswald

    Revision Control Information

    $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/sax/package.html,v $
    $Author: derrickoswald $
    $Date: 2004/07/14 01:58:02 $
    $Revision: 1.1 $

    This library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
    License as published by the Free Software Foundation; either
    version 2.1 of the License, or (at your option) any later version.

    This library is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public
    License along with this library; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
-->
</head>
<body bgcolor="white">
The sax package implements a SAX (Simple API for XML) parser for HTML.
It uses the SAX 2 interfaces available from the
<A href="http://sourceforge.net/projects/sax/">sax</A> project.<br>
The HTML parser sax package is currently in it's infancy and just exposes
the DOM Parser via a SAX interface. The driver name is
"org.htmlparser.sax.XMLReader" and a simplistic test program is available in
the org.htmlparser.tests package as SAXTest.java.<br>
Some major pieces are missing, like namespace support (HTML files won't
generally have much in the way of namespaces), attribute type info,
resolvers and DTD handlers, among many other things.<br>
</body>
</html>

--- NEW FILE: XMLReader.java ---
// HTMLParser Library $Name: $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2004 Derrick Oswald
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/sax/XMLReader.java,v $
// $Author: derrickoswald $
// $Date: 2004/07/14 01:58:02 $
// $Revision: 1.1 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
// // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA // package org.htmlparser.sax; import java.io.IOException; import org.xml.sax.ContentHandler; import org.xml.sax.DTDHandler; import org.xml.sax.EntityResolver; import org.xml.sax.ErrorHandler; import org.xml.sax.InputSource; import org.xml.sax.SAXException; import org.xml.sax.SAXNotRecognizedException; import org.xml.sax.SAXNotSupportedException; import org.xml.sax.SAXParseException; import org.xml.sax.helpers.NamespaceSupport; import org.htmlparser.Node; import org.htmlparser.Parser; import org.htmlparser.Remark; import org.htmlparser.Tag; import org.htmlparser.Text; import org.htmlparser.util.DefaultParserFeedback; import org.htmlparser.util.NodeIterator; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; import org.htmlparser.util.ParserFeedback; /** * SAX parser. * Generates callbacks on the {@link ContentHandler} based on encountered nodes. * <br><em>Preliminary</em>. * <pre> * org.xml.sax.XMLReader reader = org.xml.sax.helpers.XMLReaderFactory.createXMLReader ("org.htmlparser.sax.XMLReader"); * org.xml.sax.ContentHandler content = new MyContentHandler (); * reader.setContentHandler (content); * org.xml.sax.ErrorHandler errors = new MyErrorHandler (); * reader.setErrorHandler (errors); * reader.parse ("http://cbc.ca"); * </pre> */ public class XMLReader implements org.xml.sax.XMLReader { /** * Determines if namespace handling is on. 
* All XMLReaders are required to recognize the feature names: * <ul> * <li><code>http://xml.org/sax/features/namespaces</code> - * a value of "true" indicates namespace URIs and unprefixed * local names for element and attribute names will be available</li> * <li><code>http://xml.org/sax/features/namespace-prefixes</code> - * a value of "true" indicates that XML qualified names (with * prefixes) and attributes (including xmlns* attributes) will * be available. * </ul> */ protected boolean mNameSpaces; // namespaces /** * Determines if namespace prefix handling is on. * @see #mNameSpaces */ protected boolean mNameSpacePrefixes; // namespace-prefixes /** * <em> not implemented</em> */ protected EntityResolver mEntityResolver; /** * <em> not implemented</em> */ protected DTDHandler mDTDHandler; /** * The content callback object. */ protected ContentHandler mContentHandler; /** * The error handler object. */ protected ErrorHandler mErrorHandler; /** * The underlying DOM parser. */ protected Parser mParser; /** * Namspace utility object. */ protected NamespaceSupport mSupport; /** * Qualified name parts. */ protected String mParts[]; /** * Create an SAX parser. */ public XMLReader () { mNameSpaces = true; mNameSpacePrefixes = false; mEntityResolver = null; mDTDHandler = null; mContentHandler = null; mErrorHandler = null; mSupport = new NamespaceSupport (); mSupport.pushContext (); mSupport.declarePrefix ("", "http://www.w3.org/TR/REC-html40"); // todo: // xmlns:html='http://www.w3.org/TR/REC-html40' // or xmlns:html='http://www.w3.org/1999/xhtml' mParts = new String[3]; } //////////////////////////////////////////////////////////////////// // Configuration. //////////////////////////////////////////////////////////////////// /** * Look up the value of a feature flag. * * <p>The feature name is any fully-qualified URI. It is * possible for an XMLReader to recognize a feature name but * temporarily be unable to return its value. 
* Some feature values may be available only in specific * contexts, such as before, during, or after a parse. * Also, some feature values may not be programmatically accessible. * (In the case of an adapter for SAX1 {@link Parser}, there is no * implementation-independent way to expose whether the underlying * parser is performing validation, expanding external entities, * and so forth.) </p> * * <p>All XMLReaders are required to recognize the * http://xml.org/sax/features/namespaces and the * http://xml.org/sax/features/namespace-prefixes feature names.</p> * * <p>Typical usage is something like this:</p> * * <pre> * XMLReader r = new MySAXDriver(); * * // try to activate validation * try { * r.setFeature("http://xml.org/sax/features/validation", true); * } catch (SAXException e) { * System.err.println("Cannot activate validation."); * } * * // register event handlers * r.setContentHandler(new MyContentHandler()); * r.setErrorHandler(new MyErrorHandler()); * * // parse the first document * try { * r.parse("http://www.foo.com/mydoc.xml"); * } catch (IOException e) { * System.err.println("I/O exception reading XML document"); * } catch (SAXException e) { * System.err.println("XML exception reading document."); * } * </pre> * * <p>Implementors are free (and encouraged) to invent their own features, * using names built on their own URIs.</p> * * @param name The feature name, which is a fully-qualified URI. * @return The current value of the feature (true or false). * @exception org.xml.sax.SAXNotRecognizedException If the feature * value can't be assigned or retrieved. * @exception org.xml.sax.SAXNotSupportedException When the * XMLReader recognizes the feature name but * cannot determine its value at this time. 
* @see #setFeature */ public boolean getFeature (String name) throws SAXNotRecognizedException, SAXNotSupportedException { boolean ret; if (name.equals ("http://xml.org/sax/features/namespaces")) ret = mNameSpaces; else if (name.equals ("http://xml.org/sax/features/namespace-prefixes")) ret = mNameSpacePrefixes; else throw new SAXNotSupportedException (name + " not yet understood"); return (ret); } /** * Set the value of a feature flag. * * <p>The feature name is any fully-qualified URI. It is * possible for an XMLReader to expose a feature value but * to be unable to change the current value. * Some feature values may be immutable or mutable only * in specific contexts, such as before, during, or after * a parse.</p> * * <p>All XMLReaders are required to support setting * http://xml.org/sax/features/namespaces to true and * http://xml.org/sax/features/namespace-prefixes to false.</p> * * @param name The feature name, which is a fully-qualified URI. * @param value The requested value of the feature (true or false). * @exception org.xml.sax.SAXNotRecognizedException If the feature * value can't be assigned or retrieved. * @exception org.xml.sax.SAXNotSupportedException When the * XMLReader recognizes the feature name but * cannot set the requested value. * @see #getFeature */ public void setFeature (String name, boolean value) throws SAXNotRecognizedException, SAXNotSupportedException { if (name.equals ("http://xml.org/sax/features/namespaces")) mNameSpaces = value; else if (name.equals ("http://xml.org/sax/features/namespace-prefixes")) mNameSpacePrefixes = value; else throw new SAXNotSupportedException (name + " not yet understood"); } /** * Look up the value of a property. * * <p>The property name is any fully-qualified URI. It is * possible for an XMLReader to recognize a property name but * temporarily be unable to return its value. 
* Some property values may be available only in specific * contexts, such as before, during, or after a parse.</p> * * <p>XMLReaders are not required to recognize any specific * property names, though an initial core set is documented for * SAX2.</p> * * <p>Implementors are free (and encouraged) to invent their own properties, * using names built on their own URIs.</p> * * @param name The property name, which is a fully-qualified URI. * @return The current value of the property. * @exception org.xml.sax.SAXNotRecognizedException If the property * value can't be assigned or retrieved. * @exception org.xml.sax.SAXNotSupportedException When the * XMLReader recognizes the property name but * cannot determine its value at this time. * @see #setProperty */ public Object getProperty (String name) throws SAXNotRecognizedException, SAXNotSupportedException { throw new SAXNotSupportedException (name + " not yet understood"); } /** * Set the value of a property. * * <p>The property name is any fully-qualified URI. It is * possible for an XMLReader to recognize a property name but * to be unable to change the current value. * Some property values may be immutable or mutable only * in specific contexts, such as before, during, or after * a parse.</p> * * <p>XMLReaders are not required to recognize setting * any specific property names, though a core set is defined by * SAX2.</p> * * <p>This method is also the standard mechanism for setting * extended handlers.</p> * * @param name The property name, which is a fully-qualified URI. * @param value The requested value for the property. * @exception org.xml.sax.SAXNotRecognizedException If the property * value can't be assigned or retrieved. * @exception org.xml.sax.SAXNotSupportedException When the * XMLReader recognizes the property name but * cannot set the requested value. 
*/ public void setProperty (String name, Object value) throws SAXNotRecognizedException, SAXNotSupportedException { throw new SAXNotSupportedException (name + " not yet understood"); } //////////////////////////////////////////////////////////////////// // Event handlers. //////////////////////////////////////////////////////////////////// /** * Allow an application to register an entity resolver. * * <p>If the application does not register an entity resolver, * the XMLReader will perform its own default resolution.</p> * * <p>Applications may register a new or different resolver in the * middle of a parse, and the SAX parser must begin using the new * resolver immediately.</p> * * @param resolver The entity resolver. * @see #getEntityResolver */ public void setEntityResolver (EntityResolver resolver) { mEntityResolver = resolver; } /** * Return the current entity resolver. * * @return The current entity resolver, or null if none * has been registered. * @see #setEntityResolver */ public EntityResolver getEntityResolver () { return (mEntityResolver); } /** * Allow an application to register a DTD event handler. * * <p>If the application does not register a DTD handler, all DTD * events reported by the SAX parser will be silently ignored.</p> * * <p>Applications may register a new or different handler in the * middle of a parse, and the SAX parser must begin using the new * handler immediately.</p> * * @param handler The DTD handler. * @see #getDTDHandler */ public void setDTDHandler (DTDHandler handler) { mDTDHandler = handler; } /** * Return the current DTD handler. * * @return The current DTD handler, or null if none * has been registered. * @see #setDTDHandler */ public DTDHandler getDTDHandler () { return (mDTDHandler); } /** * Allow an application to register a content event handler. 
* * <p>If the application does not register a content handler, all * content events reported by the SAX parser will be silently * ignored.</p> * * <p>Applications may register a new or different handler in the * middle of a parse, and the SAX parser must begin using the new * handler immediately.</p> * * @param handler The content handler. * @see #getContentHandler */ public void setContentHandler (ContentHandler handler) { mContentHandler = handler; } /** * Return the current content handler. * * @return The current content handler, or null if none * has been registered. * @see #setContentHandler */ public ContentHandler getContentHandler () { return (mContentHandler); } /** * Allow an application to register an error event handler. * * <p>If the application does not register an error handler, all * error events reported by the SAX parser will be silently * ignored; however, normal processing may not continue. It is * highly recommended that all SAX applications implement an * error handler to avoid unexpected bugs.</p> * * <p>Applications may register a new or different handler in the * middle of a parse, and the SAX parser must begin using the new * handler immediately.</p> * * @param handler The error handler. * @see #getErrorHandler */ public void setErrorHandler (ErrorHandler handler) { mErrorHandler = handler; } /** * Return the current error handler. * * @return The current error handler, or null if none * has been registered. * @see #setErrorHandler */ public ErrorHandler getErrorHandler () { return (mErrorHandler); } //////////////////////////////////////////////////////////////////// // Parsing. //////////////////////////////////////////////////////////////////// /** * Parse an XML document. 
* * <p>The application can use this method to instruct the XML * reader to begin parsing an XML document from any valid input * source (a character stream, a byte stream, or a URI).</p> * * <p>Applications may not invoke this method while a parse is in * progress (they should create a new XMLReader instead for each * nested XML document). Once a parse is complete, an * application may reuse the same XMLReader object, possibly with a * different input source. * Configuration of the XMLReader object (such as handler bindings and * values established for feature flags and properties) is unchanged * by completion of a parse, unless the definition of that aspect of * the configuration explicitly specifies other behavior. * (For example, feature flags or properties exposing * characteristics of the document being parsed.) * </p> * * <p>During the parse, the XMLReader will provide information * about the XML document through the registered event * handlers.</p> * * <p>This method is synchronous: it will not return until parsing * has ended. If a client application wants to terminate * parsing early, it should throw an exception.</p> * * @param input The input source for the top-level of the * XML document. * @exception org.xml.sax.SAXException Any SAX exception, possibly * wrapping another exception. * @exception java.io.IOException An IO exception from the parser, * possibly from a byte stream or character stream * supplied by the application. * @see org.xml.sax.InputSource * @see #parse(java.lang.String) * @see #setEntityResolver * @see #setDTDHandler * @see #setContentHandler * @see #setErrorHandler */ public void parse (InputSource input) throws IOException, SAXException { throw new SAXException ("parse (InputSource input) is not yet supported"); } /** * Parse an XML document from a system identifier (URI). * * <p>This method is a shortcut for the common case of reading a * document from a system identifier. 
It is the exact * equivalent of the following:</p> * * <pre> * parse(new InputSource(systemId)); * </pre> * * <p>If the system identifier is a URL, it must be fully resolved * by the application before it is passed to the parser.</p> * * @param systemId The system identifier (URI). * @exception org.xml.sax.SAXException Any SAX exception, possibly * wrapping another exception. * @exception java.io.IOException An IO exception from the parser, * possibly from a byte stream or character stream * supplied by the application. * @see #parse(org.xml.sax.InputSource) */ public void parse (String systemId) throws IOException, SAXException { Locator locator; ParserFeedback feedback; if (null != mContentHandler) try { mParser = new Parser (systemId); locator = new Locator (mParser); if (null != mErrorHandler) feedback = new Feedback (mErrorHandler, locator); else feedback = new DefaultParserFeedback (DefaultParserFeedback.QUIET); mParser.setFeedback (feedback); // OK, try a simplistic parse mContentHandler.setDocumentLocator (locator); try { mContentHandler.startDocument (); for (NodeIterator iterator = mParser.elements (); iterator.hasMoreNodes (); ) doSAX (iterator.nextNode ()); mContentHandler.endDocument (); } catch (SAXException se) { if (null != mErrorHandler) mErrorHandler.fatalError ( new SAXParseException ("contentHandler threw me", locator, se)); } } catch (ParserException pe) { if (null != mErrorHandler) mErrorHandler.fatalError ( new SAXParseException (pe.getMessage (), "", systemId, 0, 0)); } } /** * Process nodes recursively on the DocumentHandler. * Calls methods on the handler based on the type and whether it's an end tag. * Processes composite tags recursively. * Does rudimentary namespace processing according to the state of {@link #mNameSpaces} * and {@link #mNameSpacePrefixes}. * @param node The htmlparser node to traverse. 
*/ protected void doSAX (Node node) throws ParserException, SAXException { Tag tag; Tag end; if (node instanceof Remark) { String text = mParser.getLexer ().getPage ().getText (node.getStartPosition (), node.getEndPosition ()); mContentHandler.ignorableWhitespace (text.toCharArray (), 0, text.length ()); } else if (node instanceof Text) { String text = mParser.getLexer ().getPage ().getText (node.getStartPosition (), node.getEndPosition ()); mContentHandler.characters (text.toCharArray (), 0, text.length ()); } else if (node instanceof Tag) { tag = (Tag)node; if (mNameSpaces) mSupport.processName (tag.getTagName (), mParts, false); else { mParts[0] = ""; mParts[1] = ""; } if (mNameSpacePrefixes) mParts[2] = tag.getTagName (); else if (mNameSpaces) mParts[2] = ""; else mParts[2] = tag.getTagName (); mContentHandler.startElement ( mParts[0], // uri mParts[1], // local mParts[2], // raw new Attributes (tag, mSupport, mParts)); NodeList children = tag.getChildren (); if (null != children) for (int i = 0; i < children.size (); i++) doSAX (children.elementAt (i)); end = tag.getEndTag (); if (null != end) { if (mNameSpaces) mSupport.processName (end.getTagName (), mParts, false); else { mParts[0] = ""; mParts[1] = ""; } if (mNameSpacePrefixes) mParts[2] = end.getTagName (); else if (mNameSpaces) mParts[2] = ""; else mParts[2] = end.getTagName (); mContentHandler.endElement ( mParts[0], // uri mParts[1], // local mParts[2]); // raw } } } } --- NEW FILE: Feedback.java --- // HTMLParser Library $Name: $ - A java-based parser for HTML // http://sourceforge.org/projects/htmlparser // Copyright (C) 2004 Derrick Oswald // // Revision Control Information // // $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/sax/Feedback.java,v $ // $Author: derrickoswald $ // $Date: 2004/07/14 01:58:02 $ // $Revision: 1.1 $ // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the 
Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA // package org.htmlparser.sax; import org.xml.sax.ErrorHandler; import org.xml.sax.Locator; import org.xml.sax.SAXParseException; import org.htmlparser.util.ParserException; import org.htmlparser.util.ParserFeedback; import org.xml.sax.SAXException; /** * Mediates between the feedback mechanism of the htmlparser and an error handler. */ public class Feedback implements ParserFeedback { /** * The error handler to call back on. */ protected ErrorHandler mErrorHandler; /** * The locator for tag positions. */ protected Locator mLocator; /** * Create a feedback/error handler mediator. * @param handler The callback object. * @param locator A locator for error locations. */ public Feedback (ErrorHandler handler, Locator locator) { mErrorHandler = handler; mLocator = locator; } /** * <em>Just eats the info message.</em> * @param message {@inheritDoc} */ public void info (String message) { // swallow } /** * Calls {@link ErrorHandler#warning(SAXParseException) ErrorHandler.warning}. * @param message {@inheritDoc} */ public void warning (String message) { try { mErrorHandler.warning ( new SAXParseException (message, mLocator)); } catch (SAXException se) { se.printStackTrace (); } } /** * Calls {@link ErrorHandler#error(SAXParseException) ErrorHandler.error}. 
* @param message {@inheritDoc} */ public void error (String message, ParserException e) { try { mErrorHandler.error ( new SAXParseException (message, mLocator, e)); } catch (SAXException se) { se.printStackTrace (); } } } --- NEW FILE: Attributes.java --- // HTMLParser Library $Name: $ - A java-based parser for HTML // http://sourceforge.org/projects/htmlparser // Copyright (C) 2004 Derrick Oswald // // Revision Control Information // // $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/sax/Attributes.java,v $ // $Author: derrickoswald $ // $Date: 2004/07/14 01:58:02 $ // $Revision: 1.1 $ // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA // package org.htmlparser.sax; import java.util.Vector; import org.htmlparser.Attribute; import org.htmlparser.Tag; import org.xml.sax.helpers.NamespaceSupport; /** * Provides access to the tag attributes. */ public class Attributes implements org.xml.sax.Attributes { /** * The tag from which attributes are exposed. */ protected Tag mTag; /** * The utility class that converts namespaces. */ protected NamespaceSupport mSupport; /** * Elements of the qname. * Allocated once for all uses of {@link #mSupport}. */ protected String[] mParts; /** * Create an attibute access object. * @param tag The tag to expose. * @param support The namespace converter. 
* @param parts The elements of the qualified name. */ public Attributes (Tag tag, NamespaceSupport support, String[] parts) { mTag = tag; mSupport = support; mParts = parts; } //////////////////////////////////////////////////////////////////// // Indexed access. //////////////////////////////////////////////////////////////////// /** * Return the number of attributes in the list. * * <p>Once you know the number of attributes, you can iterate * through the list.</p> * * @return The number of attributes in the list. * @see #getURI(int) * @see #getLocalName(int) * @see #getQName(int) * @see #getType(int) * @see #getValue(int) */ public int getLength () { return (mTag.getAttributesEx ().size () - 1); } /** * Look up an attribute's Namespace URI by index. * * @param index The attribute index (zero-based). * @return The Namespace URI, or the empty string if none * is available, or null if the index is out of * range. * @see #getLength */ public String getURI (int index) { mSupport.processName (getQName (index), mParts, true); return (mParts[0]); } /** * Look up an attribute's local name by index. * * @param index The attribute index (zero-based). * @return The local name, or the empty string if Namespace * processing is not being performed, or null * if the index is out of range. * @see #getLength */ public String getLocalName (int index) { mSupport.processName (getQName (index), mParts, true); return (mParts[1]); } /** * Look up an attribute's XML qualified (prefixed) name by index. * * @param index The attribute index (zero-based). * @return The XML qualified name, or the empty string * if none is available, or null if the index * is out of range. * @see #getLength */ public String getQName (int index) { Attribute attribute; String ret; attribute = (Attribute)(mTag.getAttributesEx ().elementAt (index + 1)); if (attribute.isWhitespace ()) ret = "#text"; else ret = attribute.getName (); return (ret); } /** * Look up an attribute's type by index. 
* * <p>The attribute type is one of the strings "CDATA", "ID", * "IDREF", "IDREFS", "NMTOKEN", "NMTOKENS", "ENTITY", "ENTITIES", * or "NOTATION" (always in upper case).</p> * * <p>If the parser has not read a declaration for the attribute, * or if the parser does not report attribute types, then it must * return the value "CDATA" as stated in the XML 1.0 Recommendation * (clause 3.3.3, "Attribute-Value Normalization").</p> * * <p>For an enumerated attribute that is not a notation, the * parser will report the type as "NMTOKEN".</p> * * @param index The attribute index (zero-based). * @return The attribute's type as a string, or null if the * index is out of range. * @see #getLength */ public String getType (int index) { return ("CDATA"); } /** * Look up an attribute's value by index. * * <p>If the attribute value is a list of tokens (IDREFS, * ENTITIES, or NMTOKENS), the tokens will be concatenated * into a single string with each token separated by a * single space.</p> * * @param index The attribute index (zero-based). * @return The attribute's value as a string, or null if the * index is out of range. * @see #getLength */ public String getValue (int index) { Attribute attribute; String ret; attribute = (Attribute)(mTag.getAttributesEx ().elementAt (index + 1)); ret = attribute.getValue (); if (null == ret) ret = ""; return (ret); } //////////////////////////////////////////////////////////////////// // Name-based query. //////////////////////////////////////////////////////////////////// /** * Look up the index of an attribute by Namespace name. * * @param uri The Namespace URI, or the empty string if * the name has no Namespace URI. * @param localName The attribute's local name. * @return The index of the attribute, or -1 if it does not * appear in the list. 
*/ public int getIndex (String uri, String localName) { Vector attributes; int size; Attribute attribute; String string; int ret; ret = -1; attributes = mTag.getAttributesEx (); if (null != attributes) { size = attributes.size (); for (int i = 1; i < size; i++) { attribute = (Attribute)attributes.elementAt (i); string = attribute.getName (); if (null != string) // not whitespace { mSupport.processName (string, mParts, true); if ( uri.equals (mParts[0]) & localName.equalsIgnoreCase (mParts[1])) { ret = i; i = size; // exit fast } } } } return (ret); } /** * Look up the index of an attribute by XML qualified (prefixed) name. * * @param qName The qualified (prefixed) name. * @return The index of the attribute, or -1 if it does not * appear in the list. */ public int getIndex (String qName) { mSupport.processName (qName, mParts, true); return (getIndex (mParts[0], mParts[1])); } /** * Look up an attribute's type by Namespace name. * * <p>See {@link #getType(int) getType(int)} for a description * of the possible types.</p> * * @param uri The Namespace URI, or the empty String if the * name has no Namespace URI. * @param localName The local name of the attribute. * @return The attribute type as a string, or null if the * attribute is not in the list or if Namespace * processing is not being performed. */ public String getType (String uri, String localName) { return (null); } /** * Look up an attribute's type by XML qualified (prefixed) name. * * <p>See {@link #getType(int) getType(int)} for a description * of the possible types.</p> * * @param qName The XML qualified name. * @return The attribute type as a string, or null if the * attribute is not in the list or if qualified names * are not available. */ public String getType (String qName) { return (null); } /** * Look up an attribute's value by Namespace name. 
* * <p>See {@link #getValue(int) getValue(int)} for a description * of the possible values.</p> * * @param uri The Namespace URI, or the empty String if the * name has no Namespace URI. * @param localName The local name of the attribute. * @return The attribute value as a string, or null if the * attribute is not in the list. */ public String getValue (String uri, String localName) { return (mTag.getAttribute (localName)); } /** * Look up an attribute's value by XML qualified (prefixed) name. * * <p>See {@link #getValue(int) getValue(int)} for a description * of the possible values.</p> * * @param qName The XML qualified name. * @return The attribute value as a string, or null if the * attribute is not in the list or if qualified names * are not available. */ public String getValue (String qName) { mSupport.processName (qName, mParts, true); return (getValue (mParts[0], mParts[1])); } } |
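The indexed accessors in Attributes.java read element index + 1 of the vector returned by getAttributesEx(), and getLength() subtracts one from its size, which suggests that element 0 holds the tag name rather than an attribute. A minimal sketch of that offset convention follows. It uses a plain List of names in place of an htmlparser Tag and ignores whitespace pseudo-attributes; the class AttributeIndexSketch and its methods are hypothetical and not part of the library.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a tag's attribute vector; element 0 is the tag name. */
public class AttributeIndexSketch
{
    private final List<String> names;

    public AttributeIndexSketch (List<String> names)
    {
        this.names = names;
    }

    /** SAX-style attribute count: everything except the tag name. */
    public int getLength ()
    {
        return (names.size () - 1);
    }

    /** SAX-style zero-based access maps to vector position index + 1. */
    public String getQName (int index)
    {
        return (names.get (index + 1));
    }

    /** Scan vector positions 1..size-1 and convert the hit back to a SAX index. */
    public int getIndex (String qName)
    {
        for (int i = 1; i < names.size (); i++)
            if (qName.equalsIgnoreCase (names.get (i)))
                return (i - 1); // vector position -> SAX index
        return (-1);
    }

    public static void main (String[] args)
    {
        AttributeIndexSketch atts = new AttributeIndexSketch (
            Arrays.asList ("A", "href", "target"));
        int index = atts.getIndex ("target");
        System.out.println (atts.getLength () + " attributes; target at "
            + index + " -> " + atts.getQName (index));
    }
}
```

Under this convention getQName (getIndex (name)) round-trips. A lookup that returned the raw vector position instead would be off by one relative to the indexed accessors.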
From: Derrick O. <der...@us...> - 2004-07-14 01:58:15
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11717/src/org/htmlparser/tests Added Files: SAXTest.java Log Message: Implement rudimentary sax parser. Currently exposes DOM parser via sax project (http://sourceforge.net/projects/sax) interfaces. --- NEW FILE: SAXTest.java --- // SAXTest.java - test application for SAX2 package org.htmlparser.tests; import java.io.IOException; import java.net.MalformedURLException; import java.net.URL; import org.xml.sax.Attributes; import org.xml.sax.ContentHandler; import org.xml.sax.ErrorHandler; import org.xml.sax.Locator; import org.xml.sax.SAXException; import org.xml.sax.SAXNotRecognizedException; import org.xml.sax.SAXNotSupportedException; import org.xml.sax.SAXParseException; import org.xml.sax.XMLReader; import org.xml.sax.helpers.XMLReaderFactory; /** * Test class for SAX2. */ public class SAXTest implements ContentHandler, ErrorHandler { //////////////////////////////////////////////////////////////////// // Main app. //////////////////////////////////////////////////////////////////// /** * Main application entry point. 
*/ public static void main (String args[]) { System.out.println("************************************" + "************************************"); System.out.println("* Testing SAX2"); System.out.println("************************************" + "************************************"); System.out.print("\n"); // // Figure out the XML reader // // String driverName = // System.getProperty("org.xml.sax.driver", // "org.apache.xerces.parsers.SAXParser"); String driverName = "org.htmlparser.sax.XMLReader"; System.out.println("SAX driver class: " + driverName + "\n (you can specify a different one using the " + "org.xml.sax.driver property)"); System.out.print("\n"); // // Create the XML reader // System.out.println("Now, we'll try to create an instance of the " + "driver, using XMLReaderFactory"); XMLReader reader = null; try { reader = XMLReaderFactory.createXMLReader(driverName); } catch (SAXException e) { System.out.println("Failed to create XMLReader: " + e.getMessage() + "\nMake sure that the class actually " + "exists and is present on your CLASSPATH" + "\nor specify a different class using the " + "org.xml.sax.driver property"); System.exit(1); } System.out.println("XMLReader created successfully\n"); // // Check features. // System.out.println("Checking defaults for some well-known features:"); checkFeature(reader, "http://xml.org/sax/features/namespaces"); checkFeature(reader, "http://xml.org/sax/features/namespace-prefixes"); checkFeature(reader, "http://xml.org/sax/features/string-interning"); checkFeature(reader, "http://xml.org/sax/features/validation"); checkFeature(reader, "http://xml.org/sax/features/external-general-entities"); checkFeature(reader, "http://xml.org/sax/features/external-parameter-entities"); System.out.print("\n"); // // Assign handlers. // System.out.println("Creating and assigning handlers\n"); SAXTest handler = new SAXTest(); reader.setContentHandler(handler); reader.setErrorHandler(handler); // // Parse documents. 
// if (args.length > 0) { for (int i = 0; i < args.length; i++) { String systemId = makeAbsoluteURL(args[i]); System.out.println("Trying file " + systemId); try { reader.parse(systemId); } catch (SAXException e1) { System.out.println(systemId + " failed with XML error: " + e1.getMessage()); } catch (IOException e2) { System.out.println(systemId + " failed with I/O error: " + e2.getMessage()); } System.out.print("\n"); } } else { System.out.println("No documents supplied on command line; " + "parsing skipped."); } // // Done. // System.out.println("SAX2 test finished."); } /** * Check and display the value of a feature. */ private static void checkFeature (XMLReader reader, String name) { try { System.out.println(" " + name + " = " + reader.getFeature(name)); } catch (SAXNotRecognizedException e) { System.out.println("XMLReader does not recognize feature " + name); } catch (SAXNotSupportedException e) { System.out.println("XMLReader recognizes feature " + name + " but does not support checking its value"); } } /** * Construct an absolute URL if necessary. * * This method is useful for relative file paths on a command * line; it converts them to absolute file: URLs, using the * correct path separator. This method is based on an * original suggestion by James Clark. * * @param url The (possibly relative) URL. * @return An absolute URL of some sort. 
*/ private static String makeAbsoluteURL (String url) { URL baseURL; String currentDirectory = System.getProperty("user.dir"); String fileSep = System.getProperty("file.separator"); String file = currentDirectory.replace(fileSep.charAt(0), '/') + '/'; if (file.charAt(0) != '/') { file = "/" + file; } try { baseURL = new URL("file", null, file); return new URL(baseURL, url).toString(); } catch (MalformedURLException e) { System.err.println(url + ": " + e.getMessage()); return url; } } private static String makeNSName (String uri, String localName, String qName) { if (uri.equals("")) uri = "[none]"; if (localName.equals("")) localName = "[none]"; if (qName.equals("")) qName = "[none]"; return uri + '/' + localName + '/' + qName; } private static String escapeData (char ch[], int start, int length) { StringBuffer buf = new StringBuffer(); for (int i = start; i < start + length; i++) { switch(ch[i]) { case '\n': buf.append("\\n"); break; case '\t': buf.append("\\t"); break; case '\r': buf.append("\\r"); break; default: buf.append(ch[i]); break; } } return buf.toString(); } //////////////////////////////////////////////////////////////////// // Implementation of org.xml.sax.ContentHandler. 
//////////////////////////////////////////////////////////////////// public void setDocumentLocator (Locator locator) { System.out.println(" EVENT: setDocumentLocator"); } public void startDocument () throws SAXException { System.out.println(" EVENT: startDocument"); } public void endDocument () throws SAXException { System.out.println(" EVENT: endDocument"); } public void startPrefixMapping (String prefix, String uri) throws SAXException { System.out.println(" EVENT: startPrefixMapping " + prefix + " = " + uri); } public void endPrefixMapping (String prefix) throws SAXException { System.out.println(" EVENT: endPrefixMapping " + prefix); } public void startElement (String namespaceURI, String localName, String qName, Attributes atts) throws SAXException { System.out.println(" EVENT: startElement " + makeNSName(namespaceURI, localName, qName)); int attLen = atts.getLength(); for (int i = 0; i < attLen; i++) { char ch[] = atts.getValue(i).toCharArray(); System.out.println(" Attribute " + makeNSName(atts.getURI(i), atts.getLocalName(i), atts.getQName(i)) + '=' + escapeData(ch, 0, ch.length)); } } public void endElement (String namespaceURI, String localName, String qName) throws SAXException { System.out.println(" EVENT: endElement " + makeNSName(namespaceURI, localName, qName)); } public void characters (char ch[], int start, int length) throws SAXException { System.out.println(" EVENT: characters " + escapeData(ch, start, length)); } public void ignorableWhitespace (char ch[], int start, int length) throws SAXException { System.out.println(" EVENT: ignorableWhitespace " + escapeData(ch, start, length)); } public void processingInstruction (String target, String data) throws SAXException { System.out.println(" EVENT: processingInstruction " + target + ' ' + data); } public void skippedEntity (String name) throws SAXException { System.out.println(" EVENT: skippedEntity " + name); } //////////////////////////////////////////////////////////////////// // Implementation 
of org.xml.sax.ErrorHandler. //////////////////////////////////////////////////////////////////// public void warning (SAXParseException e) throws SAXException { System.out.println(" EVENT: warning " + e.getMessage() + ' ' + e.getSystemId() + ' ' + e.getLineNumber() + ' ' + e.getColumnNumber()); } public void error (SAXParseException e) throws SAXException { System.out.println(" EVENT: error " + e.getMessage() + ' ' + e.getSystemId() + ' ' + e.getLineNumber() + ' ' + e.getColumnNumber()); } public void fatalError (SAXParseException e) throws SAXException { System.out.println(" EVENT: fatal error " + e.getMessage() + ' ' + e.getSystemId() + ' ' + e.getLineNumber() + ' ' + e.getColumnNumber()); } } // end of SAXTest.java |
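SAXTest prints every callback it receives; a smaller handler can make the ContentHandler contract easier to see. The sketch below counts elements and attributes by extending DefaultHandler. To stay self-contained it is driven by the JDK's built-in SAX parser on a well-formed string, not by org.htmlparser.sax.XMLReader, so it says nothing about how the htmlparser driver treats real-world HTML. The class name ElementCounter and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that counts startElement events and their attributes. */
public class ElementCounter extends DefaultHandler
{
    private int elements;
    private int attributes;

    @Override
    public void startElement (String uri, String localName, String qName, Attributes atts)
    {
        elements++;
        attributes += atts.getLength ();
    }

    public int getElements ()
    {
        return (elements);
    }

    public int getAttributes ()
    {
        return (attributes);
    }

    /** Parse a string with the JDK's default SAX parser (an assumption of this sketch). */
    public static ElementCounter count (String xml)
    {
        try
        {
            XMLReader reader = SAXParserFactory.newInstance ().newSAXParser ().getXMLReader ();
            ElementCounter handler = new ElementCounter ();
            reader.setContentHandler (handler);
            reader.setErrorHandler (handler);
            reader.parse (new InputSource (new StringReader (xml)));
            return (handler);
        }
        catch (Exception e)
        {
            throw new IllegalStateException ("parse failed: " + e.getMessage (), e);
        }
    }

    public static void main (String[] args)
    {
        ElementCounter counter = count ("<html><body><a href='x'>link</a></body></html>");
        System.out.println (counter.getElements () + " elements, "
            + counter.getAttributes () + " attributes");
    }
}
```

The same handler could in principle be handed to a reader created the way SAXTest does it, through XMLReaderFactory.createXMLReader with the driver class name, provided that class is on the classpath.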
From: Derrick O. <der...@us...> - 2004-07-13 23:36:11
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/sax
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18170/sax

Log Message:
Directory /cvsroot/htmlparser/htmlparser/src/org/htmlparser/sax added to the repository
|
From: Derrick O. <der...@us...> - 2004-07-13 01:02:48
|
Update of /cvsroot/htmlparser/htmlparser/docs In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28121/docs Modified Files: contributors.html Log Message: Add fix to Page.getContentType() suggested by Manuel Polo. Index: contributors.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/contributors.html,v retrieving revision 1.11 retrieving revision 1.12 diff -C2 -d -r1.11 -r1.12 *** contributors.html 26 Jun 2004 11:25:00 -0000 1.11 --- contributors.html 13 Jul 2004 01:02:39 -0000 1.12 *************** *** 396,400 **** </tr> </table> ! <p>Thanks to Enrico Triolo, Gernot Fricke, Nick Burch, Stephen Harrington, Domenico Lordi, Kamen, John Zook, Cheng Jun, Mazlan Mat, Rob Shields, Wolfgang Germund, Raj Sharma, Robert Kausch, Gordon Deudney, Serge Kruppa, Roger Kjensrud, and Manpreet Singh --- 396,401 ---- </tr> </table> ! <p>Thanks to Manuel Polo, Enrico Triolo, Gernot Fricke, Nick Burch, ! Stephen Harrington, Domenico Lordi, Kamen, John Zook, Cheng Jun, Mazlan Mat, Rob Shields, Wolfgang Germund, Raj Sharma, Robert Kausch, Gordon Deudney, Serge Kruppa, Roger Kjensrud, and Manpreet Singh |
From: Derrick O. <der...@us...> - 2004-07-13 01:02:48
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28121/src/org/htmlparser/lexer

Modified Files:
    Page.java
Log Message:
Add fix to Page.getContentType() suggested by Manuel Polo.

Index: Page.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
retrieving revision 1.38
retrieving revision 1.39
diff -C2 -d -r1.38 -r1.39
*** Page.java 3 Jul 2004 13:56:08 -0000 1.38
--- Page.java 13 Jul 2004 01:02:38 -0000 1.39
***************
*** 460,463 ****
--- 460,464 ----
      {
          URLConnection connection;
+         String content;
          String ret;

***************
*** 465,469 ****
          connection = getConnection ();
          if (null != connection)
!             ret = connection.getContentType ();

          return (ret);
--- 466,474 ----
          connection = getConnection ();
          if (null != connection)
!         {
!             content = connection.getContentType ();
!             if (null != content)
!                 ret = content;
!         }

          return (ret);
|
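The Page.java change guards against URLConnection.getContentType() returning null, which the JDK documents as the result when the content type is not known. A minimal standalone sketch of the same guard is below. The initial value of ret is not visible in the diff, so the "text/html" default here is an assumption, and the class ContentTypeSketch is hypothetical.

```java
import java.net.URLConnection;

/** Hypothetical standalone version of the null guard added to Page.getContentType(). */
public class ContentTypeSketch
{
    /** Assumed default; the real Page class supplies its own initial value. */
    public static final String DEFAULT_CONTENT_TYPE = "text/html";

    public static String getContentType (URLConnection connection)
    {
        String ret = DEFAULT_CONTENT_TYPE;
        if (null != connection)
        {
            // getContentType() may return null when the header is absent
            String content = connection.getContentType ();
            if (null != content)
                ret = content;
        }
        return (ret);
    }

    public static void main (String[] args)
    {
        // no connection at all: the default is kept
        System.out.println (getContentType (null));
    }
}
```

Without the inner check, a connection lacking a Content-Type header would overwrite the default with null, and callers would then have to handle a null return themselves.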
From: Derrick O. <der...@us...> - 2004-07-03 13:56:49
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14744/util

Modified Files:
    ParserUtils.java
Log Message:
Further fix to bug #973137 Double-bytes characters are messed after parsing.
Created a proper String based source with the encoding only optionally specified.
A string is no longer converted to a byte array and then back to characters.

Index: ParserUtils.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/ParserUtils.java,v
retrieving revision 1.42
retrieving revision 1.43
diff -C2 -d -r1.42 -r1.43
*** ParserUtils.java 2 Jul 2004 00:49:32 -0000 1.42
--- ParserUtils.java 3 Jul 2004 13:56:09 -0000 1.43
***************
*** 27,31 ****
  package org.htmlparser.util;

- import java.io.ByteArrayInputStream;
  import java.io.UnsupportedEncodingException;
  import java.util.ArrayList;
--- 27,30 ----
***************
*** 1098,1103 ****
          Parser parser = new Parser();
          Lexer lexer = new Lexer();
!         String defCharSet = new Page().DEFAULT_CHARSET;
!         Page page = new Page(new ByteArrayInputStream(input.getBytes(defCharSet)), defCharSet);
          lexer.setPage(page);
          parser.setLexer(lexer);
--- 1097,1101 ----
          Parser parser = new Parser();
          Lexer lexer = new Lexer();
!         Page page = new Page(input);
          lexer.setPage(page);
          parser.setLexer(lexer);
|
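The removed code turned the input string into bytes using the default character set and then decoded those bytes again. A short sketch of why that round trip can damage double-byte text is below. It assumes the default character set was ISO-8859-1, which the diff does not show, and the class RoundTripSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demonstration of a lossy String -> bytes -> String round trip. */
public class RoundTripSketch
{
    /** Encode and decode with ISO-8859-1 (assumed to match the old default charset). */
    public static String roundTrip (String input)
    {
        // characters outside Latin-1 cannot be encoded and become '?'
        byte[] bytes = input.getBytes (StandardCharsets.ISO_8859_1);
        return (new String (bytes, StandardCharsets.ISO_8859_1));
    }

    public static void main (String[] args)
    {
        String input = "<p>\u65e5\u672c\u8a9e</p>"; // three CJK characters
        String mangled = roundTrip (input);
        System.out.println ("unchanged: " + input.equals (mangled));
        System.out.println ("after round trip: " + mangled);
    }
}
```

Handing the String to the page directly, as the new `new Page(input)` call does, avoids the encode step, so no character has to fit into a single-byte charset.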
From: Derrick O. <der...@us...> - 2004-07-03 13:56:31
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14744/lexer Modified Files: Page.java Source.java Added Files: InputStreamSource.java StringSource.java Log Message: Further fix to bug #973137 Double-bytes characters are messed after parsing. Created a proper String based source with the encoding only optionally specified. A string is no longer converted to a byte array and then back to characters. Index: Source.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Source.java,v retrieving revision 1.15 retrieving revision 1.16 diff -C2 -d -r1.15 -r1.16 *** Source.java 2 Jan 2004 16:24:53 -0000 1.15 --- Source.java 3 Jul 2004 13:56:08 -0000 1.16 *************** *** 27,52 **** package org.htmlparser.lexer; - import java.io.ByteArrayInputStream; import java.io.IOException; - import java.io.InputStream; - import java.io.InputStreamReader; - import java.io.ObjectInputStream; - import java.io.ObjectOutputStream; import java.io.Reader; import java.io.Serializable; ! import java.io.UnsupportedEncodingException; /** * A buffered source of characters. ! * A Source is very similar to a the following construct: * <pre> ! * new InputStreamReader (new BufferedInputStream (connection.getInputStream ()), charset) * </pre> ! * It differs from the above, in two ways: ! * <li>the fetching of bytes from the connection's input stream may be asynchronous</li> * <li>the character set may be changed, which resets the input stream</li> ! * */ ! public class Source extends Reader --- 27,50 ---- package org.htmlparser.lexer; import java.io.IOException; import java.io.Reader; import java.io.Serializable; ! ! import org.htmlparser.util.ParserException; /** * A buffered source of characters. ! * A Source is very similar to a Reader, like: * <pre> ! * new InputStreamReader (connection.getInputStream (), charset) * </pre> ! 
* It differs from the above, in three ways: ! * <ul> ! * <li>the fetching of bytes may be asynchronous</li> * <li>the character set may be changed, which resets the input stream</li> ! * <li>characters may be requested more than once, so in general they will be buffered</li> ! * </ul> */ ! public abstract class Source extends Reader *************** *** 55,258 **** { /** ! * An initial buffer size. ! */ ! public static int BUFFER_SIZE = 16384; ! ! /** ! * Return value when no more characters are left. */ public static final int EOF = -1; /** - * The stream of bytes. - */ - protected transient InputStream mStream; - - /** - * The character set in use. - */ - protected String mEncoding; - - /** - * The converter from bytes to characters. - */ - protected transient InputStreamReader mReader; - - /** - * The characters read so far. - */ - public /*volatile*/ char[] mBuffer; - - /** - * The number of valid bytes in the buffer. - */ - public /*volatile*/ int mLevel; - - /** - * The offset of the next byte returned by read(). - */ - public /*volatile*/ int mOffset; - - /** - * The bookmark. - */ - protected int mMark; - - /** - * Create a source of characters using the default character set. - * @param stream The stream of bytes to use. - */ - public Source (InputStream stream) - throws - UnsupportedEncodingException - { - this (stream, null, BUFFER_SIZE); - } - - /** - * Create a source of characters. - * @param stream The stream of bytes to use. - * @param charset The character set used in encoding the stream. - */ - public Source (InputStream stream, String charset) - throws - UnsupportedEncodingException - { - this (stream, charset, BUFFER_SIZE); - } - /** - * Create a source of characters. - * @param stream The stream of bytes to use. - * @param charset The character set used in encoding the stream. 
- */ - public Source (InputStream stream, String charset, int buffer_size) - throws - UnsupportedEncodingException - { - if (null == stream) - stream = new Stream (null); - mStream = stream; - if (null == charset) - { - mReader = new InputStreamReader (stream); - mEncoding = mReader.getEncoding (); - } - else - { - mEncoding = charset; - mReader = new InputStreamReader (stream, charset); - } - mBuffer = new char[buffer_size]; - mLevel = 0; - mOffset = 0; - mMark = -1; - } - - // - // Serialization support - // - - private void writeObject (ObjectOutputStream out) - throws - IOException - { - int offset; - char[] buffer; - - if (null != mStream) - { - // remember the offset, drain the input stream, restore the offset - offset = mOffset; - buffer = new char[4096]; - while (-1 != read (buffer)) - ; - mOffset = offset; - } - - out.defaultWriteObject (); - } - - private void readObject (ObjectInputStream in) - throws - IOException, - ClassNotFoundException - { - in.defaultReadObject (); - if (null != mBuffer) // buffer is null when destroy's been called - // pretend we're open, mStream goes null when exhausted - mStream = new ByteArrayInputStream (new byte[0]); - } - - /** - * Get the input stream being used. - * @return The current input stream. - */ - public InputStream getStream () - { - return (mStream); - } - - /** * Get the encoding being used to convert characters. * @return The current encoding. */ ! public String getEncoding () ! { ! return (mEncoding); ! } /** ! * Fetch more characters from the underlying reader. ! * Has no effect if the underlying reader has been drained. ! * @param min The minimum to read. ! * @exception IOException If the underlying reader read() throws one. */ ! protected void fill (int min) throws ! IOException ! { ! char[] buffer; ! int size; ! int read; ! ! if (null != mReader) // mReader goes null when it's been sucked dry ! { ! size = mBuffer.length - mLevel; // available space ! if (size < min) // oops, better get some buffer space ! 
{ ! // unknown length... keep doubling ! size = mBuffer.length * 2; ! read = mLevel + min; ! if (size < read) // or satisfy min, whichever is greater ! size = read; ! else ! min = size - mLevel; // read the max ! buffer = new char[size]; ! } ! else ! { ! buffer = mBuffer; ! min = size; ! } ! ! // read into the end of the 'new' buffer ! read = mReader.read (buffer, mLevel, min); ! if (-1 == read) ! { ! mReader.close (); ! mReader = null; ! } ! else ! { ! if (mBuffer != buffer) ! { // copy the bytes previously read ! System.arraycopy (mBuffer, 0, buffer, 0, mLevel); ! mBuffer = buffer; ! } ! mLevel += read; ! } ! // todo, should repeat on read shorter than original min ! } ! } // --- 53,84 ---- { /** ! * Return value when the source is exhausted. ! * Has a value of {@value}. */ public static final int EOF = -1; /** * Get the encoding being used to convert characters. * @return The current encoding. */ ! public abstract String getEncoding (); /** ! * Set the encoding to the given character set. ! * If the current encoding is the same as the requested encoding, ! * this method is a no-op. Otherwise any subsequent characters read from ! * this source will have been decoded using the given character set.<p> ! * If characters have already been consumed from this source, it is expected ! * that an exception will be thrown if the characters read so far would ! * be different if the encoding being set was used from the start. ! * @param character_set The character set to use to convert characters. ! * @exception ParserException If a character mismatch occurs between ! * characters already provided and those that would have been returned ! * had the new character set been in effect from the beginning. An ! * exception is also thrown if the character set is not recognized. */ ! public abstract void setEncoding (String character_set) throws ! ParserException; // *************** *** 262,350 **** /** * Does nothing. ! 
* It's supposed to close the stream, but use destroy() instead. * @see #destroy */ ! public void close () throws IOException ! { ! } /** * Read a single character. * This method will block until a character is available, ! * an I/O error occurs, or the end of the stream is reached. * @return The character read, as an integer in the range 0 to 65535 ! * (<tt>0x00-0xffff</tt>), or -1 if the end of the stream has ! * been reached * @exception IOException If an I/O error occurs. */ ! public int read () throws IOException ! { ! int ret; ! ! if (mLevel - mOffset < 1) ! { ! if (null == mStream) // mStream goes null on close() ! throw new IOException ("reader is closed"); ! fill (1); ! if (mOffset >= mLevel) ! ret = EOF; ! else ! ret = mBuffer[mOffset++]; ! } ! else ! ret = mBuffer[mOffset++]; ! ! return (ret); ! } /** * Read characters into a portion of an array. This method will block ! * until some input is available, an I/O error occurs, or the end of the ! * stream is reached. * @param cbuf Destination buffer * @param off Offset at which to start storing characters * @param len Maximum number of characters to read ! * @return The number of characters read, or -1 if the end of the ! * stream has been reached * @exception IOException If an I/O error occurs. */ ! public int read (char[] cbuf, int off, int len) throws IOException ! { ! int ret; ! ! if (null == mStream) // mStream goes null on close() ! throw new IOException ("reader is closed"); ! if ((null == cbuf) || (0 > off) || (0 > len)) ! throw new IOException ("illegal argument read (" ! + ((null == cbuf) ? "null" : "cbuf") ! + ", " + off + ", " + len + ")"); ! if (mLevel - mOffset < len) ! fill (len - (mLevel - mOffset)); // minimum to satisfy this request ! if (mOffset >= mLevel) ! ret = EOF; ! else ! { ! ret = Math.min (mLevel - mOffset, len); ! System.arraycopy (mBuffer, mOffset, cbuf, off, ret); ! mOffset += ret; ! } ! ! return (ret); ! } /** * Read characters into an array. 
* This method will block until some input is available, an I/O error occurs, ! * or the end of the stream is reached. * @param cbuf Destination buffer. ! * @return The number of characters read, or -1 if the end of the stream has ! * been reached. * @exception IOException If an I/O error occurs. */ ! public int read (char[] cbuf) throws IOException ! { ! return (read (cbuf, 0, cbuf.length)); ! } /** --- 88,138 ---- /** * Does nothing. ! * It's supposed to close the source, but use {@link #destroy} instead. * @see #destroy */ ! public abstract void close () throws IOException; /** * Read a single character. * This method will block until a character is available, ! * an I/O error occurs, or the source is exhausted. * @return The character read, as an integer in the range 0 to 65535 ! * (<tt>0x00-0xffff</tt>), or {@link #EOF} if the source is exhausted. * @exception IOException If an I/O error occurs. */ ! public abstract int read () throws IOException; /** * Read characters into a portion of an array. This method will block ! * until some input is available, an I/O error occurs, or the source is ! * exhausted. * @param cbuf Destination buffer * @param off Offset at which to start storing characters * @param len Maximum number of characters to read ! * @return The number of characters read, or {@link #EOF} if the esource is ! * exhausted. * @exception IOException If an I/O error occurs. */ ! public abstract int read (char[] cbuf, int off, int len) throws IOException; /** * Read characters into an array. * This method will block until some input is available, an I/O error occurs, ! * or the source is exhausted. * @param cbuf Destination buffer. ! * @return The number of characters read, or {@link #EOF} if the esource is ! * exhausted. * @exception IOException If an I/O error occurs. */ + public abstract int read (char[] cbuf) throws IOException; ! /** ! * Tell whether this source is ready to be read. ! 
* @return <code>true</code> if the next read() is guaranteed not to block ! * for input, <code>false</code> otherwise. ! * Note that returning false does not guarantee that the next read will block. ! * @exception IOException If an I/O error occurs. ! */ ! public abstract boolean ready () throws IOException; /** *************** *** 353,408 **** * @exception IllegalStateException If the source has been closed. */ ! public void reset () ! { ! if (null == mStream) // mStream goes null on close() ! throw new IllegalStateException ("source is closed"); ! if (-1 != mMark) ! mOffset = mMark; ! else ! mOffset = 0; ! } ! ! /** ! * Tell whether this stream supports the mark() operation. ! * @return <code>true</code> if and only if this stream supports the mark operation. ! */ ! public boolean markSupported () ! { ! return (true); ! } /** ! * Mark the present position in the stream. Subsequent calls to reset() ! * will attempt to reposition the stream to this point. Not all ! * character-input streams support the mark() operation. ! * @param readAheadLimit <em>Not used.</em> ! * @exception IOException <em>Never thrown</em>. ! * */ ! public void mark (int readAheadLimit) throws IOException ! { ! if (null == mStream) // mStream goes null on close() ! throw new IOException ("reader is closed"); ! mMark = mOffset; ! } /** ! * Tell whether this stream is ready to be read. ! * @return <code>true</code> if the next read() is guaranteed not to block ! * for input, <code>false</code> otherwise. ! * Note that returning false does not guarantee that the next read will block. ! * @exception IOException <em>Never thrown</em>. */ ! public boolean ready () throws IOException ! { ! if (null == mStream) // mStream goes null on close() ! throw new IOException ("reader is closed"); ! return (mOffset < mLevel); ! } /** * Skip characters. * This method will block until some characters are available, ! * an I/O error occurs, or the end of the stream is reached. 
* <em>Note: n is treated as an int</em> * @param n The number of characters to skip. --- 141,168 ---- * @exception IllegalStateException If the source has been closed. */ ! public abstract void reset (); /** ! * Tell whether this source supports the mark() operation. ! * @return <code>true</code> if and only if this source supports the mark ! * operation. */ ! public abstract boolean markSupported (); /** ! * Mark the present position. ! * Subsequent calls to {@link #reset} ! * will attempt to reposition the source to this point. Not all ! * sources support the mark() operation. ! * @param readAheadLimit The minimum number of characters that can be read ! * before this mark becomes invalid. ! * @exception IOException If an I/O error occurs. */ ! public abstract void mark (int readAheadLimit) throws IOException; /** * Skip characters. * This method will block until some characters are available, ! * an I/O error occurs, or the source is exhausted. * <em>Note: n is treated as an int</em> * @param n The number of characters to skip. *************** *** 411,432 **** * @exception IOException If an I/O error occurs. */ ! public long skip (long n) throws IOException ! { ! long ret; ! ! if (null == mStream) // mStream goes null on close() ! throw new IOException ("reader is closed"); ! if (mLevel - mOffset < n) ! fill ((int)(n - (mLevel - mOffset))); // minimum to satisfy this request ! if (mOffset >= mLevel) ! ret = EOF; ! else ! { ! ret = Math.min (mLevel - mOffset, n); ! mOffset += ret; ! } ! ! return (ret); ! } // --- 171,175 ---- * @exception IOException If an I/O error occurs. */ ! public abstract long skip (long n) throws IOException; // *************** *** 436,475 **** /** * Undo the read of a single character. ! * @exception IOException If no characters have been read. */ ! public void unread () throws IOException ! { ! if (0 < mOffset) ! mOffset--; ! else ! throw new IOException ("can't unread no characters"); ! } /** ! * Close the stream. 
Once a stream has been closed, further read(), ! * ready(), mark(), or reset() invocations will throw an IOException. ! * Closing a previously-closed stream, however, has no effect. ! * @exception IOException If an I/O error occurs */ ! public void destroy () throws IOException ! { ! mStream = null; ! if (null != mReader) ! mReader.close (); ! mReader = null; ! mBuffer = null; ! mLevel = 0; ! mOffset = 0; ! mMark = -1; ! } /** * Get the position (in characters). ! * @return The number of characters that have been read. */ ! public int offset () ! { ! return (mOffset); ! } /** --- 179,247 ---- /** * Undo the read of a single character. ! * @exception IOException If the source is closed or no characters have ! * been read. */ ! public abstract void unread () throws IOException; /** ! * Retrieve a character again. ! * @param offset The offset of the character. ! * @return The character at <code>offset</code>. ! * @exception IOException If the source is closed or the offset is beyond ! * {@link #offset()}. */ ! public abstract char getCharacter (int offset) throws IOException; ! ! /** ! * Retrieve characters again. ! * @param array The array of characters. ! * @param offset The starting position in the array where characters are to be placed. ! * @param start The starting position, zero based. ! * @param end The ending position ! * (exclusive, i.e. the character at the ending position is not included), ! * zero based. ! * @exception IOException If the source is closed or the start or end is ! * beyond {@link #offset()}. ! */ ! public abstract void getCharacters (char[] array, int offset, int start, int end) throws IOException; ! ! /** ! * Retrieve a string comprised of characters already read. ! * @param offset The offset of the first character. ! * @param length The number of characters to retrieve. ! * @return A string containing the <code>length</code> characters at <code>offset</code>. ! * @exception IOException If the source is closed. ! */ ! 
public abstract String getString (int offset, int length) throws IOException; ! ! /** ! * Append characters already read into a <code>StringBuffer</code>. ! * @param buffer The buffer to append to. ! * @param offset The offset of the first character. ! * @param length The number of characters to retrieve. ! * @return A string containing the <code>length</code> characters at <code>offset</code>. ! * @exception IOException If the source is closed or the offset or ! * (offset + length) is beyond {@link #offset()}. ! */ ! public abstract void getCharacters (StringBuffer buffer, int offset, int length) throws IOException; ! ! /** ! * Close the source. ! * Once a source has been closed, further {@link #read() read}, ! * {@link #ready ready}, {@link #mark mark}, {@link #reset reset}, ! * {@link #skip skip}, {@link #unread unread}, ! * {@link #getCharacter getCharacter} or {@link #getString getString} ! * invocations will throw an IOException. ! * Closing a previously-closed source, however, has no effect. ! * @exception IOException If an I/O error occurs. ! */ ! public abstract void destroy () throws IOException; /** * Get the position (in characters). ! * @return The number of characters that have already been read, or ! * {@link #EOF} if the source is closed. */ ! public abstract int offset (); /** *************** *** 477,483 **** * @return The number of characters that can be read without blocking. */ ! public int available () ! { ! return (mLevel - mOffset); ! } } --- 249,252 ---- * @return The number of characters that can be read without blocking. */ ! 
public abstract int available (); } --- NEW FILE: InputStreamSource.java --- // HTMLParser Library $Name: $ - A java-based parser for HTML // http://sourceforge.org/projects/htmlparser // Copyright (C) 2004 Derrick Oswald // // Revision Control Information // // $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/InputStreamSource.java,v $ // $Author: derrickoswald $ // $Date: 2004/07/03 13:56:08 $ // $Revision: 1.1 $ // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA // package org.htmlparser.lexer; import java.io.ByteArrayInputStream; import java.io.IOException; import java.io.InputStream; import java.io.InputStreamReader; import java.io.ObjectInputStream; import java.io.ObjectOutputStream; import java.io.Reader; import java.io.Serializable; import java.io.UnsupportedEncodingException; import org.htmlparser.util.EncodingChangeException; import org.htmlparser.util.ParserException; /** * A source of characters based on an InputStream such as from a URLConnection. */ public class InputStreamSource extends Source { /** * An initial buffer size. * Has a default value of 16384. */ public static int BUFFER_SIZE = 16384; /** * The stream of bytes. * Set to <code>null</code> when the source is closed. */ protected transient InputStream mStream; /** * The character set in use. 
*/ protected String mEncoding; /** * The converter from bytes to characters. */ protected transient InputStreamReader mReader; /** * The characters read so far. */ public /*volatile*/ char[] mBuffer; /** * The number of valid bytes in the buffer. */ public /*volatile*/ int mLevel; /** * The offset of the next byte returned by read(). */ public /*volatile*/ int mOffset; /** * The bookmark. */ protected int mMark; /** * Create a source of characters using the default character set. * @param stream The stream of bytes to use. * @exception UnsupportedEncodingException If the default character set is unsupported. */ public InputStreamSource (InputStream stream) throws UnsupportedEncodingException { this (stream, null, BUFFER_SIZE); } /** * Create a source of characters. * @param stream The stream of bytes to use. * @param charset The character set used in encoding the stream. * @exception UnsupportedEncodingException If the character set is unsupported. */ public InputStreamSource (InputStream stream, String charset) throws UnsupportedEncodingException { this (stream, charset, BUFFER_SIZE); } /** * Create a source of characters. * @param stream The stream of bytes to use. * @param charset The character set used in encoding the stream. * @param buffer_size The initial character buffer size. * @exception UnsupportedEncodingException If the character set is unsupported. 
*/ public InputStreamSource (InputStream stream, String charset, int buffer_size) throws UnsupportedEncodingException { if (null == stream) stream = new Stream (null); mStream = stream; if (null == charset) { mReader = new InputStreamReader (stream); mEncoding = mReader.getEncoding (); } else { mEncoding = charset; mReader = new InputStreamReader (stream, charset); } mBuffer = new char[buffer_size]; mLevel = 0; mOffset = 0; mMark = -1; } // // Serialization support // private void writeObject (ObjectOutputStream out) throws IOException { int offset; char[] buffer; if (null != mStream) { // remember the offset, drain the input stream, restore the offset offset = mOffset; buffer = new char[4096]; while (EOF != read (buffer)) ; mOffset = offset; } out.defaultWriteObject (); } private void readObject (ObjectInputStream in) throws IOException, ClassNotFoundException { in.defaultReadObject (); if (null != mBuffer) // buffer is null when destroy's been called // pretend we're open, mStream goes null when exhausted mStream = new ByteArrayInputStream (new byte[0]); } /** * Get the input stream being used. * @return The current input stream. */ public InputStream getStream () { return (mStream); } /** * Get the encoding being used to convert characters. * @return The current encoding. */ public String getEncoding () { return (mEncoding); } /** * Begins reading from the source with the given character set. * If the current encoding is the same as the requested encoding, * this method is a no-op. Otherwise any subsequent characters read from * this page will have been decoded using the given character set.<p> * Some magic happens here to obtain this result if characters have already * been consumed from this source. * Since a Reader cannot be dynamically altered to use a different character * set, the underlying stream is reset, a new Source is constructed * and a comparison made of the characters read so far with the newly * read characters up to the current position. 
* If a difference is encountered, or some other problem occurs, * an exception is thrown. * @param character_set The character set to use to convert bytes into * characters. * @exception ParserException If a character mismatch occurs between * characters already provided and those that would have been returned * had the new character set been in effect from the beginning. An * exception is also thrown if the underlying stream won't put up with * these shenanigans. */ public void setEncoding (String character_set) throws ParserException { String encoding; InputStream stream; char[] buffer; int offset; char[] new_chars; encoding = getEncoding (); if (!encoding.equalsIgnoreCase (character_set)) { stream = getStream (); try { buffer = mBuffer; offset = mOffset; stream.reset (); mEncoding = character_set; mReader = new InputStreamReader (stream, character_set); mBuffer = new char[mBuffer.length]; mLevel = 0; mOffset = 0; mMark = -1; if (0 != offset) { new_chars = new char[offset]; if (offset != read (new_chars)) throw new ParserException ("reset stream failed"); for (int i = 0; i < offset; i++) if (new_chars[i] != buffer[i]) throw new EncodingChangeException ("character mismatch (new: " + new_chars[i] + " != old: " + buffer[i] + ") for encoding change from " + encoding + " to " + character_set + " at character offset " + offset); } } catch (IOException ioe) { throw new ParserException (ioe.getMessage (), ioe); } } } /** * Fetch more characters from the underlying reader. * Has no effect if the underlying reader has been drained. * @param min The minimum to read. * @exception IOException If the underlying reader read() throws one. */ protected void fill (int min) throws IOException { char[] buffer; int size; int read; if (null != mReader) // mReader goes null when it's been sucked dry { size = mBuffer.length - mLevel; // available space if (size < min) // oops, better get some buffer space { // unknown length... 
keep doubling size = mBuffer.length * 2; read = mLevel + min; if (size < read) // or satisfy min, whichever is greater size = read; else min = size - mLevel; // read the max buffer = new char[size]; } else { buffer = mBuffer; min = size; } // read into the end of the 'new' buffer read = mReader.read (buffer, mLevel, min); if (EOF == read) { mReader.close (); mReader = null; } else { if (mBuffer != buffer) { // copy the bytes previously read System.arraycopy (mBuffer, 0, buffer, 0, mLevel); mBuffer = buffer; } mLevel += read; } // todo, should repeat on read shorter than original min } } // // Reader overrides // /** * Does nothing. * It's supposed to close the source, but use destroy() instead. * @see #destroy */ public void close () throws IOException { } /** * Read a single character. * This method will block until a character is available, * an I/O error occurs, or the end of the stream is reached. * @return The character read, as an integer in the range 0 to 65535 * (<tt>0x00-0xffff</tt>), or {@link #EOF EOF} if the end of the stream has * been reached * @exception IOException If an I/O error occurs. */ public int read () throws IOException { int ret; if (mLevel - mOffset < 1) { if (null == mStream) throw new IOException ("source is closed"); fill (1); if (mOffset >= mLevel) ret = EOF; else ret = mBuffer[mOffset++]; } else ret = mBuffer[mOffset++]; return (ret); } /** * Read characters into a portion of an array. This method will block * until some input is available, an I/O error occurs, or the end of the * stream is reached. * @param cbuf Destination buffer * @param off Offset at which to start storing characters * @param len Maximum number of characters to read * @return The number of characters read, or {@link #EOF EOF} if the end of * the stream has been reached * @exception IOException If an I/O error occurs. 
*/ public int read (char[] cbuf, int off, int len) throws IOException { int ret; if (null == mStream) throw new IOException ("source is closed"); if ((null == cbuf) || (0 > off) || (0 > len)) throw new IOException ("illegal argument read (" + ((null == cbuf) ? "null" : "cbuf") + ", " + off + ", " + len + ")"); if (mLevel - mOffset < len) fill (len - (mLevel - mOffset)); // minimum to satisfy this request if (mOffset >= mLevel) ret = EOF; else { ret = Math.min (mLevel - mOffset, len); System.arraycopy (mBuffer, mOffset, cbuf, off, ret); mOffset += ret; } return (ret); } /** * Read characters into an array. * This method will block until some input is available, an I/O error occurs, * or the end of the stream is reached. * @param cbuf Destination buffer. * @return The number of characters read, or {@link #EOF EOF} if the end of * the stream has been reached. * @exception IOException If an I/O error occurs. */ public int read (char[] cbuf) throws IOException { return (read (cbuf, 0, cbuf.length)); } /** * Reset the source. * Repositions the read point to begin at zero. * @exception IllegalStateException If the source has been closed. */ public void reset () { if (null == mStream) throw new IllegalStateException ("source is closed"); if (-1 != mMark) mOffset = mMark; else mOffset = 0; } /** * Tell whether this source supports the mark() operation. * @return <code>true</code>. */ public boolean markSupported () { return (true); } /** * Mark the present position in the source. * Subsequent calls to {@link #reset()} * will attempt to reposition the source to this point. * @param readAheadLimit <em>Not used.</em> * @exception IOException If the source is closed. * */ public void mark (int readAheadLimit) throws IOException { if (null == mStream) throw new IOException ("source is closed"); mMark = mOffset; } /** * Tell whether this source is ready to be read. * @return <code>true</code> if the next read() is guaranteed not to block * for input, <code>false</code> otherwise. 
* Note that returning false does not guarantee that the next read will block. * @exception IOException If the source is closed. */ public boolean ready () throws IOException { if (null == mStream) throw new IOException ("source is closed"); return (mOffset < mLevel); } /** * Skip characters. * This method will block until some characters are available, * an I/O error occurs, or the end of the stream is reached. * <em>Note: n is treated as an int</em> * @param n The number of characters to skip. * @return The number of characters actually skipped * @exception IllegalArgumentException If <code>n</code> is negative. * @exception IOException If an I/O error occurs. */ public long skip (long n) throws IOException { long ret; if (null == mStream) throw new IOException ("source is closed"); if (mLevel - mOffset < n) fill ((int)(n - (mLevel - mOffset))); // minimum to satisfy this request if (mOffset >= mLevel) ret = EOF; else { ret = Math.min (mLevel - mOffset, n); mOffset += ret; } return (ret); } // // Methods not in your Daddy's Reader // /** * Undo the read of a single character. * @exception IOException If the source is closed or no characters have * been read. */ public void unread () throws IOException { if (null == mStream) throw new IOException ("source is closed"); if (0 < mOffset) mOffset--; else throw new IOException ("can't unread no characters"); } /** * Retrieve a character again. * @param offset The offset of the character. * @return The character at <code>offset</code>. * @exception IOException If the offset is beyond {@link #offset()} or the * source is closed. */ public char getCharacter (int offset) throws IOException { char ret; if (null == mStream) throw new IOException ("source is closed"); if (offset >= mBuffer.length) throw new IOException ("illegal read ahead"); else ret = mBuffer[offset]; return (ret); } /** * Retrieve characters again. * @param array The array of characters. 
* @param offset The starting position in the array where characters are to be placed. * @param start The starting position, zero based. * @param end The ending position * (exclusive, i.e. the character at the ending position is not included), * zero based. * @exception IOException If the start or end is beyond {@link #offset()} * or the source is closed. */ public void getCharacters (char[] array, int offset, int start, int end) throws IOException { if (null == mStream) throw new IOException ("source is closed"); System.arraycopy (mBuffer, start, array, offset, end - start); } /** * Retrieve a string. * @param offset The offset of the first character. * @param length The number of characters to retrieve. * @return A string containing the <code>length</code> characters at <code>offset</code>. * @exception IOException If the offset or (offset + length) is beyond * {@link #offset()} or the source is closed. */ public String getString (int offset, int length) throws IOException { String ret; if (null == mStream) throw new IOException ("source is closed"); if (offset + length >= mBuffer.length) throw new IOException ("illegal read ahead"); else ret = new String (mBuffer, offset, length); return (ret); } /** * Append characters already read into a <code>StringBuffer</code>. * @param buffer The buffer to append to. * @param offset The offset of the first character. * @param length The number of characters to retrieve. * @return A string containing the <code>length</code> characters at <code>offset</code>. * @exception IOException If the offset or (offset + length) is beyond * {@link #offset()} or the source is closed. */ public void getCharacters (StringBuffer buffer, int offset, int length) throws IOException { if (null == mStream) throw new IOException ("source is closed"); buffer.append (mBuffer, offset, length); } /** * Close the source. 
* Once a source has been closed, further {@link #read() read}, * {@link #ready ready}, {@link #mark mark}, {@link #reset reset}, * {@link #skip skip}, {@link #unread unread}, * {@link #getCharacter getCharacter} or {@link #getString getString} * invocations will throw an IOException. * Closing a previously-closed source, however, has no effect. * @exception IOException If an I/O error occurs */ public void destroy () throws IOException { mStream = null; if (null != mReader) mReader.close (); mReader = null; mBuffer = null; mLevel = 0; mOffset = 0; mMark = -1; } /** * Get the position (in characters). * @return The number of characters that have already been read, or * {@link #EOF EOF} if the source is closed. */ public int offset () { int ret; if (null == mStream) ret = EOF; else ret = mOffset; return (ret); } /** * Get the number of available characters. * @return The number of characters that can be read without blocking or * zero if the source is closed. */ public int available () { int ret; if (null == mStream) ret = 0; else ret = mLevel - mOffset; return (ret); } } --- NEW FILE: StringSource.java --- // HTMLParser Library $Name: $ - A java-based parser for HTML // http://sourceforge.org/projects/htmlparser // Copyright (C) 2004 Derrick Oswald // // Revision Control Information // // $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/StringSource.java,v $ // $Author: derrickoswald $ // $Date: 2004/07/03 13:56:08 $ // $Revision: 1.1 $ // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. 
// // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA // package org.htmlparser.lexer; import java.io.IOException; import org.htmlparser.util.ParserException; /** * A source of characters based on a String. */ public class StringSource extends Source { /** * The source of characters. */ protected String mString; /** * The current offset into the string. */ protected int mOffset; /** * The encoding to report. * Only used by {@link #getEncoding}. */ protected String mEncoding; /** * The bookmark. */ protected int mMark; /** * Construct a source using the provided string. * Until it is set, the encoding will be reported as ISO-8859-1. * @param string The source of characters. */ public StringSource (String string) { this (string, "ISO-8859-1"); } /** * Construct a source using the provided string and encoding. * The encoding is only used by {@link #getEncoding}. * @param string The source of characters. * @param character_set The encoding to report. */ public StringSource (String string, String character_set) { mString = (null == string) ? "" : string; mOffset = 0; mEncoding = character_set; mMark = -1; } /** * Get the encoding being used to convert characters. * @return The current encoding. */ public String getEncoding () { return (mEncoding); } /** * Set the encoding to the given character set. * This simply sets the encoding reported by {@link #getEncoding}. * @param character_set The character set to use to convert characters. * @exception ParserException <em>Not thrown</em>. */ public void setEncoding (String character_set) throws ParserException { mEncoding = character_set; } // // Reader overrides // /** * Does nothing. * It's supposed to close the source, but use destroy() instead. * @see #destroy */ public void close () throws IOException { } /** * Read a single character. 
* @return The character read, as an integer in the range 0 to 65535 * (<tt>0x00-0xffff</tt>), or {@link #EOF EOF} if the source is exhausted. * @exception IOException If an I/O error occurs. */ public int read () throws IOException { int ret; if (null == mString) throw new IOException ("source is closed"); else if (mOffset >= mString.length ()) ret = EOF; else { ret = mString.charAt (mOffset); mOffset++; } return (ret); } /** * Read characters into a portion of an array. * @param cbuf Destination buffer * @param off Offset at which to start storing characters * @param len Maximum number of characters to read * @return The number of characters read, or {@link #EOF EOF} if the source * is exhausted. * @exception IOException If an I/O error occurs. */ public int read (char[] cbuf, int off, int len) throws IOException { int length; int ret; if (null == mString) throw new IOException ("source is closed"); else { length = mString.length (); if (mOffset >= length) ret = EOF; else { if (len > length - mOffset) len = length - mOffset; mString.getChars (mOffset, mOffset + len, cbuf, off); mOffset += len; ret = len; } } return (ret); } /** * Read characters into an array. * @param cbuf Destination buffer. * @return The number of characters read, or {@link #EOF EOF} if the source * is exhausted. * @exception IOException If an I/O error occurs. */ public int read (char[] cbuf) throws IOException { return (read (cbuf, 0, cbuf.length)); } /** * Tell whether this source is ready to be read. * @return Equivalent to a non-zero {@link #available()}, i.e. there are * still more characters to read. * @exception IOException Thrown if the source is closed. */ public boolean ready () throws IOException { if (null == mString) throw new IOException ("source is closed"); return (mOffset < mString.length ()); } /** * Reset the source. * Repositions the read point to begin at zero. * @exception IllegalStateException If the source has been closed. 
*/ public void reset () { if (null == mString) throw new IllegalStateException ("source is closed"); else if (-1 != mMark) mOffset = mMark; else mOffset = 0; } /** * Tell whether this source supports the mark() operation. * @return <code>true</code>. */ public boolean markSupported () { return (true); } /** * Mark the present position in the source. * Subsequent calls to {@link #reset()} * will attempt to reposition the source to this point. * @param readAheadLimit <em>Not used.</em> * @exception IOException Thrown if the source is closed. * */ public void mark (int readAheadLimit) throws IOException { if (null == mString) throw new IOException ("source is closed"); mMark = mOffset; } /** * Skip characters. * <em>Note: n is treated as an int</em> * @param n The number of characters to skip. * @return The number of characters actually skipped * @exception IllegalArgumentException If <code>n</code> is negative. * @exception IOException If the source is closed. */ public long skip (long n) throws IOException { int length; long ret; if (null == mString) throw new IOException ("source is closed"); if (n < 0) throw new IllegalArgumentException ("cannot skip backwards"); else { length = mString.length (); if (mOffset >= length) n = 0L; else if (n > length - mOffset) n = length - mOffset; mOffset += n; ret = n; } return (ret); } // // Methods not in your Daddy's Reader // /** * Undo the read of a single character. * @exception IOException If no characters have been read or the source is closed. */ public void unread () throws IOException { if (null == mString) throw new IOException ("source is closed"); else if (mOffset <= 0) throw new IOException ("can't unread no characters"); else mOffset--; } /** * Retrieve a character again. * @param offset The offset of the character. * @return The character at <code>offset</code>. * @exception IOException If the source is closed or an attempt is made to * read beyond {@link #offset()}. 
*/ public char getCharacter (int offset) throws IOException { char ret; if (null == mString) throw new IOException ("source is closed"); else if (offset >= mOffset) throw new IOException ("read beyond current offset"); else ret = mString.charAt (offset); return (ret); } /** * Retrieve characters again. * @param array The array of characters. * @param offset The starting position in the array where characters are to be placed. * @param start The starting position, zero based. * @param end The ending position * (exclusive, i.e. the character at the ending position is not included), * zero based. * @exception IOException If the source is closed or an attempt is made to * read beyond {@link #offset()}. */ public void getCharacters (char[] array, int offset, int start, int end) throws IOException { if (null == mString) throw new IOException ("source is closed"); else { if (end > mOffset) throw new IOException ("read beyond current offset"); else mString.getChars (start, end, array, offset); } } /** * Retrieve a string comprised of characters already read. * Asking for characters ahead of {@link #offset()} will throw an exception. * @param offset The offset of the first character. * @param length The number of characters to retrieve. * @return A string containing the <code>length</code> characters at <code>offset</code>. * @exception IOException If the source is closed or an attempt is made to * read beyond {@link #offset()}. */ public String getString (int offset, int length) throws IOException { String ret; if (null == mString) throw new IOException ("source is closed"); else { if (offset + length > mOffset) throw new IOException ("read beyond end of string"); else ret = mString.substring (offset, offset + length); } return (ret); } /** * Append characters already read into a <code>StringBuffer</code>. * Asking for characters ahead of {@link #offset()} will throw an exception. * @param buffer The buffer to append to. * @param offset The offset of the first character. 
* @param length The number of characters to retrieve. * @return A string containing the <code>length</code> characters at <code>offset</code>. * @exception IOException If the source is closed or an attempt is made to * read beyond {@link #offset()}. */ public void getCharacters (StringBuffer buffer, int offset, int length) throws IOException { if (null == mString) throw new IOException ("source is closed"); else { if (offset + length > mOffset) throw new IOException ("read beyond end of string"); else buffer.append (mString.substring (offset, offset + length)); } } /** * Close the source. * Once a source has been closed, further {@link #read() read}, * {@link #ready ready}, {@link #mark mark}, {@link #reset reset}, * {@link #skip skip}, {@link #unread unread}, * {@link #getCharacter getCharacter} or {@link #getString getString} * invocations will throw an IOException. * Closing a previously-closed source, however, has no effect. * @exception IOException <em>Not thrown</em> */ public void destroy () throws IOException { mString = null; } /** * Get the position (in characters). * @return The number of characters that have already been read, or * {@link #EOF EOF} if the source is closed. */ public int offset () { int ret; if (null == mString) ret = EOF; else ret = mOffset; return (ret); } /** * Get the number of available characters. * @return The number of characters that can be read or zero if the source * is closed. 
*/ public int available () { int ret; if (null == mString) ret = 0; else ret = mString.length () - mOffset; return (ret); } } Index: Page.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v retrieving revision 1.37 retrieving revision 1.38 diff -C2 -d -r1.37 -r1.38 *** Page.java 8 Jun 2004 10:20:18 -0000 1.37 --- Page.java 3 Jul 2004 13:56:08 -0000 1.38 *************** *** 27,31 **** package org.htmlparser.lexer; - import java.io.ByteArrayInputStream; import java.io.InputStream; import java.io.IOException; --- 27,30 ---- *************** *** 43,47 **** import java.util.zip.InflaterInputStream; - import org.htmlparser.util.EncodingChangeException; import org.htmlparser.util.ParserException; --- 42,45 ---- *************** *** 57,61 **** /** * The default charset. ! * This should be <code>ISO-8859-1</code>, * see RFC 2616 (http://www.ietf.org/rfc/rfc2616.txt?number=2616) section 3.7.1 * Another alias is "8859_1". --- 55,59 ---- /** * The default charset. ! * This should be <code>{@value}</code>, * see RFC 2616 (http://www.ietf.org/rfc/rfc2616.txt?number=2616) section 3.7.1 * Another alias is "8859_1". *************** *** 65,69 **** /** * The default content type. ! * In the absence of alternate information, assume html content. */ public static final String DEFAULT_CONTENT_TYPE = "text/html"; --- 63,67 ---- /** * The default content type. ! * In the absence of alternate information, assume html content ({@value}). */ public static final String DEFAULT_CONTENT_TYPE = "text/html"; *************** *** 155,159 **** if (null == charset) charset = DEFAULT_CHARSET; ! mSource = new Source (stream, charset); mIndex = new PageIndex (this); mConnection = null; --- 153,157 ---- if (null == charset) charset = DEFAULT_CHARSET; ! mSource = new InputStreamSource (stream, charset); mIndex = new PageIndex (this); mConnection = null; *************** *** 162,166 **** } ! 
public Page (String text) { InputStream stream; --- 160,171 ---- } ! /** ! * Construct a page from the given string. ! * @param text The HTML text. ! * @param charset <em>Optional</em>. The character set encoding that will ! * be reported by {@link #getEncoding}. If charset is <code>null</code> ! * the default character set is used. ! */ ! public Page (String text, String charset) { InputStream stream; *************** *** 168,182 **** if (null == text) throw new IllegalArgumentException ("text cannot be null"); ! try ! { ! stream = new ByteArrayInputStream (text.getBytes (Page.DEFAULT_CHARSET)); ! mSource = new Source (stream, Page.DEFAULT_CHARSET, text.length () + 1); ! mIndex = new PageIndex (this); ! } ! catch (UnsupportedEncodingException uee) ! { ! // this is unlikely, so we cover it up with a runtime exception ! throw new IllegalStateException (uee.getMessage ()); ! } mConnection = null; mUrl = null; --- 173,180 ---- if (null == text) throw new IllegalArgumentException ("text cannot be null"); ! if (null == charset) ! charset = DEFAULT_CHARSET; ! mSource = new StringSource (text, charset); ! mIndex = new PageIndex (this); mConnection = null; mUrl = null; *************** *** 184,187 **** --- 182,196 ---- } + /** + * Construct a page from the given string. + * The page will report that it is using an encoding of + * {@link #DEFAULT_CHARSET}. + * @param text The HTML text. + */ + public Page (String text) + { + this (text, null); + } + // // Serialization support *************** *** 369,373 **** try { ! mSource = new Source (stream, charset); } catch (UnsupportedEncodingException uee) --- 378,382 ---- try { ! mSource = new InputStreamSource (stream, charset); } catch (UnsupportedEncodingException uee) *************** *** 383,387 **** ... [truncated message content] |
From: Derrick O. <der...@us...> - 2004-07-03 13:56:31
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14744/tests/lexerTests Modified Files: SourceTests.java Log Message: Further fix to bug #973137 Double-bytes characters are messed after parsing. Created a proper String based source with the encoding only optionally specified. A string is no longer converted to a byte array and then back to characters. Index: SourceTests.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/SourceTests.java,v retrieving revision 1.16 retrieving revision 1.17 diff -C2 -d -r1.16 -r1.17 *** SourceTests.java 14 Jan 2004 02:53:47 -0000 1.16 --- SourceTests.java 3 Jul 2004 13:56:08 -0000 1.17 *************** *** 34,40 **** --- 34,42 ---- import java.net.URL; import java.net.URLConnection; + import org.htmlparser.lexer.InputStreamSource; import org.htmlparser.lexer.Stream; import org.htmlparser.lexer.Source; + import org.htmlparser.lexer.StringSource; import org.htmlparser.tests.ParserTestCase; *************** *** 65,95 **** * Test initialization with a null value. */ ! public void testNull () throws IOException { Source source; ! source = new Source (null); assertTrue ("erroneous character", -1 == source.read ()); } /** ! * Test initialization with a null charset name. */ ! public void testEmpty () throws IOException { Source source; ! source = new Source (new Stream (new ByteArrayInputStream (new byte[0])), null); assertTrue ("erroneous character", -1 == source.read ()); } /** ! * Test initialization with an input stream having only one byte. */ ! public void testOneByte () throws IOException { Source source; ! 
source = new Source (new Stream (new ByteArrayInputStream (new byte[] { (byte)0x42 })), null); assertTrue ("erroneous character", 'B' == source.read ()); assertTrue ("extra character", -1 == source.read ()); --- 67,97 ---- * Test initialization with a null value. */ ! public void testInputStreamSourceNull () throws IOException { Source source; ! source = new InputStreamSource (null); assertTrue ("erroneous character", -1 == source.read ()); } /** ! * Test initialization of a InputStreamSource with a zero length byte array. */ ! public void testInputStreamSourceEmpty () throws IOException { Source source; ! source = new InputStreamSource (new Stream (new ByteArrayInputStream (new byte[0])), null); assertTrue ("erroneous character", -1 == source.read ()); } /** ! * Test initialization of a InputStreamSource with an input stream having only one byte. */ ! public void testInputStreamSourceOneByte () throws IOException { Source source; ! source = new InputStreamSource (new Stream (new ByteArrayInputStream (new byte[] { (byte)0x42 })), null); assertTrue ("erroneous character", 'B' == source.read ()); assertTrue ("extra character", -1 == source.read ()); *************** *** 97,107 **** /** ! * Test close. */ ! public void testClose () throws IOException { Source source; ! source = new Source (new Stream (new ByteArrayInputStream ("hello word".getBytes ())), null); assertTrue ("no character", -1 != source.read ()); source.destroy (); --- 99,109 ---- /** ! * Test closing a InputStreamSource. */ ! public void testInputStreamSourceClose () throws IOException { Source source; ! source = new InputStreamSource (new Stream (new ByteArrayInputStream ("hello word".getBytes ())), null); assertTrue ("no character", -1 != source.read ()); source.destroy (); *************** *** 118,124 **** /** ! * Test reset. */ ! public void testReset () throws IOException { String reference; --- 120,126 ---- /** ! * Test resetting a InputStreamSource. */ ! 
public void testInputStreamSourceReset () throws IOException { String reference; *************** *** 128,132 **** reference = "Now is the time for all good men to come to the aid of the party"; ! source = new Source (new Stream (new ByteArrayInputStream (reference.getBytes (DEFAULT_CHARSET))), null); buffer = new StringBuffer (reference.length ()); while (-1 != (c = source.read ())) --- 130,134 ---- reference = "Now is the time for all good men to come to the aid of the party"; ! source = new InputStreamSource (new Stream (new ByteArrayInputStream (reference.getBytes (DEFAULT_CHARSET))), null); buffer = new StringBuffer (reference.length ()); while (-1 != (c = source.read ())) *************** *** 142,148 **** /** ! * Test reset in the middle of reading. */ ! public void testMidReset () throws IOException { String reference; --- 144,150 ---- /** ! * Test resetting a InputStreamSource in the middle of reading. */ ! public void testInputStreamSourceMidReset () throws IOException { String reference; *************** *** 152,156 **** reference = "Now is the time for all good men to come to the aid of the party"; ! source = new Source (new Stream (new ByteArrayInputStream (reference.getBytes (DEFAULT_CHARSET))), null); buffer = new StringBuffer (reference.length ()); for (int i = 0; i < 25; i++) --- 154,158 ---- reference = "Now is the time for all good men to come to the aid of the party"; ! source = new InputStreamSource (new Stream (new ByteArrayInputStream (reference.getBytes (DEFAULT_CHARSET))), null); buffer = new StringBuffer (reference.length ()); for (int i = 0; i < 25; i++) *************** *** 166,172 **** /** ! * Test mark/reset in the middle of reading. */ ! public void testMarkReset () throws IOException { String reference; --- 168,174 ---- /** ! * Test mark/reset of a InputStreamSource in the middle of reading. */ ! 
public void testInputStreamSourceMarkReset () throws IOException { String reference; *************** *** 176,180 **** reference = "Now is the time for all good men to come to the aid of the party"; ! source = new Source (new Stream (new ByteArrayInputStream (reference.getBytes (DEFAULT_CHARSET))), null); assertTrue ("not markable", source.markSupported ()); buffer = new StringBuffer (reference.length ()); --- 178,182 ---- reference = "Now is the time for all good men to come to the aid of the party"; ! source = new InputStreamSource (new Stream (new ByteArrayInputStream (reference.getBytes (DEFAULT_CHARSET))), null); assertTrue ("not markable", source.markSupported ()); buffer = new StringBuffer (reference.length ()); *************** *** 192,198 **** /** ! * Test skip. */ ! public void testSkip () throws IOException { String part1; --- 194,200 ---- /** ! * Test skipping a InputStreamSource. */ ! public void testInputStreamSourceSkip () throws IOException { String part1; *************** *** 208,212 **** part3 = "to come to the aid of the party"; reference = part1 + part2 + part3; ! source = new Source (new Stream (new ByteArrayInputStream (reference.getBytes (DEFAULT_CHARSET))), null); buffer = new StringBuffer (reference.length ()); for (int i = 0; i < part1.length (); i++) --- 210,214 ---- part3 = "to come to the aid of the party"; reference = part1 + part2 + part3; ! source = new InputStreamSource (new Stream (new ByteArrayInputStream (reference.getBytes (DEFAULT_CHARSET))), null); buffer = new StringBuffer (reference.length ()); for (int i = 0; i < part1.length (); i++) *************** *** 220,226 **** /** ! * Test multi-byte read. */ ! public void testMultByte () throws IOException { String reference; --- 222,228 ---- /** ! * Test multi-byte read with a InputStreamSource. */ ! 
public void testInputStreamSourceMultByte () throws IOException { String reference; *************** *** 229,233 **** reference = "Now is the time for all good men to come to the aid of the party"; ! source = new Source (new Stream (new ByteArrayInputStream (reference.getBytes (DEFAULT_CHARSET))), null); buffer = new char[reference.length ()]; source.read (buffer, 0, buffer.length); --- 231,235 ---- reference = "Now is the time for all good men to come to the aid of the party"; ! source = new InputStreamSource (new Stream (new ByteArrayInputStream (reference.getBytes (DEFAULT_CHARSET))), null); buffer = new char[reference.length ()]; source.read (buffer, 0, buffer.length); *************** *** 238,244 **** /** ! * Test positioned multi-byte read. */ ! public void testPositionedMultByte () throws IOException { String part1; --- 240,246 ---- /** ! * Test positioned multi-byte read with a InputStreamSource. */ ! public void testInputStreamSourcePositionedMultByte () throws IOException { String part1; *************** *** 255,259 **** part3 = "to come to the aid of the party"; reference = part1 + part2 + part3; ! source = new Source (new Stream (new ByteArrayInputStream (reference.getBytes (DEFAULT_CHARSET))), null); buffer = new char[reference.length ()]; for (int i = 0; i < part1.length (); i++) --- 257,261 ---- part3 = "to come to the aid of the party"; reference = part1 + part2 + part3; ! source = new InputStreamSource (new Stream (new ByteArrayInputStream (reference.getBytes (DEFAULT_CHARSET))), null); buffer = new char[reference.length ()]; for (int i = 0; i < part1.length (); i++) *************** *** 270,280 **** /** ! * Test ready. */ ! public void testReady () throws IOException { Source source; ! source = new Source (new Stream (new ByteArrayInputStream (new byte[] { (byte)0x42, (byte)0x62 })), null); assertTrue ("ready?", !source.ready ()); assertTrue ("erroneous character", 'B' == source.read ()); --- 272,282 ---- /** ! * Test ready of a InputStreamSource. 
*/ ! public void testInputStreamSourceReady () throws IOException { Source source; ! source = new InputStreamSource (new Stream (new ByteArrayInputStream (new byte[] { (byte)0x42, (byte)0x62 })), null); assertTrue ("ready?", !source.ready ()); assertTrue ("erroneous character", 'B' == source.read ()); *************** *** 300,305 **** int index; ! // pick a big file ! link = "http://htmlparser.sourceforge.net/HTMLParser_Coverage.html"; try { --- 302,306 ---- int index; ! link = "http://htmlparser.sourceforge.net"; try { *************** *** 310,314 **** connection2 = url.openConnection (); connection2.connect (); ! source = new Source (new Stream (connection2.getInputStream ()), "UTF-8"); index = 0; while (-1 != (c1 = in.read ())) --- 311,315 ---- connection2 = url.openConnection (); connection2.connect (); ! source = new InputStreamSource (new Stream (connection2.getInputStream ()), "UTF-8"); index = 0; while (-1 != (c1 = in.read ())) *************** *** 329,331 **** --- 330,555 ---- } } + + /** + * Test initialization of a StringSource with a null value. + */ + public void testStringSourceNull () throws IOException + { + Source source; + + source = new StringSource (null); + assertTrue ("erroneous character", -1 == source.read ()); + } + + /** + * Test initialization of a StringSource with a zero length string. + */ + public void testStringSourceEmpty () throws IOException + { + Source source; + + source = new StringSource (""); + assertTrue ("erroneous character", -1 == source.read ()); + } + + /** + * Test initialization of a StringSource with a one character string. + */ + public void testStringSourceOneCharacter () throws IOException + { + Source source; + + source = new StringSource (new String ("B")); + assertTrue ("erroneous character", 'B' == source.read ()); + assertTrue ("extra character", -1 == source.read ()); + } + + /** + * Test closing a StringSource. 
+ */ + public void testStringSourceClose () throws IOException + { + Source source; + + source = new StringSource ("hello word"); + assertTrue ("no character", -1 != source.read ()); + source.destroy (); + try + { + source.read (); + fail ("not closed"); + } + catch (IOException ioe) + { + // expected outcome + } + } + + /** + * Test resetting a StringSource. + */ + public void testStringSourceReset () throws IOException + { + String reference; + Source source; + StringBuffer buffer; + int c; + + reference = "Now is the time for all good men to come to the aid of the party"; + source = new StringSource (reference); + buffer = new StringBuffer (reference.length ()); + while (-1 != (c = source.read ())) + buffer.append ((char)c); + assertTrue ("string incorrect", reference.equals (buffer.toString ())); + source.reset (); + buffer.setLength (0); + while (-1 != (c = source.read ())) + buffer.append ((char)c); + assertTrue ("string incorrect", reference.equals (buffer.toString ())); + source.close (); + } + + /** + * Test resetting a StringSource in the middle of reading. + */ + public void testStringSourceMidReset () throws IOException + { + String reference; + Source source; + StringBuffer buffer; + int c; + + reference = "Now is the time for all good men to come to the aid of the party"; + source = new StringSource (reference); + buffer = new StringBuffer (reference.length ()); + for (int i = 0; i < 25; i++) + buffer.append ((char)source.read ()); + source.reset (); + for (int i = 0; i < 25; i++) + source.read (); + while (-1 != (c = source.read ())) + buffer.append ((char)c); + assertTrue ("string incorrect", reference.equals (buffer.toString ())); + source.close (); + } + + /** + * Test mark/reset of a StringSource in the middle of reading. 
+ */ + public void testStringSourceMarkReset () throws IOException + { + String reference; + Source source; + StringBuffer buffer; + int c; + + reference = "Now is the time for all good men to come to the aid of the party"; + source = new StringSource (reference); + assertTrue ("not markable", source.markSupported ()); + buffer = new StringBuffer (reference.length ()); + for (int i = 0; i < 25; i++) + buffer.append ((char)source.read ()); + source.mark (88); + for (int i = 0; i < 25; i++) + source.read (); + source.reset (); + while (-1 != (c = source.read ())) + buffer.append ((char)c); + assertTrue ("string incorrect", reference.equals (buffer.toString ())); + source.close (); + } + + /** + * Test skipping a StringSource. + */ + public void testStringSourceSkip () throws IOException + { + String part1; + String part2; + String part3; + String reference; + Source source; + StringBuffer buffer; + int c; + + part1 = "Now is the time "; + part2 = "for all good men "; + part3 = "to come to the aid of the party"; + reference = part1 + part2 + part3; + source = new StringSource (reference); + buffer = new StringBuffer (reference.length ()); + for (int i = 0; i < part1.length (); i++) + buffer.append ((char)source.read ()); + source.skip (part2.length ()); + while (-1 != (c = source.read ())) + buffer.append ((char)c); + assertTrue ("string incorrect", (part1 + part3).equals (buffer.toString ())); + source.close (); + } + + /** + * Test multi-byte read with a StringSource. 
+ */ + public void testStringSourceMultByte () throws IOException + { + String reference; + Source source; + char[] buffer; + + reference = "Now is the time for all good men to come to the aid of the party"; + source = new StringSource (reference); + buffer = new char[reference.length ()]; + source.read (buffer, 0, buffer.length); + assertTrue ("string incorrect", reference.equals (new String (buffer))); + assertTrue ("extra character", -1 == source.read ()); + source.close (); + } + + /** + * Test positioned multi-byte read with a StringSource. + */ + public void testStringSourcePositionedMultByte () throws IOException + { + String part1; + String part2; + String part3; + String reference; + Source source; + char[] buffer; + int c; + int length; + + part1 = "Now is the time "; + part2 = "for all good men "; + part3 = "to come to the aid of the party"; + reference = part1 + part2 + part3; + source = new StringSource (reference); + buffer = new char[reference.length ()]; + for (int i = 0; i < part1.length (); i++) + buffer[i] = (char)source.read (); + length = source.read (buffer, part1.length (), part2.length ()); + assertTrue ("incorrect length", part2.length () == length); + length += part1.length (); + for (int i = 0; i < part3.length (); i++) + buffer[i + length] = (char)source.read (); + assertTrue ("string incorrect", reference.equals (new String (buffer))); + assertTrue ("extra character", -1 == source.read ()); + source.close (); + } + + /** + * Test ready of a StringSource. + */ + public void testStringSourceReady () throws IOException + { + Source source; + + source = new StringSource ("Bb"); + assertTrue ("ready?", source.ready ()); + assertTrue ("erroneous character", 'B' == source.read ()); + assertTrue ("not ready", source.ready ()); + assertTrue ("erroneous character", 'b' == source.read ()); + assertTrue ("ready?", !source.ready ()); + assertTrue ("extra character", -1 == source.read ()); + } } |
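The mark/reset/skip behaviour exercised by the StringSource tests above can be sketched without the library. This is only a minimal stand-in, not the committed StringSource: the class name MiniStringSource and its drain helper are hypothetical, and treating a null string as empty is an assumption made to mirror what testStringSourceNull expects.

```java
/** Hypothetical stand-in for a string-backed source; not the committed class. */
public class MiniStringSource
{
    private final String mString;
    private int mOffset = 0;  // index of the next character to read
    private int mMark = -1;   // -1 means no mark has been set

    public MiniStringSource (String text)
    {
        // Assumption: a null string behaves like an empty one.
        mString = (null == text) ? "" : text;
    }

    /** Read one character, or -1 at the end of the string. */
    public int read ()
    {
        if (mOffset >= mString.length ())
            return (-1);
        return (mString.charAt (mOffset++));
    }

    /** Remember the current position. */
    public void mark ()
    {
        mMark = mOffset;
    }

    /** Go back to the mark, or to the start if no mark was set. */
    public void reset ()
    {
        mOffset = (-1 != mMark) ? mMark : 0;
    }

    /** Skip up to n characters and report how many were skipped. */
    public long skip (long n)
    {
        if (n < 0)
            throw new IllegalArgumentException ("cannot skip backwards");
        long remaining = mString.length () - mOffset;
        if (n > remaining)
            n = remaining;
        mOffset += (int)n;
        return (n);
    }

    /** Read everything that is left into a string. */
    public static String drain (MiniStringSource source)
    {
        StringBuilder buffer = new StringBuilder ();
        int c;
        while (-1 != (c = source.read ()))
            buffer.append ((char)c);
        return (buffer.toString ());
    }

    public static void main (String[] args)
    {
        MiniStringSource source = new MiniStringSource ("Now is the time");
        source.skip (4);
        source.mark ();
        System.out.println (drain (source));
        source.reset ();
        System.out.println (drain (source));
    }
}
```

Because the characters are already in memory, reset never has to re-read or re-decode anything; it only moves the offset.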
From: Derrick O. <der...@us...> - 2004-07-03 13:56:31
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14744

Modified Files:
    Parser.java
Log Message:
Further fix to bug #973137 Double-bytes characters are messed after parsing.
Created a proper String based source with the encoding only optionally specified.
A string is no longer converted to a byte array and then back to characters.

Index: Parser.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Parser.java,v
retrieving revision 1.94
retrieving revision 1.95
diff -C2 -d -r1.94 -r1.95
*** Parser.java 16 Jun 2004 02:17:25 -0000 1.94
--- Parser.java 3 Jul 2004 13:56:07 -0000 1.95
***************
*** 27,31 ****
  package org.htmlparser;

- import java.io.ByteArrayInputStream;
  import java.io.File;
  import java.io.IOException;
--- 27,30 ----
***************
*** 794,802 ****
      /**
       * Creates the parser on an input string.
-      * Uses the character set encoding to create a stream of bytes that is
-      * fed into the parser as if it had come off the wire.
       * @param html The string containing HTML.
!      * @param charset Character set encoding to use when converting the
!      * <code>html</code> to a stream of bytes. If charset is <code>null</code>
       * the default character set is used.
       * @return A parser with the <code>html</code> string as input.
--- 793,799 ----
      /**
       * Creates the parser on an input string.
       * @param html The string containing HTML.
!      * @param charset <em>Optional</em>. The character set encoding that will
!      * be reported by {@link #getEncoding}. If charset is <code>null</code>
       * the default character set is used.
       * @return A parser with the <code>html</code> string as input.
***************
*** 804,828 ****
      public static Parser createParser (String html, String charset)
      {
-         ByteArrayInputStream stream;
          Parser ret;

          if (null == html)
              throw new IllegalArgumentException ("html cannot be null");
!         if (null == charset)
!             charset = Page.DEFAULT_CHARSET;
!         try
!         {
!             stream = new ByteArrayInputStream (html.getBytes (charset));
!             ret = new Parser (new Lexer (new Page (stream, charset)));
!         }
!         catch (UnsupportedEncodingException uee)
!         {
!             String msg;
!
!             msg = uee.getMessage ();
!             if (null == msg)
!                 msg = "unsupported encoding (" + charset + ") exception";
!             ret = new Parser (new Lexer (new Page (msg)));
!         }

          return (ret);
--- 801,809 ----
      public static Parser createParser (String html, String charset)
      {
          Parser ret;

          if (null == html)
              throw new IllegalArgumentException ("html cannot be null");
!         ret = new Parser (new Lexer (new Page (html, charset)));

          return (ret);
|
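Why the removed byte-array round trip damaged double-byte text can be shown with the standard library alone. The sketch below assumes the default charset is ISO-8859-1, as the Page.DEFAULT_CHARSET javadoc indicates; the class RoundTripDemo and its viaBytes helper are hypothetical names for illustration, and StandardCharsets is a newer API than the 2004 code would have had available.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of the string -> bytes -> string round trip. */
public class RoundTripDemo
{
    /**
     * Encode text as ISO-8859-1 bytes and decode it again, which is
     * roughly what the old string code path did before this commit.
     */
    static String viaBytes (String text)
    {
        byte[] bytes = text.getBytes (StandardCharsets.ISO_8859_1);
        return (new String (bytes, StandardCharsets.ISO_8859_1));
    }

    public static void main (String[] args)
    {
        String latin = "<p>caf\u00e9</p>";            // fits in ISO-8859-1
        String cjk = "<p>\u65e5\u672c\u8a9e</p>";     // does not fit

        // Characters inside the charset survive the round trip.
        System.out.println (latin.equals (viaBytes (latin)));
        // Characters outside it are replaced during encoding, so the
        // decoded string no longer matches the original.
        System.out.println (cjk.equals (viaBytes (cjk)));
        System.out.println (viaBytes (cjk));
    }
}
```

Reading the characters straight from the String, as the new createParser does through Page (html, charset), sidesteps the encoding step, so the charset argument only affects what is reported as the encoding.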
From: Derrick O. <der...@us...> - 2004-07-02 01:33:35
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7849

Modified Files:
    ParserTestCase.java
Log Message:
Fix broken test framework.

Index: ParserTestCase.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTestCase.java,v
retrieving revision 1.49
retrieving revision 1.50
diff -C2 -d -r1.49 -r1.50
*** ParserTestCase.java 2 Jul 2004 00:49:29 -0000 1.49
--- ParserTestCase.java 2 Jul 2004 01:33:26 -0000 1.50
***************
*** 351,357 ****
      }

!     private void assertActualTagHasNoExtraAttributes(String displayMessage, Tag expectedTag, Tag actualTag) {
          Vector v = actualTag.getAttributesEx ();
!         for (int i = 0; i < v.size (); i++)
          {
              Attribute a = (Attribute)v.elementAt (i);
--- 351,359 ----
      }

!     private void assertActualTagHasNoExtraAttributes(String displayMessage, Tag expectedTag, Tag actualTag)
!     {
!         assertStringEquals (displayMessage+"\ntag name", expectedTag.getTagName (), actualTag.getTagName ());
          Vector v = actualTag.getAttributesEx ();
!         for (int i = 1; i < v.size (); i++)
          {
              Attribute a = (Attribute)v.elementAt (i);
***************
*** 370,375 ****
          Tag actualTag)
      {
          Vector v = actualTag.getAttributesEx ();
!         for (int i = 0; i < v.size (); i++)
          {
              Attribute a = (Attribute)v.elementAt (i);
--- 372,378 ----
          Tag actualTag)
      {
+         assertStringEquals (displayMessage+"\ntag name", expectedTag.getTagName (), actualTag.getTagName ());
          Vector v = actualTag.getAttributesEx ();
!         for (int i = 1; i < v.size (); i++)
          {
              Attribute a = (Attribute)v.elementAt (i);
|
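Both loops now start at index 1 and the tag name is asserted separately, presumably because element 0 of the vector returned by getAttributesEx () carries the tag name rather than a real attribute. The diff does not state this, so treat it as an assumption. A rough sketch of that comparison shape, using plain strings in place of Attribute objects (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: compare attribute lists whose element 0 is the tag name. */
public class AttributeCheckSketch
{
    /** Compare the tag names held at index 0 of each list. */
    static boolean sameTagName (List<String> expected, List<String> actual)
    {
        return (expected.get (0).equals (actual.get (0)));
    }

    /** Collect entries of actual, from index 1 on, that expected lacks. */
    static List<String> extraAttributes (List<String> expected, List<String> actual)
    {
        // Skip element 0 on both sides; it is assumed to be the tag name.
        List<String> known = expected.subList (Math.min (1, expected.size ()), expected.size ());
        List<String> extras = new ArrayList<String> ();
        for (int i = 1; i < actual.size (); i++)
            if (!known.contains (actual.get (i)))
                extras.add (actual.get (i));
        return (extras);
    }

    public static void main (String[] args)
    {
        List<String> expected = Arrays.asList ("A", "HREF", "TITLE");
        List<String> actual = Arrays.asList ("A", "HREF", "TARGET");

        System.out.println (sameTagName (expected, actual));
        System.out.println (extraAttributes (expected, actual));
    }
}
```

Checking the name on its own keeps a tag-name mismatch from being reported as a missing or extra attribute.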
From: Derrick O. <der...@us...> - 2004-07-02 00:50:09
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670/src/org/htmlparser/tags Modified Files: AppletTag.java BaseHrefTag.java CompositeTag.java DoctypeTag.java FrameTag.java ImageTag.java InputTag.java JspTag.java MetaTag.java ObjectTag.java Removed Files: Tag.java Log Message: Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and it's Implementation. Index: ObjectTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/ObjectTag.java,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** ObjectTag.java 26 Jun 2004 11:25:01 -0000 1.1 --- ObjectTag.java 2 Jul 2004 00:49:29 -0000 1.2 *************** *** 31,37 **** import java.util.Vector; import org.htmlparser.Node; import org.htmlparser.nodes.TextNode; ! import org.htmlparser.Attribute; import org.htmlparser.util.NodeList; import org.htmlparser.util.SimpleNodeIterator; --- 31,39 ---- import java.util.Vector; + import org.htmlparser.Attribute; import org.htmlparser.Node; + import org.htmlparser.Tag; import org.htmlparser.nodes.TextNode; ! import org.htmlparser.nodes.TagNode; import org.htmlparser.util.NodeList; import org.htmlparser.util.SimpleNodeIterator; *************** *** 343,347 **** attributes.addElement (new Attribute (" ")); attributes.addElement (new Attribute ("NAME", paramName.toUpperCase (), '"')); ! 
tag = new Tag (null, 0, 0, attributes); kids.add (tag); } --- 345,349 ---- attributes.addElement (new Attribute (" ")); attributes.addElement (new Attribute ("NAME", paramName.toUpperCase (), '"')); ! tag = new TagNode (null, 0, 0, attributes); kids.add (tag); } Index: BaseHrefTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/BaseHrefTag.java,v retrieving revision 1.38 retrieving revision 1.39 diff -C2 -d -r1.38 -r1.39 *** BaseHrefTag.java 18 Mar 2004 04:04:08 -0000 1.38 --- BaseHrefTag.java 2 Jul 2004 00:49:28 -0000 1.39 *************** *** 28,31 **** --- 28,32 ---- import org.htmlparser.lexer.Page; + import org.htmlparser.nodes.TagNode; import org.htmlparser.util.ParserException; *************** *** 34,38 **** * It extends a basic tag by providing an accessor to the HREF attribute. */ ! public class BaseHrefTag extends Tag { /** --- 35,41 ---- * It extends a basic tag by providing an accessor to the HREF attribute. */ ! public class BaseHrefTag ! extends ! TagNode { /** Index: InputTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/InputTag.java,v retrieving revision 1.35 retrieving revision 1.36 diff -C2 -d -r1.35 -r1.36 *** InputTag.java 14 Jan 2004 02:58:06 -0000 1.35 --- InputTag.java 2 Jul 2004 00:49:29 -0000 1.36 *************** *** 27,34 **** package org.htmlparser.tags; /** * An input tag in a form. */ ! public class InputTag extends Tag { /** --- 27,38 ---- package org.htmlparser.tags; + import org.htmlparser.nodes.TagNode; + /** * An input tag in a form. */ ! public class InputTag ! extends ! 
TagNode { /** Index: AppletTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/AppletTag.java,v retrieving revision 1.40 retrieving revision 1.41 diff -C2 -d -r1.40 -r1.41 *** AppletTag.java 24 May 2004 16:18:30 -0000 1.40 --- AppletTag.java 2 Jul 2004 00:49:28 -0000 1.41 *************** *** 33,37 **** --- 33,39 ---- import org.htmlparser.Attribute; import org.htmlparser.Node; + import org.htmlparser.Tag; import org.htmlparser.Text; + import org.htmlparser.nodes.TagNode; import org.htmlparser.util.NodeList; import org.htmlparser.util.SimpleNodeIterator; *************** *** 238,242 **** attributes.addElement (new Attribute (" ")); attributes.addElement (new Attribute ("NAME", paramName, '"')); ! tag = new Tag (null, 0, 0, attributes); kids.add (tag); } --- 240,244 ---- attributes.addElement (new Attribute (" ")); attributes.addElement (new Attribute ("NAME", paramName, '"')); ! tag = new TagNode (null, 0, 0, attributes); kids.add (tag); } Index: CompositeTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/CompositeTag.java,v retrieving revision 1.77 retrieving revision 1.78 diff -C2 -d -r1.77 -r1.78 *** CompositeTag.java 24 May 2004 16:18:30 -0000 1.77 --- CompositeTag.java 2 Jul 2004 00:49:28 -0000 1.78 *************** *** 34,37 **** --- 34,38 ---- import org.htmlparser.nodes.AbstractNode; import org.htmlparser.nodes.TagNode; + import org.htmlparser.Tag; import org.htmlparser.scanners.CompositeTagScanner; import org.htmlparser.util.NodeList; *************** *** 45,49 **** * the {@link #toHtml toHtml} method. */ ! public class CompositeTag extends Tag { /** --- 46,50 ---- * the {@link #toHtml toHtml} method. */ ! public class CompositeTag extends TagNode { /** *************** *** 51,55 **** * May be a virtual tag generated by the scanning logic. */ ! 
protected TagNode mEndTag; /** --- 52,56 ---- * May be a virtual tag generated by the scanning logic. */ ! protected Tag mEndTag; /** *************** *** 170,178 **** public Tag searchByName(String name) { Node node; ! Tag tag=null; boolean found = false; for (SimpleNodeIterator e = children();e.hasMoreNodes() && !found;) { node = (Node)e.nextNode(); ! if (node instanceof TagNode) { tag = (Tag)node; String nameAttribute = tag.getAttribute("NAME"); --- 171,180 ---- public Tag searchByName(String name) { Node node; ! Tag tag = null; boolean found = false; for (SimpleNodeIterator e = children();e.hasMoreNodes() && !found;) { node = (Node)e.nextNode(); ! if (node instanceof Tag) ! { tag = (Tag)node; String nameAttribute = tag.getAttribute("NAME"); *************** *** 448,474 **** } ! /** ! * @deprecated The tag *is* ths start tag. ! */ ! public TagNode getStartTag() ! { ! return (this); ! } ! ! /** ! * @deprecated The tag *is* ths start tag. ! */ ! public void setStartTag (TagNode start) ! { ! if (null != start) ! throw new IllegalStateException ("the tag *is* ths start tag"); ! } ! ! public TagNode getEndTag() { return (mEndTag); } ! public void setEndTag(TagNode end) { mEndTag = end; --- 450,459 ---- } ! public Tag getEndTag() { return (mEndTag); } ! public void setEndTag (Tag end) { mEndTag = end; Index: DoctypeTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/DoctypeTag.java,v retrieving revision 1.37 retrieving revision 1.38 diff -C2 -d -r1.37 -r1.38 *** DoctypeTag.java 2 Jan 2004 16:24:54 -0000 1.37 --- DoctypeTag.java 2 Jul 2004 00:49:28 -0000 1.38 *************** *** 27,34 **** package org.htmlparser.tags; /** * The HTML Document Declaration Tag can identify <!DOCTYPE> tags. */ ! 
public class DoctypeTag extends Tag { /** --- 27,38 ---- package org.htmlparser.tags; + import org.htmlparser.nodes.TagNode; + /** * The HTML Document Declaration Tag can identify <!DOCTYPE> tags. */ ! public class DoctypeTag ! extends ! TagNode { /** Index: MetaTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/MetaTag.java,v retrieving revision 1.36 retrieving revision 1.37 diff -C2 -d -r1.36 -r1.37 *** MetaTag.java 24 May 2004 16:18:30 -0000 1.36 --- MetaTag.java 2 Jul 2004 00:49:29 -0000 1.37 *************** *** 28,31 **** --- 28,32 ---- import org.htmlparser.Attribute; + import org.htmlparser.nodes.TagNode; import org.htmlparser.util.ParserException; *************** *** 35,39 **** public class MetaTag extends ! Tag { /** --- 36,40 ---- public class MetaTag extends ! TagNode { /** Index: JspTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/JspTag.java,v retrieving revision 1.39 retrieving revision 1.40 diff -C2 -d -r1.39 -r1.40 *** JspTag.java 14 Jan 2004 02:53:46 -0000 1.39 --- JspTag.java 2 Jul 2004 00:49:29 -0000 1.40 *************** *** 27,34 **** package org.htmlparser.tags; /** * The JSP/ASP tags like <%...%> can be identified by this class. */ ! public class JspTag extends Tag { /** --- 27,38 ---- package org.htmlparser.tags; + import org.htmlparser.nodes.TagNode; + /** * The JSP/ASP tags like <%...%> can be identified by this class. */ ! public class JspTag ! extends ! 
TagNode { /** Index: ImageTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/ImageTag.java,v retrieving revision 1.46 retrieving revision 1.47 diff -C2 -d -r1.46 -r1.47 *** ImageTag.java 24 May 2004 16:18:30 -0000 1.46 --- ImageTag.java 2 Jul 2004 00:49:28 -0000 1.47 *************** *** 31,34 **** --- 31,35 ---- import org.htmlparser.Attribute; + import org.htmlparser.nodes.TagNode; import org.htmlparser.util.ParserUtils; import org.htmlparser.visitors.NodeVisitor; *************** *** 39,43 **** public class ImageTag extends ! Tag { /** --- 40,44 ---- public class ImageTag extends ! TagNode { /** --- Tag.java DELETED --- Index: FrameTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/FrameTag.java,v retrieving revision 1.36 retrieving revision 1.37 diff -C2 -d -r1.36 -r1.37 *** FrameTag.java 18 Mar 2004 04:04:08 -0000 1.36 --- FrameTag.java 2 Jul 2004 00:49:28 -0000 1.37 *************** *** 27,34 **** package org.htmlparser.tags; /** * Identifies a frame tag */ ! public class FrameTag extends Tag { /** --- 27,38 ---- package org.htmlparser.tags; + import org.htmlparser.nodes.TagNode; + /** * Identifies a frame tag */ ! public class FrameTag ! extends ! TagNode { /** |
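Note on the searchByName change in the CompositeTag.java diff above: the child test is now "instanceof Tag" (the interface) instead of "instanceof TagNode", so any implementation of the interface can be matched, not only subclasses of one concrete class. What follows is a minimal sketch of that lookup, not the library code. SketchNode, SketchTag, SketchText and SketchInput are hypothetical stand-ins invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the Node and Tag interfaces; not htmlparser types.
interface SketchNode { }

interface SketchTag extends SketchNode {
    String getAttribute(String name);
}

// A child that is not a tag, such as a run of text.
final class SketchText implements SketchNode { }

// A tag implementation that shares no base class with any other tag.
final class SketchInput implements SketchTag {
    private final String nameValue;

    SketchInput(String nameValue) {
        this.nameValue = nameValue;
    }

    public String getAttribute(String name) {
        return "NAME".equalsIgnoreCase(name) ? nameValue : null;
    }
}

final class SearchByNameSketch {
    // Return the first child tag whose NAME attribute equals name, or null.
    static SketchTag searchByName(List<SketchNode> children, String name) {
        for (SketchNode node : children) {
            if (node instanceof SketchTag) {   // interface test, not a class test
                SketchTag tag = (SketchTag) node;
                if (name.equals(tag.getAttribute("NAME")))
                    return tag;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<SketchNode> children = new ArrayList<>();
        children.add(new SketchText());
        children.add(new SketchInput("user"));
        children.add(new SketchInput("pass"));
        System.out.println(searchByName(children, "pass") == children.get(2));
    }
}
```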
From: Derrick O. <der...@us...> - 2004-07-02 00:50:08
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670/src/org/htmlparser/tests Modified Files: ParserTest.java ParserTestCase.java Log Message: Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and it's Implementation. Index: ParserTestCase.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTestCase.java,v retrieving revision 1.48 retrieving revision 1.49 diff -C2 -d -r1.48 -r1.49 *** ParserTestCase.java 16 Jun 2004 02:17:26 -0000 1.48 --- ParserTestCase.java 2 Jul 2004 00:49:29 -0000 1.49 *************** *** 30,44 **** import java.util.Iterator; import java.util.Properties; import junit.framework.TestCase; import org.htmlparser.Node; import org.htmlparser.Parser; import org.htmlparser.Text; import org.htmlparser.lexer.Lexer; import org.htmlparser.lexer.Page; import org.htmlparser.tags.FormTag; import org.htmlparser.tags.InputTag; - import org.htmlparser.tags.Tag; import org.htmlparser.util.DefaultParserFeedback; import org.htmlparser.util.NodeIterator; --- 30,47 ---- import java.util.Iterator; import java.util.Properties; + import java.util.Vector; import junit.framework.TestCase; + import org.htmlparser.Attribute; import org.htmlparser.Node; import org.htmlparser.Parser; + import org.htmlparser.Tag; import org.htmlparser.Text; import org.htmlparser.lexer.Lexer; import org.htmlparser.lexer.Page; + import org.htmlparser.nodes.TagNode; import 
org.htmlparser.tags.FormTag; import org.htmlparser.tags.InputTag; import org.htmlparser.util.DefaultParserFeedback; import org.htmlparser.util.NodeIterator; *************** *** 254,259 **** nextActualNode = getNextNodeUsing (actualIterator); assertNotNull (nextActualNode); ! tag1 = fixIfXmlEndTag (nextExpectedNode); ! tag2 = fixIfXmlEndTag (nextActualNode); assertStringValueMatches( displayMessage, --- 257,262 ---- nextActualNode = getNextNodeUsing (actualIterator); assertNotNull (nextActualNode); ! tag1 = fixIfXmlEndTag (expectedParser.getLexer ().getPage (), nextExpectedNode); ! tag2 = fixIfXmlEndTag (resultParser.getLexer ().getPage (), nextActualNode); assertStringValueMatches( displayMessage, *************** *** 320,324 **** * Return a following tag if node is an empty XML tag. */ ! private Tag fixIfXmlEndTag (Node node) { Tag ret; --- 323,327 ---- * Return a following tag if node is an empty XML tag. */ ! private Tag fixIfXmlEndTag (Page page, Node node) { Tag ret; *************** *** 331,335 **** { tag.setEmptyXmlTag (false); ! ret = new Tag (tag.getPage (), tag.getStartPosition (), tag.getEndPosition (), tag.getAttributesEx ()); } } --- 334,338 ---- { tag.setEmptyXmlTag (false); ! ret = new TagNode (page, tag.getStartPosition (), tag.getEndPosition (), tag.getAttributesEx ()); } } *************** *** 338,371 **** } ! private void assertAttributesMatch(String displayMessage, Tag expectedTag, Tag actualTag) { assertAllExpectedTagAttributesFoundInActualTag( displayMessage, expectedTag, actualTag); ! if (expectedTag.getAttributes().size()!=actualTag.getAttributes().size()) { assertActualTagHasNoExtraAttributes(displayMessage, expectedTag, actualTag); - } } private void assertActualTagHasNoExtraAttributes(String displayMessage, Tag expectedTag, Tag actualTag) { ! Iterator i = actualTag.getAttributes().keySet().iterator(); ! while (i.hasNext()) { ! String key = (String)i.next(); ! if (key=="/") continue; ! String expectedValue = ! expectedTag.getAttribute(key); ! 
String actualValue = ! actualTag.getAttribute(key); ! if (key==SpecialHashtable.TAGNAME) { ! expectedValue = ParserUtils.removeChars(expectedValue,'/'); ! actualValue = ParserUtils.removeChars(actualValue,'/'); ! assertStringEquals(displayMessage+"\ntag name",actualValue,expectedValue); continue; ! } ! ! if (expectedValue==null) ! fail( ! "\nActual tag had extra key: "+key+displayMessage ! ); } } --- 341,365 ---- } ! private void assertAttributesMatch(String displayMessage, Tag expectedTag, Tag actualTag) ! { assertAllExpectedTagAttributesFoundInActualTag( displayMessage, expectedTag, actualTag); ! if (expectedTag.getAttributesEx().size() != actualTag.getAttributesEx().size()) assertActualTagHasNoExtraAttributes(displayMessage, expectedTag, actualTag); } private void assertActualTagHasNoExtraAttributes(String displayMessage, Tag expectedTag, Tag actualTag) { ! Vector v = actualTag.getAttributesEx (); ! for (int i = 0; i < v.size (); i++) ! { ! Attribute a = (Attribute)v.elementAt (i); ! if (a.isWhitespace ()) continue; ! String actualValue = actualTag.getAttribute (a.getName ()); ! String expectedValue = expectedTag.getAttribute (a.getName ()); ! if (null == expectedValue) ! fail("\nActual tag had extra attribute: " + a.getName () + displayMessage); } } *************** *** 374,395 **** String displayMessage, Tag expectedTag, ! Tag actualTag) { ! Iterator i = expectedTag.getAttributes().keySet().iterator(); ! while (i.hasNext()) { ! String key = (String)i.next(); ! if (key.trim().equals ("/")) continue; ! String expectedValue = ! expectedTag.getAttribute(key); ! String actualValue = ! actualTag.getAttribute(key); ! if (key==SpecialHashtable.TAGNAME) { ! expectedValue = ParserUtils.removeChars(expectedValue,'/'); ! actualValue = ParserUtils.removeChars(actualValue,'/'); ! assertStringEquals(displayMessage+"\ntag name",expectedValue,actualValue); continue; ! } assertStringEquals( ! 
"\nvalue for key "+key+" in tag "+expectedTag.getTagName()+" expected="+expectedValue+" but was "+actualValue+ "\n\nComplete Tag expected:\n"+expectedTag.toHtml()+ "\n\nComplete Tag actual:\n"+actualTag.toHtml()+ --- 368,384 ---- String displayMessage, Tag expectedTag, ! Tag actualTag) ! { ! Vector v = actualTag.getAttributesEx (); ! for (int i = 0; i < v.size (); i++) ! { ! Attribute a = (Attribute)v.elementAt (i); ! if (a.isWhitespace ()) continue; ! String actualValue = actualTag.getAttribute (a.getName ()); ! String expectedValue = expectedTag.getAttribute (a.getName ()); assertStringEquals( ! "\nvalue for attribute " + a.getName () + " in tag "+expectedTag.getTagName()+" expected="+expectedValue+" but was "+actualValue+ "\n\nComplete Tag expected:\n"+expectedTag.toHtml()+ "\n\nComplete Tag actual:\n"+actualTag.toHtml()+ Index: ParserTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTest.java,v retrieving revision 1.59 retrieving revision 1.60 diff -C2 -d -r1.59 -r1.60 *** ParserTest.java 24 May 2004 16:18:30 -0000 1.59 --- ParserTest.java 2 Jul 2004 00:49:29 -0000 1.60 *************** *** 42,45 **** --- 42,46 ---- import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.Remark; + import org.htmlparser.Tag; import org.htmlparser.Text; import org.htmlparser.filters.NodeClassFilter; *************** *** 51,55 **** import org.htmlparser.tags.LinkTag; import org.htmlparser.tags.MetaTag; - import org.htmlparser.tags.Tag; import org.htmlparser.util.DefaultParserFeedback; import org.htmlparser.util.NodeIterator; --- 52,55 ---- |
From: Derrick O. <der...@us...> - 2004-07-02 00:50:07
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670/src/org/htmlparser/scanners Modified Files: CompositeTagScanner.java Scanner.java ScriptScanner.java StyleScanner.java TagScanner.java Log Message: Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and it's Implementation. Index: CompositeTagScanner.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/CompositeTagScanner.java,v retrieving revision 1.86 retrieving revision 1.87 diff -C2 -d -r1.86 -r1.87 *** CompositeTagScanner.java 24 May 2004 16:18:20 -0000 1.86 --- CompositeTagScanner.java 2 Jul 2004 00:49:28 -0000 1.87 *************** *** 31,39 **** import org.htmlparser.Attribute; import org.htmlparser.Node; import org.htmlparser.lexer.Lexer; import org.htmlparser.lexer.Page; import org.htmlparser.scanners.Scanner; - import org.htmlparser.tags.CompositeTag; - import org.htmlparser.tags.Tag; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; --- 31,39 ---- import org.htmlparser.Attribute; import org.htmlparser.Node; + import org.htmlparser.Tag; import org.htmlparser.lexer.Lexer; import org.htmlparser.lexer.Page; + import org.htmlparser.nodes.TagNode; import org.htmlparser.scanners.Scanner; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; *************** *** 101,107 **** String name; Scanner scanner; ! CompositeTag ret; ! 
ret = (CompositeTag)tag; if (ret.isEmptyXmlTag ()) --- 101,107 ---- String name; Scanner scanner; ! Tag ret; ! ret = tag; if (ret.isEmptyXmlTag ()) *************** *** 143,159 **** { // fake recursion: ! if ((scanner == this) && (next instanceof CompositeTag)) { ! CompositeTag ondeck = (CompositeTag)next; ! if (ondeck.isEmptyXmlTag ()) { ! ondeck.setEndTag (ondeck); ! finishTag (ondeck, lexer); ! addChild (ret, ondeck); } else { stack.add (ret); ! ret = ondeck; } } --- 143,158 ---- { // fake recursion: ! if (scanner == this) { ! if (next.isEmptyXmlTag ()) { ! next.setEndTag (next); ! finishTag (next, lexer); ! addChild (ret, next); } else { stack.add (ret); ! ret = next; } } *************** *** 192,196 **** attributes.addElement (new Attribute (name, null)); Tag opener = (Tag)lexer.getNodeFactory ().createTagNode ( ! next.getPage (), next.getStartPosition (), next.getEndPosition (), attributes); --- 191,195 ---- attributes.addElement (new Attribute (name, null)); Tag opener = (Tag)lexer.getNodeFactory ().createTagNode ( ! lexer.getPage (), next.getStartPosition (), next.getEndPosition (), attributes); *************** *** 202,208 **** for (int i = stack.size () - 1; (-1 == index) && (i >= 0); i--) { ! // short circuit here... assume everything on the stack is a CompositeTag and has this as it's scanner // we'll need to stop if either of those conditions isn't met ! CompositeTag boffo = (CompositeTag)stack.elementAt (i); if (name.equals (boffo.getTagName ())) index = i; --- 201,207 ---- for (int i = stack.size () - 1; (-1 == index) && (i >= 0); i--) { ! // short circuit here... assume everything on the stack has this as it's scanner // we'll need to stop if either of those conditions isn't met ! Tag boffo = (Tag)stack.elementAt (i); if (name.equals (boffo.getTagName ())) index = i; *************** *** 214,225 **** // finish off the current one first finishTag (ret, lexer); ! 
addChild ((CompositeTag)stack.elementAt (stack.size () - 1), ret); for (int i = stack.size () - 1; i > index; i--) { ! CompositeTag fred = (CompositeTag)stack.remove (i); finishTag (fred, lexer); ! addChild ((CompositeTag)stack.elementAt (i - 1), fred); } ! ret = (CompositeTag)stack.remove (index); node = null; } --- 213,224 ---- // finish off the current one first finishTag (ret, lexer); ! addChild ((Tag)stack.elementAt (stack.size () - 1), ret); for (int i = stack.size () - 1; i > index; i--) { ! Tag fred = (Tag)stack.remove (i); finishTag (fred, lexer); ! addChild ((Tag)stack.elementAt (i - 1), fred); } ! ret = (Tag)stack.remove (index); node = null; } *************** *** 247,253 **** { node = stack.elementAt (depth - 1); ! if (node instanceof CompositeTag) { ! CompositeTag precursor = (CompositeTag)node; scanner = precursor.getThisScanner (); if (scanner == this) --- 246,252 ---- { node = stack.elementAt (depth - 1); ! if (node instanceof Tag) { ! Tag precursor = (Tag)node; scanner = precursor.getThisScanner (); if (scanner == this) *************** *** 295,311 **** * @param lexer A lexer positioned at the end of the tag. */ ! protected void finishTag (CompositeTag tag, Lexer lexer) throws ParserException { if (null == tag.getEndTag ()) ! tag.setEndTag (createVirtualEndTag (tag, lexer.getPage (), lexer.getCursor ().getPosition ())); tag.getEndTag ().setParent (tag); tag.doSemanticAction (); } ! /** * Creates an end tag with the same name as the given tag. * @param tag The tag to end. * @param page The page the tag is on (virtually). * @param position The offset into the page at which the tag is to --- 294,311 ---- * @param lexer A lexer positioned at the end of the tag. */ ! protected void finishTag (Tag tag, Lexer lexer) throws ParserException { if (null == tag.getEndTag ()) ! tag.setEndTag (createVirtualEndTag (tag, lexer, lexer.getPage (), lexer.getCursor ().getPosition ())); tag.getEndTag ().setParent (tag); tag.doSemanticAction (); } ! 
/** * Creates an end tag with the same name as the given tag. * @param tag The tag to end. + * @param lexer The object containg the node factory. * @param page The page the tag is on (virtually). * @param position The offset into the page at which the tag is to *************** *** 315,319 **** * equal may be used to distinguish it as a virtual tag later on. */ ! protected Tag createVirtualEndTag (Tag tag, Page page, int position) { Tag ret; --- 315,321 ---- * equal may be used to distinguish it as a virtual tag later on. */ ! protected Tag createVirtualEndTag (Tag tag, Lexer lexer, Page page, int position) ! throws ! ParserException { Tag ret; *************** *** 321,328 **** Vector attributes; ! name = "/" + tag.getRawTagName(); attributes = new Vector (); attributes.addElement (new Attribute (name, (String)null)); ! ret = new Tag (page, position, position, attributes); return (ret); --- 323,331 ---- Vector attributes; ! name = "/" + tag.getRawTagName (); attributes = new Vector (); attributes.addElement (new Attribute (name, (String)null)); ! ret = (Tag)lexer.getNodeFactory ().createTagNode ( ! page, position, position, attributes); return (ret); Index: StyleScanner.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/StyleScanner.java,v retrieving revision 1.35 retrieving revision 1.36 diff -C2 -d -r1.35 -r1.36 *** StyleScanner.java 14 Jun 2004 00:06:52 -0000 1.35 --- StyleScanner.java 2 Jul 2004 00:49:28 -0000 1.36 *************** *** 36,41 **** import org.htmlparser.lexer.Cursor; import org.htmlparser.lexer.Lexer; ! import org.htmlparser.tags.CompositeTag; ! import org.htmlparser.tags.Tag; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; --- 36,40 ---- import org.htmlparser.lexer.Cursor; import org.htmlparser.lexer.Lexer; ! 
import org.htmlparser.Tag; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; *************** *** 75,79 **** NodeFactory factory; Text content; ! CompositeTag ret; done = false; --- 74,78 ---- NodeFactory factory; Text content; ! Tag ret; done = false; *************** *** 118,123 **** // build new end tag if required if (null == end) ! end = new Tag (lexer.getPage (), endpos, endpos, new Vector ()); ! ret = (CompositeTag)tag; ret.setEndTag (end); ret.setChildren (new NodeList (content)); --- 117,123 ---- // build new end tag if required if (null == end) ! end = (Tag)lexer.getNodeFactory ().createTagNode ( ! lexer.getPage (), endpos, endpos, new Vector ()); ! ret = tag; ret.setEndTag (end); ret.setChildren (new NodeList (content)); Index: ScriptScanner.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/ScriptScanner.java,v retrieving revision 1.58 retrieving revision 1.59 diff -C2 -d -r1.58 -r1.59 *** ScriptScanner.java 14 Jun 2004 00:06:52 -0000 1.58 --- ScriptScanner.java 2 Jul 2004 00:49:28 -0000 1.59 *************** *** 33,43 **** import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.Remark; import org.htmlparser.Text; import org.htmlparser.lexer.Cursor; import org.htmlparser.lexer.Lexer; import org.htmlparser.scanners.ScriptDecoder; - import org.htmlparser.tags.CompositeTag; import org.htmlparser.tags.ScriptTag; - import org.htmlparser.tags.Tag; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; --- 33,42 ---- import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.Remark; + import org.htmlparser.Tag; import org.htmlparser.Text; import org.htmlparser.lexer.Cursor; import org.htmlparser.lexer.Lexer; import org.htmlparser.scanners.ScriptDecoder; import org.htmlparser.tags.ScriptTag; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; *************** *** 80,84 
**** NodeFactory factory; Text content; ! CompositeTag ret; done = false; --- 79,83 ---- NodeFactory factory; Text content; ! Tag ret; done = false; *************** *** 135,140 **** // build new end tag if required if (null == end) ! end = new Tag (lexer.getPage (), endpos, endpos, new Vector ()); ! ret = (CompositeTag)tag; ret.setEndTag (end); ret.setChildren (new NodeList (content)); --- 134,140 ---- // build new end tag if required if (null == end) ! end = (Tag)lexer.getNodeFactory ().createTagNode ( ! lexer.getPage (), endpos, endpos, new Vector ()); ! ret = tag; ret.setEndTag (end); ret.setChildren (new NodeList (content)); Index: Scanner.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/Scanner.java,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** Scanner.java 20 Dec 2003 23:47:55 -0000 1.1 --- Scanner.java 2 Jul 2004 00:49:28 -0000 1.2 *************** *** 27,32 **** package org.htmlparser.scanners; import org.htmlparser.lexer.Lexer; - import org.htmlparser.tags.Tag; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; --- 27,32 ---- package org.htmlparser.scanners; + import org.htmlparser.Tag; import org.htmlparser.lexer.Lexer; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; Index: TagScanner.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/TagScanner.java,v retrieving revision 1.53 retrieving revision 1.54 diff -C2 -d -r1.53 -r1.54 *** TagScanner.java 20 Dec 2003 23:47:55 -0000 1.53 --- TagScanner.java 2 Jul 2004 00:49:28 -0000 1.54 *************** *** 29,34 **** import java.io.Serializable; import org.htmlparser.lexer.Lexer; - import org.htmlparser.tags.Tag; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; --- 29,34 ---- import java.io.Serializable; + 
import org.htmlparser.Tag; import org.htmlparser.lexer.Lexer; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; |
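Note on the end-tag handling in the CompositeTagScanner.java diff above: the scanner searches its stack of open tags from the top for a matching name, finishes the current tag, then finishes every tag above the match and attaches each one to the tag beneath it. A rough self-contained sketch of that unwinding follows. OpenTag and its virtualEnd flag are hypothetical stand-ins for the real Tag objects and for the finishTag/createVirtualEndTag calls, so this illustrates the shape of the loop, not the scanner itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an open composite tag; not an htmlparser type.
final class OpenTag {
    final String name;
    final List<OpenTag> children = new ArrayList<>();
    boolean virtualEnd;   // set when the tag is closed without its own end tag

    OpenTag(String name) {
        this.name = name;
    }
}

final class UnwindSketch {
    // Close current and every open tag stacked above the nearest tag called name.
    // Returns the matched tag (removed from the stack), or null if nothing matches.
    static OpenTag closeTo(List<OpenTag> stack, OpenTag current, String name) {
        int index = -1;
        for (int i = stack.size() - 1; index == -1 && i >= 0; i--)
            if (name.equals(stack.get(i).name))
                index = i;
        if (index == -1)
            return null;   // unmatched end tag: leave everything untouched

        // finish off the current tag first and hang it on the top of the stack
        current.virtualEnd = true;
        stack.get(stack.size() - 1).children.add(current);

        // then pop everything above the match, attaching each tag to its parent
        for (int i = stack.size() - 1; i > index; i--) {
            OpenTag popped = stack.remove(i);
            popped.virtualEnd = true;
            stack.get(i - 1).children.add(popped);
        }
        return stack.remove(index);
    }

    public static void main(String[] args) {
        List<OpenTag> stack = new ArrayList<>();
        stack.add(new OpenTag("HTML"));
        stack.add(new OpenTag("BODY"));
        stack.add(new OpenTag("UL"));
        OpenTag matched = closeTo(stack, new OpenTag("LI"), "BODY");
        System.out.println(matched.name + " closed, " + stack.size() + " tag(s) still open");
    }
}
```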
From: Derrick O. <der...@us...> - 2004-07-02 00:50:06
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/nodes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670/src/org/htmlparser/nodes Modified Files: TagNode.java Log Message: Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and it's Implementation. Index: TagNode.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/nodes/TagNode.java,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** TagNode.java 24 May 2004 16:18:37 -0000 1.1 --- TagNode.java 2 Jul 2004 00:49:27 -0000 1.2 *************** *** 37,40 **** --- 37,42 ---- import org.htmlparser.lexer.Lexer; import org.htmlparser.lexer.Page; + import org.htmlparser.scanners.Scanner; + import org.htmlparser.scanners.TagScanner; import org.htmlparser.util.ParserException; import org.htmlparser.util.SpecialHashtable; *************** *** 44,48 **** /** * TagNode represents a generic tag. ! * */ public class TagNode --- 46,51 ---- /** * TagNode represents a generic tag. ! * If no scanner is registered for a given tag name, this is what you get. ! * This is also the base class for all tags created by the parser. */ public class TagNode *************** *** 53,56 **** --- 56,74 ---- { /** + * An empty set of tag names. + */ + private final static String[] NONE = new String[0]; + + /** + * The scanner for this tag. + */ + private Scanner mScanner; + + /** + * The default scanner for non-composite tags. 
+ */ + protected final static Scanner mDefaultScanner = new TagScanner (); + + /** * The tag attributes. * Objects of type {@link Attribute}. *************** *** 118,122 **** --- 136,163 ---- { super (page, start, end); + + mScanner = mDefaultScanner; mAttributes = attributes; + if ((null == mAttributes) || (0 == mAttributes.size ())) + { + String[] names; + + names = getIds (); + if ((null != names) && (0 != names.length)) + setTagName (names[0]); + else + setTagName (""); // make sure it's not null + } + } + + /** + * Create a tag like the one provided. + * @param node The tag to emulate. + * @param scanner The scanner for this tag. + */ + public TagNode (TagNode tag, TagScanner scanner) + { + this (tag.getPage (), tag.getTagBegin (), tag.getTagEnd (), tag.getAttributesEx ()); + setThisScanner (scanner); } *************** *** 467,470 **** --- 508,516 ---- attribute = new Attribute (name, null, (char)0); attributes = getAttributesEx (); + if (null == attributes) + { + attributes = new Vector (); + setAttributesEx (attributes); + } if (0 == attributes.size ()) // nothing added yet *************** *** 473,477 **** { zeroth = (Attribute)attributes.elementAt (0); ! // check forn attribute that looks like a name if ((null == zeroth.getValue ()) && (0 == zeroth.getQuote ())) attributes.setElementAt (attribute, 0); --- 519,523 ---- { zeroth = (Attribute)attributes.elementAt (0); ! // check for attribute that looks like a name if ((null == zeroth.getValue ()) && (0 == zeroth.getQuote ())) attributes.setElementAt (attribute, 0); *************** *** 877,879 **** --- 923,998 ---- return (getPage ().row (getEndPosition ())); } + + /** + * Return the set of names handled by this tag. + * Since this a a generic tag, it has no ids. + * @return The names to be matched that create tags of this type. + */ + public String[] getIds () + { + return (NONE); + } + + /** + * Return the set of tag names that cause this tag to finish. 
+ * These are the normal (non end tags) that if encountered while + * scanning (a composite tag) will cause the generation of a virtual + * tag. + * Since this a a non-composite tag, the default is no enders. + * @return The names of following tags that stop further scanning. + */ + public String[] getEnders () + { + return (NONE); + } + + /** + * Return the set of end tag names that cause this tag to finish. + * These are the end tags that if encountered while + * scanning (a composite tag) will cause the generation of a virtual + * tag. + * Since this a a non-composite tag, it has no end tag enders. + * @return The names of following end tags that stop further scanning. + */ + public String[] getEndTagEnders () + { + return (NONE); + } + + /** + * Return the scanner associated with this tag. + * @return The scanner associated with this tag. + */ + public Scanner getThisScanner () + { + return (mScanner); + } + + /** + * Set the scanner associated with this tag. + * @param scanner The scanner for this tag. + */ + public void setThisScanner (Scanner scanner) + { + mScanner = scanner; + } + + /** + * Get the end tag for this (composite) tag. + * For a non-composite tag this always returns <code>null</code>. + * @return The tag that terminates this composite tag, i.e. </HTML>. + */ + public Tag getEndTag () + { + return (null); + } + + /** + * Set the end tag for this (composite) tag. + * For a non-composite tag this is a no-op. + * @param end The tag that terminates this composite tag, i.e. </HTML>. + */ + public void setEndTag (Tag end) + { + } } |
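Note on the getEndTag/setEndTag pair added to TagNode above: in the base class the getter returns null and the setter does nothing, while CompositeTag (see its own diff) stores the end tag. That is what lets the scanners hold a plain Tag without casting to CompositeTag. A minimal sketch of the arrangement follows. EndableTag, LeafTag and ContainerTag are hypothetical stand-ins, and the finish method only loosely mirrors the first step of finishTag.

```java
// Hypothetical stand-in for the broadened Tag interface; not an htmlparser type.
interface EndableTag {
    String getName();
    EndableTag getEndTag();
    void setEndTag(EndableTag end);
}

// Plays the role of a non-composite tag: it never has an end tag.
class LeafTag implements EndableTag {
    private final String name;

    LeafTag(String name) {
        this.name = name;
    }

    public String getName() { return name; }
    public EndableTag getEndTag() { return null; }
    public void setEndTag(EndableTag end) { /* deliberately a no-op */ }
}

// Plays the role of a composite tag: it stores whatever end tag it is given.
class ContainerTag extends LeafTag {
    private EndableTag endTag;

    ContainerTag(String name) {
        super(name);
    }

    @Override public EndableTag getEndTag() { return endTag; }
    @Override public void setEndTag(EndableTag end) { endTag = end; }
}

final class EndTagSketch {
    // Supply a synthetic end tag when none is set; no cast to ContainerTag needed.
    static void finish(EndableTag tag) {
        if (tag.getEndTag() == null)
            tag.setEndTag(new LeafTag("/" + tag.getName()));
    }

    public static void main(String[] args) {
        EndableTag div = new ContainerTag("DIV");
        EndableTag br = new LeafTag("BR");
        finish(div);
        finish(br);
        System.out.println(div.getEndTag().getName());
        System.out.println(br.getEndTag());
    }
}
```

In this arrangement, code that calls getEndTag() on an arbitrary tag has to allow for a null result, since a non-composite tag silently ignores setEndTag().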
From: Derrick O. <der...@us...> - 2004-07-02 00:50:06
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/nodeDecorators
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670/src/org/htmlparser/nodeDecorators

Modified Files:
    AbstractNodeDecorator.java
Log Message:
Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and it's Implementation.

Index: AbstractNodeDecorator.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/nodeDecorators/AbstractNodeDecorator.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** AbstractNodeDecorator.java 14 Jun 2004 00:06:51 -0000 1.21
--- AbstractNodeDecorator.java 2 Jul 2004 00:49:27 -0000 1.22
***************
*** 30,33 ****
--- 30,34 ----
  import org.htmlparser.Text;
  import org.htmlparser.NodeFilter;
+ import org.htmlparser.lexer.Page;
  import org.htmlparser.util.NodeList;
  import org.htmlparser.util.ParserException;
***************
*** 108,112 ****
      }

!     public boolean equals(Object arg0) {
          return delegate.equals(arg0);
      }
--- 109,132 ----
      }

!     /**
!      * Get the page this node came from.
!      * @return The page that supplied this node.
!      */
!     public Page getPage ()
!     {
!         return (delegate.getPage ());
!     }
!
!     /**
!      * Set the page this node came from.
!      * @param page The page that supplied this node.
!      */
!     public void setPage (Page page)
!     {
!         delegate.setPage (page);
!     }
!
!     public boolean equals(Object arg0)
!     {
          return delegate.equals(arg0);
      }
|
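Note on the AbstractNodeDecorator change above: the Node interface gained getPage/setPage in this commit, so the decorator base class has to forward both to its delegate, otherwise every concrete decorator would stop compiling. A small sketch of that forwarding pattern follows. PagedNode, PlainNode, ForwardingNode and UpperCaseNode are hypothetical stand-ins, and the page is modelled as a String here purely to keep the example self-contained (the real type is org.htmlparser.lexer.Page).

```java
import java.util.Locale;

// Hypothetical stand-in for the Node interface after it gained get/setPage.
interface PagedNode {
    String getPage();
    void setPage(String page);
    String toText();
}

// A trivial concrete node that remembers its page and its text.
final class PlainNode implements PagedNode {
    private String page;
    private final String text;

    PlainNode(String text) {
        this.text = text;
    }

    public String getPage() { return page; }
    public void setPage(String page) { this.page = page; }
    public String toText() { return text; }
}

// Decorator base: every interface method is forwarded to the wrapped node.
abstract class ForwardingNode implements PagedNode {
    protected final PagedNode delegate;

    ForwardingNode(PagedNode delegate) {
        this.delegate = delegate;
    }

    public String getPage() { return delegate.getPage(); }
    public void setPage(String page) { delegate.setPage(page); }
    public String toText() { return delegate.toText(); }
}

// A concrete decorator overrides only the behaviour it wants to change.
final class UpperCaseNode extends ForwardingNode {
    UpperCaseNode(PagedNode delegate) {
        super(delegate);
    }

    @Override public String toText() {
        return delegate.toText().toUpperCase(Locale.ROOT);
    }
}

final class DecoratorSketch {
    public static void main(String[] args) {
        PlainNode inner = new PlainNode("hello");
        PagedNode outer = new UpperCaseNode(inner);
        outer.setPage("index.html");   // lands on the wrapped node
        System.out.println(inner.getPage() + ": " + outer.toText());
    }
}
```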
From: Derrick O. <der...@us...> - 2004-07-02 00:50:06
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670/src/org/htmlparser Modified Files: Node.java PrototypicalNodeFactory.java Tag.java Log Message: Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and it's Implementation. Index: Node.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Node.java,v retrieving revision 1.50 retrieving revision 1.51 diff -C2 -d -r1.50 -r1.51 *** Node.java 14 Jun 2004 00:06:51 -0000 1.50 --- Node.java 2 Jul 2004 00:49:26 -0000 1.51 *************** *** 27,30 **** --- 27,31 ---- package org.htmlparser; + import org.htmlparser.lexer.Page; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; *************** *** 171,174 **** --- 172,186 ---- /** + * Get the page this node came from. + * @return The page that supplied this node. + */ + public Page getPage (); + + /** + * Set the page this node came from. + * @param page The page that supplied this node. + */ + public void setPage (Page page); + /** * Apply the visitor to this node. * @param visitor The visitor to this node. 
Index: PrototypicalNodeFactory.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/PrototypicalNodeFactory.java,v retrieving revision 1.10 retrieving revision 1.11 diff -C2 -d -r1.10 -r1.11 *** PrototypicalNodeFactory.java 26 Jun 2004 11:25:01 -0000 1.10 --- PrototypicalNodeFactory.java 2 Jul 2004 00:49:26 -0000 1.11 *************** *** 43,46 **** --- 43,47 ---- import org.htmlparser.nodes.TextNode; import org.htmlparser.nodes.RemarkNode; + import org.htmlparser.nodes.TagNode; import org.htmlparser.tags.AppletTag; import org.htmlparser.tags.BaseHrefTag; *************** *** 106,109 **** --- 107,115 ---- /** + * The prototypical tag node. + */ + protected Tag mTag; + + /** * The list of tags to return. * The list is keyed by tag name. *************** *** 129,132 **** --- 135,139 ---- mText = new TextNode (null, 0, 0); mRemark = new RemarkNode (null, 0, 0); + mTag = new TagNode (null, 0, 0, null); if (!empty) registerTags (); *************** *** 137,141 **** * @param tag The single tag to register in the otherwise empty factory. */ ! public PrototypicalNodeFactory (org.htmlparser.tags.Tag tag) { this (true); --- 144,148 ---- * @param tag The single tag to register in the otherwise empty factory. */ ! public PrototypicalNodeFactory (Tag tag) { this (true); *************** *** 147,151 **** * @param tags The tags to register in the otherwise empty factory. */ ! public PrototypicalNodeFactory (org.htmlparser.tags.Tag[] tags) { this (true); --- 154,158 ---- * @param tags The tags to register in the otherwise empty factory. */ ! public PrototypicalNodeFactory (Tag[] tags) { this (true); *************** *** 207,213 **** * Registers the given tag under every id the tag has. * @param tag The tag to register (subclass of ! * {@link org.htmlparser.tags.Tag}). */ ! 
public void registerTag (org.htmlparser.tags.Tag tag) { String ids[]; --- 214,220 ---- * Registers the given tag under every id the tag has. * @param tag The tag to register (subclass of ! * {@link Tag}). */ ! public void registerTag (Tag tag) { String ids[]; *************** *** 222,228 **** * Unregisters the given tag from every id the tag has. * @param tag The tag to unregister (subclass of ! * {@link org.htmlparser.tags.Tag}). */ ! public void unregisterTag (org.htmlparser.tags.Tag tag) { String ids[]; --- 229,235 ---- * Unregisters the given tag from every id the tag has. * @param tag The tag to unregister (subclass of ! * {@link Tag}). */ ! public void unregisterTag (Tag tag) { String ids[]; *************** *** 234,260 **** /** - * Register a tag. - * Registers the given tag under the tag {@link Tag#getTagName() name}. - * @param tag The tag to register (implements {@link org.htmlparser.Tag}). - */ - public void registerTag (Tag tag) - { - put (tag.getTagName (), tag); - } - - /** - * Unregister a tag. - * Unregisters the given tag from the tag {@link Tag#getTagName() name}. - * @param tag The tag to unregister (implements {@link org.htmlparser.Tag}). - */ - public void unregisterTag (Tag tag) - { - remove (tag.getTagName ()); - } - - /** * Register all known tags in the tag package. * Registers tags from the {@link org.htmlparser.tags tag package} by ! * calling {@link #registerTag(org.htmlparser.tags.Tag) registerTag()}. * @return 'this' nodefactory as a convenience. */ --- 241,247 ---- /** * Register all known tags in the tag package. * Registers tags from the {@link org.htmlparser.tags tag package} by ! * calling {@link #registerTag(Tag) registerTag()}. * @return 'this' nodefactory as a convenience. */ *************** *** 337,340 **** --- 324,352 ---- } + /** + * Get the object being used to generate generic tag nodes. + * These are returned from {@link createTagNode} when no specific tag + * is found in the registered tag list. 
+ * @return The prototype for {@link Tag} nodes. + */ + public Tag getTagPrototype () + { + return (mTag); + } + + /** + * Set the object to be used to generate tag nodes. + * These are returned from {@link createTagNode} when no specific tag + * is found in the registered tag list. + * @param remark The prototype for {@link Tag} nodes. + */ + public void setTagPrototype (Tag tag) + { + if (null == tag) + throw new IllegalArgumentException ("tag prototype node cannot be null"); + else + mTag = tag; + } + // // NodeFactory interface *************** *** 354,361 **** { ret = (Text)(getTextPrototype ().clone ()); ! if (ret instanceof AbstractNode) ! ((AbstractNode)ret).setPage (page); ! else ! ret.setText (page.getText (start, end)); ret.setStartPosition (start); ret.setEndPosition (end); --- 366,370 ---- { ret = (Text)(getTextPrototype ().clone ()); ! ret.setPage (page); ret.setStartPosition (start); ret.setEndPosition (end); *************** *** 384,398 **** { ret = (Remark)(getRemarkPrototype ().clone ()); ! if (ret instanceof AbstractNode) ! ((AbstractNode)ret).setPage (page); ! else ! { ! first = start + 4; // <!-- ! last = end - 3; // --> ! if (first >= last) ! ret.setText (""); ! else ! ret.setText (page.getText (first, last)); ! } ret.setStartPosition (start); ret.setEndPosition (end); --- 393,397 ---- { ret = (Remark)(getRemarkPrototype ().clone ()); ! ret.setPage (page); ret.setStartPosition (start); ret.setEndPosition (end); *************** *** 445,450 **** { ret = (Tag)prototype.clone (); ! if (ret instanceof AbstractNode) ! ((AbstractNode)ret).setPage (page); ret.setStartPosition (start); ret.setEndPosition (end); --- 444,448 ---- { ret = (Tag)prototype.clone (); ! ret.setPage (page); ret.setStartPosition (start); ret.setEndPosition (end); *************** *** 460,465 **** } if (null == ret) ! // generate a generic node ! ret = new org.htmlparser.tags.Tag (page, start, end, attributes); return (ret); --- 458,475 ---- } if (null == ret) ! 
{ // generate a generic node ! try ! { ! ret = (Tag)getTagPrototype ().clone (); ! ret.setPage (page); ! ret.setStartPosition (start); ! ret.setEndPosition (end); ! ret.setAttributesEx (attributes); ! } ! catch (CloneNotSupportedException cnse) ! { ! ret = new TagNode (page, start, end, attributes); ! } ! } return (ret); Index: Tag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Tag.java,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** Tag.java 24 May 2004 16:18:12 -0000 1.3 --- Tag.java 2 Jul 2004 00:49:26 -0000 1.4 *************** *** 27,32 **** --- 27,35 ---- package org.htmlparser; + import java.util.Hashtable; import java.util.Vector; + import org.htmlparser.scanners.Scanner; + /** * Identifies what a Tag such as <XXX xxx yyy="zzz"> can do. *************** *** 96,99 **** --- 99,128 ---- */ public void setAttributesEx (Vector attribs); + + /** + * Gets the attributes in the tag. + * This is not the preferred method to get attributes, see {@link + * #getAttributesEx getAttributesEx} which returns a list of {@link + * Attribute} objects, which offer more information than the simple + * <code>String</code> objects available from this <code>Hashtable</code>. + * @return Returns a list of name/value pairs representing the attributes. + * These are not in order, the keys (names) are converted to uppercase and the values + * are not quoted, even if they need to be. The table <em>will</em> return + * <code>null</code> if there was no value for an attribute (no equals + * sign or nothing to the right of the equals sign). A special entry with + * a key of SpecialHashtable.TAGNAME ("$<TAGNAME>$") holds the tag name. + * The conversion to uppercase is performed with an ENGLISH locale. + * @deprecated Use getAttributesEx() instead. + */ + public Hashtable getAttributes (); + + /** + * Sets the attributes. 
+ * A special entry with a key of SpecialHashtable.TAGNAME ("$<TAGNAME>$") + * sets the tag name. + * @param attributes The attribute collection to set. + * @deprecated Use setAttributesEx() instead. + */ + public void setAttributes (Hashtable attributes); /** *************** *** 119,122 **** --- 148,158 ---- /** + * Return the name of this tag. + * @return The tag name or null if this tag contains nothing or only + * whitespace. + */ + public String getRawTagName (); + + /** * Determines if the given tag breaks the flow of text. * @return <code>true</code> if following text would start on a new line, *************** *** 152,154 **** --- 188,254 ---- */ public void setEmptyXmlTag (boolean emptyXmlTag); + + /** + * Return the set of names handled by this tag. + * Since this a a generic tag, it has no ids. + * @return The names to be matched that create tags of this type. + */ + public String[] getIds (); + + /** + * Return the set of tag names that cause this tag to finish. + * These are the normal (non end tags) that if encountered while + * scanning (a composite tag) will cause the generation of a virtual + * tag. + * Since this a a non-composite tag, the default is no enders. + * @return The names of following tags that stop further scanning. + */ + public String[] getEnders (); + + /** + * Return the set of end tag names that cause this tag to finish. + * These are the end tags that if encountered while + * scanning (a composite tag) will cause the generation of a virtual + * tag. + * Since this a a non-composite tag, it has no end tag enders. + * @return The names of following end tags that stop further scanning. + */ + public String[] getEndTagEnders (); + + /** + * Get the end tag for this (composite) tag. + * For a non-composite tag this always returns <code>null</code>. + * @return The tag that terminates this composite tag, i.e. </HTML>. + */ + public Tag getEndTag (); + + /** + * Set the end tag for this (composite) tag. 
+ * For a non-composite tag this is a no-op. + * @param end The tag that terminates this composite tag, i.e. </HTML>. + */ + public void setEndTag (Tag end); + + /** + * Return the scanner associated with this tag. + * @return The scanner associated with this tag. + */ + public Scanner getThisScanner (); + + /** + * Set the scanner associated with this tag. + * @param scanner The scanner for this tag. + */ + public void setThisScanner (Scanner scanner); + + /** + * Get the line number where this tag starts. + * @return The (zero based) line number in the page where this tag starts. + */ + public int getStartingLineNumber (); + /** + * Get the line number where this tag ends. + * @return The (zero based) line number in the page where this tag ends. + */ + public int getEndingLineNumber (); } |
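Editorial note: the PrototypicalNodeFactory change above creates generic tag nodes by cloning a settable prototype and falling back to direct construction if cloning fails. The following is a minimal, self-contained sketch of that idea only. The names SketchTag, SketchTagNode, SketchNodeFactory and copy() are hypothetical stand-ins invented for illustration; they are not the org.htmlparser classes, and the page and attribute arguments of the real createTagNode are left out.

```java
// Hypothetical stand-in for a tag interface; NOT org.htmlparser.Tag.
interface SketchTag extends Cloneable {
    int getStartPosition();
    int getEndPosition();
    void setStartPosition(int start);
    void setEndPosition(int end);
    // Assumed helper that exposes Object.clone() through the interface.
    SketchTag copy() throws CloneNotSupportedException;
}

// Hypothetical generic node, playing the role TagNode plays in the diff.
class SketchTagNode implements SketchTag {
    private int start;
    private int end;

    SketchTagNode(int start, int end) {
        this.start = start;
        this.end = end;
    }

    public int getStartPosition() { return start; }
    public int getEndPosition() { return end; }
    public void setStartPosition(int start) { this.start = start; }
    public void setEndPosition(int end) { this.end = end; }

    public SketchTag copy() throws CloneNotSupportedException {
        // A shallow copy is enough here because the fields are primitives.
        return (SketchTag) super.clone();
    }
}

// Hypothetical factory showing the clone-or-construct fallback.
class SketchNodeFactory {
    private SketchTag prototype = new SketchTagNode(0, 0);

    SketchTag getTagPrototype() {
        return prototype;
    }

    void setTagPrototype(SketchTag tag) {
        if (tag == null)
            throw new IllegalArgumentException("tag prototype node cannot be null");
        prototype = tag;
    }

    SketchTag createTagNode(int start, int end) {
        SketchTag ret;
        try {
            // Preferred path: copy the prototype, then fill in the positions.
            ret = getTagPrototype().copy();
            ret.setStartPosition(start);
            ret.setEndPosition(end);
        } catch (CloneNotSupportedException cnse) {
            // Fallback path: build a plain generic node directly.
            ret = new SketchTagNode(start, end);
        }
        return ret;
    }
}
```

Under these assumptions, a caller that wants every unregistered tag to be an instance of its own class would pass one instance to setTagPrototype once; each later createTagNode call then returns a fresh copy and leaves the prototype untouched.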
From: Derrick O. <der...@us...> - 2004-07-02 00:50:06
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670/src/org/htmlparser/filters Modified Files: CssSelectorNodeFilter.java HasParentFilter.java Log Message: Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and its implementation. Index: HasParentFilter.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/HasParentFilter.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** HasParentFilter.java 8 Jun 2004 10:20:19 -0000 1.2 --- HasParentFilter.java 2 Jul 2004 00:49:27 -0000 1.3 *************** *** 29,33 **** import org.htmlparser.Node; import org.htmlparser.NodeFilter; ! import org.htmlparser.tags.Tag; import org.htmlparser.util.NodeList; --- 29,33 ---- import org.htmlparser.Node; import org.htmlparser.NodeFilter; ! import org.htmlparser.Tag; import org.htmlparser.util.NodeList; Index: CssSelectorNodeFilter.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/CssSelectorNodeFilter.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** CssSelectorNodeFilter.java 22 May 2004 12:28:15 -0000 1.2 --- CssSelectorNodeFilter.java 2 Jul 2004 00:49:27 -0000 1.3 *************** *** 27,40 **** package org.htmlparser.filters; ! 
import org.htmlparser.*; import org.htmlparser.lexer.Lexer; - import org.htmlparser.tags.Tag; import org.htmlparser.util.NodeIterator; import org.htmlparser.util.NodeList; - import java.util.regex.Matcher; - import java.util.regex.Pattern; - import java.net.URLConnection; - /** * A NodeFilter that accepts nodes based on whether they match a CSS2 selector. --- 27,41 ---- package org.htmlparser.filters; ! import java.net.URLConnection; ! import java.util.regex.Matcher; ! import java.util.regex.Pattern; ! ! import org.htmlparser.Node; ! import org.htmlparser.NodeFilter; ! import org.htmlparser.Tag; import org.htmlparser.lexer.Lexer; import org.htmlparser.util.NodeIterator; import org.htmlparser.util.NodeList; /** * A NodeFilter that accepts nodes based on whether they match a CSS2 selector. |
From: Derrick O. <der...@us...> - 2004-07-02 00:50:05
|
Update of /cvsroot/htmlparser/htmlparser In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670 Modified Files: build.xml Log Message: Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and its implementation. Index: build.xml =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v retrieving revision 1.68 retrieving revision 1.69 diff -C2 -d -r1.68 -r1.69 *** build.xml 30 May 2004 01:43:54 -0000 1.68 --- build.xml 2 Jul 2004 00:49:26 -0000 1.69 *************** *** 270,273 **** --- 270,275 ---- <include name="org/htmlparser/Tag.class"/> <include name="org/htmlparser/Text.class"/> + <include name="org/htmlparser/scanners/Scanner.class"/> + <include name="org/htmlparser/scanners/TagScanner.class"/> <include name="org/htmlparser/util/ParserException.class"/> <include name="org/htmlparser/util/ChainedException.class"/> |
From: Derrick O. <der...@us...> - 2004-07-02 00:49:41
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670/src/org/htmlparser/tests/tagTests Modified Files: BaseHrefTagTest.java BodyTagTest.java DivTagTest.java EndTagTest.java FormTagTest.java FrameSetTagTest.java HtmlTagTest.java JspTagTest.java LinkTagTest.java MetaTagTest.java ObjectCollectionTest.java SpanTagTest.java TagTest.java TitleTagTest.java Log Message: Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and its implementation. Index: FormTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/FormTagTest.java,v retrieving revision 1.44 retrieving revision 1.45 diff -C2 -d -r1.44 -r1.45 *** FormTagTest.java 24 May 2004 16:18:33 -0000 1.44 --- FormTagTest.java 2 Jul 2004 00:49:31 -0000 1.45 *************** *** 31,34 **** --- 31,35 ---- import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.Remark; + import org.htmlparser.Tag; import org.htmlparser.Text; import org.htmlparser.filters.NodeClassFilter; *************** *** 42,46 **** import org.htmlparser.tags.SelectTag; import org.htmlparser.tags.TableTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tags.TextareaTag; import org.htmlparser.tests.ParserTestCase; --- 43,46 ---- Index: TitleTagTest.java =================================================================== RCS file: 
/cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/TitleTagTest.java,v retrieving revision 1.35 retrieving revision 1.36 diff -C2 -d -r1.35 -r1.36 *** TitleTagTest.java 2 Jan 2004 16:24:57 -0000 1.35 --- TitleTagTest.java 2 Jul 2004 00:49:31 -0000 1.36 *************** *** 28,31 **** --- 28,32 ---- import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.tags.BaseHrefTag; import org.htmlparser.tags.HeadTag; *************** *** 33,37 **** import org.htmlparser.tags.MetaTag; import org.htmlparser.tags.StyleTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tags.TitleTag; import org.htmlparser.tests.ParserTestCase; --- 34,37 ---- Index: LinkTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/LinkTagTest.java,v retrieving revision 1.47 retrieving revision 1.48 diff -C2 -d -r1.47 -r1.48 *** LinkTagTest.java 24 May 2004 16:18:33 -0000 1.47 --- LinkTagTest.java 2 Jul 2004 00:49:31 -0000 1.48 *************** *** 30,33 **** --- 30,34 ---- import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.Text; import org.htmlparser.tags.HeadTag; *************** *** 35,39 **** import org.htmlparser.tags.ImageTag; import org.htmlparser.tags.LinkTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.ParserException; --- 36,39 ---- Index: ObjectCollectionTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/ObjectCollectionTest.java,v retrieving revision 1.20 retrieving revision 1.21 diff -C2 -d -r1.20 -r1.21 *** ObjectCollectionTest.java 2 Jan 2004 16:24:57 -0000 1.20 --- ObjectCollectionTest.java 2 Jul 2004 00:49:31 -0000 1.21 *************** *** 29,36 **** import org.htmlparser.Node; import 
org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.Div; import org.htmlparser.tags.Span; import org.htmlparser.tags.TableTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.NodeList; --- 29,36 ---- import org.htmlparser.Node; import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.tags.Div; import org.htmlparser.tags.Span; import org.htmlparser.tags.TableTag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.NodeList; Index: BodyTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/BodyTagTest.java,v retrieving revision 1.20 retrieving revision 1.21 diff -C2 -d -r1.20 -r1.21 *** BodyTagTest.java 2 Jan 2004 16:24:57 -0000 1.20 --- BodyTagTest.java 2 Jul 2004 00:49:31 -0000 1.21 *************** *** 32,38 **** import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.BodyTag; import org.htmlparser.tags.Html; - import org.htmlparser.tags.Tag; import org.htmlparser.tags.TitleTag; import org.htmlparser.tests.ParserTestCase; --- 32,38 ---- import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.tags.BodyTag; import org.htmlparser.tags.Html; import org.htmlparser.tags.TitleTag; import org.htmlparser.tests.ParserTestCase; Index: MetaTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/MetaTagTest.java,v retrieving revision 1.38 retrieving revision 1.39 diff -C2 -d -r1.38 -r1.39 *** MetaTagTest.java 2 Jan 2004 16:24:57 -0000 1.38 --- MetaTagTest.java 2 Jul 2004 00:49:31 -0000 1.39 *************** *** 28,36 **** import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.HeadTag; import org.htmlparser.tags.Html; import org.htmlparser.tags.LinkTag; import 
org.htmlparser.tags.MetaTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tags.TitleTag; import org.htmlparser.tests.ParserTestCase; --- 28,36 ---- import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.tags.HeadTag; import org.htmlparser.tags.Html; import org.htmlparser.tags.LinkTag; import org.htmlparser.tags.MetaTag; import org.htmlparser.tags.TitleTag; import org.htmlparser.tests.ParserTestCase; Index: BaseHrefTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/BaseHrefTagTest.java,v retrieving revision 1.40 retrieving revision 1.41 diff -C2 -d -r1.40 -r1.41 *** BaseHrefTagTest.java 18 Mar 2004 04:04:08 -0000 1.40 --- BaseHrefTagTest.java 2 Jul 2004 00:49:31 -0000 1.41 *************** *** 28,34 **** import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.BaseHrefTag; import org.htmlparser.tags.LinkTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tags.TitleTag; import org.htmlparser.tests.ParserTestCase; --- 28,34 ---- import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.tags.BaseHrefTag; import org.htmlparser.tags.LinkTag; import org.htmlparser.tags.TitleTag; import org.htmlparser.tests.ParserTestCase; Index: HtmlTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/HtmlTagTest.java,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** HtmlTagTest.java 7 Dec 2003 23:41:43 -0000 1.1 --- HtmlTagTest.java 2 Jul 2004 00:49:31 -0000 1.2 *************** *** 29,35 **** import org.htmlparser.Node; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.filters.NodeClassFilter; import org.htmlparser.tags.Html; - import org.htmlparser.tags.Tag; import org.htmlparser.tags.TitleTag; import 
org.htmlparser.tests.ParserTestCase; --- 29,35 ---- import org.htmlparser.Node; import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.filters.NodeClassFilter; import org.htmlparser.tags.Html; import org.htmlparser.tags.TitleTag; import org.htmlparser.tests.ParserTestCase; Index: EndTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/EndTagTest.java,v retrieving revision 1.38 retrieving revision 1.39 diff -C2 -d -r1.38 -r1.39 *** EndTagTest.java 2 Jan 2004 16:24:57 -0000 1.38 --- EndTagTest.java 2 Jul 2004 00:49:31 -0000 1.39 *************** *** 28,32 **** import org.htmlparser.PrototypicalNodeFactory; ! import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.ParserException; --- 28,32 ---- import org.htmlparser.PrototypicalNodeFactory; ! import org.htmlparser.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.ParserException; Index: DivTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/DivTagTest.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** DivTagTest.java 24 Jan 2004 23:58:07 -0000 1.2 --- DivTagTest.java 2 Jul 2004 00:49:31 -0000 1.3 *************** *** 28,35 **** import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.Div; import org.htmlparser.tags.InputTag; import org.htmlparser.tags.TableTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.ParserException; --- 28,35 ---- import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.tags.Div; import org.htmlparser.tags.InputTag; import org.htmlparser.tags.TableTag; import org.htmlparser.tests.ParserTestCase; import 
org.htmlparser.util.ParserException; Index: SpanTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/SpanTagTest.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** SpanTagTest.java 24 Jan 2004 23:58:07 -0000 1.2 --- SpanTagTest.java 2 Jul 2004 00:49:31 -0000 1.3 *************** *** 29,35 **** import org.htmlparser.Node; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.Span; import org.htmlparser.tags.TableColumn; - import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; --- 29,35 ---- import org.htmlparser.Node; import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.tags.Span; import org.htmlparser.tags.TableColumn; import org.htmlparser.tests.ParserTestCase; Index: TagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/TagTest.java,v retrieving revision 1.58 retrieving revision 1.59 diff -C2 -d -r1.58 -r1.59 *** TagTest.java 24 May 2004 16:18:33 -0000 1.58 --- TagTest.java 2 Jul 2004 00:49:31 -0000 1.59 *************** *** 28,35 **** --- 28,37 ---- import java.util.Hashtable; + import org.htmlparser.Attribute; import org.htmlparser.Node; import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.Text; import org.htmlparser.tags.BodyTag; *************** *** 37,41 **** import org.htmlparser.tags.Html; import org.htmlparser.tags.LinkTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.NodeIterator; --- 39,42 ---- *************** *** 127,131 **** createParser(lin1); NodeIterator en = parser.elements(); - Hashtable h; try { --- 128,131 ---- *************** *** 135,140 **** tag = (Tag)node; ! h = tag.getAttributes(); ! 
String classValue= (String)h.get("CLASS"); assertEquals ("The class value should be ","userData",classValue); } --- 135,139 ---- tag = (Tag)node; ! String classValue= tag.getAttribute ("CLASS"); assertEquals ("The class value should be ","userData",classValue); } *************** *** 159,163 **** createParser(lin1); NodeIterator en = parser.elements(); - Hashtable h; String a,href,myValue,nice; --- 158,161 ---- *************** *** 168,176 **** tag = (Tag)node; ! h = tag.getAttributes(); ! a = (String)h.get(SpecialHashtable.TAGNAME); ! href = (String)h.get("HREF"); ! myValue = (String)h.get("MYPARAMETER"); ! nice = (String)h.get("YOURPARAMETER"); assertEquals ("Link tag (A)","A",a); assertEquals ("href value","http://www.iki.fi/kaila",href); --- 166,173 ---- tag = (Tag)node; ! a = ((Attribute)(tag.getAttributesEx ().elementAt (0))).getName (); ! href = tag.getAttribute ("HREF"); ! myValue = tag.getAttribute ("MYPARAMETER"); ! nice = tag.getAttribute ("YOURPARAMETER"); assertEquals ("Link tag (A)","A",a); assertEquals ("href value","http://www.iki.fi/kaila",href); *************** *** 229,233 **** createParser(lin1); NodeIterator en = parser.elements(); - Hashtable h; String a,href,myValue,nice; --- 226,229 ---- *************** *** 238,246 **** tag = (Tag)node; ! h = tag.getAttributes(); ! a = (String)h.get(SpecialHashtable.TAGNAME); ! href = (String)h.get("HREF"); ! myValue = (String)h.get("MYPARAMETER"); ! nice = (String)h.get("YOURPARAMETER"); assertEquals ("The tagname should be G",a,"G"); assertEquals ("Check the http address",href,"http://www.iki.fi/kaila"); --- 234,241 ---- tag = (Tag)node; ! a = ((Attribute)(tag.getAttributesEx ().elementAt (0))).getName (); ! href = tag.getAttribute ("HREF"); ! myValue = tag.getAttribute ("MYPARAMETER"); ! nice = tag.getAttribute ("YOURPARAMETER"); assertEquals ("The tagname should be G",a,"G"); assertEquals ("Check the http address",href,"http://www.iki.fi/kaila"); *************** *** 306,312 **** tag = (Tag)node; ! 
h = tag.getAttributes(); ! a = (String)h.get(SpecialHashtable.TAGNAME); ! nice = (String)h.get("YOURPARAMETER"); assertEquals ("Link tag (A)",a,"A"); assertEquals ("yourParameter value","Kaarle",nice); --- 301,306 ---- tag = (Tag)node; ! a = ((Attribute)(tag.getAttributesEx ().elementAt (0))).getName (); ! nice = tag.getAttribute ("YOURPARAMETER"); assertEquals ("Link tag (A)",a,"A"); assertEquals ("yourParameter value","Kaarle",nice); *************** *** 363,373 **** // an alternate interpretation: assertEquals("Second tag should be corrected","font face=\"Arial,helvetica,\" sans-serif=\"sans-serif\" size=\"2\" color=\"#FFFFFF\"",fontTag.getText()); assertEquals("Second tag should be corrected","font face=\"Arial,\"helvetica,\" sans-serif=\"sans-serif\" size=\"2\" color=\"#FFFFFF\"",fontTag.getText()); ! // Try to parse the parameters from this tag. ! Hashtable table = fontTag.getAttributes(); ! assertNotNull("Parameters table",table); ! assertEquals("font sans-serif parameter","sans-serif",table.get("SANS-SERIF")); // an alternate interpretation: assertEquals("font face parameter","Arial,helvetica,",table.get("FACE")); // another: assertEquals("font face parameter","Arial,\"helvetica,",table.get("FACE")); ! assertEquals("font face parameter","Arial,",table.get("FACE")); } --- 357,364 ---- // an alternate interpretation: assertEquals("Second tag should be corrected","font face=\"Arial,helvetica,\" sans-serif=\"sans-serif\" size=\"2\" color=\"#FFFFFF\"",fontTag.getText()); assertEquals("Second tag should be corrected","font face=\"Arial,\"helvetica,\" sans-serif=\"sans-serif\" size=\"2\" color=\"#FFFFFF\"",fontTag.getText()); ! assertEquals("font sans-serif parameter","sans-serif",fontTag.getAttribute("SANS-SERIF")); // an alternate interpretation: assertEquals("font face parameter","Arial,helvetica,",table.get("FACE")); // another: assertEquals("font face parameter","Arial,\"helvetica,",table.get("FACE")); ! 
assertEquals("font face parameter","Arial,",fontTag.getAttribute("FACE")); } *************** *** 694,708 **** parser.setNodeFactory (new PrototypicalNodeFactory (true)); parseAndAssertNodeCount(1); ! // the node should be an HTMLTag ! assertTrue("Node should be a HTMLTag",node[0] instanceof Tag); Tag tag = (Tag)node[0]; assertEquals("Initial text should be","TABLE BORDER=0",tag.getText ()); ! ! Hashtable tempHash = tag.getAttributes (); ! tempHash.put ("BORDER","\"1\""); ! tag.setAttributes (tempHash); ! ! String s = tag.toHtml (); ! assertEquals("HTML should be","<TABLE BORDER=\"1\">", s); } } --- 685,694 ---- parser.setNodeFactory (new PrototypicalNodeFactory (true)); parseAndAssertNodeCount(1); ! // the node should be a Tag ! assertTrue("Node should be a Tag",node[0] instanceof Tag); Tag tag = (Tag)node[0]; assertEquals("Initial text should be","TABLE BORDER=0",tag.getText ()); ! tag.setAttribute ("BORDER","\"1\""); ! assertEquals("HTML should be","<TABLE BORDER=\"1\">", tag.toHtml ()); } } Index: FrameSetTagTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/FrameSetTagTest.java,v retrieving revision 1.36 retrieving revision 1.37 diff -C2 -d -r1.36 -r1.37 *** FrameSetTagTest.java 2 Jan 2004 16:24:57 -0000 1.36 --- FrameSetTagTest.java 2 Jul 2004 00:49:31 -0000 1.37 *************** *** 28,34 **** import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.FrameSetTag; import org.htmlparser.tags.FrameTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.ParserException; --- 28,34 ---- import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.tags.FrameSetTag; import org.htmlparser.tags.FrameTag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.ParserException; Index: JspTagTest.java 
=================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/JspTagTest.java,v retrieving revision 1.44 retrieving revision 1.45 diff -C2 -d -r1.44 -r1.45 *** JspTagTest.java 24 Jan 2004 17:14:47 -0000 1.44 --- JspTagTest.java 2 Jul 2004 00:49:31 -0000 1.45 *************** *** 29,34 **** import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.JspTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.ParserException; --- 29,34 ---- import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.tags.JspTag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.ParserException; |
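Editorial note: the TagTest changes above stop reading attributes from the deprecated Hashtable and instead call getAttribute by name, taking the tag name from the first element of the getAttributesEx list. The sketch below illustrates that access pattern on plain Java objects. SketchAttribute and SketchAttributeList are hypothetical names made up for this note, not library classes, and the case-insensitive name match is an assumption suggested by the tests, which look up mixed-case markup such as yourParameter under the name "YOURPARAMETER".

```java
// Hypothetical name/value pair; NOT org.htmlparser.Attribute.
class SketchAttribute {
    private final String name;
    private final String value;

    SketchAttribute(String name, String value) {
        this.name = name;
        this.value = value;
    }

    String getName() { return name; }
    String getValue() { return value; }
}

// Hypothetical ordered attribute list in which element 0 carries the tag name,
// mirroring how the updated tests read getAttributesEx().elementAt(0).
class SketchAttributeList {
    private final SketchAttribute[] attributes;

    SketchAttributeList(SketchAttribute... attributes) {
        // Defensive copy so later changes to the caller's array are not seen.
        this.attributes = attributes.clone();
    }

    // The tag name is the name of the first entry, or null for an empty list.
    String getTagName() {
        return attributes.length == 0 ? null : attributes[0].getName();
    }

    // Look up a value by name, skipping the tag-name entry at index 0.
    // Assumption: names are compared without regard to case.
    String getAttribute(String name) {
        for (int i = 1; i < attributes.length; i++) {
            if (attributes[i].getName().equalsIgnoreCase(name))
                return attributes[i].getValue();
        }
        return null; // no attribute with that name
    }
}
```

Compared with the Hashtable route, this keeps the attributes in source order and avoids a special key for the tag name, which appears to be the motivation for deprecating getAttributes in favour of getAttributesEx.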
From: Derrick O. <der...@us...> - 2004-07-02 00:49:40
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670/src/org/htmlparser/util Modified Files: IteratorImpl.java ParserUtils.java Removed Files: PeekingIterator.java PeekingIteratorImpl.java Log Message: Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and its implementation. --- PeekingIterator.java DELETED --- --- PeekingIteratorImpl.java DELETED --- Index: ParserUtils.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/ParserUtils.java,v retrieving revision 1.41 retrieving revision 1.42 diff -C2 -d -r1.41 -r1.42 *** ParserUtils.java 24 May 2004 16:18:35 -0000 1.41 --- ParserUtils.java 2 Jul 2004 00:49:32 -0000 1.42 *************** *** 34,37 **** --- 34,38 ---- import org.htmlparser.NodeFilter; import org.htmlparser.Parser; + import org.htmlparser.Tag; import org.htmlparser.filters.NodeClassFilter; import org.htmlparser.filters.TagNameFilter; *************** *** 40,44 **** import org.htmlparser.lexer.Source; import org.htmlparser.tags.CompositeTag; - import org.htmlparser.tags.Tag; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; --- 41,44 ---- *************** *** 722,729 **** // positions of begin and end tags ! int beginTagBegin = beginTag.getTagBegin(); ! int endTagBegin = beginTag.getTagEnd(); ! int beginTagEnd = endTag.getTagBegin(); ! 
int endTagEnd = endTag.getTagEnd(); if (insideTag) --- 722,729 ---- // positions of begin and end tags ! int beginTagBegin = beginTag.getStartPosition (); ! int endTagBegin = beginTag.getEndPosition (); ! int beginTagEnd = endTag.getStartPosition (); ! int endTagEnd = endTag.getEndPosition (); if (insideTag) *************** *** 847,854 **** // positions of begin and end tags ! int beginTagBegin = beginTag.getTagBegin(); ! int endTagBegin = beginTag.getTagEnd(); ! int beginTagEnd = endTag.getTagBegin(); ! int endTagEnd = endTag.getTagEnd(); if (insideTag) --- 847,854 ---- // positions of begin and end tags ! int beginTagBegin = beginTag.getStartPosition (); ! int endTagBegin = beginTag.getEndPosition (); ! int beginTagEnd = endTag.getStartPosition (); ! int endTagEnd = endTag.getEndPosition (); if (insideTag) *************** *** 951,958 **** // positions of begin and end tags ! int beginTagBegin = beginTag.getTagBegin(); ! int endTagBegin = beginTag.getTagEnd(); ! int beginTagEnd = endTag.getTagBegin(); ! int endTagEnd = endTag.getTagEnd(); if (insideTag) --- 951,959 ---- // positions of begin and end tags ! int beginTagBegin = beginTag.getStartPosition (); ! int endTagBegin = beginTag.getEndPosition (); ! int beginTagEnd = endTag.getStartPosition (); ! int endTagEnd = endTag.getEndPosition (); ! if (insideTag) *************** *** 1050,1057 **** // positions of begin and end tags ! int beginTagBegin = beginTag.getTagBegin(); ! int endTagBegin = beginTag.getTagEnd(); ! int beginTagEnd = endTag.getTagBegin(); ! int endTagEnd = endTag.getTagEnd(); if (insideTag) --- 1051,1058 ---- // positions of begin and end tags ! int beginTagBegin = beginTag.getStartPosition (); ! int endTagBegin = beginTag.getEndPosition (); ! int beginTagEnd = endTag.getStartPosition (); ! int endTagEnd = endTag.getEndPosition (); if (insideTag) *************** *** 1125,1136 **** CompositeTag jStartTag = (CompositeTag)links.elementAt(j); Tag jEndTag = (Tag)jStartTag.getEndTag(); ! 
int jStartTagBegin = jStartTag.getTagBegin(); ! int jEndTagEnd = jEndTag.getTagEnd(); for (int k=0; k<links.size(); k++) { CompositeTag kStartTag = (CompositeTag)links.elementAt(k); Tag kEndTag = (Tag)kStartTag.getEndTag(); ! int kStartTagBegin = kStartTag.getTagBegin(); ! int kEndTagEnd = kEndTag.getTagEnd(); if ((k!=j) && (kStartTagBegin>jStartTagBegin) && (kEndTagEnd<jEndTagEnd)) { --- 1126,1137 ---- CompositeTag jStartTag = (CompositeTag)links.elementAt(j); Tag jEndTag = (Tag)jStartTag.getEndTag(); ! int jStartTagBegin = jStartTag.getStartPosition (); ! int jEndTagEnd = jEndTag.getEndPosition (); for (int k=0; k<links.size(); k++) { CompositeTag kStartTag = (CompositeTag)links.elementAt(k); Tag kEndTag = (Tag)kStartTag.getEndTag(); ! int kStartTagBegin = kStartTag.getStartPosition (); ! int kEndTagEnd = kEndTag.getEndPosition (); if ((k!=j) && (kStartTagBegin>jStartTagBegin) && (kEndTagEnd<jEndTagEnd)) { *************** *** 1165,1176 **** CompositeTag jStartTag = (CompositeTag)links.elementAt(j); Tag jEndTag = (Tag)jStartTag.getEndTag(); ! int jStartTagBegin = jStartTag.getTagBegin(); ! int jEndTagEnd = jEndTag.getTagEnd(); for (int k=0; k<links.size(); k++) { CompositeTag kStartTag = (CompositeTag)links.elementAt(k); Tag kEndTag = (Tag)kStartTag.getEndTag(); ! int kStartTagBegin = kStartTag.getTagBegin(); ! int kEndTagEnd = kEndTag.getTagEnd(); if ((k!=j) && (kStartTagBegin>jStartTagBegin) && (kEndTagEnd<jEndTagEnd)) { --- 1166,1177 ---- CompositeTag jStartTag = (CompositeTag)links.elementAt(j); Tag jEndTag = (Tag)jStartTag.getEndTag(); ! int jStartTagBegin = jStartTag.getStartPosition (); ! int jEndTagEnd = jEndTag.getEndPosition (); for (int k=0; k<links.size(); k++) { CompositeTag kStartTag = (CompositeTag)links.elementAt(k); Tag kEndTag = (Tag)kStartTag.getEndTag(); ! int kStartTagBegin = kStartTag.getStartPosition (); ! 
int kEndTagEnd = kEndTag.getEndPosition (); if ((k!=j) && (kStartTagBegin>jStartTagBegin) && (kEndTagEnd<jEndTagEnd)) { Index: IteratorImpl.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/IteratorImpl.java,v retrieving revision 1.40 retrieving revision 1.41 diff -C2 -d -r1.40 -r1.41 *** IteratorImpl.java 10 Jan 2004 15:23:33 -0000 1.40 --- IteratorImpl.java 2 Jul 2004 00:49:32 -0000 1.41 *************** *** 28,35 **** import org.htmlparser.Node; import org.htmlparser.lexer.Cursor; import org.htmlparser.lexer.Lexer; import org.htmlparser.scanners.Scanner; - import org.htmlparser.tags.Tag; import org.htmlparser.util.NodeIterator; --- 28,35 ---- import org.htmlparser.Node; + import org.htmlparser.Tag; import org.htmlparser.lexer.Cursor; import org.htmlparser.lexer.Lexer; import org.htmlparser.scanners.Scanner; import org.htmlparser.util.NodeIterator; |
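The nested-link check that ParserUtils keeps after this change compares offsets obtained from getStartPosition() and getEndPosition() in place of the removed getTagBegin()/getTagEnd(). Below is a minimal, self-contained sketch of that comparison. The `Span` interface, the `span` factory and the `NestingSketch` class are hypothetical stand-ins so the example runs without the htmlparser jar; they are not part of the library.

```java
class NestingSketch {
    /** Hypothetical stand-in for the two position accessors used in the diff. */
    interface Span {
        int getStartPosition();
        int getEndPosition();
    }

    /** Builds a fixed Span; exists only for this illustration. */
    static Span span(final int start, final int end) {
        return new Span() {
            public int getStartPosition() { return start; }
            public int getEndPosition() { return end; }
        };
    }

    /**
     * Mirrors the condition in the diff:
     * (kStartTagBegin > jStartTagBegin) && (kEndTagEnd < jEndTagEnd),
     * where the "k" pair is the candidate inner tag and the "j" pair the outer one.
     */
    static boolean isNestedInside(Span outerStartTag, Span outerEndTag,
                                  Span innerStartTag, Span innerEndTag) {
        return innerStartTag.getStartPosition() > outerStartTag.getStartPosition()
            && innerEndTag.getEndPosition() < outerEndTag.getEndPosition();
    }

    public static void main(String[] args) {
        // Outer element occupies offsets 0..100, inner element 20..50.
        Span outerStart = span(0, 10);
        Span outerEnd = span(90, 100);
        Span innerStart = span(20, 30);
        Span innerEnd = span(40, 50);
        System.out.println(isNestedInside(outerStart, outerEnd, innerStart, innerEnd));
        System.out.println(isNestedInside(innerStart, innerEnd, outerStart, outerEnd));
    }
}
```

Code that still calls getTagBegin()/getTagEnd() would, going by the replacements shown in this diff, switch to getStartPosition()/getEndPosition() one for one.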
From: Derrick O. <der...@us...> - 2004-07-02 00:49:40
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670/src/org/htmlparser/tests/visitorsTests Modified Files: TagFindingVisitorTest.java Log Message: Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and its implementation. Index: TagFindingVisitorTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/TagFindingVisitorTest.java,v retrieving revision 1.17 retrieving revision 1.18 diff -C2 -d -r1.17 -r1.18 *** TagFindingVisitorTest.java 2 Jan 2004 16:24:57 -0000 1.17 --- TagFindingVisitorTest.java 2 Jul 2004 00:49:31 -0000 1.18 *************** *** 28,32 **** import org.htmlparser.Node; ! import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.visitors.TagFindingVisitor; --- 28,32 ---- import org.htmlparser.Node; ! import org.htmlparser.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.visitors.TagFindingVisitor; |
From: Derrick O. <der...@us...> - 2004-07-02 00:49:40
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670/src/org/htmlparser/tests/lexerTests Modified Files: AttributeTests.java TagTests.java Log Message: Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and its implementation. Index: AttributeTests.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/AttributeTests.java,v retrieving revision 1.16 retrieving revision 1.17 diff -C2 -d -r1.16 -r1.17 *** AttributeTests.java 26 Jun 2004 11:56:08 -0000 1.16 --- AttributeTests.java 2 Jul 2004 00:49:30 -0000 1.17 *************** *** 33,40 **** import org.htmlparser.Attribute; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.lexer.PageAttribute; import org.htmlparser.tags.ImageTag; import org.htmlparser.tags.LinkTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.NodeIterator; --- 33,41 ---- import org.htmlparser.Attribute; import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.lexer.PageAttribute; + import org.htmlparser.nodes.TagNode; import org.htmlparser.tags.ImageTag; import org.htmlparser.tags.LinkTag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.NodeIterator; *************** *** 51,55 **** private static final boolean JSP_TESTS_ENABLED = false; private Tag tag; ! 
private Hashtable table; public AttributeTests (String name) { --- 52,56 ---- private static final boolean JSP_TESTS_ENABLED = false; private Tag tag; ! private Vector attributes; public AttributeTests (String name) { *************** *** 67,71 **** NodeIterator iterator; Node node; - Vector attributes; html = "<" + tagContents + ">"; --- 68,71 ---- *************** *** 98,105 **** System.out.println (); } - table = tag.getAttributes (); } else ! table = null; String string = node.toHtml (); assertEquals ("toHtml differs", html, string); --- 98,104 ---- System.out.println (); } } else ! attributes = null; String string = node.toHtml (); assertEquals ("toHtml differs", html, string); *************** *** 134,138 **** // String String, String, char attributes.add (new Attribute ("name", "=", "topFrame", '"')); ! tag = new Tag (null, 0, 0, attributes); html = "<wombat label=\"The civil war.\" frameborder= no name=\"topFrame\">"; assertStringEquals ("tag contents", html, tag.toHtml ()); --- 133,137 ---- // String String, String, char attributes.add (new Attribute ("name", "=", "topFrame", '"')); ! tag = new TagNode (null, 0, 0, attributes); html = "<wombat label=\"The civil war.\" frameborder= no name=\"topFrame\">"; assertStringEquals ("tag contents", html, tag.toHtml ()); *************** *** 191,195 **** assertTrue ("should not be empty", !attribute.isEmpty ()); attributes.add (attribute); ! tag = new Tag (null, 0, 0, attributes); html = "<wombat label=\"The civil war.\" frameborder= no name=\"topFrame\">"; assertStringEquals ("tag contents", html, tag.toHtml ()); --- 190,194 ---- assertTrue ("should not be empty", !attribute.isEmpty ()); attributes.add (attribute); ! 
tag = new TagNode (null, 0, 0, attributes); html = "<wombat label=\"The civil war.\" frameborder= no name=\"topFrame\">"; assertStringEquals ("tag contents", html, tag.toHtml ()); *************** *** 218,222 **** // String String, String, char attributes.add (new PageAttribute ("name", "=", "topFrame", '"')); ! tag = new Tag (null, 0, 0, attributes); html = "<wombat label=\"The civil war.\" frameborder= no name=\"topFrame\">"; assertStringEquals ("tag contents", html, tag.toHtml ()); --- 217,221 ---- // String String, String, char attributes.add (new PageAttribute ("name", "=", "topFrame", '"')); ! tag = new TagNode (null, 0, 0, attributes); html = "<wombat label=\"The civil war.\" frameborder= no name=\"topFrame\">"; assertStringEquals ("tag contents", html, tag.toHtml ()); *************** *** 279,283 **** assertTrue ("should not be empty", !attribute.isEmpty ()); attributes.add (attribute); ! tag = new Tag (null, 0, 0, attributes); html = "<wombat label=\"The civil war.\" frameborder= no name=\"topFrame\">"; assertStringEquals ("tag contents", html, tag.toHtml ()); --- 278,282 ---- assertTrue ("should not be empty", !attribute.isEmpty ()); attributes.add (attribute); ! tag = new TagNode (null, 0, 0, attributes); html = "<wombat label=\"The civil war.\" frameborder= no name=\"topFrame\">"; assertStringEquals ("tag contents", html, tag.toHtml ()); *************** *** 289,293 **** public void testParseParameters() { getParameterTableFor("a b = \"c\""); ! assertEquals("Value","c",table.get("B")); } --- 288,292 ---- public void testParseParameters() { getParameterTableFor("a b = \"c\""); ! assertEquals("Value","c",((Attribute)(attributes.elementAt (2))).getValue ()); } *************** *** 297,301 **** public void testParseTokenValues() { getParameterTableFor("a b = \"'\""); ! assertEquals("Value","'",table.get("B")); } --- 296,300 ---- public void testParseTokenValues() { getParameterTableFor("a b = \"'\""); ! 
assertEquals("Value","'",((Attribute)(attributes.elementAt (2))).getValue ()); } *************** *** 305,309 **** public void testParseEmptyValues() { getParameterTableFor("a b = \"\""); ! assertEquals("Value","",table.get("B")); } --- 304,308 ---- public void testParseEmptyValues() { getParameterTableFor("a b = \"\""); ! assertEquals("Value","",((Attribute)(attributes.elementAt (2))).getValue ()); } *************** *** 315,320 **** public void testParseMissingEqual() { getParameterTableFor("a b\"c\""); ! assertEquals("ValueB",null,table.get("B")); ! assertTrue("NameC",table.containsKey("B\"C\"")); } --- 314,318 ---- public void testParseMissingEqual() { getParameterTableFor("a b\"c\""); ! assertEquals("NameC", "b\"c\"", ((Attribute)(attributes.elementAt (2))).getName ()); } *************** *** 324,329 **** public void testTwoParams(){ getParameterTableFor("PARAM NAME=\"Param1\" VALUE=\"Somik\""); ! assertEquals("Param1","Param1",table.get("NAME")); ! assertEquals("Somik","Somik",table.get("VALUE")); } --- 322,327 ---- public void testTwoParams(){ getParameterTableFor("PARAM NAME=\"Param1\" VALUE=\"Somik\""); ! assertEquals("Param1","Param1",((Attribute)(attributes.elementAt (2))).getValue ()); ! assertEquals("Somik","Somik",((Attribute)(attributes.elementAt (4))).getValue ()); } *************** *** 333,338 **** public void testPlainParams(){ getParameterTableFor("PARAM NAME=Param1 VALUE=Somik"); ! assertEquals("Param1","Param1",table.get("NAME")); ! assertEquals("Somik","Somik",table.get("VALUE")); } --- 331,336 ---- public void testPlainParams(){ getParameterTableFor("PARAM NAME=Param1 VALUE=Somik"); ! assertEquals("Param1","Param1",((Attribute)(attributes.elementAt (2))).getValue ()); ! assertEquals("Somik","Somik",((Attribute)(attributes.elementAt (4))).getValue ()); } *************** *** 342,350 **** public void testValueMissing() { getParameterTableFor("INPUT type=\"checkbox\" name=\"Authorize\" value=\"Y\" checked"); ! 
assertEquals("Name of Tag","INPUT",table.get(SpecialHashtable.TAGNAME)); ! assertEquals("Type","checkbox",table.get("TYPE")); ! assertEquals("Name","Authorize",table.get("NAME")); ! assertEquals("Value","Y",table.get("VALUE")); ! assertEquals("Checked",null,table.get("CHECKED")); } --- 340,348 ---- public void testValueMissing() { getParameterTableFor("INPUT type=\"checkbox\" name=\"Authorize\" value=\"Y\" checked"); ! assertEquals("Name of Tag","INPUT",((Attribute)(attributes.elementAt (0))).getName ()); ! assertEquals("Type","checkbox",((Attribute)(attributes.elementAt (2))).getValue ()); ! assertEquals("Name","Authorize",((Attribute)(attributes.elementAt (4))).getValue ()); ! assertEquals("Value","Y",((Attribute)(attributes.elementAt (6))).getValue ()); ! assertEquals("Checked",null,((Attribute)(attributes.elementAt (8))).getValue ()); } *************** *** 357,367 **** getParameterTableFor("TEXTAREA name=\"Remarks\" "); // There should only be two keys.. ! assertEquals("There should only be two keys",2,table.size()); // The first key is name ! String key1 = "NAME"; ! String value1 = (String)table.get(key1); ! assertEquals("Expected value 1", "Remarks",value1); ! String key2 = SpecialHashtable.TAGNAME; ! assertEquals("Expected Value 2","TEXTAREA",table.get(key2)); } --- 355,362 ---- getParameterTableFor("TEXTAREA name=\"Remarks\" "); // There should only be two keys.. ! assertEquals("There should only be two attributes",4,attributes.size()); // The first key is name ! assertEquals("Expected name","TEXTAREA",((Attribute)(attributes.elementAt (0))).getName ()); ! assertEquals("Expected value 1", "Remarks",((Attribute)(attributes.elementAt (2))).getValue ()); } *************** *** 371,376 **** public void testNullTag(){ getParameterTableFor("INPUT type="); ! assertEquals("Name of Tag","INPUT",table.get(SpecialHashtable.TAGNAME)); ! assertEquals("Type","",table.get("TYPE")); } --- 366,371 ---- public void testNullTag(){ getParameterTableFor("INPUT type="); ! 
assertEquals("Name of Tag","INPUT",((Attribute)(attributes.elementAt (0))).getName ()); ! assertNull("Type",((Attribute)(attributes.elementAt (2))).getValue ()); } *************** *** 385,389 **** "href", "/news/866201.asp?0sl=-32", ! (String)table.get("HREF") ); } --- 380,384 ---- "href", "/news/866201.asp?0sl=-32", ! ((Attribute)(attributes.elementAt (4))).getValue () ); } *************** *** 399,408 **** "href", "mailto:sa...@ne...?subject=Site Comments", ! (String)table.get("HREF") ); assertStringEquals( "tag name", ! "A", ! (String)table.get(SpecialHashtable.TAGNAME) ); } --- 394,403 ---- "href", "mailto:sa...@ne...?subject=Site Comments", ! ((Attribute)(attributes.elementAt (2))).getValue () ); assertStringEquals( "tag name", ! "a", ! ((Attribute)(attributes.elementAt (0))).getName () ); } *************** *** 420,424 **** public void testEmptyTag () { getParameterTableFor(""); ! assertNull ("<> is not a tag",table); } --- 415,419 ---- public void testEmptyTag () { getParameterTableFor(""); ! assertNull ("<> is not a tag",attributes); } *************** *** 437,441 **** "href", "<%=Application(\"sURL\")%>/literature/index.htm", ! (String)table.get("HREF") ); } --- 432,436 ---- "href", "<%=Application(\"sURL\")%>/literature/index.htm", ! ((Attribute)(attributes.elementAt (2))).getValue () ); } *************** *** 448,455 **** public void testScriptedTag () { getParameterTableFor("body onLoad=defaultStatus=''"); ! String name = (String)table.get(SpecialHashtable.TAGNAME); assertNotNull ("No Tag.TAGNAME", name); ! assertStringEquals("tag name parsed incorrectly", "BODY", name); ! String value = (String)table.get ("ONLOAD"); assertStringEquals ("parameter parsed incorrectly", "defaultStatus=''", value); } --- 443,450 ---- public void testScriptedTag () { getParameterTableFor("body onLoad=defaultStatus=''"); ! String name = ((Attribute)(attributes.elementAt (0))).getName (); assertNotNull ("No Tag.TAGNAME", name); ! 
assertStringEquals("tag name parsed incorrectly", "body", name); ! String value = ((Attribute)(attributes.elementAt (2))).getValue (); assertStringEquals ("parameter parsed incorrectly", "defaultStatus=''", value); } *************** *** 463,468 **** { getParameterTableFor ("INPUT DISABLED"); ! assertTrue ("Standalone attribute has no entry in table keyset",table.containsKey("DISABLED")); ! assertNull ("Standalone attribute has non-null value",(String)table.get("DISABLED")); } --- 458,463 ---- { getParameterTableFor ("INPUT DISABLED"); ! assertStringEquals("Standalone attribute not parsed","DISABLED",((Attribute)(attributes.elementAt (2))).getName ()); ! assertNull ("Standalone attribute has non-null value",((Attribute)(attributes.elementAt (2))).getValue ()); } *************** *** 473,478 **** { getParameterTableFor ("INPUT DISABLED="); ! assertTrue ("Attribute has no entry in table keyset",table.containsKey("DISABLED")); ! assertEquals ("Attribute has non-blank value","",(String)table.get("DISABLED")); } --- 468,473 ---- { getParameterTableFor ("INPUT DISABLED="); ! assertStringEquals("Empty attribute has no attribute","DISABLED",((Attribute)(attributes.elementAt (2))).getName ()); ! assertEquals ("Attribute has non-blank value",null,((Attribute)(attributes.elementAt (2))).getValue ()); } *************** *** 484,490 **** { getParameterTableFor ("tag att = other=fred"); ! assertTrue ("Attribute missing", table.containsKey ("ATT")); ! assertEquals ("Attribute has wrong value", "other=fred", (String)table.get ("ATT")); ! assertTrue ("No attribute should be called equal sign", !table.containsKey ("=")); } --- 479,486 ---- { getParameterTableFor ("tag att = other=fred"); ! assertStringEquals("Attribute not parsed","att",((Attribute)(attributes.elementAt (2))).getName ()); ! assertEquals ("Attribute has wrong value", "other=fred", ((Attribute)(attributes.elementAt (2))).getValue ()); ! for (int i = 0; i < attributes.size (); i++) ! 
assertTrue ("No attribute should be called =", !((Attribute)(attributes.elementAt (2))).getName ().equals ("=")); } *************** *** 495,503 **** { getParameterTableFor ("tag att =value other=fred"); ! assertTrue ("Attribute missing", table.containsKey ("ATT")); ! assertEquals ("Attribute has wrong value", "value", (String)table.get ("ATT")); ! assertTrue ("No attribute should be called =value", !table.containsKey ("=VALUE")); ! assertTrue ("Attribute missing", table.containsKey ("OTHER")); ! assertEquals ("Attribute has wrong value", "fred", (String)table.get ("OTHER")); } --- 491,500 ---- { getParameterTableFor ("tag att =value other=fred"); ! assertStringEquals("Attribute not parsed","att",((Attribute)(attributes.elementAt (2))).getName ()); ! assertEquals ("Attribute has wrong value", "value", ((Attribute)(attributes.elementAt (2))).getValue ()); ! for (int i = 0; i < attributes.size (); i++) ! assertTrue ("No attribute should be called =value", !((Attribute)(attributes.elementAt (2))).getName ().equals ("=value")); ! assertStringEquals("Empty attribute not parsed","other",((Attribute)(attributes.elementAt (4))).getName ()); ! assertEquals ("Attribute has wrong value", "fred", ((Attribute)(attributes.elementAt (4))).getValue ()); } *************** *** 508,516 **** { getParameterTableFor ("tag att= \"value\" other=fred"); ! assertTrue ("Attribute missing", table.containsKey ("ATT")); ! assertEquals ("Attribute has wrong value", "value", (String)table.get ("ATT")); ! assertTrue ("No attribute should be called \"value\"", !table.containsKey ("\"VALUE\"")); ! assertTrue ("Attribute missing", table.containsKey ("OTHER")); ! assertEquals ("Attribute has wrong value", "fred", (String)table.get ("OTHER")); } --- 505,514 ---- { getParameterTableFor ("tag att= \"value\" other=fred"); ! assertStringEquals("Attribute not parsed","att",((Attribute)(attributes.elementAt (2))).getName ()); ! 
assertEquals ("Attribute has wrong value", "value", ((Attribute)(attributes.elementAt (2))).getValue ()); ! for (int i = 0; i < attributes.size (); i++) ! assertTrue ("No attribute should be called \"value\"", !((Attribute)(attributes.elementAt (2))).getName ().equals ("\"value\"")); ! assertStringEquals("Empty attribute not parsed","other",((Attribute)(attributes.elementAt (4))).getName ()); ! assertEquals ("Attribute has wrong value", "fred", ((Attribute)(attributes.elementAt (4))).getValue ()); } *************** *** 522,532 **** { getParameterTableFor ("tag att=\"va\"lue\" other=fred"); ! assertTrue ("Attribute missing", table.containsKey ("ATT")); ! assertEquals ("Attribute has wrong value", "va", (String)table.get ("ATT")); ! assertTrue ("No attribute should be called va\"lue", !table.containsKey ("VA\"LUE")); ! assertTrue ("Attribute missing", table.containsKey ("LUE\"")); ! assertNull ("Attribute has wrong value", table.get ("LUE\"")); ! assertTrue ("Attribute missing", table.containsKey ("OTHER")); ! assertEquals ("Attribute has wrong value", "fred", (String)table.get ("OTHER")); } --- 520,532 ---- { getParameterTableFor ("tag att=\"va\"lue\" other=fred"); ! assertStringEquals("Attribute not parsed","att",((Attribute)(attributes.elementAt (2))).getName ()); ! assertEquals ("Attribute has wrong value", "va", ((Attribute)(attributes.elementAt (2))).getValue ()); ! for (int i = 0; i < attributes.size (); i++) ! assertTrue ("No attribute should be called va\"lue", !((Attribute)(attributes.elementAt (2))).getName ().equals ("va\"lue")); ! assertStringEquals("Attribute missing","att",((Attribute)(attributes.elementAt (2))).getName ()); ! assertStringEquals("Attribute not parsed","lue\"",((Attribute)(attributes.elementAt (3))).getName ()); ! assertNull ("Attribute has wrong value", ((Attribute)(attributes.elementAt (3))).getValue ()); ! assertStringEquals("Empty attribute not parsed","other",((Attribute)(attributes.elementAt (5))).getName ()); ! 
assertEquals ("Attribute has wrong value", "fred", ((Attribute)(attributes.elementAt (5))).getValue ()); } *************** *** 538,548 **** { getParameterTableFor ("tag att='va'lue' other=fred"); ! assertTrue ("Attribute missing", table.containsKey ("ATT")); ! assertEquals ("Attribute has wrong value", "va", (String)table.get ("ATT")); ! assertTrue ("No attribute should be called va'lue", !table.containsKey ("VA'LUE")); ! assertTrue ("Attribute missing", table.containsKey ("LUE'")); ! assertNull ("Attribute has wrong value", table.get ("LUE'")); ! assertTrue ("Attribute missing", table.containsKey ("OTHER")); ! assertEquals ("Attribute has wrong value", "fred", (String)table.get ("OTHER")); } --- 538,549 ---- { getParameterTableFor ("tag att='va'lue' other=fred"); ! assertStringEquals("Attribute not parsed","att",((Attribute)(attributes.elementAt (2))).getName ()); ! assertEquals ("Attribute has wrong value", "va", ((Attribute)(attributes.elementAt (2))).getValue ()); ! for (int i = 0; i < attributes.size (); i++) ! assertTrue ("No attribute should be called va'lue", !((Attribute)(attributes.elementAt (2))).getName ().equals ("va'lue")); ! assertStringEquals("Attribute not parsed","lue'",((Attribute)(attributes.elementAt (3))).getName ()); ! assertNull ("Attribute has wrong value", ((Attribute)(attributes.elementAt (3))).getValue ()); ! assertStringEquals("Empty attribute not parsed","other",((Attribute)(attributes.elementAt (5))).getName ()); ! 
assertEquals ("Attribute has wrong value", "fred", ((Attribute)(attributes.elementAt (5))).getValue ()); } Index: TagTests.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/TagTests.java,v retrieving revision 1.10 retrieving revision 1.11 diff -C2 -d -r1.10 -r1.11 *** TagTests.java 16 Jun 2004 02:17:26 -0000 1.10 --- TagTests.java 2 Jul 2004 00:49:30 -0000 1.11 *************** *** 32,38 **** import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; import org.htmlparser.tags.LinkTag; import org.htmlparser.tags.MetaTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.ParserException; --- 32,38 ---- import org.htmlparser.Parser; import org.htmlparser.PrototypicalNodeFactory; + import org.htmlparser.Tag; import org.htmlparser.tags.LinkTag; import org.htmlparser.tags.MetaTag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.ParserException; |
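The rewritten AttributeTests read attributes by position in a Vector rather than by key in a Hashtable, and the indexes used (0 for the tag name, then 2, 4, ... for most attributes) suggest that whitespace between attributes occupies its own entries. The "for (int i ...)" loops added in this diff also call elementAt (2) on every pass, so they appear to re-check a single entry instead of scanning the list. The following is a rough sketch of a name-based lookup and a full scan. `Attr`, `find`, `noneNamed` and `sample` are hypothetical stand-ins written for this note, and representing whitespace as an entry with a null name is an assumption, not something the diff confirms.

```java
import java.util.Vector;

class AttributeScanSketch {
    /** Hypothetical stand-in for org.htmlparser.Attribute: just a name and a value. */
    static final class Attr {
        private final String name;
        private final String value;
        Attr(String name, String value) { this.name = name; this.value = value; }
        String getName() { return name; }
        String getValue() { return value; }
    }

    /** Returns the first entry whose name matches, ignoring case, or null if none does. */
    static Attr find(Vector<Attr> attributes, String name) {
        for (int i = 0; i < attributes.size(); i++) {
            Attr candidate = attributes.elementAt(i);
            // Assumption: whitespace entries carry a null name, so they never match.
            if (candidate.getName() != null && candidate.getName().equalsIgnoreCase(name))
                return candidate;
        }
        return null;
    }

    /** True if no entry is named exactly 'forbidden'; uses the loop index on every pass. */
    static boolean noneNamed(Vector<Attr> attributes, String forbidden) {
        for (int i = 0; i < attributes.size(); i++) {
            if (forbidden.equals(attributes.elementAt(i).getName()))
                return false;
        }
        return true;
    }

    /** Builds the layout the tests imply for the input: tag att =value other=fred */
    static Vector<Attr> sample() {
        Vector<Attr> attributes = new Vector<Attr>();
        attributes.add(new Attr("tag", null));     // index 0: tag name
        attributes.add(new Attr(null, " "));       // index 1: whitespace (assumed representation)
        attributes.add(new Attr("att", "value"));  // index 2
        attributes.add(new Attr(null, " "));       // index 3: whitespace (assumed representation)
        attributes.add(new Attr("other", "fred")); // index 4
        return attributes;
    }

    public static void main(String[] args) {
        Vector<Attr> attributes = sample();
        System.out.println(find(attributes, "OTHER").getValue());
        System.out.println(noneNamed(attributes, "=value"));
    }
}
```

Looking entries up by name keeps a test independent of how many whitespace entries precede an attribute, at the cost of no longer asserting the exact layout that the positional checks pin down.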
From: Derrick O. <der...@us...> - 2004-07-02 00:49:40
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32670/src/org/htmlparser/tests/utilTests Modified Files: CharacterTranslationTest.java Log Message: Part four of a multiphase refactoring. Most internals now use the Tag interface. This interface has been broadened to add set/get scanner and set/get endtag. Removed the org.htmlparser.tags.Tag class and moved the remaining (minor) functionality to the TagNode class. So now tags inherit directly from TagNode or CompositeTag. ** NOTE: If you have subclassed org.htmlparser.tags.Tag, use org.htmlparser.nodes.TagNode now.** Removed deprecated methods getTagBegin/getTagEnd and deleted unused classes: PeekingIterator and its implementation. Index: CharacterTranslationTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests/CharacterTranslationTest.java,v retrieving revision 1.43 retrieving revision 1.44 diff -C2 -d -r1.43 -r1.44 *** CharacterTranslationTest.java 24 May 2004 16:18:34 -0000 1.43 --- CharacterTranslationTest.java 2 Jul 2004 00:49:31 -0000 1.44 *************** *** 46,52 **** import org.htmlparser.Parser; import org.htmlparser.Remark; import org.htmlparser.Text; import org.htmlparser.tags.LinkTag; - import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.CharacterReference; --- 46,52 ---- import org.htmlparser.Parser; import org.htmlparser.Remark; + import org.htmlparser.Tag; import org.htmlparser.Text; import org.htmlparser.tags.LinkTag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.CharacterReference; |