htmlparser-cvs Mailing List for HTML Parser (Page 47)
Brought to you by:
derrickoswald
From: <der...@us...> - 2003-08-23 17:24:47
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1:/tmp/cvs-serv20167
Modified Files:
AbstractNode.java Node.java RemarkNode.java StringNode.java
Log Message:
Sixth drop for new i/o subsystem.
Isolated htmllexer.jar file and made it compileable and runnable on JDK 1.1 systems.
The build.xml file now has four new targets for separate compiling and jaring of the lexer and parser.
Significantly refactored the existing Node interface and AbstractNode class to achieve isolation.
They now support get/setChildren(), rather than CompositeTag.
Various scanners that were directly accessing the childTags node list were affected.
The get/setParent is now a generic Node rather than a CompositeTag.
The visitor accept() signature was changed to Object to avoid dragging in visitors code.
This was *not* changed on classes derived from Tag, although it could be.
ChainedException now uses/returns a Vector.
Removed the cruft from lexer nodes where possible.
Index: AbstractNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/AbstractNode.java,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** AbstractNode.java 11 Aug 2003 00:18:28 -0000 1.7
--- AbstractNode.java 23 Aug 2003 17:14:44 -0000 1.8
***************
*** 29,37 ****
package org.htmlparser;
! import java.io.*;
! import org.htmlparser.tags.*;
! import org.htmlparser.util.*;
! import org.htmlparser.visitors.*;
/**
--- 29,35 ----
package org.htmlparser;
! import java.io.Serializable;
! import org.htmlparser.util.NodeList;
/**
***************
*** 50,60 ****
/**
! * If parent of this tag
*/
! protected CompositeTag parent = null;
!
! public AbstractNode(int nodeBegin, int nodeEnd) {
! this.nodeBegin = nodeBegin;
! this.nodeEnd = nodeEnd;
}
--- 48,70 ----
/**
! * The parent of this node.
*/
! protected Node parent;
!
! /**
! * The children of this node.
! */
! protected NodeList children;
!
! /**
! * Create an abstract node with the page positions given.
! * @param begin The starting position of the node.
! * @param end The ending position of the node.
! */
! public AbstractNode (int begin, int end)
! {
! nodeBegin = begin;
! nodeEnd = end;
! parent = null;
}
***************
*** 166,170 ****
}
! public abstract void accept(NodeVisitor visitor);
/**
--- 176,180 ----
}
! public abstract void accept(Object visitor);
/**
***************
*** 176,195 ****
/**
! * Get the parent of this tag
* @return The parent of this node, if it's been set, <code>null</code> otherwise.
*/
! public CompositeTag getParent() {
! return parent;
}
! /**
! * Sets the parent of this tag
! * @param tag
*/
! public void setParent(CompositeTag tag) {
! parent = tag;
}
! /**
* Returns the text of the string line
*/
--- 186,228 ----
/**
! * Get the parent of this node.
! * This will always return null when parsing without scanners,
! * i.e. if semantic parsing was not performed.
! * The object returned from this method can be safely cast to a <code>CompositeTag</code>.
* @return The parent of this node, if it's been set, <code>null</code> otherwise.
*/
! public Node getParent ()
! {
! return (parent);
}
! /**
! * Sets the parent of this node.
! * @param node The node that contains this node. Must be a <code>CompositeTag</code>.
*/
! public void setParent (Node node)
! {
! parent = node;
}
! /**
! * Get the children of this node.
! * @return The list of children contained by this node, if it's been set, <code>null</code> otherwise.
! */
! public NodeList getChildren ()
! {
! return (children);
! }
!
! /**
! * Set the children of this node.
! * @param children The new list of children this node contains.
! */
! public void setChildren (NodeList children)
! {
! this.children = children;
! }
!
! /**
* Returns the text of the string line
*/
Index: Node.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Node.java,v
retrieving revision 1.33
retrieving revision 1.34
diff -C2 -d -r1.33 -r1.34
*** Node.java 11 Aug 2003 00:18:28 -0000 1.33
--- Node.java 23 Aug 2003 17:14:44 -0000 1.34
***************
*** 31,37 ****
package org.htmlparser;
- import org.htmlparser.tags.CompositeTag;
import org.htmlparser.util.NodeList;
- import org.htmlparser.visitors.NodeVisitor;
public interface Node {
--- 31,35 ----
***************
*** 128,145 ****
*/
public abstract int elementEnd();
! public abstract void accept(NodeVisitor visitor);
! /**
! * Get the parent of this tag
* @return The parent of this node, if it's been set, <code>null</code> otherwise.
*/
! public abstract CompositeTag getParent();
! /**
! * Sets the parent of this tag
! * @param tag
*/
! public abstract void setParent(CompositeTag tag);
! /**
* Returns the text of the string line
--- 126,159 ----
*/
public abstract int elementEnd();
!
! public abstract void accept(Object visitor);
/**
! * Get the parent of this node.
! * This will always return null when parsing without scanners,
! * i.e. if semantic parsing was not performed.
! * The object returned from this method can be safely cast to a <code>CompositeTag</code>.
* @return The parent of this node, if it's been set, <code>null</code> otherwise.
*/
! public abstract Node getParent ();
!
! /**
! * Sets the parent of this node.
! * @param node The node that contains this node. Must be a <code>CompositeTag</code>.
*/
! public abstract void setParent (Node node);
!
! /**
! * Get the children of this node.
! * @return The list of children contained by this node, if it's been set, <code>null</code> otherwise.
! */
! public abstract NodeList getChildren ();
!
! /**
! * Set the children of this node.
! * @param children The new list of children this node contains.
! */
! public abstract void setChildren (NodeList children);
! /**
* Returns the text of the string line
Index: RemarkNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/RemarkNode.java,v
retrieving revision 1.24
retrieving revision 1.25
diff -C2 -d -r1.24 -r1.25
*** RemarkNode.java 11 Aug 2003 00:18:28 -0000 1.24
--- RemarkNode.java 23 Aug 2003 17:14:44 -0000 1.25
***************
*** 83,88 ****
}
! public void accept(NodeVisitor visitor) {
! visitor.visitRemarkNode(this);
}
--- 83,88 ----
}
! public void accept(Object visitor) {
! ((NodeVisitor)visitor).visitRemarkNode(this);
}
Index: StringNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/StringNode.java,v
retrieving revision 1.32
retrieving revision 1.33
diff -C2 -d -r1.32 -r1.33
*** StringNode.java 11 Aug 2003 00:18:28 -0000 1.32
--- StringNode.java 23 Aug 2003 17:14:44 -0000 1.33
***************
*** 88,93 ****
}
! public void accept(NodeVisitor visitor) {
! visitor.visitStringNode(this);
}
}
--- 88,93 ----
}
! public void accept(Object visitor) {
! ((NodeVisitor)visitor).visitStringNode(this);
}
}
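According to the log message, `accept()` now takes an `Object` so that the node classes no longer drag in the visitors code. The `RemarkNode` and `StringNode` diffs show the cost of this: a cast back to `NodeVisitor` inside `accept()`. The sketch below illustrates that pattern using hypothetical, simplified types. `Node`, `NodeVisitor`, `StringNode` and `TextCollector` here are stand-ins and not the htmlparser classes.

```java
// A minimal sketch with hypothetical, simplified types (not the htmlparser classes).
public class VisitorDecouplingSketch {

    /** Node interface that does not mention any visitor type. */
    interface Node {
        void accept(Object visitor);
    }

    /** Visitor interface; in the real project visitors live in a separate package. */
    interface NodeVisitor {
        void visitStringNode(StringNode node);
    }

    /** Text node: only accept() knows about NodeVisitor, and only through a cast. */
    static class StringNode implements Node {
        private final String text;

        StringNode(String text) {
            this.text = text;
        }

        String getText() {
            return text;
        }

        public void accept(Object visitor) {
            // The cast moves type checking to run time: passing something
            // that is not a NodeVisitor throws ClassCastException.
            ((NodeVisitor) visitor).visitStringNode(this);
        }
    }

    /** Example visitor that concatenates the text of every node it visits. */
    static class TextCollector implements NodeVisitor {
        private final StringBuilder collected = new StringBuilder();

        public void visitStringNode(StringNode node) {
            collected.append(node.getText());
        }

        String result() {
            return collected.toString();
        }
    }

    /** Run a TextCollector over the nodes and return the concatenated text. */
    static String collect(Node[] nodes) {
        TextCollector collector = new TextCollector();
        for (Node node : nodes) {
            node.accept(collector);
        }
        return collector.result();
    }

    public static void main(String[] args) {
        Node[] nodes = { new StringNode("Hello, "), new StringNode("world") };
        System.out.println(collect(nodes)); // prints: Hello, world
    }
}
```

This trades compile-time checking for packaging independence. The same reasoning applies to `getParent()` returning a plain `Node` that callers cast to `CompositeTag` when they need it.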
From: <der...@us...> - 2003-08-23 01:33:10
Update of /cvsroot/htmlparser/htmlparser/src
In directory sc8-pr-cvs1:/tmp/cvs-serv23027/src
Removed Files:
ExceptionMessages_en_US.properties
ExceptionMessages_ja_JP.properties Manifest.mf
Log Message:
Fifth drop for new i/o subsystem.
There is now a mainline for the lexer.
Try:
java -jar lexer.jar http://whatever
or the integration build has a new lexer execution script:
bin/lexer http://whatever
--- ExceptionMessages_en_US.properties DELETED ---
--- ExceptionMessages_ja_JP.properties DELETED ---
--- Manifest.mf DELETED ---
From: <der...@us...> - 2003-08-23 01:33:09
Update of /cvsroot/htmlparser/htmlparser/resources
In directory sc8-pr-cvs1:/tmp/cvs-serv23027/resources
Added Files:
lexer runLexer.bat
Removed Files:
Manifest.mf
Log Message:
Fifth drop for new i/o subsystem.
There is now a mainline for the lexer.
Try:
java -jar lexer.jar http://whatever
or the integration build has a new lexer execution script:
bin/lexer http://whatever
--- NEW FILE: lexer ---
#! /bin/sh
if [ -z "$HTMLPARSER_HOME" ] ; then
## resolve links - $0 may be a link to the home
PRG="$0"
progname=`basename "$0"`
saveddir=`pwd`
# need this for relative symlinks
dirname_prg=`dirname "$PRG"`
cd "$dirname_prg"
while [ -h "$PRG" ] ; do
ls=`ls -ld "$PRG"`
link=`expr "$ls" : '.*-> \(.*\)$'`
if expr "$link" : '/.*' > /dev/null; then
PRG="$link"
else
PRG=`dirname "$PRG"`"/$link"
fi
done
HTMLPARSER_HOME=`dirname "$PRG"`/..
cd "$saveddir"
# make it fully qualified
HTMLPARSER_HOME=`cd "$HTMLPARSER_HOME" && pwd`
fi
if [ -z "$JAVACMD" ] ; then
if [ -n "$JAVA_HOME" ] ; then
if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
# IBM's JDK on AIX uses strange locations for the executables
JAVACMD="$JAVA_HOME/jre/sh/java"
else
JAVACMD="$JAVA_HOME/bin/java"
fi
else
JAVACMD=`which java 2> /dev/null `
if [ -z "$JAVACMD" ] ; then
JAVACMD=java
fi
fi
fi
if [ ! -x "$JAVACMD" ] ; then
echo "Error: JAVA_HOME is not defined correctly."
echo " We cannot execute $JAVACMD"
exit 1
fi
if [ -n "$CLASSPATH" ] ; then
LOCALCLASSPATH="$CLASSPATH"
fi
HTMLPARSER_LIB="${HTMLPARSER_HOME}/lib"
# add in the lexer .jar file
if [ -z "$LOCALCLASSPATH" ] ; then
LOCALCLASSPATH="${HTMLPARSER_LIB}/htmllexer.jar"
else
LOCALCLASSPATH="${HTMLPARSER_LIB}/htmllexer.jar":"$LOCALCLASSPATH"
fi
# handle 1.1x JDKs
if [ -n "$JAVA_HOME" ] ; then
if [ -f "$JAVA_HOME/lib/classes.zip" ] ; then
LOCALCLASSPATH="$LOCALCLASSPATH:$JAVA_HOME/lib/classes.zip"
fi
fi
"$JAVACMD" -classpath "$LOCALCLASSPATH" org.htmlparser.lexer.Lexer "$@"
--- NEW FILE: runLexer.bat ---
java -jar ..\lib\htmlparser.jar org.htmlparser.lexer.Lexer %1 %2
--- Manifest.mf DELETED ---
From: <der...@us...> - 2003-08-23 01:33:09
Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1:/tmp/cvs-serv23027
Modified Files:
build.xml
Log Message:
Fifth drop for new i/o subsystem.
There is now a mainline for the lexer.
Try:
java -jar lexer.jar http://whatever
or the integration build has a new lexer execution script:
bin/lexer http://whatever
Index: build.xml
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** build.xml 11 Aug 2003 03:53:31 -0000 1.41
--- build.xml 23 Aug 2003 01:33:06 -0000 1.42
***************
*** 2,6 ****
Build Procedure
- cd htmlparser
! - 'ant jar' generates new htmlparser.jar in htmlparser/release/htmlparser1_4/lib
Release Procedure
--- 2,6 ----
Build Procedure
- cd htmlparser
! - 'ant jars' generates new htmlparser.jar and htmllexer.jar in htmlparser/release/htmlparser1_4/lib
Release Procedure
***************
*** 197,219 ****
<!-- Compile the java code in ${src} -->
! <javac srcdir="${src}" includes="org/htmlparser/**" excludes="org/htmlparser/tests/**,org/htmlparser/util/Generate.java,org/htmlparser/lexer/**" debug="on" classpath="src:${commons-logging.jar}" />
</target>
! <!-- Create the distribution of htmlparser.jar -->
! <target name="jar" depends="compile" description="create htmlparser.jar">
<echo message="**********************************"/>
<echo message="* Creating htmlparser.jar.... *"/>
<echo message="**********************************"/>
- <!-- Create the distribution directory -->
- <mkdir dir="${dist}/lib"/>
-
<!-- Put classes and images into the htmlparser.jar file -->
<jar jarfile="${dist}/lib/htmlparser.jar"
basedir="${src}"
includes="**/*.class **/*.gif"
! excludes="org/htmlparser/tests/**/*.class,org/htmlparser/util/Generate.class"
! manifest="${resources}/Manifest.mf">
<manifest>
<section name="org/htmlparser/Parser.class">
<attribute name="Java-Bean" value="True"/>
--- 197,219 ----
<!-- Compile the java code in ${src} -->
! <javac srcdir="${src}" includes="org/htmlparser/**" excludes="org/htmlparser/tests/**,org/htmlparser/util/Generate.java" debug="on" classpath="src:${commons-logging.jar}" />
</target>
! <!-- Create the distribution of htmlparser.jar and htmllexer.jar -->
! <target name="jars" depends="compile" description="create htmlparser.jar and htmllexer.jar">
! <!-- Create the distribution directory -->
! <mkdir dir="${dist}/lib"/>
!
<echo message="**********************************"/>
<echo message="* Creating htmlparser.jar.... *"/>
<echo message="**********************************"/>
<!-- Put classes and images into the htmlparser.jar file -->
<jar jarfile="${dist}/lib/htmlparser.jar"
basedir="${src}"
includes="**/*.class **/*.gif"
! excludes="org/htmlparser/tests/**/*.class,org/htmlparser/util/Generate.class">
<manifest>
+ <attribute name="Main-Class" value="org.htmlparser.Parser"/>
<section name="org/htmlparser/Parser.class">
<attribute name="Java-Bean" value="True"/>
***************
*** 233,240 ****
</manifest>
</jar>
</target>
<!-- Run the unit tests -->
! <target name="test" depends="jar" description="run the JUnit tests">
<echo message="**********************************"/>
<echo message="* Running unit tests.... *"/>
--- 233,261 ----
</manifest>
</jar>
+
+ <echo message="**********************************"/>
+ <echo message="* Creating htmllexer.jar.... *"/>
+ <echo message="**********************************"/>
+
+ <!-- Put classes and images into the htmllexer.jar file -->
+ <jar jarfile="${dist}/lib/htmllexer.jar"
+ basedir="${src}">
+ <include name="org/htmlparser/lexer/**/*.class"/>
+ <include name="org/htmlparser/AbstractNode.class"/>
+ <include name="org/htmlparser/Node.class"/>
+ <include name="org/htmlparser/util/ParserException.class"/>
+ <include name="org/htmlparser/util/ChainedException.class"/>
+ <include name="org/htmlparser/util/sort/**/*.class"/>
+ <!-- to be removed -->
+ <include name="org/htmlparser/parserHelper/SpecialHashtable.class"/>
+ <manifest>
+ <attribute name="Main-Class" value="org.htmlparser.lexer.Lexer"/>
+ </manifest>
+ </jar>
+
</target>
<!-- Run the unit tests -->
! <target name="test" depends="jars" description="run the JUnit tests">
<echo message="**********************************"/>
<echo message="* Running unit tests.... *"/>
***************
*** 331,335 ****
<!-- The release directory structuring finishes here -->
! <target name="Release" depends="versionSource,jar,javadoc,CopyBatch" description="prepare the release files">
</target>
--- 352,356 ----
<!-- The release directory structuring finishes here -->
! <target name="Release" depends="versionSource,jars,javadoc,CopyBatch" description="prepare the release files">
</target>
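The new `jars` target writes a `Main-Class` attribute into each jar's manifest: `org.htmlparser.Parser` for `htmlparser.jar` and `org.htmlparser.lexer.Lexer` for `htmllexer.jar`. `java -jar` starts the class named by that attribute and passes any remaining command-line words to its `main` as arguments. A class name written after the jar file is therefore treated as an ordinary argument, not as the class to run. The sketch below reads the attribute back with the standard `java.util.jar.Manifest` API. The manifest text is an assumed approximation of what Ant writes, not output captured from this build.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class MainClassSketch {

    /** Parse manifest text and return its Main-Class value, or null if there is none. */
    static String mainClassOf(String manifestText) throws IOException {
        Manifest manifest = new Manifest(
                new ByteArrayInputStream(manifestText.getBytes("UTF-8")));
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws IOException {
        // Assumed text approximating the manifest the "jars" target writes
        // into htmllexer.jar. Every manifest line must end with a line break.
        String lexerManifest = "Manifest-Version: 1.0\r\n"
                + "Main-Class: org.htmlparser.lexer.Lexer\r\n"
                + "\r\n";
        System.out.println(mainClassOf(lexerManifest)); // prints: org.htmlparser.lexer.Lexer
    }
}
```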
From: <der...@us...> - 2003-08-23 01:33:09
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1:/tmp/cvs-serv23027/src/org/htmlparser/lexer
Modified Files:
Lexer.java Page.java
Log Message:
Fifth drop for new i/o subsystem.
There is now a mainline for the lexer.
Try:
java -jar lexer.jar http://whatever
or the integration build has a new lexer execution script:
bin/lexer http://whatever
Index: Lexer.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Lexer.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** Lexer.java 21 Aug 2003 01:52:23 -0000 1.2
--- Lexer.java 23 Aug 2003 01:33:06 -0000 1.3
***************
*** 26,33 ****
--- 26,39 ----
// CA 94708, USA
// Website : http://www.industriallogic.com
+ //
+ // This class was contributed by
+ // Derrick Oswald
+ //
package org.htmlparser.lexer;
+ import java.io.IOException;
import java.io.UnsupportedEncodingException;
+ import java.net.URL;
import java.net.URLConnection;
import java.util.Vector;
***************
*** 546,548 ****
--- 552,580 ----
}
+ /**
+ * Mainline for command line operation
+ */
+ public static void main (String[] args) throws IOException, ParserException
+ {
+ URL url;
+ Lexer lexer;
+ Node node;
+
+ if (0 >= args.length)
+ System.out.println ("usage: java -jar htmllexer.jar <url>");
+ else
+ {
+ url = new URL (args[0]);
+ try
+ {
+ lexer = new Lexer (url.openConnection ());
+ while (null != (node = lexer.nextNode ()))
+ System.out.println (node.toString ());
+ }
+ catch (ParserException pe)
+ {
+ System.out.println (pe.getMessage ());
+ }
+ }
+ }
}
Index: Page.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** Page.java 21 Aug 2003 01:52:23 -0000 1.6
--- Page.java 23 Aug 2003 01:33:06 -0000 1.7
***************
*** 26,29 ****
--- 26,33 ----
// CA 94708, USA
// Website : http://www.industriallogic.com
+ //
+ // This class was contributed by
+ // Derrick Oswald
+ //
package org.htmlparser.lexer;
***************
*** 39,45 ****
import java.net.URLConnection;
import java.net.UnknownHostException;
!
! import org.apache.commons.logging.Log;
! import org.apache.commons.logging.LogFactory;
import org.htmlparser.util.ParserException;
--- 43,47 ----
import java.net.URLConnection;
import java.net.UnknownHostException;
! import java.util.Random;
import org.htmlparser.util.ParserException;
***************
*** 61,69 ****
/**
- * The logging object.
- */
- protected static Log mLog = null;
-
- /**
* The source of characters.
*/
--- 63,66 ----
***************
*** 113,117 ****
catch (UnknownHostException uhe)
{
! throw new ParserException ("the host (" + connection.getURL ().getHost () + ") was not found", uhe);
}
catch (IOException ioe)
--- 110,116 ----
catch (UnknownHostException uhe)
{
! Random number = new Random ();
! int message = number.nextInt (mFourOhFour.length);
! throw new ParserException (mFourOhFour[message], uhe);
}
catch (IOException ioe)
***************
*** 348,352 ****
if (!ret.equalsIgnoreCase (content))
{
! getLog ().info (
"detected charset \""
+ content
--- 347,351 ----
if (!ret.equalsIgnoreCase (content))
{
! System.out.println (
"detected charset \""
+ content
***************
*** 408,417 ****
// return the default
ret = _default;
! getLog ().debug (
"unable to determine cannonical charset name for "
+ name
+ " - using "
! + _default,
! ita);
}
--- 407,415 ----
// return the default
ret = _default;
! System.out.println (
"unable to determine cannonical charset name for "
+ name
+ " - using "
! + _default);
}
***************
*** 506,523 ****
getText (buffer, 0, mSource.mOffset);
}
-
- //
- // Bean patterns
- //
-
- public Log getLog ()
- {
- if (null == mLog)
- mLog = LogFactory.getLog (this.getClass ());
- // String name = this.getClass ().getName ();
- // java.util.logging.Logger logger = java.util.logging.Logger.getLogger (name);
- // logger.setLevel (java.util.logging.Level.FINEST);
- return (mLog);
- }
-
}
--- 504,506 ----
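In revision 1.7 of `Page.java`, an `UnknownHostException` is no longer reported with a message built from the URL's host. Instead, one entry of the `mFourOhFour` array is picked with `java.util.Random` and used as the `ParserException` message, with the original exception kept as the cause. A minimal stand-alone sketch of that selection follows. It uses plain `Exception` in place of `ParserException`, and apart from the first message, the strings are placeholders.

```java
import java.net.UnknownHostException;
import java.util.Random;

public class FourOhFourSketch {

    // Only the first message is quoted from Page.java; the others are placeholders.
    static final String[] MESSAGES = {
        "The web site you seek cannot be located, but countless more exist",
        "Placeholder message two",
        "Placeholder message three",
    };

    /** Choose a message uniformly at random; nextInt(n) returns a value in [0, n). */
    static String pick(Random random) {
        return MESSAGES[random.nextInt(MESSAGES.length)];
    }

    /**
     * Wrap the low-level exception with a randomly chosen message, keeping it
     * as the cause. Exception stands in for htmlparser's ParserException.
     */
    static Exception wrap(UnknownHostException uhe, Random random) {
        return new Exception(pick(random), uhe);
    }

    public static void main(String[] args) {
        Exception e = wrap(new UnknownHostException("no.such.host"), new Random());
        System.out.println(e.getMessage());
        System.out.println("cause: " + e.getCause());
    }
}
```

Compared with the removed message, the new one no longer names the host. That information now survives only in the chained cause.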
From: <der...@us...> - 2003-08-23 01:33:09
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1:/tmp/cvs-serv23027/src/org/htmlparser/lexer/nodes
Modified Files:
Attribute.java
Log Message:
Fifth drop for new i/o subsystem.
There is now a mainline for the lexer.
Try:
java -jar lexer.jar http://whatever
or the integration build has a new lexer execution script:
bin/lexer http://whatever
Index: Attribute.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes/Attribute.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** Attribute.java 21 Aug 2003 01:52:23 -0000 1.2
--- Attribute.java 23 Aug 2003 01:33:06 -0000 1.3
***************
*** 45,77 ****
* <p>If Name is not null, and Value is something, and Quote is ' it's a single quoted attribute.
* <p>If Name is not null, and Value is something, and Quote is " it's a double quoted attribute.
*/
public class Attribute
{
! Page mPage;
! int mNameStart;
! int mNameEnd;
! int mValueStart;
! int mValueEnd;
/**
* The name of this attribute.
* The part before the equals sign, or the stand-alone attribute.
*/
! String mName;
/**
* The value of the attribute.
* The part after the equals sign.
*/
! String mValue;
/**
* The quote, if any, surrounding the value of the attribute, if any.
*/
! char mQuote;
/**
* Create an attribute.
! * todo
* @param quote The quote, if any, surrounding the value of the attribute,
* (i.e. ' or "), or zero if none.
--- 45,114 ----
* <p>If Name is not null, and Value is something, and Quote is ' it's a single quoted attribute.
* <p>If Name is not null, and Value is something, and Quote is " it's a double quoted attribute.
+ * <p>All other states are illegal.
+ * <p>
+ * The attribute can be 'lazy loaded' by providing the page and cursor offsets
+ * into the page for the name and value. In this case if the starting offset is
+ * less than zero, the element is null. This is done for speed, since if the name
+ * and value are not been needed we can avoid the cost of creating the strings.
*/
public class Attribute
{
! /**
! * The page this attribute is extracted from.
! */
! protected Page mPage;
!
! /**
! * The starting offset of the name within the page.
! * If negative, the name is considered <code>null</code>.
! */
! protected int mNameStart;
!
! /**
! * The ending offset of the name within the page.
! */
! protected int mNameEnd;
!
! /**
! * The starting offset of the value within the page.
! * If negative, the value is considered <code>null</code>.
! */
! protected int mValueStart;
!
! /**
! * The ending offset of the name within the page.
! */
! protected int mValueEnd;
/**
* The name of this attribute.
* The part before the equals sign, or the stand-alone attribute.
+ * This will be <code>null</code> if the name has not been extracted from
+ * the page, or the name starting offset is negative.
*/
! protected String mName;
/**
* The value of the attribute.
* The part after the equals sign.
+ * This will be <code>null</code> if the value has not been extracted from
+ * the page, or the value starting offset is negative.
*/
! protected String mValue;
/**
* The quote, if any, surrounding the value of the attribute, if any.
*/
! protected char mQuote;
/**
* Create an attribute.
! * @param page The page containing the attribute.
! * @param name_start The starting offset of the name within the page.
! * If this is negative, the name is considered null.
! * @param name_end The ending offset of the name within the page.
! * @param value_start he starting offset of the value within the page.
! * If this is negative, the value is considered null.
! * @param value_end The ending offset of the value within the page.
* @param quote The quote, if any, surrounding the value of the attribute,
* (i.e. ' or "), or zero if none.
***************
*** 111,115 ****
{
if (null == mName)
! if (-1 != mNameStart)
mName = mPage.getText (mNameStart, mNameEnd);
return (mName);
--- 148,152 ----
{
if (null == mName)
! if (0 <= mNameStart)
mName = mPage.getText (mNameStart, mNameEnd);
return (mName);
***************
*** 125,129 ****
{
if (null == mValue)
! if (-1 != mValueStart)
mValue = mPage.getText (mValueStart, mValueEnd);
return (mValue);
--- 162,166 ----
{
if (null == mValue)
! if (0 <= mValueStart)
mValue = mPage.getText (mValueStart, mValueEnd);
return (mValue);
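The new `Attribute` javadoc describes lazy loading. The lexer stores the page plus start and end offsets for the name and value. Each string is extracted on first access and then cached, and a negative start offset means `null`; the diff changes the test from `-1 != mNameStart` to `0 <= mNameStart` accordingly. The sketch below illustrates the idea using a hypothetical `Page` stand-in backed by a `String`. The names echo the diff, but the classes are simplified and are not the real ones.

```java
public class LazyAttributeSketch {

    /** Hypothetical stand-in for Page: a character buffer with getText(start, end). */
    static class Page {
        private final String text;

        Page(String text) {
            this.text = text;
        }

        String getText(int start, int end) {
            return text.substring(start, end);
        }
    }

    /** Attribute that keeps offsets and builds strings only when asked. */
    static class Attribute {
        private final Page page;
        private final int nameStart;
        private final int nameEnd;
        private final int valueStart;
        private final int valueEnd;
        private String name;  // cached after the first successful getName()
        private String value; // cached after the first successful getValue()
        int extractions;      // counts getText() calls, for demonstration only

        Attribute(Page page, int nameStart, int nameEnd, int valueStart, int valueEnd) {
            this.page = page;
            this.nameStart = nameStart;
            this.nameEnd = nameEnd;
            this.valueStart = valueStart;
            this.valueEnd = valueEnd;
        }

        String getName() {
            // A negative start offset means "no name".
            if (null == name && 0 <= nameStart) {
                name = page.getText(nameStart, nameEnd);
                extractions++;
            }
            return name;
        }

        String getValue() {
            // A negative start offset means "no value" (a stand-alone attribute).
            if (null == value && 0 <= valueStart) {
                value = page.getText(valueStart, valueEnd);
                extractions++;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        Page page = new Page("<a href=\"x.html\" nowrap>");
        Attribute href = new Attribute(page, 3, 7, 9, 15);      // href="x.html"
        Attribute nowrap = new Attribute(page, 17, 23, -1, -1); // stand-alone, no value
        System.out.println(href.getName() + " = " + href.getValue());     // href = x.html
        System.out.println(nowrap.getName() + " = " + nowrap.getValue()); // nowrap = null
    }
}
```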
From: <der...@us...> - 2003-08-22 03:37:46
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1:/tmp/cvs-serv6515/lexer
Modified Files:
Cursor.java Lexer.java Page.java PageIndex.java Source.java
Log Message:
Fourth drop for new i/o subsystem.
Index: Cursor.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Cursor.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** Cursor.java 17 Aug 2003 16:09:27 -0000 1.3
--- Cursor.java 21 Aug 2003 01:52:23 -0000 1.4
***************
*** 39,43 ****
* This class remembers the page it came from and its position within the page.
*/
! public class Cursor implements Ordered
{
/**
--- 39,43 ----
* This class remembers the page it came from and its position within the page.
*/
! public class Cursor implements Ordered, Cloneable
{
/**
***************
*** 105,109 ****
public Cursor dup ()
{
! return (new Cursor (getPage (), getPosition ()));
}
--- 105,116 ----
public Cursor dup ()
{
! try
! {
! return ((Cursor)clone ());
! }
! catch (CloneNotSupportedException cnse)
! {
! return (new Cursor (getPage (), getPosition ()));
! }
}
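Revision 1.4 of `Cursor.java` makes the class implement `Cloneable` and changes `dup()` to use `Object.clone()`, keeping the constructor call only as a fallback. The log message gives no reason. A plausible one is that `clone()` copies every field and preserves the runtime class of a subclass, whereas `new Cursor (getPage (), getPosition ())` always yields a plain `Cursor`. The simplified sketch below includes a hypothetical `MarkedCursor` subclass to show that difference.

```java
public class CursorCloneSketch {

    /** Simplified cursor: a page reference and a position within it. */
    static class Cursor implements Cloneable {
        private final Object page; // stand-in for the real Page
        private int position;

        Cursor(Object page, int position) {
            this.page = page;
            this.position = position;
        }

        Object getPage() {
            return page;
        }

        int getPosition() {
            return position;
        }

        void advance() {
            position++;
        }

        /** Shallow copy: the page is shared, the position is independent. */
        Cursor dup() {
            try {
                return (Cursor) clone();
            } catch (CloneNotSupportedException cnse) {
                // Unreachable while Cursor implements Cloneable; kept as the
                // same defensive fallback the diff uses.
                return new Cursor(getPage(), getPosition());
            }
        }
    }

    /** Hypothetical subclass: clone() keeps this runtime type, the constructor would not. */
    static class MarkedCursor extends Cursor {
        MarkedCursor(Object page, int position) {
            super(page, position);
        }
    }

    public static void main(String[] args) {
        Object page = new Object();
        Cursor original = new MarkedCursor(page, 5);
        Cursor copy = original.dup();
        copy.advance();
        System.out.println(original.getPosition() + " " + copy.getPosition()); // 5 6
        System.out.println(copy instanceof MarkedCursor);                      // true
    }
}
```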
Index: Lexer.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Lexer.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** Lexer.java 17 Aug 2003 16:09:27 -0000 1.1
--- Lexer.java 21 Aug 2003 01:52:23 -0000 1.2
***************
*** 159,162 ****
--- 159,164 ----
char ch;
int length;
+ int begin;
+ int end;
StringNode ret;
***************
*** 174,178 ****
done = true;
// the order of these tests might be optimized for speed:
! else if ('/' == ch || '%' == ch || Character.isLetter (ch) || '!' == ch)
{
done = true;
--- 176,180 ----
done = true;
// the order of these tests might be optimized for speed:
! else if ('/' == ch || Character.isLetter (ch) || '!' == ch || '%' == ch)
{
done = true;
***************
*** 187,194 ****
}
}
! length = cursor.getPosition () - mCursor.getPosition ();
if (0 != length)
{ // got some characters
! ret = new StringNode (mPage, mCursor.getPosition (), cursor.getPosition ());
mCursor = cursor;
}
--- 189,198 ----
}
}
! begin = mCursor.getPosition ();
! end = cursor.getPosition ();
! length = end - begin;
if (0 != length)
{ // got some characters
! ret = new StringNode (mPage, begin, end);
mCursor = cursor;
}
***************
*** 202,231 ****
{
if (bookmarks[1] > bookmarks[0])
! attributes.addElement (new Attribute (null, mPage.getText (bookmarks[0], bookmarks[1]), (char)0));
}
private void standalone (Vector attributes, int[] bookmarks)
{
! attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), null, (char)0));
}
private void empty (Vector attributes, int[] bookmarks)
{
! attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), "", (char)0));
}
private void naked (Vector attributes, int[] bookmarks)
{
! attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), mPage.getText (bookmarks[3], bookmarks[4]), (char)0));
}
private void single_quote (Vector attributes, int[] bookmarks)
{
! attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), mPage.getText (bookmarks[4] + 1, bookmarks[5]), '\''));
}
private void double_quote (Vector attributes, int[] bookmarks)
{
! attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), mPage.getText (bookmarks[5] + 1, bookmarks[6]), '"'));
}
--- 206,241 ----
{
if (bookmarks[1] > bookmarks[0])
! attributes.addElement (new Attribute (mPage, -1, -1, bookmarks[0], bookmarks[1], (char)0));
! //attributes.addElement (new Attribute (null, mPage.getText (bookmarks[0], bookmarks[1]), (char)0));
}
private void standalone (Vector attributes, int[] bookmarks)
{
! attributes.addElement (new Attribute (mPage, bookmarks[1], bookmarks[2], -1, -1, (char)0));
! //attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), null, (char)0));
}
private void empty (Vector attributes, int[] bookmarks)
{
! attributes.addElement (new Attribute (mPage, bookmarks[1], bookmarks[2], bookmarks[2] + 1, bookmarks[2] + 1, (char)0));
! //attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), "", (char)0));
}
private void naked (Vector attributes, int[] bookmarks)
{
! attributes.addElement (new Attribute (mPage, bookmarks[1], bookmarks[2], bookmarks[3], bookmarks[4], (char)0));
! //attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), mPage.getText (bookmarks[3], bookmarks[4]), (char)0));
}
private void single_quote (Vector attributes, int[] bookmarks)
{
! attributes.addElement (new Attribute (mPage, bookmarks[1], bookmarks[2], bookmarks[4] + 1, bookmarks[5], '\''));
! //attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), mPage.getText (bookmarks[4] + 1, bookmarks[5]), '\''));
}
private void double_quote (Vector attributes, int[] bookmarks)
{
! attributes.addElement (new Attribute (mPage, bookmarks[1], bookmarks[2], bookmarks[5] + 1, bookmarks[6], '"'));
! //attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), mPage.getText (bookmarks[5] + 1, bookmarks[6]), '"'));
}
***************
*** 510,514 ****
if ('>' == ch)
done = true;
! else if (!Character.isWhitespace (ch) || ('!' == ch))
state = 2;
break;
--- 520,528 ----
if ('>' == ch)
done = true;
! else if (('!' == ch) || ('-' == ch) || Character.isWhitespace (ch))
! {
! // stay in state 4
! }
! else
state = 2;
break;
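In revision 1.2 of `Lexer.java`, the text scanner reorders the test that ends a run of text: a `/`, a letter, `!` or `%`. Judging from the surrounding code, this is the character after a `<`, although the excerpt does not show that part. The scanner now builds the `StringNode` from `begin` and `end` offsets, and the attribute helpers likewise pass page offsets instead of copied strings. The minimal sketch below shows the run-ending test under two assumptions: the input is held in a `String`, and a `<` at end of input counts as plain text. The real lexer's end-of-input handling is not shown in the diff.

```java
public class StringRunSketch {

    /** True if ch, seen right after '<', can start markup (same test as the diff). */
    static boolean startsMarkup(char ch) {
        return '/' == ch || Character.isLetter(ch) || '!' == ch || '%' == ch;
    }

    /**
     * Return the offset where a text run starting at 'start' ends: the '<'
     * that opens markup, or page.length() if no markup follows.
     */
    static int endOfText(String page, int start) {
        int i = start;
        while (i < page.length()) {
            if ('<' == page.charAt(i)
                    && i + 1 < page.length()
                    && startsMarkup(page.charAt(i + 1)))
                break;
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String page = "a < b<p>";
        int begin = 0;
        int end = endOfText(page, begin);
        // A text node would be built from the offsets [begin, end),
        // as the diff does with new StringNode (mPage, begin, end).
        System.out.println(end + ": \"" + page.substring(begin, end) + "\""); // 5: "a < b"
    }
}
```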
Index: Page.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** Page.java 17 Aug 2003 16:09:27 -0000 1.5
--- Page.java 21 Aug 2003 01:52:23 -0000 1.6
***************
*** 31,34 ****
--- 31,36 ----
import java.io.ByteArrayInputStream;
import java.io.IOException;
+ import java.io.InputStream;
+ import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
***************
*** 76,80 ****
* Messages for page not there (404).
*/
! private String[] mFourOhFour =
{
"The web site you seek cannot be located, but countless more exist",
--- 78,82 ----
* Messages for page not there (404).
*/
! static private String[] mFourOhFour =
{
"The web site you seek cannot be located, but countless more exist",
***************
*** 135,139 ****
* @exception UnsupportedEncodingException If the given charset is not supported.
*/
! public Page (Stream stream, String charset)
throws
UnsupportedEncodingException
--- 137,141 ----
* @exception UnsupportedEncodingException If the given charset is not supported.
*/
! public Page (InputStream stream, String charset)
throws
UnsupportedEncodingException
***************
*** 149,153 ****
public Page (String text) throws ParserException
{
! Stream stream;
Page ret;
--- 151,155 ----
public Page (String text) throws ParserException
{
! InputStream stream;
Page ret;
***************
*** 156,161 ****
try
{
! stream = new Stream (new ByteArrayInputStream (text.getBytes (Page.DEFAULT_CHARSET)));
! mSource = new Source (stream, Page.DEFAULT_CHARSET);
mIndex = new PageIndex (this);
}
--- 158,163 ----
try
{
! stream = new ByteArrayInputStream (text.getBytes (Page.DEFAULT_CHARSET));
! mSource = new Source (stream, Page.DEFAULT_CHARSET, text.length () + 1);
mIndex = new PageIndex (this);
}
***************
*** 193,205 ****
int i;
char ret;
!
! if (mSource.mOffset < cursor.getPosition ())
// hmmm, we could skip ahead, but then what about the EOL index
throw new ParserException ("attempt to read future characters from source");
! else if (mSource.mOffset == cursor.getPosition ())
try
{
i = mSource.read ();
! if (-1 == i)
ret = 0;
else
--- 195,208 ----
int i;
char ret;
!
! i = cursor.getPosition ();
! if (mSource.mOffset < i)
// hmmm, we could skip ahead, but then what about the EOL index
throw new ParserException ("attempt to read future characters from source");
! else if (mSource.mOffset == i)
try
{
i = mSource.read ();
! if (0 > i)
ret = 0;
else
***************
*** 218,222 ****
{
// historic read
! ret = mSource.mBuffer[cursor.getPosition ()];
cursor.advance ();
}
--- 221,225 ----
{
// historic read
! ret = mSource.mBuffer[i];
cursor.advance ();
}
***************
*** 466,470 ****
{
int length;
- StringBuffer ret;
if ((mSource.mOffset < start) || (mSource.mOffset < end))
--- 469,472 ----
***************
*** 478,481 ****
--- 480,508 ----
length = end - start;
buffer.append (mSource.mBuffer, start, length);
+ }
+
+ /**
+ * Get all text read so far from the source.
+ * @return The text from the source.
+ * @see #getText(StringBuffer)
+ */
+ public String getText ()
+ {
+ StringBuffer ret;
+
+ ret = new StringBuffer (mSource.mOffset);
+ getText (ret);
+
+ return (ret.toString ());
+ }
+
+ /**
+ * Put all text read so far from the source into the given buffer.
+ * @param buffer The accumulator for the characters.
+ * @see #getText(StringBuffer,int,int)
+ */
+ public void getText (StringBuffer buffer)
+ {
+ getText (buffer, 0, mSource.mOffset);
}
Index: PageIndex.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/PageIndex.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** PageIndex.java 17 Aug 2003 16:09:27 -0000 1.3
--- PageIndex.java 21 Aug 2003 01:52:23 -0000 1.4
***************
*** 51,55 ****
* Increment for allocations.
*/
! protected static final int mIncrement = 10;
/**
--- 51,55 ----
* Increment for allocations.
*/
! protected static final int mIncrement = 100;
/**
Index: Source.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Source.java,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** Source.java 17 Aug 2003 16:09:27 -0000 1.4
--- Source.java 21 Aug 2003 01:52:23 -0000 1.5
***************
*** 30,33 ****
--- 30,34 ----
import java.io.IOException;
+ import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
***************
*** 50,55 ****
* An initial buffer size.
*/
! protected static final int BUFFER_SIZE = 4096;
!
/**
* Return value when no more characters are left.
--- 51,56 ----
* An initial buffer size.
*/
! public static int BUFFER_SIZE = 16384;
!
/**
* Return value when no more characters are left.
***************
*** 60,64 ****
* The stream of bytes.
*/
! protected Stream mStream;
/**
--- 61,65 ----
* The stream of bytes.
*/
! protected InputStream mStream;
/**
***************
*** 70,84 ****
* The characters read so far.
*/
! public volatile char[] mBuffer;
/**
* The number of valid bytes in the buffer.
*/
! public volatile int mLevel;
/**
* The offset of the next byte returned by read().
*/
! public volatile int mOffset;
/**
--- 71,85 ----
* The characters read so far.
*/
! public /*volatile*/ char[] mBuffer;
/**
* The number of valid bytes in the buffer.
*/
! public /*volatile*/ int mLevel;
/**
* The offset of the next byte returned by read().
*/
! public /*volatile*/ int mOffset;
/**
***************
*** 91,99 ****
* @param stream The stream of bytes to use.
*/
! public Source (Stream stream)
throws
UnsupportedEncodingException
{
! this (stream, null);
}
--- 92,100 ----
* @param stream The stream of bytes to use.
*/
! public Source (InputStream stream)
throws
UnsupportedEncodingException
{
! this (stream, null, BUFFER_SIZE);
}
***************
*** 103,107 ****
* @param charset The character set used in encoding the stream.
*/
! public Source (Stream stream, String charset)
throws
UnsupportedEncodingException
--- 104,119 ----
* @param charset The character set used in encoding the stream.
*/
! public Source (InputStream stream, String charset)
! throws
! UnsupportedEncodingException
! {
! this (stream, charset, BUFFER_SIZE);
! }
! /**
! * Create a source of characters.
! * @param stream The stream of bytes to use.
! * @param charset The character set used in encoding the stream.
! */
! public Source (InputStream stream, String charset, int buffer_size)
throws
UnsupportedEncodingException
***************
*** 114,118 ****
else
mReader = new InputStreamReader (stream, charset);
! mBuffer = null;
mLevel = 0;
mOffset = 0;
--- 126,130 ----
else
mReader = new InputStreamReader (stream, charset);
! mBuffer = new char[buffer_size];
mLevel = 0;
mOffset = 0;
***************
*** 131,156 ****
{
char[] buffer;
int read;
if (null != mReader) // mReader goes null when it's been sucked dry
{
! // get some buffer space
! // unknown length... keep doubling
! if (null == mBuffer)
{
! mBuffer = new char[Math.max (BUFFER_SIZE, min)];
! buffer = mBuffer;
}
else
{
! read = Math.max (BUFFER_SIZE / 2, min);
! if (mBuffer.length - mLevel < read)
! buffer = new char[Math.max (mBuffer.length * 2, mBuffer.length + min)];
! else
! buffer = mBuffer;
}
// read into the end of the 'new' buffer
! read = mReader.read (buffer, mLevel, buffer.length - mLevel);
if (-1 == read)
{
--- 143,171 ----
{
char[] buffer;
+ int size;
int read;
if (null != mReader) // mReader goes null when it's been sucked dry
{
! size = mBuffer.length - mLevel; // available space
! if (size < min) // oops, better get some buffer space
{
! // unknown length... keep doubling
! size = mBuffer.length * 2;
! read = mLevel + min;
! if (size < read) // or satisfy min, whichever is greater
! size = read;
! else
! min = size - mLevel; // read the max
! buffer = new char[size];
}
else
{
! buffer = mBuffer;
! min = size;
}
// read into the end of the 'new' buffer
! read = mReader.read (buffer, mLevel, min);
if (-1 == read)
{
***************
*** 167,170 ****
--- 182,186 ----
mLevel += read;
}
+ // todo, should repeat on read shorter than original min
}
}
***************
*** 196,211 ****
int ret;
- if (null == mStream) // mStream goes null on close()
- throw new IOException ("reader is closed");
if (mLevel - mOffset < 1)
- fill (1);
- if (mOffset >= mLevel)
- ret = EOF;
- else
{
! ret = mBuffer[mOffset];
! mOffset++;
}
!
return (ret);
}
--- 212,228 ----
int ret;
if (mLevel - mOffset < 1)
{
! if (null == mStream) // mStream goes null on close()
! throw new IOException ("reader is closed");
! fill (1);
! if (mOffset >= mLevel)
! ret = EOF;
! else
! ret = mBuffer[mOffset++];
}
! else
! ret = mBuffer[mOffset++];
!
return (ret);
}
***************
*** 245,249 ****
--- 262,281 ----
return (ret);
}
+
+ /**
+ * Read characters into an array.
+ * This method will block until some input is available, an I/O error occurs,
+ * or the end of the stream is reached.
+ * @param cbuf Destination buffer.
+ * @return The number of characters read, or -1 if the end of the stream has
+ * been reached.
+ * @exception IOException If an I/O error occurs.
+ */
+ public int read (char[] cbuf) throws IOException
+ {
+ return (read (cbuf, 0, cbuf.length));
+ }
+
/**
* Reset the stream. If the stream has been marked, then attempt to
***************
*** 367,370 ****
--- 399,411 ----
mOffset = 0;
mMark = -1;
+ }
+
+ /**
+ * Get the number of available characters.
+ * @return The number of characters that can be read without blocking.
+ */
+ public int available ()
+ {
+ return (mLevel - mOffset);
}
}
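The revised Source.fill(int) above changes the buffering policy. The buffer is now preallocated in the constructor (default BUFFER_SIZE of 16384, or text.length () + 1 from Page(String)). It is only reallocated when fewer than `min` free slots remain, and then it grows to twice its length or to mLevel + min, whichever is greater. The following is a minimal sketch of that policy. It assumes a plain java.io.Reader as input, and the class name GrowingBuffer and helper newCapacity are hypothetical, not part of htmlparser. The copy of the old contents into the new array is assumed, because that part of fill() is elided from the diff.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical stand-in for org.htmlparser.lexer.Source: only the buffer
// management from the revised fill(int) is modelled here.
final class GrowingBuffer
{
    char[] mBuffer;   // characters read so far
    int mLevel;       // number of valid characters in mBuffer
    Reader mReader;   // set to null once the reader is exhausted

    GrowingBuffer (Reader reader, int bufferSize)
    {
        mReader = reader;
        mBuffer = new char[bufferSize]; // preallocated, as in the new constructor
        mLevel = 0;
    }

    // Capacity chosen when fewer than min free slots remain:
    // double the buffer, or satisfy min, whichever is greater.
    static int newCapacity (int length, int level, int min)
    {
        return (Math.max (length * 2, level + min));
    }

    // Read one chunk into the free space; returns false at end of stream.
    // Like the original (see its "todo"), a short read is not retried.
    boolean fill (int min) throws IOException
    {
        if (null == mReader)
            return (false);
        if (mBuffer.length - mLevel < min)
        {
            char[] buffer = new char[newCapacity (mBuffer.length, mLevel, min)];
            System.arraycopy (mBuffer, 0, buffer, 0, mLevel); // keep history
            mBuffer = buffer;
        }
        int read = mReader.read (mBuffer, mLevel, mBuffer.length - mLevel);
        if (-1 == read)
        {
            mReader.close ();
            mReader = null;
            return (false);
        }
        mLevel += read;
        return (true);
    }
}
```

Starting from a 4-character buffer and reading "hello world", this sketch grows the array to 8 and then to 16 characters. Doubling keeps the number of reallocations logarithmic in the document size, which matters for pages of several hundred kilobytes like the one used in the speed tests.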
From: <der...@us...> - 2003-08-22 03:35:31
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests
In directory sc8-pr-cvs1:/tmp/cvs-serv6515/tests/lexerTests
Modified Files:
AllTests.java KitTest.java LexerTests.java
Log Message:
Fourth drop for new i/o subsystem.
Index: AllTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/AllTests.java,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** AllTests.java 17 Aug 2003 16:09:27 -0000 1.5
--- AllTests.java 21 Aug 2003 01:52:23 -0000 1.6
***************
*** 42,48 ****
{
TestSuite suite = new TestSuite ("Lexer Tests");
! suite.addTestSuite (StreamTests.class);
! suite.addTestSuite (SourceTests.class);
! suite.addTestSuite (PageTests.class);
suite.addTestSuite (PageIndexTests.class);
suite.addTestSuite (LexerTests.class);
--- 42,48 ----
{
TestSuite suite = new TestSuite ("Lexer Tests");
! suite.addTestSuite (StreamTests.class);
! suite.addTestSuite (SourceTests.class);
! suite.addTestSuite (PageTests.class);
suite.addTestSuite (PageIndexTests.class);
suite.addTestSuite (LexerTests.class);
Index: KitTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/KitTest.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** KitTest.java 17 Aug 2003 16:09:27 -0000 1.1
--- KitTest.java 21 Aug 2003 01:52:23 -0000 1.2
***************
*** 22,30 ****
--- 22,33 ----
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import org.htmlparser.Node;
+ import org.htmlparser.lexer.Cursor;
import org.htmlparser.lexer.Lexer;
import org.htmlparser.lexer.Page;
+ import org.htmlparser.lexer.nodes.AbstractNode;
import org.htmlparser.lexer.nodes.Attribute;
import org.htmlparser.lexer.nodes.TagNode;
import org.htmlparser.util.ParserException;
+ import org.htmlparser.util.Translate;
/**
***************
*** 44,47 ****
--- 47,75 ----
}
+ String snowhite (String s)
+ {
+ int length;
+ char ch;
+ StringBuffer ret;
+
+ length = s.length ();
+ ret = new StringBuffer (length);
+ for (int i = 0; i < length; i++)
+ {
+ ch = s.charAt (i);
+ if (!Character.isWhitespace (ch) && !(160 == (int)ch))
+ ret.append (ch);
+ }
+
+ return (ret.toString ());
+ }
+
+ boolean match (String s1, String s2)
+ {
+ s1 = snowhite (Translate.decode (s1));
+ s2 = snowhite (Translate.decode (s2));
+ return (s1.equalsIgnoreCase (s2));
+ }
+
public void handleText (char[] data, int pos)
{
***************
*** 66,70 ****
node = (Node)mNodes.elementAt (i);
ours = node.getText ();
! if (theirs.equalsIgnoreCase (ours))
{
match = i;
--- 94,98 ----
node = (Node)mNodes.elementAt (i);
ours = node.getText ();
! if (match (theirs, ours))
{
match = i;
***************
*** 77,85 ****
ours = node.getText ();
System.out.println ("theirs: " + theirs);
! System.out.println (" ours: " + ours);
! mIndex++;
}
else
{
// System.out.println (" match: " + theirs);
mIndex = match + 1;
--- 105,132 ----
ours = node.getText ();
System.out.println ("theirs: " + theirs);
! Cursor cursor = new Cursor (((AbstractNode)node).getPage (), node.elementBegin ());
! System.out.println ("ours " + cursor + ": " + ours);
}
else
{
+ boolean skipped = false;
+ for (int i = mIndex; i < match; i++)
+ {
+ ours = ((Node)mNodes.elementAt (i)).toHtml ();
+ if (0 != ours.trim ().length ())
+ {
+ if (!skipped)
+ System.out.println ("skipping:");
+ System.out.println (ours);
+ skipped = true;
+ }
+ }
+ if (skipped)
+ {
+ System.out.println ("to match:");
+ node = (Node)mNodes.elementAt (match);
+ Cursor cursor = new Cursor (((AbstractNode)node).getPage (), node.elementBegin ());
+ System.out.println ("@" + cursor + ": " + node.toHtml ());
+ }
// System.out.println (" match: " + theirs);
mIndex = match + 1;
***************
*** 103,107 ****
node = (Node)mNodes.elementAt (i);
ours = node.getText ();
! if (theirs.equalsIgnoreCase (ours))
{
match = i;
--- 150,154 ----
node = (Node)mNodes.elementAt (i);
ours = node.getText ();
! if (match (theirs, ours))
{
match = i;
***************
*** 114,122 ****
ours = node.getText ();
System.out.println ("theirs: " + theirs);
! System.out.println (" ours: " + ours);
! mIndex++;
}
else
{
// System.out.println (" match: " + theirs);
mIndex = match + 1;
--- 161,188 ----
ours = node.getText ();
System.out.println ("theirs: " + theirs);
! Cursor cursor = new Cursor (((AbstractNode)node).getPage (), node.elementBegin ());
! System.out.println ("ours " + cursor + ": " + ours);
}
else
{
+ boolean skipped = false;
+ for (int i = mIndex; i < match; i++)
+ {
+ ours = ((Node)mNodes.elementAt (i)).toHtml ();
+ if (0 != ours.trim ().length ())
+ {
+ if (!skipped)
+ System.out.println ("skipping:");
+ System.out.println (ours);
+ skipped = true;
+ }
+ }
+ if (skipped)
+ {
+ System.out.println ("to match:");
+ node = (Node)mNodes.elementAt (match);
+ Cursor cursor = new Cursor (((AbstractNode)node).getPage (), node.elementBegin ());
+ System.out.println ("@" + cursor + ": " + node.toHtml ());
+ }
// System.out.println (" match: " + theirs);
mIndex = match + 1;
***************
*** 140,144 ****
{
ours = ((Attribute)(((TagNode)node).getAttributesEx ().elementAt (0))).getName ();
! if (theirs.equalsIgnoreCase (ours))
{
match = i;
--- 206,210 ----
{
ours = ((Attribute)(((TagNode)node).getAttributesEx ().elementAt (0))).getName ();
! if (match (theirs, ours))
{
match = i;
***************
*** 152,160 ****
ours = node.getText ();
System.out.println ("theirs: " + theirs);
! System.out.println (" ours: " + ours);
! mIndex++;
}
else
{
// System.out.println (" match: " + theirs);
mIndex = match + 1;
--- 218,245 ----
ours = node.getText ();
System.out.println ("theirs: " + theirs);
! Cursor cursor = new Cursor (((AbstractNode)node).getPage (), node.elementBegin ());
! System.out.println ("ours " + cursor + ": " + ours);
}
else
{
+ boolean skipped = false;
+ for (int i = mIndex; i < match; i++)
+ {
+ ours = ((Node)mNodes.elementAt (i)).toHtml ();
+ if (0 != ours.trim ().length ())
+ {
+ if (!skipped)
+ System.out.println ("skipping:");
+ System.out.println (ours);
+ skipped = true;
+ }
+ }
+ if (skipped)
+ {
+ System.out.println ("to match:");
+ node = (Node)mNodes.elementAt (match);
+ Cursor cursor = new Cursor (((AbstractNode)node).getPage (), node.elementBegin ());
+ System.out.println ("@" + cursor + ": " + node.toHtml ());
+ }
// System.out.println (" match: " + theirs);
mIndex = match + 1;
***************
*** 178,182 ****
{
ours = ((Attribute)(((TagNode)node).getAttributesEx ().elementAt (0))).getName ().substring (1);
! if (theirs.equalsIgnoreCase (ours))
{
match = i;
--- 263,267 ----
{
ours = ((Attribute)(((TagNode)node).getAttributesEx ().elementAt (0))).getName ().substring (1);
! if (match (theirs, ours))
{
match = i;
***************
*** 190,198 ****
ours = node.getText ();
System.out.println ("theirs: " + theirs);
! System.out.println (" ours: " + ours);
! mIndex++;
}
else
{
// System.out.println (" match: " + theirs);
mIndex = match + 1;
--- 275,302 ----
ours = node.getText ();
System.out.println ("theirs: " + theirs);
! Cursor cursor = new Cursor (((AbstractNode)node).getPage (), node.elementBegin ());
! System.out.println ("ours " + cursor + ": " + ours);
}
else
{
+ boolean skipped = false;
+ for (int i = mIndex; i < match; i++)
+ {
+ ours = ((Node)mNodes.elementAt (i)).toHtml ();
+ if (0 != ours.trim ().length ())
+ {
+ if (!skipped)
+ System.out.println ("skipping:");
+ System.out.println (ours);
+ skipped = true;
+ }
+ }
+ if (skipped)
+ {
+ System.out.println ("to match:");
+ node = (Node)mNodes.elementAt (match);
+ Cursor cursor = new Cursor (((AbstractNode)node).getPage (), node.elementBegin ());
+ System.out.println ("@" + cursor + ": " + node.toHtml ());
+ }
// System.out.println (" match: " + theirs);
mIndex = match + 1;
***************
*** 216,225 ****
{
ours = ((Attribute)(((TagNode)node).getAttributesEx ().elementAt (0))).getName ();
! if (theirs.equalsIgnoreCase (ours))
{
match = i;
break;
}
! else if (theirs.equalsIgnoreCase (ours.substring (1)))
{
match = i;
--- 320,329 ----
{
ours = ((Attribute)(((TagNode)node).getAttributesEx ().elementAt (0))).getName ();
! if (match (theirs, ours))
{
match = i;
break;
}
! if (match (theirs, ours))
{
match = i;
***************
*** 233,241 ****
ours = node.getText ();
System.out.println ("theirs: " + theirs);
! System.out.println (" ours: " + ours);
! mIndex++;
}
else
{
// System.out.println (" match: " + theirs);
mIndex = match + 1;
--- 337,364 ----
ours = node.getText ();
System.out.println ("theirs: " + theirs);
! Cursor cursor = new Cursor (((AbstractNode)node).getPage (), node.elementBegin ());
! System.out.println ("ours " + cursor + ": " + ours);
}
else
{
+ boolean skipped = false;
+ for (int i = mIndex; i < match; i++)
+ {
+ ours = ((Node)mNodes.elementAt (i)).toHtml ();
+ if (0 != ours.trim ().length ())
+ {
+ if (!skipped)
+ System.out.println ("skipping:");
+ System.out.println (ours);
+ skipped = true;
+ }
+ }
+ if (skipped)
+ {
+ System.out.println ("to match:");
+ node = (Node)mNodes.elementAt (match);
+ Cursor cursor = new Cursor (((AbstractNode)node).getPage (), node.elementBegin ());
+ System.out.println ("@" + cursor + ": " + node.toHtml ());
+ }
// System.out.println (" match: " + theirs);
mIndex = match + 1;
***************
*** 246,250 ****
public void handleError (String errorMsg, int pos)
{
! // System.out.println ("******* error @" + pos + " ******** " + errorMsg);
}
--- 369,373 ----
public void handleError (String errorMsg, int pos)
{
! System.out.println ("******* error @" + pos + " ******** " + errorMsg);
}
***************
*** 348,351 ****
--- 471,475 ----
public static void main (String[] args) throws ParserException, IOException
{
+ String link;
Lexer lexer;
Node node;
***************
*** 357,362 ****
Element[] elements;
// pass through it once to read the entire page
! URL url = new URL ("http://sourceforge.net/projects/htmlparser");
lexer = new Lexer (url.openConnection ());
nodes = new Vector ();
--- 481,490 ----
Element[] elements;
+ if (0 == args.length)
+ link = "http://sourceforge.net/projects/htmlparser";
+ else
+ link = args[0];
// pass through it once to read the entire page
! URL url = new URL (link);
lexer = new Lexer (url.openConnection ());
nodes = new Vector ();
Index: LexerTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/LexerTests.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** LexerTests.java 17 Aug 2003 16:09:27 -0000 1.1
--- LexerTests.java 21 Aug 2003 01:52:23 -0000 1.2
***************
*** 29,47 ****
--- 29,56 ----
package org.htmlparser.tests.lexerTests;
+ import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
+ import java.io.InputStream;
+ import java.io.InputStreamReader;
+ import java.io.StringReader;
import java.io.UnsupportedEncodingException;
import java.net.URL;
+ import java.net.URLConnection;
import junit.framework.TestCase;
import org.htmlparser.Node;
+ import org.htmlparser.NodeReader;
+ import org.htmlparser.Parser;
import org.htmlparser.lexer.Lexer;
import org.htmlparser.lexer.Page;
import org.htmlparser.lexer.PageIndex;
+ import org.htmlparser.lexer.Source;
import org.htmlparser.lexer.Stream;
import org.htmlparser.lexer.nodes.RemarkNode;
import org.htmlparser.lexer.nodes.StringNode;
import org.htmlparser.lexer.nodes.TagNode;
+ import org.htmlparser.tags.Tag;
import org.htmlparser.util.ParserException;
***************
*** 244,261 ****
}
/**
! * Try a real page.
*/
! public void testReal () throws ParserException, IOException
{
Lexer lexer;
Node node;
URL url = new URL ("http://sourceforge.net/projects/htmlparser");
lexer = new Lexer (url.openConnection ());
while (null != (node = lexer.nextNode ()))
! System.out.println (node.toString ());
}
}
--- 253,593 ----
}
+ // /**
+ // * Try a real page.
+ // */
+ // public void testReal () throws ParserException, IOException
+ // {
+ // Lexer lexer;
+ // Node node;
+ //
+ // URL url = new URL ("http://sourceforge.net/projects/htmlparser");
+ // lexer = new Lexer (url.openConnection ());
+ // while (null != (node = lexer.nextNode ()))
+ // System.out.println (node.toString ());
+ // }
+
/**
! * Test the fidelity of the toHtml() method.
*/
! public void testFidelity () throws ParserException, IOException
{
Lexer lexer;
Node node;
+ int position;
+ StringBuffer buffer;
+ String string;
+ char[] ref;
+ char[] test;
URL url = new URL ("http://sourceforge.net/projects/htmlparser");
lexer = new Lexer (url.openConnection ());
+ position = 0;
+ buffer = new StringBuffer (80000);
while (null != (node = lexer.nextNode ()))
! {
! string = node.toHtml ();
! if (position != node.elementBegin ())
! fail ("non-contiguous" + string);
! buffer.append (string);
! position = node.elementEnd ();
! if (buffer.length () != position)
! fail ("text length differed after encountering node " + string);
! }
! ref = lexer.getPage ().getText ().toCharArray ();
! test = new char[buffer.length ()];
! buffer.getChars (0, buffer.length (), test, 0);
! assertEquals ("different amounts of text", ref.length, test.length);
! for (int i = 0; i < ref.length; i++)
! if (ref[i] != test[i])
! fail ("character differs at position " + i + ", expected <" + ref[i] + "> but was <" + test[i] + ">");
}
+ /**
+ * Test the relative speed reading from a string parsing tags too.
+ */
+ public void testSpeedStringWithoutTags () throws ParserException, IOException
+ {
+ final String link = "http://htmlparser.sourceforge.net/javadoc_1_3/index-all.html";
+ URL url;
+ URLConnection connection;
+ Source source;
+ StringBuffer buffer;
+ int i;
+ String html;
+
+ long old_total;
+ long new_total;
+ long begin;
+ long end;
+ StringReader reader;
+ NodeReader nodes;
+ Parser parser;
+ int nodecount;
+ Node node;
+ int charcount;
+
+ url = new URL (link);
+ connection = url.openConnection ();
+ connection.connect ();
+ source = new Source (new Stream (connection.getInputStream ()));
+ buffer = new StringBuffer (350000);
+ while (-1 != (i = source.read ()))
+ buffer.append ((char)i);
+ source.close ();
+ html = buffer.toString ();
+ old_total = 0;
+ new_total = 0;
+ for (i = 0; i < 5; i++)
+ {
+ System.gc ();
+ begin = System.currentTimeMillis ();
+ Lexer lexer = new Lexer (html);
+ nodecount = 0;
+ while (null != (node = lexer.nextNode ()))
+ nodecount++;
+ end = System.currentTimeMillis ();
+ System.out.println (" lexer: " + (end - begin) + " msec, " + nodecount + " nodes");
+ if (0 != i) // the first timing is way different
+ new_total += (end - begin);
+
+ System.gc ();
+ begin = System.currentTimeMillis ();
+ reader = new StringReader (html);
+ nodes = new NodeReader (new BufferedReader (reader), 350000);
+ parser = new Parser (nodes, null);
+ nodecount = 0;
+ while (null != (node = nodes.readElement ()))
+ nodecount++;
+ end = System.currentTimeMillis ();
+ System.out.println ("old reader: " + (end - begin) + " msec, " + nodecount + " nodes");
+ if (0 != i) // the first timing is way different
+ old_total += (end - begin);
+ }
+ assertTrue ("old parser is" + ((double)(new_total - old_total)/(double)old_total*100.0) + "% faster", new_total < old_total);
+ System.out.println ("lexer is " + ((double)(old_total - new_total)/(double)old_total*100.0) + "% faster");
+ }
+ /**
+ * Test the relative speed reading from a string parsing tags too.
+ */
+ public void testSpeedStringWithTags () throws ParserException, IOException
+ {
+ final String link = "http://htmlparser.sourceforge.net/javadoc_1_3/index-all.html";
+ URL url;
+ URLConnection connection;
+ Source source;
+ StringBuffer buffer;
+ int i;
+ String html;
+
+ long old_total;
+ long new_total;
+ long begin;
+ long end;
+ StringReader reader;
+ NodeReader nodes;
+ Parser parser;
+ int nodecount;
+ Node node;
+ int charcount;
+
+ url = new URL (link);
+ connection = url.openConnection ();
+ connection.connect ();
+ source = new Source (new Stream (connection.getInputStream ()));
+ buffer = new StringBuffer (350000);
+ while (-1 != (i = source.read ()))
+ buffer.append ((char)i);
+ source.close ();
+ html = buffer.toString ();
+ old_total = 0;
+ new_total = 0;
+ for (i = 0; i < 5; i++)
+ {
+ System.gc ();
+ begin = System.currentTimeMillis ();
+ Lexer lexer = new Lexer (html);
+ nodecount = 0;
+ while (null != (node = lexer.nextNode ()))
+ {
+ nodecount++;
+ if (node instanceof TagNode)
+ ((TagNode)node).getAttributes ();
+ }
+ end = System.currentTimeMillis ();
+ System.out.println (" lexer: " + (end - begin) + " msec, " + nodecount + " nodes");
+ if (0 != i) // the first timing is way different
+ new_total += (end - begin);
+
+ System.gc ();
+ begin = System.currentTimeMillis ();
+ reader = new StringReader (html);
+ nodes = new NodeReader (new BufferedReader (reader), 350000);
+ parser = new Parser (nodes, null);
+ nodecount = 0;
+ while (null != (node = nodes.readElement ()))
+ {
+ nodecount++;
+ if (node instanceof Tag)
+ ((Tag)node).getAttributes ();
+ }
+ end = System.currentTimeMillis ();
+ System.out.println ("old reader: " + (end - begin) + " msec, " + nodecount + " nodes");
+ if (0 != i) // the first timing is way different
+ old_total += (end - begin);
+ }
+ assertTrue ("old parser is" + ((double)(new_total - old_total)/(double)old_total*100.0) + "% faster", new_total < old_total);
+ System.out.println ("lexer is " + ((double)(old_total - new_total)/(double)old_total*100.0) + "% faster");
+ }
+
+ public void testSpeedStreamWithoutTags () throws ParserException, IOException
+ {
+ final String link = "http://htmlparser.sourceforge.net/javadoc_1_3/index-all.html";
+ URL url;
+ URLConnection connection;
+ Source source;
+ StringBuffer buffer;
+ int i;
+ String html;
+ InputStream stream;
+
+ long old_total;
+ long new_total;
+ long begin;
+ long end;
+ InputStreamReader reader;
+ NodeReader nodes;
+ Parser parser;
+ int nodecount;
+ Node node;
+ int charcount;
+
+ url = new URL (link);
+ connection = url.openConnection ();
+ connection.connect ();
+ source = new Source (new Stream (connection.getInputStream ()));
+ buffer = new StringBuffer (350000);
+ while (-1 != (i = source.read ()))
+ buffer.append ((char)i);
+ source.close ();
+ html = buffer.toString ();
+ old_total = 0;
+ new_total = 0;
+
+ for (i = 0; i < 5; i++)
+ {
+
+ System.gc ();
+ begin = System.currentTimeMillis ();
+ stream = new ByteArrayInputStream (html.getBytes (Page.DEFAULT_CHARSET));
+ Lexer lexer = new Lexer (new Page (stream, Page.DEFAULT_CHARSET));
+ nodecount = 0;
+ while (null != (node = lexer.nextNode ()))
+ nodecount++;
+ end = System.currentTimeMillis ();
+ System.out.println (" lexer: " + (end - begin) + " msec, " + nodecount + " nodes");
+ if (0 != i) // the first timing is way different
+ new_total += (end - begin);
+
+ System.gc ();
+ begin = System.currentTimeMillis ();
+ stream = new ByteArrayInputStream (html.getBytes (Page.DEFAULT_CHARSET));
+ reader = new InputStreamReader (stream);
+ nodes = new NodeReader (reader, 350000);
+ parser = new Parser (nodes, null);
+ nodecount = 0;
+ while (null != (node = nodes.readElement ()))
+ nodecount++;
+ end = System.currentTimeMillis ();
+ System.out.println ("old reader: " + (end - begin) + " msec, " + nodecount + " nodes");
+ if (0 != i) // the first timing is way different
+ old_total += (end - begin);
+
+ }
+ assertTrue ("old parser is" + ((double)(new_total - old_total)/(double)old_total*100.0) + "% faster", new_total < old_total);
+ System.out.println ("lexer is " + ((double)(old_total - new_total)/(double)old_total*100.0) + "% faster");
+ }
+
+ public void testSpeedStreamWithTags () throws ParserException, IOException
+ {
+ final String link = "http://htmlparser.sourceforge.net/javadoc_1_3/index-all.html";
+ URL url;
+ URLConnection connection;
+ Source source;
+ StringBuffer buffer;
+ int i;
+ String html;
+ InputStream stream;
+
+ long old_total;
+ long new_total;
+ long begin;
+ long end;
+ InputStreamReader reader;
+ NodeReader nodes;
+ Parser parser;
+ int nodecount;
+ Node node;
+ int charcount;
+
+ url = new URL (link);
+ connection = url.openConnection ();
+ connection.connect ();
+ source = new Source (new Stream (connection.getInputStream ()));
+ buffer = new StringBuffer (350000);
+ while (-1 != (i = source.read ()))
+ buffer.append ((char)i);
+ source.close ();
+ html = buffer.toString ();
+ old_total = 0;
+ new_total = 0;
+
+ for (i = 0; i < 5; i++)
+ {
+
+ System.gc ();
+ begin = System.currentTimeMillis ();
+ stream = new ByteArrayInputStream (html.getBytes (Page.DEFAULT_CHARSET));
+ Lexer lexer = new Lexer (new Page (stream, Page.DEFAULT_CHARSET));
+ nodecount = 0;
+ while (null != (node = lexer.nextNode ()))
+ {
+ nodecount++;
+ if (node instanceof TagNode)
+ ((TagNode)node).getAttributes ();
+ }
+ end = System.currentTimeMillis ();
+ System.out.println (" lexer: " + (end - begin) + " msec, " + nodecount + " nodes");
+ if (0 != i) // the first timing is way different
+ new_total += (end - begin);
+
+ System.gc ();
+ begin = System.currentTimeMillis ();
+ stream = new ByteArrayInputStream (html.getBytes (Page.DEFAULT_CHARSET));
+ reader = new InputStreamReader (stream);
+ nodes = new NodeReader (reader, 350000);
+ parser = new Parser (nodes, null);
+ nodecount = 0;
+ while (null != (node = nodes.readElement ()))
+ {
+ nodecount++;
+ if (node instanceof Tag)
+ ((Tag)node).getAttributes ();
+ }
+ end = System.currentTimeMillis ();
+ System.out.println ("old reader: " + (end - begin) + " msec, " + nodecount + " nodes");
+ if (0 != i) // the first timing is way different
+ old_total += (end - begin);
+ }
+ assertTrue ("old parser is" + ((double)(new_total - old_total)/(double)old_total*100.0) + "% faster", new_total < old_total);
+ System.out.println ("lexer is " + ((double)(old_total - new_total)/(double)old_total*100.0) + "% faster");
+ }
+
+ // public static void main (String[] args) throws ParserException, IOException
+ // {
+ // LexerTests tests = new LexerTests ("hallow");
+ // tests.testSpeedStreamWithTags ();
+ // }
+
}
+
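KitTest now compares the text reported by the Swing HTMLEditorKit parser with the lexer's text through match(), instead of a plain equalsIgnoreCase(). Both strings are passed through Translate.decode() and snowhite(), which drops all whitespace and the non-breaking space (char 160). The following is a minimal sketch of that comparison. The entity-decoding step is deliberately left out so the example has no htmlparser dependency, and the class name LooseMatch is hypothetical.

```java
// Sketch of KitTest's loose comparison; Translate.decode() (entity decoding)
// is omitted here, so "&amp;" and "&" would NOT compare equal in this version.
final class LooseMatch
{
    // Remove all whitespace, plus the non-breaking space (char 160), which
    // Character.isWhitespace() does not treat as whitespace.
    static String snowhite (String s)
    {
        int length = s.length ();
        StringBuilder ret = new StringBuilder (length);
        for (int i = 0; i < length; i++)
        {
            char ch = s.charAt (i);
            if (!Character.isWhitespace (ch) && (160 != ch))
                ret.append (ch);
        }
        return (ret.toString ());
    }

    // Case-insensitive comparison once whitespace is stripped from both sides.
    static boolean match (String s1, String s2)
    {
        return (snowhite (s1).equalsIgnoreCase (snowhite (s2)));
    }
}
```

The explicit check for 160 is needed because `Character.isWhitespace` excludes non-breaking spaces by definition. Without it, text containing `&nbsp;` (which decodes to char 160) would never match the Swing parser's output.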
From: <der...@us...> - 2003-08-22 02:40:29
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1:/tmp/cvs-serv6515/lexer/nodes
Modified Files:
Attribute.java TagNode.java
Log Message:
Fourth drop for new i/o subsystem.
Index: Attribute.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes/Attribute.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** Attribute.java 17 Aug 2003 16:09:28 -0000 1.1
--- Attribute.java 21 Aug 2003 01:52:23 -0000 1.2
***************
*** 33,36 ****
--- 33,38 ----
package org.htmlparser.lexer.nodes;
+ import org.htmlparser.lexer.Page;
+
/**
* An attribute within a tag.
***************
*** 46,49 ****
--- 48,57 ----
public class Attribute
{
+ Page mPage;
+ int mNameStart;
+ int mNameEnd;
+ int mValueStart;
+ int mValueEnd;
+
/**
* The name of this attribute.
***************
*** 64,67 ****
--- 72,93 ----
/**
+ * Create an attribute.
+ * todo
+ * @param quote The quote, if any, surrounding the value of the attribute,
+ * (i.e. ' or "), or zero if none.
+ */
+ public Attribute (Page page, int name_start, int name_end, int value_start, int value_end, char quote)
+ {
+ mPage = page;
+ mNameStart = name_start;
+ mNameEnd = name_end;
+ mValueStart = value_start;
+ mValueEnd = value_end;
+ mName = null;
+ mValue = null;
+ mQuote = quote;
+ }
+
+ /**
* Create an attribute with the name, value and quote character given.
* @param name The name of this attribute, or null if it's just whitespace.
***************
*** 84,87 ****
--- 110,116 ----
public String getName ()
{
+ if (null == mName)
+ if (-1 != mNameStart)
+ mName = mPage.getText (mNameStart, mNameEnd);
return (mName);
}
***************
*** 95,98 ****
--- 124,130 ----
public String getValue ()
{
+ if (null == mValue)
+ if (-1 != mValueStart)
+ mValue = mPage.getText (mValueStart, mValueEnd);
return (mValue);
}
Index: TagNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes/TagNode.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** TagNode.java 17 Aug 2003 16:09:28 -0000 1.1
--- TagNode.java 21 Aug 2003 01:52:23 -0000 1.2
***************
*** 193,196 ****
--- 193,197 ----
Attribute attribute;
String value;
+ StringBuffer _value;
Hashtable ret;
***************
*** 210,216 ****
value = attribute.getValue ();
if ('\'' == attribute.getQuote ())
! value = "'" + value + "'";
else if ('"' == attribute.getQuote ())
! value = "\"" + value + "\"";
else if ((null != value) && value.equals (""))
value = NOTHING;
--- 211,229 ----
value = attribute.getValue ();
if ('\'' == attribute.getQuote ())
! {
! _value = new StringBuffer (value.length () + 2);
! _value.append ("'");
! _value.append (value);
! _value.append ("'");
! value = _value.toString ();
! }
else if ('"' == attribute.getQuote ())
! {
! _value = new StringBuffer (value.length () + 2);
! _value.append ("\"");
! _value.append (value);
! _value.append ("\"");
! value = _value.toString ();
! }
else if ((null != value) && value.equals (""))
value = NOTHING;
|
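In the TagNode change above, each quoted value is now assembled in a StringBuffer presized to the value length plus two. This way the buffer never needs to grow, which compiler-generated + concatenation may otherwise require. Below is a standalone sketch of the whole value mapping that getAttributes () performs, including the NOTHING and NULLVALUE markers. AttributeValues and toTableValue are hypothetical names, and the marker strings are copied from TagNode. Like the original, the sketch assumes a quoted value is never null.

```java
// Hypothetical helper mirroring the value mapping in TagNode.getAttributes ().
final class AttributeValues
{
    // Marker strings copied from TagNode.
    static final String NULLVALUE = "$<NULL>$";
    static final String NOTHING = "$<NOTHING>$";

    private AttributeValues ()
    {
    }

    /**
     * Map a raw attribute value and its quote character (or zero)
     * to the form stored in the attribute hashtable.
     */
    static String toTableValue (String value, char quote)
    {
        StringBuffer buffer;
        String ret;

        if (('\'' == quote) || ('"' == quote))
        {
            // presize: the value plus the opening and closing quote
            buffer = new StringBuffer (value.length () + 2);
            buffer.append (quote);
            buffer.append (value);
            buffer.append (quote);
            ret = buffer.toString ();
        }
        else if ((null != value) && value.equals (""))
            ret = NOTHING;      // unquoted empty value, e.g. name=
        else
            ret = value;
        if (null == ret)
            ret = NULLVALUE;    // stand-alone attribute, e.g. CHECKED
        return (ret);
    }
}
```

A quoted empty value maps to '' (or ""), so NOTHING only ever stands for an unquoted empty value.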
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1:/tmp/cvs-serv9123/lexer/nodes
Added Files:
AbstractNode.java Attribute.java RemarkNode.java StringNode.java
TagNode.java package.html
Log Message:
Third drop for new i/o subsystem.
--- NEW FILE: AbstractNode.java ---
// HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//
// For any questions or suggestions, you can write to me at :
// Email :so...@in...
//
// Postal Address :
// Somik Raha
// Extreme Programmer & Coach
// Industrial Logic Corporation
// 2583 Cedar Street, Berkeley,
// CA 94708, USA
// Website : http://www.industriallogic.com
//
// This class was contributed by
// Derrick Oswald
//

package org.htmlparser.lexer.nodes;

import org.htmlparser.lexer.Page;

/**
 * Extend org.htmlparser.AbstractNode temporarily to add the Page.
 * <em>This will be folded into org.htmlparser.AbstractNode eventually.</em>
 */
public abstract class AbstractNode extends org.htmlparser.AbstractNode
{
    /**
     * The page this node came from.
     */
    protected Page mPage;

    /**
     * Create a lexeme.
     * Remember the page and start & end cursor positions.
     * @param page The page this tag was read from.
     * @param start The starting offset of this node within the page.
     * @param end The ending offset of this node within the page.
     */
    public AbstractNode (Page page, int start, int end)
    {
        super (start, end);
        mPage = page;
    }

    /**
     * Get the page this node came from.
     * @return The page that supplied this node.
     */
    public Page getPage ()
    {
        return (mPage);
    }
}
--- NEW FILE: Attribute.java ---
// HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//
// For any questions or suggestions, you can write to me at :
// Email :so...@in...
//
// Postal Address :
// Somik Raha
// Extreme Programmer & Coach
// Industrial Logic Corporation
// 2583 Cedar Street, Berkeley,
// CA 94708, USA
// Website : http://www.industriallogic.com
//
// This class was contributed by
// Derrick Oswald
//

package org.htmlparser.lexer.nodes;

/**
 * An attribute within a tag.
 * <p>If Name is null, it's whitespace and Value has the text.
 * <p>If Name is not null, and Value is null it's a standalone attribute.
 * <p>If Name is not null, and Value is "", and Quote is zero it's an empty attribute.
 * <p>If Name is not null, and Value is "", and Quote is ' it's an empty single quoted attribute.
 * <p>If Name is not null, and Value is "", and Quote is " it's an empty double quoted attribute.
 * <p>If Name is not null, and Value is something, and Quote is zero it's a naked attribute.
 * <p>If Name is not null, and Value is something, and Quote is ' it's a single quoted attribute.
 * <p>If Name is not null, and Value is something, and Quote is " it's a double quoted attribute.
 */
public class Attribute
{
    /**
     * The name of this attribute.
     * The part before the equals sign, or the stand-alone attribute.
     */
    String mName;

    /**
     * The value of the attribute.
     * The part after the equals sign.
     */
    String mValue;

    /**
     * The quote, if any, surrounding the value of the attribute.
     */
    char mQuote;

    /**
     * Create an attribute with the name, value and quote character given.
     * @param name The name of this attribute, or null if it's just whitespace.
     * @param value The value of the attribute or null if it's a stand-alone.
     * @param quote The quote, if any, surrounding the value of the attribute,
     * (i.e. ' or "), or zero if none.
     */
    public Attribute (String name, String value, char quote)
    {
        mName = name;
        mValue = value;
        mQuote = quote;
    }

    /**
     * Get the name of this attribute.
     * The part before the equals sign, or the stand-alone attribute.
     * @return The name, or <code>null</code> if it's just a whitespace 'attribute'.
     */
    public String getName ()
    {
        return (mName);
    }

    /**
     * Get the value of the attribute.
     * The part after the equals sign, or the text if it's just a whitespace 'attribute'.
     * @return The value, or <code>null</code> if it's a stand-alone attribute,
     * or the text if it's just a whitespace 'attribute'.
     */
    public String getValue ()
    {
        return (mValue);
    }

    /**
     * Get the quote, if any, surrounding the value of the attribute.
     * @return Either ' or " if the attribute value was quoted, or zero
     * if there are no quotes around it.
     */
    public char getQuote ()
    {
        return (mQuote);
    }

    /**
     * Get a text representation of this attribute.
* Suitable for insertion into a start tag, the output is one of * the forms: * <code> * <pre> * value * name * name= value * name= 'value' * name= "value" * </pre> * </code> * @param buffer The accumulator for placing the text into. */ public void toString (StringBuffer buffer) { String value; String name; value = getValue (); name = getName (); if (null == name) { if (value != null) buffer.append (value); } else { buffer.append (name); if (null != value) { buffer.append ("="); if (0 != getQuote ()) buffer.append (getQuote ()); buffer.append (value); if (0 != getQuote ()) buffer.append (getQuote ()); } } } /** * Get a text representation of this attribute. * @return A string that can be used within a start tag. * @see #toString(StringBuffer) */ public String toString () { String value; String name; int length; StringBuffer ret; // calculate the size we'll need to avoid extra StringBuffer allocations length = 0; value = getValue (); name = getName (); if (null == getName ()) { if (value != null) length += value.length (); } else { length += name.length (); if (null != value) { length += 1; length += value.length (); if (0 != getQuote ()) length += 2; } } ret = new StringBuffer (length); toString (ret); return (ret.toString ()); } } --- NEW FILE: RemarkNode.java --- // HTMLParser Library v1_4_20030810 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. 
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//
// For any questions or suggestions, you can write to me at :
// Email :so...@in...
//
// Postal Address :
// Somik Raha
// Extreme Programmer & Coach
// Industrial Logic Corporation
// 2583 Cedar Street, Berkeley,
// CA 94708, USA
// Website : http://www.industriallogic.com

package org.htmlparser.lexer.nodes;

import org.htmlparser.lexer.Cursor;
import org.htmlparser.lexer.Page;
import org.htmlparser.util.NodeList;
import org.htmlparser.visitors.NodeVisitor;

/**
 * The remark tag is identified and represented by this class.
 */
public class RemarkNode extends AbstractNode
{
    public final static String REMARK_NODE_FILTER="-r";

    /**
     * Constructor takes in the text string, beginning and ending positions.
     * @param page The page this string is on.
     * @param start The beginning position of the string.
     * @param end The ending position of the string.
     */
    public RemarkNode (Page page, int start, int end)
    {
        super (page, start, end);
    }

    /**
     * Returns the text contents of the comment tag.
     * todo: this only works for the usual case.
     */
    public String getText()
    {
        return (mPage.getText (elementBegin () + 4, elementEnd () - 3));
    }

    public String toPlainTextString()
    {
        return (getText());
    }

    public String toHtml()
    {
        return (mPage.getText (elementBegin (), elementEnd ()));
    }

    /**
     * Print the contents of the remark tag.
*/ public String toString() { Cursor start; Cursor end; start = new Cursor (getPage (), elementBegin ()); end = new Cursor (getPage (), elementEnd ()); return ("Rem (" + start.toString () + "," + end.toString () + "): " + getText ()); } public void collectInto(NodeList collectionList, String filter) { if (filter==REMARK_NODE_FILTER) collectionList.add(this); } public void accept(NodeVisitor visitor) { // todo: fix this // visitor.visitRemarkNode(this); } } --- NEW FILE: StringNode.java --- // HTMLParser Library v1_4_20030810 - A java-based parser for HTML // Copyright (C) Dec 31, 2000 Somik Raha // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA // // For any questions or suggestions, you can write to me at : // Email :so...@in... // // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com package org.htmlparser.lexer.nodes; import org.htmlparser.lexer.Cursor; import org.htmlparser.lexer.Page; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; import org.htmlparser.visitors.NodeVisitor; /** * Normal text in the HTML document is represented by this class. 
 */
public class StringNode extends AbstractNode
{
    public static final String STRING_FILTER = "-string";

    /**
     * Constructor takes in the text string, beginning and ending positions.
     * @param page The page this string is on.
     * @param start The beginning position of the string.
     * @param end The ending position of the string.
     */
    public StringNode (Page page, int start, int end)
    {
        super (page, start, end);
    }

    /**
     * Returns the text of the string line
     */
    public String getText ()
    {
        return (toHtml ());
    }

    /**
     * Sets the string contents of the node.
     * @param text The new text for the node.
     */
    public void setText (String text)
    {
        try
        {
            mPage = new Page (text);
            nodeBegin = 0;
            nodeEnd = text.length ();
        }
        catch (ParserException pe)
        {
        }
    }

    public String toPlainTextString ()
    {
        return (toHtml ());
    }

    public String toHtml ()
    {
        return (mPage.getText (elementBegin (), elementEnd ()));
    }

    public String toString ()
    {
        Cursor start;
        Cursor end;

        start = new Cursor (getPage (), elementBegin ());
        end = new Cursor (getPage (), elementEnd ());
        return ("Txt (" + start.toString () + "," + end.toString () + "): " + getText ());
    }

    public void collectInto (NodeList collectionList, String filter)
    {
        if (STRING_FILTER == filter)
            collectionList.add (this);
    }

    public void accept (NodeVisitor visitor)
    {
        // todo: fix this
        // visitor.visitStringNode (this);
    }
}
--- NEW FILE: TagNode.java ---
// HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
// // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA // // For any questions or suggestions, you can write to me at : // Email :so...@in... // // Postal Address : // Somik Raha // Extreme Programmer & Coach // Industrial Logic Corporation // 2583 Cedar Street, Berkeley, // CA 94708, USA // Website : http://www.industriallogic.com package org.htmlparser.lexer.nodes; import java.util.Enumeration; import java.util.HashSet; import java.util.Hashtable; import java.util.Map; import java.util.Vector; import org.htmlparser.lexer.Cursor; import org.htmlparser.lexer.Page; import org.htmlparser.parserHelper.SpecialHashtable; import org.htmlparser.parserHelper.TagParser; import org.htmlparser.scanners.TagScanner; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; import org.htmlparser.visitors.NodeVisitor; /** * Tag represents a generic tag. This class allows users to register specific * tag scanners, which can identify links, or image references. This tag asks the * scanners to run over the text, and identify. It can be used to dynamically * configure a parser. * @author Kaarle Kaila 23.10.2001 */ public class TagNode extends AbstractNode { public static final String TYPE = "TAG"; /** * Constant used as value for the value of the tag name * in parseParameters (Kaarle Kaila 3.8.2001) */ public final static String TAGNAME = "$<TAGNAME>$"; public final static String EMPTYTAG = "$<EMPTYTAG>$"; public final static String NULLVALUE = "$<NULL>$"; public final static String NOTHING = "$<NOTHING>$"; private final static String EMPTY_STRING=""; private static TagParser tagParser; private boolean emptyXmlTag = false; /** * The tag attributes. * Objects of type Attribute. 
*/ protected Vector mAttributes; /** * Scanner associated with this tag (useful for extraction of filtering data from a * HTML node) */ protected TagScanner thisScanner = null; /** * Set of tags that breaks the flow. */ protected static HashSet breakTags; static { breakTags = new HashSet (30); breakTags.add ("BLOCKQUOTE"); breakTags.add ("BODY"); breakTags.add ("BR"); breakTags.add ("CENTER"); breakTags.add ("DD"); breakTags.add ("DIR"); breakTags.add ("DIV"); breakTags.add ("DL"); breakTags.add ("DT"); breakTags.add ("FORM"); breakTags.add ("H1"); breakTags.add ("H2"); breakTags.add ("H3"); breakTags.add ("H4"); breakTags.add ("H5"); breakTags.add ("H6"); breakTags.add ("HEAD"); breakTags.add ("HR"); breakTags.add ("HTML"); breakTags.add ("ISINDEX"); breakTags.add ("LI"); breakTags.add ("MENU"); breakTags.add ("NOFRAMES"); breakTags.add ("OL"); breakTags.add ("P"); breakTags.add ("PRE"); breakTags.add ("TD"); breakTags.add ("TH"); breakTags.add ("TITLE"); breakTags.add ("UL"); } /** * Create a tag with the location and attributes provided * @param page The page this tag was read from. * @param start The starting offset of this node within the page. * @param end The ending offset of this node within the page. * @param attributes The list of attributes that were parsed in this tag. 
     * @see Attribute
     */
    public TagNode (Page page, int start, int end, Vector attributes)
    {
        super (page, start, end);
        mAttributes = attributes;
    }

    /**
     * Locate the tag within the input string, by parsing from the given position
     * @param reader HTML reader to be provided so as to allow reading of next line
     * @param input Input String
     * @param position Position to start parsing from
     */
//    public static Tag find(NodeReader reader,String input,int position) {
//        return tagParser.find(reader,input,position);
//    }

    /**
     * In case the tag is parsed at the scan method this will return value of a
     * parameter not implemented yet
     * @param name of parameter
     */
    public String getAttribute (String name)
    {
        return ((String)getAttributes().get(name.toUpperCase()));
    }

    /**
     * Set attribute with given key, value pair.
     * @param key
     * @param value
     */
    public void setAttribute(String key, String value)
    {
        getAttributes ().put(key,value);
    }

    /**
     * In case the tag is parsed at the scan method this will return value of a
     * parameter not implemented yet
     * @param name of parameter
     * @deprecated use getAttribute instead
     */
    public String getParameter(String name)
    {
        return (String)getAttributes().get (name.toUpperCase());
    }

    /**
     * Gets the attributes in the tag.
     * NOTE: Values of the extended hashtable are two element arrays of String,
     * with the first element being the original name (not uppercased),
     * and the second element being the value.
     * @return Returns a special hashtable of attributes in two element String arrays.
     */
    public Vector getAttributesEx()
    {
        return mAttributes;
    }

    /**
     * Gets the attributes in the tag.
* @return Returns a Hashtable of attributes */ public Hashtable getAttributes() { Vector attributes; Attribute attribute; String value; Hashtable ret; ret = new SpecialHashtable (); attributes = getAttributesEx (); if (0 < attributes.size ()) { // special handling for the node name attribute = (Attribute)attributes.elementAt (0); ret.put (org.htmlparser.tags.Tag.TAGNAME, attribute.getName ().toUpperCase ()); // the rest for (int i = 1; i < attributes.size (); i++) { attribute = (Attribute)attributes.elementAt (i); if (null != attribute.getName ()) { value = attribute.getValue (); if ('\'' == attribute.getQuote ()) value = "'" + value + "'"; else if ('"' == attribute.getQuote ()) value = "\"" + value + "\""; else if ((null != value) && value.equals ("")) value = NOTHING; if (null == value) value = NULLVALUE; ret.put (attribute.getName (), value); } } } else ret.put (org.htmlparser.tags.Tag.TAGNAME, ""); return (ret); } public String getTagName(){ return getParameter(TAGNAME); } /** * Return the text contained in this tag */ public String getText() { return (mPage.getText (elementBegin () + 1, elementEnd () - 1)); } /** * Return the scanner associated with this tag. */ public TagScanner getThisScanner() { return thisScanner; } /** * Extract the first word from the given string. * Words are delimited by whitespace or equals signs. * @param s The string to get the word from. * @return The first word. */ // public static String extractWord (String s) // { // int length; // boolean parse; // char ch; // StringBuffer ret; // // length = s.length (); // ret = new StringBuffer (length); // parse = true; // for (int i = 0; i < length && parse; i++) // { // ch = s.charAt (i); // if (Character.isWhitespace (ch) || ch == '=') // parse = false; // else // ret.append (Character.toUpperCase (ch)); // } // // return (ret.toString ()); // } /** * Scan the tag to see using the registered scanners, and attempt identification. 
* @param url URL at which HTML page is located * @param reader The NodeReader that is to be used for reading the url */ // public AbstractNode scan(Map scanners,String url,NodeReader reader) throws ParserException // { // if (tagContents.length()==0) return this; // try { // boolean found=false; // AbstractNode retVal=null; // // Find the first word in the scanners // String firstWord = extractWord(tagContents.toString()); // // Now, get the scanner associated with this. // TagScanner scanner = (TagScanner)scanners.get(firstWord); // // // Now do a deep check // if (scanner != null && // scanner.evaluate( // tagContents.toString(), // reader.getPreviousOpenScanner() // ) // ) // { // found=true; // TagScanner save; // save = reader.getPreviousOpenScanner (); // reader.setPreviousOpenScanner(scanner); // retVal=scanner.createScannedNode(this,url,reader,tagLine); // reader.setPreviousOpenScanner(save); // } // // if (!found) return this; // else { // return retVal; // } // } // catch (Exception e) { // String errorMsg; // if (tagContents!=null) errorMsg = tagContents.toString(); else errorMsg="null"; // throw new ParserException("Tag.scan() : Error while scanning tag, tag contents = "+errorMsg+", tagLine = "+tagLine,e); // } // } /** * Sets the attributes. * @param attributes The attribute collection to set. 
*/ public void setAttributes (Hashtable attributes) { Vector att; String key; String value; char quote; Attribute attribute; att = new Vector (); for (Enumeration e = attributes.keys (); e.hasMoreElements (); ) { key = (String)e.nextElement (); value = (String)attributes.get (key); if (value.startsWith ("'") && value.endsWith ("'") && (2 <= value.length ())) { quote = '\''; value = value.substring (1, value.length () - 1); } else if (value.startsWith ("\"") && value.endsWith ("\"") && (2 <= value.length ())) { quote = '"'; value = value.substring (1, value.length () - 1); } else quote = (char)0; attribute = new Attribute (key, value, quote); att.addElement (attribute); } this.mAttributes = att; } /** * Sets the attributes. * NOTE: Values of the extended hashtable are two element arrays of String, * with the first element being the original name (not uppercased), * and the second element being the value. * @param attribs The attribute collection to set. */ public void setAttributesEx (Vector attribs) { mAttributes = attribs; } /** * Sets the nodeBegin. * @param tagBegin The nodeBegin to set */ public void setTagBegin(int tagBegin) { this.nodeBegin = tagBegin; } /** * Gets the nodeBegin. * @return The nodeBegin value. */ public int getTagBegin() { return (nodeBegin); } /** * Sets the nodeEnd. * @param tagEnd The nodeEnd to set */ public void setTagEnd(int tagEnd) { this.nodeEnd = tagEnd; } /** * Gets the nodeEnd. * @return The nodeEnd value. */ public int getTagEnd() { return (nodeEnd); } public void setText (String text) { try { mPage = new Page (text); nodeBegin = 0; nodeEnd = text.length (); } catch (ParserException pe) { } } public void setThisScanner(TagScanner scanner) { thisScanner = scanner; } public String toPlainTextString() { return EMPTY_STRING; } /** * A call to a tag's toHTML() method will render it in HTML * Most tags that do not have children and inherit from Tag, * do not need to override toHTML(). 
* @see org.htmlparser.Node#toHtml() */ public String toHtml() { StringBuffer ret; Vector attributes; Attribute attribute; String value; ret = new StringBuffer (); attributes = getAttributesEx (); ret.append ("<"); if (0 < attributes.size ()) { // special handling for the node name attribute = (Attribute)attributes.elementAt (0); ret.append (attribute.getName ()); // the rest for (int i = 1; i < attributes.size (); i++) { attribute = (Attribute)attributes.elementAt (i); attribute.toString (ret); } } if (isEmptyXmlTag ()) ret.append ("/"); ret.append (">"); return (ret.toString ()); } /** * Print the contents of the tag */ public String toString() { String tag; Cursor start; Cursor end; tag = getTagName (); if (tag.startsWith ("/")) tag = "End"; else tag = "Tag"; start = new Cursor (getPage (), elementBegin ()); end = new Cursor (getPage (), elementEnd ()); return (tag + " (" + start.toString () + "," + end.toString () + "): " + getText ()); } /** * Sets the tagParser. * @param tagParser The tagParser to set */ public static void setTagParser(TagParser tagParser) { //todo: fix this Tag.tagParser = tagParser; } /** * Determines if the given tag breaks the flow of text. * @return <code>true</code> if following text would start on a new line, * <code>false</code> otherwise. */ public boolean breaksFlow () { return (breakTags.contains (getText ().toUpperCase ())); } /** * This method verifies that the current tag matches the provided * filter. The match is based on the string object and not its contents, * so ensure that you are using static final filter strings provided * in the tag classes. * @see org.htmlparser.Node#collectInto(NodeList, String) */ public void collectInto(NodeList collectionList, String filter) { if (thisScanner!=null && thisScanner.getFilter()==filter) collectionList.add(this); } /** * Returns table of attributes in the tag * @return Hashtable * @deprecated This method is deprecated. Use getAttributes() instead. 
     */
    public Hashtable getParsed()
    {
        return getAttributes ();
    }

    /**
     * Sometimes, a scanner may need to request a re-evaluation of the
     * attributes in a tag. This may happen when there is some correction
     * activity. An example of its usage can be found in ImageTag.
     * <br>
     * <B>Note:</B> This is an intensive task, hence call only when
     * really necessary.
     * @return Hashtable
     */
    public Hashtable redoParseAttributes()
    {
        mAttributes = null;
        getAttributesEx ();
        return (getAttributes ());
    }

    public void accept(NodeVisitor visitor)
    {
        // todo: fix this
        visitor.visitTag(this);
    }

    public String getType()
    {
        return TYPE;
    }

    /**
     * Is this an empty xml tag of the form<br>
     * <tag/>
     * @return boolean
     */
    public boolean isEmptyXmlTag()
    {
        return emptyXmlTag;
    }

    public void setEmptyXmlTag(boolean emptyXmlTag)
    {
        this.emptyXmlTag = emptyXmlTag;
    }
}
--- NEW FILE: package.html ---
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<!--

@(#)package.html 1.60 98/01/27

HTMLParser Library v1_4_20030810 - A java-based parser for HTML
Copyright (C) Dec 31, 2000 Somik Raha

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

For any questions or suggestions, you can write to me at :
Email :so...@in...
Postal Address :
Somik Raha
Extreme Programmer & Coach
Industrial Logic Corporation
2583 Cedar Street, Berkeley,
CA 94708, USA
Website : http://www.industriallogic.com

-->
<TITLE>Nodes Package</TITLE>
</HEAD>
<BODY>
The nodes package will eventually hold the lexemes returned by the base-level I/O
subsystem. <EM>It is currently under development.</EM>
There are three types of lexemes so far: <code>RemarkNode</code>,
<code>StringNode</code> and <code>TagNode</code>.
Each <code>TagNode</code> object holds a list of <code>Attribute</code> objects.<p>
The <code>Lexer</code> parses the HTML stream into a contiguous stream of these
tokens. They all implement the <code>Node</code> interface and are derived from
the <code>AbstractNode</code> class.
</BODY>
</HTML>
|
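The lexer's Attribute class documents eight cases, and Attribute.toString (StringBuffer) collapses them into five output shapes:

- raw whitespace text
- a stand-alone name
- a naked value
- a single-quoted value
- a double-quoted value

Below is a minimal sketch of that decision table as a static helper. AttributeText is a hypothetical name and is not part of the library. The sketch can help when working out the expected output for lexer tests.

```java
// Hypothetical helper reproducing the forms produced by Attribute.toString (StringBuffer).
final class AttributeText
{
    private AttributeText ()
    {
    }

    static void render (StringBuffer buffer, String name, String value, char quote)
    {
        if (null == name)
        {
            // whitespace "attribute": the value holds the raw text
            if (null != value)
                buffer.append (value);
        }
        else
        {
            buffer.append (name);
            if (null != value)   // a null value means stand-alone, e.g. NOWRAP
            {
                buffer.append ('=');
                if (0 != quote)
                    buffer.append (quote);
                buffer.append (value);
                if (0 != quote)
                    buffer.append (quote);
            }
        }
    }

    static String render (String name, String value, char quote)
    {
        StringBuffer buffer = new StringBuffer ();
        render (buffer, name, value, quote);
        return (buffer.toString ());
    }
}
```

TagNode.toHtml () uses the same approach. It appends "<", the name of the first attribute (the tag name), and each remaining attribute's text into a single buffer, then closes with ">".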
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests
In directory sc8-pr-cvs1:/tmp/cvs-serv9123/tests/lexerTests
Modified Files:
AllTests.java PageIndexTests.java PageTests.java
SourceTests.java
Added Files:
KitTest.java LexerTests.java
Log Message:
Third drop for new i/o subsystem.
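The KitTest callback in this commit cross-checks the new Lexer against Swing's HTMLEditorKit parser. Each handler looks at most 25 nodes ahead of its current position for a case-insensitive match. On a miss it prints both sides and advances one node; on a hit it jumps to just past the match. Below is a minimal sketch of that resynchronizing comparison over plain strings. WindowMatcher, and the use of strings instead of lexer Node objects, are assumptions for illustration. Unlike the original, the sketch guards the index on a miss, so it cannot step past the end of the list.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the look-ahead matching used by KitTest's handleXxx callbacks.
final class WindowMatcher
{
    static final int WINDOW = 25;        // look-ahead distance used by KitTest
    private final List<String> mOurs;    // stand-in for the lexer's nodes
    private final List<String> mMismatches = new ArrayList<String> ();
    private int mIndex;                  // next unmatched position in mOurs

    WindowMatcher (List<String> ours)
    {
        mOurs = ours;
        mIndex = 0;
    }

    /** Try to match one token reported by the other parser. */
    void offer (String theirs)
    {
        int match = -1;

        for (int i = mIndex; i < Math.min (mIndex + WINDOW, mOurs.size ()); i++)
            if (theirs.equalsIgnoreCase (mOurs.get (i)))
            {
                match = i;
                break;
            }
        if (-1 == match)
        {
            // no match in the window: record it and step past one of ours
            mMismatches.add (theirs);
            if (mIndex < mOurs.size ())
                mIndex++;
        }
        else
            mIndex = match + 1;   // resynchronize just after the match
    }

    int index ()
    {
        return (mIndex);
    }

    List<String> mismatches ()
    {
        return (mMismatches);
    }
}
```

Skipping ahead on a hit lets the two parsers drift apart by a few tokens (for example, when one of them reports an implied tag) without every later comparison failing.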
--- NEW FILE: KitTest.java ---
/*
* KitTest.java
*
* Created on August 16, 2003, 2:16 PM
*/
package org.htmlparser.tests.lexerTests;
import java.io.IOException;
import java.io.Reader;
import java.net.URL;
import java.util.Vector;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.EditorKit;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.HTMLEditorKit.Parser;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import org.htmlparser.Node;
import org.htmlparser.lexer.Lexer;
import org.htmlparser.lexer.Page;
import org.htmlparser.lexer.nodes.Attribute;
import org.htmlparser.lexer.nodes.TagNode;
import org.htmlparser.util.ParserException;
/**
*
* @author derrick
*/
public class KitTest extends ParserCallback
{
Vector mNodes;
int mIndex;
/** Creates a new instance of KitTest */
public KitTest (Vector nodes)
{
mNodes = nodes;
mIndex = 0;
}
public void handleText (char[] data, int pos)
{
StringBuffer sb;
String theirs;
Node node;
int match;
String ours;
sb = new StringBuffer (data.length);
for (int i = 0; i < data.length; i++)
{
if (160 == (int)data[i])
sb.append (" ");
else
sb.append (data[i]);
}
theirs = sb.toString ();
match = -1;
for (int i = mIndex; i < Math.min (mIndex + 25, mNodes.size ()); i++)
{
node = (Node)mNodes.elementAt (i);
ours = node.getText ();
if (theirs.equalsIgnoreCase (ours))
{
match = i;
break;
}
}
if (-1 == match)
{
node = (Node)mNodes.elementAt (mIndex);
ours = node.getText ();
System.out.println ("theirs: " + theirs);
System.out.println (" ours: " + ours);
mIndex++;
}
else
{
// System.out.println (" match: " + theirs);
mIndex = match + 1;
}
}
public void handleComment (char[] data, int pos)
{
StringBuffer sb;
String theirs;
Node node;
int match;
String ours;
sb = new StringBuffer (data.length);
sb.append (data);
theirs = sb.toString ();
match = -1;
for (int i = mIndex; i < Math.min (mIndex + 25, mNodes.size ()); i++)
{
node = (Node)mNodes.elementAt (i);
ours = node.getText ();
if (theirs.equalsIgnoreCase (ours))
{
match = i;
break;
}
}
if (-1 == match)
{
node = (Node)mNodes.elementAt (mIndex);
ours = node.getText ();
System.out.println ("theirs: " + theirs);
System.out.println (" ours: " + ours);
mIndex++;
}
else
{
// System.out.println (" match: " + theirs);
mIndex = match + 1;
}
}
public void handleStartTag (HTML.Tag t, MutableAttributeSet a, int pos)
{
StringBuffer sb;
String theirs;
Node node;
int match;
String ours;
theirs = t.toString ();
match = -1;
for (int i = mIndex; i < Math.min (mIndex + 25, mNodes.size ()); i++)
{
node = (Node)mNodes.elementAt (i);
if (node instanceof TagNode)
{
ours = ((Attribute)(((TagNode)node).getAttributesEx ().elementAt (0))).getName ();
if (theirs.equalsIgnoreCase (ours))
{
match = i;
break;
}
}
}
if (-1 == match)
{
node = (Node)mNodes.elementAt (mIndex);
ours = node.getText ();
System.out.println ("theirs: " + theirs);
System.out.println (" ours: " + ours);
mIndex++;
}
else
{
// System.out.println (" match: " + theirs);
mIndex = match + 1;
}
}
public void handleEndTag (HTML.Tag t, int pos)
{
StringBuffer sb;
String theirs;
Node node;
int match;
String ours;
theirs = t.toString ();
match = -1;
for (int i = mIndex; i < Math.min (mIndex + 25, mNodes.size ()); i++)
{
node = (Node)mNodes.elementAt (i);
if (node instanceof TagNode)
{
ours = ((Attribute)(((TagNode)node).getAttributesEx ().elementAt (0))).getName ().substring (1);
if (theirs.equalsIgnoreCase (ours))
{
match = i;
break;
}
}
}
if (-1 == match)
{
node = (Node)mNodes.elementAt (mIndex);
ours = node.getText ();
System.out.println ("theirs: " + theirs);
System.out.println (" ours: " + ours);
mIndex++;
}
else
{
// System.out.println (" match: " + theirs);
mIndex = match + 1;
}
}
public void handleSimpleTag (HTML.Tag t, MutableAttributeSet a, int pos)
{
StringBuffer sb;
String theirs;
Node node;
int match;
String ours;
theirs = t.toString ();
match = -1;
for (int i = mIndex; i < Math.min (mIndex + 25, mNodes.size ()); i++)
{
node = (Node)mNodes.elementAt (i);
if (node instanceof TagNode)
{
ours = ((Attribute)(((TagNode)node).getAttributesEx ().elementAt (0))).getName ();
if (theirs.equalsIgnoreCase (ours))
{
match = i;
break;
}
else if (theirs.equalsIgnoreCase (ours.substring (1)))
{
match = i;
break;
}
}
}
if (-1 == match)
{
node = (Node)mNodes.elementAt (mIndex);
ours = node.getText ();
System.out.println ("theirs: " + theirs);
System.out.println (" ours: " + ours);
mIndex++;
}
else
{
// System.out.println (" match: " + theirs);
mIndex = match + 1;
}
}
public void handleError (String errorMsg, int pos)
{
// System.out.println ("******* error @" + pos + " ******** " + errorMsg);
}
public void flush () throws BadLocationException
{
}
/**
* This is invoked after the stream has been parsed, but before
* <code>flush</code>. <code>eol</code> will be one of \n, \r
* or \r\n, whichever is encountered the most in parsing the
* stream.
*
* @since 1.3
*/
public void handleEndOfLineString (String eol)
{
}
// /**
// * Get the document data from the URL.
// * @param rd The reader to read bytes from.
// * @return The parsed HTML document.
// */
// protected static Element[] getData (Reader rd) throws IOException
// {
// EditorKit kit;
// Document doc;
// Element[] ret;
//
// ret = null;
//
// // need this because HTMLEditorKit is not thread safe apparently
// synchronized (Boolean.TRUE)
// {
// kit = new HTMLEditorKit ();
// doc = kit.createDefaultDocument ();
// // the Document class does not yet handle charsets properly
// doc.putProperty ("IgnoreCharsetDirective", Boolean.TRUE);
//
// try
// {
// // parse the HTML
// kit.read (rd, doc, 0);
// }
// catch (BadLocationException ble)
// {
// throw new IOException ("parse error " + ble.getMessage ());
// }
//
// ret = doc.getRootElements ();
// }
//
// return (ret);
// }
// public static void scanElements (Element element) throws BadLocationException
// {
// int start;
// int end;
// String string;
// ElementIterator it;
// Element child;
//
// if (element.isLeaf ())
// {
// start = element.getStartOffset ();
// end = element.getEndOffset ();
// string = element.getDocument ().getText (start, end - start);
// System.out.println (string);
// }
// else
// // iterate through the elements of the element
// for (int i = 0; i < element.getElementCount (); i++)
// {
// child = element.getElement (i);
// scanElements (child);
// }
// }
class MyKit extends HTMLEditorKit
{
public MyKit ()
{
}
public HTMLEditorKit.Parser getParser ()
{
return (super.getParser ());
}
}
public MyKit getKit ()
{
return (new MyKit ());
}
/**
* @param args the command line arguments
*/
public static void main (String[] args) throws ParserException, IOException
{
Lexer lexer;
Node node;
Vector nodes;
KitTest test;
MyKit kit;
Parser parser;
Element[] elements;
// pass through it once to read the entire page
URL url = new URL ("http://sourceforge.net/projects/htmlparser");
lexer = new Lexer (url.openConnection ());
nodes = new Vector ();
while (null != (node = lexer.nextNode ()))
nodes.addElement (node);
// reset the reader
lexer.getPage ().getSource ().reset ();
test = new KitTest (nodes);
kit = test.getKit ();
parser = kit.getParser ();
parser.parse ((Reader)lexer.getPage ().getSource (), (ParserCallback)test, true);
}
}
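Each handle* callback in KitTest above uses the same resynchronization step. It searches the lexer's nodes from mIndex up to 25 positions ahead for a case-insensitive match with what HTMLEditorKit reported. On a match it jumps just past it; otherwise it prints both sides and advances by one. The following is a minimal sketch of that step in isolation. It assumes both sides have already been reduced to strings, and WindowedMatcher and its methods are hypothetical names, not part of htmlparser. Unlike the committed KitTest code, which would call elementAt past the end of mNodes once mIndex reaches the end, the sketch guards against that case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: KitTest inlines this logic against mNodes/mIndex.
class WindowedMatcher
{
    private final List<String> ours;   // tokens produced by our lexer
    private final int window;          // how far ahead to look for a match
    private final List<String> mismatches = new ArrayList<String>();
    private int index;                 // next unmatched position in ours

    WindowedMatcher(List<String> ours, int window)
    {
        this.ours = ours;
        this.window = window;
        this.index = 0;
    }

    /**
     * Try to match one token reported by the other parser.
     * @return true if it was found within the look-ahead window.
     */
    boolean accept(String theirs)
    {
        int limit = Math.min(index + window, ours.size());
        for (int i = index; i < limit; i++)
        {
            if (theirs.equalsIgnoreCase(ours.get(i)))
            {
                index = i + 1; // resynchronize just past the match
                return true;
            }
        }
        // No match: record it (if we still have a token to show) and
        // step over one of our tokens, as KitTest does.
        if (index < ours.size())
            mismatches.add("theirs: " + theirs + " ours: " + ours.get(index));
        index++;
        return false;
    }

    int getIndex() { return index; }

    List<String> getMismatches() { return mismatches; }
}
```

The window bounds the cost of a mismatch: a node the other parser never reports is skipped over, but a bad match can never pull the index more than 25 nodes ahead.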
--- NEW FILE: LexerTests.java ---
// HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//
// For any questions or suggestions, you can write to me at :
// Email :so...@in...
//
// Postal Address :
// Somik Raha
// Extreme Programmer & Coach
// Industrial Logic Corporation
// 2583 Cedar Street, Berkeley,
// CA 94708, USA
// Website : http://www.industriallogic.com
package org.htmlparser.tests.lexerTests;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import junit.framework.TestCase;
import org.htmlparser.Node;
import org.htmlparser.lexer.Lexer;
import org.htmlparser.lexer.Page;
import org.htmlparser.lexer.PageIndex;
import org.htmlparser.lexer.Stream;
import org.htmlparser.lexer.nodes.RemarkNode;
import org.htmlparser.lexer.nodes.StringNode;
import org.htmlparser.lexer.nodes.TagNode;
import org.htmlparser.util.ParserException;
public class LexerTests extends TestCase
{
/**
* Test the Lexer class.
*/
public LexerTests (String name)
{
super (name);
}
/**
* Test operation without tags.
*/
public void testPureText () throws ParserException
{
String reference;
Lexer lexer;
StringNode node;
reference = "Hello world";
lexer = new Lexer (reference);
node = (StringNode)lexer.nextNode ();
assertEquals ("StringNode contents wrong", reference, node.getText ());
}
/**
* Test operation with Unix line endings.
*/
public void testUnixEOL () throws ParserException
{
String reference;
Lexer lexer;
StringNode node;
reference = "Hello\nworld";
lexer = new Lexer (reference);
node = (StringNode)lexer.nextNode ();
assertEquals ("StringNode contents wrong", reference, node.getText ());
}
/**
* Test operation with DOS (\r\n) and classic Mac (\r) line endings.
*/
public void testDosEOL () throws ParserException
{
String reference;
Lexer lexer;
StringNode node;
reference = "Hello\r\nworld";
lexer = new Lexer (reference);
node = (StringNode)lexer.nextNode ();
assertEquals ("StringNode contents wrong", reference, node.getText ());
reference = "Hello\rworld";
lexer = new Lexer (reference);
node = (StringNode)lexer.nextNode ();
assertEquals ("StringNode contents wrong", reference, node.getText ());
}
/**
* Test operation with line endings near the end of input.
*/
public void testEOF_EOL () throws ParserException
{
String reference;
Lexer lexer;
StringNode node;
reference = "Hello world\n";
lexer = new Lexer (reference);
node = (StringNode)lexer.nextNode ();
assertEquals ("StringNode contents wrong", reference, node.getText ());
reference = "Hello world\r";
lexer = new Lexer (reference);
node = (StringNode)lexer.nextNode ();
assertEquals ("StringNode contents wrong", reference, node.getText ());
reference = "Hello world\r\n";
lexer = new Lexer (reference);
node = (StringNode)lexer.nextNode ();
assertEquals ("StringNode contents wrong", reference, node.getText ());
}
/**
* Test that tags stop string nodes.
*/
public void testTagStops () throws ParserException
{
String[] references =
{
"Hello world",
"Hello world\n",
"Hello world\r\n",
"Hello world\r",
};
String[] suffixes =
{
"<head>",
"</head>",
"<%=head%>",
"<!--head-->",
};
Lexer lexer;
StringNode node;
for (int i = 0; i < references.length; i++)
{
for (int j = 0; j < suffixes.length; j++)
{
lexer = new Lexer (references[i] + suffixes[j]);
node = (StringNode)lexer.nextNode ();
assertEquals ("StringNode contents wrong", references[i], node.getText ());
}
}
}
/**
* Test operation with only tags.
*/
public void testPureTag () throws ParserException
{
String reference;
String suffix;
Lexer lexer;
TagNode node;
reference = "<head>";
lexer = new Lexer (reference);
node = (TagNode)lexer.nextNode ();
assertEquals ("Tag contents wrong", reference, node.toHtml ());
reference = "<head>";
suffix = "<body>";
lexer = new Lexer (reference + suffix);
node = (TagNode)lexer.nextNode ();
assertEquals ("Tag contents wrong", reference, node.toHtml ());
node = (TagNode)lexer.nextNode ();
assertEquals ("Tag contents wrong", suffix, node.toHtml ());
}
/**
* Test operation with attributed tags.
*/
public void testAttributedTag () throws ParserException
{
String reference;
Lexer lexer;
TagNode node;
reference = "<head lang='en_US' dir=ltr\nprofile=\"http://htmlparser.sourceforge.org/dictionary.html\">";
lexer = new Lexer (reference);
node = (TagNode)lexer.nextNode ();
assertEquals ("Tag contents wrong", reference, node.toHtml ());
}
/**
* Test operation with comments.
*/
public void testRemarkNode () throws ParserException
{
String reference;
Lexer lexer;
RemarkNode node;
String suffix;
reference = "<!-- This is a comment -->";
lexer = new Lexer (reference);
node = (RemarkNode)lexer.nextNode ();
assertEquals ("Tag contents wrong", reference, node.toHtml ());
reference = "<!-- This is a comment -- >";
lexer = new Lexer (reference);
node = (RemarkNode)lexer.nextNode ();
assertEquals ("Tag contents wrong", reference, node.toHtml ());
reference = "<!-- This is a\nmultiline comment -->";
lexer = new Lexer (reference);
node = (RemarkNode)lexer.nextNode ();
assertEquals ("Tag contents wrong", reference, node.toHtml ());
suffix = "<head>";
reference = "<!-- This is a comment -->";
lexer = new Lexer (reference + suffix);
node = (RemarkNode)lexer.nextNode ();
assertEquals ("Tag contents wrong", reference, node.toHtml ());
reference = "<!-- This is a comment -- >";
lexer = new Lexer (reference + suffix);
node = (RemarkNode)lexer.nextNode ();
assertEquals ("Tag contents wrong", reference, node.toHtml ());
reference = "<!-- This is a\nmultiline comment -->";
lexer = new Lexer (reference + suffix);
node = (RemarkNode)lexer.nextNode ();
assertEquals ("Tag contents wrong", reference, node.toHtml ());
}
/**
* Try a real page.
*/
public void testReal () throws ParserException, IOException
{
Lexer lexer;
Node node;
URL url = new URL ("http://sourceforge.net/projects/htmlparser");
lexer = new Lexer (url.openConnection ());
while (null != (node = lexer.nextNode ()))
System.out.println (node.toString ());
}
}
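testRemarkNode expects a remark to end at "-->" and also at "-- >", where whitespace separates the closing "--" from ">". It also expects the remark to stop before a following <head>. Lexer.parseRemark, committed in the same 17 Aug 2003 batch, recognizes this with a five-state machine. The following is a standalone sketch of that state machine over a String. It assumes the input starts at "<!", and RemarkScanner and remarkEnd are hypothetical names. One difference is deliberate: the sketch returns -1 when input runs out before the terminating ">", whereas parseRemark as committed does not test for the end-of-input character in states 2 to 4. The sketch keeps the committed state-4 rule that any non-whitespace character returns to state 2, so "--!>" is not treated as a terminator here.

```java
// Hypothetical standalone version of the remark state machine.
class RemarkScanner
{
    /**
     * @return the index just past the closing '>' of a comment starting at
     * offset 0, or -1 if text does not open with "<!--" or is unterminated.
     */
    static int remarkEnd(String text)
    {
        if (!text.startsWith("<!"))
            return -1;
        int state = 0;
        for (int i = 2; i < text.length(); i++)
        {
            char ch = text.charAt(i);
            switch (state)
            {
                case 0: // prior to the first open delimiter
                case 1: // prior to the second open delimiter
                    if ('-' != ch)
                        return -1; // not a comment after all
                    state++;
                    break;
                case 2: // inside the comment body
                    if ('-' == ch)
                        state = 3;
                    break;
                case 3: // prior to the second closing delimiter
                    state = ('-' == ch) ? 4 : 2;
                    break;
                case 4: // after "--", whitespace allowed before '>'
                    if ('>' == ch)
                        return i + 1;
                    if (!Character.isWhitespace(ch))
                        state = 2;
                    break;
                default:
                    throw new IllegalStateException("unexpected state " + state);
            }
        }
        return -1; // ran out of input before the terminating '>'
    }
}
```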
Index: AllTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/AllTests.java,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** AllTests.java 11 Aug 2003 00:18:31 -0000 1.4
--- AllTests.java 17 Aug 2003 16:09:27 -0000 1.5
***************
*** 46,49 ****
--- 46,50 ----
suite.addTestSuite (PageTests.class);
suite.addTestSuite (PageIndexTests.class);
+ suite.addTestSuite (LexerTests.class);
return suite;
}
Index: PageIndexTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/PageIndexTests.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** PageIndexTests.java 11 Aug 2003 00:18:31 -0000 1.2
--- PageIndexTests.java 17 Aug 2003 16:09:27 -0000 1.3
***************
*** 123,131 ****
// test for correct position
if (0 <= pos - 1)
! assertTrue ("search error less " + pos + " " + index.elementAt (pos - 1) + " " + n, index.elementAt (pos - 1) < n);
if (pos + 1 < index.size ())
assertTrue ("search error greater " + pos + " " + index.elementAt (pos + 1) + " " + n, index.elementAt (pos + 1) > n);
-
- assertTrue ("wrong position", pos == index.add (n));
}
--- 123,129 ----
// test for correct position
if (0 <= pos - 1)
! assertTrue ("search error less " + pos + " " + index.elementAt (pos - 1) + " " + n, index.elementAt (pos - 1) <= n);
if (pos + 1 < index.size ())
assertTrue ("search error greater " + pos + " " + index.elementAt (pos + 1) + " " + n, index.elementAt (pos + 1) > n);
}
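The revised assertion allows the element just before the reported position to equal n, while the element after it must still be strictly greater. In other words, duplicate values are now tolerated. An upper-bound binary search over a sorted array with duplicates yields a position with that property. The following is a generic sketch of such a search. It assumes a plain int array with a separate logical size, as in a growable index. It is not the PageIndex implementation, which this message does not show.

```java
// Generic upper-bound search; UpperBound is a hypothetical name.
class UpperBound
{
    /**
     * @return the first index in the sorted range a[0..size) whose value
     * is greater than n; every element before it is <= n.
     */
    static int upperBound(int[] a, int size, int n)
    {
        int lo = 0;
        int hi = size; // the answer lies in [lo, hi]
        while (lo < hi)
        {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            if (a[mid] <= n)
                lo = mid + 1; // a[mid] and everything before it are <= n
            else
                hi = mid;     // a[mid] > n, so the answer is at or before mid
        }
        return lo;
    }
}
```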
Index: PageTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/PageTests.java,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** PageTests.java 11 Aug 2003 00:18:31 -0000 1.4
--- PageTests.java 17 Aug 2003 16:09:27 -0000 1.5
***************
*** 66,70 ****
try
{
! page = new Page (null);
assertTrue ("null value in constructor", false);
}
--- 66,80 ----
try
{
! page = new Page ((URLConnection)null);
! assertTrue ("null value in constructor", false);
! }
! catch (IllegalArgumentException iae)
! {
! // expected outcome
! }
!
! try
! {
! page = new Page ((String)null);
assertTrue ("null value in constructor", false);
}
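PageTests now casts null to (URLConnection) and to (String) because Page has both a Page (URLConnection) and a Page (String) constructor. Neither parameter type is a subtype of the other, so with both present the compiler cannot choose a most specific overload for a bare null argument and rejects the call as ambiguous. A minimal illustration, using a hypothetical class in place of Page:

```java
import java.net.URLConnection;

// Hypothetical stand-in for the two single-argument Page constructors.
class Overloads
{
    static String describe(URLConnection connection)
    {
        return "URLConnection";
    }

    static String describe(String text)
    {
        return "String";
    }

    // A bare describe(null) would be rejected by javac as an ambiguous reference.
    static String viaConnection()
    {
        return describe((URLConnection) null); // the cast selects the overload
    }

    static String viaText()
    {
        return describe((String) null);
    }
}
```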
Index: SourceTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/SourceTests.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** SourceTests.java 11 Aug 2003 00:18:31 -0000 1.3
--- SourceTests.java 17 Aug 2003 16:09:27 -0000 1.4
***************
*** 106,110 ****
source = new Source (new Stream (new ByteArrayInputStream ("hello word".getBytes ())), null);
assertTrue ("no character", -1 != source.read ());
! source.close ();
try
{
--- 106,110 ----
source = new Source (new Stream (new ByteArrayInputStream ("hello word".getBytes ())), null);
assertTrue ("no character", -1 != source.read ());
! source.destroy ();
try
{
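The EOL tests in LexerTests (testUnixEOL, testDosEOL, testEOF_EOL) feed the lexer "\n", "\r\n" and bare "\r" line endings. Any index of line positions built over such input has to treat "\r\n" as one separator rather than two. The following is a minimal sketch that records the offset of the first character of each line under that rule. LineStarts is a hypothetical name, and this is not the PageIndex code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical line-start indexer handling \n, \r\n and bare \r.
class LineStarts
{
    static List<Integer> lineStarts(String text)
    {
        List<Integer> starts = new ArrayList<Integer>();
        starts.add(0); // the first line always starts at offset 0
        int i = 0;
        while (i < text.length())
        {
            char ch = text.charAt(i);
            if ('\n' == ch)
                starts.add(i + 1);
            else if ('\r' == ch)
            {
                // treat \r\n as a single separator
                if (i + 1 < text.length() && '\n' == text.charAt(i + 1))
                    i++;
                starts.add(i + 1);
            }
            i++;
        }
        return starts;
    }
}
```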
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1:/tmp/cvs-serv9123/lexer
Modified Files:
Cursor.java Page.java PageIndex.java Source.java package.html
Added Files:
Lexer.java
Log Message:
Third drop for new i/o subsystem.
--- NEW FILE: Lexer.java ---
// HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//
// For any questions or suggestions, you can write to me at :
// Email :so...@in...
//
// Postal Address :
// Somik Raha
// Extreme Programmer & Coach
// Industrial Logic Corporation
// 2583 Cedar Street, Berkeley,
// CA 94708, USA
// Website : http://www.industriallogic.com
package org.htmlparser.lexer;
import java.io.UnsupportedEncodingException;
import java.net.URLConnection;
import java.util.Vector;
import org.htmlparser.Node;
import org.htmlparser.lexer.Stream;
import org.htmlparser.lexer.nodes.Attribute;
import org.htmlparser.lexer.nodes.RemarkNode;
import org.htmlparser.lexer.nodes.StringNode;
import org.htmlparser.lexer.nodes.TagNode;
import org.htmlparser.util.ParserException;
/**
* This class parses the HTML stream into nodes.
* There are three major types of nodes (lexemes):
* <li>RemarkNode</li>
* <li>StringNode</li>
* <li>TagNode</li>
* Each time <code>nextNode()</code> is called, another node is returned until
* the stream is exhausted, and <code>null</code> is returned.
*/
public class Lexer
{
/**
* The page lexemes are retrieved from.
*/
protected Page mPage;
/**
* The current position on the page.
*/
protected Cursor mCursor;
/**
* Creates a new instance of a Lexer.
* @param page The page with HTML text.
*/
public Lexer (Page page)
{
mPage = page;
mCursor = new Cursor (page, 0);
}
/**
* Creates a new instance of a Lexer.
* @param text The text to parse.
*/
public Lexer (String text) throws ParserException
{
this (new Page (text));
}
/**
* Creates a new instance of a Lexer.
* @param connection The url to parse.
*/
public Lexer (URLConnection connection) throws ParserException
{
this (new Page (connection));
}
/**
* Get the page this lexer is working on.
* @return The page that nodes are being read from.
*/
public Page getPage ()
{
return (mPage);
}
/**
* Get the next node from the source.
* @return A RemarkNode, StringNode or Tag, or <code>null</code> if no
* more lexemes are present.
* @exception ParserException If there is a problem with the underlying page.
*/
public Node nextNode () throws ParserException
{
Cursor probe;
char ch;
Node ret;
probe = mCursor.dup ();
ch = mPage.getCharacter (probe);
switch (ch)
{
case 0: // end of input
ret = null;
break;
case '<':
ch = mPage.getCharacter (probe);
if (0 == ch)
ret = parseString ();
else if ('/' == ch || '%' == ch || Character.isLetter (ch))
ret = parseTag ();
else if ('!' == ch)
{
ch = mPage.getCharacter (probe);
if ('-' == ch)
ret = parseRemark ();
else
ret = parseTag ();
}
else
ret = parseString ();
break;
default:
ret = parseString ();
break;
}
return (ret);
}
/**
* Parse a string node.
* Scan characters until "</", "<%", "<!"
* or < followed by a
* letter is encountered, or the input stream is exhausted, in which
* case <code>null</code> is returned.
*/
protected Node parseString () throws ParserException
{
Cursor cursor;
boolean done;
char ch;
int length;
StringNode ret;
cursor = mCursor.dup ();
done = false;
while (!done)
{
ch = mPage.getCharacter (cursor);
if (0 == ch)
done = true;
else if ('<' == ch)
{
ch = mPage.getCharacter (cursor);
if (0 == ch)
done = true;
// the order of these tests might be optimized for speed:
else if ('/' == ch || '%' == ch || Character.isLetter (ch) || '!' == ch)
{
done = true;
cursor.retreat ();
cursor.retreat ();
}
else
{
// it's not a tag, so keep going,
// the extra characters consumed are in this string
}
}
}
length = cursor.getPosition () - mCursor.getPosition ();
if (0 != length)
{ // got some characters
ret = new StringNode (mPage, mCursor.getPosition (), cursor.getPosition ());
mCursor = cursor;
}
else
ret = null;
return (ret);
}
private void whitespace (Vector attributes, int[] bookmarks)
{
if (bookmarks[1] > bookmarks[0])
attributes.addElement (new Attribute (null, mPage.getText (bookmarks[0], bookmarks[1]), (char)0));
}
private void standalone (Vector attributes, int[] bookmarks)
{
attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), null, (char)0));
}
private void empty (Vector attributes, int[] bookmarks)
{
attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), "", (char)0));
}
private void naked (Vector attributes, int[] bookmarks)
{
attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), mPage.getText (bookmarks[3], bookmarks[4]), (char)0));
}
private void single_quote (Vector attributes, int[] bookmarks)
{
attributes.addElement (new Attribute (mPage.getText (bookmarks[1], bookmarks[2]), mPage.getText (bookmarks[4] + 1, bookmarks[5]), '\''));
}
private void double_quote (Vector attributes, int[] bookmarks)
{
attributes.addElement (new Attribute (mPage.getText
(bookmarks[1], bookmarks[2]), mPage.getText (bookmarks[5] + 1, bookmarks[6]), '"'));
}
/**
* Parse a tag.
* Parse the name and attributes from a start tag.<p>
* From the <a href="http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2">
* HTML 4.01 Specification, W3C Recommendation 24 December 1999</a>
* http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2<p>
* <cite>
* 3.2.2 Attributes<p>
* Elements may have associated properties, called attributes, which may
* have values (by default, or set by authors or scripts). Attribute/value
* pairs appear before the final ">" of an element's start tag. Any number
* of (legal) attribute value pairs, separated by spaces, may appear in an
* element's start tag. They may appear in any order.<p>
* In this example, the id attribute is set for an H1 element:
* <code>
* <H1 id="section1">
* </code>
* This is an identified heading thanks to the id attribute
* <code>
* </H1>
* </code>
* By default, SGML requires that all attribute values be delimited using
* either double quotation marks (ASCII decimal 34) or single quotation
* marks (ASCII decimal 39). Single quote marks can be included within the
* attribute value when the value is delimited by double quote marks, and
* vice versa. Authors may also use numeric character references to
* represent double quotes (&#34;) and single quotes (&#39;).
* For doublequotes authors can also use the character entity reference &quot;.<p>
* In certain cases, authors may specify the value of an attribute without
* any quotation marks. The attribute value may only contain letters
* (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45),
* periods (ASCII decimal 46), underscores (ASCII decimal 95),
* and colons (ASCII decimal 58). We recommend using quotation marks even
* when it is possible to eliminate them.<p>
* Attribute names are always case-insensitive.<p>
* Attribute values are generally case-insensitive.
* The definition of each
* attribute in the reference manual indicates whether its value is case-insensitive.<p>
* All the attributes defined by this specification are listed in the attribute index.<p>
* </cite>
* <p>
* This method uses a state machine with the following states:
* <ol>
* <li>state 0 - outside of any attribute</li>
* <li>state 1 - within attribute name</li>
* <li>state 2 - equals hit</li>
* <li>state 3 - within naked attribute value</li>
* <li>state 4 - within single quoted attribute value</li>
* <li>state 5 - within double quoted attribute value</li>
* </ol>
* <p>
* The starting point for the various components is stored in an array
* of integers that match the initiation point for the states one-for-one,
* i.e. bookmarks[0] is where state 0 began, bookmarks[1] is where state 1
* began, etc.
* Attributes are stored in a <code>Vector</code> having
* one slot for each whitespace or attribute/value pair.
* The first slot is for attribute name (kind of like a standalone attribute).
*/
protected Node parseTag () throws ParserException
{
Cursor cursor;
boolean done;
char ch;
int state;
int[] bookmarks;
Vector attributes;
int length;
TagNode ret;
cursor = mCursor.dup ();
// sanity check
ch = mPage.getCharacter (cursor);
if ('<' != ch)
return (parseString ());
done = false;
attributes = new Vector ();
state = 0;
bookmarks = new int[7];
bookmarks[0] = cursor.getPosition ();
while (!done)
{
bookmarks[state + 1] = cursor.getPosition ();
ch = mPage.getCharacter (cursor);
switch (state)
{
case 0: // outside of any attribute
if ((0 == ch) || ('>' == ch))
{
whitespace (attributes, bookmarks);
done = true;
}
else if (!Character.isWhitespace (ch))
{
whitespace (attributes, bookmarks);
state = 1;
}
break;
case 1: // within attribute name
if ((0 == ch) || ('>' == ch))
{
standalone (attributes, bookmarks);
done = true;
}
else if (Character.isWhitespace (ch))
{
standalone (attributes, bookmarks);
bookmarks[0] = bookmarks[2];
state = 0;
}
else if ('=' == ch)
state = 2;
break;
case 2: // equals hit
if ((0 == ch) || ('>' == ch))
{
empty (attributes, bookmarks);
done = true;
}
else if ('\'' == ch)
{
state = 4;
bookmarks[4] = bookmarks[3];
}
else if ('"' == ch)
{
state = 5;
bookmarks[5] = bookmarks[3];
}
else
state = 3;
break;
case 3: // within naked attribute value
if ('>' == ch)
{
naked (attributes, bookmarks);
done = true;
}
else if (Character.isWhitespace (ch))
{
naked (attributes, bookmarks);
bookmarks[0] = bookmarks[4];
state = 0;
}
break;
case 4: // within single quoted attribute value
if (0 == ch)
{
single_quote (attributes, bookmarks);
done = true; // complain?
}
else if ('\'' == ch)
{
single_quote (attributes, bookmarks);
bookmarks[0] = bookmarks[5] + 1;
state = 0;
}
break;
case 5: // within double quoted attribute value
if (0 == ch)
{
double_quote (attributes, bookmarks);
done = true; // complain?
}
else if ('"' == ch)
{
double_quote (attributes, bookmarks);
bookmarks[0] = bookmarks[6] + 1;
state = 0;
}
break;
default:
throw new IllegalStateException ("how the fuck did we get in state " + state);
}
}
length = cursor.getPosition () - mCursor.getPosition ();
if (0 != length)
{ // return tag based on second character, '/', '%', Letter (ch), '!'
if (2 > length) // this is an error
return (parseString ());
ret = new TagNode (mPage, mCursor.getPosition (), cursor.getPosition (), attributes);
mCursor = cursor;
}
else
ret = null;
return (ret);
}
/**
* Parse a comment.
* Parse a remark markup.<p>
* From the <a href="http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.4">
* HTML 4.01 Specification, W3C Recommendation 24 December 1999</a>
* http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.4<p>
* <cite>
* 3.2.4 Comments<p>
* HTML comments have the following syntax:<p>
* <code>
* <!-- this is a comment --><p>
* <!-- and so is this one,<p>
* which occupies more than one line --><p>
* </code>
* White space is not permitted between the markup declaration
* open delimiter("<!") and the comment open delimiter ("--"),
* but is permitted between the comment close delimiter ("--") and
* the markup declaration close delimiter (">").
* A common error is to include a string of hyphens ("---") within a comment.
* Authors should avoid putting two or more adjacent hyphens inside comments.
* Information that appears between comments has no special meaning
* (e.g., character references are not interpreted as such).
* Note that comments are markup.<p>
* </cite>
* <p>
* This method uses a state machine with the following states:
* <ol>
* <li>state 0 - prior to the first open delimiter</li>
* <li>state 1 - prior to the second open delimiter</li>
* <li>state 2 - prior to the first closing delimiter</li>
* <li>state 3 - prior to the second closing delimiter</li>
* <li>state 4 - prior to the terminating ></li>
* </ol>
* <p>
* All comment text (everything excluding the < and >) is included
* in the remark text.
* We allow terminators like --!> even though this isn't part of the spec.
*/
protected Node parseRemark () throws ParserException
{
Cursor cursor;
boolean done;
char ch;
int state;
int length;
RemarkNode ret;
cursor = mCursor.dup ();
// sanity check
ch = mPage.getCharacter (cursor);
if ('<' != ch)
return (parseString ());
ch = mPage.getCharacter (cursor);
if ('!' != ch)
return (parseString ());
done = false;
state = 0;
while (!done)
{
ch = mPage.getCharacter (cursor);
switch (state)
{
case 0: // prior to the first open delimiter
if ('-' == ch)
state = 1;
else
return (parseString ());
break;
case 1: // prior to the second open delimiter
if ('-' == ch)
state = 2;
else
return (parseString ());
break;
case 2: // prior to the first closing delimiter
if ('-' == ch)
state = 3;
break;
case 3: // prior to the second closing delimiter
if ('-' == ch)
state = 4;
else
state = 2;
break;
case 4: // prior to the terminating >
if ('>' == ch)
done = true;
else if (!Character.isWhitespace (ch) || ('!' == ch))
state = 2;
break;
default:
throw new IllegalStateException ("how the fuck did we get in state " + state);
}
}
length = cursor.getPosition () - mCursor.getPosition ();
if (0 != length)
{ // return tag based on second character, '/', '%', Letter (ch), '!'
if (2 > length) // this is an error
return (parseString ());
ret = new RemarkNode (mPage, mCursor.getPosition (), cursor.getPosition ());
mCursor = cursor;
}
else
ret = null;
return (ret);
}
}
Index: Cursor.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Cursor.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** Cursor.java 11 Aug 2003 00:18:28 -0000 1.2
--- Cursor.java 17 Aug 2003 16:09:27 -0000 1.3
***************
*** 79,83 ****
--- 79,130 ----
return (mPosition);
}
+
+ /**
+ * Move the cursor position ahead one character.
+ */
+ public void advance ()
+ {
+ mPosition++;
+ }
+
+ /**
+ * Move the cursor position back one character.
+ */
+ public void retreat ()
+ {
+ mPosition--;
+ if (0 > mPosition)
+ mPosition = 0;
+ }
+
+ /**
+ * Make a new cursor just like this one.
+ * @return The new cursor positioned where <code>this</code> one is,
+ * and referring to the same page.
+ */
+ public Cursor dup ()
+ {
+ return (new Cursor (getPage (), getPosition ()));
+ }
+
+ public String toString ()
+ {
+ int row;
+ int column;
+ StringBuffer ret;
+ ret = new StringBuffer (9 * 3 + 3); // three ints and delimiters
+ ret.append (getPosition ());
+ row = mPage.row (this);
+ column = mPage.column (this);
+ ret.append ("[");
+ ret.append (row);
+ ret.append (",");
+ ret.append (column);
+ ret.append ("]");
+
+ return (ret.toString ());
+ }
+
//
// Ordered interface
Index: Page.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** Page.java 11 Aug 2003 00:18:28 -0000 1.4
--- Page.java 17 Aug 2003 16:09:27 -0000 1.5
***************
*** 29,32 ****
--- 29,33 ----
package org.htmlparser.lexer;
+ import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.Reader;
***************
*** 44,51 ****
/**
* Represents the contents of an HTML page.
! * Contains a character array of the page downloaded so far,
! * a String with those characters in it,
! * and an index of positions of line separators (actually the first
! * character position on the next line).
*/
public class Page
--- 45,50 ----
/**
* Represents the contents of an HTML page.
! * Contains the source of characters and an index of positions of line
! * separators (actually the first character position on the next line).
*/
public class Page
***************
*** 70,83 ****
/**
- * The characters read so far from the source.
- */
- protected char[] mCharacters;
-
- /**
- * The string representation of the source.
- */
- protected String mString;
-
- /**
* Character positions of the first character in each line.
*/
--- 69,72 ----
***************
*** 102,106 ****
/**
! * Construct a page reading from a URL.
* @param connection A fully conditioned connection. The connect()
The connect() * method will be called so it need not be connected yet. --- 91,95 ---- /** ! * Construct a page reading from a URL connection. * @param connection A fully conditioned connection. The connect() * method will be called so it need not be connected yet. *************** *** 113,119 **** */ public Page (URLConnection connection) throws ParserException - // throws - // IOException, - // UnsupportedEncodingException { if (null == connection) --- 102,105 ---- *************** *** 139,148 **** throw new ParserException ("oops2", ioe); } - mCharacters = null; - mString = null; mIndex = new PageIndex (this); } /** * Try and extract the character set from the HTTP header. * @param connection The connection with the charset info. --- 125,270 ---- throw new ParserException ("oops2", ioe); } mIndex = new PageIndex (this); } /** + * Construct a page from a stream encoded with the given charset. + * @param stream The source of bytes. + * @param charset The encoding used. + * If null, defaults to the <code>DEFAULT_CHARSET</code>. + * @exception UnsupportedEncodingException If the given charset is not supported. + */ + public Page (Stream stream, String charset) + throws + UnsupportedEncodingException + { + if (null == stream) + throw new IllegalArgumentException ("stream cannot be null"); + if (null == charset) + charset = DEFAULT_CHARSET; + mSource = new Source (stream, charset); + mIndex = new PageIndex (this); + } + + public Page (String text) throws ParserException + { + Stream stream; + Page ret; + + if (null == text) + throw new IllegalArgumentException ("text cannot be null"); + try + { + stream = new Stream (new ByteArrayInputStream (text.getBytes (Page.DEFAULT_CHARSET))); + mSource = new Source (stream, Page.DEFAULT_CHARSET); + mIndex = new PageIndex (this); + } + catch (UnsupportedEncodingException uee) + { + throw new ParserException ("problem making a page", uee); + } + } + + /** + * Get the source this page is reading from. 
+ */ + public Source getSource () + { + return (mSource); + } + + /** + * Read the character at the cursor position. + * The cursor position can be behind or equal to the current source position. + * Returns end of lines (EOL) as \n, by converting \r and \r\n to \n, + * and updates the end-of-line index accordingly + * Advances the cursor position by one (or two in the \r\n case). + * @param cursor The position to read at. + * @return The character at that position, and modifies the cursor to + * prepare for the next read. If the source is exhausted a zero is returned. + * @exception ParserException If an IOException on the underlying source + * occurs, or an attemp is made to read characters in the future (the + * cursor position is ahead of the underlying stream) + */ + public char getCharacter (Cursor cursor) + throws + ParserException + { + int i; + char ret; + + if (mSource.mOffset < cursor.getPosition ()) + // hmmm, we could skip ahead, but then what about the EOL index + throw new ParserException ("attempt to read future characters from source"); + else if (mSource.mOffset == cursor.getPosition ()) + try + { + i = mSource.read (); + if (-1 == i) + ret = 0; + else + { + ret = (char)i; + cursor.advance (); + } + } + catch (IOException ioe) + { + throw new ParserException ( + "problem reading a character at position " + + cursor.getPosition (), ioe); + } + else + { + // historic read + ret = mSource.mBuffer[cursor.getPosition ()]; + cursor.advance (); + } + + // handle \r + if ('\r' == ret) + { // switch to single character EOL + ret = '\n'; + + // check for a \n in the next position + if (mSource.mOffset == cursor.getPosition ()) + try + { + i = mSource.read (); + if (-1 == i) + { + // do nothing + } + else if ('\n' == (char)i) + cursor.advance (); + else + try + { + mSource.unread (); + } + catch (IOException ioe) + { + throw new ParserException ( + "can't unread a character at position " + + cursor.getPosition (), ioe); + } + } + catch (IOException ioe) + { 
+ throw new ParserException ( + "problem reading a character at position " + + cursor.getPosition (), ioe); + } + else if ('\n' == mSource.mBuffer[cursor.getPosition ()]) + cursor.advance (); + } + if ('\n' == ret) + // update the EOL index in any case + mIndex.add (cursor); + + return (ret); + } + + /** * Try and extract the character set from the HTTP header. * @param connection The connection with the charset info. *************** *** 294,297 **** --- 416,483 ---- } + /** + * Get the line number for a cursor. + * @param cursor The character offset into the page. + * @return The line number the character is in. + */ + public int row (Cursor cursor) + { + return (mIndex.row (cursor)); + } + + /** + * Get the column number for a cursor. + * @param cursor The character offset into the page. + * @return The character offset into the line this cursor is on. + */ + public int column (Cursor cursor) + { + return (mIndex.column (cursor)); + } + + /** + * Get the text identified by the given limits. + * @param start The starting position, zero based. + * @param end The ending position + * (exclusive, i.e. the character at the ending position is not included), + * zero based. + * @return The text from <code>start</code> to <code>end</code>. + * @see #getText(StringBuffer, int, int) + */ + public String getText (int start, int end) + { + StringBuffer ret; + + ret = new StringBuffer (Math.abs (end - start)); + getText (ret, start, end); + + return (ret.toString ()); + } + + /** + * Put the text identified by the given limits into the given buffer. + * @param buffer The accumulator for the characters. + * @param start The starting position, zero based. + * @param end The ending position + * (exclusive, i.e. the character at the ending position is not included), + * zero based. 
+ */ + public void getText (StringBuffer buffer, int start, int end) + { + int length; + StringBuffer ret; + + if ((mSource.mOffset < start) || (mSource.mOffset < end)) + throw new IllegalArgumentException ("attempt to extract future characters from source"); + if (end < start) + { + length = end; + end = start; + start = length; + } + length = end - start; + buffer.append (mSource.mBuffer, start, length); + } + // // Bean patterns *************** *** 307,309 **** --- 493,496 ---- return (mLog); } + } Index: PageIndex.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/PageIndex.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** PageIndex.java 11 Aug 2003 00:18:28 -0000 1.2 --- PageIndex.java 17 Aug 2003 16:09:27 -0000 1.3 *************** *** 206,210 **** public int row (Cursor cursor) { ! return (Sort.bsearch (this, cursor)); } --- 206,220 ---- public int row (Cursor cursor) { ! int ret; ! ! ret = Sort.bsearch (this, cursor); ! // handle line transition, the search returns the index if it matches ! // exactly one of the line end positions, so we advance one line if ! // it's equal to the offset at the row index, since that position is ! // actually the beginning of the next line ! if ((ret < mCount) && (cursor.getPosition () == mIndices[ret])) ! ret++; ! ! return (ret); } *************** *** 229,238 **** int previous; ! row = Sort.bsearch (this, cursor); ! // note, this shouldn't be zero if the first element of each index is offset zero if (0 != row) previous = this.elementAt (row - 1); else ! previous = this.elementAt (0); return (cursor.getPosition () - previous); --- 239,247 ---- int previous; ! row = row (cursor); if (0 != row) previous = this.elementAt (row - 1); else ! 
previous = 0; return (cursor.getPosition () - previous); Index: Source.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Source.java,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** Source.java 11 Aug 2003 00:18:28 -0000 1.3 --- Source.java 17 Aug 2003 16:09:27 -0000 1.4 *************** *** 80,84 **** * The offset of the next byte returned by read(). */ ! protected int mOffset; /** --- 80,84 ---- * The offset of the next byte returned by read(). */ ! public volatile int mOffset; /** *************** *** 175,195 **** /** ! * Close the stream. Once a stream has been closed, further read(), ! * ready(), mark(), or reset() invocations will throw an IOException. ! * Closing a previously-closed stream, however, has no effect. ! * @exception IOException If an I/O error occurs */ public void close () throws IOException { - mStream = null; - if (null != mReader) - mReader.close (); - mReader = null; - mBuffer = null; - mLevel = 0; - mOffset = 0; - mMark = -1; } ! /** * Read a single character. --- 175,186 ---- /** ! * Does nothing. ! * It's supposed to close the stream, but use destroy() instead. ! * @see #destroy */ public void close () throws IOException { } ! /** * Read a single character. *************** *** 342,345 **** --- 333,370 ---- return (ret); + } + + // + // Methods not in your Daddy's Reader + // + + /** + * Undo the read of a single character. + * @exception IOException If no characters have been read. + */ + public void unread () throws IOException + { + if (0 < mOffset) + mOffset--; + else + throw new IOException ("can't unread no characters"); + } + + /** + * Close the stream. Once a stream has been closed, further read(), + * ready(), mark(), or reset() invocations will throw an IOException. + * Closing a previously-closed stream, however, has no effect. 
+ * @exception IOException If an I/O error occurs + */ + public void destroy () throws IOException + { + mStream = null; + if (null != mReader) + mReader.close (); + mReader = null; + mBuffer = null; + mLevel = 0; + mOffset = 0; + mMark = -1; } } Index: package.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/package.html,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** package.html 11 Aug 2003 00:18:28 -0000 1.2 --- package.html 17 Aug 2003 16:09:27 -0000 1.3 *************** *** 75,78 **** --- 75,94 ---- <LI>line 11, offset 7, to line 11, offset 8, string node "\n"</LI> </OL> + <p>Stream, Source, Page and Lexer + <p>The package is arranged in four levels, <CODE>Stream</CODE>, + <CODE>Source</CODE> <CODE>Page</CODE> and <CODE>Lexer</CODE> in the order of lowest to + highest. + A <CODE>Stream</CODE> is raw bytes from the URLConnection or file. It has no + intelligence. A <CODE>Source</CODE> is raw characters, hence it knows about the + encoding scheme used and can be reset if a different encoding is detected after + partially reading in the text. A <CODE>Page</CODE> provides characters from the + source while maintaining the index of line numbers, and hence can be thought of + as an array of strings corresponding to source file lines, but it doesn't + actually store any text, relying on the buffering within the + <CODE>Source</CODE> instead. The <CODE>Lexer</CODE> contains the actual lexeme parsing + code. It reads characters from the page, keeping track of where it is with a + <CODE>Cursor</CODE> and creates the array of nodes using various state + machines. + <p> The following are some design goals and 'invariants' within the package, if you are attempting to understand or modify it. 
Things that differ substantially from *************** *** 88,94 **** <DD>Besides complete coverage, the <B>nodes do not contain copies of the text</B>, but instead simply contain offsets into a single large buffer that contains the ! text read from the HTML source. Thus there is no lost whitespace or text ! formatting elements either outside or within tags. Upper and lower case text is ! preserved. <DT>Line Endings <DD><B>End of line characters are just whitespace.</B> There is no distinction --- 104,110 ---- <DD>Besides complete coverage, the <B>nodes do not contain copies of the text</B>, but instead simply contain offsets into a single large buffer that contains the ! text read from the HTML source. Even within tags, the attributes list can ! contain whitespace, thus there is no lost whitespace or text formatting ! either outside or within tags. Upper and lower case text is preserved. <DT>Line Endings <DD><B>End of line characters are just whitespace.</B> There is no distinction *************** *** 97,121 **** multiple lines with no special processing. Line endings are not transformed between platforms, i.e. Unix line endings are not converted to Windows line ! endings by this level. Each node will have a starting and ending ! <CODE>Cursor</CODE>, from which you can get the line number and offset within ! the HTML source, for error messages for example, but in general ignore line breaks in the source if at all possible. - <DT>Stream, Source and Page - <DD>The package is arranged in three levels, <CODE>Stream</CODE>, - <CODE>Source</CODE> and <CODE>Page</CODE> in the order of lowest to highest. - A <CODE>Stream</CODE> is raw bytes from the URLConnection or file. It has no - intelligence. A <CODE>Source</CODE> is raw characters, hence it knows about the - encoding scheme used and can be reset if a different encoding is detected after - partially reading in the text. A <CODE>Page</CODE> is the highest level and - contains the actual lexeme parsing code. 
It reads from the source and creates - the array of nodes (<CODE>NodeList</CODE>) using a state machine. <DT>One Parser, One Scan ! <DD>The major lexeme state machine has the following minor state machines corresponding (roughly) to the <B>four parsers it replaces</B> (StringParser, RemarkNodeParser, ! AttributeParser. TagParser): ! <LI>in text</LI> ! <LI>in comment</LI> ! <LI>in quote</LI> ! <LI>in tag</LI> By integrating the four state machines into one, a single pass over the text is all that's needed for a low level parse of the HTML source. In previous --- 113,127 ---- multiple lines with no special processing. Line endings are not transformed between platforms, i.e. Unix line endings are not converted to Windows line ! endings by this level. Each node will has a starting and ending location, which ! the page can use to extract the text. To facilitate formatting error and log messages ! the page can turn these offsets into row and column numbers. In general ignore line breaks in the source if at all possible. <DT>One Parser, One Scan ! <DD>The Lexer has the following state machines corresponding (roughly) to the <B>four parsers it replaces</B> (StringParser, RemarkNodeParser, ! TagParser & AttributeParser): ! <LI>in text - parseString()</LI> ! <LI>in comment - parseRemark()</LI> ! <LI>in tag - parseTag()</LI> By integrating the four state machines into one, a single pass over the text is all that's needed for a low level parse of the HTML source. In previous |
From: <der...@us...> - 2003-08-17 16:03:26
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1:/tmp/cvs-serv8569/nodes
Log Message:
Directory /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes added to the repository
From: <der...@us...> - 2003-08-15 21:03:18
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags
In directory sc8-pr-cvs1:/tmp/cvs-serv3543/tags
Modified Files:
FormTag.java ImageTag.java LinkTag.java Tag.java
Log Message:
Case maintaining toHtml() output for tag attributes.
With these changes, the output of toHtml() now reflects the upper/lower case values
of the input for the contents of tags, i.e. attribute names maintain their original case.
They're still out of order from how they are parsed, but this is a first step.
Rather than adjust all the test cases right now, the ParserTestCase assertSameString()
method now checks a global flag to see if case matters when comparing strings.
As of this drop it ignores case when comparing HTML output. This will soon change.
Index: FormTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/FormTag.java,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** FormTag.java 11 Aug 2003 00:18:30 -0000 1.28
--- FormTag.java 15 Aug 2003 20:51:48 -0000 1.29
***************
*** 142,146 ****
*/
public void setFormLocation(String formURL) {
! attributes.put("ACTION",formURL);
this.formURL = formURL;
}
--- 142,146 ----
*/
public void setFormLocation(String formURL) {
! setAttribute ("ACTION", formURL);
this.formURL = formURL;
}
Index: ImageTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/ImageTag.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** ImageTag.java 11 Aug 2003 00:18:30 -0000 1.21
--- ImageTag.java 15 Aug 2003 20:51:48 -0000 1.22
***************
*** 68,72 ****
public void setImageURL(String imageURL) {
this.imageURL = imageURL;
! attributes.put("SRC",imageURL);
}
--- 68,72 ----
public void setImageURL(String imageURL) {
this.imageURL = imageURL;
! setAttribute ("SRC", imageURL);
}
Index: LinkTag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/LinkTag.java,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** LinkTag.java 11 Aug 2003 00:18:30 -0000 1.28
--- LinkTag.java 15 Aug 2003 20:51:48 -0000 1.29
***************
*** 231,235 ****
public void setLink(String link) {
this.link = link;
! attributes.put("HREF",link);
}
--- 231,235 ----
public void setLink(String link) {
this.link = link;
! setAttribute ("HREF", link);
}
Index: Tag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/Tag.java,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** Tag.java 11 Aug 2003 00:18:30 -0000 1.39
--- Tag.java 15 Aug 2003 20:51:48 -0000 1.40
***************
*** 64,68 ****
private final static String EMPTY_STRING="";
- private AttributeParser attributeParser;
private static TagParser tagParser;
/**
--- 64,67 ----
***************
*** 76,80 ****
* added by Kaarle Kaila 23.10.2001
*/
! protected Hashtable attributes=null;
/**
--- 75,79 ----
* added by Kaarle Kaila 23.10.2001
*/
! protected SpecialHashtable _attributes=null;
/**
***************
*** 177,183 ****
* @return Hashtable
*/
! private Hashtable parseAttributes(){
! attributeParser = new AttributeParser();
! return attributeParser.parseAttributes(this);
}
--- 176,181 ----
* @return Hashtable
*/
! private SpecialHashtable parseAttributes(){
! return (SpecialHashtable)(new AttributeParser()).parseAttributes(getText ());
}
***************
*** 187,192 ****
* @param name of parameter
*/
! public String getAttribute(String name){
! return (String)getAttributes().get(name.toUpperCase());
}
--- 185,205 ----
* @param name of parameter
*/
! public String getAttribute(String name)
! {
! SpecialHashtable ht;
! Object ret;
!
! ht = getAttributesEx();
! ret = ht.getRaw(name.toUpperCase());
! if (null != ret)
! {
! ret = ((String[])ret)[1];
! if (Tag.NULLVALUE == ret)
! ret = null;
! else if (Tag.NOTHING == ret)
! ret = "";
! }
!
! return ((String)ret);
}
***************
*** 197,201 ****
*/
public void setAttribute(String key, String value) {
! attributes.put(key,value);
}
--- 210,214 ----
*/
public void setAttribute(String key, String value) {
! _attributes.put(key.toUpperCase (), new String[] {key, value});
}
***************
*** 207,226 ****
*/
public String getParameter(String name){
! return (String)getAttributes().get(name.toUpperCase());
}
/**
* Gets the attributes in the tag.
* @return Returns a Hashtable of attributes
*/
! public Hashtable getAttributes() {
! if (attributes == null) {
! attributes = parseAttributes();
! }
! return attributes;
}
! public String getTagName(){
! return (String)getAttributes().get(TAGNAME);
}
--- 220,259 ----
*/
public String getParameter(String name){
! return ((String[])getAttributesEx().get(name.toUpperCase()))[1];
}
/**
* Gets the attributes in the tag.
+ * NOTE: Values of the extended hashtable are two element arrays of String,
+ * with the first element being the original name (not uppercased),
+ * and the second element being the value.
+ * @return Returns a special hashtable of attributes in two element String arrays.
+ */
+ public SpecialHashtable getAttributesEx() {
+ if (_attributes == null)
+ _attributes = parseAttributes();
+ return _attributes;
+ }
+
+ /**
+ * Gets the attributes in the tag.
* @return Returns a Hashtable of attributes
*/
! public Hashtable getAttributes()
! {
! Hashtable ret;
!
! ret = new SpecialHashtable ();
! for (Enumeration e = getAttributesEx ().keys(); e.hasMoreElements(); )
! {
! String key = (String)e.nextElement ();
! ret.put (key, ((String[])getAttributesEx().getRaw(key))[1]);
! }
!
! return (ret);
}
! public String getTagName(){
! return getParameter(TAGNAME);
}
***************
*** 329,340 ****
/**
! * Sets the parsed.
! * @param parsed The parsed to set
*/
! public void setAttributes(Hashtable attributes) {
! this.attributes = attributes;
}
/**
* Sets the nodeBegin.
* @param nodeBegin The nodeBegin to set
--- 362,392 ----
/**
! * Sets the attributes.
! * @param attributes The attribute collection to set.
*/
! public void setAttributes(Hashtable attributes)
! {
! SpecialHashtable att = new SpecialHashtable ();
! for (Enumeration e = attributes.keys (); e.hasMoreElements (); )
! {
! String key = (String)e.nextElement ();
! att.put (key, new String[] { key, (String)attributes.get (key)});
! }
! this._attributes = att;
}
/**
+ * Sets the attributes.
+ * NOTE: Values of the extended hashtable are two element arrays of String,
+ * with the first element being the original name (not uppercased),
+ * and the second element being the value.
+ * @param attributes The attribute collection to set.
+ */
+ public void setAttributesEx (SpecialHashtable attributes)
+ {
+ _attributes = attributes;
+ }
+
+ /**
* Sets the nodeBegin.
* @param nodeBegin The nodeBegin to set
***************
*** 420,431 ****
StringBuffer ret;
String key;
! String value;
String empty;
ret = new StringBuffer ();
ret.append ("<");
! ret.append (getTagName ());
empty = null;
! for (Enumeration e = attributes.keys(); e.hasMoreElements(); )
{
key = (String)e.nextElement ();
--- 472,484 ----
StringBuffer ret;
String key;
! String value[];
String empty;
ret = new StringBuffer ();
+ value = (String[])(getAttributesEx().getRaw (TAGNAME));
ret.append ("<");
! ret.append (value[1]);
empty = null;
! for (Enumeration e = getAttributesEx ().keys(); e.hasMoreElements(); )
{
key = (String)e.nextElement ();
***************
*** 437,449 ****
{
ret.append (" ");
! ret.append (key);
! value = (String)(((SpecialHashtable)getAttributes()).getRaw (key.toUpperCase ()));
! if (Tag.NULLVALUE != value)
{
ret.append ("=");
! if (!(Tag.NOTHING == value))
{
ret.append ("\"");
! ret.append (value);
ret.append ("\"");
}
--- 490,502 ----
{
ret.append (" ");
! value = (String[])(getAttributesEx().getRaw (key.toUpperCase ()));
! ret.append (value[0]);
! if (Tag.NULLVALUE != value[1])
{
ret.append ("=");
! if (!(Tag.NOTHING == value[1]))
{
ret.append ("\"");
! ret.append (value[1]);
ret.append ("\"");
}
***************
*** 507,511 ****
*/
public Hashtable getParsed() {
! return attributes;
}
--- 560,564 ----
*/
public Hashtable getParsed() {
! return getAttributes ();
}
***************
*** 519,524 ****
* @return Hashtable
*/
! public Hashtable redoParseAttributes() {
! return parseAttributes();
}
--- 572,580 ----
* @return Hashtable
*/
! public Hashtable redoParseAttributes()
! {
! _attributes = null;
! getAttributesEx ();
! return (getAttributes ());
}
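The Tag.java changes store each attribute under its upper-cased name. The value is a two-element String array holding the name as originally written and its value, so lookups stay case-insensitive while toHtml() can echo the original spelling. The sketch below shows that layout in isolation.

CasePreservingAttributes is a hypothetical class, not part of htmlparser. It differs from the commit in two ways:
- It uses a LinkedHashMap instead of SpecialHashtable, so unlike the committed code (whose output is "still out of order") it also keeps insertion order.
- It simplifies value handling: null stands for a valueless attribute, and the Tag.NULLVALUE and Tag.NOTHING sentinels are not modeled.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of a case-preserving attribute table: keys are
 * upper-cased for lookup, but each entry remembers the name as written.
 */
class CasePreservingAttributes {
    // Upper-cased name -> { original name, value }, mirroring the
    // two-element String arrays described for getAttributesEx().
    private final Map<String, String[]> table = new LinkedHashMap<>();

    // Locale.ENGLISH avoids locale-specific casing (e.g. the Turkish dotted I).
    private static String key(String name) {
        return name.toUpperCase(Locale.ENGLISH);
    }

    void put(String name, String value) {
        table.put(key(name), new String[] { name, value });
    }

    /** Case-insensitive lookup; returns null if absent or valueless. */
    String get(String name) {
        String[] entry = table.get(key(name));
        return (entry == null) ? null : entry[1];
    }

    /** Rebuilds the tag text using the original attribute spelling. */
    String toHtml(String tagName) {
        StringBuilder html = new StringBuilder("<").append(tagName);
        for (String[] entry : table.values()) {
            html.append(' ').append(entry[0]);
            if (entry[1] != null) // null marks a valueless attribute such as "checked"
                html.append("=\"").append(entry[1]).append('"');
        }
        return html.append('>').toString();
    }
}
```

With put("Name", "Google") and put("type", "text"), get("NAME") returns "Google". toHtml("INPUT") then begins `<INPUT Name="Google" type="text"`, so the mixed-case spelling survives the round trip.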
From: <der...@us...> - 2003-08-15 21:01:47
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/parserHelper
In directory sc8-pr-cvs1:/tmp/cvs-serv3543/parserHelper
Modified Files:
AttributeParser.java
Log Message:
Case maintaining toHtml() output for tag attributes.
With these changes, the output of toHtml() now reflects the upper/lower case values
of the input for the contents of tags, i.e. attribute names maintain their original case.
They're still out of order from how they are parsed, but this is a first step.
Rather than adjust all the test cases right now, the ParserTestCase assertSameString()
method now checks a global flag to see if case matters when comparing strings.
As of this drop it ignores case when comparing HTML output. This will soon change.
Index: AttributeParser.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/parserHelper/AttributeParser.java,v
retrieving revision 1.33
retrieving revision 1.34
diff -C2 -d -r1.33 -r1.34
*** AttributeParser.java 11 Aug 2003 00:18:29 -0000 1.33
--- AttributeParser.java 15 Aug 2003 20:51:47 -0000 1.34
***************
*** 104,108 ****
*
*/
! public Hashtable parseAttributes(Tag tag) {
attributeTable = new SpecialHashtable();
part = null;
--- 104,108 ----
*
*/
! public Hashtable parseAttributes (String text) {
attributeTable = new SpecialHashtable();
part = null;
***************
*** 113,117 ****
equal = false;
delim=DELIMETERS;
! tokenizer = new StringTokenizer(tag.getText(),delim,true);
while (true) {
part=getNextPartUsing(delim);
--- 113,117 ----
equal = false;
delim=DELIMETERS;
! tokenizer = new StringTokenizer(text,delim,true);
while (true) {
part=getNextPartUsing(delim);
***************
*** 132,136 ****
}
}
! if (null == element) // handle no tag contents
putDataIntoTable(attributeTable,"",null,true);
return attributeTable;
--- 132,136 ----
}
}
! if (null == element) // handle no contents
putDataIntoTable(attributeTable,"",null,true);
return attributeTable;
***************
*** 259,267 ****
if (isName) {
// store tagname as tag.TAGNAME,tag
! h.put(value,name.toUpperCase());
}
else {
// store tag parameters as NAME, value
! h.put(name.toUpperCase(),value);
}
}
--- 259,267 ----
if (isName) {
// store tagname as tag.TAGNAME,tag
! h.put(value,new String[] {value, name.toUpperCase()});
}
else {
// store tag parameters as NAME, value
! h.put(name.toUpperCase(),new String[] {name, value });
}
}
From: <der...@us...> - 2003-08-15 21:01:42
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners
In directory sc8-pr-cvs1:/tmp/cvs-serv3543/scanners
Modified Files:
TagScanner.java
Log Message:
Case maintaining toHtml() output for tag attributes.
With these changes, the output of toHtml() now reflects the upper/lower case values
of the input for the contents of tags, i.e. attribute names maintain their original case.
They're still out of order from how they are parsed, but this is a first step.
Rather than adjust all the test cases right now, the ParserTestCase assertSameString()
method now checks a global flag to see if case matters when comparing strings.
As of this drop it ignores case when comparing HTML output. This will soon change.
Index: TagScanner.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/TagScanner.java,v
retrieving revision 1.32
retrieving revision 1.33
diff -C2 -d -r1.32 -r1.33
*** TagScanner.java 11 Aug 2003 00:18:30 -0000 1.32
--- TagScanner.java 15 Aug 2003 20:51:48 -0000 1.33
***************
*** 202,206 ****
Tag thisTag = scan(tag,url,reader,currLine);
thisTag.setThisScanner(this);
! thisTag.setAttributes(tag.getAttributes());
return thisTag;
}
--- 202,206 ----
Tag thisTag = scan(tag,url,reader,currLine);
thisTag.setThisScanner(this);
! thisTag.setAttributesEx(tag.getAttributesEx());
return thisTag;
}
From: <der...@us...> - 2003-08-15 20:56:09
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1:/tmp/cvs-serv3543/tests
Modified Files:
ParserTestCase.java
Log Message:
Case maintaining toHtml() output for tag attributes.
With these changes, the output of toHtml() now reflects the upper/lower case values
of the input for the contents of tags, i.e. attribute names maintain their original case.
They're still out of order from how they are parsed, but this is a first step.
Rather than adjust all the test cases right now, the ParserTestCase assertSameString()
method now checks a global flag to see if case matters when comparing strings.
As of this drop it ignores case when comparing HTML output. This will soon change.
Index: ParserTestCase.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTestCase.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** ParserTestCase.java 11 Aug 2003 00:18:31 -0000 1.21
--- ParserTestCase.java 15 Aug 2003 20:51:48 -0000 1.22
***************
*** 49,52 ****
--- 49,54 ----
public class ParserTestCase extends TestCase {
+
+ static boolean mCaseInsensitiveComparisons = true;
protected Parser parser;
protected Node node [];
***************
*** 121,126 ****
i >= (actual.length()-1 )
)
! ) ||
! (actual.charAt(i) != expected.charAt(i))
) {
StringBuffer errorMsg = new StringBuffer();
--- 123,129 ----
i >= (actual.length()-1 )
)
! ) ||
! (mCaseInsensitiveComparisons && Character.toUpperCase (actual.charAt(i)) != Character.toUpperCase (expected.charAt(i))) ||
! (!mCaseInsensitiveComparisons && (actual.charAt(i) != expected.charAt(i)))
) {
StringBuffer errorMsg = new StringBuffer();
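The new ParserTestCase condition chooses between two per-character tests based on mCaseInsensitiveComparisons: an upper-cased comparison when the flag is set, an exact comparison otherwise. A hypothetical helper can reduce the same logic to one predicate plus a mismatch search. CharCompare, same() and firstMismatch() are illustrative names only, not part of the htmlparser test suite.

```java
/** Hypothetical helper mirroring the per-character test in ParserTestCase. */
final class CharCompare {
    // Stands in for ParserTestCase.mCaseInsensitiveComparisons.
    static boolean ignoreCase = true;

    private CharCompare() {}

    /** True if the two characters match under the current case setting. */
    static boolean same(char a, char b) {
        return ignoreCase
            ? Character.toUpperCase(a) == Character.toUpperCase(b)
            : a == b;
    }

    /** Index of the first differing character, or -1 if the strings match. */
    static int firstMismatch(String expected, String actual) {
        int n = Math.min(expected.length(), actual.length());
        for (int i = 0; i < n; i++)
            if (!same(expected.charAt(i), actual.charAt(i)))
                return i;
        // Equal prefixes: the strings match only if they are the same length.
        return (expected.length() == actual.length()) ? -1 : n;
    }
}
```

Folding the flag into same() keeps the mismatch loop independent of the case policy. Flipping the default to case-sensitive later, as the log message anticipates, then touches only one line.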
From: <der...@us...> - 2003-08-15 20:55:23
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests
In directory sc8-pr-cvs1:/tmp/cvs-serv3543/tests/tagTests
Modified Files:
InputTagTest.java TagTest.java
Log Message:
Case maintaining toHtml() output for tag attributes.
With these changes, the output of toHtml() now reflects the upper/lower case values
of the input for the contents of tags, i.e. attribute names maintain their original case.
They're still out of order from how they are parsed, but this is a first step.
Rather than adjust all the test cases right now, the ParserTestCase assertSameString()
method now checks a global flag to see if case matters when comparing strings.
As of this drop it ignores case when comparing HTML output. This will soon change.
Index: InputTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/InputTagTest.java,v
retrieving revision 1.25
retrieving revision 1.26
diff -C2 -d -r1.25 -r1.26
*** InputTagTest.java 11 Aug 2003 00:18:33 -0000 1.25
--- InputTagTest.java 15 Aug 2003 20:51:48 -0000 1.26
***************
*** 58,62 ****
InputTag InputTag;
InputTag = (InputTag) node[0];
! assertEquals("HTML String","<INPUT NAME=\"Google\" TYPE=\"text\">",InputTag.toHtml());
}
--- 58,62 ----
InputTag InputTag;
InputTag = (InputTag) node[0];
! assertStringEquals ("HTML String","<INPUT NAME=\"Google\" TYPE=\"text\">",InputTag.toHtml());
}
Index: TagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/TagTest.java,v
retrieving revision 1.36
retrieving revision 1.37
diff -C2 -d -r1.36 -r1.37
*** TagTest.java 11 Aug 2003 00:18:33 -0000 1.36
--- TagTest.java 15 Aug 2003 20:51:48 -0000 1.37
***************
*** 673,677 ****
}
assertNotNull ("No nodes", temp);
! assertEquals ("Incorrect HTML output: ",
"<A HREF=\"http://www.google.com/webhp?hl=en\"></A>",
temp);
--- 673,677 ----
}
assertNotNull ("No nodes", temp);
! assertStringEquals ("Incorrect HTML output: ",
"<A HREF=\"http://www.google.com/webhp?hl=en\"></A>",
temp);
From: <der...@us...> - 2003-08-15 20:55:17
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests
In directory sc8-pr-cvs1:/tmp/cvs-serv3543/tests/scannersTests
Modified Files:
ScriptScannerTest.java
Log Message:
Case maintaining toHtml() output for tag attributes.
With these changes, the output of toHtml() now reflects the upper/lower case values
of the input for the contents of tags, i.e. attribute names maintain their original case.
They're still out of order from how they are parsed, but this is a first step.
Rather than adjust all the test cases right now, the ParserTestCase assertSameString()
method now checks a global flag to see if case matters when comparing strings.
As of this drop it ignores case when comparing HTML output. This will soon change.
Index: ScriptScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/ScriptScannerTest.java,v
retrieving revision 1.33
retrieving revision 1.34
diff -C2 -d -r1.33 -r1.34
*** ScriptScannerTest.java 11 Aug 2003 00:18:33 -0000 1.33
--- ScriptScannerTest.java 15 Aug 2003 20:51:48 -0000 1.34
***************
*** 555,559 ****
parseAndAssertNodeCount(1);
String s = node[0].toHtml ();
! assertEquals ("Parse error","<SCRIPT LANGUAGE=\"JavaScript\">document.write('</SCRIPT>');</SCRIPT>",s);
}
--- 555,559 ----
parseAndAssertNodeCount(1);
String s = node[0].toHtml ();
! assertStringEquals ("Parse error","<SCRIPT LANGUAGE=\"JavaScript\">document.write('</SCRIPT>');</SCRIPT>",s);
}
From: <der...@us...> - 2003-08-15 20:55:17
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/temporaryFailures
In directory sc8-pr-cvs1:/tmp/cvs-serv3543/tests/temporaryFailures
Modified Files:
AttributeParserTest.java
Log Message:
Case-maintaining toHtml() output for tag attributes.
With these changes, the output of toHtml() now preserves the upper/lower case of
the input for the contents of tags, i.e. attribute names keep their original case.
The attributes are still output in a different order from the one in which they were parsed, but this is a first step.
Rather than adjust all the test cases right now, the ParserTestCase assertSameString()
method now checks a global flag to see if case matters when comparing strings.
As of this drop it ignores case when comparing HTML output. This will soon change.
Index: AttributeParserTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/temporaryFailures/AttributeParserTest.java,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** AttributeParserTest.java 11 Aug 2003 00:18:33 -0000 1.7
--- AttributeParserTest.java 15 Aug 2003 20:51:48 -0000 1.8
***************
*** 58,62 ****
public void getParameterTableFor(String tagContents) {
tag = new Tag(new TagData(0,0,tagContents,""));
! table = parser.parseAttributes(tag);
}
--- 58,62 ----
public void getParameterTableFor(String tagContents) {
tag = new Tag(new TagData(0,0,tagContents,""));
! table = parser.parseAttributes(tag.getText ());
}
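In this hunk, parseAttributes() is now handed the raw tag text (tag.getText ()) rather than the Tag object. The sketch below shows one way to parse such text while keeping each attribute name exactly as written, with case-insensitive lookup on top. It assumes the text begins with the tag name followed by the attributes (for example `A HREF="x" Target=_blank`); the class `AttributeSketch` and all of its methods are hypothetical and are not htmlparser's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: case- and order-preserving attribute parsing. */
public class AttributeSketch {
    /** One attribute; value is null for a bare attribute such as CHECKED. */
    static final class Attribute {
        final String name;
        final String value;
        Attribute(String name, String value) { this.name = name; this.value = value; }
    }

    /** Parses the text after the tag name, keeping names in original case and order. */
    static List<Attribute> parseAttributes(String text) {
        List<Attribute> result = new ArrayList<>();
        int i = 0, n = text.length();
        // Skip the tag name (the first whitespace-delimited token).
        while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
        while (i < n) {
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            if (i >= n) break;
            int start = i;
            while (i < n && text.charAt(i) != '=' && !Character.isWhitespace(text.charAt(i))) i++;
            String name = text.substring(start, i); // case is not altered
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            String value = null;
            if (i < n && text.charAt(i) == '=') {
                i++;
                while (i < n && Character.isWhitespace(text.charAt(i))) i++;
                if (i < n && (text.charAt(i) == '"' || text.charAt(i) == '\'')) {
                    char quote = text.charAt(i++);
                    int vs = i;
                    while (i < n && text.charAt(i) != quote) i++;
                    value = text.substring(vs, i);
                    if (i < n) i++; // skip the closing quote
                } else {
                    int vs = i;
                    while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
                    value = text.substring(vs, i);
                }
            }
            result.add(new Attribute(name, value));
        }
        return result;
    }

    /** Case-insensitive lookup, so callers need not know the original case. */
    static String get(List<Attribute> attrs, String name) {
        for (Attribute a : attrs)
            if (a.name.equalsIgnoreCase(name)) return a.value;
        return null;
    }

    /** Re-emits the tag with names in their original case and parse order. */
    static String toHtml(String tagName, List<Attribute> attrs) {
        StringBuilder sb = new StringBuilder("<").append(tagName);
        for (Attribute a : attrs) {
            sb.append(' ').append(a.name);
            if (a.value != null) sb.append("=\"").append(a.value).append('"');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        List<Attribute> attrs = parseAttributes("A HREF=\"http://example.com\" Target=_blank");
        System.out.println(get(attrs, "href"));   // prints http://example.com
        System.out.println(toHtml("A", attrs));   // prints <A HREF="http://example.com" Target="_blank">
    }
}
```

Storing attributes in an ordered list, rather than a keyed table, is one way to also keep the parse order, which the log message says is not yet preserved; the cost is a linear scan on lookup, usually negligible for the handful of attributes on a tag.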
|
|
From: <der...@us...> - 2003-08-11 03:53:34
Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1:/tmp/cvs-serv5185
Modified Files:
build.xml
Log Message:
Move libs to correct level in distribution zip.
Index: build.xml
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v
retrieving revision 1.40
retrieving revision 1.41
diff -C2 -d -r1.40 -r1.41
*** build.xml 29 Jul 2003 23:24:54 -0000 1.40
--- build.xml 11 Aug 2003 03:53:31 -0000 1.41
***************
*** 340,347 ****
<echo message="**********************************"/>
<mkdir dir="${finalLoc}"/>
- <mkdir dir="${releaseDir}/lib"/>
- <copy file="lib/commons-logging.jar" todir="${releaseDir}/lib"/>
- <copy file="lib/junit.jar" todir="${releaseDir}/lib"/>
<zip zipfile="${finalLoc}/htmlparser${versionTag}.zip"
basedir="${releaseDir}"/>
--- 340,346 ----
<echo message="**********************************"/>
+ <copy file="lib/commons-logging.jar" todir="${dist}/lib"/>
+ <copy file="lib/junit.jar" todir="${dist}/lib"/>
<mkdir dir="${finalLoc}"/>
<zip zipfile="${finalLoc}/htmlparser${versionTag}.zip"
basedir="${releaseDir}"/>
|
|
From: <der...@us...> - 2003-08-11 00:38:03
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests
In directory sc8-pr-cvs1:/tmp/cvs-serv7155/src/org/htmlparser/tests/visitorsTests
Modified Files:
AllTests.java CompositeTagFindingVisitorTest.java HtmlPageTest.java
LinkFindingVisitorTest.java NodeVisitorTest.java
StringFindingVisitorTest.java TagFindingVisitorTest.java
TextExtractingVisitorTest.java UrlModifyingVisitorTest.java
Log Message:
Update version headers to 1.4-20030810 and update changelog.
Index: AllTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/AllTests.java,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** AllTests.java 27 Jul 2003 19:19:24 -0000 1.29
--- AllTests.java 11 Aug 2003 00:18:34 -0000 1.30
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: CompositeTagFindingVisitorTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/CompositeTagFindingVisitorTest.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** CompositeTagFindingVisitorTest.java 27 Jul 2003 19:19:24 -0000 1.3
--- CompositeTagFindingVisitorTest.java 11 Aug 2003 00:18:35 -0000 1.4
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: HtmlPageTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/HtmlPageTest.java,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** HtmlPageTest.java 27 Jul 2003 19:19:24 -0000 1.6
--- HtmlPageTest.java 11 Aug 2003 00:18:35 -0000 1.7
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: LinkFindingVisitorTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/LinkFindingVisitorTest.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** LinkFindingVisitorTest.java 27 Jul 2003 19:19:24 -0000 1.3
--- LinkFindingVisitorTest.java 11 Aug 2003 00:18:35 -0000 1.4
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: NodeVisitorTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/NodeVisitorTest.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** NodeVisitorTest.java 27 Jul 2003 19:19:24 -0000 1.3
--- NodeVisitorTest.java 11 Aug 2003 00:18:35 -0000 1.4
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: StringFindingVisitorTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/StringFindingVisitorTest.java,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** StringFindingVisitorTest.java 27 Jul 2003 19:19:24 -0000 1.5
--- StringFindingVisitorTest.java 11 Aug 2003 00:18:35 -0000 1.6
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: TagFindingVisitorTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/TagFindingVisitorTest.java,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** TagFindingVisitorTest.java 27 Jul 2003 19:19:24 -0000 1.6
--- TagFindingVisitorTest.java 11 Aug 2003 00:18:35 -0000 1.7
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: TextExtractingVisitorTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/TextExtractingVisitorTest.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** TextExtractingVisitorTest.java 27 Jul 2003 19:19:24 -0000 1.3
--- TextExtractingVisitorTest.java 11 Aug 2003 00:18:35 -0000 1.4
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: UrlModifyingVisitorTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/UrlModifyingVisitorTest.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** UrlModifyingVisitorTest.java 27 Jul 2003 19:19:24 -0000 1.3
--- UrlModifyingVisitorTest.java 11 Aug 2003 00:18:35 -0000 1.4
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests
In directory sc8-pr-cvs1:/tmp/cvs-serv7155/src/org/htmlparser/tests/scannersTests
Modified Files:
AllTests.java AppletScannerTest.java BaseHREFScannerTest.java
BodyScannerTest.java BulletListScannerTest.java
BulletScannerTest.java CompositeTagScannerTest.java
DivScannerTest.java FormScannerTest.java FrameScannerTest.java
FrameSetScannerTest.java HeadScannerTest.java HtmlTest.java
ImageScannerTest.java InputTagScannerTest.java
JspScannerTest.java LabelScannerTest.java LinkScannerTest.java
MetaTagScannerTest.java OptionTagScannerTest.java
ScriptScannerTest.java SelectTagScannerTest.java
SpanScannerTest.java StyleScannerTest.java
TableScannerTest.java TagScannerTest.java
TextareaTagScannerTest.java TitleScannerTest.java
XmlEndTagScanningTest.java package.html
Log Message:
Update version headers to 1.4-20030810 and update changelog.
Index: AllTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/AllTests.java,v
retrieving revision 1.43
retrieving revision 1.44
diff -C2 -d -r1.43 -r1.44
*** AllTests.java 27 Jul 2003 19:19:20 -0000 1.43
--- AllTests.java 11 Aug 2003 00:18:32 -0000 1.44
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
***************
*** 19,23 ****
// Email :so...@ki...
//
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 19,23 ----
// Email :so...@ki...
//
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: AppletScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/AppletScannerTest.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** AppletScannerTest.java 27 Jul 2003 19:19:20 -0000 1.21
--- AppletScannerTest.java 11 Aug 2003 00:18:32 -0000 1.22
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: BaseHREFScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/BaseHREFScannerTest.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** BaseHREFScannerTest.java 27 Jul 2003 19:19:20 -0000 1.21
--- BaseHREFScannerTest.java 11 Aug 2003 00:18:32 -0000 1.22
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: BodyScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/BodyScannerTest.java,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** BodyScannerTest.java 27 Jul 2003 19:19:21 -0000 1.7
--- BodyScannerTest.java 11 Aug 2003 00:18:32 -0000 1.8
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: BulletListScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/BulletListScannerTest.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** BulletListScannerTest.java 27 Jul 2003 19:19:21 -0000 1.3
--- BulletListScannerTest.java 11 Aug 2003 00:18:32 -0000 1.4
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: BulletScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/BulletScannerTest.java,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** BulletScannerTest.java 27 Jul 2003 19:19:21 -0000 1.4
--- BulletScannerTest.java 11 Aug 2003 00:18:32 -0000 1.5
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: CompositeTagScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/CompositeTagScannerTest.java,v
retrieving revision 1.31
retrieving revision 1.32
diff -C2 -d -r1.31 -r1.32
*** CompositeTagScannerTest.java 27 Jul 2003 19:19:21 -0000 1.31
--- CompositeTagScannerTest.java 11 Aug 2003 00:18:32 -0000 1.32
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: DivScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/DivScannerTest.java,v
retrieving revision 1.27
retrieving revision 1.28
diff -C2 -d -r1.27 -r1.28
*** DivScannerTest.java 27 Jul 2003 19:19:21 -0000 1.27
--- DivScannerTest.java 11 Aug 2003 00:18:32 -0000 1.28
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: FormScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/FormScannerTest.java,v
retrieving revision 1.27
retrieving revision 1.28
diff -C2 -d -r1.27 -r1.28
*** FormScannerTest.java 27 Jul 2003 19:19:21 -0000 1.27
--- FormScannerTest.java 11 Aug 2003 00:18:32 -0000 1.28
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: FrameScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/FrameScannerTest.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** FrameScannerTest.java 27 Jul 2003 19:19:21 -0000 1.21
--- FrameScannerTest.java 11 Aug 2003 00:18:32 -0000 1.22
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: FrameSetScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/FrameSetScannerTest.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** FrameSetScannerTest.java 27 Jul 2003 19:19:21 -0000 1.21
--- FrameSetScannerTest.java 11 Aug 2003 00:18:32 -0000 1.22
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: HeadScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/HeadScannerTest.java,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** HeadScannerTest.java 27 Jul 2003 19:19:21 -0000 1.10
--- HeadScannerTest.java 11 Aug 2003 00:18:32 -0000 1.11
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: HtmlTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/HtmlTest.java,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** HtmlTest.java 27 Jul 2003 19:19:21 -0000 1.5
--- HtmlTest.java 11 Aug 2003 00:18:32 -0000 1.6
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: ImageScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/ImageScannerTest.java,v
retrieving revision 1.24
retrieving revision 1.25
diff -C2 -d -r1.24 -r1.25
*** ImageScannerTest.java 27 Jul 2003 19:19:21 -0000 1.24
--- ImageScannerTest.java 11 Aug 2003 00:18:32 -0000 1.25
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: InputTagScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/InputTagScannerTest.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** InputTagScannerTest.java 27 Jul 2003 19:19:21 -0000 1.21
--- InputTagScannerTest.java 11 Aug 2003 00:18:32 -0000 1.22
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: JspScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/JspScannerTest.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** JspScannerTest.java 27 Jul 2003 19:19:21 -0000 1.22
--- JspScannerTest.java 11 Aug 2003 00:18:33 -0000 1.23
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: LabelScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/LabelScannerTest.java,v
retrieving revision 1.30
retrieving revision 1.31
diff -C2 -d -r1.30 -r1.31
*** LabelScannerTest.java 2 Aug 2003 16:22:57 -0000 1.30
--- LabelScannerTest.java 11 Aug 2003 00:18:33 -0000 1.31
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: LinkScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/LinkScannerTest.java,v
retrieving revision 1.31
retrieving revision 1.32
diff -C2 -d -r1.31 -r1.32
*** LinkScannerTest.java 27 Jul 2003 19:19:21 -0000 1.31
--- LinkScannerTest.java 11 Aug 2003 00:18:33 -0000 1.32
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: MetaTagScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/MetaTagScannerTest.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** MetaTagScannerTest.java 27 Jul 2003 19:19:21 -0000 1.22
--- MetaTagScannerTest.java 11 Aug 2003 00:18:33 -0000 1.23
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: OptionTagScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/OptionTagScannerTest.java,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** OptionTagScannerTest.java 27 Jul 2003 19:19:21 -0000 1.23
--- OptionTagScannerTest.java 11 Aug 2003 00:18:33 -0000 1.24
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: ScriptScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/ScriptScannerTest.java,v
retrieving revision 1.32
retrieving revision 1.33
diff -C2 -d -r1.32 -r1.33
*** ScriptScannerTest.java 27 Jul 2003 19:19:21 -0000 1.32
--- ScriptScannerTest.java 11 Aug 2003 00:18:33 -0000 1.33
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: SelectTagScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/SelectTagScannerTest.java,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** SelectTagScannerTest.java 27 Jul 2003 19:19:21 -0000 1.23
--- SelectTagScannerTest.java 11 Aug 2003 00:18:33 -0000 1.24
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: SpanScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/SpanScannerTest.java,v
retrieving revision 1.24
retrieving revision 1.25
diff -C2 -d -r1.24 -r1.25
*** SpanScannerTest.java 27 Jul 2003 19:19:21 -0000 1.24
--- SpanScannerTest.java 11 Aug 2003 00:18:33 -0000 1.25
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: StyleScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/StyleScannerTest.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** StyleScannerTest.java 27 Jul 2003 19:19:21 -0000 1.22
--- StyleScannerTest.java 11 Aug 2003 00:18:33 -0000 1.23
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: TableScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/TableScannerTest.java,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** TableScannerTest.java 27 Jul 2003 19:19:21 -0000 1.28
--- TableScannerTest.java 11 Aug 2003 00:18:33 -0000 1.29
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: TagScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/TagScannerTest.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** TagScannerTest.java 27 Jul 2003 19:19:21 -0000 1.22
--- TagScannerTest.java 11 Aug 2003 00:18:33 -0000 1.23
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: TextareaTagScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/TextareaTagScannerTest.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** TextareaTagScannerTest.java 27 Jul 2003 19:19:21 -0000 1.21
--- TextareaTagScannerTest.java 11 Aug 2003 00:18:33 -0000 1.22
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: TitleScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/TitleScannerTest.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** TitleScannerTest.java 27 Jul 2003 19:19:21 -0000 1.22
--- TitleScannerTest.java 11 Aug 2003 00:18:33 -0000 1.23
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: XmlEndTagScanningTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/XmlEndTagScanningTest.java,v
retrieving revision 1.24
retrieving revision 1.25
diff -C2 -d -r1.24 -r1.25
*** XmlEndTagScanningTest.java 27 Jul 2003 19:19:21 -0000 1.24
--- XmlEndTagScanningTest.java 11 Aug 2003 00:18:33 -0000 1.25
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: package.html
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/package.html,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** package.html 27 Jul 2003 19:19:21 -0000 1.10
--- package.html 11 Aug 2003 00:18:33 -0000 1.11
***************
*** 6,10 ****
@(#)package.html 1.60 98/01/27
! HTMLParser Library v1_4_20030727 - A java-based parser for HTML
Copyright (C) Dec 31, 2000 Somik Raha
--- 6,10 ----
@(#)package.html 1.60 98/01/27
! HTMLParser Library v1_4_20030810 - A java-based parser for HTML
Copyright (C) Dec 31, 2000 Somik Raha
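Every hunk in this batch makes the same one-line substitution: whatever version token follows `HTMLParser Library` (both `v1_3_20030727` and `v1_4_20030727` appear as old values, in `//` comments and in package.html) becomes `v1_4_20030810`. The archive does not say how the edit was made; the sketch below only illustrates that the change can be expressed as a single regular-expression replacement. The class `VersionHeaderBump` and its pattern are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: rewrite the version token in an HTMLParser header line. */
public class VersionHeaderBump {
    // Matches e.g. "HTMLParser Library v1_3_20030727"; group 1 keeps the prefix.
    private static final Pattern VERSION =
        Pattern.compile("(HTMLParser Library )v\\d+_\\d+_\\d{8}");

    /** Replaces the first version token on the line; other lines pass through unchanged. */
    static String bump(String line, String newVersion) {
        Matcher m = VERSION.matcher(line);
        // quoteReplacement stops '$' or '\' in newVersion being read as group syntax.
        return m.replaceFirst("$1" + Matcher.quoteReplacement(newVersion));
    }

    public static void main(String[] args) {
        String[] lines = {
            "// HTMLParser Library v1_3_20030727 - A java-based parser for HTML",
            "HTMLParser Library v1_4_20030727 - A java-based parser for HTML",
            "// Copyright (C) Dec 31, 2000 Somik Raha"
        };
        for (String line : lines)
            System.out.println(bump(line, "v1_4_20030810"));
    }
}
```

Anchoring the pattern on the `HTMLParser Library ` prefix means the same rule covers the Java comment headers and the plain-text line in package.html, while leaving unrelated lines such as the copyright notice untouched.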
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests
In directory sc8-pr-cvs1:/tmp/cvs-serv7155/src/org/htmlparser/tests/tagTests
Modified Files:
AllTests.java AppletTagTest.java BaseHrefTagTest.java
BodyTagTest.java CompositeTagTest.java DoctypeTagTest.java
EndTagTest.java FormTagTest.java FrameSetTagTest.java
FrameTagTest.java ImageTagTest.java InputTagTest.java
JspTagTest.java LinkTagTest.java MetaTagTest.java
ObjectCollectionTest.java OptionTagTest.java
ScriptTagTest.java SelectTagTest.java StyleTagTest.java
TagTest.java TextareaTagTest.java TitleTagTest.java
package.html
Log Message:
Update version headers to 1.4-20030810 and update changelog.
Index: AllTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/AllTests.java,v
retrieving revision 1.38
retrieving revision 1.39
diff -C2 -d -r1.38 -r1.39
*** AllTests.java 27 Jul 2003 19:19:21 -0000 1.38
--- AllTests.java 11 Aug 2003 00:18:33 -0000 1.39
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: AppletTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/AppletTagTest.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** AppletTagTest.java 27 Jul 2003 19:19:22 -0000 1.22
--- AppletTagTest.java 11 Aug 2003 00:18:33 -0000 1.23
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: BaseHrefTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/BaseHrefTagTest.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** BaseHrefTagTest.java 27 Jul 2003 19:19:23 -0000 1.21
--- BaseHrefTagTest.java 11 Aug 2003 00:18:33 -0000 1.22
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: BodyTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/BodyTagTest.java,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** BodyTagTest.java 27 Jul 2003 19:19:23 -0000 1.7
--- BodyTagTest.java 11 Aug 2003 00:18:33 -0000 1.8
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: CompositeTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/CompositeTagTest.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** CompositeTagTest.java 27 Jul 2003 19:19:23 -0000 1.2
--- CompositeTagTest.java 11 Aug 2003 00:18:33 -0000 1.3
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: DoctypeTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/DoctypeTagTest.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** DoctypeTagTest.java 27 Jul 2003 19:19:23 -0000 1.22
--- DoctypeTagTest.java 11 Aug 2003 00:18:33 -0000 1.23
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: EndTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/EndTagTest.java,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** EndTagTest.java 27 Jul 2003 19:19:23 -0000 1.23
--- EndTagTest.java 11 Aug 2003 00:18:33 -0000 1.24
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: FormTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/FormTagTest.java,v
retrieving revision 1.26
retrieving revision 1.27
diff -C2 -d -r1.26 -r1.27
*** FormTagTest.java 27 Jul 2003 19:19:23 -0000 1.26
--- FormTagTest.java 11 Aug 2003 00:18:33 -0000 1.27
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: FrameSetTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/FrameSetTagTest.java,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** FrameSetTagTest.java 2 Aug 2003 16:22:58 -0000 1.23
--- FrameSetTagTest.java 11 Aug 2003 00:18:33 -0000 1.24
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: FrameTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/FrameTagTest.java,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** FrameTagTest.java 2 Aug 2003 16:22:58 -0000 1.23
--- FrameTagTest.java 11 Aug 2003 00:18:33 -0000 1.24
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: ImageTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/ImageTagTest.java,v
retrieving revision 1.25
retrieving revision 1.26
diff -C2 -d -r1.25 -r1.26
*** ImageTagTest.java 27 Jul 2003 19:19:23 -0000 1.25
--- ImageTagTest.java 11 Aug 2003 00:18:33 -0000 1.26
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: InputTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/InputTagTest.java,v
retrieving revision 1.24
retrieving revision 1.25
diff -C2 -d -r1.24 -r1.25
*** InputTagTest.java 2 Aug 2003 16:22:58 -0000 1.24
--- InputTagTest.java 11 Aug 2003 00:18:33 -0000 1.25
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: JspTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/JspTagTest.java,v
retrieving revision 1.25
retrieving revision 1.26
diff -C2 -d -r1.25 -r1.26
*** JspTagTest.java 27 Jul 2003 19:19:23 -0000 1.25
--- JspTagTest.java 11 Aug 2003 00:18:33 -0000 1.26
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: LinkTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/LinkTagTest.java,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** LinkTagTest.java 27 Jul 2003 19:19:23 -0000 1.28
--- LinkTagTest.java 11 Aug 2003 00:18:33 -0000 1.29
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: MetaTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/MetaTagTest.java,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** MetaTagTest.java 27 Jul 2003 19:19:23 -0000 1.23
--- MetaTagTest.java 11 Aug 2003 00:18:33 -0000 1.24
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: ObjectCollectionTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/ObjectCollectionTest.java,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** ObjectCollectionTest.java 27 Jul 2003 19:19:23 -0000 1.7
--- ObjectCollectionTest.java 11 Aug 2003 00:18:33 -0000 1.8
***************
*** 1,3 ****
! // HTMLParser Library v1_3_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: OptionTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/OptionTagTest.java,v
retrieving revision 1.24
retrieving revision 1.25
diff -C2 -d -r1.24 -r1.25
*** OptionTagTest.java 27 Jul 2003 19:19:23 -0000 1.24
--- OptionTagTest.java 11 Aug 2003 00:18:33 -0000 1.25
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: ScriptTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/ScriptTagTest.java,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** ScriptTagTest.java 27 Jul 2003 19:19:23 -0000 1.23
--- ScriptTagTest.java 11 Aug 2003 00:18:33 -0000 1.24
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: SelectTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/SelectTagTest.java,v
retrieving revision 1.25
retrieving revision 1.26
diff -C2 -d -r1.25 -r1.26
*** SelectTagTest.java 27 Jul 2003 19:19:23 -0000 1.25
--- SelectTagTest.java 11 Aug 2003 00:18:33 -0000 1.26
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: StyleTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/StyleTagTest.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** StyleTagTest.java 27 Jul 2003 19:19:23 -0000 1.22
--- StyleTagTest.java 11 Aug 2003 00:18:33 -0000 1.23
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: TagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/TagTest.java,v
retrieving revision 1.35
retrieving revision 1.36
diff -C2 -d -r1.35 -r1.36
*** TagTest.java 2 Aug 2003 16:22:58 -0000 1.35
--- TagTest.java 11 Aug 2003 00:18:33 -0000 1.36
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: TextareaTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/TextareaTagTest.java,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** TextareaTagTest.java 27 Jul 2003 19:19:23 -0000 1.23
--- TextareaTagTest.java 11 Aug 2003 00:18:33 -0000 1.24
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: TitleTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/TitleTagTest.java,v
retrieving revision 1.22
retrieving revision 1.23
diff -C2 -d -r1.22 -r1.23
*** TitleTagTest.java 27 Jul 2003 19:19:23 -0000 1.22
--- TitleTagTest.java 11 Aug 2003 00:18:33 -0000 1.23
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: package.html
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/package.html,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** package.html 27 Jul 2003 19:19:23 -0000 1.10
--- package.html 11 Aug 2003 00:18:33 -0000 1.11
***************
*** 6,10 ****
@(#)package.html 1.60 98/01/27
! HTMLParser Library v1_4_20030727 - A java-based parser for HTML
Copyright (C) Dec 31, 2000 Somik Raha
--- 6,10 ----
@(#)package.html 1.60 98/01/27
! HTMLParser Library v1_4_20030810 - A java-based parser for HTML
Copyright (C) Dec 31, 2000 Somik Raha
From: <der...@us...> - 2003-08-11 00:38:02
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/temporaryFailures
In directory sc8-pr-cvs1:/tmp/cvs-serv7155/src/org/htmlparser/tests/temporaryFailures
Modified Files:
AttributeParserTest.java TagParserTest.java
Log Message:
Update version headers to 1.4-20030810 and update changelog.
Index: AttributeParserTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/temporaryFailures/AttributeParserTest.java,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** AttributeParserTest.java 2 Aug 2003 16:22:58 -0000 1.6
--- AttributeParserTest.java 11 Aug 2003 00:18:33 -0000 1.7
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
Index: TagParserTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/temporaryFailures/TagParserTest.java,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** TagParserTest.java 2 Aug 2003 16:22:58 -0000 1.6
--- TagParserTest.java 11 Aug 2003 00:18:34 -0000 1.7
***************
*** 1,3 ****
! // HTMLParser Library v1_4_20030727 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//
--- 1,3 ----
! // HTMLParser Library v1_4_20030810 - A java-based parser for HTML
// Copyright (C) Dec 31, 2000 Somik Raha
//