htmlparser-cvs Mailing List for HTML Parser (Page 18)
Brought to you by:
derrickoswald
From: Derrick O. <der...@us...> - 2004-05-24 16:18:54
Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19028

Modified Files:
	build.xml
Log Message:
Part three of a multiphase refactoring.
The three node types are now fronted by interfaces (program to the interface
paradigm) with concrete implementations in the new htmlparser.nodes package.
Classes from the lexer.nodes package are moved to this package, and obvious
references to the concrete classes that got broken by this have been changed
to use the interfaces where possible.

Index: build.xml
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v
retrieving revision 1.64
retrieving revision 1.65
diff -C2 -d -r1.64 -r1.65
*** build.xml 22 May 2004 11:35:50 -0000 1.64
--- build.xml 24 May 2004 16:18:10 -0000 1.65
***************
*** 198,204 ****
    <target name="compilelexer" description="compile lexer java files">
      <javac srcdir="${src}" debug="on" classpath="src:${commons-logging.jar}" target="1.1">
!       <include name="org/htmlparser/lexer/**/*.java"/>
!       <include name="org/htmlparser/AbstractNode.java"/>
        <include name="org/htmlparser/Node.java"/>
        <include name="org/htmlparser/util/ParserException.java"/>
        <include name="org/htmlparser/util/ChainedException.java"/>
--- 198,209 ----
    <target name="compilelexer" description="compile lexer java files">
      <javac srcdir="${src}" debug="on" classpath="src:${commons-logging.jar}" target="1.1">
!       <include name="org/htmlparser/lexer/*.java"/>
!       <include name="org/htmlparser/nodes/*.java"/>
!       <include name="org/htmlparser/Attribute.java"/>
        <include name="org/htmlparser/Node.java"/>
+       <include name="org/htmlparser/NodeFactory.java"/>
+       <include name="org/htmlparser/Remark.java"/>
+       <include name="org/htmlparser/Tag.java"/>
+       <include name="org/htmlparser/Text.java"/>
        <include name="org/htmlparser/util/ParserException.java"/>
        <include name="org/htmlparser/util/ChainedException.java"/>
***************
*** 228,236 ****
      <jar jarfile="${lib}/htmllexer.jar" basedir="${src}">
!       <include name="org/htmlparser/lexer/**/*.class"/>
!       <include name="org/htmlparser/AbstractNode.class"/>
        <include name="org/htmlparser/Node.class"/>
!       <include name="org/htmlparser/NodeFilter.class"/>
        <include name="org/htmlparser/Tag.class"/>
        <include name="org/htmlparser/util/ParserException.class"/>
        <include name="org/htmlparser/util/ChainedException.class"/>
--- 233,244 ----
      <jar jarfile="${lib}/htmllexer.jar" basedir="${src}">
!       <include name="org/htmlparser/lexer/*.class"/>
!       <include name="org/htmlparser/nodes/*.class"/>
!       <include name="org/htmlparser/Attribute.class"/>
        <include name="org/htmlparser/Node.class"/>
!       <include name="org/htmlparser/NodeFactory.class"/>
!       <include name="org/htmlparser/Remark.class"/>
        <include name="org/htmlparser/Tag.class"/>
+       <include name="org/htmlparser/Text.class"/>
        <include name="org/htmlparser/util/ParserException.class"/>
        <include name="org/htmlparser/util/ChainedException.class"/>
***************
*** 349,353 ****
        <group title="Example Applications" packages="org.htmlparser.parserapplications,org.htmlparser.lexerapplications.tabby,org.htmlparser.lexerapplications.thumbelina"/>
        <group title="Tags" packages="org.htmlparser.tags,org.htmlparser.tags.data"/>
!       <group title="Lexer" packages="org.htmlparser.lexer,org.htmlparser.lexer.nodes"/>
        <group title="Scanners" packages="org.htmlparser.scanners"/>
        <group title="Beans" packages="org.htmlparser.beans"/>
--- 357,361 ----
        <group title="Example Applications" packages="org.htmlparser.parserapplications,org.htmlparser.lexerapplications.tabby,org.htmlparser.lexerapplications.thumbelina"/>
        <group title="Tags" packages="org.htmlparser.tags,org.htmlparser.tags.data"/>
!       <group title="Lexer" packages="org.htmlparser.lexer"/>
        <group title="Scanners" packages="org.htmlparser.scanners"/>
        <group title="Beans" packages="org.htmlparser.beans"/>
From: Derrick O. <der...@us...> - 2004-05-24 16:18:53
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19028/src/org/htmlparser/beans

Modified Files:
	StringBean.java
Log Message:
Part three of a multiphase refactoring.
The three node types are now fronted by interfaces (program to the interface
paradigm) with concrete implementations in the new htmlparser.nodes package.
Classes from the lexer.nodes package are moved to this package, and obvious
references to the concrete classes that got broken by this have been changed
to use the interfaces where possible.

Index: StringBean.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/StringBean.java,v
retrieving revision 1.40
retrieving revision 1.41
diff -C2 -d -r1.40 -r1.41
*** StringBean.java 24 May 2004 00:38:16 -0000 1.40
--- StringBean.java 24 May 2004 16:18:12 -0000 1.41
***************
*** 33,37 ****
  import org.htmlparser.Parser;
! import org.htmlparser.StringNode;
  import org.htmlparser.tags.LinkTag;
  import org.htmlparser.Tag;
--- 33,37 ----
  import org.htmlparser.Parser;
! import org.htmlparser.Text;
  import org.htmlparser.tags.LinkTag;
  import org.htmlparser.Tag;
***************
*** 607,611 ****
       * @param string The text node.
       */
!     public void visitStringNode (StringNode string)
      {
          if (!mIsScript && !mIsStyle)
--- 607,611 ----
       * @param string The text node.
       */
!     public void visitStringNode (Text string)
      {
          if (!mIsScript && !mIsStyle)
From: Derrick O. <der...@us...> - 2004-05-24 16:18:50
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19028/src/org/htmlparser/tests/visitorsTests

Modified Files:
	HtmlPageTest.java NodeVisitorTest.java
Log Message:
Part three of a multiphase refactoring.
The three node types are now fronted by interfaces (program to the interface
paradigm) with concrete implementations in the new htmlparser.nodes package.
Classes from the lexer.nodes package are moved to this package, and obvious
references to the concrete classes that got broken by this have been changed
to use the interfaces where possible.

Index: HtmlPageTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/HtmlPageTest.java,v
retrieving revision 1.18
retrieving revision 1.19
diff -C2 -d -r1.18 -r1.19
*** HtmlPageTest.java 2 Jan 2004 16:24:57 -0000 1.18
--- HtmlPageTest.java 24 May 2004 16:18:34 -0000 1.19
***************
*** 28,32 ****
  import org.htmlparser.Node;
! import org.htmlparser.StringNode;
  import org.htmlparser.tags.TableColumn;
  import org.htmlparser.tags.TableRow;
--- 28,32 ----
  import org.htmlparser.Node;
! import org.htmlparser.Text;
  import org.htmlparser.tags.TableColumn;
  import org.htmlparser.tags.TableRow;
***************
*** 91,95 ****
          Node node = bodyNodes.elementAt(0);
          assertTrue("expected stringNode but was "+node.getClass().getName(),
!             node instanceof StringNode
          );
          assertStringEquals(
--- 91,95 ----
          Node node = bodyNodes.elementAt(0);
          assertTrue("expected stringNode but was "+node.getClass().getName(),
!             node instanceof Text
          );
          assertStringEquals(

Index: NodeVisitorTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/NodeVisitorTest.java,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** NodeVisitorTest.java 24 May 2004 00:38:19 -0000 1.15
--- NodeVisitorTest.java 24 May 2004 16:18:34 -0000 1.16
***************
*** 30,35 ****
  import java.util.Map;
- import org.htmlparser.StringNode;
  import org.htmlparser.Tag;
  import org.htmlparser.tests.ParserTestCase;
  import org.htmlparser.visitors.NodeVisitor;
--- 30,35 ----
  import java.util.Map;
  import org.htmlparser.Tag;
+ import org.htmlparser.Text;
  import org.htmlparser.tests.ParserTestCase;
  import org.htmlparser.visitors.NodeVisitor;
***************
*** 67,71 ****
              }
!             public void visitStringNode(StringNode stringNode) {
                  paramsMap.put(lastKeyVisited,stringNode.getText());
              }
--- 67,71 ----
              }
!             public void visitStringNode(Text stringNode) {
                  paramsMap.put(lastKeyVisited,stringNode.getText());
              }
From: Derrick O. <der...@us...> - 2004-05-24 16:18:50
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19028/src/org/htmlparser/visitors

Modified Files:
	NodeVisitor.java StringFindingVisitor.java
	TextExtractingVisitor.java UrlModifyingVisitor.java
Log Message:
Part three of a multiphase refactoring.
The three node types are now fronted by interfaces (program to the interface
paradigm) with concrete implementations in the new htmlparser.nodes package.
Classes from the lexer.nodes package are moved to this package, and obvious
references to the concrete classes that got broken by this have been changed
to use the interfaces where possible.

Index: NodeVisitor.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/NodeVisitor.java,v
retrieving revision 1.37
retrieving revision 1.38
diff -C2 -d -r1.37 -r1.38
*** NodeVisitor.java 24 May 2004 00:38:19 -0000 1.37
--- NodeVisitor.java 24 May 2004 16:18:36 -0000 1.38
***************
*** 27,32 ****
  package org.htmlparser.visitors;
! import org.htmlparser.RemarkNode;
! import org.htmlparser.StringNode;
  import org.htmlparser.Tag;
--- 27,32 ----
  package org.htmlparser.visitors;
! import org.htmlparser.Remark;
! import org.htmlparser.Text;
  import org.htmlparser.Tag;
***************
*** 133,137 ****
       * @param string The string node being visited.
       */
!     public void visitStringNode (StringNode string)
      {
      }
--- 133,137 ----
       * @param string The string node being visited.
       */
!     public void visitStringNode (Text string)
      {
      }
***************
*** 141,145 ****
       * @param remark The remark node being visited.
       */
!     public void visitRemarkNode (RemarkNode remark)
      {
      }
--- 141,145 ----
       * @param remark The remark node being visited.
       */
!     public void visitRemarkNode (Remark remark)
      {
      }

Index: TextExtractingVisitor.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/TextExtractingVisitor.java,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** TextExtractingVisitor.java 24 May 2004 00:38:19 -0000 1.41
--- TextExtractingVisitor.java 24 May 2004 16:18:36 -0000 1.42
***************
*** 27,31 ****
  package org.htmlparser.visitors;
! import org.htmlparser.StringNode;
  import org.htmlparser.Tag;
  import org.htmlparser.util.Translate;
--- 27,31 ----
  package org.htmlparser.visitors;
! import org.htmlparser.Text;
  import org.htmlparser.Tag;
  import org.htmlparser.util.Translate;
***************
*** 55,59 ****
      }
!     public void visitStringNode(StringNode stringNode) {
          String text = stringNode.getText();
          if (!preTagBeingProcessed) {
--- 55,59 ----
      }
!     public void visitStringNode(Text stringNode) {
          String text = stringNode.getText();
          if (!preTagBeingProcessed) {

Index: UrlModifyingVisitor.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/UrlModifyingVisitor.java,v
retrieving revision 1.44
retrieving revision 1.45
diff -C2 -d -r1.44 -r1.45
*** UrlModifyingVisitor.java 24 May 2004 00:38:19 -0000 1.44
--- UrlModifyingVisitor.java 24 May 2004 16:18:36 -0000 1.45
***************
*** 29,34 ****
  import org.htmlparser.Node;
  import org.htmlparser.Parser;
! import org.htmlparser.RemarkNode;
! import org.htmlparser.StringNode;
  import org.htmlparser.tags.CompositeTag;
  import org.htmlparser.tags.ImageTag;
--- 29,34 ----
  import org.htmlparser.Node;
  import org.htmlparser.Parser;
! import org.htmlparser.Remark;
! import org.htmlparser.Text;
  import org.htmlparser.tags.CompositeTag;
  import org.htmlparser.tags.ImageTag;
***************
*** 48,57 ****
      }
!     public void visitRemarkNode (RemarkNode remarkNode)
      {
          modifiedResult.append (remarkNode.toHtml());
      }
!     public void visitStringNode(StringNode stringNode)
      {
          modifiedResult.append (stringNode.toHtml());
--- 48,57 ----
      }
!     public void visitRemarkNode (Remark remarkNode)
      {
          modifiedResult.append (remarkNode.toHtml());
      }
!     public void visitStringNode(Text stringNode)
      {
          modifiedResult.append (stringNode.toHtml());

Index: StringFindingVisitor.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/StringFindingVisitor.java,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** StringFindingVisitor.java 25 Jan 2004 21:33:14 -0000 1.39
--- StringFindingVisitor.java 24 May 2004 16:18:36 -0000 1.40
***************
*** 29,33 ****
  import java.util.Locale;
! import org.htmlparser.StringNode;
  public class StringFindingVisitor extends NodeVisitor
--- 29,33 ----
  import java.util.Locale;
! import org.htmlparser.Text;
  public class StringFindingVisitor extends NodeVisitor
***************
*** 56,60 ****
      }
!     public void visitStringNode(StringNode stringNode)
      {
          String stringToBeSearched = stringNode.getText().toUpperCase(locale);
--- 56,60 ----
      }
!     public void visitStringNode(Text stringNode)
      {
          String stringToBeSearched = stringNode.getText().toUpperCase(locale);
From: Derrick O. <der...@us...> - 2004-05-24 16:18:50
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19028/src/org/htmlparser/lexer/nodes

Removed Files:
	Attribute.java NodeFactory.java PageAttribute.java TagNode.java
	package.html
Log Message:
Part three of a multiphase refactoring.
The three node types are now fronted by interfaces (program to the interface
paradigm) with concrete implementations in the new htmlparser.nodes package.
Classes from the lexer.nodes package are moved to this package, and obvious
references to the concrete classes that got broken by this have been changed
to use the interfaces where possible.

--- Attribute.java DELETED ---
--- package.html DELETED ---
--- NodeFactory.java DELETED ---
--- PageAttribute.java DELETED ---
--- TagNode.java DELETED ---
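The "program to the interface" arrangement described in the log message above can be sketched roughly as follows. This is an illustration only, not the library's code: the Node, Text and Remark interfaces here are cut-down stand-ins declaring just the methods the sketch needs, and SimpleText, SimpleRemark and InterfaceSketch are hypothetical names invented for the example.

```java
// Cut-down stand-ins for the interfaces named in the log message.
// The real org.htmlparser interfaces declare many more methods.
interface Node {
    String toHtml();
}

interface Text extends Node {
    String getText();
}

interface Remark extends Node {
    String getText();
}

// Hypothetical concrete classes, playing the role of the implementations
// that the log message says now live in the htmlparser.nodes package.
final class SimpleText implements Text {
    private final String text;

    SimpleText(String text) {
        this.text = text;
    }

    public String getText() {
        return text;
    }

    public String toHtml() {
        return text;
    }
}

final class SimpleRemark implements Remark {
    private final String text;

    SimpleRemark(String text) {
        this.text = text;
    }

    public String getText() {
        return text;
    }

    public String toHtml() {
        return "<!--" + text + "-->";
    }
}

final class InterfaceSketch {
    // Client code names only the interfaces, so the concrete classes can be
    // moved or replaced without touching this method.
    static String describe(Node node) {
        if (node instanceof Text) {
            return "text: " + ((Text) node).getText();
        }
        if (node instanceof Remark) {
            return "remark: " + ((Remark) node).getText();
        }
        return "other: " + node.toHtml();
    }

    public static void main(String[] args) {
        System.out.println(describe(new SimpleText("hello")));
        System.out.println(describe(new SimpleRemark("note")));
    }
}
```

Because describe() depends only on the interfaces, a test such as `node instanceof Text` keeps working no matter which package the concrete class lives in, which appears to be the point of the changes in the diffs above.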
From: Derrick O. <der...@us...> - 2004-05-24 11:08:12
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/nodes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18078/nodes

Log Message:
Directory /cvsroot/htmlparser/htmlparser/src/org/htmlparser/nodes added to the repository
From: Derrick O. <der...@us...> - 2004-05-24 00:38:58
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31556/beans

Modified Files:
	StringBean.java
Log Message:
Part two of a multiphase refactoring. Part one added the Tag interface.
This submission eliminates some of the duplication between the lexer.nodes
package and the htmlparser package by removing the tag specific signatures,
visitTitleTag, visitLinkTag and visitImageTag, from the NodeVisitor class.
This allows the lexer to return htmlparser level classes for StringNode and
RemarkNode. The TagNode is still present in the lexer.nodes package, but will
move next.
This means that classes derived from NodeVisitor *will not* work using the
above signatures; instead a check for tag class (or name) should be performed
in visitTag. A document will be added to the visitors package with
comprehensive porting instructions.

Index: StringBean.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/StringBean.java,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** StringBean.java 16 May 2004 17:59:57 -0000 1.39
--- StringBean.java 24 May 2004 00:38:16 -0000 1.40
***************
*** 35,39 ****
  import org.htmlparser.StringNode;
  import org.htmlparser.tags.LinkTag;
! import org.htmlparser.tags.Tag;
  import org.htmlparser.util.ParserException;
  import org.htmlparser.util.EncodingChangeException;
--- 35,39 ----
  import org.htmlparser.StringNode;
  import org.htmlparser.tags.LinkTag;
! import org.htmlparser.Tag;
  import org.htmlparser.util.ParserException;
  import org.htmlparser.util.EncodingChangeException;
***************
*** 604,621 ****
      /**
-      * Appends the link as text between angle brackets to the output.
-      * @param link The link to process.
-      */
-     public void visitLinkTag (LinkTag link)
-     {
-         if (getLinks ())
-         {
-             mBuffer.append ("<");
-             mBuffer.append (link.getLink ());
-             mBuffer.append (">");
-         }
-     }
-
-     /**
       * Appends the text to the output.
       * @param string The text node.
--- 604,607 ----
***************
*** 649,652 ****
--- 635,645 ----
          String name;
+         if (tag instanceof LinkTag)
+             if (getLinks ())
+             { // appends the link as text between angle brackets to the output.
+                 mBuffer.append ("<");
+                 mBuffer.append (((LinkTag)tag).getLink ());
+                 mBuffer.append (">");
+             }
          name = tag.getTagName ();
          if (name.equalsIgnoreCase ("PRE"))
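The porting step mentioned in the log message (replace an override of visitLinkTag with a class or name check inside visitTag) might look roughly like the sketch below. It is self-contained and uses cut-down stand-ins rather than the real library: Tag, LinkTag and NodeVisitor are reduced to the members needed here, while PlainTag and LinkCollectingVisitor are hypothetical names made up for the example.

```java
// Cut-down stand-ins for the library types; only the members this
// sketch needs are declared.
interface Tag {
    String getTagName();
}

// Hypothetical generic tag, identified by name only.
class PlainTag implements Tag {
    private final String name;

    PlainTag(String name) {
        this.name = name;
    }

    public String getTagName() {
        return name;
    }
}

class LinkTag implements Tag {
    private final String link;

    LinkTag(String link) {
        this.link = link;
    }

    public String getTagName() {
        return "A";
    }

    public String getLink() {
        return link;
    }
}

class NodeVisitor {
    // After the refactoring there are no tag-specific callbacks;
    // every start tag arrives through visitTag.
    public void visitTag(Tag tag) {
    }
}

// Hypothetical visitor ported from the old visitLinkTag style.
class LinkCollectingVisitor extends NodeVisitor {
    private final StringBuilder buffer = new StringBuilder();
    private int preCount;

    @Override
    public void visitTag(Tag tag) {
        if (tag instanceof LinkTag) {
            // check by tag class, as the revised StringBean does
            buffer.append('<').append(((LinkTag) tag).getLink()).append('>');
        } else if (tag.getTagName().equalsIgnoreCase("PRE")) {
            // check by tag name
            preCount++;
        }
    }

    String collected() {
        return buffer.toString();
    }

    int preCount() {
        return preCount;
    }
}
```

A visitor written this way receives every tag and decides for itself which ones matter, so it no longer relies on the visitor base class declaring one method per tag type.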
From: Derrick O. <der...@us...> - 2004-05-24 00:38:58
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31556/lexer

Modified Files:
	Lexer.java
Log Message:
Part two of a multiphase refactoring. Part one added the Tag interface.
This submission eliminates some of the duplication between the lexer.nodes
package and the htmlparser package by removing the tag specific signatures,
visitTitleTag, visitLinkTag and visitImageTag, from the NodeVisitor class.
This allows the lexer to return htmlparser level classes for StringNode and
RemarkNode. The TagNode is still present in the lexer.nodes package, but will
move next.
This means that classes derived from NodeVisitor *will not* work using the
above signatures; instead a check for tag class (or name) should be performed
in visitTag. A document will be added to the visitors package with
comprehensive porting instructions.

Index: Lexer.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Lexer.java,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** Lexer.java 22 May 2004 03:57:29 -0000 1.28
--- Lexer.java 24 May 2004 00:38:16 -0000 1.29
***************
*** 38,43 ****
  import org.htmlparser.lexer.nodes.PageAttribute;
  import org.htmlparser.lexer.nodes.NodeFactory;
! import org.htmlparser.lexer.nodes.RemarkNode;
! import org.htmlparser.lexer.nodes.StringNode;
  import org.htmlparser.lexer.nodes.TagNode;
  import org.htmlparser.util.ParserException;
--- 38,43 ----
  import org.htmlparser.lexer.nodes.PageAttribute;
  import org.htmlparser.lexer.nodes.NodeFactory;
! import org.htmlparser.RemarkNode;
! import org.htmlparser.StringNode;
  import org.htmlparser.lexer.nodes.TagNode;
  import org.htmlparser.util.ParserException;
From: Derrick O. <der...@us...> - 2004-05-24 00:38:58
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31556/lexer/nodes

Modified Files:
	TagNode.java
Removed Files:
	RemarkNode.java StringNode.java
Log Message:
Part two of a multiphase refactoring. Part one added the Tag interface.
This submission eliminates some of the duplication between the lexer.nodes
package and the htmlparser package by removing the tag specific signatures,
visitTitleTag, visitLinkTag and visitImageTag, from the NodeVisitor class.
This allows the lexer to return htmlparser level classes for StringNode and
RemarkNode. The TagNode is still present in the lexer.nodes package, but will
move next.
This means that classes derived from NodeVisitor *will not* work using the
above signatures; instead a check for tag class (or name) should be performed
in visitTag. A document will be added to the visitors package with
comprehensive porting instructions.

--- StringNode.java DELETED ---
--- RemarkNode.java DELETED ---

Index: TagNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes/TagNode.java,v
retrieving revision 1.34
retrieving revision 1.35
diff -C2 -d -r1.34 -r1.35
*** TagNode.java 6 Apr 2004 10:51:57 -0000 1.34
--- TagNode.java 24 May 2004 00:38:16 -0000 1.35
***************
*** 40,43 ****
--- 40,44 ----
  import org.htmlparser.util.SpecialHashtable;
  import org.htmlparser.util.Translate;
+ import org.htmlparser.visitors.NodeVisitor;
  /**
***************
*** 717,722 ****
      }
!     public void accept (Object visitor)
      {
      }
--- 718,732 ----
      }
!     /**
!      * Default tag visiting code.
!      * Based on <code>isEndTag()</code>, calls either <code>visitTag()</code> or
!      * <code>visitEndTag()</code>.
!      */
!     public void accept (NodeVisitor visitor)
      {
+         if (isEndTag ())
+             visitor.visitEndTag (this);
+         else
+             visitor.visitTag (this);
      }
From: Derrick O. <der...@us...> - 2004-05-24 00:38:58
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/nodeDecorators
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31556/nodeDecorators

Modified Files:
	AbstractNodeDecorator.java
Log Message:
Part two of a multiphase refactoring. Part one added the Tag interface.
This submission eliminates some of the duplication between the lexer.nodes
package and the htmlparser package by removing the tag specific signatures,
visitTitleTag, visitLinkTag and visitImageTag, from the NodeVisitor class.
This allows the lexer to return htmlparser level classes for StringNode and
RemarkNode. The TagNode is still present in the lexer.nodes package, but will
move next.
This means that classes derived from NodeVisitor *will not* work using the
above signatures; instead a check for tag class (or name) should be performed
in visitTag. A document will be added to the visitors package with
comprehensive porting instructions.

Index: AbstractNodeDecorator.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/nodeDecorators/AbstractNodeDecorator.java,v
retrieving revision 1.18
retrieving revision 1.19
diff -C2 -d -r1.18 -r1.19
*** AbstractNodeDecorator.java 2 Jan 2004 16:24:54 -0000 1.18
--- AbstractNodeDecorator.java 24 May 2004 00:38:17 -0000 1.19
***************
*** 31,34 ****
--- 31,35 ----
  import org.htmlparser.util.NodeList;
  import org.htmlparser.util.ParserException;
+ import org.htmlparser.visitors.NodeVisitor;
  public abstract class AbstractNodeDecorator implements Node {
***************
*** 39,44 ****
      }
!     public void accept(Object visitor) {
!         delegate.accept(visitor);
      }
--- 40,46 ----
      }
!     public void accept (NodeVisitor visitor)
!     {
!         delegate.accept (visitor);
      }
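The change from accept (Object) to accept (NodeVisitor) in the diffs above removes the cast inside each node and lets the compiler check the visitor type. A rough, self-contained sketch of that dispatch is given below, assuming simplified stand-in types: Node, NodeVisitor and TagNode are reduced to a few members, and DelegatingNode and RecordingVisitor are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the visitor base class.
class NodeVisitor {
    public void visitTag(TagNode tag) {
    }

    public void visitEndTag(TagNode tag) {
    }
}

// Simplified stand-in: accept takes a typed visitor instead of Object.
interface Node {
    void accept(NodeVisitor visitor);
}

class TagNode implements Node {
    private final String name;
    private final boolean endTag;

    TagNode(String name, boolean endTag) {
        this.name = name;
        this.endTag = endTag;
    }

    String getTagName() {
        return name;
    }

    boolean isEndTag() {
        return endTag;
    }

    // Mirrors the default tag visiting code in the TagNode diff:
    // end tags go to visitEndTag, everything else to visitTag.
    public void accept(NodeVisitor visitor) {
        if (isEndTag()) {
            visitor.visitEndTag(this);
        } else {
            visitor.visitTag(this);
        }
    }
}

// Hypothetical decorator that forwards accept to the wrapped node,
// in the manner of AbstractNodeDecorator.
class DelegatingNode implements Node {
    private final Node delegate;

    DelegatingNode(Node delegate) {
        this.delegate = delegate;
    }

    public void accept(NodeVisitor visitor) {
        delegate.accept(visitor);
    }
}

// Hypothetical visitor that records the callbacks it receives.
class RecordingVisitor extends NodeVisitor {
    final List<String> events = new ArrayList<>();

    @Override
    public void visitTag(TagNode tag) {
        events.add("start " + tag.getTagName());
    }

    @Override
    public void visitEndTag(TagNode tag) {
        events.add("end " + tag.getTagName());
    }
}
```

With the typed signature, passing something that is not a NodeVisitor fails at compile time rather than with a ClassCastException at run time, which is presumably part of the motivation for the change.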
From: Derrick O. <der...@us...> - 2004-05-24 00:38:57
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31556/filters

Modified Files:
	StringFilter.java
Log Message:
Part two of a multiphase refactoring. Part one added the Tag interface.
This submission eliminates some of the duplication between the lexer.nodes
package and the htmlparser package by removing the tag specific signatures,
visitTitleTag, visitLinkTag and visitImageTag, from the NodeVisitor class.
This allows the lexer to return htmlparser level classes for StringNode and
RemarkNode. The TagNode is still present in the lexer.nodes package, but will
move next.
This means that classes derived from NodeVisitor *will not* work using the
above signatures; instead a check for tag class (or name) should be performed
in visitTag. A document will be added to the visitors package with
comprehensive porting instructions.

Index: StringFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/StringFilter.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** StringFilter.java 25 Jan 2004 21:32:58 -0000 1.2
--- StringFilter.java 24 May 2004 00:38:16 -0000 1.3
***************
*** 31,35 ****
  import org.htmlparser.Node;
  import org.htmlparser.NodeFilter;
! import org.htmlparser.lexer.nodes.StringNode;
  /**
--- 31,35 ----
  import org.htmlparser.Node;
  import org.htmlparser.NodeFilter;
! import org.htmlparser.StringNode;
  /**
From: Derrick O. <der...@us...> - 2004-05-24 00:38:57
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31556

Modified Files:
	AbstractNode.java Node.java RemarkNode.java StringNode.java
Log Message:
Part two of a multiphase refactoring. Part one added the Tag interface.
This submission eliminates some of the duplication between the lexer.nodes
package and the htmlparser package by removing the tag specific signatures,
visitTitleTag, visitLinkTag and visitImageTag, from the NodeVisitor class.
This allows the lexer to return htmlparser level classes for StringNode and
RemarkNode. The TagNode is still present in the lexer.nodes package, but will
move next.
This means that classes derived from NodeVisitor *will not* work using the
above signatures; instead a check for tag class (or name) should be performed
in visitTag. A document will be added to the visitors package with
comprehensive porting instructions.

Index: StringNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/StringNode.java,v
retrieving revision 1.49
retrieving revision 1.50
diff -C2 -d -r1.49 -r1.50
*** StringNode.java 14 Jan 2004 02:53:46 -0000 1.49
--- StringNode.java 24 May 2004 00:38:15 -0000 1.50
***************
*** 1,5 ****
  // HTMLParser Library $Name$ - A java-based parser for HTML
  // http://sourceforge.org/projects/htmlparser
! // Copyright (C) 2004 Somik Raha
  //
  // Revision Control Information
--- 1,5 ----
  // HTMLParser Library $Name$ - A java-based parser for HTML
  // http://sourceforge.org/projects/htmlparser
! // Copyright (C) 2004 Derrick Oswald
  //
  // Revision Control Information
***************
*** 27,42 ****
  package org.htmlparser;
  import org.htmlparser.lexer.Page;
  import org.htmlparser.visitors.NodeVisitor;
  /**
!  * Normal text in the html document is identified and represented by this class.
   */
! public class StringNode
!     extends
!         org.htmlparser.lexer.nodes.StringNode
  {
      /**
!      * Constructor takes in the text string, beginning and ending posns.
       * @param page The page this string is on.
       * @param start The beginning position of the string.
--- 27,60 ----
  package org.htmlparser;
+ import org.htmlparser.AbstractNode;
+ import org.htmlparser.lexer.Cursor;
  import org.htmlparser.lexer.Page;
+ import org.htmlparser.util.NodeList;
+ import org.htmlparser.util.ParserException;
  import org.htmlparser.visitors.NodeVisitor;
  /**
!  * Normal text in the HTML document is represented by this class.
   */
! public class StringNode extends AbstractNode
  {
      /**
!      * The contents of the string node, or override text.
!      */
!     protected String mText;
!
!     /**
!      * Constructor takes in the text string.
!      * @param text The string node text. For correct generation of HTML, this
!      * should not contain representations of tags (unless they are balanced).
!      */
!     public StringNode (String text)
!     {
!         super (null, 0, 0);
!         setText (text);
!     }
!
!     /**
!      * Constructor takes in the page and beginning and ending posns.
       * @param page The page this string is on.
       * @param start The beginning position of the string.
***************
*** 46,49 ****
--- 64,198 ----
      {
          super (page, start, end);
+         mText = null;
+     }
+
+     /**
+      * Returns the text of the string line.
+      */
+     public String getText ()
+     {
+         return (toHtml ());
+     }
+
+     /**
+      * Sets the string contents of the node.
+      * @param text The new text for the node.
+      */
+     public void setText (String text)
+     {
+         mText = text;
+         nodeBegin = 0;
+         nodeEnd = mText.length ();
+     }
+
+     public String toPlainTextString ()
+     {
+         return (toHtml ());
+     }
+
+     public String toHtml ()
+     {
+         String ret;
+
+         ret = mText;
+         if (null == ret)
+             ret = mPage.getText (getStartPosition (), getEndPosition ());
+
+         return (ret);
+     }
+
+     /**
+      * Express this string node as a printable string
+      * This is suitable for display in a debugger or output to a printout.
+      * Control characters are replaced by their equivalent escape
+      * sequence and contents is truncated to 80 characters.
+      * @return A string representation of the string node.
+      */
+     public String toString ()
+     {
+         int startpos;
+         int endpos;
+         Cursor start;
+         Cursor end;
+         char c;
+         StringBuffer ret;
+
+         startpos = getStartPosition ();
+         endpos = getEndPosition ();
+         ret = new StringBuffer (endpos - startpos + 20);
+         if (null == mText)
+         {
+             start = new Cursor (getPage (), startpos);
+             end = new Cursor (getPage (), endpos);
+             ret.append ("Txt (");
+             ret.append (start);
+             ret.append (",");
+             ret.append (end);
+             ret.append ("): ");
+             while (start.getPosition () < endpos)
+             {
+                 try
+                 {
+                     c = mPage.getCharacter (start);
+                     switch (c)
+                     {
+                         case '\t':
+                             ret.append ("\\t");
+                             break;
+                         case '\n':
+                             ret.append ("\\n");
+                             break;
+                         case '\r':
+                             ret.append ("\\r");
+                             break;
+                         default:
+                             ret.append (c);
+                     }
+                 }
+                 catch (ParserException pe)
+                 {
+                     // not really expected, but we're only doing toString, so ignore
+                 }
+                 if (77 <= ret.length ())
+                 {
+                     ret.append ("...");
+                     break;
+                 }
+             }
+         }
+         else
+         {
+             ret.append ("Txt (");
+             ret.append (startpos);
+             ret.append (",");
+             ret.append (endpos);
+             ret.append ("): ");
+             while (startpos < endpos)
+             {
+                 c = mText.charAt (startpos);
+                 switch (c)
+                 {
+                     case '\t':
+                         ret.append ("\\t");
+                         break;
+                     case '\n':
+                         ret.append ("\\n");
+                         break;
+                     case '\r':
+                         ret.append ("\\r");
+                         break;
+                     default:
+                         ret.append (c);
+                 }
+                 if (77 <= ret.length ())
+                 {
+                     ret.append ("...");
+                     break;
+                 }
+                 startpos++;
+             }
+         }
+
+         return (ret.toString ());
      }
***************
*** 53,59 ****
       * <code>visitStringNode()</code> on.
       */
!     public void accept (Object visitor)
      {
!         ((NodeVisitor)visitor).visitStringNode (this);
      }
  }
--- 202,208 ----
       * <code>visitStringNode()</code> on.
       */
!     public void accept (NodeVisitor visitor)
      {
!         visitor.visitStringNode (this);
      }
  }

Index: RemarkNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/RemarkNode.java,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** RemarkNode.java 14 Jan 2004 02:53:46 -0000 1.41
--- RemarkNode.java 24 May 2004 00:38:15 -0000 1.42
***************
*** 1,5 ****
  // HTMLParser Library $Name$ - A java-based parser for HTML
  // http://sourceforge.org/projects/htmlparser
! // Copyright (C) 2004 Somik Raha
  //
  // Revision Control Information
--- 1,5 ----
  // HTMLParser Library $Name$ - A java-based parser for HTML
  // http://sourceforge.org/projects/htmlparser
! // Copyright (C) 2004 Derrick Oswald
  //
  // Revision Control Information
***************
*** 27,31 ****
--- 27,35 ----
  package org.htmlparser;
+ import org.htmlparser.AbstractNode;
+ import org.htmlparser.lexer.Cursor;
  import org.htmlparser.lexer.Page;
+ import org.htmlparser.util.NodeList;
+ import org.htmlparser.util.ParserException;
  import org.htmlparser.visitors.NodeVisitor;
***************
*** 33,49 ****
   * The remark tag is identified and represented by this class.
   */
! public class RemarkNode
!     extends
!         org.htmlparser.lexer.nodes.RemarkNode
  {
      /**
!      * Constructor takes in the text string, beginning and ending posns.
!      * @param page The page this string is on.
!      * @param start The beginning position of the string.
!      * @param end The ending positiong of the string.
       */
      public RemarkNode (Page page, int start, int end)
      {
          super (page, start, end);
      }
--- 37,228 ----
   * The remark tag is identified and represented by this class.
   */
! public class RemarkNode extends AbstractNode
  {
      /**
!      * The contents of the remark node, or override text.
!      */
!     protected String mText;
!
!     /**
!      * Constructor takes in the text string.
!      * @param text The string node text. For correct generation of HTML, this
!      * should not contain representations of tags (unless they are balanced).
!      */
!     public RemarkNode (String text)
!     {
!         super (null, 0, 0);
!         setText (text);
!     }
!
!     /**
!      * Constructor takes in the page and beginning and ending posns.
!      * @param page The page this remark is on.
!      * @param start The beginning position of the remark.
!      * @param end The ending positiong of the remark.
       */
      public RemarkNode (Page page, int start, int end)
      {
          super (page, start, end);
+         mText = null;
+     }
+
+     /**
+      * Returns the text contents of the comment tag.
+      * @return The contents of the text inside the comment delimiters.
+      */
+     public String getText()
+     {
+         int start;
+         int end;
+         String ret;
+
+         if (null == mText)
+         {
+             start = getStartPosition () + 4; // <!--
+             end = getEndPosition () - 3; // -->
+             if (start >= end)
+                 ret = "";
+             else
+                 ret = mPage.getText (start, end);
+         }
+         else
+             ret = mText;
+
+         return (ret);
+     }
+
+     /**
+      * Sets the string contents of the node.
+      * If the text has the remark delimiters (<!-- -->), these are stripped off.
+      * @param text The new text for the node.
+      */
+     public void setText (String text)
+     {
+         mText = text;
+         if (text.startsWith ("<!--") && text.endsWith ("-->"))
+             mText = text.substring (4, text.length () - 3);
+         nodeBegin = 0;
+         nodeEnd = mText.length ();
+     }
+
+     public String toPlainTextString()
+     {
+         return (getText());
+     }
+
+     public String toHtml()
+     {
+         StringBuffer buffer;
+         String ret;
+
+         if (null == mText)
+             ret = mPage.getText (getStartPosition (), getEndPosition ());
+         else
+         {
+             buffer = new StringBuffer (mText.length () + 7);
+             buffer.append ("<!--");
+             buffer.append (mText);
+             buffer.append ("-->");
+             ret = buffer.toString ();
+         }
+
+         return (ret);
+     }
+
+     /**
+      * Print the contents of the remark tag.
+      * This is suitable for display in a debugger or output to a printout.
+      * Control characters are replaced by their equivalent escape
+      * sequence and contents is truncated to 80 characters.
+      * @return A string representation of the remark node.
+      */
+     public String toString()
+     {
+         int startpos;
+         int endpos;
+         Cursor start;
+         Cursor end;
+         char c;
+         StringBuffer ret;
+
+         startpos = getStartPosition ();
+         endpos = getEndPosition ();
+         ret = new StringBuffer (endpos - startpos + 20);
+         if (null == mText)
+         {
+             start = new Cursor (getPage (), startpos);
+             end = new Cursor (getPage (), endpos);
+             ret.append ("Rem (");
+             ret.append (start);
+             ret.append (",");
+             ret.append (end);
+             ret.append ("): ");
+             start.setPosition (startpos + 4); // <!--
+             endpos -= 3; // -->
+             while (start.getPosition () < endpos)
+             {
+                 try
+                 {
+                     c = mPage.getCharacter (start);
+                     switch (c)
+                     {
+                         case '\t':
+                             ret.append ("\\t");
+                             break;
+                         case '\n':
+                             ret.append ("\\n");
+                             break;
+                         case '\r':
+                             ret.append ("\\r");
+                             break;
+                         default:
+                             ret.append (c);
+                     }
+                 }
+                 catch (ParserException pe)
+                 {
+                     // not really expected, but we're only doing toString, so ignore
+                 }
+                 if (77 <= ret.length ())
+                 {
+                     ret.append ("...");
+                     break;
+                 }
+             }
+         }
+         else
+         {
+             ret.append ("Rem (");
+             ret.append (startpos);
+             ret.append (",");
+             ret.append (endpos);
+             ret.append ("): ");
+             while (startpos < endpos)
+             {
+                 c = mText.charAt (startpos);
+                 switch (c)
+                 {
+                     case '\t':
+                         ret.append ("\\t");
+                         break;
+                     case '\n':
+                         ret.append ("\\n");
+                         break;
+                     case '\r':
+                         ret.append ("\\r");
+                         break;
+                     default:
+                         ret.append (c);
+                 }
+                 if (77 <= ret.length ())
+                 {
+                     ret.append ("...");
+                     break;
+                 }
+                 startpos++;
+             }
+         }
+
+         return (ret.toString ());
      }
***************
*** 53,59 ****
       * <code>visitRemarkNode()</code> on.
       */
!     public void accept (Object visitor)
      {
!         ((NodeVisitor)visitor).visitRemarkNode (this);
      }
  }
--- 232,238 ----
       * <code>visitRemarkNode()</code> on.
       */
!     public void accept (NodeVisitor visitor)
      {
!
visitor.visitRemarkNode (this); } } Index: Node.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Node.java,v retrieving revision 1.48 retrieving revision 1.49 diff -C2 -d -r1.48 -r1.49 *** Node.java 2 Jan 2004 16:24:52 -0000 1.48 --- Node.java 24 May 2004 00:38:15 -0000 1.49 *************** *** 29,32 **** --- 29,33 ---- import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; + import org.htmlparser.visitors.NodeVisitor; public interface Node *************** *** 135,139 **** * Apply the visitor object (of type NodeVisitor) to this node. */ ! public abstract void accept(Object visitor); /** --- 136,140 ---- * Apply the visitor object (of type NodeVisitor) to this node. */ ! public abstract void accept (NodeVisitor visitor); /** Index: AbstractNode.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/AbstractNode.java,v retrieving revision 1.24 retrieving revision 1.25 diff -C2 -d -r1.24 -r1.25 *** AbstractNode.java 2 Jan 2004 16:24:52 -0000 1.24 --- AbstractNode.java 24 May 2004 00:38:15 -0000 1.25 *************** *** 32,35 **** --- 32,36 ---- import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; + import org.htmlparser.visitors.NodeVisitor; /** *************** *** 219,223 **** } ! public abstract void accept(Object visitor); /** --- 220,224 ---- } ! public abstract void accept (NodeVisitor visitor); /** |
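The delimiter handling that the new RemarkNode.setText() and toHtml() perform can be sketched in isolation as follows. This is a minimal illustration, not the library class: RemarkTextSketch is a hypothetical stand-in with no Page or AbstractNode behind it, and the length check is an addition of the sketch. The committed setText() has no such check, so a string such as "<!-->" (which both starts with "<!--" and ends with "-->") would make its substring call throw.

```java
// Hypothetical stand-in for illustration only; it is not org.htmlparser.RemarkNode.
final class RemarkTextSketch
{
    // Remark contents without the surrounding <!-- and --> delimiters.
    private String text = "";

    // Stores the text, stripping the remark delimiters when both are present.
    void setText (String value)
    {
        text = value;
        // The length test is an assumption of this sketch: it keeps
        // substring() from being called with begin > end on "<!-->".
        if (value.length () >= 7 && value.startsWith ("<!--") && value.endsWith ("-->"))
            text = value.substring (4, value.length () - 3);
    }

    String getText ()
    {
        return (text);
    }

    // Re-adds the delimiters, as toHtml() does when override text is set.
    String toHtml ()
    {
        return ("<!--" + text + "-->");
    }
}
```

With this shape, text passed in with or without delimiters produces the same HTML, which appears to be the intent of the override-text constructor in the diff.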
From: Derrick O. <der...@us...> - 2004-05-24 00:38:34
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31556/visitors Modified Files: HtmlPage.java LinkFindingVisitor.java NodeVisitor.java ObjectFindingVisitor.java TagFindingVisitor.java TextExtractingVisitor.java UrlModifyingVisitor.java Log Message: Part two of a multiphase refactoring. Part one added the Tag interface. This submission eliminates some of the duplication between the lexer.nodes package and the htmlparser package by removing the tag specific signatures, visitTitleTag, visitLinkTag and visitImageTag, from the NodeVisitor class. This allows the lexer to return htmlparser level classes for StringNode and RemarkNode. The TagNode is still present in the lexer.nodes package, but will move next. This means that classes derived from NodeVisitor *will not* work using the above signatures; instead a check for tag class (or name) should be performed in visitTag. A document will be added to the visitors package with comprehensive porting instructions. Index: LinkFindingVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/LinkFindingVisitor.java,v retrieving revision 1.35 retrieving revision 1.36 diff -C2 -d -r1.35 -r1.36 *** LinkFindingVisitor.java 25 Jan 2004 21:33:14 -0000 1.35 --- LinkFindingVisitor.java 24 May 2004 00:38:19 -0000 1.36 *************** *** 28,34 **** import java.util.Locale; - import org.htmlparser.tags.LinkTag; public class LinkFindingVisitor extends NodeVisitor { --- 28,35 ---- import java.util.Locale; import org.htmlparser.tags.LinkTag; + import org.htmlparser.Tag; + public class LinkFindingVisitor extends NodeVisitor { *************** *** 49,56 **** } ! public void visitLinkTag(LinkTag linkTag) { ! if (-1 != linkTag.getLinkText ().toUpperCase (locale).indexOf (linkTextToFind)) ! count++; } --- 50,58 ---- } ! public void visitTag(Tag tag) { ! if (tag instanceof LinkTag) ! 
if (-1 != ((LinkTag)tag).getLinkText ().toUpperCase (locale).indexOf (linkTextToFind)) ! count++; } Index: TagFindingVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/TagFindingVisitor.java,v retrieving revision 1.41 retrieving revision 1.42 diff -C2 -d -r1.41 -r1.42 *** TagFindingVisitor.java 2 Jan 2004 16:24:58 -0000 1.41 --- TagFindingVisitor.java 24 May 2004 00:38:19 -0000 1.42 *************** *** 28,32 **** import org.htmlparser.Node; ! import org.htmlparser.tags.Tag; import org.htmlparser.util.NodeList; --- 28,32 ---- import org.htmlparser.Node; ! import org.htmlparser.Tag; import org.htmlparser.util.NodeList; Index: HtmlPage.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/HtmlPage.java,v retrieving revision 1.42 retrieving revision 1.43 diff -C2 -d -r1.42 -r1.43 *** HtmlPage.java 14 Jan 2004 02:53:47 -0000 1.42 --- HtmlPage.java 24 May 2004 00:38:19 -0000 1.43 *************** *** 30,34 **** import org.htmlparser.tags.BodyTag; import org.htmlparser.tags.TableTag; ! import org.htmlparser.tags.Tag; import org.htmlparser.tags.TitleTag; import org.htmlparser.util.NodeList; --- 30,34 ---- import org.htmlparser.tags.BodyTag; import org.htmlparser.tags.TableTag; ! 
import org.htmlparser.Tag; import org.htmlparser.tags.TitleTag; import org.htmlparser.util.NodeList; *************** *** 60,63 **** --- 60,65 ---- else if (isBodyTag(tag)) nodesInBody = tag.getChildren (); + else if (isTitleTag(tag)) + title = ((TitleTag)tag).getTitle(); } *************** *** 72,75 **** --- 74,82 ---- } + private boolean isTitleTag(Tag tag) + { + return (tag instanceof TitleTag); + } + public NodeList getBody() { return nodesInBody; *************** *** 82,89 **** return tableArr; } - - public void visitTitleTag(TitleTag titleTag) - { - title = titleTag.getTitle(); - } } --- 89,91 ---- Index: ObjectFindingVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/ObjectFindingVisitor.java,v retrieving revision 1.39 retrieving revision 1.40 diff -C2 -d -r1.39 -r1.40 *** ObjectFindingVisitor.java 2 Jan 2004 16:24:58 -0000 1.39 --- ObjectFindingVisitor.java 24 May 2004 00:38:19 -0000 1.40 *************** *** 28,32 **** import org.htmlparser.Node; ! import org.htmlparser.tags.Tag; import org.htmlparser.util.NodeList; --- 28,32 ---- import org.htmlparser.Node; ! import org.htmlparser.Tag; import org.htmlparser.util.NodeList; Index: NodeVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/NodeVisitor.java,v retrieving revision 1.36 retrieving revision 1.37 diff -C2 -d -r1.36 -r1.37 *** NodeVisitor.java 2 Jan 2004 16:24:58 -0000 1.36 --- NodeVisitor.java 24 May 2004 00:38:19 -0000 1.37 *************** *** 29,36 **** import org.htmlparser.RemarkNode; import org.htmlparser.StringNode; ! import org.htmlparser.tags.Tag; ! import org.htmlparser.tags.ImageTag; ! import org.htmlparser.tags.LinkTag; ! import org.htmlparser.tags.TitleTag; /** --- 29,33 ---- import org.htmlparser.RemarkNode; import org.htmlparser.StringNode; ! 
import org.htmlparser.Tag; /** *************** *** 43,49 **** * types of nodes encountered in depth-first order and finally * <code>finishedParsing()</code>.<p> - * There are currently three specialized <code>visitXXX()</code> calls for - * titles, images and links. Thes call their specialized visit, and then - * perform the generic processing. * Typical code to print all the link tags: * <pre> --- 40,43 ---- *************** *** 58,64 **** * { * } ! * public void visitLinkTag (LinkTag linkTag) * { ! * System.out.println (linkTag); * } * public static void main (String[] args) throws ParserException --- 52,59 ---- * { * } ! * public void visitTag (Tag tag) * { ! * if (tag instanceof LinkTag) ! * System.out.println (tag); * } * public static void main (String[] args) throws ParserException *************** *** 75,79 **** private boolean mRecurseChildren; private boolean mRecurseSelf; ! public NodeVisitor () { --- 70,77 ---- private boolean mRecurseChildren; private boolean mRecurseSelf; ! ! /** ! * Creates a node visitor that recurses itself and it's children. ! */ public NodeVisitor () { *************** *** 81,84 **** --- 79,88 ---- } + /** + * Creates a node visitor that recurses itself and it's children + * only if <code>recurseChildren</code> is <code>true</code>. + * @param recurseChildren If <code>true</code>, the visitor will + * visit children, otherwise only the top level nodes are recursed. + */ public NodeVisitor (boolean recurseChildren) { *************** *** 86,89 **** --- 90,102 ---- } + /** + * Creates a node visitor that recurses itself only if + * <code>recurseSelf</code> is <code>true</code> and it's children + * only if <code>recurseChildren</code> is <code>true</code>. + * @param recurseChildren If <code>true</code>, the visitor will + * visit children, otherwise only the top level nodes are recursed. + * @param recurseSelf If <code>true</code>, the visitor will + * visit the top level node. 
+ */ public NodeVisitor (boolean recurseChildren, boolean recurseSelf) { *************** *** 100,122 **** } public void visitTag (Tag tag) { - } public void visitEndTag (Tag tag) { - } ! public void visitStringNode (StringNode stringNode) { } ! public void visitRemarkNode (RemarkNode remarkNode) { - } ! /** * Override this method if you wish to do special --- 113,148 ---- } + /** + * Called for each <code>Tag</code> visited. + * @param tag The tag being visited. + */ public void visitTag (Tag tag) { } + /** + * Called for each <code>Tag</code> visited that is an end tag. + * @param tag The end tag being visited. + */ public void visitEndTag (Tag tag) { } ! /** ! * Called for each <code>StringNode</code> visited. ! * @param string The string node being visited. ! */ ! public void visitStringNode (StringNode string) { } ! /** ! * Called for each <code>RemarkNode</code> visited. ! * @param remark The remark node being visited. ! */ ! public void visitRemarkNode (RemarkNode remark) { } ! /** * Override this method if you wish to do special *************** *** 127,143 **** } ! public void visitLinkTag (LinkTag linkTag) ! { ! } ! ! public void visitImageTag (ImageTag imageTag) ! { ! } ! ! public void visitTitleTag (TitleTag titleTag) ! { ! ! } ! public boolean shouldRecurseChildren () { --- 153,160 ---- } ! /** ! * Depth traversal predicate. ! * @return <code>true</code> if children are to be visited. ! */ public boolean shouldRecurseChildren () { *************** *** 145,148 **** --- 162,169 ---- } + /** + * Self traversal predicate. + * @return <code>true</code> if a node itself is to be visited. 
+ */ public boolean shouldRecurseSelf () { Index: TextExtractingVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/TextExtractingVisitor.java,v retrieving revision 1.40 retrieving revision 1.41 diff -C2 -d -r1.40 -r1.41 *** TextExtractingVisitor.java 14 Jan 2004 02:53:47 -0000 1.40 --- TextExtractingVisitor.java 24 May 2004 00:38:19 -0000 1.41 *************** *** 28,32 **** import org.htmlparser.StringNode; ! import org.htmlparser.tags.Tag; import org.htmlparser.util.Translate; --- 28,32 ---- import org.htmlparser.StringNode; ! import org.htmlparser.Tag; import org.htmlparser.util.Translate; Index: UrlModifyingVisitor.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/visitors/UrlModifyingVisitor.java,v retrieving revision 1.43 retrieving revision 1.44 diff -C2 -d -r1.43 -r1.44 *** UrlModifyingVisitor.java 2 Jan 2004 16:24:58 -0000 1.43 --- UrlModifyingVisitor.java 24 May 2004 00:38:19 -0000 1.44 *************** *** 34,38 **** import org.htmlparser.tags.ImageTag; import org.htmlparser.tags.LinkTag; ! import org.htmlparser.tags.Tag; public class UrlModifyingVisitor extends NodeVisitor { --- 34,38 ---- import org.htmlparser.tags.ImageTag; import org.htmlparser.tags.LinkTag; ! import org.htmlparser.Tag; public class UrlModifyingVisitor extends NodeVisitor { *************** *** 48,59 **** } - public void visitLinkTag(LinkTag linkTag) { - linkTag.setLink(linkPrefix + linkTag.getLink()); - } - - public void visitImageTag(ImageTag imageTag) { - imageTag.setImageURL(linkPrefix + imageTag.getImageURL()); - } - public void visitRemarkNode (RemarkNode remarkNode) { --- 48,51 ---- *************** *** 67,71 **** public void visitTag(Tag tag) ! 
{ // process only those nodes that won't be processed by an end tag, // nodes without parents or parents without an end tag, since // the complete processing of all children should happen before --- 59,68 ---- public void visitTag(Tag tag) ! { ! if (tag instanceof LinkTag) ! ((LinkTag)tag).setLink(linkPrefix + ((LinkTag)tag).getLink()); ! else if (tag instanceof ImageTag) ! ((ImageTag)tag).setImageURL(linkPrefix + ((ImageTag)tag).getImageURL()); ! // process only those nodes that won't be processed by an end tag, // nodes without parents or parents without an end tag, since // the complete processing of all children should happen before |
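For visitors written against the removed visitLinkTag / visitImageTag / visitTitleTag signatures, the port described in the log message amounts to moving each body into visitTag behind a class check, as the UrlModifyingVisitor diff above does. A minimal sketch follows. All types in it are simplified stand-ins nested in a hypothetical VisitorPortingSketch class so that it compiles on its own; they only mimic the shape of org.htmlparser.Tag, LinkTag, ImageTag and NodeVisitor, and UrlCollectingVisitor is an invented example visitor.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; the real ones live in org.htmlparser.
final class VisitorPortingSketch
{
    interface Tag { }

    static class LinkTag implements Tag
    {
        private final String link;
        LinkTag (String link) { this.link = link; }
        String getLink () { return (link); }
    }

    static class ImageTag implements Tag
    {
        private final String url;
        ImageTag (String url) { this.url = url; }
        String getImageURL () { return (url); }
    }

    // After the refactoring only the generic callback remains.
    static class NodeVisitor
    {
        public void visitTag (Tag tag) { }
    }

    // Before: separate visitLinkTag(LinkTag) and visitImageTag(ImageTag) overrides.
    // After: one visitTag override that checks the concrete tag class.
    static class UrlCollectingVisitor extends NodeVisitor
    {
        final List<String> urls = new ArrayList<String> ();

        public void visitTag (Tag tag)
        {
            if (tag instanceof LinkTag)
                urls.add (((LinkTag)tag).getLink ());
            else if (tag instanceof ImageTag)
                urls.add (((ImageTag)tag).getImageURL ());
            // any other tag type is ignored
        }
    }
}
```

One consequence worth noting: an old-style override such as visitLinkTag still compiles as an ordinary method after the port, but nothing calls it any more, so forgotten overrides fail silently rather than at compile time.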
From: Derrick O. <der...@us...> - 2004-05-24 00:38:30
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31556/tests/lexerTests

Modified Files:
    LexerTests.java
Log Message:
Part two of a multiphase refactoring.
Part one added the Tag interface.
This submission eliminates some of the duplication between the lexer.nodes
package and the htmlparser package by removing the tag specific signatures,
visitTitleTag, visitLinkTag and visitImageTag, from the NodeVisitor class.
This allows the lexer to return htmlparser level classes for StringNode and
RemarkNode. The TagNode is still present in the lexer.nodes package, but
will move next.
This means that classes derived from NodeVisitor *will not* work using the
above signatures; instead a check for tag class (or name) should be
performed in visitTag.
A document will be added to the visitors package with comprehensive porting
instructions.

Index: LexerTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/LexerTests.java,v
retrieving revision 1.19
retrieving revision 1.20
diff -C2 -d -r1.19 -r1.20
*** LexerTests.java 18 Feb 2004 12:34:04 -0000 1.19
--- LexerTests.java 24 May 2004 00:38:18 -0000 1.20
***************
*** 34,39 ****
  import org.htmlparser.Parser;
  import org.htmlparser.lexer.Lexer;
! import org.htmlparser.lexer.nodes.RemarkNode;
! import org.htmlparser.lexer.nodes.StringNode;
  import org.htmlparser.lexer.nodes.TagNode;
  import org.htmlparser.tags.Tag;
--- 34,39 ----
  import org.htmlparser.Parser;
  import org.htmlparser.lexer.Lexer;
! import org.htmlparser.RemarkNode;
! import org.htmlparser.StringNode;
  import org.htmlparser.lexer.nodes.TagNode;
  import org.htmlparser.tags.Tag;
|
From: Derrick O. <der...@us...> - 2004-05-24 00:38:30
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31556/tests/visitorsTests Modified Files: NodeVisitorTest.java ScriptCommentTest.java Log Message: Part two of a multiphase refactoring. Part one added the Tag interface. This submission eliminates some of the duplication between the lexer.nodes package and the htmlparser package by removing the tag specific signatures, visitTitleTag, visitLinkTag and visitImageTag, from the NodeVisitor class. This allows the lexer to return htmlparser level classes for StringNode and RemarkNode. The TagNode is still present in the lexer.nodes package, but will move next. This means that classes derived from NodeVisitor *will not* work using the above signatures; instead a check for tag class (or name) should be performed in visitTag. A document will be added to the visitors package with comprehensive porting instructions. Index: ScriptCommentTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/ScriptCommentTest.java,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** ScriptCommentTest.java 22 May 2004 03:57:31 -0000 1.1 --- ScriptCommentTest.java 24 May 2004 00:38:19 -0000 1.2 *************** *** 29,33 **** import org.htmlparser.tags.CompositeTag; import org.htmlparser.tags.ScriptTag; ! import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.visitors.NodeVisitor; --- 29,33 ---- import org.htmlparser.tags.CompositeTag; import org.htmlparser.tags.ScriptTag; ! 
import org.htmlparser.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.visitors.NodeVisitor; Index: NodeVisitorTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/NodeVisitorTest.java,v retrieving revision 1.14 retrieving revision 1.15 diff -C2 -d -r1.14 -r1.15 *** NodeVisitorTest.java 2 Jan 2004 16:24:57 -0000 1.14 --- NodeVisitorTest.java 24 May 2004 00:38:19 -0000 1.15 *************** *** 31,35 **** import org.htmlparser.StringNode; ! import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.visitors.NodeVisitor; --- 31,35 ---- import org.htmlparser.StringNode; ! import org.htmlparser.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.visitors.NodeVisitor; |
From: Derrick O. <der...@us...> - 2004-05-24 00:38:29
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31556/tests/utilTests Modified Files: NodeListTest.java Log Message: Part two of a multiphase refactoring. Part one added the Tag interface. This submission eliminates some of the duplication between the lexer.nodes package and the htmlparser package by removing the tag specific signatures, visitTitleTag, visitLinkTag and visitImageTag, from the NodeVisitor class. This allows the lexer to return htmlparser level classes for StringNode and RemarkNode. The TagNode is still present in the lexer.nodes package, but will move next. This means that classes derived from NodeVisitor *will not* work using the above signatures; instead a check for tag class (or name) should be performed in visitTag. A document will be added to the visitors package with comprehensive porting instructions. Index: NodeListTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests/NodeListTest.java,v retrieving revision 1.25 retrieving revision 1.26 diff -C2 -d -r1.25 -r1.26 *** NodeListTest.java 2 Jan 2004 16:24:57 -0000 1.25 --- NodeListTest.java 24 May 2004 00:38:19 -0000 1.26 *************** *** 32,35 **** --- 32,36 ---- import org.htmlparser.util.NodeList; import org.htmlparser.util.SimpleNodeIterator; + import org.htmlparser.visitors.NodeVisitor; public class NodeListTest extends ParserTestCase { *************** *** 121,125 **** private Node createHTMLNodeObject() { Node node = new AbstractNode(null,10,20) { ! public void accept(Object visitor) { } --- 122,126 ---- private Node createHTMLNodeObject() { Node node = new AbstractNode(null,10,20) { ! public void accept(NodeVisitor visitor) { } |
From: Derrick O. <der...@us...> - 2004-05-24 00:38:28
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/filterTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31556/tests/filterTests

Modified Files:
    FilterTest.java
Log Message:
Part two of a multiphase refactoring.
Part one added the Tag interface.
This submission eliminates some of the duplication between the lexer.nodes
package and the htmlparser package by removing the tag specific signatures,
visitTitleTag, visitLinkTag and visitImageTag, from the NodeVisitor class.
This allows the lexer to return htmlparser level classes for StringNode and
RemarkNode. The TagNode is still present in the lexer.nodes package, but
will move next.
This means that classes derived from NodeVisitor *will not* work using the
above signatures; instead a check for tag class (or name) should be
performed in visitTag.
A document will be added to the visitors package with comprehensive porting
instructions.

Index: FilterTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/filterTests/FilterTest.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** FilterTest.java 10 May 2004 22:31:46 -0000 1.3
--- FilterTest.java 24 May 2004 00:38:18 -0000 1.4
***************
*** 38,42 ****
  import org.htmlparser.filters.TagNameFilter;
  import org.htmlparser.lexer.Lexer;
! import org.htmlparser.lexer.nodes.StringNode;
  import org.htmlparser.lexer.nodes.TagNode;
  import org.htmlparser.tags.BodyTag;
--- 38,42 ----
  import org.htmlparser.filters.TagNameFilter;
  import org.htmlparser.lexer.Lexer;
! import org.htmlparser.StringNode;
  import org.htmlparser.lexer.nodes.TagNode;
  import org.htmlparser.tags.BodyTag;
|
From: Derrick O. <der...@us...> - 2004-05-24 00:38:28
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31556/tags Modified Files: ImageTag.java LinkTag.java Tag.java TitleTag.java Log Message: Part two of a multiphase refactoring. Part one added the Tag interface. This submission eliminates some of the duplication between the lexer.nodes package and the htmlparser package by removing the tag specific signatures, visitTitleTag, visitLinkTag and visitImageTag, from the NodeVisitor class. This allows the lexer to return htmlparser level classes for StringNode and RemarkNode. The TagNode is still present in the lexer.nodes package, but will move next. This means that classes derived from NodeVisitor *will not* work using the above signatures; instead a check for tag class (or name) should be performed in visitTag. A document will be added to the visitors package with comprehensive porting instructions. Index: ImageTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/ImageTag.java,v retrieving revision 1.44 retrieving revision 1.45 diff -C2 -d -r1.44 -r1.45 *** ImageTag.java 18 Mar 2004 04:04:08 -0000 1.44 --- ImageTag.java 24 May 2004 00:38:17 -0000 1.45 *************** *** 195,210 **** setAttribute ("SRC", imageURL); } - - /** - * Image visiting code. - * Invokes <code>visitImageTag()</code> on the visitor and then - * invokes the normal tag processing. - * @param visitor The <code>NodeVisitor</code> object to invoke - * <code>visitImageTag()</code> on. 
- */ - public void accept (NodeVisitor visitor) - { - visitor.visitImageTag (this); - super.accept (visitor); - } } --- 195,197 ---- Index: LinkTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/LinkTag.java,v retrieving revision 1.49 retrieving revision 1.50 diff -C2 -d -r1.49 -r1.50 *** LinkTag.java 18 Mar 2004 04:04:08 -0000 1.49 --- LinkTag.java 24 May 2004 00:38:17 -0000 1.50 *************** *** 304,320 **** /** - * Link visiting code. - * Invokes <code>visitLinkTag()</code> on the visitor and then - * invokes the normal tag processing. - * @param visitor The <code>NodeVisitor</code> object to invoke - * <code>visitLinkTag()</code> on. - */ - public void accept (NodeVisitor visitor) - { - visitor.visitLinkTag (this); - super.accept (visitor); - } - - /** * Extract the link from the HREF attribute. * @return The URL from the HREF attibute. This is absolute if the tag has --- 304,307 ---- Index: TitleTag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/TitleTag.java,v retrieving revision 1.33 retrieving revision 1.34 diff -C2 -d -r1.33 -r1.34 *** TitleTag.java 2 Jan 2004 16:24:55 -0000 1.33 --- TitleTag.java 24 May 2004 00:38:18 -0000 1.34 *************** *** 95,110 **** return "TITLE: " + getTitle(); } - - /** - * Title visiting code. - * Invokes <code>visitTitleTag()</code> on the visitor and then - * invokes the normal tag processing. - * @param visitor The <code>NodeVisitor</code> object to invoke - * <code>visitTitleTag()</code> on. 
- */ - public void accept (NodeVisitor visitor) - { - visitor.visitTitleTag (this); - super.accept (visitor); - } } --- 95,97 ---- Index: Tag.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/Tag.java,v retrieving revision 1.62 retrieving revision 1.63 diff -C2 -d -r1.62 -r1.63 *** Tag.java 28 Feb 2004 15:52:43 -0000 1.62 --- Tag.java 24 May 2004 00:38:18 -0000 1.63 *************** *** 138,167 **** mScanner = scanner; } - - /** - * Handle a visitor. - * <em>NOTE: This currently defers to accept(NodeVisitor). If - * subclasses of Node override accept(Object) directly, they must - * handle the delegation to <code>visitTag()</code> and - * <code>visitEndTag()</code>.</em> - * @param visitor The <code>NodeVisitor</code> object - * (a cast is performed without checking). - */ - public void accept (Object visitor) - { - accept ((NodeVisitor)visitor); - } - - /** - * Default tag visiting code. - * Based on <code>isEndTag()</code>, calls either <code>visitTag()</code> or - * <code>visitEndTag()</code>. - */ - public void accept (NodeVisitor visitor) - { - if (isEndTag ()) - ((NodeVisitor)visitor).visitEndTag (this); - else - ((NodeVisitor)visitor).visitTag (this); - } } --- 138,140 ---- |
From: Derrick O. <der...@us...> - 2004-05-23 19:42:24
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32030 Modified Files: Page.java Log Message: Incorporate feature request submitted by Bradford A. Folkens #943197 Accept gzip / deflate content encodings by setting request property "Accept-Encoding" to "gzip, deflate" in Page.setConnection(), if possible, and handling those encodings. No test case added because it needs a specially configured HTTP server. Index: Page.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v retrieving revision 1.35 retrieving revision 1.36 diff -C2 -d -r1.35 -r1.36 *** Page.java 7 May 2004 23:30:37 -0000 1.35 --- Page.java 23 May 2004 19:42:14 -0000 1.36 *************** *** 40,43 **** --- 40,45 ---- import java.net.URLConnection; import java.net.UnknownHostException; + import java.util.zip.GZIPInputStream; + import java.util.zip.InflaterInputStream; import org.htmlparser.util.EncodingChangeException; *************** *** 319,326 **** --- 321,337 ---- String type; String charset; + String contentEncoding; mConnection = connection; try { + try + { + getConnection ().setRequestProperty ("Accept-Encoding", "gzip, deflate"); + } + catch (IllegalStateException ise) // already connected + { + // assume all request properties have already been set + } getConnection ().connect (); } *************** *** 343,347 **** try { ! stream = new Stream (getConnection ().getInputStream ()); try { --- 354,370 ---- try { ! contentEncoding = connection.getContentEncoding(); ! if ((null != contentEncoding) && (-1 != contentEncoding.indexOf ("gzip"))) ! { ! stream = new Stream (new GZIPInputStream (getConnection ().getInputStream ())); ! } ! else if ((null != contentEncoding) && (-1 != contentEncoding.indexOf ("deflate"))) ! { ! stream = new Stream (new InflaterInputStream (getConnection ().getInputStream ())); ! } ! else ! { ! 
stream = new Stream (getConnection ().getInputStream ()); ! } try { |
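The stream selection added to Page.setConnection() can be tried without an HTTP server by feeding it an in-memory gzip payload, which may partly make up for the missing test case mentioned in the log message. The sketch below is an assumption-laden illustration rather than the committed code: ContentDecodingSketch and its methods are hypothetical names, the library's Stream wrapper is left out, and the header value is passed in as a plain string instead of being read from a URLConnection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper for illustration; not part of org.htmlparser.
final class ContentDecodingSketch
{
    // Chooses a decoding wrapper from the Content-Encoding value (may be null),
    // using the same substring tests as the diff above.
    static InputStream decode (InputStream raw, String contentEncoding) throws IOException
    {
        if ((null != contentEncoding) && (-1 != contentEncoding.indexOf ("gzip")))
            return (new GZIPInputStream (raw));
        if ((null != contentEncoding) && (-1 != contentEncoding.indexOf ("deflate")))
            return (new InflaterInputStream (raw));
        return (raw);
    }

    // Compresses text with gzip, standing in for a server response body.
    static byte[] gzip (String text) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream ();
        GZIPOutputStream out = new GZIPOutputStream (bytes);
        out.write (text.getBytes (StandardCharsets.UTF_8));
        out.close (); // finishes the gzip trailer
        return (bytes.toByteArray ());
    }

    // Reads a stream to the end and decodes it as UTF-8.
    static String readAll (InputStream in) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream ();
        byte[] buffer = new byte[4096];
        int n;
        while (-1 != (n = in.read (buffer)))
            bytes.write (buffer, 0, n);
        return (new String (bytes.toByteArray (), StandardCharsets.UTF_8));
    }

    // Compresses and then decodes text; checked exceptions are wrapped
    // so that callers need no throws clause.
    static String roundTrip (String text)
    {
        try
        {
            return (readAll (decode (new ByteArrayInputStream (gzip (text)), "gzip")));
        }
        catch (IOException ioe)
        {
            throw new IllegalStateException (ioe);
        }
    }
}
```

Calling ContentDecodingSketch.roundTrip with a short HTML string should return the same string, and passing a null encoding to decode returns the raw stream unchanged, matching the else branch in the diff.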
From: Derrick O. <der...@us...> - 2004-05-22 20:10:42
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29429/src/org/htmlparser/tests/visitorsTests Modified Files: Tag: v1_41 AllTests.java Added Files: Tag: v1_41 ScriptCommentTest.java Log Message: Fix bug# 919738 Text has not been extracted correctly using StringBean and (duplicate) bug #936392 ScriptTag visitor fails for comments with ' by handling single and multiline ecmascript comments in the Lexer class when called with quotesmart true. --- NEW FILE: ScriptCommentTest.java --- // HTMLParser Library $Name: $ - A java-based parser for HTML // http://sourceforge.org/projects/htmlparser // Copyright (C) 2004 Jim Arnell // // Revision Control Information // // $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/ScriptCommentTest.java,v $ // $Author: derrickoswald $ // $Date: 2004/05/22 20:10:33 $ // $Revision: 1.1.2.1 $ // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. 
// // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA // package org.htmlparser.tests.visitorsTests; import org.htmlparser.tags.CompositeTag; import org.htmlparser.tags.ScriptTag; import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.visitors.NodeVisitor; public class ScriptCommentTest extends ParserTestCase { static { System.setProperty ("org.htmlparser.tests.visitorsTests.ScriptCommentTest", "ScriptCommentTest"); } private String workingScriptTag = "<script language='javascript'>" + "// I cant handle single quotations\n" + "</script>"; private String workingHtml = this.workingScriptTag + "<HTML>" + "</HTML>"; private String failingScriptTag = "<script language='javascript'>" + "// I can't handle single quotations.\n" + "</script>"; private String failingHtml = this.failingScriptTag + "<HTML>" + "</HTML>"; private String failingHtml2 = "<HTML>" + this.failingScriptTag + "</HTML>"; private String anotherFailingScriptTag = "<script language='javascript'>" + "/* I can't handle single quotations. 
*/" + "</script>"; private String failingHtml3 = this.anotherFailingScriptTag + "<HTML>" + "</HTML>"; public ScriptCommentTest(String name) { super(name); } public void testTagWorking() throws Exception { createParser(this.workingHtml); ScriptVisitor visitor = new ScriptVisitor(); this.parser.visitAllNodesWith(visitor); String scriptNodeHtml = visitor.scriptTag.toHtml(); assertEquals("Script parsing worked", this.workingScriptTag, scriptNodeHtml); } public void testScriptTagNotWorkingOuter() throws Exception { createParser(this.failingHtml); ScriptVisitor visitor = new ScriptVisitor(); this.parser.visitAllNodesWith(visitor); String scriptNodeHtml = visitor.scriptTag.toHtml(); assertEquals("Script parsing not working", this.failingScriptTag, scriptNodeHtml); } public void testScriptTagNotWorkingInner() throws Exception { createParser(this.failingHtml2); ScriptVisitor visitor = new ScriptVisitor(); this.parser.visitAllNodesWith(visitor); String scriptNodeHtml = visitor.scriptTag.toHtml(); assertEquals("Script parsing not working", this.failingScriptTag, scriptNodeHtml); } public void testScriptTagNotWorkingMultiLine() throws Exception { createParser(this.anotherFailingScriptTag); ScriptVisitor visitor = new ScriptVisitor(); this.parser.visitAllNodesWith(visitor); String scriptNodeHtml = visitor.scriptTag.toHtml(); assertEquals("Script parsing not working", this.anotherFailingScriptTag, scriptNodeHtml); } /** * Implement test case NodeVisitor. */ public final class ScriptVisitor extends NodeVisitor { /** Keps the only script tag. */ public ScriptTag scriptTag; /** * Creates a new ScriptVisitor object. * * @param hat param. * @param hostString param. * @param direction param. 
*/ public ScriptVisitor() { super(true, true); } /** * @see org.htmlparser.visitors.NodeVisitor */ public void visitTag(final Tag n) { if ((null != n.getParent()) || ((n instanceof CompositeTag) && (null == ((CompositeTag) n).getEndTag()))) { if (n instanceof ScriptTag) { this.scriptTag = (ScriptTag) n; } } else { if (n instanceof ScriptTag) { this.scriptTag = (ScriptTag) n; } } } } } |
From: Derrick O. <der...@us...> - 2004-05-22 20:10:42
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29429/src/org/htmlparser/tests/scannersTests

Modified Files:
    Tag: v1_41
    ScriptScannerTest.java
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.

Index: ScriptScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/ScriptScannerTest.java,v
retrieving revision 1.53
retrieving revision 1.53.2.1
diff -C2 -d -r1.53 -r1.53.2.1
*** ScriptScannerTest.java 28 Feb 2004 15:52:44 -0000 1.53
--- ScriptScannerTest.java 22 May 2004 20:10:32 -0000 1.53.2.1
***************
*** 203,207 ****
              "document.write(\"{ // do something\"); " +
              "document.write(\"}\"); " +
!             "// parser thinks this is the end tag. " +
              "document.write(\"</script>\");" +
              "</script>" +
--- 203,207 ----
              "document.write(\"{ // do something\"); " +
              "document.write(\"}\"); " +
!             "// parser thinks this is the end tag.\n" +
              "document.write(\"</script>\");" +
              "</script>" +
***************
*** 226,230 ****
              "document.write(\"{ // do something\"); " +
              "document.write(\"}\"); " +
!             "// parser thinks this is the end tag. " +
              "document.write(\"</script>\");",
              scriptTag.getScriptCode()
--- 226,230 ----
              "document.write(\"{ // do something\"); " +
              "document.write(\"}\"); " +
!             "// parser thinks this is the end tag.\n" +
              "document.write(\"</script>\");",
              scriptTag.getScriptCode()
|
From: Derrick O. <der...@us...> - 2004-05-22 20:10:41
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29429/src/org/htmlparser/lexer

Modified Files:
    Tag: v1_41
    Lexer.java
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.

Index: Lexer.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Lexer.java,v
retrieving revision 1.27
retrieving revision 1.27.2.1
diff -C2 -d -r1.27 -r1.27.2.1
*** Lexer.java 18 Feb 2004 12:34:04 -0000 1.27
--- Lexer.java 22 May 2004 20:10:31 -0000 1.27.2.1
***************
*** 303,306 ****
--- 303,307 ----
                      break;
                  default:
+                     probe.retreat (); // string needs to see leading foreslash
                      ret = parseString (probe, quotesmart);
                      break;
***************
*** 412,415 ****
--- 413,445 ----
              else if (quotesmart && (ch == quote))
                  quote = 0; // exit quoted state
+             else if (quotesmart && (0 == quote) && (ch == '/'))
+             {
+                 // handle multiline and double slash comments (with a quote) in script like:
+                 // I can't handle single quotations.
+                 ch = mPage.getCharacter (cursor);
+                 if (0 == ch)
+                     done = true;
+                 else if ('/' == ch)
+                 {
+                     do
+                         ch = mPage.getCharacter (cursor);
+                     while ((ch != 0) && (ch != '\n'));
+                 }
+                 else if ('*' == ch)
+                 {
+                     do
+                     {
+                         do
+                             ch = mPage.getCharacter (cursor);
+                         while ((ch != 0) && (ch != '*'));
+                         ch = mPage.getCharacter (cursor);
+                         if (ch == '*')
+                             cursor.retreat ();
+                     }
+                     while ((ch != 0) && (ch != '/'));
+                 }
+                 else
+                     cursor.retreat ();
+             }
              else if ((0 == quote) && ('<' == ch))
              {
|
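The idea behind the Lexer.java change above is that an apostrophe inside a script comment must not open a quoted state, otherwise the following '<' is never seen as the start of a tag. The sketch below shows that idea on a plain String. It is an illustration only: QuoteSmartScanSketch and findTagStart are hypothetical names, it does not use the Lexer, Page or Cursor classes, and the backslash handling inside quotes is an assumption of this sketch rather than something taken from the diff.

```java
// Hypothetical illustration class, not part of HTMLParser.
class QuoteSmartScanSketch {

    /**
     * Returns the index of the first '<' that lies outside quoted strings and
     * outside // and slash-star comments, or -1 if there is none.
     */
    static int findTagStart(String text) {
        char quote = 0; // 0 means "not inside a quoted string"
        int length = text.length();
        int i = 0;
        while (i < length) {
            char ch = text.charAt(i);
            if (quote != 0) {
                if (ch == '\\' && i + 1 < length) {
                    i++; // assumption: skip the escaped character inside a string
                } else if (ch == quote) {
                    quote = 0; // exit quoted state
                }
            } else if (ch == '"' || ch == '\'') {
                quote = ch; // enter quoted state
            } else if (ch == '/' && i + 1 < length && text.charAt(i + 1) == '/') {
                // single-line comment: skip to the newline (or end of input)
                i += 2;
                while (i < length && text.charAt(i) != '\n') {
                    i++;
                }
                continue;
            } else if (ch == '/' && i + 1 < length && text.charAt(i + 1) == '*') {
                // multiline comment: skip past the closing star-slash
                int end = text.indexOf("*/", i + 2);
                if (end == -1) {
                    return -1; // unterminated comment swallows the rest
                }
                i = end + 2;
                continue;
            } else if (ch == '<') {
                return i;
            }
            i++;
        }
        return -1;
    }

    public static void main(String[] args) {
        String script = "// I can't handle single quotations.\n</script>";
        System.out.println(findTagStart(script) == script.indexOf("</script>"));
    }
}
```

This also shows why the test strings in this commit gained a trailing \n after each // comment: without a newline, a single-line comment runs to the end of the input and hides the closing tag.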
From: Derrick O. <der...@us...> - 2004-05-22 20:10:41
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/parserHelperTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29429/src/org/htmlparser/tests/parserHelperTests

Modified Files:
    Tag: v1_41
    StringParserTest.java
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.

Index: StringParserTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/parserHelperTests/StringParserTest.java,v
retrieving revision 1.46
retrieving revision 1.46.2.1
diff -C2 -d -r1.46 -r1.46.2.1
*** StringParserTest.java 2 Jan 2004 16:24:56 -0000 1.46
--- StringParserTest.java 22 May 2004 20:10:32 -0000 1.46.2.1
***************
*** 206,213 ****
              "</head>" +
              "<script language=\"JavaScript\" type=\"text/JavaScript\">" +
!             "// if this fails, output a 'hello' " +
              "if (true) " +
              "{ " +
!             "//something good... " +
              "} " +
              "</script>" +
--- 206,213 ----
              "</head>" +
              "<script language=\"JavaScript\" type=\"text/JavaScript\">" +
!             "// if this fails, output a 'hello' \n" +
              "if (true) " +
              "{ " +
!             "//something good...\n" +
              "} " +
              "</script>" +
|
From: Derrick O. <der...@us...> - 2004-05-22 20:10:40
|
Update of /cvsroot/htmlparser/htmlparser/docs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29429/docs

Modified Files:
    Tag: v1_41
    changes.txt release.txt
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.

Index: release.txt
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/docs/release.txt,v
retrieving revision 1.58
retrieving revision 1.58.2.1
diff -C2 -d -r1.58 -r1.58.2.1
*** release.txt 14 Mar 2004 16:31:40 -0000 1.58
--- release.txt 22 May 2004 20:10:31 -0000 1.58.2.1
***************
*** 1,3 ****
! HTMLParser Version 1.4 (Release Build Mar 14, 2004)
  *********************************************
  
--- 1,3 ----
! HTMLParser Version 1.41 (Release Build May 22, 2004)
  *********************************************
  
***************
*** 19,22 ****
--- 19,30 ----
  (v) this file
  
+ Changes since Version 1.4
+ -------------------------
+ 
+ Bug Fixes
+ ---------
+ 919738 Text has not been extracted correctly using StringBean
+ 936392 ScriptTag visitor fails for comments with ' (duplicate of above)
+ 
  Changes since Version 1.3
  -------------------------

Index: changes.txt
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/docs/changes.txt,v
retrieving revision 1.199
retrieving revision 1.199.2.1
diff -C2 -d -r1.199 -r1.199.2.1
*** changes.txt 14 Mar 2004 16:31:39 -0000 1.199
--- changes.txt 22 May 2004 20:10:30 -0000 1.199.2.1
***************
*** 13,16 ****
--- 13,33 ----
  *******************************************************************************
  
+ Release Build 1.41 - 20040522
+ --------------------------------
+ 
+ 2004-05-22 16:10 derrickoswald
+ 
+     * src/org/htmlparser/: lexer/Lexer.java,
+     docs/changes.txt, docs/release.txt
+     tests/parserHelperTests/StringParserTest.java,
+     tests/scannersTests/ScriptScannerTest.java,
+     tests/visitorsTests/AllTests.java,
+     tests/visitorsTests/ScriptCommentTest.java:
+ 
+     Fix bug# 919738 Text has not been extracted correctly using StringBean
+     and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
+     by handling single and multiline ecmascript comments in the Lexer class
+     when called with quotesmart true.
+ 
  Release Build 1.4 - 20040314
  --------------------------------
|
From: Derrick O. <der...@us...> - 2004-05-22 20:10:40
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29429/src/org/htmlparser

Modified Files:
    Tag: v1_41
    Parser.java
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.

Index: Parser.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Parser.java,v
retrieving revision 1.89
retrieving revision 1.89.2.1
diff -C2 -d -r1.89 -r1.89.2.1
*** Parser.java 14 Mar 2004 16:31:40 -0000 1.89
--- Parser.java 22 May 2004 20:10:31 -0000 1.89.2.1
***************
*** 74,78 ****
       */
      public final static double
!     VERSION_NUMBER = 1.4
      ;
  
--- 74,78 ----
       */
      public final static double
!     VERSION_NUMBER = 1.41
      ;
  
***************
*** 88,92 ****
       */
      public final static String
!     VERSION_DATE = "Mar 14, 2004"
      ;
  
--- 88,92 ----
       */
      public final static String
!     VERSION_DATE = "May 22, 2004"
      ;
  
|