htmlparser-cvs Mailing List for HTML Parser (Page 19)
Brought to you by:
derrickoswald
From: Derrick O. <der...@us...> - 2004-05-22 12:28:25
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10696

Modified Files:
	CssSelectorNodeFilter.java
Log Message:
Remove junit import.

Index: CssSelectorNodeFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/CssSelectorNodeFilter.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** CssSelectorNodeFilter.java	10 May 2004 22:31:57 -0000	1.1
--- CssSelectorNodeFilter.java	22 May 2004 12:28:15 -0000	1.2
***************
*** 27,31 ****
  package org.htmlparser.filters;

- import junit.framework.TestCase;
  import org.htmlparser.*;
  import org.htmlparser.lexer.Lexer;
--- 27,30 ----
From: Derrick O. <der...@us...> - 2004-05-22 12:09:10
Update of /cvsroot/htmlparser/htmlparser/docs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7046/docs

Modified Files:
	changes.txt release.txt
Log Message:
Update version to 1.5-20040522

Index: release.txt
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/docs/release.txt,v
retrieving revision 1.58
retrieving revision 1.59
diff -C2 -d -r1.58 -r1.59
*** release.txt	14 Mar 2004 16:31:40 -0000	1.58
--- release.txt	22 May 2004 12:08:59 -0000	1.59
***************
*** 1,3 ****
! HTMLParser Version 1.4 (Release Build Mar 14, 2004)
  *********************************************

--- 1,3 ----
! HTMLParser Version 1.5 (Integration Build May 22, 2004)
  *********************************************

***************
*** 19,108 ****
  (v) this file

! Changes since Version 1.3
  -------------------------
! Translation
! Character entity encoding and decoding has been revamped, leading to
! higher throughput and less memory churn.
! Beans
! The StringBean can now be used as a visitor for parsers external to the bean.
! Decorators
! The node decorator package has been added to provide support for the
! delegate model.
! Lexer
! A new lexer i/o subsystem has been added. This provides accurate line number
! and character position data, tag and attribute names maintain their original
! case, and attributes maintain their original order. Line numbers reported by
! tags are now zero based, not one based. The node count for parsing goes up
! in most cases because whitespace is strictly maintained, i.e. every
! whitespace (i.e. newline) now counts as a StringNode too. Storage of
! attributes is now in a Vector which means the element 0 Attribute is
! actually the name of the tag, rather than having the $TAGNAME entry in a
! HashTable. The htmllexer.jar is this new i/o subsystem broken out and made
! JDK 1.1 compliant, the htmlparser.jar, which includes everything in
! htmllexer.jar, is not necessarily intended to be used in JDK 1.1
! environments. Some support for JIS escape sequences has been added.
! Tags
! Zero arg tag constructors have been added. Attribute maintenance
! (add/remove/edit) improved. There is no EndTag class any more. Just a
! generic tag that responds true to isEndTag(). Improvements to form tag
! handling, getting <input> and <textarea> tags nested within other tags.
! Improvements to applet tag handling regarding parameters and codebases.
! Scanners
! The concept of scanners has been completely reworked. Applications register
! tags not scanners to express interest in parsing only some tags. The default
! is now to parse all tags, which is equivalent to the old registerDOMTags(),
! so some extra nesting of tags will need to be handled. CompositeTagScanner
! logic has been improved to try and match unclosed open tags when an
! unexpected end tag is encountered. This change also moved recursion off the
! JDK stack, eliminating most StackOverflow exceptions. Also, a CompositeTag's
! "startTag()" is "this", and the CompositeTagScanner just adds children.
! The ScriptScanner will now decrypt Microsoft Script Encoder encrypted script
! tags. The plaintext is available via ScriptTag.getScriptCode().
  Filters
! A new powerful filtering capability has been added, which makes extracting
! specific tags very easy.
! Applications
! New example applications Thumbelina and SiteCapturer.
! A mainline has been added to the Translate class to encode/decode stdin to
! stdout.

  Bug Fixes
  ---------
! 911565 isValued() and isEmpty() don't work
! 902121 StringBean throws NullPointerException.
! 900128 RemarkNode.setText() does not set Text
! 900125 Style Tag Children not grouped
! 899413 bug in javascript end detection.
! 891058 Bug in lexer
! 865279 Documentation
! 851882 zero length alt tag causes bug in ImageScanner
! 839264 toHtml() parse error in Javascripts with "form" keyword
! 833592 DOCTYPE element is not parsed correctly
! 832530 empty attribute causes parser to fail
! 826764 ParserException occurs only when using setInputHTML() instea
! 825820 Words conjoined
! 825645 <input> not getting parsed inside table
! 813838 links not parsed correctly
! 805598 attribute src in tag img sometimes not correctly parsed
! 801118 two " characters at the end of an attribute value problem
! 798554 Applet Tag does not update codebase data
! 798553 setInputHtml does not set text
! 798552 Sample for node iterator incorrect
! 789439 Japanese page causes OutOfMemory Exception
! 788746 parser crashes on comments like <!-- foobar --!>
! 786869 LinkExtractor Sample not working
! 784767 irc://server/channel urls are HTTPLike?
! 778781 SRC-attribute suppression in IMG-tags
! 772700 Jsp Tags are not parsed correctly when in quoted attributes
! 765413 typo
! 761798 Error reading next element.
! 757337 Standalone attributes should remain standalone
! 755929 Empty string attr. value causes attr parsing to be stopped
! 753012 IMG SRC not parsed v1.3 & v1.4
! 753003 <IMG> within <A> missed when followed by <MAP>
! 750117 StackOverFlow while Node-Iteration
! 749295 Problem Parsing Table
! 745566 StackOverflowError on select with too many unclosed options
! 744610 getLink() Erroneous for Relative Links from Files on Windows

  Acknowledgements
--- 19,41 ----
  (v) this file

! Changes since Version 1.4
  -------------------------
! Configuration Management
! Removed the need for the Translate class to be packaged with htmllexer.jar.
! This results in a lighter weight component.
! Refactoring
! Added Tag interface. Obviated LinkProcessor and moved it's functionality to
! the Page class.
  Filters
! Added CssSelectorNodeFilter.
!
! Enhancement Requests
! --------------------
! 943593 LinkProcessor.extract(link,base) weird behaviour?

  Bug Fixes
  ---------
! 919738 Text has not been extracted correctly using StringBean
! 936392 ScriptTag visitor fails for comments with '

  Acknowledgements
***************
*** 140,143 ****
--- 73,78 ----
  [30] Gernot Fricke
  [31] Anthony Labarre
+ [32] Alberto Nacher
+ [33] Rogers George

  If you find any bugs, please go to

Index: changes.txt
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/docs/changes.txt,v
retrieving revision 1.199
retrieving revision 1.200
diff -C2 -d -r1.199 -r1.200
*** changes.txt	14 Mar 2004 16:31:39 -0000	1.199
--- changes.txt	22 May 2004 12:08:57 -0000	1.200
***************
*** 11,3582 ****
  * http://www.red-bean.com/cvs2cl/changelogs.html *
  * *
  *******************************************************************************
! Release Build 1.4 - 20040314
! --------------------------------
!
! 2004-03-14 10:53 derrickoswald
!
! * src/org/htmlparser/beans/LinkBean.java:
[...3685 lines suppressed...]
  src/org/htmlparser/tests/tagTests/BaseHrefTagTest.java,
  src/org/htmlparser/tests/utilTests/AllTests.java,
  src/org/htmlparser/tests/utilTests/HTMLLinkProcessorTest.java,
! src/org/htmlparser/util/LinkProcessor.java:
! Deprecate LinkProcessor.
! Functionality moved to Page.
! 2004-03-15 17:50 derrickoswald
! * src/doc-files/building.html:
! Update build instruction problem identified by sarsie.
! 2004-03-14 15:31 derrickoswald
! * build.xml, src/org/htmlparser/lexer/nodes/Attribute.java,
! src/org/htmlparser/lexer/nodes/TagNode.java:
! Remove requirement for Translate.class to be in htmllexer.jar.
From: Derrick O. <der...@us...> - 2004-05-22 12:09:10
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7046/src/org/htmlparser

Modified Files:
	Parser.java
Log Message:
Update version to 1.5-20040522

Index: Parser.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Parser.java,v
retrieving revision 1.90
retrieving revision 1.91
diff -C2 -d -r1.90 -r1.91
*** Parser.java	18 Mar 2004 04:04:07 -0000	1.90
--- Parser.java	22 May 2004 12:09:00 -0000	1.91
***************
*** 73,77 ****
       */
      public final static double
!     VERSION_NUMBER = 1.4
      ;

--- 73,77 ----
       */
      public final static double
!     VERSION_NUMBER = 1.5
      ;

***************
*** 80,84 ****
       */
      public final static String
!     VERSION_TYPE = "Release Build"
      ;

--- 80,84 ----
       */
      public final static String
!     VERSION_TYPE = "Integration Build"
      ;

***************
*** 87,91 ****
       */
      public final static String
!     VERSION_DATE = "Mar 14, 2004"
      ;

--- 87,91 ----
       */
      public final static String
!     VERSION_DATE = "May 22, 2004"
      ;
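The three constants changed above carry the same information as the first line of docs/release.txt ("HTMLParser Version 1.5 (Integration Build May 22, 2004)"). As a sketch of how they relate, the hypothetical helper below (`VersionBanner` and `banner` are illustrative names, not part of HTML Parser; the real update is done by the build.xml `versionSource` target) copies the new values and formats that banner:

```java
// Hypothetical helper: mirrors the Parser.java constants from revision 1.91
// and formats them like the first line of docs/release.txt.
final class VersionBanner {
    static final double VERSION_NUMBER = 1.5;
    static final String VERSION_TYPE = "Integration Build";
    static final String VERSION_DATE = "May 22, 2004";

    private VersionBanner() {}

    // String concatenation uses Double.toString, so 1.5 prints as "1.5".
    static String banner(double number, String type, String date) {
        return "HTMLParser Version " + number + " (" + type + " " + date + ")";
    }

    public static void main(String[] args) {
        System.out.println(banner(VERSION_NUMBER, VERSION_TYPE, VERSION_DATE));
    }
}
```

One consequence of holding the version in a `double`: a future minor version 10 written as `1.10` would be the same value as `1.1` and would print as "1.1".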
From: Derrick O. <der...@us...> - 2004-05-22 11:36:01
Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv895

Modified Files:
	build.xml
Log Message:
Change minor version to 5.

Index: build.xml
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v
retrieving revision 1.63
retrieving revision 1.64
diff -C2 -d -r1.63 -r1.64
*** build.xml	20 Mar 2004 20:01:02 -0000	1.63
--- build.xml	22 May 2004 11:35:50 -0000	1.64
***************
*** 20,34 ****
    that's why this step can't be automated
  - incorporate changes from ChangeLog into htmlparser/docs/changes under
!   a heading like "Integration Build 1.4 - 20040104"
  - 'ant versionSource' updates the version in Parser.java and release.txt
  - perform a CVS update on htmlparser to identify new and changed files
  - commit changed files (i.e. Parser.java, release.txt, docs/changes, docs/wiki
    and docs/wiki/images) to the head revision using a reason of the form:
!     Update version to 1.4-20040104.
! - use CVS to tag the current head revisions with a name like v1_4_20040104
  - use CVS to checkout everything with the tag used above
  - 'ant test' compiles and runs the unit tests
  - 'ant clean htmlparser' updates the version headers,
    creates the jar file and doc files and zips
!   everything into a file htmlparser/distribution/htmlparser1_4_20040104.zip
  - use CVS to checkout everything against the head revision to reset your workspace
--- 20,34 ----
    that's why this step can't be automated
  - incorporate changes from ChangeLog into htmlparser/docs/changes under
!   a heading like "Integration Build 1.5 - 20040522"
  - 'ant versionSource' updates the version in Parser.java and release.txt
  - perform a CVS update on htmlparser to identify new and changed files
  - commit changed files (i.e. Parser.java, release.txt, docs/changes, docs/wiki
    and docs/wiki/images) to the head revision using a reason of the form:
!     Update version to 1.5-20040522.
! - use CVS to tag the current head revisions with a name like v1_5_20040522.
  - use CVS to checkout everything with the tag used above
  - 'ant test' compiles and runs the unit tests
  - 'ant clean htmlparser' updates the version headers,
    creates the jar file and doc files and zips
!   everything into a file htmlparser/distribution/htmlparser1_5_20040522.zip
  - use CVS to checkout everything against the head revision to reset your workspace
***************
*** 40,47 ****
      ftp> cd incoming
      ftp> bin
!     ftp> put htmlparser1_4_20040104.zip
      ftp> bye
  - add a release to the 'Integation Builds' package
!   Admin-File Releases-Add Release, use a name of the form '1_4_20040104'
  - Step 1, 'Paste The Notes' (using numeric character references and character
    entity references because this is displayed as HTML) with a
--- 40,47 ----
      ftp> cd incoming
      ftp> bin
!     ftp> put htmlparser1_5_20040522.zip
      ftp> bye
  - add a release to the 'Integation Builds' package
!   Admin-File Releases-Add Release, use a name of the form '1_5_20040522'
  - Step 1, 'Paste The Notes' (using numeric character references and character
    entity references because this is displayed as HTML) with a
***************
*** 52,56 ****
    Pending Bugs:
  - use the 'Upload Change Log:' field to specify the ChamgeLog file you edited
! - Step 2, check the checkbox of the htmlparser1_4_20040104.zip file
    from the list of files in the uploads section
  - Submit/Refresh
--- 52,56 ----
    Pending Bugs:
  - use the 'Upload Change Log:' field to specify the ChamgeLog file you edited
! - Step 2, check the checkbox of the htmlparser1_5_20040522.zip file
    from the list of files in the uploads section
  - Submit/Refresh
***************
*** 65,69 ****
  Submit News
  - from the project summary screen, select 'Submit News' and title it like:
!     HTML Parser Integration Release 1.4-20040104
  - type in a summary of the changes made
  - SUBMIT
--- 65,69 ----
  Submit News
  - from the project summary screen, select 'Submit News' and title it like:
!     HTML Parser Integration Release 1.5-20040522
  - type in a summary of the changes made
  - SUBMIT
***************
*** 77,84 ****
  <!--
    Note: These can be overridden on the command line, as in:
!     ant -DversionMinor=4 -DversionType=Release\ Build versionSource
  -->
  <property name="versionMajor" value="1"/>
! <property name="versionMinor" value="4"/>
  <property name="versionType" value="Integration Build"/>
  <property name="versionNumber" value="${versionMajor}.${versionMinor}"/>
--- 77,84 ----
  <!--
    Note: These can be overridden on the command line, as in:
!     ant -DversionMinor=5 -DversionType=Release\ Build versionSource
  -->
  <property name="versionMajor" value="1"/>
! <property name="versionMinor" value="5"/>
  <property name="versionType" value="Integration Build"/>
  <property name="versionNumber" value="${versionMajor}.${versionMinor}"/>
***************
*** 104,108 ****
  <target name="JDK1.4">
    <condition property="JDK1.4">
!     <equals arg1="1.4" arg2="${ant.java.version}"/>
    </condition>
  </target>
--- 104,111 ----
  <target name="JDK1.4">
    <condition property="JDK1.4">
!     <or>
!       <equals arg1="1.4" arg2="${ant.java.version}"/>
!       <equals arg1="1.4" arg2="${ant.java.version}"/>
!     </or>
    </condition>
  </target>
***************
*** 321,325 ****
  <property name="javadoc.doctitle" value="HTML Parser ${versionNumber}"/>
  <property name="javadoc.header" value="<A HREF="http://htmlparser.sourceforge.net" target="_top">HTML Parser Home Page</A>"/>
! <property name="javadoc.footer" value="&copy; 2004 Somik Raha<div align="right">${TODAY_STRING}</div>"/>
  <property name="javadoc.bottom" value="HTML Parser is an open source library released under <A HREF="http://www.opensource.org/licenses/lgpl-license.html" target="_top">LGPL</A>.<BR>
--- 324,328 ----
  <property name="javadoc.doctitle" value="HTML Parser ${versionNumber}"/>
  <property name="javadoc.header" value="<A HREF="http://htmlparser.sourceforge.net" target="_top">HTML Parser Home Page</A>"/>
! <property name="javadoc.footer" value="&copy; 2004 Derrick Oswald<div align="right">${TODAY_STRING}</div>"/>
  <property name="javadoc.bottom" value="HTML Parser is an open source library released under <A HREF="http://www.opensource.org/licenses/lgpl-license.html" target="_top">LGPL</A>.<BR>
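The release checklist in the build.xml comment spells the same version in two forms: a dotted one ("1.5-20040522" in the commit reason and news title) and an underscored one ("v1_5_20040522" for the CVS tag, "htmlparser1_5_20040522.zip" for the distribution, "1_5_20040522" for the file release). A minimal sketch deriving all of them from the major version, minor version and build date; `ReleaseNames` and its methods are hypothetical and not part of build.xml or HTML Parser:

```java
// Hypothetical helper: derives the release identifiers named in the
// build.xml release checklist from major, minor and a yyyyMMdd date string.
final class ReleaseNames {
    private ReleaseNames() {}

    /** Dotted form, e.g. "1.5-20040522". */
    static String dotted(int major, int minor, String date) {
        return major + "." + minor + "-" + date;
    }

    /** Underscored form, e.g. "1_5_20040522". */
    static String underscored(int major, int minor, String date) {
        return major + "_" + minor + "_" + date;
    }

    static String commitReason(int major, int minor, String date) {
        return "Update version to " + dotted(major, minor, date) + ".";
    }

    static String cvsTag(int major, int minor, String date) {
        return "v" + underscored(major, minor, date);
    }

    static String zipName(int major, int minor, String date) {
        return "htmlparser" + underscored(major, minor, date) + ".zip";
    }

    static String newsTitle(int major, int minor, String date) {
        return "HTML Parser Integration Release " + dotted(major, minor, date);
    }

    public static void main(String[] args) {
        int major = 1;
        int minor = 5;
        String date = "20040522";
        System.out.println(commitReason(major, minor, date));
        System.out.println(cvsTag(major, minor, date));
        System.out.println(zipName(major, minor, date));
        System.out.println(underscored(major, minor, date));
        System.out.println(newsTitle(major, minor, date));
    }
}
```

Deriving every name from one (major, minor, date) triple is one way to avoid a stale "1_4" identifier surviving in one of the steps when the minor version changes.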
From: Derrick O. <der...@us...> - 2004-05-22 11:33:30
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv435/src/org/htmlparser

Modified Files:
	Tag.java
Log Message:
Change minor version to 5.
Fix doc comment warning.

Index: Tag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Tag.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** Tag.java	20 Mar 2004 17:03:53 -0000	1.1
--- Tag.java	22 May 2004 11:33:20 -0000	1.2
***************
*** 105,110 ****
       * begin with "/" if it is an end tag. Nor does it end with
       * a slash in the case of an XML type tag.
-      * To get at the original text of the tag name use
-      * {@link #getRawTagName getRawTagName()}.
       * The conversion to uppercase is performed with an ENGLISH locale.
       * </em>
--- 105,108 ----
From: Derrick O. <der...@us...> - 2004-05-22 03:57:40
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25265/tests/scannersTests

Modified Files:
	ScriptScannerTest.java
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.
Also added test cases for, but didn't fix bug #923146 tag nesting rule too strict for forms
(org.htmlparser.tests.tagTests.InputTagTest.testTable)
and bug #922439 OutOfMemory on huge HTML files (4,7MB)
(org.htmlparser.tests.MemoryTest)
which are thus currently failing.

Index: ScriptScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/ScriptScannerTest.java,v
retrieving revision 1.53
retrieving revision 1.54
diff -C2 -d -r1.53 -r1.54
*** ScriptScannerTest.java	28 Feb 2004 15:52:44 -0000	1.53
--- ScriptScannerTest.java	22 May 2004 03:57:30 -0000	1.54
***************
*** 203,207 ****
              "document.write(\"{ // do something\"); " +
              "document.write(\"}\"); " +
!             "// parser thinks this is the end tag. " +
              "document.write(\"</script>\");" +
              "</script>" +
--- 203,207 ----
              "document.write(\"{ // do something\"); " +
              "document.write(\"}\"); " +
!             "// parser thinks this is the end tag.\n" +
              "document.write(\"</script>\");" +
              "</script>" +
***************
*** 226,230 ****
              "document.write(\"{ // do something\"); " +
              "document.write(\"}\"); " +
!             "// parser thinks this is the end tag. " +
              "document.write(\"</script>\");",
              scriptTag.getScriptCode()
--- 226,230 ----
              "document.write(\"{ // do something\"); " +
              "document.write(\"}\"); " +
!             "// parser thinks this is the end tag.\n" +
              "document.write(\"</script>\");",
              scriptTag.getScriptCode()
From: Derrick O. <der...@us...> - 2004-05-22 03:57:40
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25265/tests/visitorsTests

Modified Files:
	AllTests.java
Added Files:
	ScriptCommentTest.java
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.
Also added test cases for, but didn't fix bug #923146 tag nesting rule too strict for forms
(org.htmlparser.tests.tagTests.InputTagTest.testTable)
and bug #922439 OutOfMemory on huge HTML files (4,7MB)
(org.htmlparser.tests.MemoryTest)
which are thus currently failing.

--- NEW FILE: ScriptCommentTest.java ---
// HTMLParser Library $Name: $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2004 Jim Arnell
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/ScriptCommentTest.java,v $
// $Author: derrickoswald $
// $Date: 2004/05/22 03:57:31 $
// $Revision: 1.1 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//

package org.htmlparser.tests.visitorsTests;

import org.htmlparser.tags.CompositeTag;
import org.htmlparser.tags.ScriptTag;
import org.htmlparser.tags.Tag;
import org.htmlparser.tests.ParserTestCase;
import org.htmlparser.visitors.NodeVisitor;

public class ScriptCommentTest extends ParserTestCase {

    static {
        System.setProperty ("org.htmlparser.tests.visitorsTests.ScriptCommentTest", "ScriptCommentTest");
    }

    private String workingScriptTag =
        "<script language='javascript'>" +
        "// I cant handle single quotations\n" +
        "</script>";

    private String workingHtml =
        this.workingScriptTag +
        "<HTML>" +
        "</HTML>";

    private String failingScriptTag =
        "<script language='javascript'>" +
        "// I can't handle single quotations.\n" +
        "</script>";

    private String failingHtml =
        this.failingScriptTag +
        "<HTML>" +
        "</HTML>";

    private String failingHtml2 =
        "<HTML>" +
        this.failingScriptTag +
        "</HTML>";

    private String anotherFailingScriptTag =
        "<script language='javascript'>" +
        "/* I can't handle single quotations. */" +
        "</script>";

    private String failingHtml3 =
        this.anotherFailingScriptTag +
        "<HTML>" +
        "</HTML>";

    public ScriptCommentTest(String name) {
        super(name);
    }

    public void testTagWorking() throws Exception {
        createParser(this.workingHtml);
        ScriptVisitor visitor = new ScriptVisitor();
        this.parser.visitAllNodesWith(visitor);
        String scriptNodeHtml = visitor.scriptTag.toHtml();
        assertEquals("Script parsing worked", this.workingScriptTag, scriptNodeHtml);
    }

    public void testScriptTagNotWorkingOuter() throws Exception {
        createParser(this.failingHtml);
        ScriptVisitor visitor = new ScriptVisitor();
        this.parser.visitAllNodesWith(visitor);
        String scriptNodeHtml = visitor.scriptTag.toHtml();
        assertEquals("Script parsing not working", this.failingScriptTag, scriptNodeHtml);
    }

    public void testScriptTagNotWorkingInner() throws Exception {
        createParser(this.failingHtml2);
        ScriptVisitor visitor = new ScriptVisitor();
        this.parser.visitAllNodesWith(visitor);
        String scriptNodeHtml = visitor.scriptTag.toHtml();
        assertEquals("Script parsing not working", this.failingScriptTag, scriptNodeHtml);
    }

    public void testScriptTagNotWorkingMultiLine() throws Exception {
        createParser(this.anotherFailingScriptTag);
        ScriptVisitor visitor = new ScriptVisitor();
        this.parser.visitAllNodesWith(visitor);
        String scriptNodeHtml = visitor.scriptTag.toHtml();
        assertEquals("Script parsing not working", this.anotherFailingScriptTag, scriptNodeHtml);
    }

    /**
     * Implement test case NodeVisitor.
     */
    public final class ScriptVisitor extends NodeVisitor {
        /** Keps the only script tag. */
        public ScriptTag scriptTag;

        /**
         * Creates a new ScriptVisitor object.
         *
         * @param hat param.
         * @param hostString param.
         * @param direction param.
         */
        public ScriptVisitor() {
            super(true, true);
        }

        /**
         * @see org.htmlparser.visitors.NodeVisitor
         */
        public void visitTag(final Tag n) {
            if ((null != n.getParent())
                || ((n instanceof CompositeTag)
                && (null == ((CompositeTag) n).getEndTag()))) {
                if (n instanceof ScriptTag) {
                    this.scriptTag = (ScriptTag) n;
                }
            } else {
                if (n instanceof ScriptTag) {
                    this.scriptTag = (ScriptTag) n;
                }
            }
        }
    }
}

Index: AllTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/AllTests.java,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** AllTests.java	2 Jan 2004 16:24:57 -0000	1.41
--- AllTests.java	22 May 2004 03:57:31 -0000	1.42
***************
*** 52,55 ****
--- 52,56 ----
          suite.addTestSuite(TextExtractingVisitorTest.class);
          suite.addTestSuite(UrlModifyingVisitorTest.class);
+         suite.addTestSuite(ScriptCommentTest.class);

          return suite;
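The only difference between `workingScriptTag` and `failingScriptTag` in the new test is the apostrophe in "can't". The simplified model below is an assumption-laden sketch of why that matters, not the HTML Parser Lexer: it assumes only what the Lexer.java diff in this batch shows, namely that quote-smart scanning enters a quoted state on `'` or `"` and ignores `<` while quoted. Without comment handling, the apostrophe opens a "string" that never closes, so the closing `</script>` is never seen. `NaiveQuoteScan` and `firstTagStart` are hypothetical names.

```java
// Simplified model (hypothetical, not the real Lexer): scans script text for
// the first '<' outside quotes, tracking quotes but NOT ecmascript comments.
final class NaiveQuoteScan {
    private NaiveQuoteScan() {}

    /** Index of the first '<' outside quotes, or -1 if none is found. */
    static int firstTagStart(String text) {
        char quote = 0; // 0 means "not inside a quoted string"
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (quote == 0 && (ch == '\'' || ch == '"')) {
                quote = ch; // enter quoted state (even inside a comment)
            } else if (quote != 0 && ch == quote) {
                quote = 0; // exit quoted state
            } else if (quote == 0 && ch == '<') {
                return i; // candidate start of the end tag
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String working = "// I cant handle single quotations\n</script>";
        String failing = "// I can't handle single quotations.\n</script>";
        System.out.println(firstTagStart(working)); // position of "</script>"
        System.out.println(firstTagStart(failing)); // -1: the apostrophe swallows the end tag
    }
}
```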
From: Derrick O. <der...@us...> - 2004-05-22 03:57:40
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25265/tests/tagTests

Modified Files:
	InputTagTest.java
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.
Also added test cases for, but didn't fix bug #923146 tag nesting rule too strict for forms
(org.htmlparser.tests.tagTests.InputTagTest.testTable)
and bug #922439 OutOfMemory on huge HTML files (4,7MB)
(org.htmlparser.tests.MemoryTest)
which are thus currently failing.

Index: InputTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/InputTagTest.java,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** InputTagTest.java	2 Jan 2004 16:24:57 -0000	1.39
--- InputTagTest.java	22 May 2004 03:57:31 -0000	1.40
***************
*** 27,31 ****
--- 27,35 ----
  package org.htmlparser.tests.tagTests;

+ import org.htmlparser.tags.FormTag;
  import org.htmlparser.tags.InputTag;
+ import org.htmlparser.tags.TableColumn;
+ import org.htmlparser.tags.TableRow;
+ import org.htmlparser.tags.TableTag;
  import org.htmlparser.tests.ParserTestCase;
  import org.htmlparser.util.ParserException;
***************
*** 82,84 ****
--- 86,124 ----
          assertEquals("Name","Google",inputTag.getAttribute("NAME"));
      }
+
+     /**
+      * Bug #923146 tag nesting rule too strict for forms
+      */
+     public void testTable () throws ParserException
+     {
+         String html =
+             "<table>" +
+             "<tr>" +
+             "<td>" +
+             "<form>" +
+             "<input name=input1>" +
+             "</td>" +
+             // <tr> missing
+             "<tr>" +
+             "<td>" +
+             "<input name=input2>" +
+             "</td>" +
+             "</tr>" +
+             "</form>" +
+             "</table>";
+         createParser (html);
+         parseAndAssertNodeCount (1);
+         assertTrue ("not a table", node[0] instanceof TableTag);
+         TableTag table = (TableTag)node[0];
+         assertTrue ("not two rows", 2 == table.getRowCount ());
+         // assertTrue ("not one row", 1 == table.getRowCount ());
+         TableRow row = table.getRow (0);
+         assertTrue ("not one column", 1 == row.getColumnCount ());
+         TableColumn column = row.getColumns ()[0];
+         assertTrue ("not one child", 1 == column.getChildCount ());
+         assertTrue ("column doesn't have a form", column.getChild (0) instanceof FormTag);
+         FormTag form = (FormTag)column.getChild (0);
+         assertTrue ("form only has one input field", 2 == form.getFormInputs ().size ());
+     }
+
  }
\ No newline at end of file
From: Derrick O. <der...@us...> - 2004-05-22 03:57:39
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25265/tests

Modified Files:
	AllTests.java
Added Files:
	MemoryTest.java
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.
Also added test cases for, but didn't fix bug #923146 tag nesting rule too strict for forms
(org.htmlparser.tests.tagTests.InputTagTest.testTable)
and bug #922439 OutOfMemory on huge HTML files (4,7MB)
(org.htmlparser.tests.MemoryTest)
which are thus currently failing.

--- NEW FILE: MemoryTest.java ---
// HTMLParser Library $Name: $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2004 Derrick Oswald
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/MemoryTest.java,v $
// $Author: derrickoswald $
// $Date: 2004/05/22 03:57:30 $
// $Revision: 1.1 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//

package org.htmlparser.tests;

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.util.NodeIterator;

/**
 * Test big memory requirements.
 */
public class MemoryTest extends ParserTestCase
{
    static
    {
        System.setProperty ("org.htmlparser.tests.MemoryTest", "MemoryTest");
    }

    public MemoryTest (String name)
    {
        super (name);
    }

    /**
     * Test for bug #922439 OutOfMemory on huge HTML files (4,7MB)
     */
    public void testBigFile () throws Exception
    {
        Parser parser;
        NodeIterator iterator;
        Node node;
        int size;

        parser = new Parser ("http://htmlparser.sourceforge.net/test/A002.html");
        size = 0;
        try
        {
            iterator = parser.elements ();
            while (iterator.hasMoreNodes ())
            {
                node = iterator.nextNode ();
                size += node.toHtml ().length ();
            }
        }
        catch (OutOfMemoryError ome)
        {
            fail ("out of memory");
        }
        assertEquals ("wrong size fetched", size, 4697411);
    }
}

Index: AllTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/AllTests.java,v
retrieving revision 1.59
retrieving revision 1.60
diff -C2 -d -r1.59 -r1.60
*** AllTests.java	2 Jan 2004 16:24:55 -0000	1.59
--- AllTests.java	22 May 2004 03:57:30 -0000	1.60
***************
*** 53,56 ****
--- 53,57 ----
          sub.addTestSuite (FunctionalTests.class);
          sub.addTestSuite (LineNumberAssignedByNodeReaderTest.class);
+         sub.addTestSuite (MemoryTest.class);
          suite.addTest (sub);
          suite.addTest (org.htmlparser.tests.lexerTests.AllTests.suite ());
From: Derrick O. <der...@us...> - 2004-05-22 03:57:39
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/parserHelperTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25265/tests/parserHelperTests

Modified Files:
	StringParserTest.java
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.
Also added test cases for, but didn't fix bug #923146 tag nesting rule too strict for forms
(org.htmlparser.tests.tagTests.InputTagTest.testTable)
and bug #922439 OutOfMemory on huge HTML files (4,7MB)
(org.htmlparser.tests.MemoryTest)
which are thus currently failing.

Index: StringParserTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/parserHelperTests/StringParserTest.java,v
retrieving revision 1.46
retrieving revision 1.47
diff -C2 -d -r1.46 -r1.47
*** StringParserTest.java	2 Jan 2004 16:24:56 -0000	1.46
--- StringParserTest.java	22 May 2004 03:57:30 -0000	1.47
***************
*** 206,213 ****
              "</head>" +
              "<script language=\"JavaScript\" type=\"text/JavaScript\">" +
!             "// if this fails, output a 'hello' " +
              "if (true) " +
              "{ " +
!             "//something good... " +
              "} " +
              "</script>" +
--- 206,213 ----
              "</head>" +
              "<script language=\"JavaScript\" type=\"text/JavaScript\">" +
!             "// if this fails, output a 'hello' \n" +
              "if (true) " +
              "{ " +
!             "//something good...\n" +
              "} " +
              "</script>" +
From: Derrick O. <der...@us...> - 2004-05-22 03:57:38
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25265/lexer Modified Files: Lexer.java Log Message: Fix bug# 919738 Text has not been extracted correctly using StringBean and (duplicate) bug #936392 ScriptTag visitor fails for comments with ' by handling single and multiline ecmascript comments in the Lexer class when called with quotesmart true. Also added test cases for, but didn't fix bug #923146 tag nesting rule too strict for forms (org.htmlparser.tests.tagTests.InputTagTest.testTable) and bug #922439 OutOfMemory on huge HTML files (4,7MB) (org.htmlparser.tests.MemoryTest) which are thus currently failing. Index: Lexer.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Lexer.java,v retrieving revision 1.27 retrieving revision 1.28 diff -C2 -d -r1.27 -r1.28 *** Lexer.java 18 Feb 2004 12:34:04 -0000 1.27 --- Lexer.java 22 May 2004 03:57:29 -0000 1.28 *************** *** 303,306 **** --- 303,307 ---- break; default: + probe.retreat (); // string needs to see leading foreslash ret = parseString (probe, quotesmart); break; *************** *** 412,415 **** --- 413,445 ---- else if (quotesmart && (ch == quote)) quote = 0; // exit quoted state + else if (quotesmart && (0 == quote) && (ch == '/')) + { + // handle multiline and double slash comments (with a quote) in script like: + // I can't handle single quotations. + ch = mPage.getCharacter (cursor); + if (0 == ch) + done = true; + else if ('/' == ch) + { + do + ch = mPage.getCharacter (cursor); + while ((ch != 0) && (ch != '\n')); + } + else if ('*' == ch) + { + do + { + do + ch = mPage.getCharacter (cursor); + while ((ch != 0) && (ch != '*')); + ch = mPage.getCharacter (cursor); + if (ch == '*') + cursor.retreat (); + } + while ((ch != 0) && (ch != '/')); + } + else + cursor.retreat (); + } else if ((0 == quote) && ('<' == ch)) { |
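The new branch in the Lexer's string-scanning loop skips ECMAScript `//` and `/* ... */` comments while `quotesmart` is true. As a result, an apostrophe inside a comment (as in `// if this fails, output a 'hello'`) no longer switches the scanner into a quoted state. The standalone sketch below shows the same idea on a plain `String`. The class `ScriptScanner` and its method `findTagStart` are hypothetical names, not HTMLParser API. The sketch also skips backslash escapes inside strings, which is an assumption of its own and is not taken from the diff.

```java
// Hypothetical sketch, not HTMLParser code: locate the first '<' in script text
// that lies outside string literals and outside // and /* ... */ comments.
final class ScriptScanner {
    private ScriptScanner() {}

    // Returns the index of the first '<' outside quotes and comments, or -1.
    static int findTagStart(String script) {
        int n = script.length();
        char quote = 0; // 0 means "not inside a string literal"
        int i = 0;
        while (i < n) {
            char ch = script.charAt(i);
            if (quote != 0) {
                if (ch == '\\') {
                    i += 2; // assumption: skip the escaped character as well
                    continue;
                }
                if (ch == quote) {
                    quote = 0; // closing quote: leave the quoted state
                }
                i++;
            } else if (ch == '\'' || ch == '"') {
                quote = ch; // opening quote: enter the quoted state
                i++;
            } else if (ch == '/' && i + 1 < n && script.charAt(i + 1) == '/') {
                // Single-line comment: resume just after the next newline.
                int eol = script.indexOf('\n', i + 2);
                i = (eol < 0) ? n : eol + 1;
            } else if (ch == '/' && i + 1 < n && script.charAt(i + 1) == '*') {
                // Multi-line comment: resume just after the closing star-slash.
                int end = script.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (ch == '<') {
                return i;
            } else {
                i++;
            }
        }
        return -1;
    }
}
```

The trailing `\n` added to the `StringParserTest` script matters for the same reason in this sketch: a `//` comment runs to the end of the line, so without a newline everything after it would be treated as comment.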
From: Derrick O. <der...@us...> - 2004-05-16 18:00:05
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20559/beans Modified Files: StringBean.java LinkBean.java Log Message: Alter bound property name constants to agree with section 8.8 Capitalization of inferred names. in the JavaBeans API specification. Index: LinkBean.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/LinkBean.java,v retrieving revision 1.28 retrieving revision 1.29 diff -C2 -d -r1.28 -r1.29 *** LinkBean.java 14 Mar 2004 15:53:06 -0000 1.28 --- LinkBean.java 16 May 2004 17:59:57 -0000 1.29 *************** *** 50,54 **** * Property name in event where the URL contents changes. */ ! public static final String PROP_LINKS_PROPERTY = "Links"; /** --- 50,54 ---- * Property name in event where the URL contents changes. */ ! public static final String PROP_LINKS_PROPERTY = "links"; /** Index: StringBean.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/StringBean.java,v retrieving revision 1.38 retrieving revision 1.39 diff -C2 -d -r1.38 -r1.39 *** StringBean.java 28 Feb 2004 15:52:42 -0000 1.38 --- StringBean.java 16 May 2004 17:59:57 -0000 1.39 *************** *** 77,86 **** * Property name in event where the URL contents changes. */ ! public static final String PROP_STRINGS_PROPERTY = "Strings"; /** * Property name in event where the 'embed links' state changes. */ ! public static final String PROP_LINKS_PROPERTY = "Links"; /** --- 77,86 ---- * Property name in event where the URL contents changes. */ ! public static final String PROP_STRINGS_PROPERTY = "strings"; /** * Property name in event where the 'embed links' state changes. */ ! public static final String PROP_LINKS_PROPERTY = "links"; /** *************** *** 92,106 **** * Property name in event where the 'replace non-breaking spaces' state changes. */ ! 
public static final String PROP_REPLACE_SPACE_PROPERTY = "ReplaceSpace"; /** * Property name in event where the 'collapse whitespace' state changes. */ ! public static final String PROP_COLLAPSE_PROPERTY = "Collapse"; /** * Property name in event where the connection changes. */ ! public static final String PROP_CONNECTION_PROPERTY = "Connection"; /** --- 92,106 ---- * Property name in event where the 'replace non-breaking spaces' state changes. */ ! public static final String PROP_REPLACE_SPACE_PROPERTY = "replaceNonBreakingSpaces"; /** * Property name in event where the 'collapse whitespace' state changes. */ ! public static final String PROP_COLLAPSE_PROPERTY = "collapse"; /** * Property name in event where the connection changes. */ ! public static final String PROP_CONNECTION_PROPERTY = "connection"; /** |
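Renaming the constants matters because a listener can be registered for a single property name. `java.beans.PropertyChangeSupport` delivers such an event only when the registered name equals the name passed to `firePropertyChange`, and the comparison is case-sensitive. The following is a minimal sketch that assumes a hypothetical `LinksBeanSketch` class rather than the real `StringBean`:

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

// Hypothetical bean (not the real StringBean): the event name matches the
// property name a builder tool infers from getLinks()/setLinks().
class LinksBeanSketch {
    public static final String PROP_LINKS_PROPERTY = "links";

    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private boolean links;

    public boolean getLinks() {
        return links;
    }

    public void setLinks(boolean value) {
        boolean old = links;
        links = value;
        // Fires only if the value actually changed; PropertyChangeSupport delivers it
        // to listeners registered for "links" and to listeners for all properties.
        support.firePropertyChange(PROP_LINKS_PROPERTY, old, value);
    }

    public void addPropertyChangeListener(String name, PropertyChangeListener listener) {
        support.addPropertyChangeListener(name, listener);
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        LinksBeanSketch bean = new LinksBeanSketch();
        bean.addPropertyChangeListener("links", e -> seen.add("links listener"));
        bean.addPropertyChangeListener("Links", e -> seen.add("Links listener"));
        bean.setLinks(true);
        System.out.println(seen); // prints [links listener]
    }
}
```

With the old constant value `"Links"`, a tool that registered for the inferred name `links` would never be notified.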
From: Derrick O. <der...@us...> - 2004-05-16 18:00:05
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexerapplications/thumbelina In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20559/lexerapplications/thumbelina Modified Files: Thumbelina.java Log Message: Alter bound property name constants to agree with section 8.8 Capitalization of inferred names. in the JavaBeans API specification. Index: Thumbelina.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexerapplications/thumbelina/Thumbelina.java,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** Thumbelina.java 4 Nov 2003 01:25:02 -0000 1.3 --- Thumbelina.java 16 May 2004 17:59:56 -0000 1.4 *************** *** 83,95 **** * Property name for current URL binding. */ ! public static final String PROP_CURRENT_URL_PROPERTY = "CurrentURL"; /** * Property name for queue size binding. */ ! public static final String PROP_URL_QUEUE_PROPERTY = "URLQueue"; /** * Property name for visited URL size binding. */ ! public static final String PROP_URL_VISITED_PROPERTY = "URLVisited"; /** --- 83,95 ---- * Property name for current URL binding. */ ! public static final String PROP_CURRENT_URL_PROPERTY = "currentURL"; /** * Property name for queue size binding. */ ! public static final String PROP_URL_QUEUE_PROPERTY = "queueSize"; /** * Property name for visited URL size binding. */ ! public static final String PROP_URL_VISITED_PROPERTY = "visitedSize"; /** *************** *** 1454,1457 **** --- 1454,1462 ---- * * $Log$ + * Revision 1.4 2004/05/16 17:59:56 derrickoswald + * Alter bound property name constants to agree with section + * 8.8 Capitalization of inferred names. + * in the JavaBeans API specification. + * * Revision 1.3 2003/11/04 01:25:02 derrickoswald * Made visiting order the same order as on the page. |
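Section 8.8 of the JavaBeans specification derives a property name from an accessor suffix. It lower-cases the first character, except when the first two characters are both upper case. The JDK exposes this rule as `java.beans.Introspector.decapitalize`. The sketch below applies it to suffixes such as `CurrentURL`. The wrapper class `InferredNames` is only for illustration.

```java
import java.beans.Introspector;

// Illustration only: wraps java.beans.Introspector.decapitalize, which implements
// the JavaBeans section 8.8 rule for inferred property names.
class InferredNames {
    static String inferredName(String accessorSuffix) {
        return Introspector.decapitalize(accessorSuffix);
    }

    public static void main(String[] args) {
        String[] suffixes = {"Links", "Strings", "CurrentURL", "ReplaceNonBreakingSpaces", "URL"};
        for (String suffix : suffixes) {
            System.out.println(suffix + " -> " + inferredName(suffix));
        }
        // Links -> links
        // Strings -> strings
        // CurrentURL -> currentURL
        // ReplaceNonBreakingSpaces -> replaceNonBreakingSpaces
        // URL -> URL
    }
}
```

The last case shows the exception to the rule: an all-capitals prefix such as `URL` is left unchanged.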
From: Alberto N. <an...@us...> - 2004-05-12 14:16:20
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10385 Modified Files: ParserUtils.java Log Message: Added many trim and split methods. Index: ParserUtils.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/ParserUtils.java,v retrieving revision 1.39 retrieving revision 1.40 diff -C2 -d -r1.39 -r1.40 *** ParserUtils.java 14 Jan 2004 02:53:47 -0000 1.39 --- ParserUtils.java 12 May 2004 14:16:08 -0000 1.40 *************** *** 27,33 **** --- 27,47 ---- package org.htmlparser.util; + import java.io.ByteArrayInputStream; + import java.io.UnsupportedEncodingException; + import java.util.ArrayList; + import org.htmlparser.Node; import org.htmlparser.NodeFilter; + import org.htmlparser.Parser; [...1107 lines suppressed...] + return links; + + } + + private static String createDummyString (char fillingChar, int length) + { + StringBuffer dummyStringBuffer = new StringBuffer(); + for (int j=0; j<length; j++) + dummyStringBuffer = dummyStringBuffer.append(fillingChar); + return new String(dummyStringBuffer); + } + + private static String modifyDummyString (String dummyString, int beginTag, int endTag) + { + String dummyStringInterval = createDummyString ('*', endTag-beginTag); + return new String(dummyString.substring(0, beginTag) + dummyStringInterval + dummyString.substring(endTag, dummyString.length())); + } + } \ No newline at end of file |
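The two private helpers at the end of the `ParserUtils` diff build a mask string with a `StringBuffer` loop. The `dummyStringBuffer = dummyStringBuffer.append(...)` reassignment is harmless but redundant, because `StringBuffer.append` returns the same buffer. A more compact equivalent using `java.util.Arrays.fill` is sketched below. The class `DummyStrings` is hypothetical and is not the committed code. For `0 <= beginTag <= endTag <= length` it gives the same result as the originals.

```java
import java.util.Arrays;

// Hypothetical rewrite of the two private helpers shown above; not the committed code.
final class DummyStrings {
    private DummyStrings() {}

    // Returns a string made of `length` copies of `fillingChar`.
    static String createDummyString(char fillingChar, int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, fillingChar);
        return new String(chars);
    }

    // Returns `dummyString` with the characters in [beginTag, endTag) replaced by '*'.
    static String modifyDummyString(String dummyString, int beginTag, int endTag) {
        char[] chars = dummyString.toCharArray();
        Arrays.fill(chars, beginTag, endTag, '*');
        return new String(chars);
    }
}
```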
From: Alberto N. <an...@us...> - 2004-05-12 14:14:41
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9985 Modified Files: HTMLParserUtilsTest.java Log Message: Added many trim and split functions, here are the tests Index: HTMLParserUtilsTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests/HTMLParserUtilsTest.java,v retrieving revision 1.15 retrieving revision 1.16 diff -C2 -d -r1.15 -r1.16 *** HTMLParserUtilsTest.java 2 Jan 2004 16:24:57 -0000 1.15 --- HTMLParserUtilsTest.java 12 May 2004 14:14:32 -0000 1.16 *************** *** 27,30 **** --- 27,34 ---- package org.htmlparser.tests.utilTests; + import org.htmlparser.NodeFilter; + import org.htmlparser.Parser; + import org.htmlparser.filters.*; + import org.htmlparser.tags.*; import org.htmlparser.tests.ParserTestCase; import org.htmlparser.util.ParserUtils; *************** *** 49,51 **** --- 53,401 ---- ); } + + public void testButCharsMethods() { + String[] tmpSplitButChars = ParserUtils.splitButChars("<DIV> +12.5, +3.4 </DIV>", "+.1234567890"); + assertStringEquals( + "modified text", + "+12.5*+3.4", + new String(tmpSplitButChars[0] + '*' + tmpSplitButChars[1]) + ); + assertStringEquals( + "modified text", + "+12.5", + ParserUtils.trimButChars("<DIV> +12.5 </DIV>", "+.1234567890") + ); + assertStringEquals( + "modified text", + "+12.5", + ParserUtils.trimButChars("<DIV> +1 2 . 5 </DIV>", "+.1234567890") + ); + assertStringEquals( + "modified text", + "+12.5", + ParserUtils.trimButCharsBeginEnd("<DIV> +12.5 </DIV>", "+.1234567890") + ); + assertStringEquals( + "modified text", + "+1 2 . 5", + ParserUtils.trimButCharsBeginEnd("<DIV> +1 2 . 
5 </DIV>", "+.1234567890") + ); + } + + public void testButDigitsMethods() { + String[] tmpSplitButDigits = ParserUtils.splitButDigits("<DIV> +12.5, +3.4 </DIV>", "+."); + assertStringEquals( + "modified text", + "+12.5*+3.4", + new String(tmpSplitButDigits[0] + '*' + tmpSplitButDigits[1]) + ); + assertStringEquals( + "modified text", + "+12.5", + ParserUtils.trimButDigits("<DIV> +12.5 </DIV>", "+.") + ); + assertStringEquals( + "modified text", + "+12.5", + ParserUtils.trimButDigits("<DIV> +1 2 . 5 </DIV>", "+.") + ); + assertStringEquals( + "modified text", + "+12.5", + ParserUtils.trimButDigitsBeginEnd("<DIV> +12.5 </DIV>", "+.") + ); + assertStringEquals( + "modified text", + "+1 2 . 5", + ParserUtils.trimButDigitsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+.") + ); + } + + public void testCharsMethods() { + String[] tmpSplitChars = ParserUtils.splitChars("<DIV> +12.5, +3.4 </DIV>", " <>DIV/,"); + assertStringEquals( + "modified text", + "+12.5*+3.4", + new String(tmpSplitChars[0] + '*' + tmpSplitChars[1]) + ); + assertStringEquals( + "modified text", + "+12.5", + ParserUtils.trimChars("<DIV> +12.5 </DIV>", "<>DIV/ ") + ); + assertStringEquals( + "modified text", + "Trimallchars", + ParserUtils.trimChars("<DIV> Trim all chars </DIV>", "<>DIV/ ") + ); + assertStringEquals( + "modified text", + "+12.5", + ParserUtils.trimCharsBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/ ") + ); + assertStringEquals( + "modified text", + "Trim all spaces but not the ones inside the string", + ParserUtils.trimCharsBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/ ") + ); + } + + public void testSpacesMethods() { + String[] tmpSplitSpaces = ParserUtils.splitSpaces("<DIV> +12.5, +3.4 </DIV>", "<>DIV/,"); + assertStringEquals( + "modified text", + "+12.5*+3.4", + new String(tmpSplitSpaces[0] + '*' + tmpSplitSpaces[1]) + ); + assertStringEquals( + "modified text", + "+12.5", + ParserUtils.trimSpaces("<DIV> +12.5 </DIV>", "<>DIV/") + ); + assertStringEquals( + 
"modified text", + "Trimallspaces", + ParserUtils.trimSpaces("<DIV> Trim all spaces </DIV>", "<>DIV/") + ); + assertStringEquals( + "modified text", + "+12.5", + ParserUtils.trimSpacesBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/") + ); + assertStringEquals( + "modified text", + "Trim all spaces but not the ones inside the string", + ParserUtils.trimSpacesBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/") + ); + } + + public void testTagsMethods() { + try + { + String[] tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}); + assertStringEquals( + "modified text", + "Begin * ALL OK", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1]) + ); + tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, false); + assertStringEquals( + "modified text", + "Begin *<DIV> +12.5 </DIV>* ALL OK", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2]) + ); + tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, true, false); + assertStringEquals( + "modified text", + "Begin * +12.5 * ALL OK", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2]) + ); + tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, true); + assertStringEquals( + "modified text", + "Begin * ALL OK", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1]) + ); + assertStringEquals( + "modified text", + " ALL OK", + ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}) + ); + assertStringEquals( + "modified text", + "<DIV> +12.5 </DIV> ALL OK", + ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, false) + ); + assertStringEquals( + "modified text", + " +12.5 ALL OK", + ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, true, 
false) + ); + assertStringEquals( + "modified text", + " ALL OK", + ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, true) + ); + } + catch (Exception e) + { + assertStringEquals( + "modified text", + "error msg", + e.getMessage() + ); + } + } + + public void testTagsFilterMethods() { + try + { + NodeFilter filter = new TagNameFilter ("DIV"); + String[] tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter); + assertStringEquals( + "modified text", + "Begin * ALL OK", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1]) + ); + tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, false, false); + assertStringEquals( + "modified text", + "Begin *<DIV> +12.5 </DIV>* ALL OK", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2]) + ); + tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, true, false); + assertStringEquals( + "modified text", + "Begin * +12.5 * ALL OK", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2]) + ); + tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, false, true); + assertStringEquals( + "modified text", + "Begin * ALL OK", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1]) + ); + assertStringEquals( + "modified text", + " ALL OK", + ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter) + ); + assertStringEquals( + "modified text", + "<DIV> +12.5 </DIV> ALL OK", + ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, false, false) + ); + assertStringEquals( + "modified text", + " +12.5 ALL OK", + ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, true, false) + ); + assertStringEquals( + "modified text", + " ALL OK", + ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, false, true) + ); + } + catch (Exception e) + { + 
assertStringEquals( + "modified text", + "error msg", + e.getMessage() + ); + } + } + + public void testTagsClassMethods() { + try + { + NodeFilter filter = new NodeClassFilter (Div.class); + String[] tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter); + assertStringEquals( + "modified text", + "Begin * ALL OK", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1]) + ); + tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, false, false); + assertStringEquals( + "modified text", + "Begin *<DIV> +12.5 </DIV>* ALL OK", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2]) + ); + tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, true, false); + assertStringEquals( + "modified text", + "Begin * +12.5 * ALL OK", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2]) + ); + tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, false, true); + assertStringEquals( + "modified text", + "Begin * ALL OK", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1]) + ); + assertStringEquals( + "modified text", + " ALL OK", + ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter) + ); + assertStringEquals( + "modified text", + "<DIV> +12.5 </DIV> ALL OK", + ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, false, false) + ); + assertStringEquals( + "modified text", + " +12.5 ALL OK", + ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, true, false) + ); + assertStringEquals( + "modified text", + " ALL OK", + ParserUtils.trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", filter, false, true) + ); + } + catch (Exception e) + { + assertStringEquals( + "modified text", + "error msg", + e.getMessage() + ); + } + } + + public void testTagsComplexMethods() { + try + { + NodeFilter filterLink = new NodeClassFilter (LinkTag.class); + NodeFilter 
filterDiv = new NodeClassFilter (Div.class); + OrFilter filterLinkDiv = new OrFilter (filterLink, filterDiv); + NodeFilter filterTable = new NodeClassFilter (TableColumn.class); + OrFilter filter = new OrFilter (filterLinkDiv, filterTable); + String[] tmpSplitTags = ParserUtils.splitTags("OutsideLeft<A>AInside</A><DIV><DIV>DivInside</DIV></DIV><TD>TableColoumnInside</TD>OutsideRight", filter); + assertStringEquals( + "modified text", + "OutsideLeft*OutsideRight", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1]) + ); + tmpSplitTags = ParserUtils.splitTags("OutsideLeft<A>AInside</A><DIV><DIV>DivInside</DIV></DIV><TD>TableColoumnInside</TD>OutsideRight", filter, false, false); + assertStringEquals( + "modified text", + "OutsideLeft*AInside*<DIV>DivInside</DIV>*TableColoumnInside*OutsideRight", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2] + '*' + tmpSplitTags[3] + '*' + tmpSplitTags[4]) + ); + tmpSplitTags = ParserUtils.splitTags("OutsideLeft<A>AInside</A><DIV><DIV>DivInside</DIV></DIV><TD>TableColoumnInside</TD>OutsideRight", filter, true, false); + assertStringEquals( + "modified text", + "OutsideLeft*AInside*DivInside*TableColoumnInside*OutsideRight", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2] + '*' + tmpSplitTags[3] + '*' + tmpSplitTags[4]) + ); + tmpSplitTags = ParserUtils.splitTags("OutsideLeft<A>AInside</A><DIV><DIV>DivInside</DIV></DIV><TD>TableColoumnInside</TD>OutsideRight", filter, false, true); + assertStringEquals( + "modified text", + "OutsideLeft*OutsideRight", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1]) + ); + tmpSplitTags = ParserUtils.splitTags("OutsideLeft<A>AInside<DIV><DIV>DivInside</DIV></DIV></A><TD>TableColoumnInside</TD>OutsideRight", new String[] {"DIV", "TD", "A"}); + assertStringEquals( + "modified text", + "OutsideLeft*OutsideRight", + new String(tmpSplitTags[0] + '*' + tmpSplitTags[1]) + ); + assertStringEquals( + "modified text", + "OutsideLeftOutsideRight", 
+ ParserUtils.trimTags("OutsideLeft<A>AInside<DIV><DIV>DivInside</DIV></DIV></A><TD>TableColoumnInside</TD>OutsideRight", new String[] {"DIV", "TD", "A"}) + ); + } + catch (Exception e) + { + assertStringEquals( + "modified text", + "error msg", + e.getMessage() + ); + } + } } |
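The tests above define the behaviour of the `ButChars` family:

- `trimButChars` keeps only the listed characters.
- `trimButCharsBeginEnd` strips unlisted characters from the two ends only.
- `splitButChars` returns the runs of listed characters.

The sketch below reproduces those cases. It is written from the assertions alone, because the `ParserUtils` implementation is suppressed in this message. The class `ButChars` is therefore hypothetical, and its edge cases may differ from the library's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "ButChars" family, with behaviour inferred from
// the HTMLParserUtilsTest assertions; the real ParserUtils code may differ.
final class ButChars {
    private ButChars() {}

    // Removes every character that is not in `keep`.
    static String trimButChars(String input, String keep) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (keep.indexOf(c) >= 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Removes characters not in `keep` from the start and the end only.
    static String trimButCharsBeginEnd(String input, String keep) {
        int begin = 0;
        int end = input.length();
        while (begin < end && keep.indexOf(input.charAt(begin)) < 0) {
            begin++;
        }
        while (end > begin && keep.indexOf(input.charAt(end - 1)) < 0) {
            end--;
        }
        return input.substring(begin, end);
    }

    // Returns the maximal runs of characters that all belong to `keep`.
    static String[] splitButChars(String input, String keep) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (keep.indexOf(c) >= 0) {
                current.append(c);
            } else if (current.length() > 0) {
                parts.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts.toArray(new String[0]);
    }
}
```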
From: Derrick O. <der...@us...> - 2004-05-10 22:32:32
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/filterTests In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21948/tests/filterTests Modified Files: FilterTest.java Log Message: Add CssSelectorNodeFilter submitted by Rogers George. Index: FilterTest.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/filterTests/FilterTest.java,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** FilterTest.java 7 Dec 2003 23:41:41 -0000 1.2 --- FilterTest.java 10 May 2004 22:31:46 -0000 1.3 *************** *** 27,31 **** --- 27,33 ---- package org.htmlparser.tests.filterTests; + import org.htmlparser.Parser; import org.htmlparser.filters.AndFilter; + import org.htmlparser.filters.CssSelectorNodeFilter; import org.htmlparser.filters.HasAttributeFilter; import org.htmlparser.filters.HasChildFilter; *************** *** 35,43 **** --- 37,48 ---- import org.htmlparser.filters.StringFilter; import org.htmlparser.filters.TagNameFilter; + import org.htmlparser.lexer.Lexer; import org.htmlparser.lexer.nodes.StringNode; import org.htmlparser.lexer.nodes.TagNode; import org.htmlparser.tags.BodyTag; import org.htmlparser.tags.LinkTag; + import org.htmlparser.tags.Tag; import org.htmlparser.tests.ParserTestCase; + import org.htmlparser.util.NodeIterator; import org.htmlparser.util.NodeList; import org.htmlparser.util.ParserException; *************** *** 241,244 **** --- 246,275 ---- assertEquals ("attribute value", "three", link.getAttribute ("id")); } + + public void testEscape() throws Exception + { + assertEquals ("douchebag", CssSelectorNodeFilter.unescape ("doucheba\\g").toString ()); + } + + public void testSelectors() throws Exception + { + String html = "<html><head><title>sample title</title></head><body inserterr=\"true\" yomama=\"false\"><h3 id=\"heading\">big </invalid>heading</h3><ul id=\"things\"><li><br 
word=\"broken\"/>>moocow<li><applet/>doohickey<li class=\"last\"><b class=\"item\">final<br>item</b></ul></body></html>"; + Lexer l; + Parser p; + CssSelectorNodeFilter it; + NodeIterator i; + int count; + + l = new Lexer (html); + p = new Parser (l); + it = new CssSelectorNodeFilter ("li + li"); + count = 0; + for (i = p.extractAllNodesThatMatch (it).elements (); i.hasMoreNodes ();) + { + assertEquals ("tag name wrong", "LI", ((Tag)i.nextNode()).getTagName()); + count++; + } + assertEquals ("wrong count", 2, count); + } } |
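In CSS2, the adjacent sibling selector `E + F` matches an `F` element whose immediately preceding element sibling is an `E`. Text nodes between the two are ignored. This is why `li + li` in `testSelectors` is expected to match two of the three `LI` elements. The sketch below counts such matches on a hypothetical minimal tree (`Node`, `Element`, `Text`), not on HTMLParser's node classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal tree: Node, Element and Text are illustration types,
// not HTMLParser's node classes.
final class AdjacentSiblingDemo {
    private AdjacentSiblingDemo() {}

    interface Node {}

    static final class Text implements Node {
        final String value;
        Text(String value) { this.value = value; }
    }

    static final class Element implements Node {
        final String name;
        final List<Node> children;
        Element(String name, Node... children) {
            this.name = name;
            this.children = new ArrayList<>(Arrays.asList(children));
        }
    }

    // Counts elements named `second` whose nearest preceding element sibling is
    // named `first`, i.e. the matches of the CSS2 selector "first + second".
    static int countAdjacent(Element parent, String first, String second) {
        int count = 0;
        Element previous = null; // last element sibling seen; text nodes are skipped
        for (Node child : parent.children) {
            if (!(child instanceof Element)) {
                continue;
            }
            Element e = (Element) child;
            if (previous != null
                    && previous.name.equalsIgnoreCase(first)
                    && e.name.equalsIgnoreCase(second)) {
                count++;
            }
            previous = e;
            count += countAdjacent(e, first, second); // also search the subtree
        }
        return count;
    }

    public static void main(String[] args) {
        Element ul = new Element("UL",
            new Element("LI", new Text("moocow")),
            new Text("\n"),
            new Element("LI", new Text("doohickey")),
            new Element("LI", new Text("final item")));
        System.out.println(countAdjacent(ul, "li", "li")); // prints 2
    }
}
```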
From: Derrick O. <der...@us...> - 2004-05-10 22:32:11
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21948/filters Added Files: CssSelectorNodeFilter.java Log Message: Add CssSelectorNodeFilter submitted by Rogers George. --- NEW FILE: CssSelectorNodeFilter.java --- // HTMLParser Library $Name: $ - A java-based parser for HTML // http://sourceforge.org/projects/htmlparser // Copyright (C) 2004 Rogers George // // Revision Control Information // // $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/CssSelectorNodeFilter.java,v $ // $Author: derrickoswald $ // $Date: 2004/05/10 22:31:57 $ // $Revision: 1.1 $ // // This library is free software; you can redistribute it and/or // modify it under the terms of the GNU Lesser General Public // License as published by the Free Software Foundation; either // version 2.1 of the License, or (at your option) any later version. // // This library is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU // Lesser General Public License for more details. // // You should have received a copy of the GNU Lesser General Public // License along with this library; if not, write to the Free Software // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA // package org.htmlparser.filters; import junit.framework.TestCase; import org.htmlparser.*; import org.htmlparser.lexer.Lexer; import org.htmlparser.tags.Tag; import org.htmlparser.util.NodeIterator; import org.htmlparser.util.NodeList; import java.util.regex.Matcher; import java.util.regex.Pattern; import java.net.URLConnection; /** * A NodeFilter that accepts nodes based on whether they match a CSS2 selector. * Refer to <a href="http://www.w3.org/TR/REC-CSS2/selector.html"> * http://www.w3.org/TR/REC-CSS2/selector.html</a> for syntax. 
* <p> * Todo: more thorough testing, any relevant pseudo-classes, css3 features */ public class CssSelectorNodeFilter implements NodeFilter { private static Pattern tokens = Pattern.compile("(" + "/\\*.*?\\*/" // comments + ") | (" + " \".*?[^\"]\"" // double quoted string + " | \'.*?[^\']\'" // single quoted string + " | \"\" | \'\' " // empty quoted string + ") | (" + " [\\~\\*\\$\\^]? = " // attrib-val relations + ") | (" + " [a-zA-Z_\\*](?:[a-zA-Z0-9_-]|\\\\.)* " // bare name + ") | \\s*(" + " [+>~\\s] " // combinators + ")\\s* | (" + " [\\.\\[\\]\\#\\:)(] " // class/id/attr/param delims + ") | (" + " [\\,] " // comma + ") | ( . )" // everything else (bogus) , Pattern.CASE_INSENSITIVE |Pattern.DOTALL |Pattern.COMMENTS); private static final int COMMENT = 1, QUOTEDSTRING = 2, RELATION = 3, NAME = 4, COMBINATOR = 5, DELIM = 6, COMMA = 7; private NodeFilter therule; public CssSelectorNodeFilter(String selector) { m = tokens.matcher(selector); if (nextToken()) therule = parse(); } public boolean accept(Node n) { return therule.accept(n); } private Matcher m = null; private int tokentype = 0; private String token = null; private boolean nextToken() { if (m != null && m.find()) for (int i = 1; i < m.groupCount(); i++) if (m.group(i) != null) { tokentype = i; token = m.group(i); return true; } tokentype = 0; token = null; return false; } private NodeFilter parse() { NodeFilter n = null; do { switch (tokentype) { case COMMENT: case NAME: case DELIM: if (n == null) n = parseSimple(); else n = new AndFilter(n, parseSimple()); break; case COMBINATOR: switch (token.charAt(0)) { case '+': n = new AdjacentFilter(n); break; case '>': n = new HasParentFilter(n); break; default: // whitespace n = new HasAncestorFilter(n); } nextToken(); break; case COMMA: n = new OrFilter(n, parse()); nextToken(); break; } } while (token != null); return n; } private NodeFilter parseSimple() { boolean done = false; NodeFilter n = null; if (token != null) do { switch (tokentype) { case COMMENT: 
nextToken(); break; case NAME: if ("*".equals(token)) n = new YesFilter(); else if (n == null) n = new TagNameFilter(unescape(token)); else n = new AndFilter(n, new TagNameFilter(unescape(token))); nextToken(); break; case DELIM: switch (token.charAt(0)) { case '.': nextToken(); if (tokentype != NAME) throw new IllegalArgumentException("Syntax error at " + token); if (n == null) n = new HasAttributeFilter("class", unescape(token)); else n = new AndFilter(n, new HasAttributeFilter("class", unescape(token))); break; case '#': nextToken(); if (tokentype != NAME) throw new IllegalArgumentException("Syntax error at " + token); if (n == null) n = new HasAttributeFilter("id", unescape(token)); else n = new AndFilter(n, new HasAttributeFilter("id", unescape(token))); break; case ':': nextToken(); if (n == null) n = parsePseudoClass(); else n = new AndFilter(n, parsePseudoClass()); break; case '[': nextToken(); if (n == null) n = parseAttributeExp(); else n = new AndFilter(n, parseAttributeExp()); break; } nextToken(); break; default: done = true; } } while (!done && token != null); return n; } private NodeFilter parsePseudoClass() { throw new IllegalArgumentException("pseudoclasses not implemented yet"); } private NodeFilter parseAttributeExp() { NodeFilter n = null; if (tokentype == NAME) { String attrib = token; nextToken(); if ("]".equals(token)) n = new HasAttributeFilter(unescape(attrib)); else if (tokentype == RELATION) { String val = null, rel = token; nextToken(); if (tokentype == QUOTEDSTRING) val = unescape(token.substring(1, token.length() - 1)); else if (tokentype == NAME) val = unescape(token); if ("~=".equals(rel) && val != null) n = new AttribMatchFilter(unescape(attrib), "\\b" + val.replaceAll("([^a-zA-Z0-9])", "\\\\$1") + "\\b"); else if ("=".equals(rel) && val != null) n = new HasAttributeFilter(attrib, val); } } if (n == null) throw new IllegalArgumentException("Syntax error at " + token + tokentype); nextToken(); return n; } public static String 
unescape(String escaped) { StringBuffer result = new StringBuffer(escaped.length()); Matcher m = Pattern.compile("\\\\(?:([a-fA-F0-9]{2,6})|(.))").matcher( escaped); while (m.find()) { if (m.group(1) != null) m.appendReplacement(result, String.valueOf((char)Integer.parseInt(m.group(1), 16))); else if (m.group(2) != null) m.appendReplacement(result, m.group(2)); } m.appendTail(result); return result.toString(); } private static class HasAncestorFilter implements NodeFilter { private NodeFilter atest; public HasAncestorFilter(NodeFilter n) { atest = n; } public boolean accept(Node n) { while (n != null) { n = n.getParent(); if (atest.accept(n)) return true; } return false; } } private static class AdjacentFilter implements NodeFilter { private NodeFilter sibtest; public AdjacentFilter(NodeFilter n) { sibtest = n; } public boolean accept(Node n) { if (n.getParent() != null) { NodeList l = n.getParent().getChildren(); for (int i = 0; i < l.size(); i++) if (l.elementAt(i) == n && i > 0) return (sibtest.accept(l.elementAt(i - 1))); } return false; } } private static class YesFilter implements NodeFilter { public boolean accept(Node n) {return true;} } private static class AttribMatchFilter implements NodeFilter { private Pattern rel; private String attrib; public AttribMatchFilter(String attrib, String regex) { rel = Pattern.compile(regex); this.attrib = attrib; } public boolean accept(Node node) { if (node instanceof Tag && ((Tag)node).getAttribute(attrib) != null) if (rel != null && !rel.matcher(((Tag)node).getAttribute(attrib)).find()) return false; else return true; else return false; } } } |
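A few details of `unescape` may be worth revisiting:

- The escaped character is handed directly to `Matcher.appendReplacement`, which treats `$` and `\` in replacement text specially. An input such as `price\$` therefore raises an exception instead of yielding `price$`.
- The pattern accepts 2 to 6 hex digits. The CSS2 tokenizer grammar allows 1 to 6, followed by one optional whitespace character.
- The `(char)` cast truncates code points above U+FFFF.

Below is a hedged sketch of a variant that addresses these points. The class `CssUnescape` is hypothetical and is not the committed code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of CssSelectorNodeFilter.unescape, not the committed code:
// accepts 1-6 hex digits, consumes one optional whitespace character after a hex
// escape, handles supplementary code points, and quotes the replacement text.
final class CssUnescape {
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\(?:([0-9a-fA-F]{1,6})[ \\t\\n\\r\\f]?|(.))");

    private CssUnescape() {}

    static String unescape(String escaped) {
        StringBuffer result = new StringBuffer(escaped.length());
        Matcher m = ESCAPE.matcher(escaped);
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                int codePoint = Integer.parseInt(m.group(1), 16);
                // Out-of-range code points become U+FFFD instead of being truncated.
                replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : "\uFFFD";
            } else {
                replacement = m.group(2);
            }
            // quoteReplacement keeps '$' and backslash from being read as group syntax.
            m.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(result);
        return result.toString();
    }
}
```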
From: Derrick O. <der...@us...> - 2004-05-07 23:30:51
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28990 Modified Files: Page.java Log Message: Ignore null contentType to accommodate ServletContext.getResource(...) per suggestion by Rogers George. Index: Page.java =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v retrieving revision 1.34 retrieving revision 1.35 diff -C2 -d -r1.34 -r1.35 *** Page.java 18 Mar 2004 04:04:07 -0000 1.34 --- Page.java 7 May 2004 23:30:37 -0000 1.35 *************** *** 335,339 **** } type = getContentType (); ! if (!type.startsWith ("text")) throw new ParserException ( "URL " --- 335,339 ---- } type = getContentType (); ! if (type != null && !type.startsWith ("text")) throw new ParserException ( "URL " |
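The relaxed check accepts a missing content type while still rejecting non-text resources. MIME type names are case-insensitive, so a variant might also compare the `text` prefix without regard to case. The following sketch, using a hypothetical helper `ContentTypeCheck`, does that. The case-insensitive comparison goes beyond the committed condition.

```java
// Hypothetical helper modelled on the Page.java condition, not HTMLParser code:
// a null content type is accepted, otherwise the type must begin with "text",
// compared case-insensitively (the committed check uses startsWith).
final class ContentTypeCheck {
    private ContentTypeCheck() {}

    static boolean isAcceptable(String contentType) {
        return contentType == null
            || contentType.regionMatches(true, 0, "text", 0, 4);
    }
}
```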
From: Derrick O. <der...@us...> - 2004-04-20 10:54:53
|
Update of /cvsroot/htmlparser/htmlparser/docs/pics In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30175 Added Files: alberto.jpg italy.gif Log Message: Add images. --- NEW FILE: alberto.jpg --- (This appears to be a binary file; contents omitted.) --- NEW FILE: italy.gif --- (This appears to be a binary file; contents omitted.) |
From: Derrick O. <der...@us...> - 2004-04-20 10:49:59
|
Update of /cvsroot/htmlparser/htmlparser/docs In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29326 Modified Files: contributors.html Log Message: Add Alberto Nacher to contributors page. Index: contributors.html =================================================================== RCS file: /cvsroot/htmlparser/htmlparser/docs/contributors.html,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** contributors.html 16 Feb 2004 22:46:08 -0000 1.6 --- contributors.html 20 Apr 2004 10:49:51 -0000 1.7 *************** *** 225,228 **** --- 225,265 ---- </tr> <tr> + <td width="25%" height="270" valign="top"> + <!-- <img src="pics/alberto.jpg" width="181" height="265">--> + <img src="pics/alberto.jpg" width="100"> + <strong><img src="pics/italy.gif" width="53" height="39"></strong><br> + Alberto Nacher<br> + Software Developer - Consultant<br> + Corso Sebastopoli 39,<br> + 10134 Torino, Italy<br> + <a href="http://members.xoom.virgilio.it/nacher/Home.html">Personal Home Page</a><br> + <a href="http://sourceforge.net/sendmessage.php?touser=892989">email</a><br> + </td> + <td width="39%" valign="top"> + <strong>On Alberto Nacher</strong> + <p>I'm 31 years old, I'm a computer engineer and I have been working as a + consultant since 1998.</p> + <p>I've worked with Microsoft VB and VB.NET technologies, with Java + technology and Livelink technology (knowledge management and developer + environment of the OpenText company).</p> + <p>My hobbies: travelling, watching football matches, going out with + friends, gathering mushrooms, reading and, this year, also an English course!</p> + </td> + <td width="36%" valign="top"> + <strong>Alberto on Italy</strong> + <p>Italy is not so important from a high-technology point of view. + The main activities in my country are fashion, car development (FIAT, + Ferrari, Alfa Romeo), pasta and food, wines and, of course, the big + state companies doing telecommunication systems, electrical + distribution, oil distribution.
So... If you want to work as programmer + you have no relevant software houses to join with and it is better + being a technical consultant.</p> + <p>Anyway... If you want to visit Italy, you surely be charmed by the + beauty of my country! Venice, Florence, Rome are some of the best towns + in the world. But you can also visit Torino (my home town) where you can + see the 2nd Egyptian museum in the world.</p> + </td> + </tr> + <tr> <td height="213" valign="top"> <p><img src="pics/uk.gif" width="65" height="35"><br> Dr. Sam Joseph<br> |
From: Derrick O. <der...@us...> - 2004-04-06 11:04:47
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19036

Modified Files:
    TagNode.java
Log Message:
Documentation modifications requested by Leos Literak via htmlparser-user mail list.

Index: TagNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes/TagNode.java,v
retrieving revision 1.33
retrieving revision 1.34
diff -C2 -d -r1.33 -r1.34
*** TagNode.java 20 Mar 2004 17:03:53 -0000 1.33
--- TagNode.java 6 Apr 2004 10:51:57 -0000 1.34
***************
*** 54,57 ****
--- 54,59 ----
       * The tag attributes.
       * Objects of type {@link Attribute}.
+      * The first element is the tag name, subsequent elements being either
+      * whitespace or real attributes.
       */
      protected Vector mAttributes;
***************
*** 280,283 ****
--- 282,287 ----
       * @param attribs The attribute collection to set.
       * Each element is an {@link Attribute Attribute}.
+      * The first attribute in the list must be the tag name (
+      * <code>isStandalone()</code> returns <code>true</code>).
       */
      public void setAttributeEx (Attribute attribute)
***************
*** 341,344 ****
--- 345,350 ----
       * Gets the attributes in the tag.
       * @return Returns the list of {@link Attribute Attributes} in the tag.
+      * The first element is the tag name, subsequent elements being either
+      * whitespace or real attributes.
       */
      public Vector getAttributesEx ()
***************
*** 491,494 ****
--- 497,502 ----
      /**
       * Sets the attributes.
+      * A special entry with a key of SpecialHashtable.TAGNAME ("$<TAGNAME>$")
+      * sets the tag name.
       * @param attributes The attribute collection to set.
       */
***************
*** 583,586 ****
--- 591,598 ----
      }
  
+     /**
+      * Parses the given text to create the tag contents.
+      * @param text A string of the form <TAGNAME xx="yy">.
+      */
      public void setText (String text)
      {
***************
*** 648,652 ****
  
      /**
!      * Print the contents of the tag
       */
      public String toString ()
--- 660,665 ----
  
      /**
!      * Print the contents of the tag.
!      * @return An string describing the tag. For text that looks like HTML use #toHtml().
       */
      public String toString ()
|
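The revised javadoc describes the layout of the attribute vector. Element 0 is the tag name (a stand-alone entry), and the elements after it are either whitespace or real attributes. `setText` takes text of the form `<TAGNAME xx="yy">`. The following is a minimal Java sketch of that layout, not the library's code. `SimpleAttribute` and `TagTextSketch` are hypothetical stand-ins for `Attribute` and the lexer. Storing whitespace runs as entries with a null name and the whitespace as the value is an assumption made here for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.htmlparser.lexer.nodes.Attribute (not the real class):
// name == null marks a whitespace run; value == null marks a stand-alone name.
final class SimpleAttribute {
    final String name;
    final String value;

    SimpleAttribute(String name, String value) {
        this.name = name;
        this.value = value;
    }

    boolean isWhitespace() { return name == null; }

    boolean isStandalone() { return name != null && value == null; }
}

final class TagTextSketch {
    // Splits text of the form <TAGNAME xx="yy"> into a list whose element 0 is the
    // tag name, followed by whitespace runs and real attributes in source order.
    static List<SimpleAttribute> parse(String text) {
        if (text.length() < 2 || text.charAt(0) != '<' || text.charAt(text.length() - 1) != '>')
            throw new IllegalArgumentException("expected <...>: " + text);
        String body = text.substring(1, text.length() - 1);
        List<SimpleAttribute> attributes = new ArrayList<>();
        int i = 0;
        int n = body.length();
        while (i < n) {
            int start = i;
            if (Character.isWhitespace(body.charAt(i))) {
                // keep the whitespace itself, so the original text could be rebuilt
                while (i < n && Character.isWhitespace(body.charAt(i)))
                    i++;
                attributes.add(new SimpleAttribute(null, body.substring(start, i)));
                continue;
            }
            // a name runs up to whitespace or '='
            while (i < n && !Character.isWhitespace(body.charAt(i)) && body.charAt(i) != '=')
                i++;
            String name = body.substring(start, i);
            String value = null; // stays null for stand-alone names such as the tag name
            if (i < n && body.charAt(i) == '=') {
                i++; // skip '='
                if (i < n && (body.charAt(i) == '"' || body.charAt(i) == '\'')) {
                    char quote = body.charAt(i++);
                    int valueStart = i;
                    while (i < n && body.charAt(i) != quote)
                        i++;
                    value = body.substring(valueStart, i);
                    if (i < n)
                        i++; // skip the closing quote
                } else {
                    int valueStart = i; // unquoted value ends at whitespace
                    while (i < n && !Character.isWhitespace(body.charAt(i)))
                        i++;
                    value = body.substring(valueStart, i);
                }
            }
            attributes.add(new SimpleAttribute(name, value));
        }
        return attributes;
    }

    public static void main(String[] args) {
        for (SimpleAttribute a : parse("<A href=\"x.html\" target=_top>"))
            System.out.println(a.isWhitespace() ? "(whitespace)" : a.name + " = " + a.value);
    }
}
```

Keeping the whitespace entries in the list, rather than discarding them, matches the javadoc's statement that whitespace can appear among the elements. The sketch does not handle every edge case the real lexer does, such as quotes inside unquoted values or malformed tags.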
From: Somik R. <so...@us...> - 2004-03-27 18:03:22
|
Update of /cvsroot/htmlparser/CVSROOT
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv8320

Modified Files:
    checkoutlist
Log Message:
updated checkoutlist

Index: checkoutlist
===================================================================
RCS file: /cvsroot/htmlparser/CVSROOT/checkoutlist,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** checkoutlist 3 Apr 2001 16:10:41 -0000 1.1
--- checkoutlist 27 Mar 2004 17:52:13 -0000 1.2
***************
*** 11,13 ****
  # [<whitespace>]<filename><whitespace><error message><end-of-line>
  #
! # comment lines begin with '#'
--- 11,13 ----
  # [<whitespace>]<filename><whitespace><error message><end-of-line>
  #
! users Unable to check out 'users' file in CVSROOT
\ No newline at end of file
|
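The checkoutlist header documents each entry as `[<whitespace>]<filename><whitespace><error message><end-of-line>`, and lines beginning with '#' are comments. Below is a rough Java sketch of reading that format. `CheckoutListSketch` is a hypothetical name, and this is not how CVS itself reads the file. Treating indented '#' lines as comments is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of reading the format documented in the checkoutlist header:
//   [<whitespace>]<filename><whitespace><error message><end-of-line>
// with '#' introducing comment lines. Hypothetical helper, not CVS's own code.
final class CheckoutListSketch {
    static final class Entry {
        final String filename;
        final String errorMessage; // empty when the line carries no message

        Entry(String filename, String errorMessage) {
            this.filename = filename;
            this.errorMessage = errorMessage;
        }
    }

    static List<Entry> parse(String contents) {
        List<Entry> entries = new ArrayList<>();
        for (String line : contents.split("\n")) {
            // leading whitespace is optional in the format; trim() also drops a '\r'
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#"))
                continue; // blank line or comment
            // the filename ends at the first whitespace; everything after it is the message
            String[] parts = trimmed.split("\\s+", 2);
            entries.add(new Entry(parts[0], parts.length > 1 ? parts[1] : ""));
        }
        return entries;
    }

    public static void main(String[] args) {
        String file = "# comment lines begin with '#'\n"
                + "users\tUnable to check out 'users' file in CVSROOT";
        for (Entry e : parse(file))
            System.out.println(e.filename + " -> " + e.errorMessage);
    }
}
```

Applied to the revision 1.2 content above, the only entry is `users`, carrying the message that would be reported if that file could not be checked out.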
From: Derrick O. <der...@us...> - 2004-03-20 20:11:07
|
Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18662

Modified Files:
    build.xml
Log Message:
Add Tag interface to htmllexer.jar.

Index: build.xml
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v
retrieving revision 1.62
retrieving revision 1.63
diff -C2 -d -r1.62 -r1.63
*** build.xml 18 Mar 2004 04:04:07 -0000 1.62
--- build.xml 20 Mar 2004 20:01:02 -0000 1.63
***************
*** 229,232 ****
--- 229,233 ----
  <include name="org/htmlparser/Node.class"/>
  <include name="org/htmlparser/NodeFilter.class"/>
+ <include name="org/htmlparser/Tag.class"/>
  <include name="org/htmlparser/util/ParserException.class"/>
  <include name="org/htmlparser/util/ChainedException.class"/>
|
From: Derrick O. <der...@us...> - 2004-03-20 17:13:52
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15661/lexer/nodes

Modified Files:
    TagNode.java
Log Message:
First pass refactoring. Create Tag interface, which isn't really used yet.

Index: TagNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes/TagNode.java,v
retrieving revision 1.32
retrieving revision 1.33
diff -C2 -d -r1.32 -r1.33
*** TagNode.java 14 Mar 2004 20:31:38 -0000 1.32
--- TagNode.java 20 Mar 2004 17:03:53 -0000 1.33
***************
*** 33,36 ****
--- 33,37 ----
  
  import org.htmlparser.AbstractNode;
+ import org.htmlparser.Tag;
  import org.htmlparser.lexer.Cursor;
  import org.htmlparser.lexer.Lexer;
***************
*** 47,50 ****
--- 48,53 ----
      extends
          AbstractNode
+     implements
+         Tag
  {
      /**
***************
*** 273,276 ****
--- 276,289 ----
      }
  
+     /*
+      * Sets the attributes.
+      * @param attribs The attribute collection to set.
+      * Each element is an {@link Attribute Attribute}.
+      */
+     public void setAttributeEx (Attribute attribute)
+     {
+         setAttribute (attribute);
+     }
+ 
      /**
       * Set an attribute.
|
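This commit extracts a `Tag` interface and makes `TagNode` implement it. The new `setAttributeEx (Attribute)` method is satisfied by delegating to the existing `setAttribute (Attribute)`. Below is a minimal sketch of that extract-interface-and-delegate step, using hypothetical, trimmed-down `MiniTag`, `MiniTagNode` and `MiniAttribute` types. The case-insensitive replace-by-name behaviour follows the `Tag` javadoc ("This replaces an attribute of the same name"; names are case insensitive). The list-based storage is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down types for illustration only; not the htmlparser classes.
final class MiniAttribute {
    final String name;
    final String value;

    MiniAttribute(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// The extracted interface: callers can depend on it instead of the concrete node class.
interface MiniTag {
    String getAttribute(String name);

    void setAttributeEx(MiniAttribute attribute);
}

// The pre-existing concrete class. It already had setAttribute(MiniAttribute), so the
// interface method is satisfied by a one-line delegation and existing callers keep working.
class MiniTagNode implements MiniTag {
    private final List<MiniAttribute> attributes = new ArrayList<>();

    // Existing method: replaces an attribute of the same (case-insensitive) name, else appends.
    public void setAttribute(MiniAttribute attribute) {
        for (int i = 0; i < attributes.size(); i++) {
            if (attributes.get(i).name.equalsIgnoreCase(attribute.name)) {
                attributes.set(i, attribute);
                return;
            }
        }
        attributes.add(attribute);
    }

    @Override
    public void setAttributeEx(MiniAttribute attribute) {
        setAttribute(attribute); // delegate, as TagNode.setAttributeEx does in the diff
    }

    @Override
    public String getAttribute(String name) {
        for (MiniAttribute a : attributes)
            if (a.name.equalsIgnoreCase(name))
                return a.value;
        return null; // no such attribute
    }

    public static void main(String[] args) {
        MiniTag tag = new MiniTagNode(); // code against the interface
        tag.setAttributeEx(new MiniAttribute("HREF", "a.html"));
        tag.setAttributeEx(new MiniAttribute("href", "b.html")); // replaces, despite the case
        System.out.println(tag.getAttribute("Href"));
    }
}
```

Delegating, rather than renaming the existing method, keeps the change purely additive. That suits a first pass in which, as the log message says, the interface "isn't really used yet".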
From: Derrick O. <der...@us...> - 2004-03-20 17:13:52
|
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15661

Modified Files:
    PrototypicalNodeFactory.java
Added Files:
    Tag.java
Log Message:
First pass refactoring. Create Tag interface, which isn't really used yet.

Index: PrototypicalNodeFactory.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/PrototypicalNodeFactory.java,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** PrototypicalNodeFactory.java 25 Jan 2004 21:32:56 -0000 1.5
--- PrototypicalNodeFactory.java 20 Mar 2004 17:03:53 -0000 1.6
***************
*** 36,40 ****
  import org.htmlparser.lexer.nodes.Attribute;
  import org.htmlparser.lexer.nodes.NodeFactory;
! import org.htmlparser.tags.*; // import everything for now
  import org.htmlparser.util.ParserException;
--- 36,69 ----
  import org.htmlparser.lexer.nodes.Attribute;
  import org.htmlparser.lexer.nodes.NodeFactory;
! import org.htmlparser.tags.AppletTag;
! import org.htmlparser.tags.BaseHrefTag;
! import org.htmlparser.tags.BodyTag;
! import org.htmlparser.tags.Bullet;
! import org.htmlparser.tags.BulletList;
! import org.htmlparser.tags.Div;
! import org.htmlparser.tags.DoctypeTag;
! import org.htmlparser.tags.FormTag;
! import org.htmlparser.tags.FrameSetTag;
! import org.htmlparser.tags.FrameTag;
! import org.htmlparser.tags.HeadTag;
! import org.htmlparser.tags.Html;
! import org.htmlparser.tags.ImageTag;
! import org.htmlparser.tags.InputTag;
! import org.htmlparser.tags.JspTag;
! import org.htmlparser.tags.LabelTag;
! import org.htmlparser.tags.LinkTag;
! import org.htmlparser.tags.MetaTag;
! import org.htmlparser.tags.OptionTag;
! import org.htmlparser.tags.ScriptTag;
! import org.htmlparser.tags.SelectTag;
! import org.htmlparser.tags.Span;
! import org.htmlparser.tags.StyleTag;
! import org.htmlparser.tags.TableColumn;
! import org.htmlparser.tags.TableHeader;
! import org.htmlparser.tags.TableRow;
! import org.htmlparser.tags.TableTag;
! import org.htmlparser.tags.Tag;
! import org.htmlparser.tags.TextareaTag;
! import org.htmlparser.tags.TitleTag;
  import org.htmlparser.util.ParserException;

--- NEW FILE: Tag.java ---
// HTMLParser Library $Name: $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2004 Derrick Oswald
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Tag.java,v $
// $Author: derrickoswald $
// $Date: 2004/03/20 17:03:53 $
// $Revision: 1.1 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//

package org.htmlparser;

import java.util.Vector;

import org.htmlparser.lexer.nodes.Attribute;

/**
 * Identifies what a Tag such as <XXX xxx yyy="zzz"> can do.
 * Adds features to a Node that are specific to a tag.
 */
public interface Tag extends Node
{
    /**
     * Returns the value of an attribute.
     * @param name Name of attribute, case insensitive.
     * @return The value associated with the attribute or null if it does
     * not exist, or is a stand-alone or
     */
    public String getAttribute (String name);

    /**
     * Set attribute with given key, value pair.
     * Figures out a quote character to use if necessary.
     * @param key The name of the attribute.
     * @param value The value of the attribute.
     */
    public void setAttribute (String key, String value);

    /**
     * Set attribute with given key, value pair where the value is quoted by quote.
     * @param key The name of the attribute.
     * @param value The value of the attribute.
     * @param quote The quote character to be used around value.
     * If zero, it is an unquoted value.
     */
    public void setAttribute (String key, String value, char quote);

    /**
     * Remove the attribute with the given key, if it exists.
     * @param key The name of the attribute.
     */
    public void removeAttribute (String key);

    /**
     * Returns the attribute with the given name.
     * @param name Name of attribute, case insensitive.
     * @return The attribute or null if it does
     * not exist.
     */
    public Attribute getAttributeEx (String name);

    /**
     * Set an attribute.
     * This replaces an attribute of the same name.
     * To set the zeroth attribute (the tag name), use setTagName().
     * @param attribute The attribute to set.
     */
    public void setAttributeEx (Attribute attribute);

    /**
     * Gets the attributes in the tag.
     * @return Returns the list of {@link Attribute Attributes} in the tag.
     */
    public Vector getAttributesEx ();

    /**
     * Sets the attributes.
     * NOTE: Values of the extended hashtable are two element arrays of String,
     * with the first element being the original name (not uppercased),
     * and the second element being the value.
     * @param attribs The attribute collection to set.
     */
    public void setAttributesEx (Vector attribs);

    /**
     * Return the name of this tag.
     * <p>
     * <em>
     * Note: This value is converted to uppercase and does not
     * begin with "/" if it is an end tag. Nor does it end with
     * a slash in the case of an XML type tag.
     * To get at the original text of the tag name use
     * {@link #getRawTagName getRawTagName()}.
     * The conversion to uppercase is performed with an ENGLISH locale.
     * </em>
     * @return The tag name.
     */
    public String getTagName ();

    /**
     * Set the name of this tag.
     * This creates or replaces the first attribute of the tag (the
     * zeroth element of the attribute vector).
     * @param name The tag name.
     */
    public void setTagName (String name);

    /**
     * Determines if the given tag breaks the flow of text.
     * @return <code>true</code> if following text would start on a new line,
     * <code>false</code> otherwise.
     */
    public boolean breaksFlow ();

    /**
     * Predicate to determine if this tag is an end tag (i.e. </HTML>).
     * @return <code>true</code> if this tag is an end tag.
     */
    public boolean isEndTag ();

    /**
     * Set this tag to be an end tag, or not.
     * Adds or removes the leading slash on the tag name.
     * @param endTag If true, this tag is made into an end tag.
     * Any attributes it may have had are dropped.
     */
//    public void setEndTag (boolean endTag);

    /**
     * Is this an empty xml tag of the form <tag/>.
     * @return true if the last character of the last attribute is a '/'.
     */
    public boolean isEmptyXmlTag ();

    /**
     * Set this tag to be an empty xml node, or not.
     * Adds or removes an ending slash on the tag.
     * @param emptyXmlTag If true, ensures there is an ending slash in the node,
     * i.e. <tag/>, otherwise removes it.
     */
    public void setEmptyXmlTag (boolean emptyXmlTag);
}
|
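The `Tag` javadoc states several name rules:

- `getTagName()` returns the name uppercased with an ENGLISH locale, without a leading "/" for end tags and without a trailing slash for XML-style tags.
- `isEndTag()` recognises tags such as `</HTML>`.
- `isEmptyXmlTag()` is true when the last character of the last attribute is '/'.

The sketch below applies those rules to raw strings. `TagNameSketch` and its static helpers are hypothetical; in the real interface these are instance methods that read the tag's attribute vector. A plausible reason for the fixed locale is that `toUpperCase()` with a Turkish default locale maps 'i' to a dotted capital I (U+0130), which would turn "title" into something other than "TITLE".

```java
import java.util.Locale;

// Hypothetical helpers illustrating the name rules documented on Tag.getTagName(),
// Tag.isEndTag() and Tag.isEmptyXmlTag(); not the library's implementation.
final class TagNameSketch {
    // An end tag such as </HTML> has a raw name starting with '/'.
    static boolean isEndTag(String rawName) {
        return rawName.startsWith("/");
    }

    // Documented as: true if the last character of the last attribute is a '/'.
    static boolean isEmptyXmlTag(String lastAttributeText) {
        return !lastAttributeText.isEmpty()
                && lastAttributeText.charAt(lastAttributeText.length() - 1) == '/';
    }

    // Uppercase name with no leading "/" (end tag) and no trailing "/" (empty XML tag).
    static String getTagName(String rawName) {
        String name = rawName;
        if (name.startsWith("/"))
            name = name.substring(1);
        if (name.endsWith("/"))
            name = name.substring(0, name.length() - 1);
        // A fixed locale makes the result independent of the JVM's default locale.
        return name.toUpperCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        System.out.println(getTagName("/title"));
        System.out.println(getTagName("br/"));
        // Locale-sensitive case mapping: Turkish uppercases 'i' to U+0130.
        System.out.println("title".toUpperCase(Locale.forLanguageTag("tr")).equals("TITLE"));
    }
}
```

The last line of `main` shows why relying on the default locale would be fragile: with Turkish rules the comparison is false.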