htmlparser-cvs Mailing List for HTML Parser (Page 19)

Brought to you by: derrickoswald

htmlparser-cvs — syncmail email notification of CVS commits


[Htmlparser-cvs] htmlparser/src/org/htmlparser/filters CssSelectorNodeFilter.java,1.1,1.2

From: Derrick O. <der...@us...> - 2004-05-22 12:28:25

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10696

Modified Files:
	CssSelectorNodeFilter.java 
Log Message:
Remove junit import.



Index: CssSelectorNodeFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/CssSelectorNodeFilter.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** CssSelectorNodeFilter.java	10 May 2004 22:31:57 -0000	1.1
--- CssSelectorNodeFilter.java	22 May 2004 12:28:15 -0000	1.2
***************
*** 27,31 ****
  package org.htmlparser.filters;
  
- import junit.framework.TestCase;
  import org.htmlparser.*;
  import org.htmlparser.lexer.Lexer;
--- 27,30 ----

[Htmlparser-cvs] htmlparser/docs changes.txt,1.199,1.200 release.txt,1.58,1.59

From: Derrick O. <der...@us...> - 2004-05-22 12:09:10

Update of /cvsroot/htmlparser/htmlparser/docs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7046/docs

Modified Files:
	changes.txt release.txt 
Log Message:
Update version to 1.5-20040522



Index: release.txt
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/docs/release.txt,v
retrieving revision 1.58
retrieving revision 1.59
diff -C2 -d -r1.58 -r1.59
*** release.txt	14 Mar 2004 16:31:40 -0000	1.58
--- release.txt	22 May 2004 12:08:59 -0000	1.59
***************
*** 1,3 ****
! HTMLParser Version 1.4 (Release Build Mar 14, 2004)
  *********************************************
  
--- 1,3 ----
! HTMLParser Version 1.5 (Integration Build May 22, 2004)
  *********************************************
  
***************
*** 19,108 ****
    (v) this file
  
! Changes since Version 1.3
  -------------------------
! Translation
!     Character entity encoding and decoding has been revamped, leading to
!     higher throughput and less memory churn.
! Beans
!     The StringBean can now be used as a visitor for parsers external to the bean.
! Decorators
!     The node decorator package has been added to provide support for the
!     delegate model.
! Lexer
!     A new lexer i/o subsystem has been added. This provides accurate line number
!     and character position data, tag and attribute names maintain their original
!     case, and attributes maintain their original order. Line numbers reported by
!     tags are now zero based, not one based. The node count for parsing goes up
!     in most cases because whitespace is strictly maintained, i.e. every
!     whitespace (i.e. newline) now counts as a StringNode too. Storage of
!     attributes is now in a Vector which means the element 0 Attribute is
!     actually the name of the tag, rather than having the $TAGNAME entry in a
!     HashTable. The htmllexer.jar is this new i/o subsystem broken out and made
!     JDK 1.1 compliant, the htmlparser.jar, which includes everything in
!     htmllexer.jar, is not necessarily intended to be used in JDK 1.1
!     environments. Some support for JIS escape sequences has been added.
! Tags
!     Zero arg tag constructors have been added. Attribute maintenance
!     (add/remove/edit) improved. There is no EndTag class any more. Just a
!     generic tag that responds true to isEndTag(). Improvements to form tag
!     handling, getting <input> and <textarea> tags nested within other tags.
!     Improvements to applet tag handling regarding parameters and codebases.
! Scanners
!     The concept of scanners has been completely reworked. Applications register
!     tags not scanners to express interest in parsing only some tags. The default
!     is now to parse all tags, which is equivalent to the old registerDOMTags(),
!     so some extra nesting of tags will need to be handled. CompositeTagScanner
!     logic has been improved to try and match unclosed open tags when an
!     unexpected end tag is encountered. This change also moved recursion off the
!     JDK stack, eliminating most StackOverflow exceptions. Also, a CompositeTag's
!     "startTag()" is "this", and the CompositeTagScanner just adds children.
!     The ScriptScanner will now decrypt Microsoft Script Encoder encrypted script
!     tags. The plaintext is available via ScriptTag.getScriptCode().
  Filters
!     A new powerful filtering capability has been added, which makes extracting
!     specific tags very easy.
! Applications
!     New example applications Thumbelina and SiteCapturer.
!     A mainline has been added to the Translate class to encode/decode stdin to
!     stdout.
  
  Bug Fixes
  ---------
! 911565 isValued() and isEmpty() don't work
! 902121 StringBean throws NullPointerException.
! 900128 RemarkNode.setText() does not set Text
! 900125 Style Tag Children not grouped
! 899413 bug in javascript end detection.
! 891058 Bug in lexer
! 865279 Documentation
! 851882 zero length alt tag causes bug in ImageScanner
! 839264 toHtml() parse error in Javascripts with "form" keyword
! 833592 DOCTYPE element is not parsed correctly
! 832530 empty attribute causes parser to fail
! 826764 ParserException occurs only when using setInputHTML() instea
! 825820 Words conjoined
! 825645 <input> not getting parsed inside table
! 813838 links not parsed correctly
! 805598 attribute src in tag img sometimes not correctly parsed
! 801118 two " characters at the end of an attribute value problem
! 798554 Applet Tag does not update codebase data
! 798553 setInputHtml does not set text
! 798552 Sample for node iterator incorrect
! 789439 Japanese page causes OutOfMemory Exception
! 788746 parser crashes on comments like <!-- foobar --!>
! 786869 LinkExtractor Sample not working
! 784767 irc://server/channel urls are HTTPLike?
! 778781 SRC-attribute suppression in IMG-tags
! 772700 Jsp Tags are not parsed correctly when in quoted attributes
! 765413 typo
! 761798 Error reading next element.
! 757337 Standalone attributes should remain standalone
! 755929 Empty string attr. value causes attr parsing to be stopped
! 753012 IMG SRC not parsed v1.3 & v1.4
! 753003 <IMG> within <A> missed when followed by <MAP>
! 750117 StackOverFlow while Node-Iteration
! 749295 Problem Parsing Table
! 745566 StackOverflowError on select with too many unclosed options
! 744610 getLink() Erroneous for Relative Links from Files on Windows
  
  Acknowledgements
--- 19,41 ----
    (v) this file
  
! Changes since Version 1.4
  -------------------------
! Configuration Management
!     Removed the need for the Translate class to be packaged with htmllexer.jar.
!     This results in a lighter weight component.
! Refactoring
!     Added Tag interface. Obviated LinkProcessor and moved its functionality to
!     the Page class.
  Filters
!     Added CssSelectorNodeFilter.
!     
! Enhancement Requests
! --------------------
! 943593 LinkProcessor.extract(link,base) weird behaviour?
  
  Bug Fixes
  ---------
! 919738 Text has not been extracted correctly using StringBean
! 936392 ScriptTag visitor fails for comments with '
  
  Acknowledgements
***************
*** 140,143 ****
--- 73,78 ----
  [30] Gernot Fricke
  [31] Anthony Labarre
+ [32] Alberto Nacher
+ [33] Rogers George
  
  If you find any bugs, please go to 
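The release notes list CssSelectorNodeFilter as the new filter in 1.5. To illustrate roughly what matching a CSS selector against a tag involves, here is a minimal sketch that handles one compound selector made of type, `.class` and `#id` parts. `SimpleSelector` and `matches` are hypothetical names. This is not the CssSelectorNodeFilter implementation, which does not appear in these diffs, and it ignores combinators, attribute selectors and pseudo-classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: matches a single compound CSS selector such as
 * "td.price#total" against a tag name and its attributes.
 * Not the CssSelectorNodeFilter implementation.
 */
public class SimpleSelector {

    public static boolean matches(String selector, String tagName, Map<String, String> attributes) {
        // Split before every '.' or '#': "td.price#total" -> ["td", ".price", "#total"].
        for (String part : selector.split("(?=[.#])")) {
            if (part.isEmpty() || part.equals("*")) {
                continue; // universal selector, or an empty leading piece
            }
            char kind = part.charAt(0);
            String value = part.substring(1);
            if (kind == '#') {
                if (!value.equals(attributes.get("id"))) {
                    return false;
                }
            } else if (kind == '.') {
                // The class attribute is a whitespace-separated list of names.
                String classes = attributes.get("class");
                if (classes == null || !Arrays.asList(classes.trim().split("\\s+")).contains(value)) {
                    return false;
                }
            } else if (!part.equalsIgnoreCase(tagName)) {
                return false; // HTML element names compare case-insensitively
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("class", "price sale");
        attributes.put("id", "total");
        System.out.println(matches("td.price#total", "TD", attributes)); // true
        System.out.println(matches("td.discount", "TD", attributes));    // false
    }
}
```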

Index: changes.txt
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/docs/changes.txt,v
retrieving revision 1.199
retrieving revision 1.200
diff -C2 -d -r1.199 -r1.200
*** changes.txt	14 Mar 2004 16:31:39 -0000	1.199
--- changes.txt	22 May 2004 12:08:57 -0000	1.200
***************
*** 11,3582 ****
  *    http://www.red-bean.com/cvs2cl/changelogs.html                           *
  *                                                                             *
  *******************************************************************************
  
! Release Build 1.4 - 20040314
! --------------------------------
! 
! 2004-03-14 10:53  derrickoswald
! 
! 	* src/org/htmlparser/beans/LinkBean.java:
[...3685 lines suppressed...]
  	src/org/htmlparser/tests/tagTests/BaseHrefTagTest.java,
  	src/org/htmlparser/tests/utilTests/AllTests.java,
  	src/org/htmlparser/tests/utilTests/HTMLLinkProcessorTest.java,
! 	src/org/htmlparser/util/LinkProcessor.java:
  
! 	Deprecate LinkProcessor.
! 	Functionality moved to Page.
  	
! 2004-03-15 17:50  derrickoswald
  
! 	* src/doc-files/building.html:
  
! 	Update build instruction problem identified by sarsie.
  	
! 2004-03-14 15:31  derrickoswald
  
! 	* build.xml, src/org/htmlparser/lexer/nodes/Attribute.java,
! 	src/org/htmlparser/lexer/nodes/TagNode.java:
  
! 	Remove requirement for Translate.class to be in htmllexer.jar.

[Htmlparser-cvs] htmlparser/src/org/htmlparser Parser.java,1.90,1.91

From: Derrick O. <der...@us...> - 2004-05-22 12:09:10

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7046/src/org/htmlparser

Modified Files:
	Parser.java 
Log Message:
Update version to 1.5-20040522



Index: Parser.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Parser.java,v
retrieving revision 1.90
retrieving revision 1.91
diff -C2 -d -r1.90 -r1.91
*** Parser.java	18 Mar 2004 04:04:07 -0000	1.90
--- Parser.java	22 May 2004 12:09:00 -0000	1.91
***************
*** 73,77 ****
       */
      public final static double
!     VERSION_NUMBER = 1.4
      ;
  
--- 73,77 ----
       */
      public final static double
!     VERSION_NUMBER = 1.5
      ;
  
***************
*** 80,84 ****
       */
      public final static String
!     VERSION_TYPE = "Release Build"
      ;
  
--- 80,84 ----
       */
      public final static String
!     VERSION_TYPE = "Integration Build"
      ;
  
***************
*** 87,91 ****
       */
      public final static String
!     VERSION_DATE = "Mar 14, 2004"
      ;
  
--- 87,91 ----
       */
      public final static String
!     VERSION_DATE = "May 22, 2004"
      ;
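Parser.java keeps the version in three separate constants, and the first line of docs/release.txt in the same commit reads "HTMLParser Version 1.5 (Integration Build May 22, 2004)". One plausible way to assemble that string from the constants is sketched below. `VersionString` and `describe()` are hypothetical; the constant values are copied from the diff, but the concatenation format is an assumption and is not taken from Parser.java.

```java
/**
 * Hypothetical sketch: builds a display string from constants with the same
 * names and values as those changed in Parser.java revision 1.91.
 */
public class VersionString {
    public static final double VERSION_NUMBER = 1.5;
    public static final String VERSION_TYPE = "Integration Build";
    public static final String VERSION_DATE = "May 22, 2004";

    public static String describe() {
        // Concatenating a double uses Double.toString, so 1.5 prints as "1.5".
        return "HTMLParser Version " + VERSION_NUMBER
            + " (" + VERSION_TYPE + " " + VERSION_DATE + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
        // HTMLParser Version 1.5 (Integration Build May 22, 2004)
    }
}
```

One consequence of storing the version as a double: a minor version of 10 would be the same value as 1.1 and would print as "1.1", so separate integer fields would be safer if minor numbers ever reach two digits.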

[Htmlparser-cvs] htmlparser build.xml,1.63,1.64

From: Derrick O. <der...@us...> - 2004-05-22 11:36:01

Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv895

Modified Files:
	build.xml 
Log Message:
Change minor version to 5.



Index: build.xml
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v
retrieving revision 1.63
retrieving revision 1.64
diff -C2 -d -r1.63 -r1.64
*** build.xml	20 Mar 2004 20:01:02 -0000	1.63
--- build.xml	22 May 2004 11:35:50 -0000	1.64
***************
*** 20,34 ****
    that's why this step can't be automated
  - incorporate changes from ChangeLog into htmlparser/docs/changes under
!   a heading like "Integration Build 1.4 - 20040104"
  - 'ant versionSource' updates the version in Parser.java and release.txt
  - perform a CVS update on htmlparser to identify new and changed files
  - commit changed files (i.e. Parser.java, release.txt, docs/changes, docs/wiki
    and docs/wiki/images) to the head revision using a reason of the form:
! Update version to 1.4-20040104.
! - use CVS to tag the current head revisions with a name like v1_4_20040104
  - use CVS to checkout everything with the tag used above
  - 'ant test' compiles and runs the unit tests
  - 'ant clean htmlparser' updates the version headers, creates the jar file and doc files and zips
!   everything into a file htmlparser/distribution/htmlparser1_4_20040104.zip
  - use CVS to checkout everything against the head revision to reset your workspace
  
--- 20,34 ----
    that's why this step can't be automated
  - incorporate changes from ChangeLog into htmlparser/docs/changes under
!   a heading like "Integration Build 1.5 - 20040522"
  - 'ant versionSource' updates the version in Parser.java and release.txt
  - perform a CVS update on htmlparser to identify new and changed files
  - commit changed files (i.e. Parser.java, release.txt, docs/changes, docs/wiki
    and docs/wiki/images) to the head revision using a reason of the form:
! Update version to 1.5-20040522.
! - use CVS to tag the current head revisions with a name like v1_5_20040522.
  - use CVS to checkout everything with the tag used above
  - 'ant test' compiles and runs the unit tests
  - 'ant clean htmlparser' updates the version headers, creates the jar file and doc files and zips
!   everything into a file htmlparser/distribution/htmlparser1_5_20040522.zip
  - use CVS to checkout everything against the head revision to reset your workspace
  
***************
*** 40,47 ****
  ftp> cd incoming
  ftp> bin
! ftp> put htmlparser1_4_20040104.zip
  ftp> bye
  - add a release to the 'Integration Builds' package
! Admin-File Releases-Add Release, use a name of the form '1_4_20040104'
  - Step 1, 'Paste The Notes' (using numeric character references and
    character entity references because this is displayed as HTML) with a
--- 40,47 ----
  ftp> cd incoming
  ftp> bin
! ftp> put htmlparser1_5_20040522.zip
  ftp> bye
  - add a release to the 'Integration Builds' package
! Admin-File Releases-Add Release, use a name of the form '1_5_20040522'
  - Step 1, 'Paste The Notes' (using numeric character references and
    character entity references because this is displayed as HTML) with a
***************
*** 52,56 ****
  Pending Bugs:
  - use the 'Upload Change Log:' field to specify the ChangeLog file you edited
! - Step 2, check the checkbox of the htmlparser1_4_20040104.zip file from the
    list of files in the uploads section
  - Submit/Refresh
--- 52,56 ----
  Pending Bugs:
  - use the 'Upload Change Log:' field to specify the ChangeLog file you edited
! - Step 2, check the checkbox of the htmlparser1_5_20040522.zip file from the
    list of files in the uploads section
  - Submit/Refresh
***************
*** 65,69 ****
  Submit News
  - from the project summary screen, select 'Submit News' and title it like:
! HTML Parser Integration Release 1.4-20040104 
  - type in a summary of the changes made
  - SUBMIT
--- 65,69 ----
  Submit News
  - from the project summary screen, select 'Submit News' and title it like:
! HTML Parser Integration Release 1.5-20040522 
  - type in a summary of the changes made
  - SUBMIT
***************
*** 77,84 ****
    <!--
         Note: These can be overridden on the command line, as in:
!        ant -DversionMinor=4 -DversionType=Release\ Build versionSource
    -->
    <property name="versionMajor" value="1"/>
!   <property name="versionMinor" value="4"/>
    <property name="versionType" value="Integration Build"/>
    <property name="versionNumber" value="${versionMajor}.${versionMinor}"/>
--- 77,84 ----
    <!--
         Note: These can be overridden on the command line, as in:
!        ant -DversionMinor=5 -DversionType=Release\ Build versionSource
    -->
    <property name="versionMajor" value="1"/>
!   <property name="versionMinor" value="5"/>
    <property name="versionType" value="Integration Build"/>
    <property name="versionNumber" value="${versionMajor}.${versionMinor}"/>
***************
*** 104,108 ****
    <target name="JDK1.4">
        <condition property="JDK1.4">
!         <equals arg1="1.4" arg2="${ant.java.version}"/>
        </condition>
    </target>
--- 104,111 ----
    <target name="JDK1.4">
        <condition property="JDK1.4">
!         <or>
!           <equals arg1="1.4" arg2="${ant.java.version}"/>
!           <equals arg1="1.4" arg2="${ant.java.version}"/>
!         </or>
        </condition>
    </target>
***************
*** 321,325 ****
      <property name="javadoc.doctitle" value="HTML Parser ${versionNumber}"/>
      <property name="javadoc.header" value="&lt;A HREF=&quot;http://htmlparser.sourceforge.net&quot; target=&quot;_top&quot;>HTML Parser Home Page&lt;/A>"/>
!     <property name="javadoc.footer" value="&amp;copy; 2004 Somik Raha&lt;div align=&quot;right&quot;&gt;${TODAY_STRING}&lt;/div&gt;"/>
      <property name="javadoc.bottom" value="HTML Parser is an open source library released under
      &lt;A HREF=&quot;http://www.opensource.org/licenses/lgpl-license.html&quot; target=&quot;_top&quot;&gt;LGPL&lt;/A&gt;.&lt;BR&gt;
--- 324,328 ----
      <property name="javadoc.doctitle" value="HTML Parser ${versionNumber}"/>
      <property name="javadoc.header" value="&lt;A HREF=&quot;http://htmlparser.sourceforge.net&quot; target=&quot;_top&quot;>HTML Parser Home Page&lt;/A>"/>
!     <property name="javadoc.footer" value="&amp;copy; 2004 Derrick Oswald&lt;div align=&quot;right&quot;&gt;${TODAY_STRING}&lt;/div&gt;"/>
      <property name="javadoc.bottom" value="HTML Parser is an open source library released under
      &lt;A HREF=&quot;http://www.opensource.org/licenses/lgpl-license.html&quot; target=&quot;_top&quot;&gt;LGPL&lt;/A&gt;.&lt;BR&gt;
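The checklist at the top of build.xml spells the same version and date several ways: 1.5-20040522 in the commit reason and the news title, and 1_5_20040522 in the CVS tag, the distribution zip and the file release name. A small helper such as the hypothetical `ReleaseNames` class below could keep these consistent. It is a sketch and not part of the build; the yyyyMMdd date stamp is inferred from the examples in the checklist.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical helper (not part of build.xml): derives the names used in the
 * release checklist from the version numbers and the build date.
 */
public class ReleaseNames {

    private final String dotted;      // e.g. "1.5-20040522"
    private final String underscored; // e.g. "1_5_20040522"

    public ReleaseNames(int major, int minor, LocalDate buildDate) {
        // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd, e.g. 20040522.
        String stamp = buildDate.format(DateTimeFormatter.BASIC_ISO_DATE);
        this.dotted = major + "." + minor + "-" + stamp;
        this.underscored = major + "_" + minor + "_" + stamp;
    }

    /** Commit reason, e.g. "Update version to 1.5-20040522." */
    public String commitReason() {
        return "Update version to " + dotted + ".";
    }

    /** CVS tag, e.g. "v1_5_20040522". */
    public String cvsTag() {
        return "v" + underscored;
    }

    /** Distribution file, e.g. "htmlparser1_5_20040522.zip". */
    public String zipFile() {
        return "htmlparser" + underscored + ".zip";
    }

    /** File release name, e.g. "1_5_20040522". */
    public String releaseName() {
        return underscored;
    }

    /** News title, e.g. "HTML Parser Integration Release 1.5-20040522". */
    public String newsTitle() {
        return "HTML Parser Integration Release " + dotted;
    }

    public static void main(String[] args) {
        ReleaseNames names = new ReleaseNames(1, 5, LocalDate.of(2004, 5, 22));
        System.out.println(names.commitReason()); // Update version to 1.5-20040522.
        System.out.println(names.cvsTag());       // v1_5_20040522
        System.out.println(names.zipFile());      // htmlparser1_5_20040522.zip
    }
}
```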

[Htmlparser-cvs] htmlparser/src/org/htmlparser Tag.java,1.1,1.2

From: Derrick O. <der...@us...> - 2004-05-22 11:33:30

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv435/src/org/htmlparser

Modified Files:
	Tag.java 
Log Message:
Change minor version to 5. Fix doc comment warning.



Index: Tag.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Tag.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** Tag.java	20 Mar 2004 17:03:53 -0000	1.1
--- Tag.java	22 May 2004 11:33:20 -0000	1.2
***************
*** 105,110 ****
       * begin with "/" if it is an end tag. Nor does it end with
       * a slash in the case of an XML type tag.
-      * To get at the original text of the tag name use
-      * {@link #getRawTagName getRawTagName()}.
       * The conversion to uppercase is performed with an ENGLISH locale.
       * </em>
--- 105,108 ----
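The remaining Javadoc says tag names are upper-cased with an ENGLISH locale. The sketch below shows why the locale matters. `String.toUpperCase()` with no argument uses the JVM's default locale, and under a Turkish locale the letter `i` becomes the dotted capital `İ` (U+0130), so `title` would no longer compare equal to `TITLE`. `TagNameCase.normalize` is a hypothetical helper, not part of the Tag interface.

```java
import java.util.Locale;

/**
 * Hypothetical helper: locale-independent upper-casing of tag names.
 */
public class TagNameCase {

    /** Upper-cases with a fixed locale so the result does not depend on the JVM default. */
    public static String normalize(String rawName) {
        return rawName.toUpperCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println(normalize("title"));                          // TITLE
        System.out.println("title".toUpperCase(turkish).equals("TITLE")); // false: 'i' maps to U+0130
    }
}
```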

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/scannersTests ScriptScannerTest.java,1.53,1.54

From: Derrick O. <der...@us...> - 2004-05-22 03:57:40

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25265/tests/scannersTests

Modified Files:
	ScriptScannerTest.java 
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.
Also added test cases for, but didn't fix bug #923146 tag nesting rule
too strict for forms (org.htmlparser.tests.tagTests.InputTagTest.testTable)
and bug #922439 OutOfMemory on huge HTML files (4,7MB)
(org.htmlparser.tests.MemoryTest) which are thus currently failing.
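The log message describes the fix only briefly. The following sketch illustrates the idea under stated assumptions: a "quote smart" scan of a script body that ignores a `</script>` inside a quoted string, and that skips `//` and `/* */` comments so an apostrophe in a comment (as in "can't") cannot open a bogus string that swallows the real end tag. `ScriptEndFinder` is hypothetical and is not the Lexer code; it also does not recognise regular-expression literals.

```java
/**
 * Hypothetical sketch of a quote-smart search for the end of a script body.
 * Strings hide "</script>"; comments are skipped so that quote characters
 * inside them are never treated as string delimiters. Not the Lexer code.
 */
public class ScriptEndFinder {

    private static final String END_TAG = "</script>";

    /** Returns the index of the closing end tag at or after start, or -1 if none. */
    public static int findScriptEnd(String html, int start) {
        char quote = 0; // active string delimiter; 0 means "not inside a string"
        int i = start;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i += 2;          // skip an escaped character such as \" or \'
                    continue;
                }
                if (c == quote) {
                    quote = 0;       // the string is closed
                }
                i++;
            } else if (html.startsWith("//", i)) {
                int eol = html.indexOf('\n', i);
                if (eol < 0) {
                    return -1;       // the comment runs to the end of the input
                }
                i = eol + 1;         // a line comment ends at the newline
            } else if (html.startsWith("/*", i)) {
                int close = html.indexOf("*/", i + 2);
                if (close < 0) {
                    return -1;       // unterminated block comment
                }
                i = close + 2;
            } else if (c == '\'' || c == '"') {
                quote = c;           // apostrophes inside comments never reach here
                i++;
            } else if (html.regionMatches(true, i, END_TAG, 0, END_TAG.length())) {
                return i;            // end tag outside any string or comment
            } else {
                i++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String body = "// I can't handle single quotations.\n</script><HTML></HTML>";
        System.out.println(findScriptEnd(body, 0) == body.indexOf("</script>")); // true
    }
}
```

In this sketch a `//` comment runs to the next newline, so a `</script>` later on the same line counts as comment text. The ScriptScannerTest change in the same commit, which adds `\n` after a `//` comment in its input, appears consistent with such a rule. Browsers, by contrast, do not look at script comments and end the element at the first `</script>`, so this is a trade-off rather than standard behaviour.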



Index: ScriptScannerTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/scannersTests/ScriptScannerTest.java,v
retrieving revision 1.53
retrieving revision 1.54
diff -C2 -d -r1.53 -r1.54
*** ScriptScannerTest.java	28 Feb 2004 15:52:44 -0000	1.53
--- ScriptScannerTest.java	22 May 2004 03:57:30 -0000	1.54
***************
*** 203,207 ****
              "document.write(\"{ // do something\"); " +
              "document.write(\"}\"); " +
!             "// parser thinks this is the end tag. " +
              "document.write(\"</script>\");" +
              "</script>" +
--- 203,207 ----
              "document.write(\"{ // do something\"); " +
              "document.write(\"}\"); " +
!             "// parser thinks this is the end tag.\n" +
              "document.write(\"</script>\");" +
              "</script>" +
***************
*** 226,230 ****
              "document.write(\"{ // do something\"); " +
              "document.write(\"}\"); " +
!             "// parser thinks this is the end tag. " +
              "document.write(\"</script>\");",
              scriptTag.getScriptCode()
--- 226,230 ----
              "document.write(\"{ // do something\"); " +
              "document.write(\"}\"); " +
!             "// parser thinks this is the end tag.\n" +
              "document.write(\"</script>\");",
              scriptTag.getScriptCode()

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/visitorsTests ScriptCommentTest.java,NONE,1.1 AllTests.java,1.41,1.42

From: Derrick O. <der...@us...> - 2004-05-22 03:57:40

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25265/tests/visitorsTests

Modified Files:
	AllTests.java 
Added Files:
	ScriptCommentTest.java 
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.
Also added test cases for, but didn't fix bug #923146 tag nesting rule
too strict for forms (org.htmlparser.tests.tagTests.InputTagTest.testTable)
and bug #922439 OutOfMemory on huge HTML files (4,7MB)
(org.htmlparser.tests.MemoryTest) which are thus currently failing.



--- NEW FILE: ScriptCommentTest.java ---
// HTMLParser Library $Name:  $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2004 Jim Arnell
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/ScriptCommentTest.java,v $
// $Author: derrickoswald $
// $Date: 2004/05/22 03:57:31 $
// $Revision: 1.1 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//

package org.htmlparser.tests.visitorsTests;

import org.htmlparser.tags.CompositeTag;
import org.htmlparser.tags.ScriptTag;
import org.htmlparser.tags.Tag;
import org.htmlparser.tests.ParserTestCase;
import org.htmlparser.visitors.NodeVisitor;

public class ScriptCommentTest extends ParserTestCase {

    static
    {
        System.setProperty ("org.htmlparser.tests.visitorsTests.ScriptCommentTest", "ScriptCommentTest");
    }
    
    private String workingScriptTag =
        "<script language='javascript'>"
        + "// I cant handle single quotations\n"
        + "</script>";

    private String workingHtml =
        this.workingScriptTag
        + "<HTML>"
        + "</HTML>";

    private String failingScriptTag =
        "<script language='javascript'>"
        + "// I can't handle single quotations.\n"
        + "</script>";

    private String failingHtml =
        this.failingScriptTag
        + "<HTML>"
        + "</HTML>";

    private String failingHtml2 =
        "<HTML>"
        + this.failingScriptTag
        + "</HTML>";

    private String anotherFailingScriptTag =
        "<script language='javascript'>"
        + "/* I can't handle single quotations. */"
        + "</script>";

    private String failingHtml3 =
        this.anotherFailingScriptTag
        + "<HTML>"
        + "</HTML>";

    public ScriptCommentTest(String name) {
        super(name);
    }

    public void testTagWorking() throws Exception {
        createParser(this.workingHtml);
        ScriptVisitor visitor = new ScriptVisitor();
        this.parser.visitAllNodesWith(visitor);
        String scriptNodeHtml = visitor.scriptTag.toHtml();
        assertEquals("Script parsing worked", this.workingScriptTag, scriptNodeHtml);
    }

    public void testScriptTagNotWorkingOuter() throws Exception {
        createParser(this.failingHtml);
        ScriptVisitor visitor = new ScriptVisitor();
        this.parser.visitAllNodesWith(visitor);
        String scriptNodeHtml = visitor.scriptTag.toHtml();
        assertEquals("Script parsing not working", this.failingScriptTag, scriptNodeHtml);
    }

    public void testScriptTagNotWorkingInner() throws Exception {
        createParser(this.failingHtml2);
        ScriptVisitor visitor = new ScriptVisitor();
        this.parser.visitAllNodesWith(visitor);
        String scriptNodeHtml = visitor.scriptTag.toHtml();
        assertEquals("Script parsing not working", this.failingScriptTag, scriptNodeHtml);
    }

    public void testScriptTagNotWorkingMultiLine() throws Exception {
        createParser(this.anotherFailingScriptTag);
        ScriptVisitor visitor = new ScriptVisitor();
        this.parser.visitAllNodesWith(visitor);
        String scriptNodeHtml = visitor.scriptTag.toHtml();
        assertEquals("Script parsing not working", this.anotherFailingScriptTag, scriptNodeHtml);
    }

    /**
     * Implement test case NodeVisitor.
     */
    public final class ScriptVisitor extends NodeVisitor {

        /** Keeps the only script tag. */
        public ScriptTag scriptTag;

        /**
         * Creates a new ScriptVisitor object.
         */
        public ScriptVisitor() {
            super(true, true);
        }

        /**
         * @see org.htmlparser.visitors.NodeVisitor
         */
        public void visitTag(final Tag n) {
            if ((null != n.getParent())
                || ((n instanceof CompositeTag)
                    && (null == ((CompositeTag) n).getEndTag()))) {

                if (n instanceof ScriptTag) {
                    this.scriptTag = (ScriptTag) n;
                }
            } else {
                if (n instanceof ScriptTag) {
                    this.scriptTag = (ScriptTag) n;
                }
            }
        }
    }
}


Index: AllTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/visitorsTests/AllTests.java,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** AllTests.java	2 Jan 2004 16:24:57 -0000	1.41
--- AllTests.java	22 May 2004 03:57:31 -0000	1.42
***************
*** 52,55 ****
--- 52,56 ----
          suite.addTestSuite(TextExtractingVisitorTest.class);
          suite.addTestSuite(UrlModifyingVisitorTest.class);
+         suite.addTestSuite(ScriptCommentTest.class);
  
          return suite;

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/tagTests InputTagTest.java,1.39,1.40

From: Derrick O. <der...@us...> - 2004-05-22 03:57:40

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25265/tests/tagTests

Modified Files:
	InputTagTest.java 
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.
Also added test cases for, but didn't fix bug #923146 tag nesting rule
too strict for forms (org.htmlparser.tests.tagTests.InputTagTest.testTable)
and bug #922439 OutOfMemory on huge HTML files (4,7MB)
(org.htmlparser.tests.MemoryTest) which are thus currently failing.



Index: InputTagTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/InputTagTest.java,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** InputTagTest.java	2 Jan 2004 16:24:57 -0000	1.39
--- InputTagTest.java	22 May 2004 03:57:31 -0000	1.40
***************
*** 27,31 ****
--- 27,35 ----
  package org.htmlparser.tests.tagTests;
  
+ import org.htmlparser.tags.FormTag;
  import org.htmlparser.tags.InputTag;
+ import org.htmlparser.tags.TableColumn;
+ import org.htmlparser.tags.TableRow;
+ import org.htmlparser.tags.TableTag;
  import org.htmlparser.tests.ParserTestCase;
  import org.htmlparser.util.ParserException;
***************
*** 82,84 ****
--- 86,124 ----
          assertEquals("Name","Google",inputTag.getAttribute("NAME"));
      }
+ 
+     /**
+      * Bug #923146 tag nesting rule too strict for forms
+      */
+     public void testTable () throws ParserException
+     {
+         String html =
+             "<table>" +
+             "<tr>" +
+             "<td>" +
+             "<form>" +
+             "<input name=input1>" +
+             "</td>" +
+             // </tr> missing
+             "<tr>" +
+             "<td>" +
+             "<input name=input2>" +
+             "</td>" +
+             "</tr>" +
+             "</form>" +
+             "</table>";
+         createParser (html);
+         parseAndAssertNodeCount (1);
+         assertTrue ("not a table", node[0] instanceof TableTag);
+         TableTag table = (TableTag)node[0];
+         assertTrue ("not two rows", 2 == table.getRowCount ());
+ //        assertTrue ("not one row", 1 == table.getRowCount ());
+         TableRow row = table.getRow (0);
+         assertTrue ("not one column", 1 == row.getColumnCount ());
+         TableColumn column = row.getColumns ()[0];
+         assertTrue ("not one child", 1 == column.getChildCount ());
+         assertTrue ("column doesn't have a form", column.getChild (0) instanceof FormTag);
+         FormTag form = (FormTag)column.getChild (0);
+         assertTrue ("form only has one input field", 2 == form.getFormInputs ().size ());
+     }
+ 
  }
\ No newline at end of file
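The new testTable case documents bug #923146: a form opened inside a cell should still own the input in the following row. One way to see how a strict nesting rule loses that input is the minimal stack-based tree builder below. Like the CompositeTagScanner behaviour described in the 1.4 release notes, it closes open tags back to the matching start tag when an end tag arrives and ignores end tags that match nothing. It is a hypothetical sketch, not htmlparser code, and the rule that a new `<tr>` ends an unclosed row is an assumption added so the second row is not nested inside the first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of strict "close back to the matching start tag" nesting. */
public class NestingSketch {

    static final class Element {
        final String name;
        final List<Element> children = new ArrayList<>();
        Element(String name) { this.name = name; }
    }

    private static final List<String> VOID_TAGS = Arrays.asList("input");

    /** Tokens are tag names; a leading '/' marks an end tag. */
    static Element build(List<String> tokens) {
        Element root = new Element("#root");
        Deque<Element> open = new ArrayDeque<>();
        open.push(root);
        for (String token : tokens) {
            if (token.startsWith("/")) {
                String name = token.substring(1);
                boolean matched = false;
                for (Element e : open) {          // iterates from the innermost open element
                    if (e.name.equals(name)) {
                        matched = true;
                        break;
                    }
                }
                if (matched) {
                    // Implicitly close everything opened after the matching start tag.
                    while (!open.peek().name.equals(name)) {
                        open.pop();
                    }
                    open.pop();
                }                                  // unmatched end tags are ignored
            } else {
                if (token.equals("tr") && open.peek().name.equals("tr")) {
                    open.pop();                    // assumed rule: a new row ends an unclosed row
                }
                Element element = new Element(token);
                open.peek().children.add(element);
                if (!VOID_TAGS.contains(token)) {
                    open.push(element);
                }
            }
        }
        return root;
    }

    /** Counts descendants of e with the given name. */
    static int count(Element e, String name) {
        int n = 0;
        for (Element child : e.children) {
            n += (child.name.equals(name) ? 1 : 0) + count(child, name);
        }
        return n;
    }

    /** Returns the first descendant (depth first) with the given name, or null. */
    static Element find(Element e, String name) {
        for (Element child : e.children) {
            if (child.name.equals(name)) {
                return child;
            }
            Element hit = find(child, name);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Same tag sequence as testTable: the first </tr> is missing.
        List<String> tokens = Arrays.asList("table", "tr", "td", "form", "input", "/td",
            "tr", "td", "input", "/td", "/tr", "/form", "/table");
        Element root = build(tokens);
        System.out.println(count(find(root, "table"), "tr"));   // 2
        System.out.println(count(find(root, "form"), "input")); // 1: </td> closed the form
    }
}
```

The first `</td>` implicitly closes the `<form>`, the later `</form>` matches nothing and is dropped, and the second input lands outside the form, which is the "too strict" outcome the test asserts against.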

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests MemoryTest.java,NONE,1.1 AllTests.java,1.59,1.60

From: Derrick O. <der...@us...> - 2004-05-22 03:57:39

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25265/tests

Modified Files:
	AllTests.java 
Added Files:
	MemoryTest.java 
Log Message:
Fix bug# 919738 Text has not been extracted correctly using StringBean
and (duplicate) bug #936392 ScriptTag visitor fails for comments with '
by handling single and multiline ecmascript comments in the Lexer class
when called with quotesmart true.
Also added test cases for, but didn't fix bug #923146 tag nesting rule
too strict for forms (org.htmlparser.tests.tagTests.InputTagTest.testTable)
and bug #922439 OutOfMemory on huge HTML files (4,7MB)
(org.htmlparser.tests.MemoryTest) which are thus currently failing.



--- NEW FILE: MemoryTest.java ---
// HTMLParser Library $Name:  $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2004 Derrick Oswald
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/MemoryTest.java,v $
// $Author: derrickoswald $
// $Date: 2004/05/22 03:57:30 $
// $Revision: 1.1 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//

package org.htmlparser.tests;

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.util.NodeIterator;

/**
 * Test big memory requirements.
 */
public class MemoryTest extends ParserTestCase
{
    
    static
    {
        System.setProperty ("org.htmlparser.tests.MemoryTest", "MemoryTest");
    }

    public MemoryTest (String name)
    {
        super (name);
    }

    /**
     * Test for bug #922439 OutOfMemory on huge HTML files (4,7MB)
     */
    public void testBigFile () throws Exception
    {
        Parser parser;
        NodeIterator iterator;
        Node node;
        int size;
        
        parser = new Parser ("http://htmlparser.sourceforge.net/test/A002.html");
        size = 0;
        try
        {
            iterator = parser.elements ();
            while (iterator.hasMoreNodes ())
            {
                node = iterator.nextNode ();
                size += node.toHtml ().length ();
            }
        }
        catch (OutOfMemoryError ome)
        {
            fail ("out of memory");
        }
        assertEquals ("wrong size fetched", size, 4697411);
    }
    
}

Index: AllTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/AllTests.java,v
retrieving revision 1.59
retrieving revision 1.60
diff -C2 -d -r1.59 -r1.60
*** AllTests.java	2 Jan 2004 16:24:55 -0000	1.59
--- AllTests.java	22 May 2004 03:57:30 -0000	1.60
***************
*** 53,56 ****
--- 53,57 ----
          sub.addTestSuite (FunctionalTests.class);
          sub.addTestSuite (LineNumberAssignedByNodeReaderTest.class);
+         sub.addTestSuite (MemoryTest.class);
          suite.addTest (sub);
          suite.addTest (org.htmlparser.tests.lexerTests.AllTests.suite ());

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/parserHelperTests StringParserTest.java,1.46,1.47

From: Derrick O. <der...@us...> - 2004-05-22 03:57:39

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/parserHelperTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25265/tests/parserHelperTests

Modified Files:
	StringParserTest.java 
Log Message:
Fix bug #919738 "Text has not been extracted correctly using StringBean"
and (duplicate) bug #936392 "ScriptTag visitor fails for comments with '"
by handling single-line and multiline ECMAScript comments in the Lexer class
when it is called with quotesmart set to true.
Also added test cases for, but did not fix, bug #923146 "tag nesting rule
too strict for forms" (org.htmlparser.tests.tagTests.InputTagTest.testTable)
and bug #922439 "OutOfMemory on huge HTML files (4,7MB)"
(org.htmlparser.tests.MemoryTest); both tests are therefore currently failing.



Index: StringParserTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/parserHelperTests/StringParserTest.java,v
retrieving revision 1.46
retrieving revision 1.47
diff -C2 -d -r1.46 -r1.47
*** StringParserTest.java	2 Jan 2004 16:24:56 -0000	1.46
--- StringParserTest.java	22 May 2004 03:57:30 -0000	1.47
***************
*** 206,213 ****
              "</head>" +
              "<script language=\"JavaScript\" type=\"text/JavaScript\">" +
!             "// if this fails, output a 'hello' " +
              "if (true) " +
              "{ " +
!             "//something good... " +
              "} " +
              "</script>" +
--- 206,213 ----
              "</head>" +
              "<script language=\"JavaScript\" type=\"text/JavaScript\">" +
!             "// if this fails, output a 'hello' \n" +
              "if (true) " +
              "{ " +
!             "//something good...\n" +
              "} " +
              "</script>" +

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer Lexer.java,1.27,1.28

From: Derrick O. <der...@us...> - 2004-05-22 03:57:38

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25265/lexer

Modified Files:
	Lexer.java 
Log Message:
Fix bug #919738 "Text has not been extracted correctly using StringBean"
and (duplicate) bug #936392 "ScriptTag visitor fails for comments with '"
by handling single-line and multiline ECMAScript comments in the Lexer class
when it is called with quotesmart set to true.
Also added test cases for, but did not fix, bug #923146 "tag nesting rule
too strict for forms" (org.htmlparser.tests.tagTests.InputTagTest.testTable)
and bug #922439 "OutOfMemory on huge HTML files (4,7MB)"
(org.htmlparser.tests.MemoryTest); both tests are therefore currently failing.



Index: Lexer.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Lexer.java,v
retrieving revision 1.27
retrieving revision 1.28
diff -C2 -d -r1.27 -r1.28
*** Lexer.java	18 Feb 2004 12:34:04 -0000	1.27
--- Lexer.java	22 May 2004 03:57:29 -0000	1.28
***************
*** 303,306 ****
--- 303,307 ----
                  break;
              default:
+                 probe.retreat (); // string needs to see leading foreslash
                  ret = parseString (probe, quotesmart);
                  break;
***************
*** 412,415 ****
--- 413,445 ----
              else if (quotesmart && (ch == quote))
                  quote = 0; // exit quoted state
+             else if (quotesmart && (0 == quote) && (ch == '/'))
+             {
+                 // handle multiline and double slash comments (with a quote) in script like:
+                 // I can't handle single quotations.
+                 ch = mPage.getCharacter (cursor);
+                 if (0 == ch)
+                     done = true;
+                 else if ('/' == ch)
+                 {
+                     do
+                         ch = mPage.getCharacter (cursor);
+                     while ((ch != 0) && (ch != '\n'));
+                 }
+                 else if ('*' == ch)
+                 {
+                     do
+                     {
+                         do
+                             ch = mPage.getCharacter (cursor);
+                         while ((ch != 0) && (ch != '*'));
+                         ch = mPage.getCharacter (cursor);
+                         if (ch == '*')
+                             cursor.retreat ();
+                     }
+                     while ((ch != 0) && (ch != '/'));
+                 }
+                 else
+                     cursor.retreat ();
+             }
              else if ((0 == quote) && ('<' == ch))
              {

[Htmlparser-cvs] htmlparser/src/org/htmlparser/beans StringBean.java,1.38,1.39 LinkBean.java,1.28,1.29

From: Derrick O. <der...@us...> - 2004-05-16 18:00:05

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20559/beans

Modified Files:
	StringBean.java LinkBean.java 
Log Message:
Alter bound property name constants to agree with section
8.8, "Capitalization of inferred names", of the JavaBeans API specification.
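Section 8.8 describes how a property name is inferred from an accessor name: the first character is lower-cased, except when the first two characters are both upper case, in which case the name is left alone. The JDK exposes this rule as `java.beans.Introspector.decapitalize`; the snippet applies it to some of the old constant values from these diffs (the class name `InferredNames` is only for illustration). It shows the capitalization rule alone; renames such as `URLQueue` to `queueSize` in Thumbelina are not produced by it.

```java
import java.beans.Introspector;

// Illustration only: applies the JavaBeans 8.8 rule via the JDK helper.
final class InferredNames {

    static String inferredName(String capitalized) {
        // Lower-cases the first character unless the first two characters
        // are both upper case (so "URLQueue" is returned unchanged).
        return Introspector.decapitalize(capitalized);
    }

    public static void main(String[] args) {
        String[] oldNames = {"Links", "Strings", "Collapse", "CurrentURL", "URLQueue"};
        for (String name : oldNames)
            System.out.println(name + " -> " + inferredName(name));
    }
}
```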



Index: LinkBean.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/LinkBean.java,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** LinkBean.java	14 Mar 2004 15:53:06 -0000	1.28
--- LinkBean.java	16 May 2004 17:59:57 -0000	1.29
***************
*** 50,54 ****
       * Property name in event where the URL contents changes.
       */
!     public static final String PROP_LINKS_PROPERTY = "Links";
  
      /**
--- 50,54 ----
       * Property name in event where the URL contents changes.
       */
!     public static final String PROP_LINKS_PROPERTY = "links";
  
      /**

Index: StringBean.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/beans/StringBean.java,v
retrieving revision 1.38
retrieving revision 1.39
diff -C2 -d -r1.38 -r1.39
*** StringBean.java	28 Feb 2004 15:52:42 -0000	1.38
--- StringBean.java	16 May 2004 17:59:57 -0000	1.39
***************
*** 77,86 ****
       * Property name in event where the URL contents changes.
       */
!     public static final String PROP_STRINGS_PROPERTY = "Strings";
  
      /**
       * Property name in event where the 'embed links' state changes.
       */
!     public static final String PROP_LINKS_PROPERTY = "Links";
  
      /**
--- 77,86 ----
       * Property name in event where the URL contents changes.
       */
!     public static final String PROP_STRINGS_PROPERTY = "strings";
  
      /**
       * Property name in event where the 'embed links' state changes.
       */
!     public static final String PROP_LINKS_PROPERTY = "links";
  
      /**
***************
*** 92,106 ****
       * Property name in event where the 'replace non-breaking spaces' state changes.
       */
!     public static final String PROP_REPLACE_SPACE_PROPERTY = "ReplaceSpace";
  
      /**
       * Property name in event where the 'collapse whitespace' state changes.
       */
!     public static final String PROP_COLLAPSE_PROPERTY = "Collapse";
  
      /**
       * Property name in event where the connection changes.
       */
!     public static final String PROP_CONNECTION_PROPERTY = "Connection";
  
      /**
--- 92,106 ----
       * Property name in event where the 'replace non-breaking spaces' state changes.
       */
!     public static final String PROP_REPLACE_SPACE_PROPERTY = "replaceNonBreakingSpaces";
  
      /**
       * Property name in event where the 'collapse whitespace' state changes.
       */
!     public static final String PROP_COLLAPSE_PROPERTY = "collapse";
  
      /**
       * Property name in event where the connection changes.
       */
!     public static final String PROP_CONNECTION_PROPERTY = "connection";
  
      /**

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexerapplications/thumbelina Thumbelina.java,1.3,1.4

From: Derrick O. <der...@us...> - 2004-05-16 18:00:05

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexerapplications/thumbelina
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20559/lexerapplications/thumbelina

Modified Files:
	Thumbelina.java 
Log Message:
Alter bound property name constants to agree with section
8.8, "Capitalization of inferred names", of the JavaBeans API specification.
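Because bound-property events are delivered by name, the constant values matter to listeners that register for one specific property. With `java.beans.PropertyChangeSupport`, a listener added for `"links"` does not receive an event fired as `"Links"`, because the name lookup is case-sensitive. `BoundPropertyDemo` below is a hypothetical bean used only to show this; it is not one of the HTML Parser beans.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

// Hypothetical bean, for illustration only.
final class BoundPropertyDemo {

    // Mirrors the new lower-case value of PROP_LINKS_PROPERTY.
    static final String PROP_LINKS_PROPERTY = "links";

    private final PropertyChangeSupport support = new PropertyChangeSupport(this);

    void addListener(String propertyName, PropertyChangeListener listener) {
        support.addPropertyChangeListener(propertyName, listener);
    }

    void fire(String propertyName, Object oldValue, Object newValue) {
        support.firePropertyChange(propertyName, oldValue, newValue);
    }

    /** Returns the property names that reached a listener registered for "links". */
    static List<String> demo() {
        BoundPropertyDemo bean = new BoundPropertyDemo();
        List<String> received = new ArrayList<>();
        bean.addListener(PROP_LINKS_PROPERTY, evt -> received.add(evt.getPropertyName()));
        bean.fire("Links", Boolean.FALSE, Boolean.TRUE); // old capitalized name: not delivered
        bean.fire("links", Boolean.FALSE, Boolean.TRUE); // matches the registered name
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```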



Index: Thumbelina.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexerapplications/thumbelina/Thumbelina.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** Thumbelina.java	4 Nov 2003 01:25:02 -0000	1.3
--- Thumbelina.java	16 May 2004 17:59:56 -0000	1.4
***************
*** 83,95 ****
       * Property name for current URL binding.
       */
!     public static final String PROP_CURRENT_URL_PROPERTY = "CurrentURL";
      /**
       * Property name for queue size binding.
       */
!     public static final String PROP_URL_QUEUE_PROPERTY = "URLQueue";
      /**
       * Property name for visited URL size binding.
       */
!     public static final String PROP_URL_VISITED_PROPERTY = "URLVisited";
  
      /**
--- 83,95 ----
       * Property name for current URL binding.
       */
!     public static final String PROP_CURRENT_URL_PROPERTY = "currentURL";
      /**
       * Property name for queue size binding.
       */
!     public static final String PROP_URL_QUEUE_PROPERTY = "queueSize";
      /**
       * Property name for visited URL size binding.
       */
!     public static final String PROP_URL_VISITED_PROPERTY = "visitedSize";
  
      /**
***************
*** 1454,1457 ****
--- 1454,1462 ----
   *
   * $Log$
+  * Revision 1.4  2004/05/16 17:59:56  derrickoswald
+  * Alter bound property name constants to agree with section
+  * 8.8 Capitalization of inferred names.
+  * in the JavaBeans API specification.
+  *
   * Revision 1.3  2003/11/04 01:25:02  derrickoswald
   * Made visiting order the same order as on the page.

[Htmlparser-cvs] htmlparser/src/org/htmlparser/util ParserUtils.java,1.39,1.40

From: Alberto N. <an...@us...> - 2004-05-12 14:16:20

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10385

Modified Files:
	ParserUtils.java 
Log Message:

Added many trim and split methods.

Index: ParserUtils.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/ParserUtils.java,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** ParserUtils.java	14 Jan 2004 02:53:47 -0000	1.39
--- ParserUtils.java	12 May 2004 14:16:08 -0000	1.40
***************
*** 27,33 ****
--- 27,47 ----
  package org.htmlparser.util;
  
+ import java.io.ByteArrayInputStream;
+ import java.io.UnsupportedEncodingException;
+ import java.util.ArrayList;
+ 
  import org.htmlparser.Node;
  import org.htmlparser.NodeFilter;
+ import org.htmlparser.Parser;
[...1107 lines suppressed...]
+         return links;
+         
+     }
+     
+     private static String createDummyString (char fillingChar, int length)
+     {
+         StringBuffer dummyStringBuffer = new StringBuffer();
+         for (int j=0; j<length; j++)
+             dummyStringBuffer = dummyStringBuffer.append(fillingChar);
+         return new String(dummyStringBuffer);
+     }
+     
+     private static String modifyDummyString (String dummyString, int beginTag, int endTag)
+     {
+         String dummyStringInterval = createDummyString ('*', endTag-beginTag);
+         return new String(dummyString.substring(0, beginTag) + dummyStringInterval + dummyString.substring(endTag, dummyString.length()));
+     }
+     
  }
\ No newline at end of file

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/utilTests HTMLParserUtilsTest.java,1.15,1.16

From: Alberto N. <an...@us...> - 2004-05-12 14:14:41

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9985

Modified Files:
	HTMLParserUtilsTest.java 
Log Message:

Added many trim and split functions; here are the tests.
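The expected values in these tests pin down what the `Chars` variants do. As a hypothetical re-implementation inferred only from those assertions (the actual ParserUtils code is mostly suppressed in this archive), `trimChars` deletes every character in the set, `trimCharsBeginEnd` strips them only from both ends, and `splitChars` splits on them while dropping empty pieces. The `But` variants in the tests appear to invert the membership test, keeping only the listed characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch inferred from HTMLParserUtilsTest; not the ParserUtils source.
final class TrimSplitSketch {

    private TrimSplitSketch() {}

    /** Removes every character of input that appears in chars. */
    static String trimChars(String input, String chars) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray())
            if (chars.indexOf(c) < 0)
                out.append(c);
        return out.toString();
    }

    /** Removes characters found in chars from the beginning and end only. */
    static String trimCharsBeginEnd(String input, String chars) {
        int begin = 0;
        int end = input.length();
        while (begin < end && chars.indexOf(input.charAt(begin)) >= 0)
            begin++;
        while (end > begin && chars.indexOf(input.charAt(end - 1)) >= 0)
            end--;
        return input.substring(begin, end);
    }

    /** Splits on any character found in chars, dropping empty pieces. */
    static String[] splitChars(String input, String chars) {
        List<String> pieces = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (chars.indexOf(c) >= 0) {
                if (current.length() > 0) {
                    pieces.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0)
            pieces.add(current.toString());
        return pieces.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(trimChars("<DIV>  +12.5 </DIV>", "<>DIV/ "));
    }
}
```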

Index: HTMLParserUtilsTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests/HTMLParserUtilsTest.java,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** HTMLParserUtilsTest.java	2 Jan 2004 16:24:57 -0000	1.15
--- HTMLParserUtilsTest.java	12 May 2004 14:14:32 -0000	1.16
***************
*** 27,30 ****
--- 27,34 ----
  package org.htmlparser.tests.utilTests;
  
+ import org.htmlparser.NodeFilter;
+ import org.htmlparser.Parser;
+ import org.htmlparser.filters.*;
+ import org.htmlparser.tags.*;
  import org.htmlparser.tests.ParserTestCase;
  import org.htmlparser.util.ParserUtils;
***************
*** 49,51 ****
--- 53,401 ----
          );
      }
+     
+     public void testButCharsMethods() {
+         String[] tmpSplitButChars = ParserUtils.splitButChars("<DIV>  +12.5, +3.4 </DIV>", "+.1234567890");
+         assertStringEquals(
+             "modified text",
+             "+12.5*+3.4",
+             new String(tmpSplitButChars[0] + '*' + tmpSplitButChars[1])
+         );
+         assertStringEquals(
+             "modified text",
+             "+12.5",
+             ParserUtils.trimButChars("<DIV>  +12.5 </DIV>", "+.1234567890")
+         );
+         assertStringEquals(
+             "modified text",
+             "+12.5",
+             ParserUtils.trimButChars("<DIV>  +1 2 . 5 </DIV>", "+.1234567890")
+         );
+         assertStringEquals(
+             "modified text",
+             "+12.5",
+             ParserUtils.trimButCharsBeginEnd("<DIV>  +12.5 </DIV>", "+.1234567890")
+         );
+         assertStringEquals(
+             "modified text",
+             "+1 2 . 5",
+             ParserUtils.trimButCharsBeginEnd("<DIV>  +1 2 . 5 </DIV>", "+.1234567890")
+         );
+     }
+     
+     public void testButDigitsMethods() {
+         String[] tmpSplitButDigits = ParserUtils.splitButDigits("<DIV>  +12.5, +3.4 </DIV>", "+.");
+         assertStringEquals(
+             "modified text",
+             "+12.5*+3.4",
+             new String(tmpSplitButDigits[0] + '*' + tmpSplitButDigits[1])
+         );
+         assertStringEquals(
+             "modified text",
+             "+12.5",
+             ParserUtils.trimButDigits("<DIV>  +12.5 </DIV>", "+.")
+         );
+         assertStringEquals(
+             "modified text",
+             "+12.5",
+             ParserUtils.trimButDigits("<DIV>  +1 2 . 5 </DIV>", "+.")
+         );
+         assertStringEquals(
+             "modified text",
+             "+12.5",
+             ParserUtils.trimButDigitsBeginEnd("<DIV>  +12.5 </DIV>", "+.")
+         );
+         assertStringEquals(
+             "modified text",
+             "+1 2 . 5",
+             ParserUtils.trimButDigitsBeginEnd("<DIV>  +1 2 . 5 </DIV>", "+.")
+         );
+     }
+     
+     public void testCharsMethods() {
+         String[] tmpSplitChars = ParserUtils.splitChars("<DIV>  +12.5, +3.4 </DIV>", " <>DIV/,");
+         assertStringEquals(
+             "modified text",
+             "+12.5*+3.4",
+             new String(tmpSplitChars[0] + '*' + tmpSplitChars[1])
+         );
+         assertStringEquals(
+             "modified text",
+             "+12.5",
+             ParserUtils.trimChars("<DIV>  +12.5 </DIV>", "<>DIV/ ")
+         );
+         assertStringEquals(
+             "modified text",
+             "Trimallchars",
+             ParserUtils.trimChars("<DIV>  Trim all chars   </DIV>", "<>DIV/ ")
+         );
+         assertStringEquals(
+             "modified text",
+             "+12.5",
+             ParserUtils.trimCharsBeginEnd("<DIV>  +12.5 </DIV>", "<>DIV/ ")
+         );
+         assertStringEquals(
+             "modified text",
+             "Trim all spaces but not the ones inside the string",
+             ParserUtils.trimCharsBeginEnd("<DIV>  Trim all spaces but not the ones inside the string </DIV>", "<>DIV/ ")
+         );
+     }
+     
+     public void testSpacesMethods() {
+         String[] tmpSplitSpaces = ParserUtils.splitSpaces("<DIV>  +12.5, +3.4 </DIV>", "<>DIV/,");
+         assertStringEquals(
+             "modified text",
+             "+12.5*+3.4",
+             new String(tmpSplitSpaces[0] + '*' + tmpSplitSpaces[1])
+         );
+         assertStringEquals(
+             "modified text",
+             "+12.5",
+             ParserUtils.trimSpaces("<DIV>  +12.5 </DIV>", "<>DIV/")
+         );
+         assertStringEquals(
+             "modified text",
+             "Trimallspaces",
+             ParserUtils.trimSpaces("<DIV>  Trim all spaces  </DIV>", "<>DIV/")
+         );
+         assertStringEquals(
+             "modified text",
+             "+12.5",
+             ParserUtils.trimSpacesBeginEnd("<DIV>  +12.5 </DIV>", "<>DIV/")
+         );
+         assertStringEquals(
+             "modified text",
+             "Trim all spaces but not the ones inside the string",
+             ParserUtils.trimSpacesBeginEnd("<DIV>  Trim all spaces but not the ones inside the string </DIV>", "<>DIV/")
+         );
+     }
+     
+     public void testTagsMethods() {
+         try
+         {
+             String[] tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"});
+             assertStringEquals(
+                 "modified text",
+                 "Begin * ALL OK",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1])
+             );
+             tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, false);
+             assertStringEquals(
+                 "modified text",
+                 "Begin *<DIV>  +12.5 </DIV>* ALL OK",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2])
+             );
+             tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, true, false);
+             assertStringEquals(
+                 "modified text",
+                 "Begin *  +12.5 * ALL OK",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2])
+             );
+             tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, true);
+             assertStringEquals(
+                 "modified text",
+                 "Begin * ALL OK",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1])
+             );
+             assertStringEquals(
+                 "modified text",
+                 " ALL OK",
+                 ParserUtils.trimTags("<DIV><DIV>  +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"})
+             );
+             assertStringEquals(
+                 "modified text",
+                 "<DIV>  +12.5 </DIV> ALL OK",
+                 ParserUtils.trimTags("<DIV><DIV>  +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, false)
+             );
+             assertStringEquals(
+                 "modified text",
+                 "  +12.5  ALL OK",
+                 ParserUtils.trimTags("<DIV><DIV>  +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, true, false)
+             );
+             assertStringEquals(
+                 "modified text",
+                 " ALL OK",
+                 ParserUtils.trimTags("<DIV><DIV>  +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, true)
+             );
+         }
+         catch (Exception e)
+         {
+             assertStringEquals(
+                 "modified text",
+                 "error msg",
+                 e.getMessage()
+             );
+         }
+     }
+     
+     public void testTagsFilterMethods() {
+         try
+         {
+             NodeFilter filter = new TagNameFilter ("DIV");
+             String[] tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter);
+             assertStringEquals(
+                 "modified text",
+                 "Begin * ALL OK",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1])
+             );
+             tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter, false, false);
+             assertStringEquals(
+                 "modified text",
+                 "Begin *<DIV>  +12.5 </DIV>* ALL OK",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2])
+             );
+             tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter, true, false);
+             assertStringEquals(
+                 "modified text",
+                 "Begin *  +12.5 * ALL OK",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2])
+             );
+             tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter, false, true);
+             assertStringEquals(
+                 "modified text",
+                 "Begin * ALL OK",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1])
+             );
+             assertStringEquals(
+                 "modified text",
+                 " ALL OK",
+                 ParserUtils.trimTags("<DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter)
+             );
+             assertStringEquals(
+                 "modified text",
+                 "<DIV>  +12.5 </DIV> ALL OK",
+                 ParserUtils.trimTags("<DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter, false, false)
+             );
+             assertStringEquals(
+                 "modified text",
+                 "  +12.5  ALL OK",
+                 ParserUtils.trimTags("<DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter, true, false)
+             );
+             assertStringEquals(
+                 "modified text",
+                 " ALL OK",
+                 ParserUtils.trimTags("<DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter, false, true)
+             );
+         }
+         catch (Exception e)
+         {
+             assertStringEquals(
+                 "modified text",
+                 "error msg",
+                 e.getMessage()
+             );
+         }
+     }
+     
+     public void testTagsClassMethods() {
+         try
+         {
+             NodeFilter filter = new NodeClassFilter (Div.class);
+             String[] tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter);
+             assertStringEquals(
+                 "modified text",
+                 "Begin * ALL OK",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1])
+             );
+             tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter, false, false);
+             assertStringEquals(
+                 "modified text",
+                 "Begin *<DIV>  +12.5 </DIV>* ALL OK",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2])
+             );
+             tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter, true, false);
+             assertStringEquals(
+                 "modified text",
+                 "Begin *  +12.5 * ALL OK",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2])
+             );
+             tmpSplitTags = ParserUtils.splitTags("Begin <DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter, false, true);
+             assertStringEquals(
+                 "modified text",
+                 "Begin * ALL OK",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1])
+             );
+             assertStringEquals(
+                 "modified text",
+                 " ALL OK",
+                 ParserUtils.trimTags("<DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter)
+             );
+             assertStringEquals(
+                 "modified text",
+                 "<DIV>  +12.5 </DIV> ALL OK",
+                 ParserUtils.trimTags("<DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter, false, false)
+             );
+             assertStringEquals(
+                 "modified text",
+                 "  +12.5  ALL OK",
+                 ParserUtils.trimTags("<DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter, true, false)
+             );
+             assertStringEquals(
+                 "modified text",
+                 " ALL OK",
+                 ParserUtils.trimTags("<DIV><DIV>  +12.5 </DIV></DIV> ALL OK", filter, false, true)
+             );
+         }
+         catch (Exception e)
+         {
+             assertStringEquals(
+                 "modified text",
+                 "error msg",
+                 e.getMessage()
+             );
+         }
+     }
+     
+     public void testTagsComplexMethods() {
+         try
+         {
+             NodeFilter filterLink = new NodeClassFilter (LinkTag.class);
+             NodeFilter filterDiv = new NodeClassFilter (Div.class);
+             OrFilter filterLinkDiv = new OrFilter (filterLink, filterDiv);
+             NodeFilter filterTable = new NodeClassFilter (TableColumn.class);
+             OrFilter filter = new OrFilter (filterLinkDiv, filterTable);
+             String[] tmpSplitTags = ParserUtils.splitTags("OutsideLeft<A>AInside</A><DIV><DIV>DivInside</DIV></DIV><TD>TableColoumnInside</TD>OutsideRight", filter);
+             assertStringEquals(
+                 "modified text",
+                 "OutsideLeft*OutsideRight",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1])
+             );
+             tmpSplitTags = ParserUtils.splitTags("OutsideLeft<A>AInside</A><DIV><DIV>DivInside</DIV></DIV><TD>TableColoumnInside</TD>OutsideRight", filter, false, false);
+             assertStringEquals(
+                 "modified text",
+                 "OutsideLeft*AInside*<DIV>DivInside</DIV>*TableColoumnInside*OutsideRight",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2] + '*' + tmpSplitTags[3] + '*' + tmpSplitTags[4])
+             );
+             tmpSplitTags = ParserUtils.splitTags("OutsideLeft<A>AInside</A><DIV><DIV>DivInside</DIV></DIV><TD>TableColoumnInside</TD>OutsideRight", filter, true, false);
+             assertStringEquals(
+                 "modified text",
+                 "OutsideLeft*AInside*DivInside*TableColoumnInside*OutsideRight",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1] + '*' + tmpSplitTags[2] + '*' + tmpSplitTags[3] + '*' + tmpSplitTags[4])
+             );
+             tmpSplitTags = ParserUtils.splitTags("OutsideLeft<A>AInside</A><DIV><DIV>DivInside</DIV></DIV><TD>TableColoumnInside</TD>OutsideRight", filter, false, true);
+             assertStringEquals(
+                 "modified text",
+                 "OutsideLeft*OutsideRight",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1])
+             );
+             tmpSplitTags = ParserUtils.splitTags("OutsideLeft<A>AInside<DIV><DIV>DivInside</DIV></DIV></A><TD>TableColoumnInside</TD>OutsideRight", new String[] {"DIV", "TD", "A"});
+             assertStringEquals(
+                 "modified text",
+                 "OutsideLeft*OutsideRight",
+                 new String(tmpSplitTags[0] + '*' + tmpSplitTags[1])
+             );
+             assertStringEquals(
+                 "modified text",
+                 "OutsideLeftOutsideRight",
+                 ParserUtils.trimTags("OutsideLeft<A>AInside<DIV><DIV>DivInside</DIV></DIV></A><TD>TableColoumnInside</TD>OutsideRight", new String[] {"DIV", "TD", "A"})
+             );
+         }
+         catch (Exception e)
+         {
+             assertStringEquals(
+                 "modified text",
+                 "error msg",
+                 e.getMessage()
+             );
+         }
+     }
  }

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/filterTests FilterTest.java,1.2,1.3

From: Derrick O. <der...@us...> - 2004-05-10 22:32:32

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/filterTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21948/tests/filterTests

Modified Files:
	FilterTest.java 
Log Message:
Add CssSelectorNodeFilter submitted by Rogers George.
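The `testEscape` case in the FilterTest diff expects `CssSelectorNodeFilter.unescape` to turn `doucheba\g` into `douchebag`. In CSS, a backslash followed by one to six hex digits (optionally terminated by a single whitespace character) denotes a code point, and a backslash followed by most other characters stands for that character itself. A minimal sketch of that rule, written as a hypothetical standalone helper rather than the filter's real method:

```java
// Hypothetical helper; not the CssSelectorNodeFilter implementation.
final class CssUnescapeSketch {

    private CssUnescapeSketch() {}

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    private static boolean isCssWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
    }

    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= n) {
                out.append(c); // ordinary character (or a trailing lone backslash)
                i++;
                continue;
            }
            i++; // skip the backslash
            int start = i;
            while (i < n && i - start < 6 && isHex(s.charAt(i)))
                i++;
            if (i > start) {
                int cp = Integer.parseInt(s.substring(start, i), 16);
                // Replace values that are not usable code points with U+FFFD.
                boolean invalid = cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF);
                out.appendCodePoint(invalid ? 0xFFFD : cp);
                if (i < n && isCssWhitespace(s.charAt(i)))
                    i++; // a single whitespace character ends the hex escape
            } else {
                out.append(s.charAt(i)); // e.g. "\g" stands for "g"
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("doucheba\\g"));
    }
}
```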



Index: FilterTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/filterTests/FilterTest.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** FilterTest.java	7 Dec 2003 23:41:41 -0000	1.2
--- FilterTest.java	10 May 2004 22:31:46 -0000	1.3
***************
*** 27,31 ****
--- 27,33 ----
  package org.htmlparser.tests.filterTests;
  
+ import org.htmlparser.Parser;
  import org.htmlparser.filters.AndFilter;
+ import org.htmlparser.filters.CssSelectorNodeFilter;
  import org.htmlparser.filters.HasAttributeFilter;
  import org.htmlparser.filters.HasChildFilter;
***************
*** 35,43 ****
--- 37,48 ----
  import org.htmlparser.filters.StringFilter;
  import org.htmlparser.filters.TagNameFilter;
+ import org.htmlparser.lexer.Lexer;
  import org.htmlparser.lexer.nodes.StringNode;
  import org.htmlparser.lexer.nodes.TagNode;
  import org.htmlparser.tags.BodyTag;
  import org.htmlparser.tags.LinkTag;
+ import org.htmlparser.tags.Tag;
  import org.htmlparser.tests.ParserTestCase;
+ import org.htmlparser.util.NodeIterator;
  import org.htmlparser.util.NodeList;
  import org.htmlparser.util.ParserException;
***************
*** 241,244 ****
--- 246,275 ----
          assertEquals ("attribute value", "three", link.getAttribute ("id"));
      }
+ 
+     public void testEscape() throws Exception
+     {
+         assertEquals ("douchebag", CssSelectorNodeFilter.unescape ("doucheba\\g").toString ());
+     }
+ 
+     public void testSelectors() throws Exception
+     {
+         String html = "<html><head><title>sample title</title></head><body inserterr=\"true\" yomama=\"false\"><h3 id=\"heading\">big </invalid>heading</h3><ul id=\"things\"><li><br word=\"broken\"/>&gt;moocow<li><applet/>doohickey<li class=\"last\"><b class=\"item\">final<br>item</b></ul></body></html>";
+         Lexer l;
+         Parser p;
+         CssSelectorNodeFilter it;
+         NodeIterator i;
+         int count;
+ 
+         l = new Lexer (html);
+         p = new Parser (l);
+         it = new CssSelectorNodeFilter ("li + li");
+         count = 0;
+         for (i = p.extractAllNodesThatMatch (it).elements (); i.hasMoreNodes ();)
+         {
+             assertEquals ("tag name wrong", "LI", ((Tag)i.nextNode()).getTagName());
+             count++;
+         }
+         assertEquals ("wrong count", 2, count);
+     }
  }
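The testEscape case relies on CSS backslash escapes, where a backslash before a non-hex character stands for that character itself. The following is a minimal, self-contained sketch of that decoding using only java.util.regex; the class name CssUnescape is hypothetical. Unlike the committed unescape(), it accepts one to six hex digits plus one optional trailing whitespace character, as CSS2 escapes allow, and it appends decoded characters directly so that a decoded '$' or '\' is never read as replacement syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decodes CSS backslash escapes in a selector token. */
public class CssUnescape {
    // A backslash followed by 1-6 hex digits (plus one optional trailing
    // whitespace character), or a backslash followed by any other character.
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\(?:([0-9a-fA-F]{1,6})\\s?|(.))", Pattern.DOTALL);

    public static String unescape(String escaped) {
        StringBuffer result = new StringBuffer(escaped.length());
        Matcher m = ESCAPE.matcher(escaped);
        while (m.find()) {
            // Copy the text before the escape using an empty replacement...
            m.appendReplacement(result, "");
            if (m.group(1) != null) {
                int cp = Integer.parseInt(m.group(1), 16);
                // ...then append the decoded character directly; out-of-range
                // code points become U+FFFD instead of throwing.
                result.appendCodePoint(cp <= Character.MAX_CODE_POINT ? cp : 0xFFFD);
            } else {
                result.append(m.group(2));
            }
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("ite\\m"));  // item
        System.out.println(unescape("\\24 5"));  // $5
    }
}
```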

[Htmlparser-cvs] htmlparser/src/org/htmlparser/filters CssSelectorNodeFilter.java,NONE,1.1

From: Derrick O. <der...@us...> - 2004-05-10 22:32:11

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21948/filters

Added Files:
	CssSelectorNodeFilter.java 
Log Message:
Add CssSelectorNodeFilter submitted by Rogers George.



--- NEW FILE: CssSelectorNodeFilter.java ---
// HTMLParser Library $Name:  $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2004 Rogers George
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/CssSelectorNodeFilter.java,v $
// $Author: derrickoswald $
// $Date: 2004/05/10 22:31:57 $
// $Revision: 1.1 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//

package org.htmlparser.filters;

import junit.framework.TestCase;
import org.htmlparser.*;
import org.htmlparser.lexer.Lexer;
import org.htmlparser.tags.Tag;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.NodeList;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.net.URLConnection;

/**
 * A NodeFilter that accepts nodes based on whether they match a CSS2 selector.
 * Refer to <a href="http://www.w3.org/TR/REC-CSS2/selector.html">
 * http://www.w3.org/TR/REC-CSS2/selector.html</a> for syntax.
 * <p>
 * Todo: more thorough testing, any relevant pseudo-classes, css3 features
 */
public class CssSelectorNodeFilter implements NodeFilter
{
    private static Pattern tokens =
        Pattern.compile("("
            + "/\\*.*?\\*/"             // comments
            + ") | ("
            + "   \".*?[^\"]\""   // double quoted string
            + " | \'.*?[^\']\'"   // single quoted string
            + " | \"\" | \'\' "     // empty quoted string
            + ") | ("
            + " [\\~\\*\\$\\^]? = " // attrib-val relations
            + ") | ("
            + " [a-zA-Z_\\*](?:[a-zA-Z0-9_-]|\\\\.)* " // bare name
            + ") | \\s*("
            + " [+>~\\s] "        // combinators
            + ")\\s* | ("
            + " [\\.\\[\\]\\#\\:)(] "       // class/id/attr/param delims
            + ") | ("
            + " [\\,] "                     // comma
            + ") | ( . )"                   // everything else (bogus)
            ,
            Pattern.CASE_INSENSITIVE
            |Pattern.DOTALL
            |Pattern.COMMENTS);

    private static final int COMMENT = 1, QUOTEDSTRING = 2, RELATION = 3,
        NAME = 4, COMBINATOR = 5, DELIM = 6, COMMA = 7;

    private NodeFilter therule;

    public CssSelectorNodeFilter(String selector)
    {
        m = tokens.matcher(selector);
        if (nextToken())
            therule = parse();
    }

    public boolean accept(Node n)
    {
        return therule.accept(n);
    }

    private Matcher m = null;
    private int tokentype = 0;
    private String token = null;

    private boolean nextToken()
    {
        if (m != null && m.find())
            for (int i = 1; i < m.groupCount(); i++)
                if (m.group(i) != null)
                {
                    tokentype = i;
                    token = m.group(i);
                    return true;
                }
        tokentype = 0;
        token = null;
        return false;
    }

    private NodeFilter parse()
    {
        NodeFilter n = null;
        do
        {
            switch (tokentype)
            {
                case COMMENT:
                case NAME:
                case DELIM:
                    if (n == null)
                        n = parseSimple();
                    else
                        n = new AndFilter(n, parseSimple());
                    break;
                case COMBINATOR:
                    switch (token.charAt(0))
                    {
                        case '+':
                            n = new AdjacentFilter(n);
                            break;
                        case '>':
                            n = new HasParentFilter(n);
                            break;
                        default: // whitespace
                            n = new HasAncestorFilter(n);
                    }
                    nextToken();
                    break;
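                case COMMA:
                    nextToken(); // consume the ',' so the recursive parse() starts at the next selector
                    n = new OrFilter(n, parse());
                    break;
                default: // a stray quoted string or relation would otherwise loop forever
                    throw new IllegalArgumentException("Syntax error at " + token);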
            }
        }
        while (token != null);
        return n;
    }

    private NodeFilter parseSimple()
    {
        boolean done = false;
        NodeFilter n = null;

        if (token != null)
            do
            {
                switch (tokentype)
                {
                    case COMMENT:
                        nextToken();
                        break;
                    case NAME:
                        if ("*".equals(token))
                            n = new YesFilter();
                        else if (n == null)
                            n = new TagNameFilter(unescape(token));
                        else
                            n = new AndFilter(n, new TagNameFilter(unescape(token)));
                        nextToken();
                        break;
                    case DELIM:
                        switch (token.charAt(0))
                        {
                            case '.':
                                nextToken();
                                if (tokentype != NAME)
                                    throw new IllegalArgumentException("Syntax error at " + token);
                                if (n == null)
                                    n = new HasAttributeFilter("class", unescape(token));
                                else
                                    n = new AndFilter(n, new HasAttributeFilter("class", unescape(token)));
                                break;
                            case '#':
                                nextToken();
                                if (tokentype != NAME)
                                    throw new IllegalArgumentException("Syntax error at " + token);
                                if (n == null)
                                    n = new HasAttributeFilter("id", unescape(token));
                                else
                                    n = new AndFilter(n, new HasAttributeFilter("id", unescape(token)));
                                break;
                            case ':':
                                nextToken();
                                if (n == null)
                                    n = parsePseudoClass();
                                else
                                    n = new AndFilter(n, parsePseudoClass());
                                break;
                            case '[':
                                nextToken();
                                if (n == null)
                                    n = parseAttributeExp();
                                else
                                    n = new AndFilter(n, parseAttributeExp());
                                break;
                        }
                        nextToken();
                        break;
                    default:
                        done = true;
                }
            }
            while (!done && token != null);
        return n;
    }

    private NodeFilter parsePseudoClass()
    {
        throw new IllegalArgumentException("pseudoclasses not implemented yet");
    }

    private NodeFilter parseAttributeExp()
    {
        NodeFilter n = null;
        if (tokentype == NAME)
        {
            String attrib = token;
            nextToken();
            if ("]".equals(token))
                n = new HasAttributeFilter(unescape(attrib));
            else if (tokentype == RELATION)
            {
                String val = null, rel = token;
                nextToken();
                if (tokentype == QUOTEDSTRING)
                    val = unescape(token.substring(1, token.length() - 1));
                else if (tokentype == NAME)
                    val = unescape(token);
                if ("~=".equals(rel) && val != null)
                    n = new AttribMatchFilter(unescape(attrib),
                        "\\b" + val.replaceAll("([^a-zA-Z0-9])", "\\\\$1") + "\\b");
                else if ("=".equals(rel) && val != null)
                    n = new HasAttributeFilter(unescape(attrib), val);
                nextToken(); // step onto the closing ']'
            }
        }
        // leave the ']' as the current token in both branches; parseSimple() consumes it
        if (n == null || !"]".equals(token))
            throw new IllegalArgumentException("Syntax error at " + token);

        return n;
    }

    public static String unescape(String escaped)
    {
        StringBuffer result = new StringBuffer(escaped.length());
        Matcher m = Pattern.compile("\\\\(?:([a-fA-F0-9]{2,6})|(.))").matcher(escaped);
        while (m.find())
        {
            // copy the text preceding the escape, then append the decoded
            // character directly so a '$' or '\' is not read as a group reference
            m.appendReplacement(result, "");
            if (m.group(1) != null)
                result.append((char)Integer.parseInt(m.group(1), 16));
            else
                result.append(m.group(2));
        }
        m.appendTail(result);

        return result.toString();
    }

    private static class HasAncestorFilter implements NodeFilter
    {
        private NodeFilter atest;

        public HasAncestorFilter(NodeFilter n)
        {
            atest = n;
        }

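        public boolean accept(Node n)
        {
            // test each proper ancestor in turn, stopping at the root
            // rather than passing a null parent to the ancestor test
            for (Node p = n.getParent(); p != null; p = p.getParent())
                if (atest.accept(p))
                    return true;
            return false;
        }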
    }

    private static class AdjacentFilter implements NodeFilter
    {
        private NodeFilter sibtest;

        public AdjacentFilter(NodeFilter n)
        {
            sibtest = n;
        }

        public boolean accept(Node n)
        {
            if (n.getParent() != null)
            {
                NodeList l = n.getParent().getChildren();
                for (int i = 0; i < l.size(); i++)
                    if (l.elementAt(i) == n && i > 0)
                        return (sibtest.accept(l.elementAt(i - 1)));
            }
            return false;
        }
    }

    private static class YesFilter implements NodeFilter
    {
        public boolean accept(Node n)
        {return true;}
    }

    private static class AttribMatchFilter implements NodeFilter
    {
        private Pattern rel;
        private String attrib;

        public AttribMatchFilter(String attrib, String regex)
        {
            rel = Pattern.compile(regex);
            this.attrib = attrib;
        }

        public boolean accept(Node node)
        {
            if (node instanceof Tag && ((Tag)node).getAttribute(attrib) != null)
                if (rel != null
                        && !rel.matcher(((Tag)node).getAttribute(attrib)).find())
                    return false;
                else
                    return true;
            else
                return false;
        }
    }
}
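AdjacentFilter and HasAncestorFilter implement the CSS2 '+' (adjacent sibling) and whitespace (descendant) combinators by walking the document tree. The sketch below models the same two checks on a tiny hypothetical Node class rather than the htmlparser Node interface, so it runs on its own. Its ancestor walk stops at the root instead of handing a null parent to the test, which matters when the test accepts everything, as YesFilter does for '*'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class CombinatorSketch {
    /** Hypothetical minimal tree node: a tag name, a parent and ordered children. */
    static final class Node {
        final String name;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        /** Attaches child under this node and returns the child. */
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    /** "A + B": the immediately preceding sibling of n satisfies test. */
    static boolean adjacent(Node n, Predicate<Node> test) {
        if (n.parent == null)
            return false;
        int i = n.parent.children.indexOf(n);
        return i > 0 && test.test(n.parent.children.get(i - 1));
    }

    /** "A B": some proper ancestor of n satisfies test; null is never tested. */
    static boolean hasAncestor(Node n, Predicate<Node> test) {
        for (Node p = n.parent; p != null; p = p.parent)
            if (test.test(p))
                return true;
        return false;
    }

    /** Case-insensitive tag name test, like a type selector. */
    static Predicate<Node> named(String name) {
        return node -> node.name.equalsIgnoreCase(name);
    }

    public static void main(String[] args) {
        Node ul = new Node("UL");
        Node first = ul.add(new Node("LI"));
        Node second = ul.add(new Node("LI"));
        Node third = ul.add(new Node("LI"));
        // "li + li" matches the second and third items, not the first.
        System.out.println(adjacent(first, named("li")));    // false
        System.out.println(adjacent(second, named("li")));   // true
        System.out.println(adjacent(third, named("li")));    // true
        // "ul li" matches any LI below the UL.
        System.out.println(hasAncestor(third, named("ul"))); // true
        System.out.println(hasAncestor(ul, x -> true));      // false: ul has no parent
    }
}
```

This mirrors why the testSelectors case expects a count of 2 for "li + li" over a three-item list.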

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer Page.java,1.34,1.35

From: Derrick O. <der...@us...> - 2004-05-07 23:30:51

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28990

Modified Files:
	Page.java 
Log Message:
Ignore null contentType to accommodate ServletContext.getResource(...) per
suggestion by Rogers George.



Index: Page.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
retrieving revision 1.34
retrieving revision 1.35
diff -C2 -d -r1.34 -r1.35
*** Page.java	18 Mar 2004 04:04:07 -0000	1.34
--- Page.java	7 May 2004 23:30:37 -0000	1.35
***************
*** 335,339 ****
          }
          type = getContentType ();
!         if (!type.startsWith ("text"))
              throw new ParserException (
                  "URL "
--- 335,339 ----
          }
          type = getContentType ();
!         if (type != null && !type.startsWith ("text"))
              throw new ParserException (
                  "URL "

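The revised condition treats a missing content type as acceptable and rejects only types that are known not to be text, which is what resources reached through ServletContext.getResource(...) need when their connection reports no type. A minimal sketch of the same guard as a standalone method, assuming the hypothetical name checkTextContent and using IllegalStateException in place of htmlparser's ParserException; the message wording is illustrative:

```java
public class ContentTypeCheck {
    /**
     * Hypothetical helper mirroring the revised test in Page.java:
     * a null (unknown) content type is accepted, any non-"text" type is rejected.
     */
    static void checkTextContent(String url, String type) {
        if (type != null && !type.startsWith("text"))
            throw new IllegalStateException(
                "URL " + url + " does not contain text (" + type + ")");
    }

    public static void main(String[] args) {
        checkTextContent("http://example.com/page", "text/html; charset=UTF-8"); // accepted
        checkTextContent("http://example.com/page", null);                      // accepted: type unknown
        try {
            checkTextContent("http://example.com/logo.gif", "image/gif");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```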
[Htmlparser-cvs] htmlparser/docs/pics alberto.jpg,NONE,1.1 italy.gif,NONE,1.1

From: Derrick O. <der...@us...> - 2004-04-20 10:54:53

Update of /cvsroot/htmlparser/htmlparser/docs/pics
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30175

Added Files:
	alberto.jpg italy.gif 
Log Message:
Add images.



--- NEW FILE: alberto.jpg ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: italy.gif ---
(This appears to be a binary file; contents omitted.)

[Htmlparser-cvs] htmlparser/docs contributors.html,1.6,1.7

From: Derrick O. <der...@us...> - 2004-04-20 10:49:59

Update of /cvsroot/htmlparser/htmlparser/docs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29326

Modified Files:
	contributors.html 
Log Message:
Add Alberto Nacher to contributors page.



Index: contributors.html
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/docs/contributors.html,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** contributors.html	16 Feb 2004 22:46:08 -0000	1.6
--- contributors.html	20 Apr 2004 10:49:51 -0000	1.7
***************
*** 225,228 ****
--- 225,265 ----
    </tr>
    <tr> 
+     <td width="25%" height="270" valign="top">
+       <!-- <img src="pics/alberto.jpg" width="181" height="265">-->
+       <img src="pics/alberto.jpg" width="100">
+       <strong><img src="pics/italy.gif" width="53" height="39"></strong><br>
+       Alberto Nacher<br>
+       Software Developer - Consultant<br>
+       Corso Sebastopoli 39,<br>
+       10134 Torino, Italy<br>
+       <a href="http://members.xoom.virgilio.it/nacher/Home.html">Personal Home Page</a><br>
+       <a href="http://sourceforge.net/sendmessage.php?touser=892989">email</a><br>
+     </td>
+     <td width="39%" valign="top">
+       	<strong>On Alberto Nacher</strong>
+       	<p>I'm 31 years old, I'm a computer engineer and I have been working as
+         a consultant since 1998.</p>
+         <p>I've worked with Microsoft VB and VB.NET technologies, with Java
+         technology and with Livelink technology (the knowledge management and
+         developer environment from the OpenText company).</p>
+         <p>My hobbies: travelling, watching football matches, going out with
+         friends, picking mushrooms, reading and, this year, an English course!</p>
+     </td>
+     <td width="36%" valign="top">
+       <strong>Alberto on Italy</strong>
+       <p>From a high-technology point of view, Italy is not very important.
+          The main activities in my country are fashion, car manufacturing (FIAT,
+          Ferrari, Alfa Romeo), pasta and food, wine and, of course, the big
+          state companies for telecommunication systems, electricity
+          distribution and oil distribution. So if you want to work as a programmer,
+          there are no significant software houses to join, and it is better
+          to be a technical consultant.</p>
+       <p>Anyway, if you want to visit Italy, you will surely be charmed by the
+         beauty of my country! Venice, Florence and Rome are among the best towns
+         in the world. But you can also visit Torino (my home town), where you can
+         see the second most important Egyptian museum in the world.</p>
+      </td>
+   </tr>  
+   <tr> 
      <td height="213" valign="top"> <p><img src="pics/uk.gif" width="65" height="35"><br>
          Dr. Sam Joseph<br>

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer/nodes TagNode.java,1.33,1.34

From: Derrick O. <der...@us...> - 2004-04-06 11:04:47

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19036

Modified Files:
	TagNode.java 
Log Message:
Documentation modifications requested by Leos Literak via htmlparser-user mail list.



Index: TagNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes/TagNode.java,v
retrieving revision 1.33
retrieving revision 1.34
diff -C2 -d -r1.33 -r1.34
*** TagNode.java	20 Mar 2004 17:03:53 -0000	1.33
--- TagNode.java	6 Apr 2004 10:51:57 -0000	1.34
***************
*** 54,57 ****
--- 54,59 ----
       * The tag attributes.
       * Objects of type {@link Attribute}.
+      * The first element is the tag name, subsequent elements being either
+      * whitespace or real attributes.
       */
      protected Vector mAttributes;
***************
*** 280,283 ****
--- 282,287 ----
       * @param attribs The attribute collection to set.
       * Each element is an {@link Attribute Attribute}.
+      * The first attribute in the list must be the tag name
+      * (<code>isStandalone()</code> returns <code>true</code>).
       */
      public void setAttributeEx (Attribute attribute)
***************
*** 341,344 ****
--- 345,350 ----
       * Gets the attributes in the tag.
       * @return Returns the list of {@link Attribute Attributes} in the tag.
+      * The first element is the tag name, subsequent elements being either
+      * whitespace or real attributes.
       */
      public Vector getAttributesEx ()
***************
*** 491,494 ****
--- 497,502 ----
      /**
       * Sets the attributes.
+      * A special entry with a key of SpecialHashtable.TAGNAME ("$<TAGNAME>$")
+      * sets the tag name.
       * @param attributes The attribute collection to set.
       */
***************
*** 583,586 ****
--- 591,598 ----
      }
  
+     /**
+      * Parses the given text to create the tag contents.
+      * @param text A string of the form &lt;TAGNAME xx="yy"&gt;.
+      */
      public void setText (String text)
      {
***************
*** 648,652 ****
  
      /**
!      * Print the contents of the tag
       */
      public String toString ()
--- 660,665 ----
  
      /**
!      * Print the contents of the tag.
!      * @return A string describing the tag. For text that looks like HTML use #toHtml().
       */
      public String toString ()

[Htmlparser-cvs] CVSROOT checkoutlist,1.1,1.2

From: Somik R. <so...@us...> - 2004-03-27 18:03:22

Update of /cvsroot/htmlparser/CVSROOT
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv8320

Modified Files:
	checkoutlist 
Log Message:
updated checkoutlist

Index: checkoutlist
===================================================================
RCS file: /cvsroot/htmlparser/CVSROOT/checkoutlist,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** checkoutlist	3 Apr 2001 16:10:41 -0000	1.1
--- checkoutlist	27 Mar 2004 17:52:13 -0000	1.2
***************
*** 11,13 ****
  #	[<whitespace>]<filename><whitespace><error message><end-of-line>
  #
! # comment lines begin with '#'
--- 11,13 ----
  #	[<whitespace>]<filename><whitespace><error message><end-of-line>
  #
! users   Unable to check out 'users' file in CVSROOT
\ No newline at end of file
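The header of checkoutlist documents its line format, and the new users entry follows it: the first word names a file in CVSROOT and the rest of the line is the error message CVS reports if that file cannot be checked out. A minimal sketch of reading the format; the class name CheckoutList is hypothetical, and treating any line whose first non-blank character is '#' as a comment is an assumption of this sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CheckoutList {
    /**
     * Parses text of the form
     * [<whitespace>]<filename><whitespace><error message><end-of-line>
     * into an ordered map from file name to error message.
     * Blank lines and lines starting with '#' (after leading blanks) are skipped.
     */
    static Map<String, String> parse(String text) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : text.split("\r?\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#"))
                continue;
            // Split once: the first token is the file name, the rest is the message.
            String[] parts = trimmed.split("\\s+", 2);
            entries.put(parts[0], parts.length > 1 ? parts[1] : "");
        }
        return entries;
    }

    public static void main(String[] args) {
        String text = "# comment lines begin with '#'\n"
                    + "users   Unable to check out 'users' file in CVSROOT";
        System.out.println(parse(text)); // {users=Unable to check out 'users' file in CVSROOT}
    }
}
```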

[Htmlparser-cvs] htmlparser build.xml,1.62,1.63

From: Derrick O. <der...@us...> - 2004-03-20 20:11:07

Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18662

Modified Files:
	build.xml 
Log Message:
Add Tag interface to htmllexer.jar.



Index: build.xml
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v
retrieving revision 1.62
retrieving revision 1.63
diff -C2 -d -r1.62 -r1.63
*** build.xml	18 Mar 2004 04:04:07 -0000	1.62
--- build.xml	20 Mar 2004 20:01:02 -0000	1.63
***************
*** 229,232 ****
--- 229,233 ----
        <include name="org/htmlparser/Node.class"/>
        <include name="org/htmlparser/NodeFilter.class"/>
+       <include name="org/htmlparser/Tag.class"/>
        <include name="org/htmlparser/util/ParserException.class"/>
        <include name="org/htmlparser/util/ChainedException.class"/>

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer/nodes TagNode.java,1.32,1.33

From: Derrick O. <der...@us...> - 2004-03-20 17:13:52

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15661/lexer/nodes

Modified Files:
	TagNode.java 
Log Message:
First pass refactoring.
Create Tag interface, which isn't really used yet.



Index: TagNode.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/nodes/TagNode.java,v
retrieving revision 1.32
retrieving revision 1.33
diff -C2 -d -r1.32 -r1.33
*** TagNode.java	14 Mar 2004 20:31:38 -0000	1.32
--- TagNode.java	20 Mar 2004 17:03:53 -0000	1.33
***************
*** 33,36 ****
--- 33,37 ----
  
  import org.htmlparser.AbstractNode;
+ import org.htmlparser.Tag;
  import org.htmlparser.lexer.Cursor;
  import org.htmlparser.lexer.Lexer;
***************
*** 47,50 ****
--- 48,53 ----
      extends
          AbstractNode
+     implements
+         Tag
  {
      /**
***************
*** 273,276 ****
--- 276,289 ----
      }
  
+     /*
+      * Sets the attributes.
+      * @param attribs The attribute collection to set.
+      * Each element is an {@link Attribute Attribute}.
+      */
+     public void setAttributeEx (Attribute attribute)
+     {
+         setAttribute (attribute);
+     }
+ 
      /**
       * Set an attribute.

[Htmlparser-cvs] htmlparser/src/org/htmlparser Tag.java,NONE,1.1 PrototypicalNodeFactory.java,1.5,1.6

From: Derrick O. <der...@us...> - 2004-03-20 17:13:52

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15661

Modified Files:
	PrototypicalNodeFactory.java 
Added Files:
	Tag.java 
Log Message:
First pass refactoring.
Create Tag interface, which isn't really used yet.



Index: PrototypicalNodeFactory.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/PrototypicalNodeFactory.java,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** PrototypicalNodeFactory.java	25 Jan 2004 21:32:56 -0000	1.5
--- PrototypicalNodeFactory.java	20 Mar 2004 17:03:53 -0000	1.6
***************
*** 36,40 ****
  import org.htmlparser.lexer.nodes.Attribute;
  import org.htmlparser.lexer.nodes.NodeFactory;
! import org.htmlparser.tags.*; // import everything for now
  import org.htmlparser.util.ParserException;
  
--- 36,69 ----
  import org.htmlparser.lexer.nodes.Attribute;
  import org.htmlparser.lexer.nodes.NodeFactory;
! import org.htmlparser.tags.AppletTag;
! import org.htmlparser.tags.BaseHrefTag;
! import org.htmlparser.tags.BodyTag;
! import org.htmlparser.tags.Bullet;
! import org.htmlparser.tags.BulletList;
! import org.htmlparser.tags.Div;
! import org.htmlparser.tags.DoctypeTag;
! import org.htmlparser.tags.FormTag;
! import org.htmlparser.tags.FrameSetTag;
! import org.htmlparser.tags.FrameTag;
! import org.htmlparser.tags.HeadTag;
! import org.htmlparser.tags.Html;
! import org.htmlparser.tags.ImageTag;
! import org.htmlparser.tags.InputTag;
! import org.htmlparser.tags.JspTag;
! import org.htmlparser.tags.LabelTag;
! import org.htmlparser.tags.LinkTag;
! import org.htmlparser.tags.MetaTag;
! import org.htmlparser.tags.OptionTag;
! import org.htmlparser.tags.ScriptTag;
! import org.htmlparser.tags.SelectTag;
! import org.htmlparser.tags.Span;
! import org.htmlparser.tags.StyleTag;
! import org.htmlparser.tags.TableColumn;
! import org.htmlparser.tags.TableHeader;
! import org.htmlparser.tags.TableRow;
! import org.htmlparser.tags.TableTag;
! import org.htmlparser.tags.Tag;
! import org.htmlparser.tags.TextareaTag;
! import org.htmlparser.tags.TitleTag;
  import org.htmlparser.util.ParserException;
  

--- NEW FILE: Tag.java ---
// HTMLParser Library $Name:  $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2004 Derrick Oswald
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Tag.java,v $
// $Author: derrickoswald $
// $Date: 2004/03/20 17:03:53 $
// $Revision: 1.1 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//

package org.htmlparser;

import java.util.Vector;
import org.htmlparser.lexer.nodes.Attribute;

/**
 * Identifies what a Tag such as &lt;XXX xxx yyy="zzz"&gt; can do.
 * Adds features to a Node that are specific to a tag.
 */
public interface Tag extends Node
{
    /**
     * Returns the value of an attribute.
     * @param name Name of attribute, case insensitive.
     * @return The value associated with the attribute or null if it does
     * not exist, or is a stand-alone or
     */
    public String getAttribute (String name);

    /**
     * Set attribute with given key, value pair.
     * Figures out a quote character to use if necessary.
     * @param key The name of the attribute.
     * @param value The value of the attribute.
     */
    public void setAttribute (String key, String value);

    /**
     * Set attribute with given key, value pair where the value is quoted by quote.
     * @param key The name of the attribute.
     * @param value The value of the attribute.
     * @param quote The quote character to be used around value.
     * If zero, it is an unquoted value.
     */
    public void setAttribute (String key, String value, char quote);

    /**
     * Remove the attribute with the given key, if it exists.
     * @param key The name of the attribute.
     */
    public void removeAttribute (String key);

    /**
     * Returns the attribute with the given name.
     * @param name Name of attribute, case insensitive.
     * @return The attribute or null if it does
     * not exist.
     */
    public Attribute getAttributeEx (String name);

    /**
     * Set an attribute.
     * This replaces an attribute of the same name.
     * To set the zeroth attribute (the tag name), use setTagName().
     * @param attribute The attribute to set.
     */
    public void setAttributeEx (Attribute attribute);

    /**
     * Gets the attributes in the tag.
     * @return Returns the list of {@link Attribute Attributes} in the tag.
     */
    public Vector getAttributesEx ();

    /**
     * Sets the attributes.
     * NOTE: Values of the extended hashtable are two element arrays of String,
     * with the first element being the original name (not uppercased),
     * and the second element being the value.
     * @param attribs The attribute collection to set.
     */
    public void setAttributesEx (Vector attribs);

    /**
     * Return the name of this tag.
     * <p>
     * <em>
     * Note: This value is converted to uppercase and does not
     * begin with "/" if it is an end tag. Nor does it end with
     * a slash in the case of an XML type tag.
     * To get at the original text of the tag name use
     * {@link #getRawTagName getRawTagName()}.
     * The conversion to uppercase is performed with an ENGLISH locale.
     * </em>
     * @return The tag name.
     */
    public String getTagName ();

    /**
     * Set the name of this tag.
     * This creates or replaces the first attribute of the tag (the
     * zeroth element of the attribute vector).
     * @param name The tag name.
     */
    public void setTagName (String name);

    /**
     * Determines if the given tag breaks the flow of text.
     * @return <code>true</code> if following text would start on a new line,
     * <code>false</code> otherwise.
     */
    public boolean breaksFlow ();

    /**
     * Predicate to determine if this tag is an end tag (i.e. &lt;/HTML&gt;).
     * @return <code>true</code> if this tag is an end tag.
     */
    public boolean isEndTag ();

    /**
     * Set this tag to be an end tag, or not.
     * Adds or removes the leading slash on the tag name.
     * @param endTag If true, this tag is made into an end tag.
     * Any attributes it may have had are dropped.
     */
//    public void setEndTag (boolean endTag);

    /**
     * Is this an empty xml tag of the form &lt;tag/&gt;.
     * @return true if the last character of the last attribute is a '/'.
     */
    public boolean isEmptyXmlTag ();

    /**
     * Set this tag to be an empty xml node, or not.
     * Adds or removes an ending slash on the tag.
     * @param emptyXmlTag If true, ensures there is an ending slash in the node,
     * i.e. &lt;tag/&gt;, otherwise removes it.
     */
    public void setEmptyXmlTag (boolean emptyXmlTag);
}
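The getTagName() contract normalizes the raw name: no leading '/' for an end tag, no trailing '/' for an empty XML tag, and uppercase conversion with an English locale so that, for example, 'title' does not become 'TİTLE' under a Turkish default locale. The sketch below applies those rules to a raw name string; TagNames and its methods are hypothetical and this is not the htmlparser implementation, which works from the tag's attribute vector.

```java
import java.util.Locale;

/** Hypothetical helpers applying the documented tag-name rules to a raw name. */
public class TagNames {
    /** True if the raw name marks an end tag, e.g. "/html". */
    static boolean isEndTag(String raw) {
        return raw.startsWith("/");
    }

    /** True if the raw name ends with '/', as in an empty XML tag such as "br/". */
    static boolean isEmptyXmlTag(String raw) {
        return raw.length() > 1 && raw.endsWith("/");
    }

    /** Strips one leading and one trailing '/' and uppercases with Locale.ENGLISH. */
    static String tagName(String raw) {
        String name = raw;
        if (name.startsWith("/"))
            name = name.substring(1);
        if (name.endsWith("/"))
            name = name.substring(0, name.length() - 1);
        // A fixed locale keeps "title" -> "TITLE" even where the default
        // locale (e.g. Turkish) would map 'i' to a dotted capital I.
        return name.toUpperCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        System.out.println(tagName("/html")); // HTML
        System.out.println(tagName("br/"));   // BR
        System.out.println(tagName("Title")); // TITLE
    }
}
```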

1 message has been excluded from this view by a project administrator.
