htmlparser-cvs Mailing List for HTML Parser (Page 3)

Brought to you by: derrickoswald

htmlparser-cvs — syncmail email notification of CVS commits


[Htmlparser-cvs] htmlparser/src/org/htmlparser/filters AndFilter.java,1.5,1.6 OrFilter.java,1.5,1.6

From: Ian M. <ian...@us...> - 2006-05-16 09:15:00

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters
In directory sc8-pr-cvs5.sourceforge.net:/tmp/cvs-serv29501/src/org/htmlparser/filters

Modified Files:
	AndFilter.java OrFilter.java 
Log Message:
Incorrect grammar in javadoc. Changed [it's]  to [its].

Index: AndFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/AndFilter.java,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** AndFilter.java	16 May 2006 07:58:21 -0000	1.5
--- AndFilter.java	16 May 2006 09:14:55 -0000	1.6
***************
*** 31,35 ****
  
  /**
!  * Accepts nodes matching all of it's predicate filters (AND operation).
   */
  public class AndFilter
--- 31,35 ----
  
  /**
!  * Accepts nodes matching all of its predicate filters (AND operation).
   */
  public class AndFilter
***************
*** 102,106 ****
  
      /**
!      * Accept nodes that are acceptable to all of it's predicate filters.
       * @param node The node to check.
       * @return <code>true</code> if all the predicate filters find the node
--- 102,106 ----
  
      /**
!      * Accept nodes that are acceptable to all of its predicate filters.
       * @param node The node to check.
       * @return <code>true</code> if all the predicate filters find the node

Index: OrFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/OrFilter.java,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** OrFilter.java	16 May 2006 07:58:21 -0000	1.5
--- OrFilter.java	16 May 2006 09:14:55 -0000	1.6
***************
*** 31,35 ****
  
  /**
!  * Accepts nodes matching any of it's predicates filters (OR operation).
   */
  public class OrFilter implements NodeFilter
--- 31,35 ----
  
  /**
!  * Accepts nodes matching any of its predicates filters (OR operation).
   */
  public class OrFilter implements NodeFilter
***************
*** 100,104 ****
  
      /**
!      * Accept nodes that are acceptable to any of it's predicate filters.
       * @param node The node to check.
       * @return <code>true</code> if any of the predicate filters find the node
--- 100,104 ----
  
      /**
!      * Accept nodes that are acceptable to any of its predicate filters.
       * @param node The node to check.
       * @return <code>true</code> if any of the predicate filters find the node

[Htmlparser-cvs] htmlparser/src/org/htmlparser/filters XorFilter.java,NONE,1.1

From: Ian M. <ian...@us...> - 2006-05-16 09:11:46

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters
In directory sc8-pr-cvs5.sourceforge.net:/tmp/cvs-serv28440/src/org/htmlparser/filters

Added Files:
	XorFilter.java 
Log Message:
New class that does XOR logic (to round out our NOT, AND and OR filters).

--- NEW FILE: XorFilter.java ---
// HTMLParser Library $Name:  $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2003 Derrick Oswald
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/XorFilter.java,v $
// $Author: ian_macfarlane $
// $Date: 2006/05/16 09:11:41 $
// $Revision: 1.1 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//

package org.htmlparser.filters;

import org.htmlparser.Node;
import org.htmlparser.NodeFilter;

/**
 * Accepts nodes matching an odd number of its predicate filters (XOR operation).
 * For example, with two filters, it accepts the Node if and only if exactly one of the two filters accepts it.
 */
public class XorFilter implements NodeFilter
{
    /**
     * The predicates that are to be xor'ed together.
     */
    protected NodeFilter[] mPredicates;

    /**
     * Creates a new instance of an XorFilter.
     * With no predicates, this would always answer <code>false</code>
     * to {@link #accept}.
     * @see #setPredicates
     */
    public XorFilter ()
    {
        setPredicates (null);
    }

    /**
     * Creates an XorFilter that accepts nodes acceptable to either filter, but not both.
     * @param left One filter.
     * @param right The other filter.
     */
    public XorFilter (NodeFilter left, NodeFilter right)
    {
        NodeFilter[] predicates;

        predicates = new NodeFilter[2];
        predicates[0] = left;
        predicates[1] = right;
        setPredicates (predicates);
    }
    
    /**
     * Creates an XorFilter that accepts nodes acceptable to an odd number of the given filters.
     * @param predicates The list of filters. 
     */
    public XorFilter (NodeFilter[] predicates)
    {
        setPredicates (predicates);
    }

    /**
     * Get the predicates used by this XorFilter.
     * @return The predicates currently in use.
     */
    public NodeFilter[] getPredicates ()
    {
        return (mPredicates);
    }

    /**
     * Set the predicates for this XorFilter.
     * @param predicates The list of predicates to use in {@link #accept}.
     */
    public void setPredicates (NodeFilter[] predicates)
    {
        if (null == predicates)
            predicates = new NodeFilter[0];
        mPredicates = predicates;
    }

    //
    // NodeFilter interface
    //

    /**
     * Accept nodes that are acceptable to an odd number of its predicate filters.
     * @param node The node to check.
     * @return <code>true</code> if an odd number of the predicate filters find the node
     * is acceptable, <code>false</code> otherwise.
     */
    public boolean accept (Node node)
    {
        int countTrue;

        countTrue = 0;

        for (int i = 0; i < mPredicates.length; i++)
            if (mPredicates[i].accept (node))
                ++countTrue;

        return ((countTrue % 2) == 1);
    }
}
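
The odd-count rule in accept() above can be tried without the library. The following is only a sketch under stated assumptions: java.util.function.Predicate stands in for NodeFilter, plain strings stand in for Node, and the names XorParityDemo and acceptsOdd are hypothetical and are not part of HTML Parser.

```java
import java.util.function.Predicate;

final class XorParityDemo {
    /**
     * Mirrors the counting loop in XorFilter.accept():
     * true when an odd number of predicates accept the value.
     * Predicate is a stand-in for NodeFilter (assumption for this sketch).
     */
    @SafeVarargs
    static <T> boolean acceptsOdd(T value, Predicate<T>... predicates) {
        int countTrue = 0;
        for (Predicate<T> predicate : predicates)
            if (predicate.test(value))
                ++countTrue;
        return (countTrue % 2) == 1;
    }

    public static void main(String[] args) {
        // Hypothetical string checks standing in for real tag filters.
        Predicate<String> isLink = s -> s.startsWith("<a");
        Predicate<String> hasHref = s -> s.contains("href=");
        Predicate<String> hasClass = s -> s.contains("class=");

        // Exactly one of two accepts: accepted.
        System.out.println(acceptsOdd("<a name=x>", isLink, hasHref));                   // true
        // Both of two accept: rejected.
        System.out.println(acceptsOdd("<a href=y>", isLink, hasHref));                   // false
        // All three accept: accepted again, because three is odd.
        System.out.println(acceptsOdd("<a href=y class=z>", isLink, hasHref, hasClass)); // true
    }
}
```

Note the three-filter case: with more than two predicates this is a parity test, so a node that satisfies every filter can still be accepted. With no predicates the count is zero and the result is false, matching the constructor's javadoc.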

[Htmlparser-cvs] htmlparser/src/org/htmlparser/filters AndFilter.java,1.4,1.5 OrFilter.java,1.4,1.5

From: Ian M. <ian...@us...> - 2006-05-16 07:58:24

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters
In directory sc8-pr-cvs5.sourceforge.net:/tmp/cvs-serv1765/src/org/htmlparser/filters

Modified Files:
	AndFilter.java OrFilter.java 
Log Message:
Added constructors to OrFilter/AndFilter that take an array of NodeFilters.

Index: AndFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/AndFilter.java,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** AndFilter.java	14 Jun 2005 10:37:33 -0000	1.4
--- AndFilter.java	16 May 2006 07:58:21 -0000	1.5
***************
*** 67,70 ****
--- 67,79 ----
          setPredicates (predicates);
      }
+     
+     /**
+      * Creates an AndFilter that accepts nodes acceptable to all given filters.
+      * @param predicates The list of filters. 
+      */
+     public AndFilter (NodeFilter[] predicates)
+     {
+         setPredicates (predicates);
+     }
  
      /**

Index: OrFilter.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/filters/OrFilter.java,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** OrFilter.java	15 May 2005 11:49:04 -0000	1.4
--- OrFilter.java	16 May 2006 07:58:21 -0000	1.5
***************
*** 65,68 ****
--- 65,77 ----
          setPredicates (predicates);
      }
+     
+     /**
+      * Creates an OrFilter that accepts nodes acceptable to any of the given filters.
+      * @param predicates The list of filters. 
+      */
+     public OrFilter (NodeFilter[] predicates)
+     {
+         setPredicates (predicates);
+     }
  
      /**

[Htmlparser-cvs] htmlparser/src/org/htmlparser Parser.java,1.112,1.113

From: Derrick O. <der...@us...> - 2006-04-24 22:12:18

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv24188

Modified Files:
	Parser.java 
Log Message:
Fix incorrect example.

Index: Parser.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Parser.java,v
retrieving revision 1.112
retrieving revision 1.113
diff -C2 -d -r1.112 -r1.113
*** Parser.java	14 Apr 2006 22:18:47 -0000	1.112
--- Parser.java	24 Apr 2006 22:12:05 -0000	1.113
***************
*** 64,68 ****
   * <pre>
   * Parser parser = new Parser ("http://whatever");
!  * NodeList list = parser.parse ();
   * // do something with your list of nodes.
   * </pre>
--- 64,68 ----
   * <pre>
   * Parser parser = new Parser ("http://whatever");
!  * NodeList list = parser.parse (null);
   * // do something with your list of nodes.
   * </pre>

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tags TableHeader.java,1.3,1.4

From: Derrick O. <der...@us...> - 2006-04-23 11:59:48

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv22131

Modified Files:
	TableHeader.java 
Log Message:
Change copyright as per request by P.I.M. Schrama

Index: TableHeader.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/TableHeader.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** TableHeader.java	31 Oct 2005 16:26:11 -0000	1.3
--- TableHeader.java	23 Apr 2006 11:59:44 -0000	1.4
***************
*** 1,5 ****
  // HTMLParser Library $Name$ - A java-based parser for HTML
  // http://sourceforge.org/projects/htmlparser
! // Copyright (C) 2004 Pim Schrama
  //
  // Revision Control Information
--- 1,5 ----
  // HTMLParser Library $Name$ - A java-based parser for HTML
  // http://sourceforge.org/projects/htmlparser
! // Copyright (C) 2006 Derrick Oswald
  //
  // Revision Control Information

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/lexerTests KitTest.java,1.10,NONE

From: Derrick O. <der...@us...> - 2006-04-18 00:08:05

Update of //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv24569/lexerTests

Removed Files:
	KitTest.java 
Log Message:
Move non-junit test code to Request For Enhancement (RFE) as attachments.

--- KitTest.java DELETED ---

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests PerformanceTest.java,1.49,NONE

From: Derrick O. <der...@us...> - 2006-04-18 00:08:04

Update of //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv24569

Removed Files:
	PerformanceTest.java 
Log Message:
Move non-junit test code to Request For Enhancement (RFE) as attachments.

--- PerformanceTest.java DELETED ---

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests PerformanceTest.java,1.48,1.49 ParserTestCase.java,1.52,1.53

From: Derrick O. <der...@us...> - 2006-04-17 23:45:15

Update of //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6163

Modified Files:
	PerformanceTest.java ParserTestCase.java 
Log Message:
Fix unit tests.

Index: ParserTestCase.java
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTestCase.java,v
retrieving revision 1.52
retrieving revision 1.53
diff -C2 -d -r1.52 -r1.53
*** ParserTestCase.java	31 Jul 2004 16:42:33 -0000	1.52
--- ParserTestCase.java	17 Apr 2006 23:45:11 -0000	1.53
***************
*** 60,63 ****
--- 60,67 ----
      }
  
+     public void testFake ()
+     {
+     }
+ 
      protected void parse(String response) throws ParserException {
          createParser(response,10000);

Index: PerformanceTest.java
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/PerformanceTest.java,v
retrieving revision 1.48
retrieving revision 1.49
diff -C2 -d -r1.48 -r1.49
*** PerformanceTest.java	31 Jul 2004 16:42:33 -0000	1.48
--- PerformanceTest.java	17 Apr 2006 23:45:11 -0000	1.49
***************
*** 28,31 ****
--- 28,32 ----
  
  import org.htmlparser.Parser;
+ import org.htmlparser.PrototypicalNodeFactory;
  import org.htmlparser.util.DefaultParserFeedback;
  import org.htmlparser.util.NodeIterator;
***************
*** 57,60 ****
--- 58,62 ----
              // Create the parser object
              parser = new Parser(file,new DefaultParserFeedback());
+             parser.setNodeFactory (new PrototypicalNodeFactory (true));
              long start=System.currentTimeMillis();
              for (NodeIterator e = parser.elements();e.hasMoreNodes();)

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests ParserTest.java,1.65,1.66

From: Derrick O. <der...@us...> - 2006-04-17 13:53:18

Update of //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30086

Modified Files:
	ParserTest.java 
Log Message:
Fix unit tests. Move failing test cases to downloads on corresponding RFE artifacts.

Index: ParserTest.java
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/ParserTest.java,v
retrieving revision 1.65
retrieving revision 1.66
diff -C2 -d -r1.65 -r1.66
*** ParserTest.java	24 Apr 2005 17:48:27 -0000	1.65
--- ParserTest.java	17 Apr 2006 13:53:12 -0000	1.66
***************
*** 431,435 ****
      /**
       * Test with a HTTP header with a valid charset parameter.
!      * Here, ibm.co.jp is an example of a HTTP server that correctly sets the
       * charset in the header to match the content encoding.
       */
--- 431,435 ----
      /**
       * Test with a HTTP header with a valid charset parameter.
!      * Here, Oracle Japan is an example of a HTTP server that correctly sets the
       * charset in the header to match the content encoding.
       */
***************
*** 439,443 ****
          try
          {
!             parser = new Parser("http://www.ibm.com/jp/", Parser.DEVNULL);
              assertTrue("Character set should be Shift_JIS", parser.getEncoding ().equalsIgnoreCase ("Shift_JIS"));
          }
--- 439,443 ----
          try
          {
!             parser = new Parser("http://www.oracle.co.jp/", Parser.DEVNULL);
              assertTrue("Character set should be Shift_JIS", parser.getEncoding ().equalsIgnoreCase ("Shift_JIS"));
          }

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/utilTests CharacterTranslationTest.java,1.47,1.48

From: Derrick O. <der...@us...> - 2006-04-17 13:53:16

Update of //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30086/utilTests

Modified Files:
	CharacterTranslationTest.java 
Log Message:
Fix unit tests. Move failing test cases to downloads on corresponding RFE artifacts.

Index: CharacterTranslationTest.java
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests/CharacterTranslationTest.java,v
retrieving revision 1.47
retrieving revision 1.48
diff -C2 -d -r1.47 -r1.48
*** CharacterTranslationTest.java	8 Apr 2006 13:33:47 -0000	1.47
--- CharacterTranslationTest.java	17 Apr 2006 13:53:12 -0000	1.48
***************
*** 334,337 ****
--- 334,339 ----
              if (string.startsWith ("-- "))
                  string = string.substring (3);
+             // remove newlines
+             string = string.replace ('\n', ' ');
              // remove doublespaces
              index = 0;
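
The hunk above replaces newlines with spaces before the double-space removal runs. A minimal self-contained sketch of why that order matters follows; the class name WhitespaceNormalizer and the collapsing loop are assumptions made for illustration, since the diff shows only the first line of the real loop.

```java
final class WhitespaceNormalizer {
    /**
     * Replaces newlines with spaces, then collapses runs of spaces.
     * Doing the newline step first means a space next to a newline
     * is collapsed along with ordinary double spaces.
     */
    static String normalize(String string) {
        // remove newlines (same call as in the diff)
        string = string.replace('\n', ' ');
        // remove doublespaces (assumed loop; the original body is not shown)
        int index;
        while (-1 != (index = string.indexOf("  ")))
            string = string.substring(0, index) + string.substring(index + 1);
        return string;
    }

    public static void main(String[] args) {
        System.out.println(normalize("alpha\n beta  gamma")); // alpha beta gamma
    }
}
```

If the newline replacement ran after the collapsing loop instead, the input "alpha\n beta" would end up with two adjacent spaces and the string comparison in the test would fail.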

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/tagTests InputTagTest.java,1.41,1.42 TableTagTest.java,1.2,1.3

From: Derrick O. <der...@us...> - 2006-04-17 13:53:15

Update of //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30086/tagTests

Modified Files:
	InputTagTest.java TableTagTest.java 
Log Message:
Fix unit tests. Move failing test cases to downloads on corresponding RFE artifacts.

Index: TableTagTest.java
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/TableTagTest.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** TableTagTest.java	31 Jul 2004 16:42:31 -0000	1.2
--- TableTagTest.java	17 Apr 2006 13:53:12 -0000	1.3
***************
*** 140,154 ****
       * See bug #742254 Nested <TR> &<TD> tags should not be allowed
       */
-     public void testUnClosed1 () throws ParserException
-     {
-         createParser ("<TABLE><TR><TR></TR></TABLE>");
-         parseAndAssertNodeCount (1);
-         String s = node[0].toHtml ();
-         assertEquals ("Unclosed","<TABLE><TR></TR><TR></TR></TABLE>",s);
-     }
- 
-     /**
-      * See bug #742254 Nested <TR> &<TD> tags should not be allowed
-      */
      public void testUnClosed2 () throws ParserException
      {
--- 140,143 ----
***************
*** 160,174 ****
  
      /**
-      * See bug #742254 Nested <TR> &<TD> tags should not be allowed
-      */
-     public void testUnClosed3 () throws ParserException
-     {
-         createParser ("<TABLE><TR><TD>blah blah</TD><TR><TD>blah blah</TD></TR></TABLE>");
-         parseAndAssertNodeCount (1);
-         String s = node[0].toHtml ();
-         assertEquals ("Unclosed","<TABLE><TR><TD>blah blah</TD></TR><TR><TD>blah blah</TD></TR></TABLE>",s);
-     }
- 
-     /**
       * See bug #750117 StackOverFlow while Node-Iteration
       * Not reproducible.
--- 149,152 ----

Index: InputTagTest.java
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/tagTests/InputTagTest.java,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** InputTagTest.java	15 May 2005 11:49:05 -0000	1.41
--- InputTagTest.java	17 Apr 2006 13:53:12 -0000	1.42
***************
*** 86,124 ****
          assertEquals("Name","Google",inputTag.getAttribute("NAME"));
      }
- 
-     /**
-      * Bug #923146 tag nesting rule too strict for forms
-      */
-     public void testTable () throws ParserException
-     {
-         String html =
-             "<table>" +
-             "<tr>" +
-             "<td>" +
-             "<form>" +
-             "<input name=input1>" +
-             "</td>" +
-             // <tr> missing
-             "<tr>" +
-             "<td>" +
-             "<input name=input2>" +
-             "</td>" +
-             "</tr>" +
-             "</form>" +
-             "</table>";
-         createParser (html);
-         parseAndAssertNodeCount (1);
-         assertTrue ("not a table", node[0] instanceof TableTag);
-         TableTag table = (TableTag)node[0];
-         assertTrue ("not two rows", 2 == table.getRowCount ());
- //        assertTrue ("not one row", 1 == table.getRowCount ());
-         TableRow row = table.getRow (0);
-         assertTrue ("not one column", 1 == row.getColumnCount ());
-         TableColumn column = row.getColumns ()[0];
-         assertTrue ("not one child", 1 == column.getChildCount ());
-         assertTrue ("column doesn't have a form", column.getChild (0) instanceof FormTag);
-         FormTag form = (FormTag)column.getChild (0);
-         assertTrue ("form only has one input field", 2 == form.getFormInputs ().size ());
-     }
- 
  }
--- 86,88 ----

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/lexerTests LexerTests.java,1.28,1.29

From: Derrick O. <der...@us...> - 2006-04-17 13:53:15

Update of //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30086/lexerTests

Modified Files:
	LexerTests.java 
Log Message:
Fix unit tests. Move failing test cases to downloads on corresponding RFE artifacts.

Index: LexerTests.java
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/LexerTests.java,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** LexerTests.java	19 Mar 2006 15:01:25 -0000	1.28
--- LexerTests.java	17 Apr 2006 13:53:12 -0000	1.29
***************
*** 619,622 ****
--- 619,625 ----
          mAcceptable.add ("LI");
          mAcceptable.add ("IFRAME");
+         mAcceptable.add ("LINK");
+         mAcceptable.add ("H1");
+         mAcceptable.add ("H3");
      }

[Htmlparser-cvs] htmlparser/bin translate.cmd,1.1,1.2 lexer.cmd,1.1,1.2 stringextractor.cmd,1.1,1.2 filterbuilder.cmd,1.1,1.2 thumbelina.cmd,1.1,1.2 sitecapturer.cmd,1.1,1.2 parser.cmd,1.1,1.2 linkextractor.cmd,1.1,1.2 beanybaby.cmd,1.1,1.2

From: Derrick O. <der...@us...> - 2006-04-17 13:51:25

Update of //cvsroot/htmlparser/htmlparser/bin
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28620

Modified Files:
	translate.cmd lexer.cmd stringextractor.cmd filterbuilder.cmd 
	thumbelina.cmd sitecapturer.cmd parser.cmd linkextractor.cmd 
	beanybaby.cmd 
Log Message:
Allow execution from directory name containing spaces on Windows.
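
The fix in revision 1.2 of these scripts is to wrap %0, %cmd_path%..\lib\ and %lib_path%htmlparser.jar in double quotes. The sketch below illustrates the general problem with a deliberately simplified argument splitter; it is not a model of how cmd.exe actually parses a line, and the class name QuotingDemo and the example path are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotingDemo {
    /**
     * Splits a command line on spaces, treating a double-quoted run
     * as part of a single argument. Simplified on purpose: no escape
     * handling, no cmd.exe special characters.
     */
    static List<String> split(String commandLine) {
        List<String> arguments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : commandLine.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;          // toggle quoted state, drop the quote
            } else if (c == ' ' && !inQuotes) {
                if (current.length() > 0) {    // an unquoted space ends the argument
                    arguments.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0)
            arguments.add(current.toString());
        return arguments;
    }

    public static void main(String[] args) {
        // Unquoted: the path breaks at the space in "My Tools".
        System.out.println(split("-classpath C:\\My Tools\\lib\\htmlparser.jar").size());     // 3
        // Quoted: the path survives as one argument.
        System.out.println(split("-classpath \"C:\\My Tools\\lib\\htmlparser.jar\"").size()); // 2
    }
}
```

In the unquoted case the jar path is split in two, so the classpath option would receive only the part before the space, which is the failure the quoted form avoids.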

Index: parser.cmd
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/bin/parser.cmd,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** parser.cmd	10 Apr 2005 23:20:41 -0000	1.1
--- parser.cmd	17 Apr 2006 13:51:19 -0000	1.2
***************
*** 1,47 ****
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in (%0) do set cmd_path= %%~dpi
! for /D %%i in (%cmd_path%..\lib\) do set lib_path=%%~dpi
! if not exist %lib_path%htmlparser.jar goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath %lib_path%htmlparser.jar org.htmlparser.Parser %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end
--- 1,47 ----
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in ("%0") do set cmd_path=%%~dpi
! for /D %%i in ("%cmd_path%..\lib\") do set lib_path=%%~dpi
! if not exist "%lib_path%htmlparser.jar" goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath "%lib_path%htmlparser.jar" org.htmlparser.Parser %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end

Index: sitecapturer.cmd
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/bin/sitecapturer.cmd,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** sitecapturer.cmd	10 Apr 2005 23:20:41 -0000	1.1
--- sitecapturer.cmd	17 Apr 2006 13:51:19 -0000	1.2
***************
*** 1,47 ****
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in (%0) do set cmd_path= %%~dpi
! for /D %%i in (%cmd_path%..\lib\) do set lib_path=%%~dpi
! if not exist %lib_path%htmlparser.jar goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath %lib_path%htmlparser.jar org.htmlparser.parserapplications.SiteCapturer %1 %2 %3
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end
--- 1,47 ----
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in ("%0") do set cmd_path=%%~dpi
! for /D %%i in ("%cmd_path%..\lib\") do set lib_path=%%~dpi
! if not exist "%lib_path%htmlparser.jar" goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath "%lib_path%htmlparser.jar" org.htmlparser.parserapplications.SiteCapturer %1 %2 %3
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end

Index: translate.cmd
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/bin/translate.cmd,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** translate.cmd	10 Apr 2005 23:20:41 -0000	1.1
--- translate.cmd	17 Apr 2006 13:51:17 -0000	1.2
***************
*** 1,47 ****
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in (%0) do set cmd_path= %%~dpi
! for /D %%i in (%cmd_path%..\lib\) do set lib_path=%%~dpi
! if not exist %lib_path%htmlparser.jar goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath %lib_path%htmlparser.jar org.htmlparser.util.Translate %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end
--- 1,47 ----
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in ("%0") do set cmd_path=%%~dpi
! for /D %%i in ("%cmd_path%..\lib\") do set lib_path=%%~dpi
! if not exist "%lib_path%htmlparser.jar" goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath "%lib_path%htmlparser.jar" org.htmlparser.util.Translate %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end

Index: lexer.cmd
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/bin/lexer.cmd,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** lexer.cmd	10 Apr 2005 23:20:41 -0000	1.1
--- lexer.cmd	17 Apr 2006 13:51:19 -0000	1.2
***************
*** 1,47 ****
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in (%0) do set cmd_path= %%~dpi
! for /D %%i in (%cmd_path%..\lib\) do set lib_path=%%~dpi
! if not exist %lib_path%htmllexer.jar goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath %lib_path%htmllexer.jar org.htmlparser.lexer.Lexer %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmllexer.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end
--- 1,47 ----
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in ("%0") do set cmd_path=%%~dpi
! for /D %%i in ("%cmd_path%..\lib\") do set lib_path=%%~dpi
! if not exist "%lib_path%htmllexer.jar" goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath "%lib_path%htmllexer.jar" org.htmlparser.lexer.Lexer %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmllexer.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end

Index: stringextractor.cmd
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/bin/stringextractor.cmd,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** stringextractor.cmd	10 Apr 2005 23:20:41 -0000	1.1
--- stringextractor.cmd	17 Apr 2006 13:51:19 -0000	1.2
***************
*** 1,47 ****
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in (%0) do set cmd_path= %%~dpi
! for /D %%i in (%cmd_path%..\lib\) do set lib_path=%%~dpi
! if not exist %lib_path%htmlparser.jar goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath %lib_path%htmlparser.jar org.htmlparser.parserapplications.StringExtractor %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end
--- 1,47 ----
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in ("%0") do set cmd_path=%%~dpi
! for /D %%i in ("%cmd_path%..\lib\") do set lib_path=%%~dpi
! if not exist "%lib_path%htmlparser.jar" goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath "%lib_path%htmlparser.jar" org.htmlparser.parserapplications.StringExtractor %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end

Index: beanybaby.cmd
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/bin/beanybaby.cmd,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** beanybaby.cmd	10 Apr 2005 23:20:41 -0000	1.1
--- beanybaby.cmd	17 Apr 2006 13:51:19 -0000	1.2
***************
*** 1,47 ****
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in (%0) do set cmd_path= %%~dpi
! for /D %%i in (%cmd_path%..\lib\) do set lib_path=%%~dpi
! if not exist %lib_path%htmlparser.jar goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath %lib_path%htmlparser.jar org.htmlparser.beans.BeanyBaby %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end
--- 1,47 ----
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in ("%0") do set cmd_path=%%~dpi
! for /D %%i in ("%cmd_path%..\lib\") do set lib_path=%%~dpi
! if not exist "%lib_path%htmlparser.jar" goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath "%lib_path%htmlparser.jar" org.htmlparser.beans.BeanyBaby %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end

Index: linkextractor.cmd
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/bin/linkextractor.cmd,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** linkextractor.cmd	10 Apr 2005 23:20:41 -0000	1.1
--- linkextractor.cmd	17 Apr 2006 13:51:19 -0000	1.2
***************
*** 1,47 ****
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in (%0) do set cmd_path= %%~dpi
! for /D %%i in (%cmd_path%..\lib\) do set lib_path=%%~dpi
! if not exist %lib_path%htmlparser.jar goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath %lib_path%htmlparser.jar org.htmlparser.parserapplications.LinkExtractor %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end
--- 1,47 ----
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in ("%0") do set cmd_path=%%~dpi
! for /D %%i in ("%cmd_path%..\lib\") do set lib_path=%%~dpi
! if not exist "%lib_path%htmlparser.jar" goto no_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -classpath "%lib_path%htmlparser.jar" org.htmlparser.parserapplications.LinkExtractor %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end

Index: filterbuilder.cmd
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/bin/filterbuilder.cmd,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** filterbuilder.cmd	10 Apr 2005 23:20:41 -0000	1.1
--- filterbuilder.cmd	17 Apr 2006 13:51:19 -0000	1.2
***************
*** 1,51 ****
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in (%0) do set cmd_path= %%~dpi
! for /D %%i in (%cmd_path%..\lib\) do set lib_path=%%~dpi
! if not exist %lib_path%htmlparser.jar goto no_htmlparser_jar_error
! if not exist %lib_path%filterbuilder.jar goto no_filterbuilder_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -Xmx256M -classpath %lib_path%filterbuilder.jar;%lib_path%htmlparser.jar org.htmlparser.parserapplications.filterbuilder.FilterBuilder %1
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_htmlparser_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_filterbuilder_jar_error
! echo Unable to find filterbuilder.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end
--- 1,51 ----
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in ("%0") do set cmd_path=%%~dpi
! for /D %%i in ("%cmd_path%..\lib\") do set lib_path=%%~dpi
! if not exist "%lib_path%htmlparser.jar" goto no_htmlparser_jar_error
! if not exist "%lib_path%filterbuilder.jar" goto no_filterbuilder_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -Xmx256M -classpath "%lib_path%filterbuilder.jar;%lib_path%htmlparser.jar" org.htmlparser.parserapplications.filterbuilder.FilterBuilder %1
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_htmlparser_jar_error
! echo Unable to find htmlparser.jar
! goto end
! :no_filterbuilder_jar_error
! echo Unable to find filterbuilder.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end

Index: thumbelina.cmd
===================================================================
RCS file: //cvsroot/htmlparser/htmlparser/bin/thumbelina.cmd,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** thumbelina.cmd	10 Apr 2005 23:20:41 -0000	1.1
--- thumbelina.cmd	17 Apr 2006 13:51:19 -0000	1.2
***************
*** 1,51 ****
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in (%0) do set cmd_path= %%~dpi
! for /D %%i in (%cmd_path%..\lib\) do set lib_path=%%~dpi
! if not exist %lib_path%htmllexer.jar goto no_htmllexer_jar_error
! if not exist %lib_path%thumbelina.jar goto no_thumbelina_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -Xmx256M -classpath %lib_path%thumbelina.jar;%lib_path%htmllexer.jar org.htmlparser.lexerapplications.thumbelina.Thumbelina %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_htmllexer_jar_error
! echo Unable to find htmllexer.jar
! goto end
! :no_thumbelina_jar_error
! echo Unable to find thumbelina.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end
--- 1,51 ----
! @echo off
! rem HTMLParser Library $Name$ - A java-based parser for HTML
! rem http://sourceforge.org/projects/htmlparser
! rem Copyright (C) 2005 Derrick Oswald
! rem
! rem Revision Control Information
! rem
! rem $Source$
! rem $Author$
! rem $Date$
! rem $Revision$
! rem
! rem This library is free software; you can redistribute it and/or
! rem modify it under the terms of the GNU Lesser General Public
! rem License as published by the Free Software Foundation; either
! rem version 2.1 of the License, or (at your option) any later version.
! rem
! rem This library is distributed in the hope that it will be useful,
! rem but WITHOUT ANY WARRANTY; without even the implied warranty of
! rem MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
! rem Lesser General Public License for more details.
! rem
! rem You should have received a copy of the GNU Lesser General Public
! rem License along with this library; if not, write to the Free Software
! rem Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
! rem
! setlocal enableextensions
! if errorlevel 1 goto no_extensions_error
! for %%i in ("%0") do set cmd_path=%%~dpi
! for /D %%i in ("%cmd_path%..\lib\") do set lib_path=%%~dpi
! if not exist "%lib_path%htmllexer.jar" goto no_htmllexer_jar_error
! if not exist "%lib_path%thumbelina.jar" goto no_thumbelina_jar_error
! for %%i in (java.exe) do set java_executable=%%~$PATH:i
! if "%java_executable%"=="" goto no_java_error
! @echo on
! %java_executable% -Xmx256M -classpath "%lib_path%thumbelina.jar;%lib_path%htmllexer.jar" org.htmlparser.lexerapplications.thumbelina.Thumbelina %1 %2
! @echo off
! goto end
! :no_extensions_error
! echo Unable to use CMD extensions
! goto end
! :no_htmllexer_jar_error
! echo Unable to find htmllexer.jar
! goto end
! :no_thumbelina_jar_error
! echo Unable to find thumbelina.jar
! goto end
! :no_java_error
! echo Unable to find java.exe
! goto end
! :end

[Htmlparser-cvs] htmlparser/src/org/htmlparser/http ConnectionManager.java,1.9,1.10

From: Derrick O. <der...@us...> - 2006-04-14 22:18:53

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/http
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31675/src/org/htmlparser/http

Modified Files:
	ConnectionManager.java 
Log Message:
Cleanup to isolate htmllexer jar build.

Index: ConnectionManager.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/http/ConnectionManager.java,v
retrieving revision 1.9
retrieving revision 1.10
diff -C2 -d -r1.9 -r1.10
*** ConnectionManager.java	19 Mar 2006 20:14:58 -0000	1.9
--- ConnectionManager.java	14 Apr 2006 22:18:47 -0000	1.10
***************
*** 59,63 ****
      {
          mDefaultRequestProperties.put ("User-Agent", "HTMLParser/"
!             + org.htmlparser.Parser.VERSION_NUMBER);
          mDefaultRequestProperties.put ("Accept-Encoding", "gzip");
      }
--- 59,63 ----
      {
          mDefaultRequestProperties.put ("User-Agent", "HTMLParser/"
!             + org.htmlparser.lexer.Lexer.VERSION_NUMBER);
          mDefaultRequestProperties.put ("Accept-Encoding", "gzip");
      }
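
Editorial note: the diff above changes only where the version number in the default User-Agent header comes from (Lexer rather than Parser), apparently so that the http package no longer needs the Parser class. The sketch below shows what the concatenation would produce, assuming the constant values that appear in the Lexer.java 1.45 diff in this same batch of commits. VersionDemo and userAgent() are hypothetical names used only for illustration; they are not part of HTMLParser.

```java
// Illustrative sketch only; VersionDemo is a hypothetical class, not HTMLParser code.
public class VersionDemo {
    // Values copied from the Lexer.java 1.45 diff (assumed to be current).
    public static final double VERSION_NUMBER = 1.6;
    public static final String VERSION_TYPE = "Integration Build";
    public static final String VERSION_DATE = "Mar 19, 2006";

    // Same concatenation pattern as Lexer.VERSION_STRING in the diff.
    public static final String VERSION_STRING =
        "" + VERSION_NUMBER + " (" + VERSION_TYPE + " " + VERSION_DATE + ")";

    // Mirrors the expression passed to mDefaultRequestProperties.put ("User-Agent", ...).
    public static String userAgent() {
        return "HTMLParser/" + VERSION_NUMBER;
    }

    public static void main(String[] args) {
        System.out.println(userAgent());     // HTMLParser/1.6
        System.out.println(VERSION_STRING);  // 1.6 (Integration Build Mar 19, 2006)
    }
}
```

Because VERSION_NUMBER is a double, string concatenation renders it via Double.toString, which is why the header reads "HTMLParser/1.6" rather than a longer decimal form.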

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer Lexer.java,1.44,1.45

From: Derrick O. <der...@us...> - 2006-04-14 22:18:52

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31675/src/org/htmlparser/lexer

Modified Files:
	Lexer.java 
Log Message:
Cleanup to isolate htmllexer jar build.

Index: Lexer.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Lexer.java,v
retrieving revision 1.44
retrieving revision 1.45
diff -C2 -d -r1.44 -r1.45
*** Lexer.java	19 Mar 2006 21:26:32 -0000	1.44
--- Lexer.java	14 Apr 2006 22:18:47 -0000	1.45
***************
*** 59,62 ****
--- 59,95 ----
          NodeFactory
  {
+     // Please don't change the formatting of the version variables below.
+     // This is done so as to facilitate ant script processing.
+ 
+     /**
+      * The floating point version number ({@value}).
+      */
+     public static final double
+     VERSION_NUMBER = 1.6
+     ;
+ 
+     /**
+      * The type of version ({@value}).
+      */
+     public static final String
+     VERSION_TYPE = "Integration Build"
+     ;
+ 
+     /**
+      * The date of the version ({@value}).
+      */
+     public static final String
+     VERSION_DATE = "Mar 19, 2006"
+     ;
+ 
+     // End of formatting
+ 
+     /**
+      * The display version ({@value}).
+      */
+     public static final String VERSION_STRING =
+             "" + VERSION_NUMBER
+             + " (" + VERSION_TYPE + " " + VERSION_DATE + ")";
+ 
      /**
       * The page lexemes are retrieved from.
***************
*** 84,87 ****
--- 117,140 ----
      protected static int mDebugLineTrigger = -1;
  
+     //
+     // Static methods
+     //
+ 
+     /**
+      * Return the version string of this parser.
+      * @return A string of the form:
+      * <pre>
+      * "[floating point number] ([build-type] [build-date])"
+      * </pre>
+      */
+     public static String getVersion ()
+     {
+         return (VERSION_STRING);
+     }
+ 
+     //
+     // Constructors
+     //
+ 
      /**
       * Creates a new instance of a Lexer.
***************
*** 124,137 ****
      }
  
!     /**
!      * Reset the lexer to start parsing from the beginning again.
!      * The underlying components are reset such that the next call to
!      * <code>nextNode()</code> will return the first lexeme on the page.
!      */
!     public void reset ()
!     {
!         getPage ().reset ();
!         setCursor (new Cursor (getPage (), 0));
!     }
  
      /**
--- 177,183 ----
      }
  
!     //
!     // Bean patterns
!     //
  
      /**
***************
*** 234,237 ****
--- 280,298 ----
      }
  
+     //
+     // Public methods
+     //
+ 
+     /**
+      * Reset the lexer to start parsing from the beginning again.
+      * The underlying components are reset such that the next call to
+      * <code>nextNode()</code> will return the first lexeme on the page.
+      */
+     public void reset ()
+     {
+         getPage ().reset ();
+         setCursor (new Cursor (getPage (), 0));
+     }
+ 
      /**
       * Get the next node from the source.
***************
*** 333,336 ****
--- 394,659 ----
  
      /**
+      * Return CDATA as a text node.
+      * According to appendix <a href="http://www.w3.org/TR/html4/appendix/notes.html#notes-specifying-data">
+      * B.3.2 Specifying non-HTML data</a> of the
+      * <a href="http://www.w3.org/TR/html4/">HTML 4.01 Specification</a>:<br>
+      * <quote>
+      * <b>Element content</b><br>
+      * When script or style data is the content of an element (SCRIPT and STYLE),
+      * the data begins immediately after the element start tag and ends at the
+      * first ETAGO ("&lt;/") delimiter followed by a name start character ([a-zA-Z]);
+      * note that this may not be the element's end tag.
+      * Authors should therefore escape "&lt;/" within the content. Escape mechanisms
+      * are specific to each scripting or style sheet language.
+      * </quote>
+      * @return The <code>TextNode</code> of the CDATA or <code>null</code> if none.
+      * @exception ParserException If a problem occurs reading from the source.
+      */
+     public Node parseCDATA ()
+         throws
+             ParserException
+     {
+         return (parseCDATA (false));
+     }
+ 
+     /**
+      * Return CDATA as a text node.
+      * Slightly less rigid than {@link #parseCDATA()} this method provides for
+      * parsing CDATA that may contain quoted strings that have embedded
+      * ETAGO ("&lt;/") delimiters and skips single and multiline comments.
+      * @param quotesmart If <code>true</code> the strict definition of CDATA is
+      * extended to allow for single or double quoted ETAGO ("&lt;/") sequences.
+      * @return The <code>TextNode</code> of the CDATA or <code>null</code> if none.
+      * @see #parseCDATA()
+      * @exception ParserException If a problem occurs reading from the source.
+      */
+     public Node parseCDATA (boolean quotesmart)
+         throws
+             ParserException
+     {
+         int start;
+         int state;
+         boolean done;
+         char quote;
+         char ch;
+         int end;
+         boolean comment;
+ 
+         start = mCursor.getPosition ();
+         state = 0;
+         done = false;
+         quote = 0;
+         comment = false;
+ 
+         while (!done)
+         {
+             ch = mPage.getCharacter (mCursor);
+             switch (state)
+             {
+                 case 0: // prior to ETAGO
+                     switch (ch)
+                     {
+                         case Page.EOF:
+                             done = true;
+                             break;
+                         case '\'':
+                             if (quotesmart && !comment)
+                                 if (0 == quote)
+                                     quote = '\''; // enter quoted state
+                                 else if ('\'' == quote)
+                                     quote = 0; // exit quoted state
+                             break;
+                         case '"':
+                             if (quotesmart && !comment)
+                                 if (0 == quote)
+                                     quote = '"'; // enter quoted state
+                                 else if ('"' == quote)
+                                     quote = 0; // exit quoted state
+                             break;
+                         case '\\':
+                             if (quotesmart)
+                                 if (0 != quote)
+                                 {
+                                     ch = mPage.getCharacter (mCursor); // try to consume escaped character
+                                     if (Page.EOF == ch)
+                                         done = true;
+                                     else if (  (ch != '\\') && (ch != quote))
+                                         mCursor.retreat (); // unconsume char if character was not an escapable char.
+                                 }
+                             break;
+                         case '/':
+                             if (quotesmart)
+                                 if (0 == quote)
+                                 {
+                                     // handle multiline and double slash comments (with a quote)
+                                     ch = mPage.getCharacter (mCursor);
+                                     if (Page.EOF == ch)
+                                         done = true;
+                                     else if ('/' == ch)
+                                         comment = true;
+                                     else if ('*' == ch)
+                                     {
+                                         do
+                                         {
+                                             do
+                                                 ch = mPage.getCharacter (mCursor);
+                                             while ((Page.EOF != ch) && ('*' != ch));
+                                             ch = mPage.getCharacter (mCursor);
+                                             if (ch == '*')
+                                                 mCursor.retreat ();
+                                         }
+                                         while ((Page.EOF != ch) && ('/' != ch));
+                                     }
+                                     else
+                                         mCursor.retreat ();
+                                 }
+                             break;
+                         case '\n':
+                             comment = false;
+                             break;
+                         case '<':
+                             if (quotesmart)
+                             {
+                                 if (0 == quote)
+                                     state = 1;
+                             }
+                             else
+                                 state = 1;
+                             break;
+                         default:
+                             break;
+                     }
+                     break;
+                 case 1: // <
+                     switch (ch)
+                     {
+                         case Page.EOF:
+                             done = true;
+                             break;
+                         case '/':
+                             state = 2;
+                             break;
+                         case '!':
+                             ch = mPage.getCharacter (mCursor);
+                             if (Page.EOF == ch)
+                                 done = true;
+                             else if ('-' == ch)
+                             {
+                                 ch = mPage.getCharacter (mCursor);
+                                 if (Page.EOF == ch)
+                                     done = true;
+                                 else if ('-' == ch)
+                                     state = 3;
+                                 else
+                                     state = 0;
+                             }
+                             else
+                                 state = 0;
+                             break;
+                         default:
+                             state = 0;
+                             break;
+                     }
+                     break;
+                 case 2: // </
+                     comment = false;
+                     if (Page.EOF == ch)
+                         done = true;
+                     else if (Character.isLetter (ch))
+                     {
+                         done = true;
+                         // back up to the start of ETAGO
+                         mCursor.retreat ();
+                         mCursor.retreat ();
+                         mCursor.retreat ();
+                     }
+                     else
+                         state = 0;
+                     break;
+                 case 3: // <!
+                     comment = false;
+                     if (Page.EOF == ch)
+                         done = true;
+                     else if ('-' == ch)
+                     {
+                         ch = mPage.getCharacter (mCursor);
+                         if (Page.EOF == ch)
+                             done = true;
+                         else if ('-' == ch)
+                         {
+                             ch = mPage.getCharacter (mCursor);
+                             if (Page.EOF == ch)
+                                 done = true;
+                             else if ('>' == ch)
+                                 state = 0;
+                             else
+                             {
+                                 mCursor.retreat ();
+                                 mCursor.retreat ();
+                             }
+                         }
+                         else
+                             mCursor.retreat ();
+                     }
+                     break;
+                 default:
+                     throw new IllegalStateException ("how the fuck did we get in state " + state);
+             }
+         }
+         end = mCursor.getPosition ();
+ 
+         return (makeString (start, end));
+     }
+ 
+     //
+     // NodeFactory interface
+     //
+ 
+     /**
+      * Create a new string node.
+      * @param page The page the node is on.
+      * @param start The beginning position of the string.
+      * @param end The ending positiong of the string.
+      * @return The created Text node.
+      */
+     public Text createStringNode (Page page,  int start, int end)
+     {
+         return (new TextNode (page, start, end));
+     }
+ 
+     /**
+      * Create a new remark node.
+      * @param page The page the node is on.
+      * @param start The beginning position of the remark.
+      * @param end The ending positiong of the remark.
+      * @return The created Remark node.
+      */
+     public Remark createRemarkNode (Page page,  int start, int end)
+     {
+         return (new RemarkNode (page, start, end));
+     }
+ 
+     /**
+      * Create a new tag node.
+      * Note that the attributes vector contains at least one element,
+      * which is the tag name (standalone attribute) at position zero.
+      * This can be used to decide which type of node to create, or
+      * gate other processing that may be appropriate.
+      * @param page The page the node is on.
+      * @param start The beginning position of the tag.
+      * @param end The ending positiong of the tag.
+      * @param attributes The attributes contained in this tag.
+      * @return The created Tag node.
+      */
+     public Tag createTagNode (Page page, int start, int end, Vector attributes)
+     {
+         return (new TagNode (page, start, end, attributes));
+     }
+ 
+     //
+     // Internal methods
+     //
+ 
+     /**
       * Advance the cursor through a JIS escape sequence.
       * @param cursor A cursor positioned within the escape sequence.
***************
*** 1303,1565 ****
      }
  
-     /**
-      * Return CDATA as a text node.
-      * According to appendix <a href="http://www.w3.org/TR/html4/appendix/notes.html#notes-specifying-data">
-      * B.3.2 Specifying non-HTML data</a> of the
-      * <a href="http://www.w3.org/TR/html4/">HTML 4.01 Specification</a>:<br>
-      * <quote>
-      * <b>Element content</b><br>
-      * When script or style data is the content of an element (SCRIPT and STYLE),
-      * the data begins immediately after the element start tag and ends at the
-      * first ETAGO ("&lt;/") delimiter followed by a name start character ([a-zA-Z]);
-      * note that this may not be the element's end tag.
-      * Authors should therefore escape "&lt;/" within the content. Escape mechanisms
-      * are specific to each scripting or style sheet language.
-      * </quote>
-      * @return The <code>TextNode</code> of the CDATA or <code>null</code> if none.
-      * @exception ParserException If a problem occurs reading from the source.
-      */
-     public Node parseCDATA ()
-         throws
-             ParserException
-     {
-         return (parseCDATA (false));
-     }
- 
-     /**
-      * Return CDATA as a text node.
-      * Slightly less rigid than {@link #parseCDATA()} this method provides for
-      * parsing CDATA that may contain quoted strings that have embedded
-      * ETAGO ("&lt;/") delimiters and skips single and multiline comments.
-      * @param quotesmart If <code>true</code> the strict definition of CDATA is
-      * extended to allow for single or double quoted ETAGO ("&lt;/") sequences.
-      * @return The <code>TextNode</code> of the CDATA or <code>null</code> if none.
-      * @see #parseCDATA()
-      * @exception ParserException If a problem occurs reading from the source.
-      */
-     public Node parseCDATA (boolean quotesmart)
-         throws
-             ParserException
-     {
-         int start;
-         int state;
-         boolean done;
-         char quote;
-         char ch;
-         int end;
-         boolean comment;
- 
-         start = mCursor.getPosition ();
-         state = 0;
-         done = false;
-         quote = 0;
-         comment = false;
- 
-         while (!done)
-         {
-             ch = mPage.getCharacter (mCursor);
-             switch (state)
-             {
-                 case 0: // prior to ETAGO
-                     switch (ch)
-                     {
-                         case Page.EOF:
-                             done = true;
-                             break;
-                         case '\'':
-                             if (quotesmart && !comment)
-                                 if (0 == quote)
-                                     quote = '\''; // enter quoted state
-                                 else if ('\'' == quote)
-                                     quote = 0; // exit quoted state
-                             break;
-                         case '"':
-                             if (quotesmart && !comment)
-                                 if (0 == quote)
-                                     quote = '"'; // enter quoted state
-                                 else if ('"' == quote)
-                                     quote = 0; // exit quoted state
-                             break;
-                         case '\\':
-                             if (quotesmart)
-                                 if (0 != quote)
-                                 {
-                                     ch = mPage.getCharacter (mCursor); // try to consume escaped character
-                                     if (Page.EOF == ch)
-                                         done = true;
-                                     else if (  (ch != '\\') && (ch != quote))
-                                         mCursor.retreat (); // unconsume char if character was not an escapable char.
-                                 }
-                             break;
-                         case '/':
-                             if (quotesmart)
-                                 if (0 == quote)
-                                 {
-                                     // handle multiline and double slash comments (with a quote)
-                                     ch = mPage.getCharacter (mCursor);
-                                     if (Page.EOF == ch)
-                                         done = true;
-                                     else if ('/' == ch)
-                                         comment = true;
-                                     else if ('*' == ch)
-                                     {
-                                         do
-                                         {
-                                             do
-                                                 ch = mPage.getCharacter (mCursor);
-                                             while ((Page.EOF != ch) && ('*' != ch));
-                                             ch = mPage.getCharacter (mCursor);
-                                             if (ch == '*')
-                                                 mCursor.retreat ();
-                                         }
-                                         while ((Page.EOF != ch) && ('/' != ch));
-                                     }
-                                     else
-                                         mCursor.retreat ();
-                                 }
-                             break;
-                         case '\n':
-                             comment = false;
-                             break;
-                         case '<':
-                             if (quotesmart)
-                             {
-                                 if (0 == quote)
-                                     state = 1;
-                             }
-                             else
-                                 state = 1;
-                             break;
-                         default:
-                             break;
-                     }
-                     break;
-                 case 1: // <
-                     switch (ch)
-                     {
-                         case Page.EOF:
-                             done = true;
-                             break;
-                         case '/':
-                             state = 2;
-                             break;
-                         case '!':
-                             ch = mPage.getCharacter (mCursor);
-                             if (Page.EOF == ch)
-                                 done = true;
-                             else if ('-' == ch)
-                             {
-                                 ch = mPage.getCharacter (mCursor);
-                                 if (Page.EOF == ch)
-                                     done = true;
-                                 else if ('-' == ch)
-                                     state = 3;
-                                 else
-                                     state = 0;
-                             }
-                             else
-                                 state = 0;
-                             break;
-                         default:
-                             state = 0;
-                             break;
-                     }
-                     break;
-                 case 2: // </
-                     comment = false;
-                     if (Page.EOF == ch)
-                         done = true;
-                     else if (Character.isLetter (ch))
-                     {
-                         done = true;
-                         // back up to the start of ETAGO
-                         mCursor.retreat ();
-                         mCursor.retreat ();
-                         mCursor.retreat ();
-                     }
-                     else
-                         state = 0;
-                     break;
-                 case 3: // <!
-                     comment = false;
-                     if (Page.EOF == ch)
-                         done = true;
-                     else if ('-' == ch)
-                     {
-                         ch = mPage.getCharacter (mCursor);
-                         if (Page.EOF == ch)
-                             done = true;
-                         else if ('-' == ch)
-                         {
-                             ch = mPage.getCharacter (mCursor);
-                             if (Page.EOF == ch)
-                                 done = true;
-                             else if ('>' == ch)
-                                 state = 0;
-                             else
-                             {
-                                 mCursor.retreat ();
-                                 mCursor.retreat ();
-                             }
-                         }
-                         else
-                             mCursor.retreat ();
-                     }
-                     break;
-                 default:
-                     throw new IllegalStateException ("how the fuck did we get in state " + state);
-             }
-         }
-         end = mCursor.getPosition ();
- 
-         return (makeString (start, end));
-     }
- 
      //
!     // NodeFactory interface
      //
  
      /**
-      * Create a new string node.
-      * @param page The page the node is on.
-      * @param start The beginning position of the string.
-      * @param end The ending positiong of the string.
-      * @return The created Text node.
-      */
-     public Text createStringNode (Page page,  int start, int end)
-     {
-         return (new TextNode (page, start, end));
-     }
- 
-     /**
-      * Create a new remark node.
-      * @param page The page the node is on.
-      * @param start The beginning position of the remark.
-      * @param end The ending positiong of the remark.
-      * @return The created Remark node.
-      */
-     public Remark createRemarkNode (Page page,  int start, int end)
-     {
-         return (new RemarkNode (page, start, end));
-     }
- 
-     /**
-      * Create a new tag node.
-      * Note that the attributes vector contains at least one element,
-      * which is the tag name (standalone attribute) at position zero.
-      * This can be used to decide which type of node to create, or
-      * gate other processing that may be appropriate.
-      * @param page The page the node is on.
-      * @param start The beginning position of the tag.
-      * @param end The ending positiong of the tag.
-      * @param attributes The attributes contained in this tag.
-      * @return The created Tag node.
-      */
-     public Tag createTagNode (Page page, int start, int end, Vector attributes)
-     {
-         return (new TagNode (page, start, end, attributes));
-     }
- 
-     /**
       * Mainline for command line operation
       * @param args [0] The URL to parse.
--- 1626,1634 ----
      }
  
      //
!     // Main program
      //
  
      /**
       * Mainline for command line operation
       * @param args [0] The URL to parse.
***************
*** 1572,1585 ****
              ParserException
      {
          Lexer lexer;
          Node node;
  
          if (0 >= args.length)
              System.out.println ("usage: java -jar htmllexer.jar <url>");
          else
          {
              try
              {
!                 ConnectionManager manager = Page.getConnectionManager ();
                  lexer = new Lexer (manager.openConnection (args[0]));
                  while (null != (node = lexer.nextNode (false)))
--- 1641,1659 ----
              ParserException
      {
+         ConnectionManager manager;
          Lexer lexer;
          Node node;
  
          if (0 >= args.length)
+         {
+             System.out.println ("HTML Lexer v" + getVersion () + "\n");
+             System.out.println ();
              System.out.println ("usage: java -jar htmllexer.jar <url>");
+         }
          else
          {
              try
              {
!                 manager = Page.getConnectionManager ();
                  lexer = new Lexer (manager.openConnection (args[0]));
                  while (null != (node = lexer.nextNode (false)))
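
A note on the parseCDATA code moved in this diff: the rule quoted in its javadoc (CDATA ends at the first "</" followed by a name start character) can be sketched over a plain String. This is a minimal illustration only, not the Lexer's implementation. CdataScanSketch, findCdataEnd and isNameStart are hypothetical names. The sketch uses the spec's [a-zA-Z] where the diff calls Character.isLetter, and it leaves out the quotesmart and <!-- --> handling of the real state machine.

```java
final class CdataScanSketch
{
    /**
     * Find where strict CDATA ends.
     * @param text The text to scan (hypothetical stand-in for the page).
     * @param start The offset to begin scanning at.
     * @return The offset of the first "</" that is followed by an ASCII
     * letter, or text.length () if there is no such sequence.
     */
    static int findCdataEnd (String text, int start)
    {
        // i + 2 must be a valid index so all three characters can be read
        for (int i = start; i + 2 < text.length (); i++)
            if (('<' == text.charAt (i))
                && ('/' == text.charAt (i + 1))
                && isNameStart (text.charAt (i + 2)))
                return (i); // points at the '<' of the ETAGO
        return (text.length ());
    }

    /** Name start character as quoted from HTML 4.01: [a-zA-Z]. */
    static boolean isNameStart (char ch)
    {
        return (((ch >= 'a') && (ch <= 'z')) || ((ch >= 'A') && (ch <= 'Z')));
    }
}
```

Under these assumptions findCdataEnd ("var a = 1;</script>", 0) stops at the "<" of the end tag, while a "</" followed by a space is skipped, which is the "may not be the element's end tag" caveat in reverse: any "</" plus letter ends the data, even inside a script string.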

[Htmlparser-cvs] htmlparser/src/org/htmlparser/util NodeList.java,1.60,1.61

From: Derrick O. <der...@us...> - 2006-04-14 22:18:51

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31675/src/org/htmlparser/util

Modified Files:
	NodeList.java 
Log Message:
Cleanup to isolate htmllexer jar build.

Index: NodeList.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/NodeList.java,v
retrieving revision 1.60
retrieving revision 1.61
diff -C2 -d -r1.60 -r1.61
*** NodeList.java	18 Sep 2005 23:00:27 -0000	1.60
--- NodeList.java	14 Apr 2006 22:18:47 -0000	1.61
***************
*** 32,36 ****
  import org.htmlparser.Node;
  import org.htmlparser.NodeFilter;
- import org.htmlparser.filters.NodeClassFilter;
  import org.htmlparser.visitors.NodeVisitor;
  
--- 32,35 ----

[Htmlparser-cvs] htmlparser build.xml,1.82,1.83

From: Derrick O. <der...@us...> - 2006-04-14 22:18:51

Update of /cvsroot/htmlparser/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31675

Modified Files:
	build.xml 
Log Message:
Cleanup to isolate htmllexer jar build.

Index: build.xml
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/build.xml,v
retrieving revision 1.82
retrieving revision 1.83
diff -C2 -d -r1.82 -r1.83
*** build.xml	19 Mar 2006 22:03:56 -0000	1.82
--- build.xml	14 Apr 2006 22:18:47 -0000	1.83
***************
*** 13,17 ****
  - incorporate changes from ChangeLog into htmlparser/docs/changes under
    a heading like "Integration Build 1.5 - 20040522"
! - 'ant versionSource' updates the version in Parser.java and release.txt
  - edit docs/release.txt to update changes since the last version, bugs fixed
    and enhancements completed
--- 13,17 ----
  - incorporate changes from ChangeLog into htmlparser/docs/changes under
    a heading like "Integration Build 1.5 - 20040522"
! - 'ant versionSource' updates the version in Parser.java, Lexer.java and release.txt
  - edit docs/release.txt to update changes since the last version, bugs fixed
    and enhancements completed
***************
*** 209,219 ****
      <echo message="Replacing version VERSION_NUMBER = ${VERSION_NUMBER} with VERSION_NUMBER = ${versionNumber} in ${src}/org/htmlparser/Parser.java"/>
      <replace file="${src}/org/htmlparser/Parser.java" token="VERSION_NUMBER = ${VERSION_NUMBER}" value="VERSION_NUMBER = ${versionNumber}"/>
- 
      <echo message="Replacing version VERSION_TYPE = &quot;${VERSION_TYPE}&quot; with VERSION_TYPE = &quot;${versionType}&quot; in ${src}/org/htmlparser/Parser.java"/>
      <replace file="${src}/org/htmlparser/Parser.java" token="VERSION_TYPE = &quot;${VERSION_TYPE}&quot;" value="VERSION_TYPE = &quot;${versionType}&quot;"/>
- 
      <echo message="Replacing version VERSION_DATE = &quot;${VERSION_DATE}&quot; with VERSION_DATE = &quot;${TODAY_STRING}&quot; in ${src}/org/htmlparser/Parser.java"/>
      <replace file="${src}/org/htmlparser/Parser.java" token="VERSION_DATE = &quot;${VERSION_DATE}&quot;" value="VERSION_DATE = &quot;${TODAY_STRING}&quot;"/>
  
      <chmod file="${docs}/release.txt" perm="u+w"/>
      <echo message="Replacing version &quot;${VERSION_NUMBER} (${VERSION_TYPE} ${VERSION_DATE})&quot; with &quot;${versionNumber} (${versionType} ${TODAY_STRING})&quot; in ${docs}/release.txt"/>
--- 209,225 ----
      <echo message="Replacing version VERSION_NUMBER = ${VERSION_NUMBER} with VERSION_NUMBER = ${versionNumber} in ${src}/org/htmlparser/Parser.java"/>
      <replace file="${src}/org/htmlparser/Parser.java" token="VERSION_NUMBER = ${VERSION_NUMBER}" value="VERSION_NUMBER = ${versionNumber}"/>
      <echo message="Replacing version VERSION_TYPE = &quot;${VERSION_TYPE}&quot; with VERSION_TYPE = &quot;${versionType}&quot; in ${src}/org/htmlparser/Parser.java"/>
      <replace file="${src}/org/htmlparser/Parser.java" token="VERSION_TYPE = &quot;${VERSION_TYPE}&quot;" value="VERSION_TYPE = &quot;${versionType}&quot;"/>
      <echo message="Replacing version VERSION_DATE = &quot;${VERSION_DATE}&quot; with VERSION_DATE = &quot;${TODAY_STRING}&quot; in ${src}/org/htmlparser/Parser.java"/>
      <replace file="${src}/org/htmlparser/Parser.java" token="VERSION_DATE = &quot;${VERSION_DATE}&quot;" value="VERSION_DATE = &quot;${TODAY_STRING}&quot;"/>
  
+     <chmod file="${src}/org/htmlparser/lexer/Lexer.java" perm="u+w"/>
+     <echo message="Replacing version VERSION_NUMBER = ${VERSION_NUMBER} with VERSION_NUMBER = ${versionNumber} in ${src}/org/htmlparser/lexer/Lexer.java"/>
+     <replace file="${src}/org/htmlparser/lexer/Lexer.java" token="VERSION_NUMBER = ${VERSION_NUMBER}" value="VERSION_NUMBER = ${versionNumber}"/>
+     <echo message="Replacing version VERSION_TYPE = &quot;${VERSION_TYPE}&quot; with VERSION_TYPE = &quot;${versionType}&quot; in ${src}/org/htmlparser/lexer/Lexer.java"/>
+     <replace file="${src}/org/htmlparser/lexer/Lexer.java" token="VERSION_TYPE = &quot;${VERSION_TYPE}&quot;" value="VERSION_TYPE = &quot;${versionType}&quot;"/>
+     <echo message="Replacing version VERSION_DATE = &quot;${VERSION_DATE}&quot; with VERSION_DATE = &quot;${TODAY_STRING}&quot; in ${src}/org/htmlparser/lexer/Lexer.java"/>
+     <replace file="${src}/org/htmlparser/lexer/Lexer.java" token="VERSION_DATE = &quot;${VERSION_DATE}&quot;" value="VERSION_DATE = &quot;${TODAY_STRING}&quot;"/>
+ 
      <chmod file="${docs}/release.txt" perm="u+w"/>
      <echo message="Replacing version &quot;${VERSION_NUMBER} (${VERSION_TYPE} ${VERSION_DATE})&quot; with &quot;${versionNumber} (${versionType} ${TODAY_STRING})&quot; in ${docs}/release.txt"/>
***************
*** 324,328 ****
        <include name="org/htmlparser/util/sort/**/*.class"/>
        <include name="org/htmlparser/visitors/NodeVisitor.class"/>
-       <include name="org/htmlparser/parserHelper/SpecialHashtable.class"/>
        <manifest>
          <attribute name="Main-Class" value="org.htmlparser.lexer.Lexer"/>
--- 330,333 ----

[Htmlparser-cvs] htmlparser/src/org/htmlparser Parser.java,1.111,1.112

From: Derrick O. <der...@us...> - 2006-04-14 22:18:51

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31675/src/org/htmlparser

Modified Files:
	Parser.java 
Log Message:
Cleanup to isolate htmllexer jar build.

Index: Parser.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Parser.java,v
retrieving revision 1.111
retrieving revision 1.112
diff -C2 -d -r1.111 -r1.112
*** Parser.java	20 Mar 2006 00:26:01 -0000	1.111
--- Parser.java	14 Apr 2006 22:18:47 -0000	1.112
***************
*** 168,171 ****
--- 168,178 ----
      public static final ParserFeedback STDOUT = new DefaultParserFeedback ();
  
+     static
+     {
+         getConnectionManager ().getDefaultRequestProperties ().put (
+             "User-Agent", "HTMLParser/" + getVersionNumber ());
+     
+     }
+ 
      //
      // Static methods
***************
*** 784,788 ****
          if (args.length < 1 || args[0].equals ("-help"))
          {
!             System.out.println ("HTML Parser v" + VERSION_STRING + "\n");
              System.out.println ();
              System.out.println ("Syntax : java -jar htmlparser.jar"
--- 791,795 ----
          if (args.length < 1 || args[0].equals ("-help"))
          {
!             System.out.println ("HTML Parser v" + getVersion () + "\n");
              System.out.println ();
              System.out.println ("Syntax : java -jar htmlparser.jar"
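
The static block added above registers a default User-Agent once, at class load time, so every later connection carries it. A rough equivalent using only java.net is sketched here. DefaultHeadersSketch, DEFAULTS and open are hypothetical names, the "0.0" version is a placeholder, and this is not how ConnectionManager is implemented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

final class DefaultHeadersSketch
{
    // Hypothetical stand-in for a table of default request properties.
    static final Map<String, String> DEFAULTS = new LinkedHashMap<String, String> ();

    static
    {
        // Runs once when the class is loaded, like the static block in the diff.
        // "0.0" is a placeholder, not a real version number.
        DEFAULTS.put ("User-Agent", "HTMLParser/" + "0.0");
    }

    /**
     * Create a connection carrying the default request properties.
     * The connection is created but not connected, so no network I/O happens here.
     * @param url The URL to open (assumed to be an absolute URL).
     * @return The unconnected connection with the defaults applied.
     */
    static URLConnection open (String url)
    {
        try
        {
            URLConnection connection = URI.create (url).toURL ().openConnection ();
            for (Map.Entry<String, String> entry : DEFAULTS.entrySet ())
                connection.setRequestProperty (entry.getKey (), entry.getValue ());
            return (connection);
        }
        catch (IOException ioe)
        {
            // wrap so callers need not declare a checked exception
            throw new UncheckedIOException (ioe);
        }
    }
}
```

Request properties have to be set before the connection is connected, which is why the sketch applies them immediately after openConnection ().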

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests AllTests.java,1.60,1.61 MemoryTest.java,1.3,NONE

From: Derrick O. <der...@us...> - 2006-04-11 12:03:10

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv8666

Modified Files:
	AllTests.java 
Removed Files:
	MemoryTest.java 
Log Message:
Move failing unit test to RFE as a download.

--- MemoryTest.java DELETED ---

Index: AllTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/AllTests.java,v
retrieving revision 1.60
retrieving revision 1.61
diff -C2 -d -r1.60 -r1.61
*** AllTests.java	22 May 2004 03:57:30 -0000	1.60
--- AllTests.java	11 Apr 2006 12:03:07 -0000	1.61
***************
*** 53,57 ****
          sub.addTestSuite (FunctionalTests.class);
          sub.addTestSuite (LineNumberAssignedByNodeReaderTest.class);
-         sub.addTestSuite (MemoryTest.class);
          suite.addTest (sub);
          suite.addTest (org.htmlparser.tests.lexerTests.AllTests.suite ());
--- 53,56 ----

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer Page.java,1.54,1.55

From: Derrick O. <der...@us...> - 2006-04-10 21:38:45

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6947

Modified Files:
	Page.java 
Log Message:
Fix Bug #1467712 Page#getCharset never works
Use Content-Type header field instead of connection's getContentType method.

Index: Page.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
retrieving revision 1.54
retrieving revision 1.55
diff -C2 -d -r1.54 -r1.55
*** Page.java	7 Apr 2006 00:58:19 -0000	1.54
--- Page.java	10 Apr 2006 21:38:41 -0000	1.55
***************
*** 665,669 ****
          if (null != connection)
          {
!             content = connection.getContentType ();
              if (null != content)
                  ret = content;
--- 665,671 ----
          if (null != connection)
          {
!             // can't use connection#getContentType
!             // see Bug #1467712 Page#getCharset never works
!             content = connection.getHeaderField ("Content-Type");
              if (null != content)
                  ret = content;
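
Editorial sketch for the change above: once the raw Content-Type header field is available, the charset can be read from its parameters. This is only an illustration, not the code in Page.java; the class name, the method name, and the ISO-8859-1 fallback are assumptions made for the example. In the commit, the input string would come from connection.getHeaderField ("Content-Type").

```java
import java.util.Locale;

// Hypothetical helper, not part of the HTML Parser sources.
class ContentTypeCharset {
    // Assumed fallback when no charset parameter is present.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Extract the charset parameter from a Content-Type header value,
    // e.g. "text/html; charset=UTF-8".
    static String charsetOf(String contentType) {
        if (contentType == null)
            return DEFAULT_CHARSET;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\""))
                    value = value.substring(1, value.length() - 1);
                if (!value.isEmpty())
                    return value;
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8"));
        System.out.println(charsetOf("text/html"));
    }
}
```

The lookup is case-insensitive on the parameter name and returns the value unchanged, so callers that need a canonical charset name would still have to normalize it.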

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/utilTests CharacterTranslationTest.java,1.46,1.47

From: Derrick O. <der...@us...> - 2006-04-08 13:33:53

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21563

Modified Files:
	CharacterTranslationTest.java 
Log Message:
Typo.

Index: CharacterTranslationTest.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/utilTests/CharacterTranslationTest.java,v
retrieving revision 1.46
retrieving revision 1.47
diff -C2 -d -r1.46 -r1.47
*** CharacterTranslationTest.java	31 Jul 2004 16:42:32 -0000	1.46
--- CharacterTranslationTest.java	8 Apr 2006 13:33:47 -0000	1.47
***************
*** 1,5 ****
  // HTMLParser Library $Name$ - A java-based parser for HTML
  // http://sourceforge.org/projects/htmlparser
! // Copyright (C) 2004 Derick Oswald
  //
  // Revision Control Information
--- 1,5 ----
  // HTMLParser Library $Name$ - A java-based parser for HTML
  // http://sourceforge.org/projects/htmlparser
! // Copyright (C) 2004 Derrick Oswald
  //
  // Revision Control Information

[Htmlparser-cvs] htmlparser/src/org/htmlparser/lexer Page.java,1.53,1.54

From: Derrick O. <der...@us...> - 2006-04-07 00:58:24

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27590/lexer

Modified Files:
	Page.java 
Log Message:
Fix Bug #1461473 Relative links starting with ?
Added overloaded methods taking boolean 'strict' flag on URL manipulators.
Default is loose interpretation like most browsers.

Index: Page.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/lexer/Page.java,v
retrieving revision 1.53
retrieving revision 1.54
diff -C2 -d -r1.53 -r1.54
*** Page.java	19 Mar 2006 17:09:09 -0000	1.53
--- Page.java	7 Apr 2006 00:58:19 -0000	1.54
***************
*** 828,841 ****
  
      /**
!      * Build a URL from the link and base provided.
!      * @return An absolute URL.
       * @param link The (relative) URI.
       * @param base The base URL of the page, either from the &lt;BASE&gt; tag
       * or, if none, the URL the page is being fetched from.
       * @exception MalformedURLException If creating the URL fails.
       */
      public URL constructUrl (String link, String base)
          throws MalformedURLException
      {
          String path;
          boolean modified;
--- 828,860 ----
  
      /**
!      * Build a URL from the link and base provided using non-strict rules.
       * @param link The (relative) URI.
       * @param base The base URL of the page, either from the &lt;BASE&gt; tag
       * or, if none, the URL the page is being fetched from.
+      * @return An absolute URL.
       * @exception MalformedURLException If creating the URL fails.
+      * @see #constructUrl(String, String, boolean)
       */
      public URL constructUrl (String link, String base)
          throws MalformedURLException
      {
+         return (constructUrl (link, base, false));
+     }
+ 
+     /**
+      * Build a URL from the link and base provided.
+      * @param link The (relative) URI.
+      * @param base The base URL of the page, either from the &lt;BASE&gt; tag
+      * or, if none, the URL the page is being fetched from.
+      * @param strict If <code>true</code> a link starting with '?' is handled
+      * according to <a href="http://www.ietf.org/rfc/rfc2396.txt">RFC 2396</a>,
+      * otherwise the common interpretation of a query appended to the base
+      * is used instead.
+      * @return An absolute URL.
+      * @exception MalformedURLException If creating the URL fails.
+      */
+     public URL constructUrl (String link, String base, boolean strict)
+         throws MalformedURLException
+     {
          String path;
          boolean modified;
***************
*** 844,848 ****
          URL url; // constructed URL combining relative link and base
  
!         url = new URL (new URL (base), link);
          path = url.getFile ();
          modified = false;
--- 863,875 ----
          URL url; // constructed URL combining relative link and base
  
!         // Bug #1461473 Relative links starting with ?
!         if (!strict && ('?' == link.charAt (0)))
!         {   // remove query part of base if any
!             if (-1 != (index = base.lastIndexOf ('?')))
!                 base = base.substring (0, index);
!             url = new URL (base + link);
!         }
!         else
!             url = new URL (new URL (base), link);
          path = url.getFile ();
          modified = false;
***************
*** 887,890 ****
--- 914,932 ----
      public String getAbsoluteURL (String link)
      {
+         return (getAbsoluteURL (link, false));
+     }
+ 
+     /**
+      * Create an absolute URL from a relative link.
+      * @param link The reslative portion of a URL.
+      * @param strict If <code>true</code> a link starting with '?' is handled
+      * according to <a href="http://www.ietf.org/rfc/rfc2396.txt">RFC 2396</a>,
+      * otherwise the common interpretation of a query appended to the base
+      * is used instead.
+      * @return The fully qualified URL or the original link if it was absolute
+      * already or a failure occured.
+      */
+     public String getAbsoluteURL (String link, boolean strict)
+     {
          String base;
          URL url;
***************
*** 903,907 ****
                  else
                  {
!                     url = constructUrl (link, base);
                      ret = url.toExternalForm ();
                  }
--- 945,949 ----
                  else
                  {
!                     url = constructUrl (link, base, strict);
                      ret = url.toExternalForm ();
                  }

[Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/lexerTests PageTests.java,1.19,1.20

From: Derrick O. <der...@us...> - 2006-04-07 00:58:24

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27590/tests/lexerTests

Modified Files:
	PageTests.java 
Log Message:
Fix Bug #1461473 Relative links starting with ?
Added overloaded methods taking boolean 'strict' flag on URL manipulators.
Default is loose interpretation like most browsers.

Index: PageTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/PageTests.java,v
retrieving revision 1.19
retrieving revision 1.20
diff -C2 -d -r1.19 -r1.20
*** PageTests.java	15 May 2005 11:49:05 -0000	1.19
--- PageTests.java	7 Apr 2006 00:58:19 -0000	1.20
***************
*** 192,196 ****
      public void test7 () throws ParserException
      {
!         assertEquals ("test7 failed", "http://a/b/c/?y", mPage.getAbsoluteURL ("?y"));
      }
      public void test8 () throws ParserException
--- 192,197 ----
      public void test7 () throws ParserException
      {
!         assertEquals ("test7 strict failed", "http://a/b/c/?y", mPage.getAbsoluteURL ("?y", true));
!         assertEquals ("test7 non-strict failed", "http://a/b/c/d;p?y", mPage.getAbsoluteURL ("?y"));
      }
      public void test8 () throws ParserException
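
Editorial sketch of the strict and non-strict handling described in the two messages above. It mirrors the branch added to constructUrl but is a standalone illustration, not the library method: the class name and the resolve helper are hypothetical, and the base URL "http://a/b/c/d;p?q" is assumed to be the RFC 2396 example base, which is consistent with the expected values in test7. Unlike the committed code, it uses startsWith so an empty link does not throw.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical standalone illustration, not part of the HTML Parser sources.
class RelativeQueryLinks {
    // Resolve a link against a base. In non-strict mode a link starting
    // with '?' replaces the query of the base, as most browsers do.
    static String resolve(String link, String base, boolean strict) {
        try {
            URL url;
            if (!strict && link.startsWith("?")) {
                // Remove the query part of the base, if any, then append the link.
                int index = base.lastIndexOf('?');
                if (index != -1)
                    base = base.substring(0, index);
                url = new URL(base + link);
            } else {
                // Defer to java.net.URL's own relative resolution.
                url = new URL(new URL(base), link);
            }
            return url.toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://a/b/c/d;p?q"; // assumed example base
        System.out.println(resolve("?y", base, false)); // query appended to the base path
        System.out.println(resolve("?y", base, true));  // resolution left to java.net.URL
    }
}
```

With the assumed base, the non-strict call yields http://a/b/c/d;p?y, matching the new non-strict assertion in test7. The updated test7 expects http://a/b/c/?y for the strict case.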

[Htmlparser-cvs] htmlparser/docs changes.txt,1.208,1.209 release.txt,1.72,1.73

From: Derrick O. <der...@us...> - 2006-03-20 00:26:06

Update of /cvsroot/htmlparser/htmlparser/docs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv5259/htmlparser/docs

Modified Files:
	changes.txt release.txt 
Log Message:
Update version to 1.6-20060319.

Index: release.txt
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/docs/release.txt,v
retrieving revision 1.72
retrieving revision 1.73
diff -C2 -d -r1.72 -r1.73
*** release.txt	12 Nov 2005 15:11:45 -0000	1.72
--- release.txt	20 Mar 2006 00:26:01 -0000	1.73
***************
*** 1,3 ****
! HTMLParser Version 1.6 (Integration Build Nov 12, 2005)
  *********************************************
  
--- 1,3 ----
! HTMLParser Version 1.6 (Integration Build Mar 19, 2006)
  *********************************************
  
***************
*** 37,40 ****
--- 37,42 ----
      The TextNode class has an added isWhiteSpace method that returns true
      when it contains no printable characters.
+     NodeTreeWalker, a utility class to traverse a tree of Node objects using
+     either depth-first or breadth-first tree order has been added.
  
  Refactoring
***************
*** 47,50 ****
--- 49,56 ----
  Bug Fixes
  ---------
+ #1445795 return as TextNode when processing jsp
+ #1445309 XML processing instructions are returned as text
+ #1376851 Null-valued cookies cause exception
+ #1375230 some javascript breaks stringbean
  #1344687 A bug when set cookies
  #1334408 Exception occurs based on string length

Index: changes.txt
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/docs/changes.txt,v
retrieving revision 1.208
retrieving revision 1.209
diff -C2 -d -r1.208 -r1.209
*** changes.txt	12 Nov 2005 15:11:45 -0000	1.208
--- changes.txt	20 Mar 2006 00:26:01 -0000	1.209
***************
*** 16,19 ****
--- 16,109 ----
  *******************************************************************************
  
+ Integration Build 1.6 - 20060319
+ --------------------------------
+ 2006-03-19 19:02  derrickoswald
+ 
+ 	* src/org/htmlparser/tests/tagTests/BodyTagTest.java:
+ 
+ 	Fix unit test for body tag.
+ 	
+ 2006-03-19 17:13  derrickoswald
+ 
+ 	* docs/panel.html:
+ 
+ 	Fix name of current build.
+ 	
+ 2006-03-19 17:03  derrickoswald
+ 
+ 	* build.xml, docs/bug.html, docs/panel.html:
+ 
+ 	Fix bug #1363500 http://htmlparser.sourceforge.net/bug.html is wrong
+ 	Take down the wiki.
+ 	
+ 2006-03-19 16:26  derrickoswald
+ 
+ 	* src/org/htmlparser/: lexer/Lexer.java, tags/BodyTag.java:
+ 
+ 	Fix bug #1375230 some javascript breaks stringbean
+ 	Retrace non-conforming end of remark.
+ 	
+ 2006-03-19 15:14  derrickoswald
+ 
+ 	* src/org/htmlparser/http/: ConnectionManager.java, Cookie.java:
+ 
+ 	Fix bug #1376851 Null-valued cookies cause exception
+ 	Add handling for namewless cookies.
+ 	
+ 2006-03-19 13:40  derrickoswald
+ 
+ 	* src/org/htmlparser/http/ConnectionManager.java:
+ 
+ 	Remove deflate option from default request properties.
+ 	See RFE #1394144 handle deflate encoding.
+ 	
+ 2006-03-19 12:09  derrickoswald
+ 
+ 	* src/org/htmlparser/lexer/Page.java:
+ 
+ 	Typo.
+ 	
+ 2006-03-19 11:11  derrickoswald
+ 
+ 	* src/org/htmlparser/lexer/Lexer.java:
+ 
+ 	Fix bug #1445795 return as TextNode when processing jsp
+ 	Handle single and double line comments within jsp nodes.
+ 	Suggested alteration to handle jsp tags within tag attributes wasn't implemented.
+ 	
+ 2006-03-19 10:01  derrickoswald
+ 
+ 	* docs/contributors.html,
+ 	src/org/htmlparser/PrototypicalNodeFactory.java,
+ 	src/org/htmlparser/lexer/Lexer.java,
+ 	src/org/htmlparser/tags/ProcessingInstructionTag.java,
+ 	src/org/htmlparser/tests/lexerTests/LexerTests.java:
+ 
+ 	Incorporated patch #1450095 Fix for Bug 1445309 from Trejkaz Xaoza.
+ 	Addition of code to parse XML processing instructions.
+ 	
+ 2006-02-13 09:50  ian_macfarlane
+ 
+ 	* src/org/htmlparser/util/NodeTreeWalker.java:
+ 
+ 	A utility class to traverse a tree of Node objects using either depth-first or breadth-first tree traversal. Kind of like a NodeIterator for DOM-type trees of Nodes instead of linear sequences of Nodes.
+ 	
+ 	Post to the dev mailing list about this on the way.
+ 	
+ 2005-11-14 21:09  derrickoswald
+ 
+ 	* src/org/htmlparser/: Attribute.java, Node.java, Parser.java,
+ 	PrototypicalNodeFactory.java, Remark.java, StringNodeFactory.java,
+ 	Tag.java, Text.java:
+ 
+ 	Fix warnings flagged by doccheck.
+ 	
+ 2005-11-12 11:44  derrickoswald
+ 
+ 	* src/org/htmlparser/tests/: lexerTests/LexerTests.java,
+ 	tagTests/FormTagTest.java, tagTests/LinkTagTest.java:
+ 
+ 	Update tests for addition of Paragraph tag.
+ 
  Integration Build 1.6 - 20051112
  --------------------------------
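
Editorial sketch of the two traversal orders mentioned in the NodeTreeWalker entries above. This does not use or reproduce the NodeTreeWalker API; TreeNode, TreeOrders, and the sample tree are hypothetical names invented for the illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration, not part of the HTML Parser sources.
class TreeOrders {
    static final class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(String name) { this.name = name; }
        TreeNode add(TreeNode child) { children.add(child); return this; }
    }

    // Depth-first (pre-order): visit a node, then each child's whole subtree.
    static List<String> depthFirst(TreeNode root) {
        List<String> order = new ArrayList<>();
        Deque<TreeNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            TreeNode node = stack.pop();
            order.add(node.name);
            // Push children in reverse so the first child is visited first.
            for (int i = node.children.size() - 1; i >= 0; i--)
                stack.push(node.children.get(i));
        }
        return order;
    }

    // Breadth-first: visit all nodes at one depth before going deeper.
    static List<String> breadthFirst(TreeNode root) {
        List<String> order = new ArrayList<>();
        Deque<TreeNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            TreeNode node = queue.remove();
            order.add(node.name);
            queue.addAll(node.children);
        }
        return order;
    }

    // Sample tree: html -> head(title), body(p).
    static TreeNode sample() {
        TreeNode head = new TreeNode("head").add(new TreeNode("title"));
        TreeNode body = new TreeNode("body").add(new TreeNode("p"));
        return new TreeNode("html").add(head).add(body);
    }

    public static void main(String[] args) {
        System.out.println(depthFirst(sample()));   // [html, head, title, body, p]
        System.out.println(breadthFirst(sample())); // [html, head, body, title, p]
    }
}
```

The only difference between the two methods is the container discipline: a stack gives depth-first order, a queue gives breadth-first order.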

[Htmlparser-cvs] htmlparser/src/org/htmlparser Parser.java,1.110,1.111

From: Derrick O. <der...@us...> - 2006-03-20 00:26:06

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv5259/htmlparser/src/org/htmlparser

Modified Files:
	Parser.java 
Log Message:
Update version to 1.6-20060319.

Index: Parser.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/Parser.java,v
retrieving revision 1.110
retrieving revision 1.111
diff -C2 -d -r1.110 -r1.111
*** Parser.java	15 Nov 2005 02:09:10 -0000	1.110
--- Parser.java	20 Mar 2006 00:26:01 -0000	1.111
***************
*** 133,137 ****
       */
      public static final String
!     VERSION_DATE = "Nov 12, 2005"
      ;
  
--- 133,137 ----
       */
      public static final String
!     VERSION_DATE = "Mar 19, 2006"
      ;

1 message has been excluded from this view by a project administrator.
