Thread: [Htmlparser-cvs] htmlparser/src/org/htmlparser/tests/lexerTests LexerTests.java,1.15,1.16

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests
In directory sc8-pr-cvs1:/tmp/cvs-serv3574/tests/lexerTests

Modified Files:
	LexerTests.java 
Log Message:
Fix bug #874175 StringBean doesn't handle charset change well
Add EncodingChangeException to distinguish a recoverable character set change
occurring after the lexer has already coughed up some characters using the wrong
encoding. Added testEncodingChange in LexerTests to exercise it.
Changed IteratorImpl to not wrap a ParserException with another ParserException.
Changed StringBean to retry the URL when an encoding change exception is caught.
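
The retry described in the log message can be sketched roughly as follows. This is a self-contained toy, not the htmlparser code: ToySource, SketchParserException and SketchEncodingChangeException are hypothetical stand-ins, and "UTF-8" is an arbitrary charset picked for illustration. It only shows why a dedicated subclass helps: the caller can catch the recoverable case and start over, while any other parser failure still propagates.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "catch the encoding change, then re-read"; all names are hypothetical. */
class EncodingRetrySketch {

    /** Stand-in for a general parser failure. */
    static class SketchParserException extends Exception {
        SketchParserException(String message) { super(message); }
    }

    /** Stand-in for the recoverable case: the charset changed after text was delivered. */
    static class SketchEncodingChangeException extends SketchParserException {
        SketchEncodingChangeException(String message) { super(message); }
    }

    /** Toy source that learns its real charset only after emitting some text. */
    static class ToySource {
        private String charset = "ISO-8859-1"; // assumed default, as in the diff's comment
        private boolean charsetConfirmed = false;
        int passes = 0;

        List<String> readAll() throws SketchParserException {
            passes++;
            List<String> nodes = new ArrayList<>();
            nodes.add("<html>");
            if (!charsetConfirmed) {
                // Simulates the page declaring a different charset after
                // characters were already handed out with the wrong one.
                charset = "UTF-8";
                charsetConfirmed = true;
                throw new SketchEncodingChangeException("charset changed to " + charset);
            }
            nodes.add("body decoded as " + charset);
            return nodes;
        }
    }

    /** Reads everything, retrying once from the start if the charset changed mid-read. */
    static List<String> readWithRetry(ToySource source) throws SketchParserException {
        try {
            return source.readAll();
        } catch (SketchEncodingChangeException ece) {
            // The source now remembers the new charset, so a second pass succeeds.
            return source.readAll();
        }
    }

    public static void main(String[] args) throws SketchParserException {
        ToySource source = new ToySource();
        System.out.println(readWithRetry(source));
        System.out.println("passes: " + source.passes);
    }
}
```

The catch clause only matches if the subclass reaches the caller unwrapped, which is presumably why the iterator change in this commit matters: an exception rewrapped in a plain parser exception would no longer be recognisable as recoverable.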


Index: LexerTests.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tests/lexerTests/LexerTests.java,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** LexerTests.java	2 Jan 2004 16:24:55 -0000	1.15
--- LexerTests.java	10 Jan 2004 15:23:33 -0000	1.16
***************
*** 52,55 ****
--- 52,56 ----
  import org.htmlparser.util.NodeIterator;
  import org.htmlparser.util.NodeList;
+ import org.htmlparser.util.EncodingChangeException;
  import org.htmlparser.util.ParserException;
  
***************
*** 620,628 ****
       * causes spurious tags.
       * The root cause is characters bracketed by [esc]$B and [esc](J (contrary
!      * to what is indicated in then j_s_nightingale analysis of the problem) that
       * sometimes have an angle bracket (&lt; or 0x3c) embedded in them. These
       * are taken to be tags by the parser, instead of being considered strings.
       * <p>
!      * The URL refrenced has an ISO-8859-1 encoding (the default), but
       * Japanese characters intermixed on the page with English, using the JIS
       * encoding. We detect failure by looking for weird tag names which were
--- 621,629 ----
       * causes spurious tags.
       * The root cause is characters bracketed by [esc]$B and [esc](J (contrary
!      * to what is indicated in the j_s_nightingale analysis of the problem) that
       * sometimes have an angle bracket (&lt; or 0x3c) embedded in them. These
       * are taken to be tags by the parser, instead of being considered strings.
       * <p>
!      * The URL http://www.009.com/ has an ISO-8859-1 encoding (the default), but
       * Japanese characters intermixed on the page with English, using the JIS
       * encoding. We detect failure by looking for weird tag names which were
***************
*** 666,670 ****
          NodeIterator iterator;
          
!         parser = new Parser ("http://www.009.com/");
          iterator = parser.elements ();
          while (iterator.hasMoreNodes ())
--- 667,671 ----
          NodeIterator iterator;
          
!         parser = new Parser ("http://htmlparser.sourceforge.net/test/www_009_com.html");
          iterator = parser.elements ();
          while (iterator.hasMoreNodes ())
***************
*** 745,748 ****
--- 746,784 ----
      }
  
+     /**
+      * See bug #874175 StringBean doesn't handle charset change well
+      * Force an encoding change exception, reset and re-read.
+      */
+     public void testEncodingChange ()
+         throws
+             ParserException
+     {
+         NodeIterator iterator;
+         Node node;
+         boolean success;
+ 
+         parser = new Parser ("http://htmlparser.sourceforge.net/test/www_china-pub_com.html");
+         success = false;
+         try
+         {
+             for (iterator = parser.elements (); iterator.hasMoreNodes (); )
+                 node = iterator.nextNode ();
+         }
+         catch (EncodingChangeException ece)
+         {
+             success = true;
+             try
+             {
+                 parser.reset ();
+                 for (iterator = parser.elements (); iterator.hasMoreNodes (); )
+                     node = iterator.nextNode ();
+             }
+             catch (ParserException pe)
+             {
+                 success = false;
+             }
+         }
+         assertTrue ("encoding change failed", success);
+     }
  }
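
The Javadoc in the second hunk attributes the spurious tags to bytes between [esc]$B and [esc](J that happen to equal 0x3c. A minimal sketch of that effect, using only the standard library: the byte run below is hand-built for illustration (it is not taken from the referenced page), and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Shows how a two-byte JIS run can expose a '<' when read as ISO-8859-1. */
class JisAngleBracketSketch {

    /** Decodes raw bytes the way a reader assuming ISO-8859-1 would. */
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Hand-built run: [esc]$B, one two-byte code whose second byte is 0x3c, [esc](J. */
    static byte[] sampleJisRun() {
        return new byte[] {
            0x1b, '$', 'B',   // escape sequence opening the two-byte run
            0x24, 0x3c,       // one two-byte character; its second byte is 0x3c
            0x1b, '(', 'J'    // escape sequence closing the run
        };
    }

    public static void main(String[] args) {
        String decoded = decodeAsLatin1(sampleJisRun());
        // A byte-for-byte decoding turns 0x3c into '<', which a tag scanner
        // would treat as the start of a tag.
        System.out.println("index of '<': " + decoded.indexOf('<'));
    }
}
```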
