[Htmlparser-cvs] htmlparser/src/org/htmlparser/util EncodingChangeException.java,NONE,1.1 IteratorImpl.java,1.39,1.40

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util
In directory sc8-pr-cvs1:/tmp/cvs-serv3574/util

Modified Files:
	IteratorImpl.java 
Added Files:
	EncodingChangeException.java 
Log Message:
Fix bug #874175 StringBean doesn't handle charset change well
Add EncodingChangeException to distinguish a recoverable character set change
occurring after the lexer has already coughed up some characters using the wrong
encoding. Added testEncodingChange in LexerTests to exercise it.
Changed IteratorImpl to not wrap a ParserException with another ParserException.
Changed StringBean to retry the URL when an encoding change exception is caught.
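
The retry described in the last line of the log might look roughly like the
following. This is a minimal sketch, not the actual StringBean code: the nested
exception classes are stand-ins for the library types so the example compiles
on its own, and the Parse interface, parseWithRetry and demo are hypothetical
names introduced only for illustration.

```java
public class RetryOnEncodingChange {
    // Stand-ins for org.htmlparser.util.ParserException and
    // EncodingChangeException, so the sketch is self-contained.
    static class ParserException extends Exception {
        ParserException(String message) { super(message); }
    }

    static class EncodingChangeException extends ParserException {
        EncodingChangeException(String message) { super(message); }
    }

    /** Hypothetical stand-in for one complete parse of a URL. */
    interface Parse {
        String run() throws ParserException;
    }

    /** Run the parse; on an encoding change, restart once from the beginning. */
    static String parseWithRetry(Parse parse) throws ParserException {
        try {
            return parse.run();
        } catch (EncodingChangeException ece) {
            // Recoverable: the second attempt is assumed to start over
            // using the newly declared character set.
            return parse.run();
        }
        // Any other ParserException propagates to the caller unchanged.
    }

    /** Simulate a parse that fails on the first attempt only. */
    static String demo() {
        int[] attempts = {0};
        try {
            String text = parseWithRetry(() -> {
                if (attempts[0]++ == 0) {
                    throw new EncodingChangeException("character mismatch");
                }
                return "parsed";
            });
            return text + " after " + attempts[0] + " attempts";
        } catch (ParserException pe) {
            return "failed: " + pe.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Retrying only once keeps the sketch from looping forever if a page keeps
triggering encoding changes; a second failure is reported to the caller.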


--- NEW FILE: EncodingChangeException.java ---
// HTMLParser Library $Name:  $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2004 Claude Duguay
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/EncodingChangeException.java,v $
// $Author: derrickoswald $
// $Date: 2004/01/10 15:23:33 $
// $Revision: 1.1 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//

package org.htmlparser.util;

/**
 * Thrown when an encoding change invalidates already scanned characters.
 * When the encoding is changed, for example on encountering a &lt;META&gt;
 * tag whose content attribute includes a charset directive that
 * disagrees with the encoding specified by the HTTP header (or the default
 * encoding if none), the parser retraces the bytes it has interpreted so far,
 * comparing the characters produced under the new encoding. If the new
 * characters differ from those it has already yielded to the application, it
 * throws this exception to indicate that processing should be restarted under
 * the new encoding.
 * This exception type lets applications distinguish an encoding change,
 * which may be successfully cured by restarting the parse from the
 * beginning, from more serious errors.
 * @see IteratorImpl
 * @see ParserException
 **/
public class EncodingChangeException
    extends
        ParserException
{
    /**
     * Create an exception indicative of a problematic encoding change.
     * @param message The message describing the error condition.
     */
    public EncodingChangeException (String message)
    {
        super(message);
    }
}
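
The check described in the class comment above (re-decode the bytes consumed so
far and compare against what was already delivered) can be illustrated with
plain JDK classes. This is a sketch of the idea only, not the lexer's actual
implementation; RedecodeCheck and sameCharacters are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RedecodeCheck {
    /**
     * True if the bytes consumed so far yield identical characters under
     * both encodings, i.e. the change is harmless and parsing can continue.
     */
    static boolean sameCharacters(byte[] consumed, Charset oldCs, Charset newCs) {
        String before = new String(consumed, oldCs);
        String after = new String(consumed, newCs);
        return before.equals(after);
    }

    public static void main(String[] args) {
        // Pure ASCII markup decodes identically under ISO-8859-1 and UTF-8.
        byte[] ascii = "<html><head>".getBytes(StandardCharsets.US_ASCII);
        // An accented character encoded as UTF-8 does not.
        byte[] accented = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        System.out.println(sameCharacters(
            ascii, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
        System.out.println(sameCharacters(
            accented, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```

Only the second case would call for an exception such as
EncodingChangeException, since the application has already received
characters that the new encoding would have produced differently.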


Index: IteratorImpl.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/IteratorImpl.java,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** IteratorImpl.java	2 Jan 2004 16:24:58 -0000	1.39
--- IteratorImpl.java	10 Jan 2004 15:23:33 -0000	1.40
***************
*** 64,69 ****
       * Get the next node.
       * @return The next node in the HTML stream, or null if there are no more nodes.
       */
!     public Node nextNode() throws ParserException
      {
          Tag tag;
--- 64,70 ----
       * Get the next node.
       * @return The next node in the HTML stream, or null if there are no more nodes.
+      * @exception ParserException If an unrecoverable error occurs.
       */
!     public Node nextNode () throws ParserException
      {
          Tag tag;
***************
*** 95,107 ****
              }
          }
          catch (Exception e)
          {
!             StringBuffer msgBuffer = new StringBuffer();
!             msgBuffer.append("Unexpected Exception occurred while reading ");
!             msgBuffer.append(mLexer.getPage ().getUrl ());
!             msgBuffer.append(", in nextHTMLNode");
! //                reader.appendLineDetails(msgBuffer);
!             ParserException ex = new ParserException(msgBuffer.toString(),e);
!             mFeedback.error(msgBuffer.toString(),ex);
              throw ex;
          }
--- 96,112 ----
              }
          }
+         catch (ParserException pe)
+         {
+             throw pe; // no need to wrap an existing ParserException
+         }
          catch (Exception e)
          {
!             StringBuffer msgBuffer = new StringBuffer ();
!             msgBuffer.append ("Unexpected Exception occurred while reading ");
!             msgBuffer.append (mLexer.getPage ().getUrl ());
!             msgBuffer.append (", in nextNode");
!             // TODO: appendLineDetails (msgBuffer);
!             ParserException ex = new ParserException (msgBuffer.toString (), e);
!             mFeedback.error (msgBuffer.toString (), ex);
              throw ex;
          }
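
The effect of the added catch clause can be seen in isolation with the sketch
below. It assumes stand-in exception classes in place of the library types, and
the Step interface, guarded and describe are hypothetical helpers. The point is
that a subclass such as EncodingChangeException reaches the caller with its own
type intact, while other exceptions are still wrapped.

```java
public class RethrowWithoutWrapping {
    // Stand-ins for the library's exception types.
    static class ParserException extends Exception {
        ParserException(String message) { super(message); }
        ParserException(String message, Throwable cause) { super(message, cause); }
    }

    static class EncodingChangeException extends ParserException {
        EncodingChangeException(String message) { super(message); }
    }

    /** Hypothetical unit of work that may fail in any way. */
    interface Step {
        void run() throws Exception;
    }

    /** Mirrors the catch ordering of the revised nextNode(). */
    static void guarded(Step step) throws ParserException {
        try {
            step.run();
        } catch (ParserException pe) {
            throw pe; // already a ParserException: pass it through as-is
        } catch (Exception e) {
            throw new ParserException("Unexpected Exception occurred", e);
        }
    }

    /** Report the runtime type of whatever guarded() throws. */
    static String describe(Step step) {
        try {
            guarded(step);
            return "no exception";
        } catch (ParserException pe) {
            return pe.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(() -> {
            throw new EncodingChangeException("charset changed");
        }));
        System.out.println(describe(() -> {
            throw new IllegalStateException("boom");
        }));
    }
}
```

Without the first catch clause, the EncodingChangeException would be buried as
the cause of a generic ParserException, and a caller catching the subclass
could not recognise it.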
