[Htmlparser-cvs] htmlparser/src/org/htmlparser/util EncodingChangeException.java,NONE,1.1 IteratorIm
From: <der...@us...> - 2004-01-10 15:23:36
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util
In directory sc8-pr-cvs1:/tmp/cvs-serv3574/util

Modified Files:
    IteratorImpl.java
Added Files:
    EncodingChangeException.java
Log Message:
Fix bug #874175 StringBean doesn't handle charset change well
Add EncodingChangeException to distinguish a recoverable character set change
occurring after the lexer has already coughed up some characters using the
wrong encoding.
Added testEncodingChange in LexerTests to exercise it.
Changed IteratorImpl to not wrap a ParserException with another ParserException.
Changed StringBean to retry the URL when an encoding change exception is caught.

--- NEW FILE: EncodingChangeException.java ---
// HTMLParser Library $Name:  $ - A java-based parser for HTML
// http://sourceforge.org/projects/htmlparser
// Copyright (C) 2004 Claude Duguay
//
// Revision Control Information
//
// $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/EncodingChangeException.java,v $
// $Author: derrickoswald $
// $Date: 2004/01/10 15:23:33 $
// $Revision: 1.1 $
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
//

package org.htmlparser.util;

/**
 * The encoding is changed invalidating already scanned characters.
 * When the encoding is changed, as for example when encountering a <META>
 * tag that includes a charset directive in the content attribute that
 * disagrees with the encoding specified by the HTTP header (or the default
 * encoding if none), the parser retraces the bytes it has interpreted so far
 * comparing the characters produced under the new encoding. If the new
 * characters differ from those it has already yielded to the application, it
 * throws this exception to indicate that processing should be restarted under
 * the new encoding.
 * This exception is the object thrown so that applications may distinguish
 * an encoding change, which may be successfully cured by restarting
 * the parse from the beginning, from more serious errors.
 * @see IteratorImpl
 * @see ParserException
 **/
public class EncodingChangeException
    extends
        ParserException
{
    /**
     * Create an exception indicative of a problematic encoding change.
     * @param message The message describing the error condition.
     */
    public EncodingChangeException (String message)
    {
        super(message);
    }
}

Index: IteratorImpl.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/IteratorImpl.java,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** IteratorImpl.java   2 Jan 2004 16:24:58 -0000   1.39
--- IteratorImpl.java   10 Jan 2004 15:23:33 -0000  1.40
***************
*** 64,69 ****
       * Get the next node.
       * @return The next node in the HTML stream, or null if there are no more nodes.
       */
!     public Node nextNode() throws ParserException
      {
          Tag tag;
--- 64,70 ----
       * Get the next node.
       * @return The next node in the HTML stream, or null if there are no more nodes.
+      * @exception ParserException If an unrecoverable error occurs.
       */
!     public Node nextNode () throws ParserException
      {
          Tag tag;
***************
*** 95,107 ****
                  }
              }
          catch (Exception e)
          {
!             StringBuffer msgBuffer = new StringBuffer();
!             msgBuffer.append("Unexpected Exception occurred while reading ");
!             msgBuffer.append(mLexer.getPage ().getUrl ());
!             msgBuffer.append(", in nextHTMLNode");
! //          reader.appendLineDetails(msgBuffer);
!             ParserException ex = new ParserException(msgBuffer.toString(),e);
!             mFeedback.error(msgBuffer.toString(),ex);
              throw ex;
          }
--- 96,112 ----
                  }
              }
+         catch (ParserException pe)
+         {
+             throw pe; // no need to wrap an existing ParserException
+         }
          catch (Exception e)
          {
!             StringBuffer msgBuffer = new StringBuffer ();
!             msgBuffer.append ("Unexpected Exception occurred while reading ");
!             msgBuffer.append (mLexer.getPage ().getUrl ());
!             msgBuffer.append (", in nextNode");
!             // TODO: appendLineDetails (msgBuffer);
!             ParserException ex = new ParserException (msgBuffer.toString (), e);
!             mFeedback.error (msgBuffer.toString (), ex);
              throw ex;
          }
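
For readers following the log message, here is a minimal sketch of the retry pattern it describes for StringBean: catch EncodingChangeException, throw away whatever was produced so far, and run the parse again from the beginning. This is not the StringBean code from this commit. ParserException and EncodingChangeException below are local stand-ins so the sketch compiles without the library, and ParseAttempt, EncodingRetrySketch and parseWithRetry are hypothetical names invented for illustration.

```java
// Local stand-ins (assumption): they mirror the class hierarchy in this
// commit so the sketch compiles without htmlparser on the classpath.
class ParserException extends Exception {
    ParserException(String message) {
        super(message);
    }
}

class EncodingChangeException extends ParserException {
    EncodingChangeException(String message) {
        super(message);
    }
}

/** Hypothetical callback: one complete parse of the document from the start. */
interface ParseAttempt {
    /** @param attempt 1 for the first try, 2 for the first retry, and so on. */
    String run(int attempt) throws ParserException;
}

class EncodingRetrySketch {
    /**
     * Runs the parse, restarting it from the beginning whenever an
     * EncodingChangeException reports that already-yielded characters
     * were decoded with the wrong character set.
     */
    static String parseWithRetry(ParseAttempt parse, int maxAttempts)
            throws ParserException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        EncodingChangeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return parse.run(attempt);
            } catch (EncodingChangeException ece) {
                // Recoverable: drop any partial output and start over.
                last = ece;
            }
            // Any other ParserException is not caught here and propagates.
        }
        throw last; // still failing after maxAttempts tries
    }
}
```

Any other ParserException propagates unchanged. The IteratorImpl change above is what makes this workable: since nextNode no longer wraps a ParserException in another one, a caller can catch the subclass directly.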