Re: [Htmlparser-user] Non-charset=utf-8 Characters in Parsed Text?
From: Karsten O. <wid...@t-...> - 2007-12-12 05:29:40
Jeffery Brewer wrote:
> I'm running into an issue where I'm getting question mark characters in
> place of quotes, apostrophes, hyphens, etc.

Have you read the FAQ?

http://htmlparser.sourceforge.net/faq.html

The entry "Why am I getting an EncodingChangeException?" should help with handling character encoding issues.

If the web page does not contain an encoding hint, let the parser fetch the web site for you. The HTTP header may contain the correct encoding, in which case the parser uses it. If the web site is offline, set the correct encoding in the parser yourself.

Does this help?

Regards,
Karsten

> I know this has to do with the website using characters outside those
> defined by the specification. Is there a way to correct this in the
> htmlparser? I started trying to do a simple character replacement on the
> parsed text, but whenever I do an "(int) string.charAt(n)" for any special
> character I'm getting a 65533, and if I do a
> "Character.getNumericValue(string.charAt(n))" I'm getting a -1, so I'm
> assuming I'm far too far "downstream" to fix the problem.
>
> Also I've just been using the Parser.parse method to return nodelists and
> have been working my way through the documents that way rather than try any
> of the other htmlparser features (which may already account for this??).
>
> Thanks in advance for any help. I'm really enjoying working with the parser
> and thanks to everyone who built this thing.
>
> Jeff
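
The 65533 reported in the quoted message is U+FFFD, the Unicode replacement character, which Java substitutes when bytes cannot be decoded in the chosen charset. The following is only a minimal sketch using the plain JDK rather than htmlparser itself. The class name, the helper methods, and the sample bytes are all made up for illustration, and it assumes the page is really Windows-1252 while being read as UTF-8, which the thread does not confirm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of htmlparser.
class EncodingMismatchDemo {
    // Assumed sample: Windows-1252 bytes for a left curly quote, "hi",
    // a right curly quote, a space, an en dash, a space, and "x".
    static final byte[] PAGE_BYTES = {
        (byte) 0x93, 'h', 'i', (byte) 0x94, ' ', (byte) 0x96, ' ', 'x'
    };

    /** Decodes raw bytes with the given charset, as a parser would have to. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    /** Counts U+FFFD replacement characters (code point 65533) in the text. */
    static int countReplacementChars(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\uFFFD') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Wrong charset: the original byte values are lost at this point.
        String wrong = decode(PAGE_BYTES, StandardCharsets.UTF_8);
        // Right charset: the punctuation survives.
        String right = decode(PAGE_BYTES, Charset.forName("windows-1252"));

        System.out.println("replacement chars (UTF-8): " + countReplacementChars(wrong));
        System.out.println("replacement chars (windows-1252): " + countReplacementChars(right));
        System.out.println("first char as int (UTF-8): " + (int) wrong.charAt(0));
    }
}
```

Once the text has been decoded with the wrong charset, every undecodable byte has become the same U+FFFD, so a character replacement on the parsed text cannot tell a quote from a hyphen. That is why the encoding has to be set before parsing. In htmlparser this should correspond to the parser's encoding setting (as far as I know, `Parser.setEncoding(String)`), but check the Javadoc for your version.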