[Htmlparser-user] Non-charset=utf-8 Characters in Parsed Text?


I'm running into an issue where I'm getting question mark characters in
place of quotes, apostrophes, hyphens, etc.

I know this has to do with the website using characters outside those
defined by the specification. Is there a way to correct this in the
htmlparser? I started trying to do a simple character replacement on the
parsed text, but whenever I do an "(int) string.charAt(n)" for any special
character I'm getting a 65533, and if I do a "Character.getNumericValue(
string.charAt(n))" I'm getting a -1, so I'm assuming I'm far too far
"downstream" to fix the problem.
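
For reference, 65533 is U+FFFD, the Unicode replacement character, which Java's decoders substitute for any byte sequence that is invalid in the charset used for decoding. Character.getNumericValue returns -1 for it because it has no numeric value. The sketch below uses only the JDK, not htmlparser, and reproduces the symptom under the assumption that the page's bytes are really windows-1252 but are being decoded as UTF-8. The class name, the helper decode, and the sample bytes are all hypothetical.

```java
import java.nio.charset.Charset;

class ReplacementCharDemo {
    // Hypothetical helper: decode raw bytes using the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Assumed sample: 0x93 and 0x94 are curly double quotes in
        // windows-1252, but neither is a valid UTF-8 sequence by itself.
        byte[] raw = {(byte) 0x93, 'h', 'i', (byte) 0x94};

        String wrong = decode(raw, "UTF-8");        // invalid bytes become U+FFFD
        String right = decode(raw, "windows-1252"); // quotes decode correctly

        System.out.println((int) wrong.charAt(0));  // 65533
        System.out.println((int) right.charAt(0));  // 8220
    }
}
```

Once a byte has been turned into U+FFFD the original value is gone, so a character replacement on the parsed text cannot recover it. The fix has to happen where the bytes are decoded, by using the encoding the page is actually in.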

Also I've just been using the Parser.parse method to return nodelists and
have been working my way through the documents that way rather than try any
of the other htmlparser features (which may already account for this??).

Thanks in advance for any help. I'm really enjoying working with the parser
and thanks to everyone who built this thing.

Jeff