Cannot determine canonical charset message

Forum: htmlparser-user

Creator: jerrymouse

Created: 2006-12-02

Updated: 2013-04-27

jerrymouse - 2006-12-02

URL: http://www.nasa.gov/
Class: org.htmlparser.lexer.Page (Page.java)
Method: findCharset(String name, String fallback)

I'm working on a web spider and I use HTMLParser 1.6 to parse pages.
All my output goes to the console, and I have a problem with the method

findCharset(String name, String fallback)

on the page

http://www.nasa.gov/

It always writes
"unable to determine cannonical charset name for .utf8 - using ISO-8859-1"
to the console when I call

Parser parser = new Parser(connection); // connection is a URLConnection to http://www.nasa.gov/
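To see why the lookup fails, here is a minimal, JDK-only sketch of the kind of canonical-name lookup a method like findCharset presumably performs. It is an assumption, not the HTMLParser source: the class CanonicalCharset and the method canonicalName are hypothetical. The java.nio.charset.Charset rules require a charset name to begin with a letter or a digit, so ".utf8" is rejected, while "utf8" is an accepted alias that resolves to the canonical "UTF-8".

```java
import java.nio.charset.Charset;

// Hypothetical sketch, NOT the actual org.htmlparser.lexer.Page code:
// resolve a charset name to its canonical Java name, or use a fallback.
public class CanonicalCharset {

    static String canonicalName(String name, String fallback) {
        try {
            // forName matches aliases case-insensitively; name() returns
            // the canonical name, e.g. "utf8" -> "UTF-8".
            return Charset.forName(name).name();
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException (".utf8" starts with a dot),
            // UnsupportedCharsetException (legal but unknown name) and a null name.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalName(".utf8", "ISO-8859-1")); // ISO-8859-1
        System.out.println(canonicalName("utf8", "ISO-8859-1"));  // UTF-8
    }
}
```

If the real method follows this pattern, the header value alone is enough to trigger the fallback, regardless of the page content.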

HTTP response headers from http://www.nasa.gov/:

Connection=[Transfer-Encoding, keep-alive], null=[HTTP/1.1 200 OK], Date=[Sat, 02 Dec 2006 23:08:56 GMT], Server=[Apache/2.0.45 (Unix) mod_perl/1.99_09-dev Perl/v5.6.1 covalent_auth/2.3 DAV/2 CovalentSSL/2.3.3 RSA/SSLC mod_jk/1.2.2-beta-1], Content-Type=[text/html; charset=.utf8], Transfer-Encoding=[chunked]
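The charset arrives as a parameter of the Content-Type header above. A spider could read that parameter itself and tolerate small damage such as the leading dot in ".utf8". This is a hedged sketch: the class ContentTypeCharset, its methods, and the repair rule (strip leading characters that are not ASCII letters or digits) are my own assumptions for illustration, not HTMLParser behavior, and the parsing is simplified rather than a full implementation of the HTTP/MIME parameter grammar.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper: extract and repair the charset parameter of a
// Content-Type header value such as "text/html; charset=.utf8".
public class ContentTypeCharset {

    // Returns the raw charset parameter, or null if there is none.
    static String charsetParam(String contentType) {
        if (contentType == null) return null;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().toLowerCase(Locale.ROOT).equals("charset")) {
                String value = p.substring(eq + 1).trim();
                // Drop optional surrounding quotes, e.g. charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    // Assumed repair rule: skip leading non-alphanumeric characters (Java
    // charset names must start with a letter or digit), then return the
    // canonical name if this JVM supports it, otherwise the fallback.
    static String repair(String name, String fallback) {
        if (name == null) return fallback;
        int i = 0;
        while (i < name.length() && !isAsciiLetterOrDigit(name.charAt(i))) i++;
        try {
            return Charset.forName(name.substring(i)).name();
        } catch (IllegalArgumentException e) {
            // Empty, illegal or unsupported names all end up here.
            return fallback;
        }
    }

    private static boolean isAsciiLetterOrDigit(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        String raw = charsetParam("text/html; charset=.utf8");
        System.out.println(raw + " -> " + repair(raw, "ISO-8859-1")); // .utf8 -> UTF-8
    }
}
```

Stripping characters is a guess about what the server meant; a stricter spider might prefer to fall back and then look for a charset declared in the document's own meta tag.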

I think the main problem is the charset=.utf8 field, i.e. a problem on the nasa.gov server (it supplies an invalid charset name), but I'd like to have that confirmed. I'd also like to know what I can do to suppress this message on the console (use a different Parser() constructor? Which one? How?).
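On suppressing the message: I have not checked which stream Page.findCharset writes to. Assuming it prints with System.out, one blunt JDK-only workaround is to swap System.out for an in-memory buffer while the Parser is being constructed, then restore it. The class Quiet and its method runQuietly are hypothetical. Note that System.setOut is global, so output from other threads is also captured during the call.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.Supplier;

// Hypothetical helper: run an action with System.out temporarily redirected
// into a buffer so messages printed during the call do not reach the console.
// Assumption: the unwanted message goes to System.out, not System.err or a logger.
public class Quiet {

    // The action's result plus whatever it printed to System.out.
    static final class Result<T> {
        final T value;
        final String captured;
        Result(T value, String captured) { this.value = value; this.captured = captured; }
    }

    static <T> Result<T> runQuietly(Supplier<T> action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream sink = new PrintStream(buffer, true);
        System.setOut(sink);
        try {
            T value = action.get();
            sink.flush();
            return new Result<>(value, buffer.toString());
        } finally {
            // Always restore the real console stream, even if the action throws.
            System.setOut(original);
        }
    }

    public static void main(String[] args) {
        Result<Integer> r = runQuietly(() -> {
            System.out.println("unable to determine cannonical charset name for .utf8 - using ISO-8859-1");
            return 42;
        });
        // The message above was captured, not printed; log it elsewhere if needed.
        System.out.println(r.value + " / captured: " + r.captured.trim());
    }
}
```

With HTMLParser the call would be wrapped roughly as `runQuietly(() -> new Parser(connection))`; if the constructor declares a checked exception (I believe the 1.6 one declares ParserException), it has to be caught inside the lambda. A less invasive route is to fetch and decode the page yourself with a repaired charset and hand the parser the text instead of the connection; check the 1.6 Javadoc for a factory that accepts HTML as a String.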

Thank you.
Vojtech Liska
