org.htmlparser.lexer.Page.java, method findCharset(String name, String fallback)
I'm working on a web spider, and I use HTMLParser 1.6 to parse pages.
All my output goes to the console, and I have a problem with the method findCharset(String name, String fallback) on the page http://www.nasa.gov/. It always writes

"unable to determine cannonical charset name for .utf8 - using ISO-8859-1"

to the console when I call

Parser parser = new Parser(connection); // connection is a URLConnection to http://www.nasa.gov/
HTTP headers returned by http://www.nasa.gov/:
Connection=[Transfer-Encoding, keep-alive], null=[HTTP/1.1 200 OK], Date=[Sat, 02 Dec 2006 23:08:56 GMT], Server=[Apache/2.0.45 (Unix) mod_perl/1.99_09-dev Perl/v5.6.1 covalent_auth/2.3 DAV/2 CovalentSSL/2.3.3 RSA/SSLC mod_jk/1.2.2-beta-1], Content-Type=[text/html; charset=.utf8], Transfer-Encoding=[chunked]
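The charset value in that header can be checked with plain JDK code. java.nio.charset.Charset requires a charset name to begin with a letter or a digit, so ".utf8" is rejected as an illegal name, while "utf8" (without the dot) is a known alias of UTF-8. The sketch below assumes HTMLParser falls back in a similar way when Charset rejects the name; the class and method names (CharsetCheck, charsetParam, canonicalOrFallback) are hypothetical and not part of HTMLParser.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

public class CharsetCheck {

    /** Returns the raw charset parameter of a Content-Type header value, or null if absent. */
    static String charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding quotes, e.g. charset="utf-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    /** Canonical Java charset name, or the fallback if the name is missing, illegal or unsupported. */
    static String canonicalOrFallback(String name, String fallback) {
        if (name == null || name.isEmpty()) {
            return fallback;
        }
        try {
            return Charset.forName(name).name();
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // ".utf8" lands here: a charset name may not start with '.'
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Content-Type value copied from the nasa.gov response above
        String raw = charsetParam("text/html; charset=.utf8");
        System.out.println(raw);                                        // .utf8
        System.out.println(canonicalOrFallback(raw, "ISO-8859-1"));     // ISO-8859-1
        System.out.println(canonicalOrFallback("utf8", "ISO-8859-1"));  // UTF-8
    }
}
```

If the leading dot were dropped by the server, the same lookup would succeed, which supports the reading that the header value itself is malformed.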
I think the main problem is the charset=.utf8 field, i.e. the nasa.gov server supplies a bad charset, but I'd like to have that confirmed. I'd also like to know how I can suppress this console message (use a different Parser() constructor? Which one? How?).
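One possible workaround for the console noise, assuming the message is written through System.out (plausible given that it shows up on the console, but not confirmed here against the HTMLParser 1.6 source), is to redirect System.out while the parser runs. The helper runCapturingStdout below is hypothetical, not an HTMLParser API; it captures whatever is printed and restores the original stream afterwards. Since it is not clear at which point the lexer calls findCharset, the sketch wraps both constructing the Parser and parsing.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class QuietRunner {

    /**
     * Runs the task with System.out redirected to an in-memory buffer and returns the captured text.
     * The original System.out is restored afterwards, even if the task throws.
     * Caveat: System.out is JVM-global, so output from other threads is captured too while this runs.
     */
    static String runCapturingStdout(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream capture = new PrintStream(buffer, true);
        System.setOut(capture);
        try {
            task.run();
        } finally {
            System.setOut(original);
            capture.flush();
        }
        // PrintStream and toString() both use the platform default charset, so they agree
        return buffer.toString();
    }

    public static void main(String[] args) {
        String swallowed = runCapturingStdout(() -> {
            // Hypothetical stand-in for the real work, e.g. creating the Parser from the
            // URLConnection and parsing the page; here we just print the same message.
            System.out.println("unable to determine cannonical charset name for .utf8 - using ISO-8859-1");
        });
        // The console stayed quiet; decide here whether to log or drop the captured text
        System.out.println("suppressed parser output: " + !swallowed.isEmpty());
    }
}
```

Because System.out is shared, this is awkward in a multithreaded spider; the cleaner fix remains on the server side, since a header such as charset=utf-8 would not trigger the fallback at all.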
Thank you.
Vojtech Liska