Welcome to Help
I am using the sample "HtmlCharsetDetector.java", but I cannot get the correct detection result.
The URL is "http://www.google.com".
The result is: CHARSET = ASCII
But I found that this page's charset is UTF-8.
I hope you can help me!
Thanks!
This tool samples the bytes of the page to guess the charset. Since all the characters it sampled from google.com are English, it narrowed the guess down to ASCII. As you might know, ASCII and UTF-8 encode English characters identically, so decoding such a page as either one gives the same text. The tool is meant for the case where the meta tag is missing from the HTML page. If you are writing a spider, first check the HTML meta tag to get the correct charset; if it is missing, pass the data to this tool to guess the charset.
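To make the "meta tag first, detector second" order concrete, here is a rough sketch in plain Java. It is not code from this project: the class and method names (`CharsetSniffer`, `charsetFromMeta`, `resolveCharset`, `isPureAscii`) are made up for illustration, and the `detector` argument is only a placeholder for whatever call you use to run this tool on the raw bytes. The regex is a simplification and will not cover every way a page can declare its charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the detector library.
final class CharsetSniffer {
    // Matches <meta charset="utf-8"> as well as
    // <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Looks for a declared charset in the first 4096 bytes of the page.
    static Optional<String> charsetFromMeta(byte[] page) {
        // ISO-8859-1 maps every byte to exactly one char, so this decode
        // cannot fail, and the meta tag itself is plain ASCII anyway.
        int len = Math.min(page.length, 4096);
        String head = new String(page, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Uses the declared charset if there is one; otherwise asks the detector.
    // "detector" is an assumed stand-in for the guessing tool.
    static String resolveCharset(byte[] page, Function<byte[], String> detector) {
        return charsetFromMeta(page).orElseGet(() -> detector.apply(page));
    }

    // True when every byte is below 0x80, i.e. the data is valid ASCII
    // and decodes to the same text as UTF-8.
    static boolean isPureAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] page = "<html><head><title>Google</title></head></html>"
                .getBytes(StandardCharsets.UTF_8);
        // No meta tag here, so the placeholder detector decides.
        System.out.println(resolveCharset(page, bytes -> "ASCII"));
        System.out.println(isPureAscii(page));
    }
}
```

If the page turns out to be pure ASCII, a result of "ASCII" is harmless for a spider: treating the data as UTF-8 decodes it to exactly the same string.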