From: Markus Scherer <markus.scherer@jt...> - 2002-10-28 20:15:16
> Is there any way to find out the encoding of an InputStream in
> Java? Can the ICU4C library help with this? If yes, how?
ICU4C currently has only a function for interpreting Unicode signature byte sequences (BOMs). ICU
does not have any heuristic code for "guessing" charsets.
Such heuristics depend heavily on what kind of documents you expect to encounter: HTML, XML, plain
text, natural language vs. mostly data, known language(s) and script(s), and so on.
Mozilla has such code, optimized for HTML pages and the charsets commonly used in them. Other libraries
may have different code with different optimizations.
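Since the question is about Java, here is a minimal sketch of the BOM (signature) check in plain Java, without ICU. It is not ICU's API and not a charset-guessing heuristic: it only recognizes the standard UTF-8/16/32 signatures and reports nothing for streams without one. The class and method names (BomSniffer, detect, sniff) are hypothetical. Note that FF FE 00 00 is ambiguous (UTF-32LE BOM, or UTF-16LE BOM followed by U+0000); this sketch assumes UTF-32LE.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: detects a Unicode signature (BOM) only; it does not guess charsets. */
public class BomSniffer {

    /** Charset name implied by the BOM (null if none) and the BOM length in bytes. */
    public static final class Signature {
        public final String charsetName;
        public final int length;

        Signature(String charsetName, int length) {
            this.charsetName = charsetName;
            this.length = length;
        }
    }

    // True if the first n bytes of b start with the given unsigned byte values.
    private static boolean startsWith(byte[] b, int n, int... sig) {
        if (n < sig.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if ((b[i] & 0xFF) != sig[i]) return false;
        }
        return true;
    }

    /** Inspects the first n bytes of b (at most 4 are needed). */
    public static Signature detect(byte[] b, int n) {
        // UTF-32 must be tested before UTF-16: FF FE 00 00 also starts with FF FE.
        if (startsWith(b, n, 0x00, 0x00, 0xFE, 0xFF)) return new Signature("UTF-32BE", 4);
        if (startsWith(b, n, 0xFF, 0xFE, 0x00, 0x00)) return new Signature("UTF-32LE", 4);
        if (startsWith(b, n, 0xEF, 0xBB, 0xBF)) return new Signature("UTF-8", 3);
        if (startsWith(b, n, 0xFE, 0xFF)) return new Signature("UTF-16BE", 2);
        if (startsWith(b, n, 0xFF, 0xFE)) return new Signature("UTF-16LE", 2);
        return new Signature(null, 0); // no BOM: encoding remains unknown
    }

    /** Reads up to 4 bytes, consumes the BOM if present, and pushes the rest back. */
    public static Signature sniff(PushbackInputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) break; // stream shorter than 4 bytes
            n += r;
        }
        Signature sig = detect(head, n);
        // Unread the bytes after the BOM so the caller starts at the first character.
        in.unread(head, sig.length, n - sig.length);
        return sig;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        // The pushback buffer must hold at least 4 bytes for sniff().
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data), 4);
        Signature sig = sniff(in);
        System.out.println(sig.charsetName + " " + sig.length); // prints "UTF-8 3"
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // prints "hi"
    }
}
```

If sniff() returns a null charset name, the stream simply has no signature, and you are back to needing either out-of-band information (HTTP headers, an XML declaration, and so on) or a heuristic detector of the kind described above.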