Name | Modified | Size | Downloads / Week |
---|---|---|---|
UniversalCharDetCS.bin.7z | 2012-04-03 | 78.0 kB | |
UniversalCharDetCS.src.7z | 2012-04-03 | 120.8 kB | |
Readme.txt | 2012-04-03 | 4.1 kB | |
Totals: 3 Items | | 202.9 kB | 1 |
UniversalCharDetCS
====================

UniversalCharDetCS is a standalone program (written in C#) for automatic
charset / encoding detection of a given text file or web page.

If automatic detection does not give good results, you can select the
encoding manually and evaluate the result visually. In manual mode you can
also select the language; the list of encodings available for selection is
then narrowed accordingly.

Charsets automatically recognized
=================================

Language / Name         Alias           CodePage  Remarks

Unicode
UTF-8                   utf-8           65001     Unicode (UTF-8)
UTF-16LE                utf-16          1200      Unicode UTF-16, little endian byte order (BMP of ISO 10646)
UTF-16BE                unicodeFFFE     1201      Unicode UTF-16, big endian byte order
UTF-32LE                utf-32          12000     Unicode UTF-32, little endian byte order; available only to managed applications
UTF-32BE                utf-32BE        12001     Unicode UTF-32, big endian byte order; available only to managed applications
X-ISO-10646-UCS-4-2143  utf-32          12000     Unusual BOM (2143 order); not supported on MS Windows; very similar to UTF-32LE
X-ISO-10646-UCS-4-3412  utf-32BE        12001     Unusual BOM (3412 order); not supported on MS Windows; very similar to UTF-32BE

Bulgarian
ISO-8859-5              iso-8859-5      28595     ISO 8859-5 Cyrillic
windows-1251            windows-1251    1251      ANSI Cyrillic, Cyrillic (Windows)

Chinese
Big5                    big5            950       ANSI/OEM Traditional Chinese (Taiwan, Hong Kong SAR, PRC)
GB18030                 GB18030         54936     Simplified Chinese (4 byte), Chinese Simplified (GB18030); Windows XP and later
HZ-GB-2312              hz-gb-2312      52936     HZ-GB2312 Simplified Chinese, Chinese Simplified (HZ)
ISO-2022-CN             x-cp50227       50227     ISO 2022 Simplified Chinese, Chinese Simplified (ISO 2022)
x-euc-tw                EUC-CN          51936     EUC Simplified Chinese, Chinese Simplified (EUC)

Greek
ISO-8859-7              iso-8859-7      28597     ISO 8859-7 Greek
windows-1253            windows-1253    1253      ANSI Greek, Greek (Windows)

Hebrew
ISO-8859-8              iso-8859-8      28598     ISO 8859-8 Hebrew, Hebrew (ISO-Visual)
windows-1255            windows-1255    1255      ANSI Hebrew, Hebrew (Windows)

Japanese
EUC-JP                  euc-jp          51932     EUC Japanese
ISO-2022-JP             csISO2022JP     50222     ISO 2022 Japanese JIS X 0201-1989, Japanese (JIS-Allow 1 byte Kana - SO/SI); or 50221? or 50220?
Shift_JIS               shift_jis       932       ANSI/OEM Japanese, Japanese (Shift-JIS)

Korean
EUC-KR                  euc-kr          51949     EUC Korean
ISO-2022-KR             iso-2022-kr     50225     ISO 2022 Korean

Russian
IBM855                  IBM855          855       OEM Cyrillic (primarily Russian)
IBM866                  cp866           866       OEM Russian, Cyrillic (DOS)
ISO-8859-5              iso-8859-5      28595     ISO 8859-5 Cyrillic
KOI8-R                  koi8-r          20866     Russian (KOI8-R), Cyrillic (KOI8-R)
windows-1251            windows-1251    1251      ANSI Cyrillic, Cyrillic (Windows)
x-mac-cyrillic          x-mac-cyrillic  10007     Cyrillic (Mac)

Thai
TIS-620                 ISO 8859-11     874       TIS-620 (8-bit Thai) = ISO 8859-11; 28601 is not supported on MS Windows, supported by windows-874

Others
ASCII                   us-ascii        20127     US-ASCII (7-bit)
windows-1252            windows-1252    1252      ANSI Latin 1, Western European (Windows)

Information
===========

The software is based on the Mozilla Universal Charset Detector:
http://mxr.mozilla.org/mozilla/source/extensions/universalchardet/src/

The techniques used by universalchardet are described at:
http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html

Most of the core code was taken from Ude (a C# port):
http://code.google.com/p/ude/

Related works (from which some ideas were taken):
(Pascal) http://chsdet.sourceforge.net/
(C#) http://code.google.com/p/nuniversalchardet/
(Java) http://code.google.com/p/juniversalchardet/

Code Page Identifiers:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx

License
=======

The software is subject to the Mozilla Public License Version 1.1.
Alternatively, the software may be used under the terms of either the
GNU General Public License Version 2 or later, or the GNU Lesser General
Public License 2.1 or later.

Copyright (C) 2012 by Pawel57 <pawel57(at)users(dot)sourceforge(dot)net>
http://sourceforge.net/projects/streaman/files/Useful_Tools/
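
As an illustration of the Unicode entries in the charset list above, the following is a minimal Python sketch of byte order mark (BOM) detection. It is not the program's own C# code. The function name `detect_bom` and the table `BOM_TABLE` are hypothetical names chosen for this example; the code page numbers are the ones given in the list, and only the five standard BOMs are covered (the two unusual UCS-4 byte orders are left out).

```python
import codecs

# (BOM bytes, charset name, Windows code page), code pages as listed above.
# UTF-32 entries come first because the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE).
BOM_TABLE = [
    (codecs.BOM_UTF32_LE, "UTF-32LE", 12000),
    (codecs.BOM_UTF32_BE, "UTF-32BE", 12001),
    (codecs.BOM_UTF8, "UTF-8", 65001),
    (codecs.BOM_UTF16_LE, "UTF-16LE", 1200),
    (codecs.BOM_UTF16_BE, "UTF-16BE", 1201),
]


def detect_bom(data: bytes):
    """Return (name, code_page) if data starts with a known BOM, else None."""
    for bom, name, code_page in BOM_TABLE:
        if data.startswith(bom):
            return name, code_page
    return None
```

A file without a BOM returns `None` here; a real detector would then fall back to statistical analysis of the byte stream, as described in the Mozilla document linked above.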