Custom charsets are supported as long as they are single-byte charsets, that is, every supported Unicode code point is represented by exactly one byte.
The character map of such a charset has to be defined in an ASCII text file (alternatively, in a UTF-8 text file without BOM).
The character map consists of one line for each supported code point.
Each line consists of three comma-separated numeric values:
[priority], [code point], [character value]
- priority: Either 0 or 1.
  A charset may require code points of similar-shaped glyphs to be represented by the same character. E.g. the MIK charset requires both the Greek lowercase beta (U+03B2, β) and the German sharp s (U+00DF, ß) to be represented by character 0xE1. If a conversion from the custom charset to another charset is performed, then the character translation defaults to the code point marked with priority 0.
  Make sure that the character map contains exactly one code point marked with priority 0 for each character 0x00..0xFF.
- code point: Unicode code point that a character shall represent.
- character value: The value of the byte that represents the given code point in the custom charset.
Example lines for the mentioned β and ß in the MIK character map:
0, 0x0003B2, 0xE1
1, 0x0000DF, 0xE1
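To illustrate the file format, here is a minimal Python sketch that reads such lines and derives two lookup tables from them. It is not part of the tool: the function names `parse_charmap` and `build_tables` are hypothetical, and skipping blank lines and accepting both decimal and `0x`-prefixed values are assumptions, not documented behaviour.

```python
def parse_charmap(text):
    """Parse character-map text into (priority, code_point, byte_value) tuples."""
    entries = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 values, got {len(fields)}")
        # int(x, 0) accepts decimal and 0x-prefixed hexadecimal literals
        priority, code_point, byte_value = (int(field, 0) for field in fields)
        if priority not in (0, 1):
            raise ValueError(f"line {line_no}: priority must be 0 or 1")
        if not 0x00 <= byte_value <= 0xFF:
            raise ValueError(f"line {line_no}: character value must fit in one byte")
        entries.append((priority, code_point, byte_value))
    return entries


def build_tables(entries):
    """Return (decode, encode, missing).

    decode:  byte value -> code point, taken from priority-0 entries only
    encode:  code point -> byte value, taken from all entries
    missing: byte values 0x00..0xFF that have no priority-0 entry
    """
    decode, encode = {}, {}
    for priority, code_point, byte_value in entries:
        encode[code_point] = byte_value
        if priority == 0:
            if byte_value in decode:
                raise ValueError(f"0x{byte_value:02X} has more than one priority-0 entry")
            decode[byte_value] = code_point
    missing = [b for b in range(256) if b not in decode]
    return decode, encode, missing


sample = "0, 0x0003B2, 0xE1\n1, 0x0000DF, 0xE1\n"
decode, encode, missing = build_tables(parse_charmap(sample))
```

With only the two sample lines, both β and ß encode to 0xE1, while 0xE1 decodes to β because that line carries priority 0. The `missing` list then reports every other byte value, which is how an incomplete map would show up.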
To use the custom charset, pass the name of the charset file rather than the code page ID.
E.g. perform the conversion from UTF-16 LE to MIK:
convertcp utf-16 "MIK.sbcs" /i "u16.txt" /o "mik.txt"
or vice versa:
convertcp "MIK.sbcs" utf-16 /i "mik.txt" /o "u16.txt" /b
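What the two directions amount to, given the priority rule above, can be sketched in Python. This is only a conceptual illustration, not how convertcp is implemented: the function names and the tiny hand-written tables are hypothetical, command-line switches and byte order marks are not modelled, and how the tool treats unmapped characters is not stated here (this sketch simply raises `KeyError`).

```python
def utf16le_to_custom(data, encode):
    """Convert UTF-16 LE bytes to the custom single-byte charset."""
    text = data.decode("utf-16-le")
    # every mapped code point becomes exactly one byte
    return bytes(encode[ord(ch)] for ch in text)


def custom_to_utf16le(data, decode):
    """Convert custom single-byte data to UTF-16 LE bytes."""
    # each byte maps back to the code point marked with priority 0
    text = "".join(chr(decode[b]) for b in data)
    return text.encode("utf-16-le")


# Hypothetical three-entry excerpt: 'A' plus the beta / sharp-s pair
encode = {0x41: 0x41, 0x3B2: 0xE1, 0xDF: 0xE1}
decode = {0x41: 0x41, 0xE1: 0x3B2}  # priority-0 entries only

custom_bytes = utf16le_to_custom("Aßβ".encode("utf-16-le"), encode)
round_trip = custom_to_utf16le(custom_bytes, decode).decode("utf-16-le")
```

In this sketch the round trip is lossy by design: ß and β share byte 0xE1, so converting back yields β for both.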