Here is the paragraph documenting Zint's ECI treatment:
If you are using data from a file which is not UTF-8 formatted, you can
specify the encoding by using the --eci= switch followed by the appropriate number
from the table below. This adds an ECI flag to the barcode data which
tells the barcode reader to change character encoding.
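The "ECI flag" is a short prefix in the encoded data. As an illustration only, here is a minimal sketch for Data Matrix (`-b 71`, the symbology used in the examples in this thread). It assumes the ISO/IEC 16022 rule that an ECI is announced by codeword 241 followed, for ECI values 0 to 126, by a single codeword holding the value plus one. Larger ECI values use longer forms that the sketch does not implement, and the function name `dm_eci_prefix` is hypothetical, not part of Zint.

```python
# Minimal sketch (assumption): the codewords that announce an ECI in a
# Data Matrix symbol, per the ISO/IEC 16022 rule for small ECI values.
# Only the one-codeword form (ECI 0-126) is implemented; that covers
# every ECI code listed in the table.

DM_ECI_CODEWORD = 241  # Data Matrix "ECI" control codeword


def dm_eci_prefix(eci: int) -> list[int]:
    """Return the codewords that switch a Data Matrix reader to `eci`."""
    if not 0 <= eci <= 126:
        raise ValueError("this sketch only handles ECI values 0-126")
    # For ECI 0-126 the value is sent as one codeword, offset by 1.
    return [DM_ECI_CODEWORD, eci + 1]


if __name__ == "__main__":
    print(dm_eci_prefix(17))  # ECI 17 = ISO-8859-15
```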
--------------------------------------------------------
ECI Code | Character Encoding Scheme
--------------------------------------------------------
3 | ISO-8859-1 - Latin alphabet No. 1 (default)
4 | ISO-8859-2 - Latin alphabet No. 2
5 | ISO-8859-3 - Latin alphabet No. 3
6 | ISO-8859-4 - Latin alphabet No. 4
7 | ISO-8859-5 - Latin/Cyrillic alphabet
8 | ISO-8859-6 - Latin/Arabic alphabet
9 | ISO-8859-7 - Latin/Greek alphabet
10 | ISO-8859-8 - Latin/Hebrew alphabet
11 | ISO-8859-9 - Latin alphabet No. 5
12 | ISO-8859-10 - Latin alphabet No. 6
13 | ISO-8859-11 - Latin/Thai alphabet
15 | ISO-8859-13 - Latin alphabet No. 7
16 | ISO-8859-14 - Latin alphabet No. 8 (Celtic)
17 | ISO-8859-15 - Latin alphabet No. 9
18 | ISO-8859-16 - Latin alphabet No. 10
20 * | Shift-JIS (JISX 0208 and JISX 0201)
21 | Windows-1250 - Latin 2 (Central Europe)
22 | Windows-1251 - Cyrillic
23 | Windows-1252 - Latin 1
24 | Windows-1256 - Arabic
25 * | UCS-2 Unicode (High order byte first)
26 | Unicode (UTF-8)
27 | ISO-646:1991 7-bit character set
28 * | Big-5 (Taiwan) Chinese Character Set
29 * | GB (PRC) Chinese Character Set
30 * | Korean Character Set (KSX1001:1998)
--------------------------------------------------------
* Note: When using the ECI flag, Zint will treat all input data as raw binary.
This means that data which is encoded using a multiple-byte encoding scheme
(other than UTF-8) will not use optimal compression. It is therefore
recommended that data using these schemes be converted to UTF-8 using iconv
or similar before passing it to Zint.
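Where iconv is not at hand, the same conversion can be done with Python's standard codecs. This is a minimal sketch: the function names and file names are hypothetical, and the codec name must be one Python knows (for example `iso8859_15` or `big5`).

```python
# Minimal sketch: re-encode legacy-encoded input as UTF-8 before passing it
# to Zint, similar in effect to: iconv -f ISO-8859-15 -t UTF-8 in.txt > out.txt
# Function and file names are hypothetical.
from pathlib import Path


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode `data` from `source_encoding` and return it as UTF-8 bytes."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src: Path, dst: Path, source_encoding: str) -> None:
    # read_bytes/write_bytes avoid any newline translation
    dst.write_bytes(to_utf8(src.read_bytes(), source_encoding))


if __name__ == "__main__":
    convert_file(Path("legacy.txt"), Path("utf8.txt"), "iso8859_15")
```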
What I found out about how it works:
Example: encode the Euro sign in ISO-8859-15. The Euro sign has ISO-8859-15 code point A4h and UTF-8 representation e2 82 ac (these bytes are in the file utfeuro.txt).
zint.exe -b 71 --square --scale 10 --eci 17 -i utfeuro.txt
This gives an image with the correct ECI and a single encoded byte with character code A4h, which is correct for ISO-8859-15.
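The byte values in this example can be checked with Python's built-in codecs: the same character is three bytes in UTF-8 (what the input file holds) and one byte in ISO-8859-15 (what the symbol ends up containing).

```python
# Check the byte values quoted in the Euro example.
euro = "\u20ac"  # EURO SIGN

utf8_bytes = euro.encode("utf-8")        # content of the input file
latin9_bytes = euro.encode("iso8859_15")  # byte expected in the symbol

print(utf8_bytes.hex(" "))    # e2 82 ac
print(latin9_bytes.hex(" "))  # a4
```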
Now a case where the --binary switch is needed:
Take the Chinese character "常" (Unicode code point U+5E38).
Its Big5 encoding is 9c 75.
Thus, the file big5test.txt contains those two bytes.
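A small script can produce the two input files used in this experiment. This is a sketch: the Big5 bytes are derived with Python's `big5` codec instead of being typed in by hand, so they can be compared with the 9c 75 quoted above; the function names are hypothetical.

```python
# Sketch: build the raw-Big5 and UTF-8 input files for U+5E38.
from pathlib import Path

CHAR = "\u5e38"  # 常, Unicode code point 5E38


def experiment_inputs() -> dict[str, bytes]:
    """Map each input file name to the bytes it should contain."""
    return {
        "big5test.txt": CHAR.encode("big5"),  # raw Big5, used with --binary
        "utfbig5.txt": CHAR.encode("utf-8"),  # UTF-8 form of the same char
    }


def write_inputs(directory: Path) -> None:
    for name, data in experiment_inputs().items():
        # write_bytes adds no newline and does no encoding translation
        (directory / name).write_bytes(data)


if __name__ == "__main__":
    for name, data in experiment_inputs().items():
        print(name, data.hex(" "))
```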
> zint.exe -b 71 --square --scale 10 --eci 28 -i big5test.txt
Error 204: Invalid characters in input data
Try again with the --binary switch:
> zint.exe -b 71 --square --scale 10 --eci 28 --binary -i big5test.txt
This works and gives the correct result.
Try with UTF-8 data (utfbig5.txt contains e5 b8 b8, the UTF-8 encoding of the same character):
> zint.exe -b 71 --square --scale 10 --eci 28 -i utfbig5.txt
Error 204: Invalid characters in input data
So this does not work (as expected).
So, as a result, the ECIs marked with a star need explicit binary data (--binary, in the target encoding); the others need UTF-8 data.
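The rule found above can be captured in a small hypothetical helper that builds the Zint command line. It assumes the starred ECIs are exactly those marked in the table (20, 25, 28, 29, 30) and that the CLI is invoked as `zint` (`zint.exe` on Windows); the other options mirror the commands used in this thread.

```python
# Hypothetical helper: starred ECIs need raw bytes in the target encoding
# plus --binary; all other ECIs take UTF-8 input, which Zint converts.
STARRED_ECIS = {20, 25, 28, 29, 30}


def zint_args(eci: int, input_file: str) -> list[str]:
    """Build a Data Matrix (-b 71) command line like the ones above."""
    args = ["zint", "-b", "71", "--square", "--scale", "10", "--eci", str(eci)]
    if eci in STARRED_ECIS:
        args.append("--binary")  # input file already holds target-encoded bytes
    args += ["-i", input_file]
    return args


if __name__ == "__main__":
    print(" ".join(zint_args(28, "big5test.txt")))
```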
Here is my proposal for the descriptive text:
If your data contains non-ISO-Latin-1 characters, you may encode it using an ECI-aware symbology and an ECI value from the table below.
The ECI information is added to your code symbol as prefix data.
The ECI value may be specified with the --eci switch, followed by the value in the column "ECI Code".
The first row of the table (ECI code 3) is the default value and does not add any ECI information to the symbol.
The input data should be UTF-8 formatted. Zint automatically translates the data into the target encoding.
For the rows marked with a star (*), Zint does not perform this translation: the data must be supplied as binary data (--binary switch), already in the encoding given in the "Character Encoding Scheme" column.
--------------------------------------------------------
ECI Code | Character Encoding Scheme
--------------------------------------------------------
3 | ISO-8859-1 - Latin alphabet No. 1 (default)
4 | ISO-8859-2 - Latin alphabet No. 2
5 | ISO-8859-3 - Latin alphabet No. 3
6 | ISO-8859-4 - Latin alphabet No. 4
7 | ISO-8859-5 - Latin/Cyrillic alphabet
8 | ISO-8859-6 - Latin/Arabic alphabet
9 | ISO-8859-7 - Latin/Greek alphabet
10 | ISO-8859-8 - Latin/Hebrew alphabet
11 | ISO-8859-9 - Latin alphabet No. 5
12 | ISO-8859-10 - Latin alphabet No. 6
13 | ISO-8859-11 - Latin/Thai alphabet
15 | ISO-8859-13 - Latin alphabet No. 7
16 | ISO-8859-14 - Latin alphabet No. 8 (Celtic)
17 | ISO-8859-15 - Latin alphabet No. 9
18 | ISO-8859-16 - Latin alphabet No. 10
20 * | Shift-JIS (JISX 0208 and JISX 0201)
21 | Windows-1250 - Latin 2 (Central Europe)
22 | Windows-1251 - Cyrillic
23 | Windows-1252 - Latin 1
24 | Windows-1256 - Arabic
25 * | UCS-2 Unicode (High order byte first)
26 | Unicode (UTF-8)
27 | ISO-646:1991 7-bit character set
28 * | Big-5 (Taiwan) Chinese Character Set
29 * | GB (PRC) Chinese Character Set
30 * | Korean Character Set (KSX1001:1998)
--------------------------------------------------------
Two examples:
Ex1: The Euro sign should be encoded in ISO-8859-15.
The Euro sign has the ISO-8859-15 code point hex A4.
In UTF-8 it is encoded as the hex sequence e2 82 ac.
Those 3 bytes are contained in the file "utf8euro.txt".
This command will generate the corresponding code:
zint.exe -b 71 --square --scale 10 --eci 17 -i utf8euro.txt
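To script Ex1, a minimal sketch can write utf8euro.txt and call the CLI through `subprocess`. It assumes Zint is installed as `zint` (found on Windows as `zint.exe` via PATHEXT); if it is not on PATH, the sketch only prints the command.

```python
# Minimal sketch: create utf8euro.txt and run the Ex1 command.
import shutil
import subprocess
from pathlib import Path


def euro_command(input_file: str) -> list[str]:
    return ["zint", "-b", "71", "--square", "--scale", "10",
            "--eci", "17", "-i", input_file]


def main() -> None:
    path = Path("utf8euro.txt")
    path.write_bytes("\u20ac".encode("utf-8"))  # e2 82 ac, no trailing newline
    if shutil.which("zint") is None:
        print("zint not found on PATH; command would be:",
              " ".join(euro_command(str(path))))
        return
    # check=True raises CalledProcessError if Zint exits with an error
    subprocess.run(euro_command(str(path)), check=True)


if __name__ == "__main__":
    main()
```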
Ex2: The Chinese character with Unicode code point hex 5e38 should be encoded in Big5.
The Big5 ECI is marked with a star in the table above, so it requires input data in Big5 instead of UTF-8.
The Big5 representation of this character is the two hex bytes 9c 75 (contained in the file big5char.txt).
The generation command is:
zint.exe -b 71 --square --scale 10 --eci 28 --binary -i big5char.txt
Perhaps a few words could also be written about automatic ECI selection.
What do you think?
Harald
The attached zip contains the input files and the generated symbols.
This looks perfectly sensible. Please patch the documentation as appropriate and I will pass the changes to the website.
Robin.
Committed by [a083b3].
Thank you,
Harald