A barcode encoding library supporting over 50 symbologies.

#128 ECI documentation paragraph

Milestone: 1.0

Status: closed

Owner: Harald Oehlmann

Labels: None

Updated: 2019-03-18

Created: 2018-11-02

Creator: Harald Oehlmann

Private: No

Here is the paragraph documenting the ECI treatment in Zint.

If you are using data from file which is not UTF-8 formatted then you can 
specify the encoding by using the --eci= switch followed by the appropriate number 
from the table below. This procedure adds an ECI flag in the barcode data which 
tells the barcode reader to change character encoding.

--------------------------------------------------------
ECI Code  |  Character Encoding Scheme
--------------------------------------------------------
3         |  ISO-8859-1 - Latin alphabet No. 1 (default)
4         |  ISO-8859-2 - Latin alphabet No. 2
5         |  ISO-8859-3 - Latin alphabet No. 3
6         |  ISO-8859-4 - Latin alphabet No. 4
7         |  ISO-8859-5 - Latin/Cyrillic alphabet
8         |  ISO-8859-6 - Latin/Arabic alphabet
9         |  ISO-8859-7 - Latin/Greek alphabet
10        |  ISO-8859-8 - Latin/Hebrew alphabet
11        |  ISO-8859-9 - Latin alphabet No. 5
12        |  ISO-8859-10 - Latin alphabet No. 6
13        |  ISO-8859-11 - Latin/Thai alphabet
15        |  ISO-8859-13 - Latin alphabet No. 7
16        |  ISO-8859-14 - Latin alphabet No. 8 (Celtic)
17        |  ISO-8859-15 - Latin alphabet No. 9
18        |  ISO-8859-16 - Latin alphabet No. 10
20 *      |  Shift-JIS (JISX 0208 and JISX 0201)
21        |  Windows-1250 - Latin 2 (Central Europe)
22        |  Windows-1251 - Cyrillic
23        |  Windows-1252 - Latin 1
24        |  Windows-1256 - Arabic
25 *      |  UCS-2 Unicode (High order byte first)
26        |  Unicode (UTF-8)
27        |  ISO-646:1991 7-bit character set
28 *      |  Big-5 (Taiwan) Chinese Character Set
29 *      |  GB (PRC) Chinese Character Set
30 *      |  Korean Character Set (KSX1001:1998)
--------------------------------------------------------


* Note: When using the ECI flag Zint will treat all input data as raw binary.
This means that data which is encoded using a multiple-byte encoding scheme
(other than UTF-8) will not use optimal compression. It is therefore
recommended that data using these schemes be converted to UTF-8 using iconv
or similar before passing it to Zint.

Here is what I found out about how it works:

Input mode should be Unicode (no --binary switch).
Input should be in UTF-8 if there is no star in the table above.
Input should be in the given encoding if there is a star in the table above.

Example: encode the Euro sign in ISO8859-15. The Euro-Sign has ISO8859-15 codepoint A4h and utf-8 representation: e2 82 ac (which is in the file utfeuro.txt).

zint.exe -b 71 --square --scale 10 --eci 17 -i utfeuro.txt

This gives an image with the correct ECI and a single encoded byte with the value "A4". This is correct for ISO8859-15.
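
The conversion observed here can be checked outside Zint. The following is a minimal Python sketch, assuming Python's built-in "utf-8" and "iso8859-15" codecs as a stand-in for whatever conversion Zint performs internally; the function name transcode is hypothetical and not part of Zint.

```python
def transcode(utf8_bytes: bytes, target_codec: str) -> bytes:
    """Decode UTF-8 input and re-encode it in the target character set."""
    return utf8_bytes.decode("utf-8").encode(target_codec)


# Contents of utfeuro.txt as given in the ticket: e2 82 ac
euro_utf8 = bytes.fromhex("e2 82 ac")

# ISO-8859-15 corresponds to ECI 17 in the table above.
euro_latin9 = transcode(euro_utf8, "iso8859-15")
print(euro_latin9.hex())  # a4
```

The single byte A4 matches the value seen in the generated symbol.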

Adding the --binary switch will:

still encode the ECI
change the data in a way I do not understand. I got hex 42 3d 4c 03 2A

Take the Chinese character "常". This is unicode codepoint 5e38.
The big5 encoding is: 9c 75
Thus, the file big5test.txt contains those two bytes.

> zint.exe -b 71 --square --scale 10 --eci 28 -i big5test.txt
Error 204: Invalid characters in input data

Try with --binary switch:

>zint.exe -b 71 --square --scale 10 --eci 28 --binary -i big5test.txt

This works well and gives the right result.

Try with UTF-8 data (utfbig5.txt contains e5 b8 b8, the UTF-8 encoding of the same character):

>zint.exe -b 71 --square --scale 10 --eci 28 -i utfbig5.txt
Error 204: Invalid characters in input data

So, this does not work (as expected).
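
The UTF-8 bytes used for utfbig5.txt can be derived from the Unicode code point. A small Python sketch (the variable names are only illustrative):

```python
# Unicode code point of the test character, as stated in the ticket.
char = chr(0x5E38)

# Encode to UTF-8 and format the bytes as space-separated hex.
utf8_bytes = char.encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8_bytes))  # e5 b8 b8
```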

As a result, the rows marked with a star need explicit binary data in the target encoding, and the other rows need UTF-8 data.

Here is my proposal for the descriptive text:

If your data contains characters outside ISO Latin-1, you may encode it using an ECI-aware symbology and an ECI value from the table below.
The ECI information is added to your code symbol as prefix data.

The ECI-Value may be specified with the --eci switch, followed by the value in the column "ECI Code".

The first row of the table (ECI code 3) is the default value and does not cause any ECI information to be included in the symbol.

The input data should be UTF-8 formatted. Zint automatically translates the data into the target encoding.
For the rows marked with a star (*), Zint does not perform this translation. The data must be specified as binary data (--binary switch) and must already be in the encoding given by the "Character Encoding Scheme" column.

--------------------------------------------------------
ECI Code  |  Character Encoding Scheme
--------------------------------------------------------
3         |  ISO-8859-1 - Latin alphabet No. 1 (default)
4         |  ISO-8859-2 - Latin alphabet No. 2
5         |  ISO-8859-3 - Latin alphabet No. 3
6         |  ISO-8859-4 - Latin alphabet No. 4
7         |  ISO-8859-5 - Latin/Cyrillic alphabet
8         |  ISO-8859-6 - Latin/Arabic alphabet
9         |  ISO-8859-7 - Latin/Greek alphabet
10        |  ISO-8859-8 - Latin/Hebrew alphabet
11        |  ISO-8859-9 - Latin alphabet No. 5
12        |  ISO-8859-10 - Latin alphabet No. 6
13        |  ISO-8859-11 - Latin/Thai alphabet
15        |  ISO-8859-13 - Latin alphabet No. 7
16        |  ISO-8859-14 - Latin alphabet No. 8 (Celtic)
17        |  ISO-8859-15 - Latin alphabet No. 9
18        |  ISO-8859-16 - Latin alphabet No. 10
20 *      |  Shift-JIS (JISX 0208 and JISX 0201)
21        |  Windows-1250 - Latin 2 (Central Europe)
22        |  Windows-1251 - Cyrillic
23        |  Windows-1252 - Latin 1
24        |  Windows-1256 - Arabic
25 *      |  UCS-2 Unicode (High order byte first)
26        |  Unicode (UTF-8)
27        |  ISO-646:1991 7-bit character set
28 *      |  Big-5 (Taiwan) Chinese Character Set
29 *      |  GB (PRC) Chinese Character Set
30 *      |  Korean Character Set (KSX1001:1998)
--------------------------------------------------------
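
The rule for starred and unstarred rows can be sketched as a small helper that assembles the command line. This is only an illustration in Python: the helper build_zint_args and the constant STARRED_ECI are hypothetical names, and the switches used (-b, --eci, --binary, -i) are the ones that appear in this ticket.

```python
from typing import List

# ECI codes marked with a star in the table: the input must already be in
# the target encoding and has to be passed with --binary.
STARRED_ECI = {20, 25, 28, 29, 30}


def build_zint_args(eci: int, input_file: str, symbology: int = 71) -> List[str]:
    """Assemble a zint command line following the rule described above."""
    args = ["zint", "-b", str(symbology), "--eci", str(eci)]
    if eci in STARRED_ECI:
        # Starred rows: no automatic translation, so raw binary input is required.
        args.append("--binary")
    args += ["-i", input_file]
    return args


print(" ".join(build_zint_args(17, "utf8euro.txt")))
print(" ".join(build_zint_args(28, "big5char.txt")))
```

The first call produces a command without --binary (UTF-8 input), the second one adds --binary because ECI 28 is a starred row.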

Two examples:
Ex1: The Euro sign should be encoded in ISO-8859-15.
The Euro sign has the ISO-8859-15 codepoint hex A4.
It is encoded in UTF-8 as the hex sequence: e2 82 ac
Those 3 bytes are contained in the file "utf8euro.txt".
This command will generate the corresponding symbol:
zint.exe -b 71 --square --scale 10 --eci 17 -i utf8euro.txt

Ex2: The Chinese character with Unicode codepoint hex 5e38 should be encoded in big5 encoding.
The big5 ECI is marked in the table above as requiring input data in big5 instead of UTF-8.
The big5 representation of this character is the two hex bytes: 9c 75 (contained in the file big5char.txt).
The generation command is:
zint.exe -b 71 --square --scale 10 --eci 28 --binary -i big5char.txt
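
The two input files can be produced from the hex sequences given in the examples. A minimal Python sketch, assuming the byte values stated in this ticket; the helper write_hex_file is hypothetical, and the files are written to a temporary directory rather than next to zint.exe.

```python
import tempfile
from pathlib import Path


def write_hex_file(path: Path, hex_bytes: str) -> bytes:
    """Write the bytes described by a hex string (spaces allowed) to a file."""
    data = bytes.fromhex(hex_bytes)
    path.write_bytes(data)
    return data


out_dir = Path(tempfile.mkdtemp())

# Ex1: UTF-8 input, Zint is expected to translate it (no --binary).
euro = write_hex_file(out_dir / "utf8euro.txt", "e2 82 ac")

# Ex2: byte values as given in the ticket, passed with --binary.
big5 = write_hex_file(out_dir / "big5char.txt", "9c 75")

print(len(euro), len(big5))  # 3 2
```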

Perhaps a few words could also be added about automatic ECI selection.

What do you think?
Harald

The attached zip contains the input files and the generated symbols.

1 Attachments

ecitest.zip

Discussion

Robin Stuart - 2019-03-15

This looks perfectly sensible. Please patch the documentation as appropriate and I will pass the changes to the website.

Robin.

Harald Oehlmann - 2019-03-18

status: open --> closed

assigned_to: Harald Oehlmann

Harald Oehlmann - 2019-03-18

Committed by [a083b3].
Thank you,
Harald

Related

Commit: [a083b3]
