This is a mitigation for [bugs:#1046].
Detecting the encoding of TSV glossary files is inherently unreliable, because encoding detection algorithms can only guess from the bytes in the file.
To reduce misdetections, OmegaT now tries to determine the encoding of a TSV glossary file (other than a file with the .utf8 extension) by inspecting its first line for a "magic comment".
A magic comment is a comment line containing settings of the form
-*- foo: bar; biz: baz -*-
which means "set foo to bar and set biz to baz".
OmegaT recognizes only one setting, coding: <charset>, where <charset> is any charset name recognized by Java's Charset#forName, such as utf-8.
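The Java sketch below illustrates how a first line could be checked for a magic comment and its coding setting resolved. It is not OmegaT's implementation: the class MagicComment and its methods are hypothetical, and it assumes that a comment line begins with # and that a missing or unrecognized coding should yield no result so the caller can fall back to ordinary encoding detection. Only Charset#forName comes from the description above.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a sketch of magic-comment parsing, not OmegaT code.
public class MagicComment {
    // Captures the text between the first "-*-" and the next "-*-" on the line.
    private static final Pattern SETTINGS = Pattern.compile("-\\*-(.*?)-\\*-");

    // Parses "foo: bar; biz: baz" into {foo=bar, biz=baz}.
    // Returns an empty map if the line has no -*- ... -*- block.
    static Map<String, String> parseSettings(String line) {
        Map<String, String> settings = new LinkedHashMap<>();
        Matcher m = SETTINGS.matcher(line);
        if (!m.find()) {
            return settings;
        }
        for (String entry : m.group(1).split(";")) {
            int colon = entry.indexOf(':');
            if (colon < 0) {
                continue; // skip fragments that are not "key: value"
            }
            String key = entry.substring(0, colon).trim();
            String value = entry.substring(colon + 1).trim();
            if (!key.isEmpty()) {
                settings.put(key, value);
            }
        }
        return settings;
    }

    // Returns the charset named by "coding" in a "#" comment line, or empty
    // so that the caller can fall back to ordinary encoding detection.
    static Optional<Charset> detectCoding(String firstLine) {
        if (firstLine == null || !firstLine.startsWith("#")) {
            return Optional.empty(); // assumption: comment lines start with '#'
        }
        String name = parseSettings(firstLine).get("coding");
        if (name == null || name.isEmpty()) {
            return Optional.empty(); // no coding setting present
        }
        try {
            return Optional.of(Charset.forName(name));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return Optional.empty(); // name not recognized by Charset#forName
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "# -*- coding: utf-8 -*-",
            "# my notes -*- mode: text; coding: latin1 -*-",
            "# -*- coding: no-such-charset -*-",
            "source\ttarget\tcomment",
        };
        for (String s : samples) {
            // Print the canonical charset name, or "(none)" if nothing was found.
            System.out.println(detectCoding(s).map(Charset::name).orElse("(none)"));
        }
    }
}
```

Run as is, main prints UTF-8, ISO-8859-1, (none) and (none): latin1 is an alias that Charset#forName resolves to ISO-8859-1, an unknown name produces no result, and an ordinary TSV line is not a comment at all. Returning an empty Optional rather than throwing keeps the "fall back to detection" path explicit for the caller.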
When OmegaT creates a writable glossary file, it automatically writes a magic comment setting the coding to utf-8 as the first line. Because the comment marker recognized in glossary files is #, that line is:
# -*- coding: utf-8 -*-
Note that you can include arbitrary content between the # and the first -*-.
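As a rough illustration of a glossary file that starts with this line, the Java sketch below writes a UTF-8 TSV file with the magic comment first, followed by one tab-separated line (source term, target term, optional comment) per entry. NewGlossary and create are hypothetical names chosen for this example; this is not how OmegaT itself writes glossaries.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical example: writes a UTF-8 TSV glossary headed by a magic comment.
public class NewGlossary {
    // The first line described above for new writable glossary files.
    static final String MAGIC_COMMENT = "# -*- coding: utf-8 -*-";

    // Writes the magic comment, then one tab-separated line per entry.
    static void create(Path file, List<List<String>> entries) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(MAGIC_COMMENT);
            out.newLine();
            for (List<String> entry : entries) {
                out.write(String.join("\t", entry));
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("glossary", ".txt");
        create(file, List.of(
                List.of("screen", "\u00e9cran"),          // non-ASCII target term
                List.of("file", "fichier", "noun")));     // optional comment column
        // Read back with the same charset the magic comment declares.
        Files.readAllLines(file, StandardCharsets.UTF_8).forEach(System.out::println);
        Files.delete(file);
    }
}
```

The important design point is that the declared coding and the charset actually used by the writer are the same constant, so the comment can never disagree with the bytes on disk.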
OmegaT does not add a magic comment to existing glossary files, so they are still affected by [bugs:#1046]. Users who want the benefit of this change should add an appropriate magic comment as the first line themselves.
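Adding the line in a text editor is usually enough, but it can also be scripted. The Java sketch below assumes you already know the file's actual encoding (passed as actual) and rewrites the file in that same encoding with a matching magic comment prepended. AddMagicComment is a hypothetical helper, and its check for an existing comment (a first line starting with # and containing -*-) is only a simple heuristic.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prepends a magic comment to an existing glossary file.
public class AddMagicComment {
    // Prepends "# -*- coding: <name> -*-" to a file already encoded in `actual`.
    // Returns false (and leaves the file alone) if line 1 already looks like one.
    static boolean addMagicComment(Path file, Charset actual) throws IOException {
        // For strict charsets such as UTF-8 this throws MalformedInputException on
        // invalid bytes; single-byte charsets such as ISO-8859-1 accept any bytes,
        // so `actual` must be chosen carefully.
        List<String> lines = Files.readAllLines(file, actual);
        if (!lines.isEmpty() && lines.get(0).startsWith("#") && lines.get(0).contains("-*-")) {
            return false;
        }
        List<String> out = new ArrayList<>();
        out.add("# -*- coding: " + actual.name() + " -*-");
        out.addAll(lines);
        // Rewrites in the same charset; line endings become the platform default.
        Files.write(file, out, actual);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("legacy-glossary", ".txt");
        Files.write(file, List.of("caf\u00e9\tcoffee"), StandardCharsets.ISO_8859_1);
        System.out.println(addMagicComment(file, StandardCharsets.ISO_8859_1)); // adds the line
        System.out.println(addMagicComment(file, StandardCharsets.ISO_8859_1)); // already present
        Files.readAllLines(file, StandardCharsets.ISO_8859_1).forEach(System.out::println);
        Files.delete(file);
    }
}
```

Back up the glossary before running anything like this: a magic comment that names the wrong charset would make OmegaT misread the file just as a misdetection would.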
This is implemented in [85bd5c].
Released in OmegaT 5.6.0.