#1393 Extract Text Content script to export to UTF-8

Milestone: 4.1
Status: closed-fixed
Labels: scripts (9)
Priority: 5
Updated: 2018-08-06
Created: 2018-06-21
Creator: Gabix
Private: No

Using OmegaT 4.1.5 on Windows 7 / Russian locale / JRE 1.8.0_171. When running the Extract Text Content script (extract_text_content.groovy), I found the following issue: the output files are encoded in cp1251 (i.e. the system default encoding), so any character that cp1251 cannot represent is lost in the output.

Steps to reproduce:
1. Download the attached archive and unpack the project.
2. Open it in OmegaT on Windows with a Russian (or Belarusian, Ukrainian) locale.
3. Run the script. It runs without errors.
4. Open project_source_content.txt and project_target_content.txt. The latter displays correctly. However, project_source_content.txt shows question marks instead of French letters with diacritics such as é, à, ç.
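
The character loss in step 4 can be reproduced outside OmegaT. The following is a minimal sketch, assuming the JRE provides the windows-1251 charset (it does on typical full installations); the sample strings are made up for illustration and are not taken from the attached project. It encodes text to cp1251 and decodes it again, which is effectively what happens when a file is written in that encoding and reopened.

```groovy
import java.nio.charset.Charset

// Assumption: windows-1251 (cp1251) is available in this JRE.
def cp1251 = Charset.forName('windows-1251')

// Hypothetical French sample: "é à ç", written with Unicode escapes so the
// result does not depend on the encoding of this script file.
def sample = '\u00e9 \u00e0 \u00e7'

// Characters that cp1251 cannot represent are replaced with '?' on encoding.
def roundTrip = new String(sample.getBytes(cp1251), cp1251)
println roundTrip   // ? ? ?

// Cyrillic text ("Привет") survives the same round trip unchanged.
def russian = '\u041f\u0440\u0438\u0432\u0435\u0442'
assert new String(russian.getBytes(cp1251), cp1251) == russian
```

Cyrillic surviving the round trip would explain why a file containing only Russian text still looks correct.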

So the suggestion is to explicitly force UTF-8 (or UTF-16) for the script output.

P.S.
I have no Groovy skills at all, so I searched the web and tried changing the lines

srcTextFile << source + "\n";
and
tgtTextFile << target + "\n";
to, respectively,
srcTextFile.withWriter('UTF-8') << source + "\n";
and
tgtTextFile.withWriter('UTF-8') << target + "\n";

However, this results in empty output files, and I couldn't figure out anything else to try.
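
For reference, Groovy's File.withWriter takes a closure as its last argument and does the writing inside that closure; the attempt above passes only the charset name. The writer also starts the file from scratch each time it is opened, so it is not suited to being called once per segment. Below is a minimal sketch of two ways to write UTF-8 line by line. The file name and sample strings are hypothetical, and this is not the script's actual code.

```groovy
// Hypothetical output file and sample segments, for illustration only.
def outFile = File.createTempFile('source_content', '.txt')
outFile.deleteOnExit()
def segments = ['\u00e9t\u00e9', 'fran\u00e7ais']   // "été", "français"

// Option 1: open one UTF-8 writer and write every line inside the closure.
outFile.withWriter('UTF-8') { writer ->
    segments.each { writer << it << '\n' }
}
assert outFile.getText('UTF-8') == '\u00e9t\u00e9\nfran\u00e7ais\n'

// Option 2: append line by line, naming the charset on each call.
outFile.delete()
segments.each { outFile.append(it + '\n', 'UTF-8') }
assert outFile.getText('UTF-8') == '\u00e9t\u00e9\nfran\u00e7ais\n'
```

Option 2 stays closest to the original `file << text` lines, since each call adds to the end of the file.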

1 Attachments

Discussion

  • Didier Briel

    Didier Briel - 2018-06-21
    • assigned_to: Didier Briel
    • Group: future --> 4.1
     
  • Didier Briel

    Didier Briel - 2018-06-21

    Do not worry, I'll make the changes.

    Didier

     
  • Didier Briel

    Didier Briel - 2018-06-21
    • summary: Extract Text Content script to xport to UTF-8 --> Extract Text Content script to export to UTF-8
     
  • Didier Briel

    Didier Briel - 2018-06-22
    • status: open --> open-fixed
     
  • Didier Briel

    Didier Briel - 2018-06-22

    Implemented in SVN (/trunk, [r10427]).

    In addition to saving in UTF-8, I have used the system line separator (instead of a hardcoded \n), so that line breaks display correctly in Windows Notepad, for instance.

    Didier

     

    Related

    Commit: [r10427]
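
The two changes described in the comment above (UTF-8 output and a system line separator) could look roughly like the sketch below. This is not the code from r10427: the temporary file and sample segments are hypothetical, and only the variable name tgtTextFile is borrowed from the ticket.

```groovy
// "\r\n" on Windows, "\n" on Linux and macOS.
def eol = System.lineSeparator()

// Hypothetical stand-in for the script's target output file.
def tgtTextFile = File.createTempFile('project_target_content', '.txt')
tgtTextFile.deleteOnExit()

// Append each segment as UTF-8, terminated by the platform line separator.
['first segment', 'second segment'].each { target ->
    tgtTextFile.append(target + eol, 'UTF-8')
}

assert tgtTextFile.getText('UTF-8') == "first segment${eol}second segment${eol}"
```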

  • Didier Briel

    Didier Briel - 2018-08-06
    • status: open-fixed --> closed-fixed
     
  • Didier Briel

    Didier Briel - 2018-08-06

    Closed in the released version 4.1.5 update 1 of OmegaT.

    Didier

     