
#43 Error with texindy and -C utf8

v2.5
closed
None
3
2014-04-20
2013-02-22
U_Fischer
No

I have a problem with the texindy script of TeXLive 2012 (and also with an older version).

xindy release: 2.4
texindy.pl script version: 1.11
xindy.pl script version: 1.16

My idx file is generated by xelatex and is a genuine UTF-8 file containing non-ASCII characters (no LICR).

When I run

I:\Z-Test>texindy -M mystyle -L german-duden -C utf8 test.idx

I get
...
Error in line 2:
(require "tex/inputenc/utf8.xdy")
ERROR: Could not find file "tex/inputenc/utf8.xdy" !

(And yes there is no utf8.xdy in this folder)

The culprit is line 463 of texindy.pl:

push (@opt, '-M', "tex/inputenc/$codepage") if $codepage;

On the other hand xindy
I:\test>xindy -M texindy -M mystyle -L german-duden -C utf8 test.idx

works fine (it does not try to load utf8.xdy, only the language-related utf8 files).
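A minimal sketch of a pre-flight check for the failure above, assuming the xindy module directory is known and passed in by the caller. The function name `inputenc_module_exists` and its `moddir` argument are made up for illustration; nothing like this is known to exist in texindy.pl.

```shell
# Hypothetical pre-flight check: does tex/inputenc/<codepage>.xdy exist
# under a given xindy module directory? The directory is supplied by the
# caller; this sketch does not try to locate it.
inputenc_module_exists() {
    moddir=$1
    codepage=$2
    [ -f "$moddir/tex/inputenc/$codepage.xdy" ]
}
```

A caller could run `inputenc_module_exists "$moddir" utf8` before adding `-M tex/inputenc/utf8` and fall back to a plain xindy call when the check fails.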

The problem has also been mentioned on tex.sx:
http://tex.stackexchange.com/questions/58010/sorting-index-entries-with-accented-words/58016#58016

I also think that it is the problem described in the closed bug report 3463826.

There is a bug somewhere, but I don't know whether the bug is that utf8.xdy is missing (and that xindy does not use it) or that texindy tries to find it.

--
Ulrike Fischer
http://www.troubleshooting-tex.de/

Discussion

  • Joachim Schrod

    Joachim Schrod - 2013-02-22

    Ulrike,

    Can you please try

    xindy -M texindy -M mystyle -L german-duden -C utf8 test.idx

    and see if that works?

    texindy is supposed to be used with LICR-encoded raw index files. The documentation (which was written before xelatex was available) should emphasize that more. Alternatively, texindy should work with xelatex files out of the box, but I don't know yet how to detect them automatically.

    Cheers, Joachim
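One rough heuristic for the detection question, assuming that an LICR-encoded raw index consists of 7-bit ASCII only: any byte above 0x7F then suggests the idx file was written by a Unicode engine. The name `has_non_ascii` is hypothetical, and the check does not verify that the file is valid UTF-8.

```shell
# Hypothetical heuristic: exit 0 if the file contains any byte above 0x7F.
has_non_ascii() {
    # Delete every 7-bit ASCII byte, then count the bytes that are left.
    n=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' ')
    [ "$n" -gt 0 ]
}
```

For example, `has_non_ascii test.idx && echo "raw 8-bit input, consider calling xindy directly"` prints the hint only for files with non-ASCII bytes.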

     
  • U_Fischer

    U_Fischer - 2013-02-22

    Joachim,

    as I already wrote, the xindy call works fine.

    But reading texindy.pdf carefully, I found a call that works:

    texindy -M mystyle -L german-duden -I omega test.idx

    But I do find the option handling of texindy confusing. If it assumes LICR, why does it accept the -C switch at all? And why does texindy need the merge rules in the /tex/inputenc files while "xindy -M texindy" does not? Also, I think texindy should handle the "-C utf8" case in a more sensible way. It seems quite natural to use this switch when handling an index file created by one of the new Unicode engines - much more natural than "-I omega" ;-).

    Ulrike Fischer

     
  • Joachim Schrod

    Joachim Schrod - 2013-02-23
    • assigned_to: nobody --> jschrod
    • priority: 5 --> 3
     
  • Joachim Schrod

    Joachim Schrod - 2013-02-23

    > as I already wrote the xindy call works fine.

    Sorry, I didn't read your report thoroughly enough.

    > why does texindy need the merge rules in the /tex/inputenc-files but "xindy -M texindy" not?

    Because that's its whole reason for being. The difference between "xindy -M texindy" and the texindy command is LICR decoding. The command was created because it was too difficult to explain the module one would otherwise have to write. (I did this when I wrote the indexing chapter in the LaTeX Companion. Before that, there was no texindy command.)

    > If it assumes LICR, why does it accept the -C switch at all?
    > I think texindy should handle "-C utf8" case in a more sensible way.

    The -C switch defines the result of LICR decoding and thus the encoding used as the base for the subsequent markup normalization and sorting phases. There is no LICR decoding defined for the LaTeX input encoding utf8. If it were defined, it would also mean something different and would *not* mean support for xelatex or luatex. "Sensible" in this context would mean support for "\usepackage[utf8]{inputenc}", which is not what you want...

    > It seems quite natural to use this switch when handling index file created by one of
    > the new unicode engines - much more natural dann "-I omega" ;-).

    "-I omega" supports ^^^^ encoding instead of LICR encoding. That it was implemented as a texindy option, was probably an error, in hindsight.

    Really, the currently supported and recommended way to use xindy with "one of the new unicode engines" is to call "xindy -M texindy -C utf8 ...". This should be spelled out explicitly in texindy's man page. (The actual problem is that the input encoding, and not Unicode, is used for sorting at all, but that design decision was made decades ago, when 8-bit TeX was new and we didn't think about Unicode at all.)

    I might add input encoding options "xelatex" and "lualatex" nevertheless, to support people who look for the command texindy first, instead of using xindy directly. Maybe for TeX-Live 2013.

     
  • Joachim Schrod

    Joachim Schrod - 2014-04-20

    The texindy man page now explains explicitly that texindy must not be used with XeLaTeX and LuaLaTeX; xindy must be used instead.

    For classic (pdf)LaTeX users with Latin alphabets, a utf8.xdy was added so that UTF-8 letter group names can appear in the output. This will not work with non-Latin alphabets. But since most authors using such alphabets work with XeLaTeX or LuaLaTeX anyhow, they have to use xindy instead.

    Btw, for anybody searching and reading this old ticket: The correct xindy call is not just "xindy -C utf8 -M texindy", but
    xindy -C utf8 -M texindy -M page-ranges
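The rule of thumb from this ticket can be sketched as a small helper that prints the suggested command for a given engine. The function name `index_cmd` and the engine keywords are assumptions for illustration; the helper only prints a command line and runs nothing. Options such as `-L german-duden` or `-M mystyle` would still have to be added by hand.

```shell
# Hypothetical helper: print the indexing command suggested in this ticket
# for a given engine. It only prints; nothing is executed.
index_cmd() {
    engine=$1
    idx=$2
    case $engine in
        xelatex|lualatex)
            # Raw UTF-8 idx file: call xindy directly, as recommended above.
            printf 'xindy -C utf8 -M texindy -M page-ranges %s\n' "$idx"
            ;;
        latex|pdflatex)
            # LICR-encoded idx file: texindy does the LICR decoding.
            printf 'texindy %s\n' "$idx"
            ;;
        *)
            printf 'unknown engine: %s\n' "$engine" >&2
            return 1
            ;;
    esac
}
```

Calling `index_cmd xelatex test.idx` prints the xindy call shown above with `test.idx` appended. File names containing spaces are not quoted in the printed line.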

     
  • Joachim Schrod

    Joachim Schrod - 2014-04-20
    • status: open --> closed
    • Group: --> v2.5
     
