#21 Problem with PyHnjHyphenator and umlauts

Status: closed-wont-fix
Owner: nobody
Labels: None
Priority: 5
Updated: 2009-06-26
Created: 2009-06-19
Creator: Roberto Alsina
Private: No

When trying to hyphenate some German text using PyHnjHyphenator, I get this:

  File "/usr/lib/python2.6/site-packages/wordaxe/PyHnjHyphenator.py", line 107, in hyph
    hword = HyphenatedWord(aWord, hyphenations=self.zerlegeWort(aWord))
  File "/usr/lib/python2.6/site-packages/wordaxe/PyHnjHyphenator.py", line 95, in zerlegeWort
    codes = self.hnj.getCodes(zusgWort.lower())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128)

I think the preferred hyphenator for German is DCW, but I can't use it (see my previous bug report).
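The error in the traceback can be reproduced without wordaxe. The sketch below assumes that `getCodes` ends up needing a byte string, so the unicode word gets encoded with the default ASCII codec; that is a guess based on the error message, not something confirmed by the wordaxe source. The sample word and the ISO-8859-1 encoding in the workaround are illustrative assumptions as well.

```python
# -*- coding: utf-8 -*-
# Sample word "fuer" with u-umlaut: u'\xfc' sits at position 1, as in the traceback.
word = u"f\xfcr"

# What an implicit unicode -> bytes conversion with the ASCII codec attempts.
error = None
try:
    word.lower().encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# Hypothetical workaround: encode explicitly before calling a byte-oriented API.
# ISO-8859-1 is an assumption; the right choice depends on the dictionary file.
encoded = word.lower().encode("iso-8859-1")
```

Whether an explicit encode like this would be enough for PyHnjHyphenator is not established here; it only shows where the `'ascii' codec` message comes from.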

Discussion

  • H. von Bargen
    2009-06-21

    Roberto, as mentioned in the documentation, you should use PyHyphenHyphenator instead of PyHnjHyphenator: PyHnjHyphenator has problems with non-ASCII characters such as umlauts and might even crash the Python interpreter.

    Regarding PyHyphenHyphenator vs. DCWHyphenator:
    PyHyphenHyphenator is *way* faster, and the results should be as good as in OpenOffice, since the same C library and dictionary files are used.
    DCWHyphenator avoids unwanted, ambiguous or misleading hyphenations: if the algorithm splits a word, the word is known and the hyphenation should be correct. On the other hand, it is much slower than PyHyphenHyphenator or PyHnjHyphenator, and many words will not be hyphenated at all, simply because the list of known words (DEhyph.py) is *very* limited. However, you can easily add your own words to that list.
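The trade-off described above (only known words are split, everything else is left alone, and the list can be extended) can be pictured with a toy sketch. This is not DCWHyphenator's actual algorithm and not the real DEhyph.py format; `KNOWN_WORDS`, `hyphenate_known` and `add_word` are hypothetical names made up for illustration.

```python
# Hypothetical word list (NOT the DEhyph.py format): word -> list of syllables.
KNOWN_WORDS = {
    u"b\xfccher": [u"b\xfc", u"cher"],  # "buecher" with u-umlaut
    u"silbe": [u"sil", u"be"],
}

def hyphenate_known(word, known=KNOWN_WORDS):
    """Return the (lower-cased) syllables of a known word, or None if unknown."""
    return known.get(word.lower())

def add_word(syllables, known=KNOWN_WORDS):
    """Register a new word, given as its list of syllables."""
    parts = [s.lower() for s in syllables]
    known[u"".join(parts)] = parts

print(hyphenate_known(u"B\xfccher"))  # known word: syllables are returned
print(hyphenate_known(u"Umlaut"))     # unknown word: left unhyphenated
add_word([u"Um", u"laut"])
print(hyphenate_known(u"Umlaut"))     # known after adding it to the list
```

A lookup of this kind never produces a wrong split for a word it does not know, which is the point made above, but its coverage is only as good as the word list.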

     
  • H. von Bargen
    2009-06-21

    • status: open --> open-wont-fix
     
  • Roberto Alsina
    2009-06-22

    OK, I'll remove the PyHnj usage from rst2pdf. I'll leave the DCW/PyHyphen choice to the Germans in the group, since Spanish is very easy to hyphenate :-)

     
  • H. von Bargen
    2009-06-26

    • status: open-wont-fix --> closed-wont-fix
     
  • H. von Bargen
    2009-06-26

    The solution is to use PyHyphenHyphenator instead of PyHnjHyphenator.