

#9 Problem with unicode quote

Status: closed-out-of-date
Owner: nobody
Labels: None
Priority: 5
Updated: 2008-12-21
Created: 2008-09-30
Creator: Anonymous
Private: No

To make sf happy: ralsina@netmanagers.com.ar

I am not sure if it's a problem with wordaxe or pyhyphen, but this word fails to hyphenate and gives a backtrace:

raven’s

That's a u'\u2019' there between the n and the s.
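A quick Python 3 check (the report itself is from Python 2.5) shows which character this is and why a Latin-1 codec cannot represent it: Latin-1 covers only code points 0 to 255, while U+2019 is 8217. This is only an illustration of the character, not code from wordaxe or PyHyphen.

```python
import unicodedata

ch = "\u2019"  # the quote between "raven" and "s"

print(unicodedata.name(ch))  # RIGHT SINGLE QUOTATION MARK
print(ord(ch))               # 8217
print(ord(ch) < 256)         # False: outside the Latin-1 range
```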

Here's the end of the backtrace:

  File "/usr/lib/python2.5/site-packages/wordaxe/hyphen.py", line 316, in hyphenate
    hword = self.i_hyphenate(aWord)
  File "/usr/lib/python2.5/site-packages/wordaxe/plugins/PyHyphenHyphenator.py", line 107, in i_hyphenate
    return ExplicitHyphenator.i_hyphenate_derived(self, aWord)
  File "/usr/lib/python2.5/site-packages/wordaxe/ExplicitHyphenator.py", line 137, in i_hyphenate_derived
    hword = self.stripper.apply_stripped(word, self.hyph)
  File "/usr/lib/python2.5/site-packages/wordaxe/BaseHyphenator.py", line 61, in apply_stripped
    result = func(base, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/wordaxe/plugins/PyHyphenHyphenator.py", line 100, in hyph
    hword = HyphenatedWord(aWord, hyphenations=self.zerlegeWort(aWord))
  File "/usr/lib/python2.5/site-packages/wordaxe/plugins/PyHyphenHyphenator.py", line 64, in zerlegeWort
    for left, right in self.hnj.pairs(zusgWort):
  File "/usr/lib/python2.5/site-packages/hyphen/__init__.py", line 199, in pairs
    return self.__hyphenate__.apply(word, mode)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2019' in position 5: ordinal not in range(256)
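The last frame suggests that PyHyphen's `apply` encodes the word to the dictionary's Latin-1 encoding. Assuming that, the same error can be reproduced in isolation with a plain `str.encode` call (Python 3 here, so the message shows '\u2019' rather than u'\u2019'). The helper name `latin1_error` is hypothetical.

```python
def latin1_error(word: str):
    """Return the UnicodeEncodeError raised when encoding `word` as
    Latin-1, or None if the word encodes cleanly."""
    try:
        word.encode("latin-1")
    except UnicodeEncodeError as err:
        return err
    return None


err = latin1_error("raven\u2019s")
print(err)                # same message as the last line of the traceback
print(err.start, err.end) # 5 6: the quote sits at index 5, as reported
```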

Discussion

  • H. von Bargen
    2008-10-12

    This seems to be a bug in pyhyphen. Probably there is not much I can do about it.
    Trent W. Buck already reported a similar bug for the pyhyphen package:
    http://code.google.com/p/pyhyphen/issues/detail?id=4

     
  • H. von Bargen
    2008-12-21

    • status: open --> closed-out-of-date
     
  • H. von Bargen
    2008-12-21

    It seems like this was a bug in pyhyphen.
    From http://pypi.python.org/pypi/PyHyphen/ :
    ...
    new in version 0.9:
    ...
    fixed important bug in 'pairs' method that could cause a unicode error if 'word' was not encodable to the dictionary's encoding. In the latter case, the new version returns an empty list (consistent with other cases where the word is not hyphenable).

    So I assume this really was a pyhyphen bug and am closing this wordaxe bug item.
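The behaviour described in the PyHyphen 0.9 changelog (return an empty list instead of raising when the word cannot be encoded to the dictionary's encoding) can be sketched as a small guard. This is not PyHyphen's actual code: `safe_pairs` and the stand-in splitter `fake_pairs` are hypothetical, used only to show the idea.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]


def safe_pairs(word: str, encoding: str,
               pairs: Callable[[str], List[Pair]]) -> List[Pair]:
    """Return hyphenation pairs for `word`, or [] if the word cannot be
    encoded in the dictionary's `encoding` (hypothetical guard)."""
    try:
        word.encode(encoding)
    except UnicodeEncodeError:
        # Treat an unencodable word like any other non-hyphenable word.
        return []
    return pairs(word)


def fake_pairs(word: str) -> List[Pair]:
    # Hypothetical stand-in for a real hyphenator: every split point
    # between characters. For demonstration only.
    return [(word[:i], word[i:]) for i in range(1, len(word))]


print(safe_pairs("raven\u2019s", "latin-1", fake_pairs))  # []
print(safe_pairs("ravens", "latin-1", fake_pairs)[:2])    # [('r', 'avens'), ('ra', 'vens')]
```

With this kind of guard the caller in wordaxe simply sees "no hyphenation points" for raven’s instead of a crash.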