#9 Problem with unicode quote

Status: closed-out-of-date
Owner: nobody
Labels: None
Priority: 5
Updated: 2008-12-21
Created: 2008-09-30
Creator: Anonymous
Private: No

To make sf happy: ralsina@netmanagers.com.ar

I am not sure if it's a problem with wordaxe or pyhyphen, but this word fails to hyphenate and gives a backtrace:

raven’s

That's a u'\u2019' there between the n and the s.

Here's the end of the backtrace:

  File "/usr/lib/python2.5/site-packages/wordaxe/hyphen.py", line 316, in hyphenate
    hword = self.i_hyphenate(aWord)
  File "/usr/lib/python2.5/site-packages/wordaxe/plugins/PyHyphenHyphenator.py", line 107, in i_hyphenate
    return ExplicitHyphenator.i_hyphenate_derived(self, aWord)
  File "/usr/lib/python2.5/site-packages/wordaxe/ExplicitHyphenator.py", line 137, in i_hyphenate_derived
    hword = self.stripper.apply_stripped(word, self.hyph)
  File "/usr/lib/python2.5/site-packages/wordaxe/BaseHyphenator.py", line 61, in apply_stripped
    result = func(base, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/wordaxe/plugins/PyHyphenHyphenator.py", line 100, in hyph
    hword = HyphenatedWord(aWord, hyphenations=self.zerlegeWort(aWord))
  File "/usr/lib/python2.5/site-packages/wordaxe/plugins/PyHyphenHyphenator.py", line 64, in zerlegeWort
    for left, right in self.hnj.pairs(zusgWort):
  File "/usr/lib/python2.5/site-packages/hyphen/__init__.py", line 199, in pairs
    return self.__hyphenate__.apply(word, mode)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2019' in position 5: ordinal not in range(256)
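
The error at the bottom of the backtrace can probably be reproduced without wordaxe or pyhyphen installed. A minimal sketch, assuming the hyphenation dictionary's encoding is Latin-1 as the error message suggests. It is written for Python 3, whereas the report came from Python 2.5, and the helper name `first_unencodable` is made up for illustration:

```python
def first_unencodable(word, encoding="latin-1"):
    """Return the index of the first character that cannot be encoded.

    Returns None when the whole word fits into the target encoding.
    The default encoding is an assumption taken from the error message.
    """
    try:
        word.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first offending character.
        return exc.start
    return None


word = "raven\u2019s"  # RIGHT SINGLE QUOTATION MARK between the n and the s
position = first_unencodable(word)
print(position, repr(word[position]))
```

With a plain ASCII apostrophe ("raven's") the helper returns None, which fits the observation that only the typographic quote triggers the failure.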

Discussion

  • H. von Bargen - 2008-12-21
    • status: open --> closed-out-of-date

  • H. von Bargen - 2008-12-21

    It seems like this was a bug in pyhyphen.
    From http://pypi.python.org/pypi/PyHyphen/ :
    ...
    new in version 0.9:
    ...
    fixed important bug in 'pairs' method that could cause a unicode error if 'word' was not encodable to the dictionary's encoding. In the latter case, the new version returns an empty list (consistent with other cases where the word is not hyphenable).

    So I assume this actually was a pyhyphen bug, and I am closing this wordaxe bug item.
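
The behaviour described in the changelog entry could be sketched roughly as follows. This is not PyHyphen's actual code: `safe_pairs` and `toy_pairs` are hypothetical names, and the Latin-1 default is an assumption taken from the error message in the report.

```python
def safe_pairs(word, pairs_func, encoding="latin-1"):
    """Return hyphenation pairs, or an empty list for unencodable words.

    pairs_func stands in for the real hyphenator call; encoding stands
    in for the dictionary's encoding. Both are illustrative assumptions.
    """
    try:
        word.encode(encoding)
    except UnicodeEncodeError:
        # Treat the word like any other word that cannot be hyphenated.
        return []
    return pairs_func(word)


def toy_pairs(word):
    """Toy stand-in for a hyphenator: split the word once, in the middle."""
    mid = len(word) // 2
    return [(word[:mid], word[mid:])]


print(safe_pairs("raven\u2019s", toy_pairs))
print(safe_pairs("raven", toy_pairs))
```

Returning an empty list keeps the caller's loop over the pairs working unchanged, so a word such as the one in this report would simply be left unhyphenated instead of raising.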

     
