#185 Bad UTF-8 char count in pipe mode

v1.0 (example)
closed-fixed
None
5
2014-10-16
2011-02-11
No

Hi,

I am using Hunspell 1.2.11 (but 1.2.14 seems to show the same problem).

When run in pipe mode (-a option), hunspell seems to count each UTF-8 character as several single-byte characters instead of as a single multibyte character. That is causing problems under Emacs.

With the attached file,

$ cat test-utf8-shift.txt | hunspell -d en_US -i utf-8 -a | grep ^\&
& Feedbooks 8 24: Feed books, Feed-books, Feedbacks, Feedback, Feedbags, Studbooks, Feedbag, Letterbox

when it should be (showing aspell behavior for comparison)

$ cat test-utf8-shift.txt | aspell --encoding=utf-8 -d en_US -a | grep ^\&
& Feedbooks 6 22: Feed books, Feed-books, Feedback's, Feedbags, Feedbag's, Feedback

Note that the offending UTF-8 apostrophe is a three-byte multibyte character, which accounts for the two-position shift in the reported offset (24 instead of 22).
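To make the shift concrete, here is a minimal Python sketch of how a client could read the offset from a `&` reply line and convert a byte offset back to a character offset. The sample input line is hypothetical (the contents of the attached file are not reproduced here), the helper names `parse_miss` and `byte_to_char_offset` are made up for illustration, and offsets are assumed to be 0-based. It is not the patch that was eventually applied to Hunspell.

```python
def parse_miss(reply: str):
    """Parse an ispell-style '& word count offset: sug1, sug2' reply line."""
    head, _, tail = reply.partition(": ")
    _, word, count, offset = head.split(" ")
    return word, int(count), int(offset), tail.split(", ")


def byte_to_char_offset(line: str, byte_offset: int, encoding: str = "utf-8") -> int:
    """Convert a byte offset within the encoded line to a character offset.

    Raises UnicodeDecodeError if byte_offset falls inside a multibyte sequence.
    """
    raw = line.encode(encoding)
    return len(raw[:byte_offset].decode(encoding))


# Hypothetical input line: one U+2019 apostrophe (3 bytes in UTF-8)
# appears before the misspelled word.
line = "This isn\u2019t a word in: Feedbooks"

# Reply line as reported by hunspell in this ticket.
reply = ("& Feedbooks 8 24: Feed books, Feed-books, Feedbacks, Feedback, "
         "Feedbags, Studbooks, Feedbag, Letterbox")

word, count, offset, suggestions = parse_miss(reply)

char_offset = line.index(word)                                   # counted in characters
byte_offset = line.encode("utf-8").index(word.encode("utf-8"))   # counted in bytes
fixed_offset = byte_to_char_offset(line, offset)

print(word, count, offset)        # fields from the reply line
print(char_offset, byte_offset)   # character offset vs. byte offset
print(fixed_offset)               # reported offset converted to characters
```

With this sample line, the byte offset exceeds the character offset by exactly two, matching the difference between the hunspell and aspell outputs above.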

References:

http://debbugs.gnu.org/7781

Cheers,

Discussion

  • File used to reproduce the bug report

     
  • Those who want to compile a fix themselves can find patches (based on Hunspell 1.2.8 and Emacs 23) for spell-checking word-separated Thai in UTF-8 from Emacs at http://homepage.ntlworld.com/richard.wordingham/thai/hunspell-1.2.8-jrw1.1.zip. The problem above was just one of those encountered and resolved. The full list is:

    On Hunspell:

    Bad UTF-8 char count in pipe mode - ID: 3178449
    No Encoding of Word for Suggestions in Piped Mode (https://sourceforge.net/tracker/?func=detail&aid=3468022&group_id=143754&atid=756395)
    Multidictionary guesses dictionary for suggestions (https://sourceforge.net/tracker/?func=detail&aid=3468039&group_id=143754&atid=756395)
    Hunspell 1.2.8 Groups Thai TIS-620 Chars in Lower/Upper Case Pairs (https://bugs.launchpad.net/ubuntu/+source/hunspell/+bug/910452) (fixed in Release 1.2.14)

    On the Thai dictionary:

    th_TH Affix File Inadequate for Hunspell (https://bugs.launchpad.net/ubuntu/+source/openoffice.org-dictionaries/+bug/910447)

     
  • Peter Münster
    2014-04-20

    Hi,
    Will this be fixed?
    If yes, about when please?
    TIA, Peter

     
  • Reuben Thomas
    2014-09-25

    I can confirm that this patch is still needed with hunspell 1.3.3. It also applies cleanly, passes all the tests, and fixes the bug.

     
    • status: open --> closed-fixed
    • assigned_to: caolan mcnamara
    • Group: --> v1.0 (example)
     
  • ok, integrated it