Does sphinx3 support latin-1?

Help
chris
2008-01-11
2012-09-22
  • chris

    chris - 2008-01-11

    Hi,
    I want to know whether sphinx3 (3.7 official release) supports latin-1.

    I am using the French acoustic model for recognition. When the dictionary is parsed and a special character that doesn't exist in ASCII (value 128 or above) is read, loading the dictionary fails. I traced it to the function str2words(char *line, char **ptr, int32 max_ptr) in strfuncs, at the line "while (line[i] && !isspace((int)line[i]))". When line[i] is a special character, (int)line[i] is negative, so the isspace call fails: once the parameter is converted to unsigned int the result is a very large number, and it trips the assertion "_ASSERTE((unsigned)(c + 1) <= 256);".

    If I change the line "while (line[i] && !isspace((int)line[i]))" to "while (line[i] && !isspace((unsigned char)line[i]))", the dictionary part seems OK, but I don't know whether the rest (FSG parsing, search, ...) is OK too.

    Can any maintainer tell me? I would really appreciate it!

    Best Wishes
    Chris
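
The cast described above can be illustrated with a small, self-contained sketch. It is not the sphinx3 source: split_words is a hypothetical stand-in for str2words, written only to show why the argument to isspace should go through unsigned char when a line contains latin-1 bytes. It assumes the default "C" locale and a platform where plain char may be signed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (NOT the sphinx3 str2words): splits `line` in place
 * on whitespace and stores up to max_ptr word pointers in ptr[].
 * Returns the number of words, or -1 if the line has more than max_ptr. */
static int split_words(char *line, char **ptr, int max_ptr)
{
    int i = 0, n = 0;

    assert(line != NULL && ptr != NULL);

    for (;;) {
        /* Skip whitespace. The cast to unsigned char keeps bytes
         * 0x80..0xFF inside the range isspace() is defined for;
         * passing a negative char value is undefined behaviour. */
        while (line[i] && isspace((unsigned char)line[i]))
            i++;
        if (!line[i])
            break;
        if (n >= max_ptr)
            return -1;          /* too many words for ptr[] */
        ptr[n++] = &line[i];    /* start of the next word */

        /* Scan to the end of the word, again with the safe cast. */
        while (line[i] && !isspace((unsigned char)line[i]))
            i++;
        if (!line[i])
            break;
        line[i++] = '\0';       /* terminate the word in place */
    }
    return n;
}
```

With a dictionary-style line such as "caf\xe9 k a f e" (0xE9 is "é" in latin-1), the sketch returns 5 words and the first word still contains the 0xE9 byte. Whether the same cast is enough in the other parsing code paths is a separate question that this sketch does not answer.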

     
    • Nickolay V. Shmyrev

      Thanks for your report, Chris. It was reported earlier as

      https://sourceforge.net/tracker/index.php?func=detail&aid=1236322&group_id=1904&atid=101904

      but was not fixed properly. So the answer to your question is - yes, but with bugs.

      I've just committed a fix to svn, but we don't know whether anything else is broken. Please report any problems you find. It seems we also don't support UTF-8 properly, partly for efficiency reasons. It would be nice to get UTF-8 support too one day, although I'm currently using UTF-8 for Russian and haven't noticed anything serious.

       
