Hi,
Does sphinx3 (the official 3.7 release) support Latin-1?
I am using the French acoustic model for recognition. Loading the dictionary fails whenever it hits a character outside ASCII (a byte value above 127). I traced the problem to the function str2words(char *line, char **ptr, int32 max_ptr) in the strfuncs file, at the line "while (line[i] && !isspace((int)line[i]))". When line[i] is one of these characters, (int)line[i] is negative and isspace fails: once the argument is converted to unsigned int it becomes a very large number, which trips the assertion "_ASSERTE((unsigned)(c + 1) <= 256);". If I change the line to "while (line[i] && !isspace((unsigned char)line[i]))", the dictionary part seems to work, but I don't know whether the rest (FSG parsing, search, ...) is OK.
Could a maintainer tell me? Much appreciated!
Best Wishes
Chris
Thanks for your report, Chris. This was reported earlier as
https://sourceforge.net/tracker/index.php?func=detail&aid=1236322&group_id=1904&atid=101904
but it was not fixed properly. So the answer to your question is: yes, but with bugs.
I've just committed a fix to svn, but we don't know whether something else is broken. Please report any problems you find. It seems we also don't support UTF-8 properly, partly for efficiency reasons. It would be nice to get UTF-8 support one day, though. That said, I'm currently using UTF-8 for Russian and haven't noticed anything serious.
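For anyone hitting the same assertion, the cast discussed in this thread can be illustrated with a minimal sketch. Note that split_words below is a hypothetical helper written for this example, not the actual sphinx3 str2words, and the fix committed to svn may look different. The C standard requires the argument of isspace to be representable as unsigned char (or equal to EOF), so on platforms where char is signed, a Latin-1 byte such as 0xE9 has to be cast before the call.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper for illustration only (not the sphinx3 str2words):
 * splits `line` in place into whitespace-separated words and stores up to
 * `max_words` pointers in `words`. Returns the number of words found, or
 * -1 if the line holds more than `max_words` words. */
int split_words(char *line, char **words, int max_words)
{
    int n = 0;
    size_t i = 0;

    for (;;) {
        /* Skip whitespace. The cast to unsigned char keeps bytes >= 0x80
         * (e.g. Latin-1 0xE9) inside the range isspace() is defined for. */
        while (line[i] && isspace((unsigned char)line[i]))
            i++;
        if (!line[i])
            break;
        if (n >= max_words)
            return -1;
        words[n++] = &line[i];

        /* Scan to the end of the current word. */
        while (line[i] && !isspace((unsigned char)line[i]))
            i++;
        if (!line[i])
            break;
        line[i++] = '\0';   /* terminate the word in place */
    }
    return n;
}
```

Calling split_words on a Latin-1 dictionary line such as "\xE9t\xE9\tEY T EY" should yield four words without tripping the assertion. The same cast is needed for every other ctype call (isalpha, toupper, ...) that may see bytes from a non-ASCII dictionary or grammar file.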