I searched through the forums but found only one comment, which says that you can train Unicode languages by converting the characters to some other ASCII representation:
http://sourceforge.net/forum/message.php?msg_id=3126838
I converted my transcription, phone list, and dic to UTF-8, but it still did not work. So I guess it is true that the Sphinx training module currently does not support Unicode, right?
> http://sourceforge.net/forum/message.php?msg_id=3126838
I don't think you understood it properly. Only the phoneset must be ASCII; the rest can be UTF-8.
> I converted my transcription, phone list, and dic to UTF-8, but it still did not work
'did not work' is not a good description of the problem. It's not possible to help unless you provide more information.
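A minimal Python sketch of a pre-flight check for the rule above: phone names must be pure ASCII, while the dictionary and transcription only need to be valid UTF-8. It assumes a phone list with one phone per line, and the file names in the usage line are hypothetical. This is an illustration, not a tool shipped with SphinxTrain.

```python
from __future__ import annotations

import sys
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_ascii_phones(phone_bytes: bytes) -> list[str]:
    """Return phone names (assumed one per line) that contain non-ASCII characters."""
    text = phone_bytes.decode("utf-8")
    phones = [line.strip() for line in text.splitlines() if line.strip()]
    return [p for p in phones if not p.isascii()]


def check_files(phone_path: str, dict_path: str, trans_path: str) -> list[str]:
    """Collect problems; an empty list means the files satisfy the ASCII/UTF-8 rule."""
    problems = []
    # Dictionary and transcription may contain any UTF-8 text.
    for path in (dict_path, trans_path):
        if not is_valid_utf8(Path(path).read_bytes()):
            problems.append(f"{path}: not valid UTF-8")
    # The phone list must additionally be pure ASCII.
    phone_data = Path(phone_path).read_bytes()
    if not is_valid_utf8(phone_data):
        problems.append(f"{phone_path}: not valid UTF-8")
    else:
        for phone in non_ascii_phones(phone_data):
            problems.append(f"{phone_path}: non-ASCII phone {phone!r}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical names): python check_files.py my.phone my.dic my.transcription
    for problem in check_files(*sys.argv[1:4]):
        print(problem)
```

Non-ASCII words in the dictionary are fine under this rule; only the phone symbols are restricted.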
Found the problem now...
The log file said that there were extra spaces in the phone list, but I could not see them in vi. Then I noticed that vi showed the file in [dos] format. :D
I ran dos2unix on both the dic and phone files, and everything is working now. :-)
I created those files on Windows, and I was looking for the problem in the wrong places because I thought Sphinx did not support UTF-8. :)
Thank you for confirming.
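Files written with Windows tools end each line with CR LF (`\r\n`). The trailing `\r` is invisible in most editors, which is presumably what the log reported as extra space. A minimal Python sketch of the same CRLF-to-LF conversion dos2unix performs in this case (the script and file names are hypothetical):

```python
import sys
from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """Return True if the data contains any Windows (CRLF) line endings."""
    return b"\r\n" in data


def to_unix_newlines(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF, leaving other bytes untouched."""
    return data.replace(b"\r\n", b"\n")


def fix_file(path: str) -> bool:
    """Rewrite the file in place with LF endings; return True if it was changed."""
    p = Path(path)
    data = p.read_bytes()
    if not has_crlf(data):
        return False
    p.write_bytes(to_unix_newlines(data))
    return True


if __name__ == "__main__":
    # Usage (hypothetical names): python fix_newlines.py my.dic my.phone
    for name in sys.argv[1:]:
        changed = fix_file(name)
        print(f"{name}: {'converted' if changed else 'already LF'}")
```

Working on raw bytes is safe for UTF-8 files: the bytes 0x0D and 0x0A never occur inside a multibyte UTF-8 sequence, so the conversion cannot corrupt non-ASCII words.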