I am running pocketsphinx_continuous on Windows. English recognition works fine,
but with Chinese recognition the output is garbled (unreadable characters). I
think this is because it is not using the correct character set. I tried
converting the result of ps_get_hyp() with WideCharToMultiByte(CP_ACP, ...) and
stepped through the code, but without success. Could anyone tell me which
character set the internal functions use for Chinese strings, and how to write
them correctly to standard output or to files? Many thanks.
Pocketsphinx outputs UTF-8 text. You can change your console code page to
UTF-8 using the
chcp 65001
command. Alternatively, convert the string to wide characters and then to the
encoding you need with two calls: MultiByteToWideChar (from CP_UTF8) followed by
WideCharToMultiByte.
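A minimal sketch of the wide-character approach, assuming a C program that receives the hypothesis string from ps_get_hyp() (whose exact signature differs between Pocketsphinx versions, so it is not called here). The helper names print_utf8 and utf8_to_utf16 are hypothetical. On a Windows console, print_utf8 decodes from CP_UTF8 with MultiByteToWideChar and writes with WriteConsoleW; when output is redirected to a file, or on other platforms, it writes the UTF-8 bytes unchanged. utf8_to_utf16 is only a portable illustration of the same UTF-8 to UTF-16 conversion, not a replacement for the Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: portable illustration of what
 * MultiByteToWideChar(CP_UTF8, ...) does, i.e. decode UTF-8 into UTF-16
 * code units. Returns the number of units written, or (size_t)-1 on a
 * malformed sequence or if `out` is too small.
 * Simplified: overlong forms and encoded surrogates are not rejected. */
static size_t utf8_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*s) {
        uint32_t cp;
        int extra;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;                 /* invalid lead byte */
        s++;

        for (int i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)            /* also catches a truncated tail (NUL) */
                return (size_t)-1;
            cp = (cp << 6) | (*s & 0x3F);
        }
        if (cp > 0x10FFFF)
            return (size_t)-1;

        if (cp >= 0x10000) {                    /* outside the BMP: surrogate pair */
            if (n + 2 > cap) return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 > cap) return (size_t)-1;
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}

/* Hypothetical helper: write a UTF-8 string (e.g. a decoder hypothesis)
 * to stdout. On a Windows console: UTF-8 -> UTF-16, then WriteConsoleW.
 * Otherwise (redirected to a file, or not Windows): raw UTF-8 bytes. */
static void print_utf8(const char *utf8)
{
#ifdef _WIN32
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode;
    int wlen = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);

    if (wlen > 0 && GetConsoleMode(h, &mode)) {  /* stdout is a real console */
        wchar_t *w = malloc((size_t)wlen * sizeof *w);
        if (w != NULL && MultiByteToWideChar(CP_UTF8, 0, utf8, -1, w, wlen) > 0) {
            DWORD written;
            fflush(stdout);                       /* keep ordering with stdio output */
            WriteConsoleW(h, w, (DWORD)(wlen - 1), &written, NULL); /* wlen counts the NUL */
            free(w);
            return;
        }
        free(w);
    }
#endif
    fputs(utf8, stdout);
}
```

A typical use would be calling print_utf8 on the hypothesis right after the decoder returns a non-NULL result. Keeping UTF-8 for redirected output is usually what you want for files; a further WideCharToMultiByte step is only needed if another program expects a legacy code page such as 936 (GBK).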
Thanks nshmyrev, I have managed to output the Chinese result to the console
using MultiByteToWideChar and WriteConsole. However, the recognition accuracy is
not very good. With the default language model file in the install folder, the
result was never correct (a few were close). After defining a Chinese version of
the 'goforward' grammar file and applying it, the result was 80% correct.
What is the best Chinese recognition accuracy you know of (perhaps with
high-end devices, a better accent, or a trained acoustic model)?
After defining a Chinese version 'goforward' grammar file and applying it, the
result was 80% correct.
How do you define a Chinese version of the 'goforward' grammar file? Thanks!
I have the same problem:
the recognition result is not too good. With the default language model file
in the install folder, the result wasn't correct even once
So please tell me how to define a Chinese version of the 'goforward' grammar
file. Thanks!
There is an example of a JSGF grammar here:
http://hackaday.com/2010/07/11/adding-speach-recognition-to-your-embedded-platform/
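A minimal sketch of what a Chinese command grammar in JSGF might look like. The grammar name and word list are hypothetical, not the original 'goforward' grammar, and the sketch assumes every word token appears in the dictionary passed with -dict (for example mandarin_notone.dic) and that the file is saved as UTF-8 to match that dictionary; check both against your model files.

```
#JSGF V1.0;

grammar goforward;

// Hypothetical commands: go forward / go back / turn left / turn right,
// optionally followed by a distance in metres (米).
public <command> = <action> [ <number> 米 ];

<action> = 前进 | 后退 | 左转 | 右转;
<number> = 一 | 二 | 三 | 四 | 五 | 十;
```

Keeping the grammar this small is what makes it more accurate than the general language model: the decoder only has to choose among a handful of sentences.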
Thanks, firstary!
My MSN: danielchendc@live.cn
I research ASR and TTS too; you can add me on MSN if you want. Thanks!
After defining a Chinese version 'goforward' grammar file and applying it, the
result was 80% correct.
===
I run pocketsphinx_continuous.exe on Windows and pass the arguments on the
command line as -hmm "hmm path" -lm "lm path" -dict "dic path", but the ASR
result is seldom right. How do you pass the grammar file parameter to the
program?
Command line:

    pocketsphinx_continuous.exe ^
      -hmm D:\PocketSphinx\pocketsphinx\model\hmm\zh\tdt_sc_8k ^
      -lm D:\PocketSphinx\pocketsphinx\model\lm\zh_CN\gigatdt.5000.DMP ^
      -dict D:\PocketSphinx\pocketsphinx\model\lm\zh_CN\mandarin_notone.dic
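To use a grammar instead of the statistical language model, pocketsphinx_continuous accepts a -jsgf option that points at a JSGF file; pass it in place of -lm. A sketch, assuming the same acoustic model and dictionary paths as in the command above; the grammar path D:\path\to\goforward.gram is hypothetical.

```
pocketsphinx_continuous.exe ^
  -hmm D:\PocketSphinx\pocketsphinx\model\hmm\zh\tdt_sc_8k ^
  -jsgf D:\path\to\goforward.gram ^
  -dict D:\PocketSphinx\pocketsphinx\model\lm\zh_CN\mandarin_notone.dic
```

The ^ characters are cmd.exe line continuations; the whole command can also be typed on one line.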
The best result depends on the task; since I don't know which task you are
targeting, it's hard to give good advice. Overall, I think you would benefit
from reading the tutorial first:
http://cmusphinx.sourceforge.net/wiki/tutorial