How to output Chinese in Windows

Forum: Help

Creator: Youzhi Yu

Created: 2011-01-06

Updated: 2012-09-22

Youzhi Yu - 2011-01-06

I am running pocketsphinx_continuous on Windows. English recognition works
fine, but with Chinese recognition the output is garbled (unreadable
characters). I think it is because it is not using the correct
character set. I tried to convert the result of ps_get_hyp() with
WideCharToMultiByte(CP_ACP,...) and stepped through the code, but didn't
succeed. Could anyone tell me what character set the internal functions use
for the Chinese string, and how to write it to standard output or to files
correctly? Many thanks.

Nickolay V. Shmyrev - 2011-01-06

Pocketsphinx outputs UTF-8 characters. You can change your console code page
to UTF-8 using the

chcp 65001

command. You can also convert to wide characters and then to the encoding you
need with a pair of calls: MultiByteToWideChar followed by WideCharToMultiByte.
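
A minimal sketch of that two-step conversion, assuming the input is a NUL-terminated UTF-8 string such as the one returned by ps_get_hyp(). The helper name utf8_to_codepage is hypothetical and not part of Pocketsphinx. The non-Windows branch exists only so the sketch compiles elsewhere; it returns an unconverted copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not part of Pocketsphinx): convert a NUL-terminated
 * UTF-8 string to the given Windows code page, e.g. 0 (CP_ACP) for the
 * system ANSI code page or 936 for GBK. Returns a malloc'ed string that
 * the caller must free, or NULL on failure. */
char *utf8_to_codepage(const char *utf8, unsigned int codepage)
{
#ifdef _WIN32
    wchar_t *wide;
    char *out;
    int wlen, olen;

    assert(utf8 != NULL);

    /* Step 1: UTF-8 -> UTF-16. A first call with no buffer returns the
     * required length in wide characters, including the final NUL. */
    wlen = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (wlen <= 0)
        return NULL;
    wide = malloc((size_t)wlen * sizeof(*wide));
    if (wide == NULL)
        return NULL;
    if (MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, wlen) <= 0) {
        free(wide);
        return NULL;
    }

    /* Step 2: UTF-16 -> target code page, again sizing the buffer first. */
    olen = WideCharToMultiByte(codepage, 0, wide, -1, NULL, 0, NULL, NULL);
    out = olen > 0 ? malloc((size_t)olen) : NULL;
    if (out != NULL &&
        WideCharToMultiByte(codepage, 0, wide, -1, out, olen, NULL, NULL) <= 0) {
        free(out);
        out = NULL;
    }
    free(wide);
    return out;
#else
    /* Outside Windows there is no code page to target; return a plain
     * copy so the helper still compiles and behaves predictably. */
    size_t n;
    char *out;

    assert(utf8 != NULL);
    (void)codepage;
    n = strlen(utf8) + 1;
    out = malloc(n);
    if (out != NULL)
        memcpy(out, utf8, n);
    return out;
#endif
}
```

The returned buffer could then be written to a file with fputs() and released with free(). Characters that the target code page cannot represent are replaced by a default character, so keeping files in UTF-8 may be the safer choice where that is an option.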

Youzhi Yu - 2011-01-07

Thanks nshmyrev, I have managed to output the Chinese result to the console by
using MultiByteToWideChar and WriteConsole. But the recognition result is not
very good. With the default language model file in the install folder, the
result wasn't correct even once (a few were close). After defining a Chinese
version of the 'goforward' grammar file and applying it, the result was 80%
correct.

What's the best result for Chinese recognition to your knowledge (maybe with
high-end devices, a better accent, or a trained acoustic model)?
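
For reference, the console output approach (MultiByteToWideChar plus WriteConsole) can be sketched roughly as below. This is not the poster's actual code. The helper name print_utf8 is hypothetical, and the fallback to plain byte output for redirected output and for non-Windows builds is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: write a NUL-terminated UTF-8 string (such as the
 * text returned by ps_get_hyp()) to standard output.
 * Returns 0 on success, -1 on failure. */
int print_utf8(const char *utf8)
{
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode, written;
    wchar_t *wide;
    int wlen, ok;

    assert(utf8 != NULL);

    /* GetConsoleMode fails when output is redirected to a file or pipe;
     * in that case write the UTF-8 bytes unchanged. */
    if (!GetConsoleMode(out, &mode))
        return fputs(utf8, stdout) == EOF ? -1 : 0;

    /* Size the buffer (the count includes the final NUL), then convert
     * UTF-8 to UTF-16. */
    wlen = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (wlen <= 0)
        return -1;
    wide = malloc((size_t)wlen * sizeof(*wide));
    if (wide == NULL)
        return -1;
    if (MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, wlen) <= 0) {
        free(wide);
        return -1;
    }

    /* Flush pending stdio output so ordering is preserved, then hand the
     * UTF-16 text to the console without the terminating NUL. */
    fflush(stdout);
    ok = WriteConsoleW(out, wide, (DWORD)(wlen - 1), &written, NULL);
    free(wide);
    return ok ? 0 : -1;
#else
    assert(utf8 != NULL);
    /* Most non-Windows terminals expect UTF-8, so no conversion is done. */
    return fputs(utf8, stdout) == EOF ? -1 : 0;
#endif
}
```

Because WriteConsoleW takes UTF-16 directly, this route does not depend on the active console code page, although the console font still needs to contain the Chinese glyphs.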

daniel chen - 2011-01-07

After defining a Chinese version 'goforward' grammar file and applying it, the
result was 80% correct.

How do I define a Chinese version of the 'goforward' grammar file? Thanks!

daniel chen - 2011-01-07

I see the same thing:
the recognition result is not too good. With the default language model file
in the install folder, the result wasn't correct even once

So please tell me how to define a Chinese version of the 'goforward' grammar
file. Thanks!

Youzhi Yu - 2011-01-07

There is an example of a JSGF grammar here:
http://hackaday.com/2010/07/11/adding-speach-recognition-to-your-embedded-platform/

daniel chen - 2011-01-07

Thanks, first of all!

My MSN: danielchendc@live.cn

I research ASR and TTS too; you can add me on MSN if you want. Thanks!

daniel chen - 2011-01-07

After defining a Chinese version 'goforward' grammar file and applying it, the
result was 80% correct.
===
I run pocketsphinx_continuous.exe on Windows and pass the arguments on the
command line as -hmm "hmm path" -lm "lm path" -dict "dic path", but the ASR
result is seldom right. How do you pass the grammar file to the program?

daniel chen - 2011-01-07

Command line:
pocketsphinx_continuous.exe -hmm
D:\PocketSphinx\pocketsphinx\model\hmm\zh\tdt_sc_8k
-lm D:\PocketSphinx\pocketsphinx\model\lm\zh_CN\gigatdt.5000.DMP
-dict D:\PocketSphinx\pocketsphinx\model\lm\zh_CN\mandarin_notone.dic

Nickolay V. Shmyrev - 2011-01-07

What's the best result of Chinese recognition to your knowledge (maybe with
high-end devices, better accent, trained acoustic model )?

The best result depends on the task. Since I don't know which task you are
targeting, it's hard to give good advice. Overall, I think you should read
the tutorial first:

http://cmusphinx.sourceforge.net/wiki/tutorial
