I wrote an FSG file like this:
FSG_BEGIN
N 7
S 0
F 4
T 0 1 0.9 go
T 1 2 0.5 forward
T 1 5 0.5 backward
T 2 3 0.5 ten
T 3 4 0.5 meters
T 5 6 0.5 two
T 6 4 0.5 meters
FSG_END
For the audio data from the demo package (goforward.16k) it works well, but with the audio files I recorded myself it is very bad. When I say 'go forward ten meters', sometimes it recognizes only some of the words (not the whole sentence); when I say 'go backward two meters', none of the words are recognized. Does anybody know about this?
Also, I am looking for detailed documentation on how to write an FSG file. Does anybody have any?
Have you tried the existing tidigits FSG from sphinx3?
http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/trunk/sphinx3/model/hmm/tidigits/
Can you paste the complete output from the sphinx2 run you are doing?
Oh, thank you very much, I will give it a try.
Anonymous - 2007-06-01
Chris -- Which Sphinx are you trying to use with your FSG? How is it configured, and what acoustic model are you using?
I do not have any experience with writing Sphinx FSGs, but I've read the very minimal documentation in http://cmusphinx.sourceforge.net/sphinx2/doc/sphinx2.html#sec_fsgfmt , and I looked at the examples that Nicolay cited, and your example looks reasonable, and I think it should work -- it permits the two sentences GO FORWARD TEN METERS and GO BACKWARD TWO METERS. (AFAIK, the usual practice is for the transition probabilities for all paths leaving a node to sum to 1.0, but your values should work as well.)
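For illustration only (a hand-normalized sketch, not something the format requires), the same grammar with each state's outgoing probabilities summing to 1.0 would be:
FSG_BEGIN
N 7
S 0
F 4
T 0 1 1.0 go
T 1 2 0.5 forward
T 1 5 0.5 backward
T 2 3 1.0 ten
T 3 4 1.0 meters
T 5 6 1.0 two
T 6 4 1.0 meters
FSG_END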
But let's step back from the question of FSG format and look at the observations you have reported.
1. The demo audio file goforward.16k works -- I assume that means it produces a correct recognition of GO FORWARD TEN METERS.
2. But when you tried it with audio files that you recorded (both 'go forward ten meters' and 'go backward two meters'), it doesn't work, or doesn't work well.
#1 suggests that the demo file is consistent with your Sphinx configuration, the acoustic model, and the FSG, since the utterance was successfully recognized. #2 says that your audio files didn't work, so we should inquire as to how your audio files differ from the demo file.
First of all, are they the same sample rate and format as the demo file goforward.16k? (As I recall, that's 16K samples/sec, raw format.)
If your files are consistent, then is the recorded audio clear and undistorted (have you listened to them)?
Is the speech in them consistent with the acoustic model? For example, if the acoustic model was trained from American English, is your speech the same (I don't know if you are a native speaker of English)?
There's not much anyone can say about your poor recognitions without seeing the complete output from the recognizer. Can you post this? And what's the Sphinx configuration?
cheers,
jerry
Hi Jerry,
Thank you very much for replying.
My sphinx2 works successfully now. I run sphinx2_batch to recognize, using parameters based on the sphinx documentation, and I changed some of them. The parameter list is:
/usr/local/sphinx2/bin/sphinx2_batch \
-adcin TRUE \
-adcext 16k \
-ctlfn data/file.ctl \
-ctloffset 0 \
-datadir audio \
-agcmax TRUE \
-langwt 6.5 \
-fwdflatlw 8.5 \
-rescorelw 9.5 \
-ugwt 0.5 \
-fillpen 1e-12 \
-silpen 0.002 \
-inspen 0.65 \
-top 4 \
-topsenfrm 4 \
-topsenthresh -70000 \
-beam 0 \
-npbeam 0 \
-lpbeam 0 \
-lponlybeam 0 \
-nwbeam 0 \
-fwdflat TRUE \
-fwdflatbeam 1e-08 \
-fwdflatnwbeam 0.0001 \
-fsgfn data/eval.fsg \
-dictfn data/eval.dict \
-ndictfn data/model/hub4/sphinx_2_format/noisedict \
-phnfn data/model/hub4/sphinx_2_format/phone \
-mapfn data/model/hub4/sphinx_2_format/map \
-hmmdir data/model/hub4/sphinx_2_format \
-hmmdirlist data/model/hub4/sphinx_2_format \
-sendumpfn data/model/hub4/sphinx_2_format/sendump \
-cbdir data/model/hub4/sphinx_2_format \
-8bsen TRUE \
-bestpath TRUE \
-fsgbfs TRUE \
-fsgusealtpron FALSE \
-fsgusefiller FALSE \
-compressprior FALSE \
-compress FALSE \
-compallsen TRUE \
-latsize 5000 \
-normmean TRUE \
-maxhmmpf
Maybe it is not the best one; I am still tuning the performance. The sentence error rate is now about 10% to 20%.
I use the hub4 acoustic model. I even wrote a Python script to generate the FSG file, and built a small dictionary that contains just the words appearing in the grammar (see the sketch below).
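A minimal sketch of such a generator (hypothetical, not my actual script; it emits one linear chain of states per sentence instead of sharing common prefixes the way the hand-written grammar above does):
def write_fsg(sentences, path):
    # state 0 = start state, state 1 = shared final state
    transitions = []
    next_state = 2
    for sentence in sentences:
        tokens = sentence.split()
        src = 0
        for i, word in enumerate(tokens):
            last = (i == len(tokens) - 1)
            dst = 1 if last else next_state
            if not last:
                next_state += 1
            # the start state's outgoing probabilities split evenly;
            # every other state has a single outgoing transition
            prob = 1.0 / len(sentences) if src == 0 else 1.0
            transitions.append((src, dst, prob, word))
            src = dst
    with open(path, "w") as f:
        f.write("FSG_BEGIN\n")
        f.write("N %d\n" % next_state)   # total number of states
        f.write("S 0\n")                 # start state
        f.write("F 1\n")                 # final state
        for src, dst, prob, word in transitions:
            f.write("T %d %d %.2f %s\n" % (src, dst, prob, word))
        f.write("FSG_END\n")
    # the distinct words are exactly what the small dictionary needs
    return sorted({w for s in sentences for w in s.split()})

words = write_fsg(["go forward ten meters", "go backward two meters"], "eval.fsg")
The dictionary then only needs entries for those words, with pronunciations copied from cmudict (e.g. GO G OW, METERS M IY T ER Z, and so on).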
I am not a native English speaker (I am from China); I think that is why it is not very accurate.
I think I should try to train the model myself.
By the way, I use sox to convert the audio files, like this:
sox input_file.wav -s -w -r 16000 -t sph output_file.16k
I hope this helps others who have the same problem as me.
If you think something is wrong with my configuration, could you tell me?
Thank you very much again!
Regards
Chris
Anonymous - 2007-06-09
For non-native English speech with an American English acoustic model, a 10-20% sentence error rate may be quite reasonable, depending on the size of your grammar.
Changing the sample rate to 16 kHz to match the model is the correct thing to do, assuming that the original rate is greater than 16 kHz. (This will not be satisfactory if the original rate is less.) I believe that Sox has 3 rate-changing methods, but I do not remember which is the best one. Be sure to use the best method.
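For example (assuming the effect names of the sox versions of that era; check your sox manual), you select a method explicitly by appending the effect name to the command line:
sox input_file.wav -s -w -r 16000 -t sph output_file.16k polyphase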
Sorry, I have not used Sphinx2 for several years, so I cannot judge your configuration.
cheers,
jerry
I have met another problem.
I want to use the FSG with sphinx3, but it seems it does not work. I use batch mode, and the parameter list is:
-mdef /tmp/sphinx3test/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/hub4opensrc.6000.mdef \
-fdict /tmp/sphinx3test/share/sphinx3/model/lm/an4/filler.dict \
-fsg data/eval.fsg \
-ctl data/eval.ctl \
-dict data/eval.dict \
-cepdir audio \
-mean /tmp/sphinx3test/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means \
-var /tmp/sphinx3test/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances \
-mixw /tmp/sphinx3test/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/mixture_weights \
-tmat /tmp/sphinx3test/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/transition_matrices \
-maxwpf 1 \
-beam 1e-40 \
-pbeam 1e-30 \
-wbeam 1e-20 \
-maxhmmpf 1500
The last information I got is:
INFO: srch.c(447): Search Initialization.
INFO: srch.c(724): lmset is NULL and vithist is NULL in op_mode OP_TST_DECODE, wrong operation mode?
FATAL_ERROR: "kb.c", line 352: Search initialization failed. Forced exit
Does lmset mean the language model? Can sphinx3 not use an FSG? The documentation said I can use one. Does anybody know about this?
You should also use an option to set the decoder mode: -mode 2
To be more precise:
http://cmusphinx.sourceforge.net/sphinx3/doc/s3_description.html
All of the decoding routines are accessible under the executable sphinx3_decode using the -op_mode option (-op_mode 2: FSG, -op_mode 3: Flat Lexicon Decoder, -op_mode 4: Tree Lexicon Decoder). The original flat-lexicon decoder interface still exists for backward compatibility purposes.
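In other words, the failing command line above gains one extra option, e.g. (the flag is written -mode in the first reply and -op_mode in the documentation; use whichever spelling your build accepts):
sphinx3_decode \
-op_mode 2 \
-fsg data/eval.fsg \
... (the remaining parameters as listed above)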
Thank you very much, I will try it now.
Thank you, it is working now.
Can you tell me how to generate an mfc file from a wav file? I have searched the whole morning and still have no clue. Thank you very much.
Use the sphinx_fe program. The decoder should read wav files perfectly well, though.
I just use raw2cep to convert sphinx2 sph files to mfc files. I will try sphinx_fe; where can I find it?
Thanks a lot.
It's a part of sphinxbase - sphinxbase/src/sphinx_fe
OK, I will try. Thank you very very much.
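For reference, a typical sphinx_fe invocation for this setup might look like the following (a sketch assuming the sphinxbase option names of that era; the exact flags may differ, so check sphinx_fe -help):
sphinx_fe -samprate 16000 -c data/eval.ctl -di audio -ei wav -do audio -eo mfc -mswav yes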