Hi guys,
I'm using sphinx3 with the HUB4 acoustic model you provide.
I was wondering whether sphinx3 would work with the wsj and wsj_8kHz models you ship with sphinx4.
Since they include the .mdef file, the variances, means, mixture_weights, and transition matrices,
I thought it could work.
If not:
- Is there another continuous model available with fewer words than HUB4? 64000 words is more than I need for my application. A model of around 5000 words would be just fine.
- Is there a model trained at 8 kHz, so that it matches telephone speech recognition?
Thanks for reading and for the great job you're doing.
Best regards
Hi,
Thanks for your answer.
I already have sox available, so it's not going to be a problem.
Thanks for this clear answer.
You guys are doing a great job.
I don't have any more questions (for now) ;-)
Best regards.
Yup, it works fine. We use them to test Sphinx3 all the time.
However there is nothing that says you need to use a 64k vocabulary with the Hub4 models. Vocabulary size is determined by the language model, not the acoustic model.
For 8 kHz models, the Communicator ones are pretty good.
Indeed it works,
Thanks for your answer.
I still have a few questions to ask.
I hope you'll know the answers.
When I ask sphinx3_decode to recognize a file, it stops the recognition at the end of the sentence (when it detects a long stretch without anyone speaking). How can I make it continue the recognition until the end of the file?
Is there a way to improve the recognition by changing parameters? I don't mind if the recognizer takes more time. I tried sphinx3_decode_anytopo, but it didn't really improve the recognition.
Here is my config:
-mdef ../../Sphinx3/sphinx3-0.6/model/hmm/wsj/etc/WSJ_clean_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef
-fdict ../../Sphinx3/sphinx3-0.6/model/lm/an4/filler.dict
-dict ../../Sphinx3/sphinx3-0.6/model/hmm/wsj/dict/cmudict.0.6d
-mean ../../Sphinx3/sphinx3-0.6/model/hmm/wsj/cd_continuous_8gau/means
-var ../../Sphinx3/sphinx3-0.6/model/hmm/wsj/cd_continuous_8gau/variances
-mixw ../../Sphinx3/sphinx3-0.6/model/hmm/wsj/cd_continuous_8gau/mixture_weights
-tmat ../../Sphinx3/sphinx3-0.6/model/hmm/wsj/cd_continuous_8gau/transition_matrices
-ctl ../File_to_recognize/simple.ctl
-subvqbeam 1e-02
-epl 4
-fillprob 0.02
-feat 1s_c_d_dd
-lw 9.5
-maxwpf 1
-beam 1e-40
-pbeam 1e-30
-wbeam 1e-20
-maxhmmpf 1500
-wend_beam 1e-1
-ci_pbeam 1e-5
-lm ../../Sphinx3/sphinx3-0.6/model/hmm/wsj/wsj5k.DMP
Best regards.
Hi,
No answer???
That end of sentence problem really is annoying.
Don't you guys have any idea how to solve it?
I'm sure it is possible.
Regards.
sphinx3_decode is just organized that way. You can try a different recognizer, say sphinx3_livepretend, which should ignore silences. In any case, it's actually easier for the decoder to work with short utterances, so it's probably a good idea to split big files into utterances first.
Exactly. It's very difficult to do speech recognition on inputs of arbitrary length. That's why all systems use some sort of endpointer or segmenter to break the input down into smaller segments.
The program you actually want is sphinx3_continuous.
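If you would rather pre-split a long recording yourself, as suggested above, here is a rough sketch using the silence effect in sox. The helper name split_on_silence, the DRY_RUN switch, and the thresholds (a pause of 0.5 s below 1% amplitude) are assumptions made up for illustration, not Sphinx settings, so expect to tune them for your audio.

```shell
#!/bin/sh
# Sketch: split a long recording into utterance-sized pieces at pauses.
# split_on_silence and DRY_RUN are hypothetical names; thresholds are guesses.
split_on_silence() {
    in=$1        # input audio file, e.g. long.wav
    prefix=$2    # output prefix; sox appends 001, 002, ... before the extension

    # silence 1 0.1 1%  : start a segment once 0.1 s of audio exceeds 1% amplitude
    #         1 0.5 1%  : end the segment after 0.5 s below 1% amplitude
    # newfile : restart : open the next output file and run the effect chain again
    set -- sox "$in" "${prefix}.wav" silence 1 0.1 1% 1 0.5 1% : newfile : restart

    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"    # only print the command that would be run
    else
        "$@"
    fi
}
```

Calling `split_on_silence long.wav utt` should leave files named utt001.wav, utt002.wav and so on; set DRY_RUN=1 first to see the sox command without running it. Each piece still has to be converted to raw before sphinx3_continuous can read it.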
Hi,
Thank you both for your answers,
I understand better now.
I'm going to use sphinx3_continuous.
Here is what I get when I execute sphinx3_continuous:
USAGE: ./sphinx3_continuous <ctrlfile> <rawdir> <cfgfile>
In the control file I guess I have to specify the files I want to convert.
What am I supposed to put in rawdir?
And what about the config file? Is it the same one that I was using for sphinx3_decode?
I'm wondering if one day my questions will stop leading to other questions ;-)
Best regards.
Yes, the config file should be the same, except that sphinx3_continuous accepts only raw, unheadered audio files (sorry about this). These should have the file extension .raw. The control file will list the file names without that extension, while <rawdir> is just the name of the directory where they live.
Also, make sure your raw files are 16000 Hz, 16-bit, little-endian. You can use sox to convert things. Say your input file is foo.wav:
sox foo.wav -r 16000 -s -w -t raw foo.raw
This will create foo.raw as a raw file in the correct format.
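To tie the pieces together, here is a small sketch of building the control file from a directory of .raw files. The helper name make_ctl and the paths wav/, raw/, files.ctl and sphinx3.cfg are placeholders invented for the example. Note also that, as far as I know, recent sox releases dropped the -s -w shorthand in favour of -e signed-integer -b 16, so adjust the conversion line to your sox version.

```shell
#!/bin/sh
# Sketch: list every .raw file in a directory, one name per line, without the
# .raw extension, which is the control file layout described above.
# make_ctl is a hypothetical helper name, not part of Sphinx.
make_ctl() {
    rawdir=$1
    for f in "$rawdir"/*.raw; do
        [ -e "$f" ] || continue    # the directory contained no .raw files
        basename "$f" .raw         # control file entry: name without extension
    done
}

# Example use (all paths are placeholders):
#   for w in wav/*.wav; do
#       sox "$w" -r 16000 -s -w -t raw "raw/$(basename "$w" .wav).raw"
#   done
#   make_ctl raw > files.ctl
#   ./sphinx3_continuous files.ctl raw sphinx3.cfg
```

The last commented line follows the USAGE message quoted earlier: control file first, then the directory holding the raw files, then the config file.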