Should the transcription file be an exact transcription of the audio? I have a lot of audio files (10 minutes each) which I would be using to train the model. What would happen if I ignore some words from the audio while creating the transcription file (typically filler words like "ummm", "well", "is")? Would it mess up the text conversion completely when I decode a new audio file with this model?
When I tried to run these audio files with the models provided by CMUSphinx, I did not get good results, so I am trying to create new models.
Please let me know (apologies if the questions are basic, but I am short of time).
There is no need to start a new thread if you can continue the old one.
What would happen if I ignore some words from the audio while creating the transcription file (typically filler words like "ummm", "well", "is")? Would it mess up the text conversion completely when I decode a new audio file with this model?
The accuracy will be significantly lower.
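A note on fillers: CMUSphinx training data usually marks them with explicit filler tokens rather than dropping them, and the acoustic model's filler dictionary (the noisedict file in en-us models) maps those tokens to filler phones. A sketch of what transcription lines with fillers could look like; the file ids and wording here are invented for illustration, and it assumes ++UM++ and ++UH++ exist in the filler dictionary:

```
++UM++ WELL THIS IS ++UH++ A SAMPLE UTTERANCE (file_0001)
IT IS BETTER TO KEEP FILLER WORDS IN THE TRANSCRIPT (file_0002)
```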
When I tried to run these audio files with the models provided by CMUSphinx, I did not get good results, so I am trying to create new models.
It is recommended to use the original CMUSphinx models, which should work pretty well. To get help with accuracy you need to provide the data you are using.
Please let me know (apologies if the questions are basic, but I am short of time).
It is not possible to train a good model in less than a month, as pointed out in the tutorial.
I am trying model adaptation, following the http://cmusphinx.sourceforge.net/wiki/tutorialadapt tutorial word by word.
The "bw" command gives me an error: "Number of feature streams in mixture_weights file 3 differs from the configured value 1, check the command line options". I saw that others faced this too, but their solutions did not work for me either.
Can someone please point me to the correct lm file that I should be using to fix this?
It has nothing to do with the language model. Which acoustic model are you trying to adapt?
I meant I used /usr/local/share/pocketsphinx/model/en-us/en-us. This doesn't work and gives the above error. Any comments?
In the bw command you are missing the -svspec option provided in the tutorial. You need to follow the tutorial precisely.
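For reference, a sketch of how the bw call from the adaptation tutorial looks for the semi-continuous en-us model, with the -svspec option included (file names assumed to match the layout used elsewhere in this thread):

```
bw \
 -hmmdir en-us \
 -moddeffn en-us/mdef.txt \
 -ts2cbfn .semi. \
 -feat 1s_c_d_dd \
 -svspec 0-12/13-25/26-38 \
 -cmn current \
 -agc none \
 -dictfn cmudict-en-us.dict \
 -ctlfn arctic20.fileids \
 -lsnfn arctic20.transcription \
 -accumdir .
```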
It also says:
"For example, for continuous model you don't need to include the svspec option. Instead, you need to use just -ts2cbfn .cont. For semi-continuous models use -ts2cbfn .semi. If the model has a feature_transform file like the en-us continuous model, you need to add a -lda feature_transform argument to bw, otherwise it will not work properly."
By the way, I chose a different acoustic model (attached) and "bw" worked.
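Putting the quoted note into a command: for a continuous model such as cmusphinx-en-us-5.2, the advice would mean a bw call shaped roughly like this (a sketch only; the -lda line applies only if the model directory actually contains a feature_transform file):

```
bw \
 -hmmdir cmusphinx-en-us-5.2 \
 -moddeffn cmusphinx-en-us-5.2/mdef.txt \
 -ts2cbfn .cont. \
 -feat 1s_c_d_dd \
 -lda cmusphinx-en-us-5.2/feature_transform \
 -cmn current \
 -agc none \
 -dictfn cmudict-en-us.dict \
 -ctlfn arctic20.fileids \
 -lsnfn arctic20.transcription \
 -accumdir .
```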
My sequence of activities:
sphinx_fe -argfile cmusphinx-en-us-5.2/feat.params -samprate 16000 -c arctic20.fileids -di . -do . -ei wav -eo mfc -mswav yes
pocketsphinx_mdef_convert -text cmusphinx-en-us-5.2/mdef cmusphinx-en-us-5.2/mdef.txt
bw -hmmdir cmusphinx-en-us-5.2 -moddeffn cmusphinx-en-us-5.2/mdef.txt -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn cmudict-en-us.dict -ctlfn arctic20.fileids -lsnfn arctic20.transcription -accumdir .
map_adapt -moddeffn cmusphinx-en-us-5.2/mdef.txt -ts2cbfn .ptm. -meanfn cmusphinx-en-us-5.2/means -varfn cmusphinx-en-us-5.2/variances -mixwfn cmusphinx-en-us-5.2/mixture_weights -tmatfn cmusphinx-en-us-5.2/transition_matrices -accumdir . -mapmeanfn cmusphinx-en-us-5.2-adapt/means -mapvarfn cmusphinx-en-us-5.2-adapt/variances -mapmixwfn cmusphinx-en-us-5.2-adapt/mixture_weights -maptmatfn cmusphinx-en-us-5.2-adapt/transition_matrices
pocketsphinx_continuous -hmm cmusphinx-en-us-5.2-adapt -lm en-us.lm.dmp -dict cmudict-en-us.dict -infile arctic_0001.wav
arctic20.fileids --- has 20 files
arctic20.transcription --- has transcriptions for the 20 files
cmudict-en-us.dict --- dictionary
cmusphinx-en-us-5.2 --- acoustic model (attached)
en-us.lm.dmp --- language model
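One quick sanity check before running bw is that every id in the fileids list has exactly one matching line in the transcription file, since a mismatch there breaks alignment. A minimal sketch; the demo.* file names and their contents are invented stand-ins for the arctic20.* files above:

```shell
# Create a tiny demo fileids list and transcription (stand-ins for arctic20.*).
printf 'arctic_0001\narctic_0002\n' > demo.fileids
printf 'author of the danger trail (arctic_0001)\nnot at this particular case (arctic_0002)\n' > demo.transcription

# Every fileid should appear, in parentheses, at the end of one transcription line.
while read -r id; do
  grep -q "($id)\$" demo.transcription || echo "missing transcript for $id"
done < demo.fileids
echo "check done"
```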
Attached is arctic_0001.wav for reference.
For some reason it's not letting me upload the model - here is the link: http://softlayer-sng.dl.sourceforge.net/project/cmusphinx/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/cmusphinx-en-us-5.2.tar.gz
It worked for me. The issue is resolved.
I am facing another issue while running the "bw" command (I am using the telephone model for 8 kHz). I am trying adaptation.
bw -hmmdir wsj_8kHz -moddeffn wsj_8kHz/mdef.txt -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn cmudict-en-us.dict -ctlfn fileid -lsnfn trans -accumdir .
INFO: cmn.c(183): CMN: 13.25 0.43 -0.29 -0.19 -0.32 -0.31 -0.32 -0.20 -0.25 -0.16 -0.26 -0.14 -0.20
ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: 12 ignored
utt> 0 12 303 0 156 35 utt 0.003x 0.998e upd 0.003x 0.997e fwd 0.003x 0.998e bwd 0.000x 0.000e gau 0.002x 0.983e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e
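A common cause of a "Failed to align audio to transcript" error with a telephone model is extracting features at the wrong sample rate; for an 8 kHz model the sphinx_fe step has to match. A sketch mirroring the extraction command earlier in the thread, adjusted to 8 kHz (file names assumed to be the ones used in the bw command above):

```
sphinx_fe \
 -argfile wsj_8kHz/feat.params \
 -samprate 8000 \
 -c fileid \
 -di . -do . \
 -ei wav -eo mfc \
 -mswav yes
```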
Attached is the audio file with its transcript:
and then you are gonna be able to download your call log (12)
Can someone please help?
Can someone please help me on this? The details are above.