Hi, I'm a final-year engineering student at Mumbai University. We are
using pocketsphinx for speech recognition in our project. We are trying to
adapt the acoustic model by following the CMU Sphinx adaptation
tutorial, but we are running into a few errors:
1) sphinx_fe ''cat hub4wsj_sc_8k/feat.params'' -samprate 16000 -c
arctic20.fileids -di . -do . -ei wav -eo mfc -mswav yes
INFO: cmd_ln.c(512): Parsing command line:
sphinx_fe cat hub4wsj_sc_8k/feat.params \
-samprate 16000 \
-c arctic20.fileids \
-di . \
-do . \
-ei wav \
-eo mfc \
-mswav yes
ERROR: "cmd_ln.c", line 565: Unknown argument name 'cat'
ERROR: "cmd_ln.c", line 650: cmd_ln_parse_r failed
To get around this, we wrote the parameters out by hand:
sphinx_fe -nfilt 20 -lowerf 1 -upperf 4000 -wlen 0.025 -transform dct \
-round_filters no -remove_dc yes -samprate 16000 -c arctic20.fileids \
-di . -do . -ei wav -eo mfc -mswav yes
This time we did not get any error, so we continued with the process.
2) ./bw \
-hmmdir hub4wsj_sc_8k \
-moddeffn hub4wsj_sc_8k/mdef.txt \
-ts2cbfn .semi. \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-cmn current \
-agc none \
-dictfn arctic20.dic \
-ctlfn arctic20.fileids \
-lsnfn arctic20.transcription \
-accumdir .
FATAL_ERROR: "corpus.c", line 1647: Failed to get the files after 100 retries
of getting MFCC(about 300 seconds)
We need help urgently; it would be kind if anyone could guide us on this.
Waiting for a reply.
It's possible to buy 24x7 support for CMUSphinx if you are willing to pay for it.
The wiki page has been updated, so you need to read it again. Basically, you need to
use -argfile model/feat.params instead of cat:
sphinx_fe -argfile model/feat.params
This way the contents of the feat.params file will be handled properly by
sphinx_fe.
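To see why the original command failed, it can help to look at what the shell actually hands to a program. The sketch below is only an illustration: `show_args` is a hypothetical helper and the `feat.params` file is a throwaway stand-in, neither is part of Sphinx. It compares the mistyped quotes from the first post with real command substitution. The -argfile option above avoids the question entirely.

```shell
# Hypothetical helper: print each argument it receives on its own line.
show_args() {
    for a in "$@"; do
        printf '[%s]\n' "$a"
    done
}

# Throwaway parameter file, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' '-nfilt 20' '-lowerf 1' > "$tmp/feat.params"

# Two single quotes are an empty string glued to the next word, so the
# program receives the literal word "cat" followed by the file name.
wrong=$(show_args ''cat "$tmp/feat.params"'')

# Command substitution runs cat and splits its output into arguments.
right=$(show_args $(cat "$tmp/feat.params"))

echo "with two single quotes:"
echo "$wrong"
echo "with command substitution:"
echo "$right"
rm -rf "$tmp"
```

The first form is what produces "Unknown argument name 'cat'": the program is given `cat` as if it were one of its own options.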
Thank you for your reply. The first error is gone: using '-argfile
model/feat.params' instead of cat worked.
But the statistics collection step still gives the same error:
./bw \
-hmmdir hub4wsj_sc_8k \
-moddeffn hub4wsj_sc_8k/mdef.txt \
-ts2cbfn .semi. \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-cmn current \
-agc none \
-dictfn arctic20.dic \
-ctlfn arctic20.fileids \
-lsnfn arctic20.transcription \
-accumdir .
FATAL_ERROR: "corpus.c", line 1647: Failed to get the files after 100 retries
of getting MFCC(about 300 seconds)
We also read that most continuous models don't need the svspec option.
But when we remove it, we get a different error:
FATAL_ERROR: "mod_inv.c", line 354: # of features in mixw file, 3, is
inconsistent w/ prior setting, 1
Does hub4wsj_sc_8k require svspec?
It fails to read the file. You probably need to set cepdir to point to the
folder where the mfc files are. Are the files really there?
Yes; the hub4 adaptation is described in the wiki as is.
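One way to answer the "are the files really there?" question is to walk the control file and test for each expected feature file. This is a minimal sketch under stated assumptions: the scratch directory and the two demo ids are made up so the snippet runs on its own, and in a real setup `cepdir` and the control file would point at the adaptation folder instead.

```shell
# Demo setup: a scratch directory standing in for the adaptation folder.
work=$(mktemp -d)
printf 'arctic_0001\narctic_0002\n' > "$work/arctic20.fileids"
: > "$work/arctic_0001.mfc"    # pretend only the first feature file exists

cepdir=$work
missing=""
# Read one utterance id per line and look for <cepdir>/<id>.mfc.
while IFS= read -r id || [ -n "$id" ]; do
    [ -n "$id" ] || continue
    if [ ! -f "$cepdir/$id.mfc" ]; then
        echo "missing: $cepdir/$id.mfc"
        missing="$missing $id"
    fi
done < "$work/arctic20.fileids"
rm -rf "$work"
```

If every id is reported missing even though the .mfc files are visibly in the folder, the ids themselves may contain characters that do not show up on screen.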
@nshmyrev Hi, I am also trying to adapt the existing model.
I'm getting the same error as lloyd89, where the files are not being read.
It retries 100 times and finally gives an error.
I tried setting cepdir to my current working directory too, but the same
problem persists.
Does this have anything to do with the way I downloaded the arctic files?
When I clicked on the link, it opened a new page with the file names. I copied the
text into gedit and saved it with the given extension.
Here is the error:
nikhil@nikhil-desktop:~/project/adapt$ ./bw -hmmdir hub4wsj_sc_8k -moddeffn
hub4wsj_sc_8k/mdef.txt -ts2cbfn .semi. -feat 1s_c_d_dd -svspec
0-12/13-25/26-38 -cmn current -agc none -dictfn arctic20.dic -ctlfn
arctic20.listoffiles -lsnfn arctic20.transcription -cepdir . -accumdir .
INFO: main.c(196): Compiled on Sep 21 2010 at 00:02:20
./bw \
-hmmdir hub4wsj_sc_8k \
-moddeffn hub4wsj_sc_8k/mdef.txt \
-ts2cbfn .semi. \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-cmn current \
-agc none \
-dictfn arctic20.dic \
-ctlfn arctic20.listoffiles \
-lsnfn arctic20.transcription \
-cepdir . \
-accumdir .
-help no no
-example no no
-hmmdir hub4wsj_sc_8k
-moddeffn hub4wsj_sc_8k/mdef.txt
-tmatfn
-mixwfn
-meanfn
-varfn
-fullvar no no
-diagfull no no
-mwfloor 0.00001 1.000000e-05
-tpfloor 0.0001 1.000000e-04
-varfloor 0.00001 1.000000e-05
-topn 4 4
-dictfn arctic20.dic
-fdictfn
-ltsoov no no
-ctlfn arctic20.listoffiles
-nskip
-runlen -1 -1
-part
-npart
-cepext mfc mfc
-cepdir .
-phsegext phseg phseg
-phsegdir
-outphsegdir
-sentdir
-sentext sent sent
-lsnfn arctic20.transcription
-accumdir .
-ceplen 13 13
-cepwin 0 0
-agc max none
-cmn current current
-varnorm no no
-silcomp none none
-sildel no no
-siltag SIL SIL
-abeam 1e-100 1.000000e-100
-bbeam 1e-100 1.000000e-100
-varreest yes yes
-meanreest yes yes
-mixwreest yes yes
-tmatreest yes yes
-mllrmat
-cb2mllrfn .1cls. .1cls.
-ts2cbfn .semi.
-feat 1s_c_d_dd 1s_c_d_dd
-svspec 0-12/13-25/26-38
-ldafn
-ldadim 29 29
-ldaaccum no no
-timing yes yes
-viterbi no no
-2passvar no no
-sildelfn
-spthresh 0.0 0.000000e+00
-maxuttlen 0 0
-ckptintv
-outputfullpath no no
-fullsuffixmatch no no
-pdumpdir
INFO: main.c(255): Reading hub4wsj_sc_8k/mdef.txt
INFO: model_def_io.c(587): Model definition info:
INFO: model_def_io.c(588): 143097 total models defined (50 base, 143047 tri)
INFO: model_def_io.c(589): 572388 total states
INFO: model_def_io.c(590): 5150 total tied states
INFO: model_def_io.c(591): 150 total tied CI states
INFO: model_def_io.c(592): 50 total tied transition matrices
INFO: model_def_io.c(593): 4 max state/model
INFO: model_def_io.c(594): 4 min state/model
INFO: s3mixw_io.c(116): Read hub4wsj_sc_8k/mixture_weights
INFO: s3tmat_io.c(115): Read hub4wsj_sc_8k/transition_matrices
INFO: mod_inv.c(297): inserting tprob floor 1.000000e-04 and renormalizing
INFO: s3gau_io.c(166): Read hub4wsj_sc_8k/means
INFO: s3gau_io.c(166): Read hub4wsj_sc_8k/variances
INFO: gauden.c(181): 1 total mgau
INFO: gauden.c(155): 3 feature streams (|0|=13 |1|=13 |2|=13 )
INFO: gauden.c(192): 256 total densities
INFO: gauden.c(98): min_var=1.000000e-05
INFO: gauden.c(170): compute 4 densities/frame
INFO: main.c(363): Will reestimate mixing weights.
INFO: main.c(365): Will reestimate means.
INFO: main.c(367): Will reestimate variances.
INFO: main.c(369): WIll NOT optionally delete silence in Baum Welch or
Viterbi.
INFO: main.c(377): Will reestimate transition matrices
INFO: main.c(390): Reading main lexicon: arctic20.dic
INFO: lexicon.c(233): 174 entries added from arctic20.dic
INFO: main.c(402): Reading filler lexicon: hub4wsj_sc_8k/noisedict
INFO: lexicon.c(233): 11 entries added from hub4wsj_sc_8k/noisedict
INFO: main.c(423): Silence Tag SIL
INFO: corpus.c(1343): Will process all remaining utts starting at 0
INFO: main.c(622): Reestimation: Baum-Welch
INFO: main.c(627): Generating profiling information consumes significant CPU
resources.
INFO: main.c(628): If you are not interested in profiling, use -timing no
column defns
<seq>
<id>
<n_frame_in>
<n_frame_del>
<n_state_shmm>
<avg_states_alpha>
<avg_states_beta>
<avg_states_reest>
<avg_posterior_prune>
<frame_log_lik>
<utt_log_lik>
... timing info ...
.mfc) failed/arctic_0001arctic_0001
failed. Retrying after sleep...C read of arctic_0001
.mfc) failed/arctic_0001
failed. Retrying after sleep...C read of arctic_0001
Your fileids file has Windows-style newlines (CR+LF). You can either remove
them with the dos2unix command or download and build the latest SphinxTrain
snapshot, which handles newlines gracefully.
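A carriage return at the end of each id means the file being looked for is effectively "arctic_0001<CR>.mfc", which is consistent with the overwritten-looking log lines above. The sketch below shows one way to detect and strip the carriage returns. It assumes a scratch copy of the control file built just for the demo, and it uses `tr` as a stand-in for dos2unix in case that tool is not installed.

```shell
# Demo setup: a scratch control file with Windows-style line endings.
work=$(mktemp -d)
printf 'arctic_0001\r\narctic_0002\r\n' > "$work/arctic20.fileids"

# Count lines containing a carriage return (non-zero means CR+LF endings).
cr=$(printf '\r')
before=$(grep -c "$cr" "$work/arctic20.fileids")

# Strip every carriage return, then replace the original file.
tr -d '\r' < "$work/arctic20.fileids" > "$work/arctic20.fileids.unix"
mv "$work/arctic20.fileids.unix" "$work/arctic20.fileids"

# grep -c exits non-zero when nothing matches, hence the "|| true".
after=$(grep -c "$cr" "$work/arctic20.fileids" || true)
echo "lines with CR before: $before, after: $after"
first=$(head -n 1 "$work/arctic20.fileids")
rm -rf "$work"
```

The same check is worth running on the transcription and dictionary files if they were saved the same way.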
Thanks a lot, the process completed successfully.
One more question: the adapted model does not give me good
output. Can I add sentences (with words that are specific to my use) to
the arctic file and make a new model? Will it work?
Yes, you can
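As a rough sketch of what adding a sentence involves, the snippet below appends one new entry to scratch copies of the control and transcription files and then checks that every id has a matching transcription line. The id `custom_0001` and both sentences are placeholders, and the "<s> text </s> (id)" line layout is assumed to match the transcription file from the adaptation tutorial. New words would also need entries in the dictionary, and the new recording needs its own feature file.

```shell
# Demo setup: scratch copies with one existing entry (placeholder text).
work=$(mktemp -d)
printf 'arctic_0001\n' > "$work/arctic20.fileids"
printf '<s> first example sentence </s> (arctic_0001)\n' > "$work/arctic20.transcription"

# Hypothetical new recording custom_0001.wav with its own sentence.
printf 'custom_0001\n' >> "$work/arctic20.fileids"
printf '<s> open the front door </s> (custom_0001)\n' >> "$work/arctic20.transcription"

# Every id in the control file should close some transcription line as "(id)".
unmatched=0
while IFS= read -r id; do
    grep -q "($id)\$" "$work/arctic20.transcription" || unmatched=$((unmatched + 1))
done < "$work/arctic20.fileids"
n_ids=$(wc -l < "$work/arctic20.fileids" | tr -d ' ')
echo "$n_ids ids, $unmatched without a transcription"
rm -rf "$work"
```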