Acoustic model hub4wsj_sc_8k adaptation.

2010-04-05
2012-09-22
  • Gopalakrishna

    Gopalakrishna - 2010-04-05

    Hello,
    I need to adapt the hub4wsj_sc_8k acoustic model provided with pocketsphinx
    0.6. I have some doubts regarding the adaptation process given at
    http://cmusphinx.sourceforge.net/wiki/AcousticModelAdaptation

    1) What is the sampling rate at which the acoustic model hub4wsj_sc_8k was
    built?
    2) How to get the mixture_weights file for this acoustic model?
    3) What is the sampling rate at which I should record the input audio?

    Thank you.

     
  • Nickolay V. Shmyrev

    Hi

    What is the sampling rate at which the acoustic model hub4wsj_sc_8k was
    built?

    WSJ audio is 16 kHz; however, the hub4wsj filters strip everything above 4 kHz
    (see upperf in feat.params). That basically means you can use any audio, even
    8 kHz audio.
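
The reasoning above rests on the Nyquist limit: audio sampled at a given rate carries frequencies up to half that rate, so 8 kHz audio already reaches the 4 kHz upper edge of the filterbank. A minimal Python sketch of that check follows. The function names are hypothetical and not part of any Sphinx tool; the 4000 Hz value is the -upperf setting quoted in this thread.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency that audio at this sampling rate can represent."""
    return sample_rate_hz / 2.0


def covers_filterbank(sample_rate_hz, upperf_hz):
    """True if audio at this rate contains the whole band up to upperf_hz."""
    return nyquist_hz(sample_rate_hz) >= upperf_hz


UPPERF_HZ = 4000  # the -upperf value quoted in this thread

for rate in (8000, 16000):
    # Both rates reach 4 kHz, so either kind of recording spans the band
    # that the model's filterbank keeps.
    print(rate, covers_filterbank(rate, UPPERF_HZ))
```

Both 8000 and 16000 pass the check, which matches the advice that either kind of recording can be used with this model.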

    How to get the mixture_weights file for this acoustic model?

    Ask the developer (David Huggins-Daines) to share it. As I remember, it was
    available in the SVN history but was removed to save memory.

    What is the sampling rate at which I should record the input audio?

    It depends on your recording capabilities. If you are not on a telephone line
    and can record at 16 kHz, do that.

     
  • Gopalakrishna

    Gopalakrishna - 2010-04-08

    Thank you.

    It depends on your recording capabilities. If you are not on a telephone line
    and can record at 16 kHz, do that.

    Does this mean that I should sample at 16 kHz and then follow the procedure
    given in the wiki?

     
  • Nickolay V. Shmyrev

    How to get the mixture_weights file for this acoustic model?

    The missing files were added back to trunk today.

    Does this mean that I should sample at 16 kHz and then follow the procedure
    given in the wiki?

    Yes

     
  • Gopalakrishna

    Gopalakrishna - 2010-04-09

    Thank you. As per your suggestion, I contacted David Huggins-Daines.

     
  • Gopalakrishna

    Gopalakrishna - 2010-04-19

    Hello,
    I have a problem training the model from the SVN repo as per the wiki.
    1) I recorded the audio at 16 kHz and saved it in raw format.
    2) The features were extracted using

    sphinx_fe -nfilt 20 -lowerf 1 -upperf 4000 -wlen 0.025 -transform dct -round_filters no -remove_dc yes -samprate 16000 -c ../doc/a.listoffiles -di . -do . -ei raw -eo mfc -raw yes
    

    3) Now when I run

    ../doc/bw -hmmdir adapt -moddeffn adapt/mdef.txt -ts2cbfn .semi. -feat 1s_c_d_dd -cmn current -agc none -dictfn ../doc/a.dic -ctlfn ../doc/a.listoffiles -lsnfn ../doc/a.transcription -accumdir .
    

    I get

    ../doc/bw -hmmdir adapt -moddeffn adapt/mdef.txt -ts2cbfn .semi. -feat
    1s_c_d_dd -cmn current -agc none -dictfn ../doc/a.dic -ctlfn
    ../doc/a.listoffiles -lsnfn ../doc/a.transcription -accumdir .
    INFO: main.c(196): Compiled on Apr 17 2010 at 15:02:54
    ../doc/bw \
    -hmmdir adapt \
    -moddeffn adapt/mdef.txt \
    -ts2cbfn .semi. \
    -feat 1s_c_d_dd \
    -cmn current \
    -agc none \
    -dictfn ../doc/a.dic \
    -ctlfn ../doc/a.listoffiles \
    -lsnfn ../doc/a.transcription \
    -accumdir .

    -help             no          no
    -example          no          no
    -hmmdir                       adapt
    -moddeffn                     adapt/mdef.txt
    -tmatfn
    -mixwfn
    -meanfn
    -varfn
    -fullvar          no          no
    -diagfull         no          no
    -mwfloor          0.00001     1.000000e-05
    -tpfloor          0.0001      1.000000e-04
    -varfloor         0.00001     1.000000e-05
    -topn             4           4
    -dictfn                       ../doc/a.dic
    -fdictfn
    -ltsoov           no          no
    -ctlfn                        ../doc/a.listoffiles
    -nskip
    -runlen           -1          -1
    -part
    -npart
    -cepext           mfc         mfc
    -cepdir
    -phsegext         phseg       phseg
    -phsegdir
    -outphsegdir
    -sentdir
    -sentext          sent        sent
    -lsnfn                        ../doc/a.transcription
    -accumdir                     .
    -ceplen           13          13
    -cepwin           0           0
    -agc              max         none
    -cmn              current     current
    -varnorm          no          no
    -silcomp          none        none
    -sildel           no          no
    -siltag           SIL         SIL
    -abeam            1e-100      1.000000e-100
    -bbeam            1e-100      1.000000e-100
    -varreest         yes         yes
    -meanreest        yes         yes
    -mixwreest        yes         yes
    -tmatreest        yes         yes
    -mllrmat
    -cb2mllrfn        .1cls.      .1cls.
    -ts2cbfn                      .semi.
    -feat             1s_c_d_dd   1s_c_d_dd
    -svspec
    -ldafn
    -ldadim           29          29
    -ldaaccum         no          no
    -timing           yes         yes
    -viterbi          no          no
    -2passvar         no          no
    -sildelfn
    -spthresh         0.0         0.000000e+00
    -maxuttlen        0           0
    -ckptintv
    -outputfullpath   no          no
    -fullsuffixmatch  no          no
    -pdumpdir
    INFO: main.c(255): Reading adapt/mdef.txt
    INFO: model_def_io.c(587): Model definition info:
    INFO: model_def_io.c(588): 143097 total models defined (50 base, 143047 tri)
    INFO: model_def_io.c(589): 572388 total states
    INFO: model_def_io.c(590): 5150 total tied states
    INFO: model_def_io.c(591): 150 total tied CI states
    INFO: model_def_io.c(592): 50 total tied transition matrices
    INFO: model_def_io.c(593): 4 max state/model
    INFO: model_def_io.c(594): 4 min state/model
    INFO: s3mixw_io.c(116): Read adapt/mixture_weights
    FATAL_ERROR: "mod_inv.c", line 354: # of features in mixw file, 3, is
    inconsistent w/ prior setting, 1

    Audio files are here
    http://www.mediafire.com/file/oghy10t2jrt/gk1.tar.gz

     
  • Nickolay V. Shmyrev

    Try to follow the feat.params of the model more precisely. In particular, I see
    you are missing the -svspec 0-12/13-25/26-38 option, and that makes bw think
    there is one stream instead of 3.
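
To illustrate the point about streams, here is a minimal Python sketch that splits the -svspec string from this thread into its index ranges and checks a command line against a feat.params-style list of flags. Both function names are hypothetical helpers, not Sphinx APIs. The parser handles only the simple "first-last/first-last" form shown above, and the excerpt text is an assumption about the file layout (one flag and value per line), built from flags that appear in this thread rather than copied from the real file.

```python
def parse_svspec(spec):
    """Split a spec such as '0-12/13-25/26-38' into one index range per stream."""
    streams = []
    for part in spec.split("/"):
        first, last = (int(x) for x in part.split("-"))
        streams.append(range(first, last + 1))  # the last index is inclusive
    return streams


def missing_flags(feat_params_text, command):
    """Return flags listed in feat.params-style text that the command lacks."""
    used = set(command.split())
    missing = []
    for line in feat_params_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("-") and fields[0] not in used:
            missing.append(fields[0])
    return missing


# Illustrative excerpt (assumed layout), using only flags seen in this thread.
FEAT_PARAMS_EXCERPT = """\
-feat 1s_c_d_dd
-cmn current
-agc none
-svspec 0-12/13-25/26-38
"""

# The bw command line from the post above.
BW_COMMAND = (
    "../doc/bw -hmmdir adapt -moddeffn adapt/mdef.txt -ts2cbfn .semi. "
    "-feat 1s_c_d_dd -cmn current -agc none -dictfn ../doc/a.dic "
    "-ctlfn ../doc/a.listoffiles -lsnfn ../doc/a.transcription -accumdir ."
)

streams = parse_svspec("0-12/13-25/26-38")
print(len(streams), [len(s) for s in streams])
print(missing_flags(FEAT_PARAMS_EXCERPT, BW_COMMAND))
```

The spec yields three streams of 13 coefficients each, which lines up with the 3 features that the mixture_weights file reports in the error message, and the check flags -svspec as the option absent from the bw command.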

     
  • Gopalakrishna

    Gopalakrishna - 2010-04-22

    Thanks, that helped :)

     
