CMU Sphinx / Forums / Help: Acoustic Model Adaptation

Jeff Acquaviva - 2014-05-14

Hi,
I'm trying to adapt the generic English acoustic model that is now the default with Sphinx. I was following this tutorial for pocketsphinx and got an error when running the ./bw program:

FATAL_ERROR: "mod_inv.c", line 357: Number of feature streams in mixture_weights file 1 differs from the configured value 3, check the command line options

I've also copied the entire output of the command below. Beyond this, I'm not sure what other information might help.

Thanks in advance,

$ /usr/local/libexec/sphinxtrain/bw -hmmdir ../en-us -moddeffn ../en-us/mdef -ts2cbfn .cont. -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -cmn current -agc none -dictfn /home/jxa5147/ASR_Support/models/dictionary/cmudict.0.6d -ctlfn bush2002su.fileids -lsnfn bush2002su.transcription -accumdir .
INFO: main.c(229): Compiled on May 14 2014 at 13:09:11
INFO: cmd_ln.c(691): Parsing command line:
/usr/local/libexec/sphinxtrain/bw
-hmmdir ../en-us
-moddeffn ../en-us/mdef
-ts2cbfn .cont.
-feat 1s_c_d_dd
-svspec 0-12/13-25/26-38
-cmn current
-agc none
-dictfn /home/jxa5147/ASR_Support/models/dictionary/cmudict.0.6d
-ctlfn bush2002su.fileids
-lsnfn bush2002su.transcription
-accumdir .
Current configuration:
[NAME] [DEFLT] [VALUE]
-2passvar no no
-abeam 1e-100 1.000000e-100
-accumdir .
-agc none none
-agcthresh 2.0 2.000000e+00
-bbeam 1e-100 1.000000e-100
-cb2mllrfn .1cls. .1cls.
-cepdir
-cepext mfc mfc
-ceplen 13 13
-ckptintv 0
-cmn current current
-cmninit 8.0 8.0
-ctlfn bush2002su.fileids
-diagfull no no
-dictfn /home/jxa5147/ASR_Support/models/dictionary/cmudict.0.6d
-example no no
-fdictfn
-feat 1s_c_d_dd 1s_c_d_dd
-fullsuffixmatch no no
-fullvar no no
-help no no
-hmmdir ../en-us
-latdir
-latext
-lda
-ldaaccum no no
-ldadim 0 0
-lsnfn bush2002su.transcription
-lw 11.5 1.150000e+01
-maxuttlen 0 0
-meanfn
-meanreest yes yes
-mixwfn
-mixwreest yes yes
-mllrmat
-mmie no no
-mmie_type rand rand
-moddeffn ../en-us/mdef
-mwfloor 0.00001 1.000000e-05
-npart 0
-nskip 0
-outphsegdir
-outputfullpath no no
-part 0
-pdumpdir
-phsegdir
-phsegext phseg phseg
-runlen -1 -1
-sentdir
-sentext sent sent
-spthresh 0.0 0.000000e+00
-svspec 0-12/13-25/26-38
-timing yes yes
-tmatfn
-tmatreest yes yes
-topn 4 4
-tpfloor 0.0001 1.000000e-04
-ts2cbfn .cont.
-varfloor 0.00001 1.000000e-05
-varfn
-varnorm no no
-varreest yes yes
-viterbi no no
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: main.c(254): Using subvector specification 0-12/13-25/26-38
INFO: main.c(318): Reading ../en-us/mdef
INFO: model_def_io.c(573): Model definition info:
INFO: model_def_io.c(574): 168390 total models defined (46 base, 168344 tri)
INFO: model_def_io.c(575): 673560 total states
INFO: model_def_io.c(576): 6138 total tied states
INFO: model_def_io.c(577): 138 total tied CI states
INFO: model_def_io.c(578): 46 total tied transition matrices
INFO: model_def_io.c(579): 4 max state/model
INFO: model_def_io.c(580): 4 min state/model
INFO: s3mixw_io.c(116): Read ../en-us/mixture_weights [6138x1x32 array]
FATAL_ERROR: "mod_inv.c", line 357: Number of feature streams in mixture_weights file 1 differs from the configured value 3, check the command line options

- Nickolay V. Shmyrev - 2014-05-14
  
  Remove -svspec 0-12/13-25/26-38 from the command line.
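
The mismatch can be read straight from the log: `mixture_weights` is reported as a `[6138x1x32 array]`, which matches the 6138 tied states and suggests one feature stream with 32 densities per state, while `-svspec 0-12/13-25/26-38` names three subvectors. The following is a minimal sketch of that comparison, not code from SphinxTrain. The function names are hypothetical, and it assumes that `1s_c_d_dd` without `-svspec` means a single stream.

```python
def count_svspec_streams(svspec):
    """Count the '/'-separated subvectors in a -svspec string."""
    return len([part for part in svspec.split("/") if part.strip()])


def streams_match(mixw_shape, svspec=None):
    """Compare the configured stream count with the one stored in the model.

    mixw_shape is the (tied states, streams, densities) triple that bw logs
    when it reads mixture_weights. Assumption: no -svspec means one stream.
    """
    configured = count_svspec_streams(svspec) if svspec else 1
    return configured == mixw_shape[1]


# Values taken from the log: [6138x1x32 array] and -svspec 0-12/13-25/26-38.
print(streams_match((6138, 1, 32), "0-12/13-25/26-38"))  # False: 3 configured, 1 in the model
print(streams_match((6138, 1, 32)))  # True once -svspec is removed
```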
  

Jeff Acquaviva - 2014-05-14

Thanks, that worked.

However, now I'm getting this error for every utterance in the set:

INFO: cmn.c(175): CMN: 48.42 -4.17 -7.71 8.56 -0.77 7.08 -5.16 0.28 0.97 0.75 -0.08 -1.26 -4.47
ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: bush2002su001 ignored
utt> 0 bush2002su001 704 0 172 25 utt 0.008x 1.197e upd 0.008x 1.191e fwd 0.008x 1.219e bwd 0.000x 0.000e gau 0.006x 1.203e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

Do you know what might cause this?


Nickolay V. Shmyrev - 2014-05-14

> Do you know what might cause this?

This error occurs when the transcript does not match the actual audio and so cannot be aligned to it. The transcript might contain different words, or the dictionary might have a different pronunciation for one of the words.
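
One way to review the pronunciation side by hand is to list the transcript words that have more than one dictionary entry. This is only a sketch under assumptions: it expects the usual Sphinx dictionary layout (`WORD PH PH ...`, alternates marked as `WORD(2)`), the sample entries are illustrative rather than copied from cmudict, and the helper names are hypothetical. It says nothing about which variant `bw` actually picks.

```python
import re
from collections import defaultdict

# Alternate pronunciations carry a numeric suffix, e.g. "OUR(2)".
_ALT = re.compile(r"^(.+)\((\d+)\)$")


def load_pronunciations(lines):
    """Map each base word to the list of its phone strings."""
    prons = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):  # skip blanks and comments
            continue
        fields = line.split()
        head, phones = fields[0], fields[1:]
        match = _ALT.match(head)
        word = match.group(1) if match else head
        prons[word].append(" ".join(phones))
    return prons


def words_with_variants(transcript_line, prons):
    """Return the transcript words that have several pronunciations."""
    found = {}
    for token in transcript_line.split():
        if len(prons.get(token, [])) > 1:
            found[token] = prons[token]
    return found


sample_dict = ["OUR  AW ER", "OUR(2)  AA R", "EMBASSY  EH M B AH S IY"]
prons = load_pronunciations(sample_dict)
print(words_with_variants("<s> OUR EMBASSY </s> (utt1)", prons))
```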


Jeff Acquaviva - 2014-05-15

> different pronunciation in the dictionary for one of the words.

Does this mean the dictionary for adaptation cannot use multiple pronunciations for a word?

I was using the cmudict pronunciation dictionary rather than creating a dictionary of only words used in the transcription. Could this be part of the problem?


Jeff Acquaviva - 2014-05-15

I'm stumped. The transcriptions do match the audio files. I've attached the directory so you can verify this as well. For reference, I'm using cmudict.0.6d as the adaptation dictionary.

However, I have noticed this error does not occur on utterance 18 (bush2002su018) when I do not include the starting <s> identifier.

Is it a problem with my audio files?

Last edit: Jeff Acquaviva 2014-05-16

bush_adapt.tar.gz

- Jeff Acquaviva - 2014-05-16
  
  Friendly bump. Any thoughts on what my issue might be?
  
  - Nickolay V. Shmyrev - 2014-05-16
    
    Hello Jeff
    
    I reviewed your adaptation set; everything is aligned properly there. Please make sure you are using the right adaptation command. The important parts are -ts2cbfn .cont. and -lda en-us-generic/feature_transform:
    
    ~~~~~~~~~~~~~~~~
    /home/shmyrev/local/libexec/sphinxtrain/bw \
        -hmmdir en-us-generic \
        -moddeffn en-us-generic/mdef \
        -ts2cbfn .cont. \
        -feat 1s_c_d_dd \
        -cmn current \
        -agc none \
        -dictfn cmu07a.dic \
        -ctlfn bush2002su.fileids \
        -lsnfn bush2002su.transcription \
        -lda en-us-generic/feature_transform \
        -accumdir .
    ~~~~~~~~~~~~~~~~~
    
    Also make sure your dictionary is uppercase (I suggest you use lowercase transcription instead). If you had provided the exact command you were running, you would have gotten this advice sooner.
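
A small wrapper can make it harder to forget `-lda`. The sketch below assembles the same argument list as the command above and adds `-lda` only when the model directory contains a `feature_transform` file. That rule is an assumption generalized from this thread, and `build_bw_args` and the bare `bw` executable name are hypothetical. Every flag is taken from the command above.

```python
import os


def build_bw_args(hmmdir, dictfn, ctlfn, lsnfn, accumdir=".", bw="bw"):
    """Assemble a bw argument list for a continuous model in hmmdir."""
    args = [
        bw,
        "-hmmdir", hmmdir,
        "-moddeffn", os.path.join(hmmdir, "mdef"),
        "-ts2cbfn", ".cont.",
        "-feat", "1s_c_d_dd",
        "-cmn", "current",
        "-agc", "none",
        "-dictfn", dictfn,
        "-ctlfn", ctlfn,
        "-lsnfn", lsnfn,
    ]
    # Assumption: a feature_transform file in the model means -lda is needed.
    transform = os.path.join(hmmdir, "feature_transform")
    if os.path.isfile(transform):
        args += ["-lda", transform]
    args += ["-accumdir", accumdir]
    return args


print(" ".join(build_bw_args("en-us-generic", "cmu07a.dic",
                             "bush2002su.fileids", "bush2002su.transcription")))
```

The returned list can be handed to `subprocess.run(args, check=True)` once `bw` points at the real binary.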
    

Jeff Acquaviva - 2014-05-19

Sorry about that. I will include the commands in future questions.
I was missing the -lda option. May I suggest adding it to the adaptation tutorial?

What do you mean by

> Also make sure your dictionary is uppercase (I suggest you use lowercase transcription instead)

Should I have an uppercase dictionary but a lowercase transcription?
When I do this I receive these warnings

WARNING: "mk_phone_list.c", line 178: Unable to lookup word 'our' in the lexicon
WARNING: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance 'our soldiers working with the bosnian government seized terrorists who were plotting to bomb our embassy '

~~for all of my sentences. However, when I use the uppercase for both the transcription and the dictionary, there are no errors.~~
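
The warnings above are what a case-sensitive dictionary lookup would produce, which is inferred here from the messages rather than from the SphinxTrain source. Below is a rough sketch of a pre-flight check that reports transcript words with no exact dictionary match and flags those that would match if case were ignored. The helper names are hypothetical, and `<s>`, `</s>` and the trailing `(fileid)` are skipped on the assumption that they are not looked up in the main dictionary.

```python
def dictionary_words(lines):
    """Collect base words from dictionary lines, dropping '(2)'-style suffixes."""
    words = set()
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith(";;;"):
            continue
        head = fields[0]
        if head.endswith(")") and head.rfind("(") > 0:
            head = head[: head.rfind("(")]
        words.add(head)
    return words


def missing_words(transcript_line, words):
    """Return (word, matches_ignoring_case) for each word not found exactly."""
    lowered = {w.lower() for w in words}
    report = []
    for token in transcript_line.split():
        is_marker = token in ("<s>", "</s>")
        is_file_id = token.startswith("(") and token.endswith(")")
        if is_marker or is_file_id:
            continue
        if token not in words:
            report.append((token, token.lower() in lowered))
    return report


words = dictionary_words(["OUR AW ER", "OUR(2) AA R", "EMBASSY EH M B AH S IY"])
print(missing_words("<s> our embassy </s> (utt1)", words))
```

A `True` in the second position points at a case mismatch rather than a genuinely missing word.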


Nickolay V. Shmyrev - 2014-05-21

> I was missing the -lda option. May I suggest adding it to the adaptation tutorial?

Well, I added it, thanks for the suggestion. The svspec part was already there, but you didn't notice it anyway. I would actually prefer to create a high-level adaptation tool that automates the tasks required for adaptation. We need help with this.

> for all of my sentences. However, when I use the uppercase for both the transcription and the dictionary, there are no errors.

You need lowercase in both the transcription and the dictionary words. We are moving toward that style.
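
Converting an existing uppercase setup could look roughly like this. It is a sketch under assumptions, with hypothetical helper names: only the head word of each dictionary line is lowercased, so the phone symbols keep matching the model's phone names, and the trailing `(fileid)` of each transcription line is left alone so it still matches the control file.

```python
import re

# Trailing utterance id such as " (bush2002su001)" at the end of a line.
_UTT_ID = re.compile(r"\s*\([^()]*\)\s*$")
# Leading whitespace, the head word, and the rest of the line.
_DICT_LINE = re.compile(r"(\s*)(\S+)(.*)", re.S)


def lowercase_dict_line(line):
    """Lowercase the word of a dictionary line, leaving the phones as they are."""
    match = _DICT_LINE.match(line)
    if match is None:  # blank line
        return line
    return match.group(1) + match.group(2).lower() + match.group(3)


def lowercase_transcript_line(line):
    """Lowercase the words of a transcription line, keeping the file id."""
    match = _UTT_ID.search(line)
    if match is None:
        return line.lower()
    return line[: match.start()].lower() + line[match.start():]


print(lowercase_dict_line("OUR(2)  AA R"))
print(lowercase_transcript_line("<s> OUR EMBASSY </s> (bush2002su001)"))
```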


Jeff Acquaviva - 2014-05-21

Out of curiosity, why does the case of the transcription and dictionary matter? It makes sense to me that both need to be the same case, but is lowercase better than uppercase?

- Nickolay V. Shmyrev - 2014-05-21
  
  For compatibility with the many other tools, databases and dictionaries around; all of them use lowercase.
  
