
pocketsphinx: help on "adapting the default acoustic model"

  • HUI-YING LU - 2016-02-04

    Hi,

    I tried to follow the link http://cmusphinx.sourceforge.net/wiki/tutorialadapt to adapt the default acoustic model. Our system has just 14 simple commands. Attached are the fileids file, the transcription file, and the wav files for each command (from a single speaker). I was able to generate the .mfc file for each wav file. I downloaded cmusphinx-en-us-ptm-5.2.tar.gz and extracted mixture_weights from it. I copied the default acoustic model from ../local/share/pocketsphinx/model/en-us/en-us to a new working directory and put the mixture_weights from the above tarball there. I also converted the mdef file to mdef.txt using the pocketsphinx_mdef_convert program.

    The problem occurs when executing the bw command. Below is the command I ran:
    ./bw \
    -hmmdir en-us \
    -moddeffn en-us/mdef.txt \
    -ts2cbfn .cont. \
    -feat 1s_c_d_dd \
    -cmn current \
    -agc none \
    -varnorm no \
    -cmninit 40,3,-1 \
    -dictfn cmudict-en-us.dict \
    -ctlfn arctic14.fileids \
    -lsnfn arctic14.transcription \
    -accumdir .

    I see the following error:
    FATAL: "mod_inv.c", line 358: Number of feature streams in mixture_weights file 3 differs from the configured value 1, check the command line options
    The run time log is also attached.

    I did not set the -lda option - maybe this is related to the error.

    My questions:
    1) Where can I find the 'feature_transform' file? (Which directory and file name?)
    2) Are the above parameters passed to bw appropriate? I changed '-ts2cbfn .ptm.' to '-ts2cbfn .cont.' according to the above web link (we are using the continuous decoder, pocketsphinx_continuous), but why does feat.params still use '-ts2cbfn .ptm.'? (The feat.params from the default acoustic model is attached too.)
    3) I removed the -svspec option, but why is it in feat.params?
    4) Do I need to pass '-varnorm no' and '-cmninit 40,3,-1'? Those options are in feat.params.
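    The stream-count error is consistent with the PTM model storing its mixture weights in three subvector streams. A sketch of the tutorial's bw invocation, with the arctic14 file names from this thread substituted for the tutorial's arctic20 names, keeps -ts2cbfn .ptm. and -svspec so the configuration matches the model:

```shell
# Sketch only: the tutorial's bw invocation for the PTM en-us model.
# -svspec splits the 39-dimensional 1s_c_d_dd feature into 3 subvectors
# (cepstra / deltas / delta-deltas), matching the 3 streams the error
# message says the mixture_weights file contains.
./bw \
 -hmmdir en-us \
 -moddeffn en-us/mdef.txt \
 -ts2cbfn .ptm. \
 -feat 1s_c_d_dd \
 -svspec 0-12/13-25/26-38 \
 -cmn current \
 -agc none \
 -dictfn cmudict-en-us.dict \
 -ctlfn arctic14.fileids \
 -lsnfn arctic14.transcription \
 -accumdir .
```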

     
  • HUI-YING LU - 2016-02-04

    Attachments.

     
    • Nickolay V. Shmyrev

      The default model is PTM, not continuous. You need to follow the instructions as-is and they will work; you do not need to modify anything.

       
  • HUI-YING LU - 2016-02-04

    You mean I should use the options/parameters documented in http://cmusphinx.sourceforge.net/wiki/tutorialadapt?
    ./bw \
    -hmmdir en-us \
    -moddeffn en-us/mdef.txt \
    -ts2cbfn .ptm. \
    -feat 1s_c_d_dd \
    -svspec 0-12/13-25/26-38 \
    -cmn current \
    -agc none \
    -dictfn cmudict-en-us.dict \
    -ctlfn arctic20.fileids \ (in our case, arctic14.fileids)
    -lsnfn arctic20.transcription \ (in our case, arctic14.transcription)
    -accumdir .
    Even though I am using the pocketsphinx_continuous command, the model in use is the default one, which is PTM, not the en-us continuous model - right?

    What about the two other parameters listed in feat.params, '-varnorm no' and '-cmninit 40,3,-1'? I don't need to add those either?

    Thanks!
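    For context, the feat.params shipped with a model is a plain list of decoder arguments recording how that model's features were extracted; the values below are illustrative of the en-us PTM distribution and may differ from the attached copy, so check it for the authoritative ones. bw does not read this file automatically, which is why the tutorial repeats the matching values (-cmn, -feat, -svspec, and for some steps -cmninit) on the command line:

```
-lowerf 130
-upperf 6800
-nfilt 25
-transform dct
-lifter 22
-feat 1s_c_d_dd
-svspec 0-12/13-25/26-38
-agc none
-cmn current
-varnorm no
-cmninit 40,3,-1
```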

     
  • HUI-YING LU - 2016-02-05

    Now I have finished all the steps before "Testing the adaptation". I went ahead and proceeded to "Using the model". I see a slight improvement. As for the old issues we had, the keywords "start" and "stop" still get mis-decoded, as does "tilt up", although the mis-decoding does not occur as often as before.

    Now I have some questions:
    1) In the "Testing the adaptation" section, it mentions we need to build a language model. Can we just use a language model/dictionary built from a few very simple commands (like in our case, 14 simple commands, with a web-generated language model/dictionary)? (I think we can always use the language model/dictionary en-us.lm.bin and cmudict-en-us.dict, right?) What should be included in adaptation-test.transcription? Any phrases/sentences made from our dictionary? (They don't have to be the same as arctic20.transcription?)
    2) Going back to "Creating an adaptation corpus": it used arctic20.transcription. Can we use our own transcription file? As mentioned, we have just 14 simple commands, and the sentences in arctic20.transcription do not seem to cover most of the words used in our commands. I don't see why accuracy would improve if we used arctic20.transcription in our case.
    3) If we want to adapt to a different accent, multiple speakers, or a different recording environment, what should we do? Should we collect wav files (say arctic0001.wav to arctic0020.wav) from each speaker/environment, and repeat the steps from "Creating an adaptation corpus" through "Recreating the adapted sendump file" in the same working directory for each set of arcticxxxx.wav files (say arctic0001.wav to arctic0020.wav from person A, and another arctic0001.wav to arctic0020.wav from person B)?

    Question 3) is important to us. Looking forward to your help!
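    One way to fold several speakers into a single adaptation pass is to list all of their recordings in one control file and run the pipeline once over the combined set. The sketch below assumes a hypothetical directory layout (speakerA/, speakerB/, combined.fileids, combined.transcription are names invented here); the sphinx_fe options are the ones from the tutorial:

```shell
# Sketch only: adapting to several speakers in one pass.
#
# Hypothetical layout:
#   speakerA/arctic0001.wav ... speakerA/arctic0020.wav
#   speakerB/arctic0001.wav ... speakerB/arctic0020.wav
#
# combined.fileids lists every utterance, one per line, without .wav
# (e.g. speakerA/arctic0001), and combined.transcription has one
# matching transcript line per utterance, in the same order.

# 1. Extract features for all recordings in one pass.
sphinx_fe -argfile en-us/feat.params \
  -samprate 16000 -c combined.fileids \
  -di . -do . -ei wav -eo mfc -mswav yes

# 2. Run bw once over the combined control file (same options as for
#    a single speaker; only -ctlfn/-lsnfn change), then re-estimate
#    with map_adapt (or mllr_solve) exactly as in the tutorial. One
#    combined run adapts the model to all speakers at once.
```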

     
