CMU Sphinx / Forums / Help: Adapting a default acoustic model: segmentation fault when reading mdef file

Dino The Dinosaur - 2016-08-24

Hello!

I ran into a problem while adapting a continuous Russian acoustic model (zero_ru.cd_cont_4000): it crashes at the statistics-collection step. My command and log are below:

$ ./bw -hmmdir ru2 -moddeffn ru2/mdef.txt -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn ru.dic -ctlfn ru_train.fileids -lsnfn ru_train.transcription -accumdir .
INFO: main.c(229): Compiled on Aug 4 2016 at 10:44:45
Current configuration:
[NAME] [DEFLT] [VALUE]
-2passvar no no
-abeam 1e-100 1.000000e-100
-accumdir .
-agc none none
-agcthresh 2.0 2.000000e+000
-bbeam 1e-100 1.000000e-100
-cb2mllrfn .1cls. .1cls.
-cepdir
-cepext mfc mfc
-ceplen 13 13
-ckptintv 0
-cmn live current
-cmninit 40,3,-1 40,3,-1
-ctlfn ru_train.fileids
-diagfull no no
-dictfn ru.dic
-example no no
-fdictfn
-feat 1s_c_d_dd 1s_c_d_dd
-fullvar no no
-help no no
-hmmdir ru
-latdir
-latext
-lda
-ldadim 0 0
-lsnfn ru_train.transcription
-lw 11.5 1.150000e+001
-maxuttlen 0 0
-meanfn
-meanreest yes yes
-mixwfn
-mixwreest yes yes
-mllrmat
-mmie no no
-mmie_type rand rand
-moddeffn ru/mdef.txt
-mwfloor 0.00001 1.000000e-005
-npart 0
-nskip 0
-outphsegdir
-outputfullpath no no
-part 0
-pdumpdir
-phsegdir
-phsegext phseg phseg
-runlen -1 -1
-sentdir
-sentext sent sent
-spthresh 0.0 0.000000e+000
-svspec
-timing yes yes
-tmatfn
-tmatreest yes yes
-topn 4 4
-tpfloor 0.0001 1.000000e-004
-ts2cbfn .cont.
-varfloor 0.00001 1.000000e-005
-varfn
-varnorm no no
-varreest yes yes
-viterbi no no

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: main.c(318): Reading ru/mdef.txt
Segmentation fault

I tried adding the -lda option, but the log then looks like this:

$ ./bw -hmmdir ru2 -moddeffn ru2/mdef -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn ru.dic -ctlfn ru_train.fileids -lsnfn ru_train.transcription -accumdir -lda ru/feature_transform .
INFO: main.c(229): Compiled on Aug 4 2016 at 10:44:45
ERROR: "cmd_ln.c", line 604: Unknown argument name 'ru/feature_transform'
ERROR: "cmd_ln.c", line 701: Failed to parse arguments list
ERROR: "cmd_ln.c", line 750: Failed to parse arguments list, forced exit

What could be causing this?

Here is my working directory, without the audio files (I don't think they are the cause anyway): http://www.megafileupload.com/snn0/2gis_adapt.zip

- Nickolay V. Shmyrev - 2016-08-24
  
  It should be
  
  $ ./bw -hmmdir ru2 -moddeffn ru2/mdef -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn ru.dic -ctlfn ru_train.fileids -lsnfn ru_train.transcription -accumdir . -lda ru/feature_transform
  
  The dot is the value of the -accumdir argument (the current folder).
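The error log above is consistent with a parser that reads arguments strictly as "-name value" pairs: `-accumdir` swallows `-lda` as its value, so `ru/feature_transform` ends up in a name slot and is rejected. The sketch below imitates that pairing to show why token order matters. It is a hypothetical illustration, not SphinxTrain's actual parser, and the `KNOWN` flag list is an assumption covering only the flags used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not SphinxTrain code): walk an argument list two tokens
# at a time, the way a "-name value" parser does, and stop at the first token
# that lands in a name slot but is not a known flag.
# KNOWN is an assumption: it lists only the flags used in this thread.
KNOWN="-hmmdir -moddeffn -ts2cbfn -feat -cmn -agc -dictfn -ctlfn -lsnfn -accumdir -lda"

check_pairs() {
  local name
  while [ "$#" -gt 0 ]; do
    name=$1
    case " $KNOWN " in
      *" $name "*) ;;  # known flag name, accept it
      *) printf "Unknown argument name '%s'\n" "$name"; return 1 ;;
    esac
    if [ "$#" -lt 2 ]; then
      printf "Missing value for '%s'\n" "$name"; return 1
    fi
    # The next token is always taken as the value, even if it starts with '-'.
    printf '%s = %s\n' "$name" "$2"
    shift 2
  done
}
```

Running `check_pairs -accumdir -lda ru/feature_transform .` prints `-accumdir = -lda` and then `Unknown argument name 'ru/feature_transform'`, the same failure as in the log; with `check_pairs -accumdir . -lda ru/feature_transform` every flag gets its intended value.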
  
  - Dino The Dinosaur - 2016-08-24
    
    Oh, right, my bad.
    It gives the segmentation fault too, though.
    
    - Nickolay V. Shmyrev - 2016-08-24
      
      To debug the crash, you need to provide information about the software you are using: which OS, which SphinxTrain version, and so on. If you are on Linux, you can also provide the output of the tool run under valgrind:
      
      valgrind ./bw -hmmdir ru2 -moddeffn ru2/mdef -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn ru.dic -ctlfn ru_train.fileids -lsnfn ru_train.transcription -accumdir . -lda ru/feature_transform
      
      You also have an incompatible phoneset. You need to use the phoneset from ru2/ru2.dic, not ru.dic. The ru.dic in your root folder is not well aligned with Russian phonetics either: stress is very important for Russian.
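One way to spot the phoneset mismatch before running bw is to list every phone your dictionary uses that never appears in the model's dictionary. A minimal sketch, assuming the usual Sphinx .dic layout (the word first, then whitespace-separated phones); `missing_phones` is a hypothetical helper, not a SphinxTrain tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SphinxTrain): print every phone that appears
# in the pronunciations of dictionary $1 but never in reference dictionary $2.
# Assumes the usual Sphinx .dic layout: word, then whitespace-separated phones.
missing_phones() {
  awk '
    # First file read (the reference): remember every phone it uses.
    FILENAME == ARGV[1] { for (i = 2; i <= NF; i++) ref[$i] = 1; next }
    # Second file read (the dictionary being checked): collect unknown phones.
    { for (i = 2; i <= NF; i++) if (!($i in ref)) miss[$i] = 1 }
    END { for (p in miss) print p }
  ' "$2" "$1" | sort
}
```

For this thread that would be `missing_phones ru.dic ru2/ru2.dic`; empty output means every phone in ru.dic is covered by the model's phoneset. It does not check stress marking, which needs a look at the pronunciations themselves.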
      

Dino The Dinosaur - 2016-08-24

Yes, my mistakes are even embarrassing! I figured out what the issue was: I had copied the wrong binaries to the folder (I work from Cygwin, but copied Windows binaries), and I had mixed up the dictionaries (I used the one I made myself when I tried to train an acoustic model).
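The binary mix-up can be checked quickly from the Cygwin shell. A minimal sketch, assuming Cygwin's `ldd`, which lists the DLLs an executable loads: a Cygwin build of bw should list cygwin1.dll, while a native Windows build should not. `is_cygwin_build` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if `ldd` reports that the given executable
# loads cygwin1.dll, i.e. it looks like a Cygwin build rather than a native one.
# Assumes Cygwin's ldd; if ldd fails or is missing, the check simply fails.
is_cygwin_build() {
  ldd "$1" 2>/dev/null | grep -qi 'cygwin1\.dll'
}
```

Usage: `is_cygwin_build ./bw || echo "./bw does not look like a Cygwin build"`.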

Thank you for your help!

Last edit: Dino The Dinosaur 2016-08-24

