Hello!
I ran into problems when trying to adapt a continuous Russian acoustic model (zero_ru.cd_cont_4000) at the step of collecting statistics. My command and log are below:
$ ./bw -hmmdir ru2 -moddeffn ru2/mdef.txt -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn ru.dic -ctlfn ru_train.fileids -lsnfn ru_train.transcription -accumdir .
INFO: main.c(229): Compiled on Aug 4 2016 at 10:44:45
Current configuration:
[NAME] [DEFLT] [VALUE]
-2passvar no no
-abeam 1e-100 1.000000e-100
-accumdir .
-agc none none
-agcthresh 2.0 2.000000e+000
-bbeam 1e-100 1.000000e-100
-cb2mllrfn .1cls. .1cls.
-cepdir
-cepext mfc mfc
-ceplen 13 13
-ckptintv 0
-cmn live current
-cmninit 40,3,-1 40,3,-1
-ctlfn ru_train.fileids
-diagfull no no
-dictfn ru.dic
-example no no
-fdictfn
-feat 1s_c_d_dd 1s_c_d_dd
-fullvar no no
-help no no
-hmmdir ru
-latdir
-latext
-lda
-ldadim 0 0
-lsnfn ru_train.transcription
-lw 11.5 1.150000e+001
-maxuttlen 0 0
-meanfn
-meanreest yes yes
-mixwfn
-mixwreest yes yes
-mllrmat
-mmie no no
-mmie_type rand rand
-moddeffn ru/mdef.txt
-mwfloor 0.00001 1.000000e-005
-npart 0
-nskip 0
-outphsegdir
-outputfullpath no no
-part 0
-pdumpdir
-phsegdir
-phsegext phseg phseg
-runlen -1 -1
-sentdir
-sentext sent sent
-spthresh 0.0 0.000000e+000
-svspec
-timing yes yes
-tmatfn
-tmatreest yes yes
-topn 4 4
-tpfloor 0.0001 1.000000e-004
-ts2cbfn .cont.
-varfloor 0.00001 1.000000e-005
-varfn
-varnorm no no
-varreest yes yes
-viterbi no no
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: main.c(318): Reading ru/mdef.txt
Segmentation fault
I tried adding the -lda option, but the log then looks like this:
$ ./bw -hmmdir ru2 -moddeffn ru2/mdef -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn ru.dic -ctlfn ru_train.fileids -lsnfn ru_train.transcription -accumdir -lda ru/feature_transform .
INFO: main.c(229): Compiled on Aug 4 2016 at 10:44:45
ERROR: "cmd_ln.c", line 604: Unknown argument name 'ru/feature_transform'
ERROR: "cmd_ln.c", line 701: Failed to parse arguments list
ERROR: "cmd_ln.c", line 750: Failed to parse arguments list, forced exit
What could be the possible problems?
Here is my working directory without the audio files (the audio shouldn't be the cause anyway, I think): http://www.megafileupload.com/snn0/2gis_adapt.zip
It should be -accumdir . -lda ru/feature_transform
The dot is the value of the -accumdir argument (the current folder), so it has to come right after -accumdir.
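For reference, here is a sketch of the reordered invocation. It assumes the same file names as in the failing command above and only moves the dot so that it directly follows -accumdir. Whether ru/feature_transform is the right path for this model is not confirmed in this thread.

```shell
# Every flag is immediately followed by its value; "." belongs to -accumdir.
set -- \
  -hmmdir ru2 \
  -moddeffn ru2/mdef \
  -ts2cbfn .cont. \
  -feat 1s_c_d_dd \
  -cmn current \
  -agc none \
  -dictfn ru.dic \
  -ctlfn ru_train.fileids \
  -lsnfn ru_train.transcription \
  -accumdir . \
  -lda ru/feature_transform

# Show the command, and run it only if the bw binary is present here.
echo "./bw $*"
if [ -x ./bw ]; then
  ./bw "$@"
fi
```

An even number of arguments is a quick sanity check that no flag has lost its value.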
Oh, right, my bad.
It still gives the segmentation fault, though.
To debug the crash you need to provide information about the software you are using: which OS, which sphinxtrain version, and so on. If you are on Linux, you can also provide the output of the tool run under valgrind.
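A minimal sketch of how that information could be collected in one log file. The helper name crash_report and the log file name are made up for illustration. It assumes valgrind is on the PATH, which is typically true only on Linux (not under Cygwin); otherwise it just runs the command directly.

```shell
# Hypothetical helper: write OS details and the tool's output to a log file.
# Usage: crash_report LOGFILE COMMAND [ARGS...]
crash_report() {
  log=$1
  shift
  {
    uname -a                       # OS and kernel version
    if command -v valgrind >/dev/null 2>&1; then
      valgrind "$@"                # run the command under valgrind
    else
      echo "valgrind not found; running the command directly"
      "$@"
    fi
  } > "$log" 2>&1
}

# Example with the command from this thread:
# crash_report bw_crash.log ./bw -hmmdir ru2 -moddeffn ru2/mdef.txt \
#   -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none \
#   -dictfn ru.dic -ctlfn ru_train.fileids \
#   -lsnfn ru_train.transcription -accumdir .
```

The resulting log can then be attached to the post together with the sphinxtrain version.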
You also have an incompatible phoneset. You need to use the phoneset from ru2/ru2.dic, not ru.dic. The ru.dic you have in the root folder is not quite aligned with Russian phonetics either. Stress is very important for Russian.
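A quick way to spot such a mismatch is to list the phones that the dictionary uses but the model does not define. The sketch below is only an illustration. It assumes a text-format mdef in which context-independent phones have "-" in the left and right context columns, and a dictionary with one "word phone phone ..." entry per line. The function name missing_phones is made up.

```shell
# Hypothetical helper: print phones used in DICT that are absent from MDEF.
# Usage: missing_phones DICT MDEF
missing_phones() {
  dict=$1
  mdef=$2
  dict_phones=$(mktemp)
  mdef_phones=$(mktemp)

  # Dictionary: every field after the word itself is a phone.
  awk '{ for (i = 2; i <= NF; i++) print $i }' "$dict" | sort -u > "$dict_phones"

  # mdef (assumed layout): base phones have "-" as left and right context.
  awk '$1 !~ /^#/ && $2 == "-" && $3 == "-" { print $1 }' "$mdef" | sort -u > "$mdef_phones"

  # Lines only in the first file: phones the model does not know.
  comm -23 "$dict_phones" "$mdef_phones"

  rm -f "$dict_phones" "$mdef_phones"
}

# Example: missing_phones ru.dic ru2/mdef.txt
```

If this prints anything, the dictionary does not match the model's phoneset.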
Yes, my mistakes are rather embarrassing! I understood what the issue was. I copied the wrong binaries to the folder (I work from Cygwin, but copied Windows binaries) and mixed up the dictionaries (I used the one that I made myself when trying to train an acoustic model).
Thank you for your help!
Last edit: Dino The Dinosaur 2016-08-24