Hi,
I am trying to train an acoustic model for a native language. The training data set is about 1 hour. After running sphinxtrain run, I see some warnings and errors, and training finally stops. Some extracts from the warnings and errors are shown inline [1]. The complete log file is attached. Any idea what is going wrong here?
[1].
WARNING: This phone (BEE) occurs in the phonelist (xxx.phone), but not in any word in the transcription (yyy_train.transcription)
ERROR: This step had 18 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 5
Current Overall Likelihood Per Frame = -151.683113792911
Convergence Ratio = 0.262417801179481
Baum welch starting for iteration: 6 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
ERROR: This step had 18 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 6
Current Overall Likelihood Per Frame = -151.643740815671
Training completed after 6 iterations
GE 0
GE 1
GE 2
GEA 0
GEA 1
GEA 2
GEE 0
ERROR: FATAL: "main.c", line 730: Initialization failed
MODULE: 45 Prune Trees
Phase 1: Tree Pruning
ERROR: FATAL: "main.c", line 167: Unable to open xxxx.unpruned/BEE-0.dtree for reading: No such file or directory
MODULE: 50 Training Context dependent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...
Phase 2: Copy CI to CD initialize
ERROR: This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
Phase 3: Forward-Backward
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
0% ERROR: FATAL: "main.c", line 1839: initialization failed
ERROR: This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
ERROR: Failed to start bw
ERROR: Only 0 parts of 1 of Baum Welch were successfully completed
ERROR: Parts 1 failed to run!
Last edit: Izza 2016-06-09
You need to fix the first warning (the phone BEE is listed in the phonelist but is not used by any word in the transcription).
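To find every phone that triggers this warning in one pass, a small script along these lines may help. It is only a sketch under assumptions: the phonelist has one phone per line, the dictionary lines look like `WORD PH1 PH2 ...` with alternates written as `WORD(2)`, and each transcription line ends with `(utterance_id)`. The function names and the sample data are hypothetical and not part of sphinxtrain.

```python
import re


def load_dictionary(lines):
    """Map each word to the set of phones used by any of its pronunciations."""
    pron = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Alternate pronunciations are assumed to be written WORD(2), WORD(3), ...
        word = re.sub(r"\(\d+\)$", "", parts[0])
        pron.setdefault(word, set()).update(parts[1:])
    return pron


def transcript_words(lines):
    """Collect every distinct token that appears in the transcription."""
    words = set()
    for line in lines:
        # Drop the trailing "(utterance_id)" field before splitting.
        text = re.sub(r"\([^()]*\)\s*$", "", line)
        words.update(text.split())
    return words


def unused_phones(phone_lines, dict_lines, transcript_lines):
    """Return phones from the phonelist that no transcribed word uses."""
    phones = [p.strip() for p in phone_lines if p.strip()]
    pron = load_dictionary(dict_lines)
    used = set()
    for word in transcript_words(transcript_lines):
        used |= pron.get(word, set())
    return [p for p in phones if p not in used]


if __name__ == "__main__":
    # Hypothetical in-memory data; replace with lines read from your own
    # phonelist, dictionary (plus filler dictionary) and transcription files.
    phones = ["BEE", "GE", "GEA", "SIL"]
    dictionary = ["hello GE GEA", "<s> SIL", "</s> SIL"]
    transcript = ["<s> hello </s> (utt_001)"]
    print(unused_phones(phones, dictionary, transcript))
```

Each phone it reports either needs more training utterances that contain it, or should be removed from the phonelist and dictionary. Pass the filler dictionary lines together with the main dictionary so that SIL and similar fillers are not reported.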
Hi Nickolay,
Thanks for the input. I fixed the first set of warnings and ran training again, then came across another error [1]. Following the troubleshooting tips in [2], I tried with CFG_FORCEDALIGN set to 'true'. That reduced the number of errors [3], but then I got an error saying that sphinx3_align was not found [4]. I checked out the cmusphinx GitHub repo and tried building sphinx3 (I had built sphinxbase-5prealpha earlier), but hit some compilation errors [5]. I suspect this is because of an incompatibility between the sphinxbase and sphinx3 versions I tried to compile.
Why do I still get ERRORs after setting CFG_FORCEDALIGN to 'true'? Also, could you please point me to a source location or released version of sphinx3 that is compatible with sphinxbase-5prealpha?
Thank you.
[1]. Failed to align audio to transcript: final state of the search is not reached.
[2]. http://cmusphinx.sourceforge.net/wiki/tutorialam
[3].
MODULE: 10 Training Context Independent models for forced alignment and VTLN
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...models...
Phase 2: Flat initialize
Phase 3: Forward-Backward
Normalization for iteration: 1
Normalization for iteration: 2
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
ERROR: This step had 4 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 3
ERROR: This step had 6 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 4
ERROR: This step had 10 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 5
ERROR: This step had 10 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 6
ERROR: This step had 12 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 7
ERROR: This step had 12 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 8
Current Overall Likelihood Per Frame = -166.001400946947
Training completed after 8 iterations
[4]. Skipped: No sphinx3_align(.exe) found in /usr/local/libexec/sphinxtrain
[5].
corpus.c: In function 'ctl_read_entry':
corpus.c:422:9: error: too many arguments to function 'path2basename'
path2basename(uttfile, base);
^
In file included from corpus.c:100:0:
/usr/local/include/sphinxbase/filename.h:83:13: note: declared here
const char *path2basename(const char *path);
^
corpus.c: In function 'ctl_process':
corpus.c:552:11: warning: variable 'k' set but not used [-Wunused-but-set-variable]
kb_t k;
^
corpus.c: In function 'ctl_process_utt':
corpus.c:701:5: error: too many arguments to function 'path2basename'
path2basename(uttfile, base);
^
In file included from corpus.c:100:0:
/usr/local/include/sphinxbase/filename.h:83:13: note: declared here
const char *path2basename(const char *path);
https://sourceforge.net/p/cmusphinx/code/HEAD/tree/trunk/sphinx3/
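After building and installing sphinx3 from there, it may be worth confirming that the aligner actually landed where sphinxtrain looks for it before rerunning training. The sketch below assumes the directory from the "Skipped" message in [4]; the helper name `find_aligner` is hypothetical and not part of sphinxtrain.

```python
import os


def find_aligner(libexec_dir, names=("sphinx3_align", "sphinx3_align.exe")):
    """Return the path of the first executable aligner found, or None."""
    for name in names:
        candidate = os.path.join(libexec_dir, name)
        # The file must exist and be marked executable for the current user.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Directory taken from the "Skipped" message; adjust if your install
    # prefix differs.
    path = find_aligner("/usr/local/libexec/sphinxtrain")
    print(path or "sphinx3_align not found")
```

If it prints "sphinx3_align not found", the forced alignment step will presumably be skipped again.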