
Speech Recognition Toolkit

FATAL_ERROR: "corpus.c" duri...

Forum: Help

Creator: Pezhman Lali

Created: 2011-12-13

Updated: 2012-09-22

Pezhman Lali - 2011-12-13

Dear all,
The RunAll.pl script produces the following error. I could not find the cause
by googling; maybe you can help me.

sphinxtrain 1.0.7
pocketsphinx 0.7
sphinxbase 0.7

./scripts_pl/RunAll.pl

MODULE: 00 verify training files
O.S. is case sensitive ("A" != "a").
Phones will be treated as case sensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the
phonelist file.
Found 6 words using 14 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the
dictionary
Phase 3: CTL - Check general format; utterance length (must be positive);
files exist
Phase 4: CTL - Checking number of lines in the transcript should match lines
in control file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems
reasonable.
Estimated Total Hours Training: 0.00261666666666667
This is a small amount of data, no comment at this time
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the
dictionary
Words in dictionary: 3
Words in filler dictionary: 3
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in
the phonelist, and all phones in the phonelist appear at least once
Feature type is s2_4x which is 4 streams
LDA/MLLT only has sense for single stream features, for example 1s_c_d_dd
Skipping LDA training
Feature type is s2_4x which is 4 streams
LDA/MLLT only has sense for single stream features, for example 1s_c_d_dd
Skipping MLLT training
MODULE: 05 Vector Quantization
This step had 2 ERROR messages and 3191 WARNING messages. Please check the log
file for details.
MODULE: 10 Training Context Independent models for forced alignment and VTLN
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 11 Force-aligning transcripts
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
MODULE: 12 Force-aligning data for VTLN
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 20 Training Context Independent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...models...
Phase 2: Flat initialize
Phase 3: Forward-Backward
Training failed in iteration 1
Something failed: (/root/sphinx/man/scripts_pl/20.ci_hmm/slave_convg.pl)

From the log:

==> man.1.1-2.bw.log <==
INFO: main.c(397): Will reestimate means.
INFO: main.c(399): Will reestimate variances.
INFO: main.c(407): Will reestimate transition matrices
INFO: main.c(420): Reading main lexicon: /root/sphinx/man/etc/man.dic
INFO: lexicon.c(233): 3 entries added from /root/sphinx/man/etc/man.dic
INFO: main.c(432): Reading filler lexicon: /root/sphinx/man/etc/man.filler
INFO: lexicon.c(233): 3 entries added from /root/sphinx/man/etc/man.filler
INFO: corpus.c(436): skipping 4 utts.
FATAL_ERROR: "corpus.c", line 1345: File length mismatch at line 4 in
/root/sphinx/man/etc/man_train.transcription
Tue Dec 13 06:40:18 2011


Pezhman Lali - 2011-12-13

This is /root/sphinx/man/etc/man_train.transcription:

cat /root/sphinx/man/etc/man_train.transcription

<s> FAARSI </s> (file_1)
<s> ENGHILIYSI </s> (file_2)
<s> BAAZDIYD </s> (file_3)


Nickolay V. Shmyrev - 2011-12-13

Your transcription file has fewer lines than your fileids file.
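A quick way to catch this before running RunAll.pl is to pair the two files line by line. The sketch below is a hypothetical helper, not SphinxTrain's own check in corpus.c. It assumes one utterance per line in both files, transcription lines of the form `<s> WORD </s> (utt_id)`, and that `utt_id` should equal the last path component of the matching fileid (e.g. TOM1/file_1 -> file_1); verify that last rule against your SphinxTrain version.

```python
import os
import re

# Hypothetical pattern: the utterance id is the text inside the trailing "(...)".
UTT_ID = re.compile(r"\(([^()]*)\)\s*$")


def check_corpus(fileids, transcripts):
    """Return a list of problems found when pairing fileids with transcripts."""
    problems = []
    # Both files must have exactly one line per utterance.
    if len(fileids) != len(transcripts):
        problems.append(
            f"line count mismatch: {len(fileids)} fileids vs "
            f"{len(transcripts)} transcription lines"
        )
    # For the lines that do pair up, compare ids with the fileid basename.
    for lineno, (fid, trans) in enumerate(zip(fileids, transcripts), start=1):
        match = UTT_ID.search(trans)
        expected = os.path.basename(fid)
        if match is None:
            problems.append(f"line {lineno}: no (utt_id) in transcription")
        elif match.group(1) != expected:
            problems.append(
                f"line {lineno}: id {match.group(1)!r} != fileid basename {expected!r}"
            )
    return problems


# The two files as posted in this thread.
fileids = [
    "TOM1/file_1", "TOM1/file_2", "TOM1/file_3",
    "TOM2/file_1", "TOM2/file_2", "TOM2/file_3",
    "Mary1/file_1", "Mary1/file_2", "Mary1/file_3",
]
transcripts = [
    "<s> FAARSI </s> (file_1)",
    "<s> ENGHILIYSI </s> (file_2)",
    "<s> BAAZDIYD </s> (file_3)",
]

for problem in check_corpus(fileids, transcripts):
    print(problem)
```

With the posted files this reports only the line count mismatch (9 vs 3), which is consistent with bw failing at line 4, the first fileid that has no transcription line.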

Your database is also too small for training.
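To put the "Estimated Total Hours Training" value from the verify step into perspective, the conversion to seconds is shown below. The number is copied from the log in the first post; the code only does the unit conversion.

```python
# Value printed by the verify step (Phase 5) in the log above.
estimated_hours = 0.00261666666666667

seconds = estimated_hours * 3600  # 3600 seconds per hour
print(f"{seconds:.2f} s of training audio")
```

That is roughly 9.4 seconds of audio in total, which is far too little to train acoustic models.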

For more information on training, read the tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorialam


Pezhman Lali - 2011-12-13

Thanks for your reply.

This is the fileids file:

cat man_train.fileids

TOM1/file_1
TOM1/file_2
TOM1/file_3
TOM2/file_1
TOM2/file_2
TOM2/file_3
Mary1/file_1
Mary1/file_2
Mary1/file_3

The fileids file has more lines because we have only 3 words to recognize but
several speakers (TOM1, TOM2, Tom3, Mary1, Mary2, ...). Am I right?


Nickolay V. Shmyrev - 2011-12-13

A transcription line must be present for every speaker's recordings, so the
lines repeat (the same words once per speaker). The number of lines in the
fileids file must equal the number of lines in the transcription file.
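Because each speaker reads the same prompts, the transcription can be generated from the fileids file so the two always stay in step. This is a minimal sketch: `build_transcription` and `WORD_FOR_UTT` are hypothetical names, the word for each basename is taken from the three transcription lines posted above, and it assumes that a given basename (file_1, file_2, file_3) holds the same word for every speaker.

```python
import os

# Hypothetical mapping from recording basename to the word spoken in it,
# taken from the transcription lines posted in this thread.
WORD_FOR_UTT = {
    "file_1": "FAARSI",
    "file_2": "ENGHILIYSI",
    "file_3": "BAAZDIYD",
}


def build_transcription(fileids, word_for_utt):
    """Emit one '<s> WORD </s> (utt_id)' line per fileid, in the same order.

    The words repeat once per speaker, and the output always has exactly
    len(fileids) lines.
    """
    lines = []
    for fid in fileids:
        utt_id = os.path.basename(fid)  # e.g. TOM2/file_1 -> file_1
        lines.append(f"<s> {word_for_utt[utt_id]} </s> ({utt_id})")
    return lines


fileids = [
    "TOM1/file_1", "TOM1/file_2", "TOM1/file_3",
    "TOM2/file_1", "TOM2/file_2", "TOM2/file_3",
    "Mary1/file_1", "Mary1/file_2", "Mary1/file_3",
]

transcription = build_transcription(fileids, WORD_FOR_UTT)
print("\n".join(transcription))
```

Writing these lines (one per line, same order as the fileids) to /root/sphinx/man/etc/man_train.transcription would give a 9-line transcription matching the 9-line fileids file. An unknown basename raises a KeyError, which flags a recording with no known prompt instead of silently skipping it.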

I strongly suggest going through the tutorial first with the an4 test
database, then doing the same for your own data by analogy.
