FATAL_ERROR: "corpus.c" duri...

Help
2011-12-13
2012-09-22
  • Pezhman Lali

    Pezhman Lali - 2011-12-13

    Dear
    The RunAll.pl script produces the following error. I cannot find the cause
    by googling; maybe you can help me.

    sphinxtrain 1.0.7
    pocketsphinx 0.7
    base 0.7

    ./scripts_pl/RunAll.pl

    MODULE: 00 verify training files
    O.S. is case sensitive ("A" != "a").
    Phones will be treated as case sensitive.
    Phase 1: DICT - Checking to see if the dict and filler dict agrees with the
    phonelist file.
    Found 6 words using 14 phones
    Phase 2: DICT - Checking to make sure there are not duplicate entries in the
    dictionary
    Phase 3: CTL - Check general format; utterance length (must be positive);
    files exist
    Phase 4: CTL - Checking number of lines in the transcript should match lines
    in control file
    Phase 5: CTL - Determine amount of training data, see if n_tied_states seems
    reasonable.
    Estimated Total Hours Training: 0.00261666666666667
    This is a small amount of data, no comment at this time
    Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the
    dictionary
    Words in dictionary: 3
    Words in filler dictionary: 3
    Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in
    the phonelist, and all phones in the phonelist appear at least once
    Feature type is s2_4x which is 4 streams
    LDA/MLLT only has sense for single stream features, for example 1s_c_d_dd
    Skipping LDA training
    Feature type is s2_4x which is 4 streams
    LDA/MLLT only has sense for single stream features, for example 1s_c_d_dd
    Skipping MLLT training
    MODULE: 05 Vector Quantization
    This step had 2 ERROR messages and 3191 WARNING messages. Please check the log
    file for details.
    MODULE: 10 Training Context Independent models for forced alignment and VTLN
    Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
    Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
    MODULE: 11 Force-aligning transcripts
    Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
    MODULE: 12 Force-aligning data for VTLN
    Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
    MODULE: 20 Training Context Independent models
    Phase 1: Cleaning up directories:
    accumulator...logs...qmanager...models...
    Phase 2: Flat initialize
    Phase 3: Forward-Backward
    Training failed in iteration 1
    Something failed: (/root/sphinx/man/scripts_pl/20.ci_hmm/slave_convg.pl)

    From the log:

    ==> man.1.1-2.bw.log <==
    INFO: main.c(397): Will reestimate means.
    INFO: main.c(399): Will reestimate variances.
    INFO: main.c(407): Will reestimate transition matrices
    INFO: main.c(420): Reading main lexicon: /root/sphinx/man/etc/man.dic
    INFO: lexicon.c(233): 3 entries added from /root/sphinx/man/etc/man.dic
    INFO: main.c(432): Reading filler lexicon: /root/sphinx/man/etc/man.filler
    INFO: lexicon.c(233): 3 entries added from /root/sphinx/man/etc/man.filler
    INFO: corpus.c(436): skipping 4 utts.
    FATAL_ERROR: "corpus.c", line 1345: File length mismatch at line 4 in
    /root/sphinx/man/etc/man_train.transcription
    Tue Dec 13 06:40:18 2011

     
  • Pezhman Lali

    Pezhman Lali - 2011-12-13

    This is /root/sphinx/man/etc/man_train.transcription:

    cat /root/sphinx/man/etc/man_train.transcription

    FAARSI (file_1)
    ENGHILIYSI (file_2)
    BAAZDIYD (file_3)

     
  • Nickolay V. Shmyrev

    Your transcription file has fewer lines than your fileids file.
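
    A quick way to confirm this kind of mismatch is to count the non-blank
    lines in both files. The following is a minimal sketch, not part of
    SphinxTrain; the helper names count_entries and counts_match are
    hypothetical, and it assumes one entry per line in each file.

```python
from pathlib import Path


def count_entries(path):
    """Return the number of non-blank lines in a text file."""
    with Path(path).open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def counts_match(fileids_path, transcription_path):
    """Return (n_fileids, n_transcripts, equal?) for the two files.

    Hypothetical helper: it only compares line counts and does not
    validate the content of either file.
    """
    n_ids = count_entries(fileids_path)
    n_trans = count_entries(transcription_path)
    return n_ids, n_trans, n_ids == n_trans
```

    Called with the paths to the fileids and transcription files, it returns
    both counts and whether they agree.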

    Your database is too small for training.

    For more information on training, read the tutorial:

    http://cmusphinx.sourceforge.net/wiki/tutorialam

     
  • Pezhman Lali

    Pezhman Lali - 2011-12-13

    Thanks for your reply

    This is the fileids file:

    cat man_train.fileids

    TOM1/file_1
    TOM1/file_2
    TOM1/file_3
    TOM2/file_1
    TOM2/file_2
    TOM2/file_3
    Mary1/file_1
    Mary1/file_2
    Mary1/file_3

    The fileids file has more lines because we have only 3 words to recognize
    but several speakers (TOM1, TOM2, Tom3, Mary1, Mary2, ...). Am I right?

     
  • Nickolay V. Shmyrev

    A transcription line must be present for every recording of every speaker,
    so the lines repeat. The number of lines in the fileids file must equal the
    number of lines in the transcription file.
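
    As an illustration, the repeated lines could be generated from the fileids
    entries. This is only a sketch under assumptions: build_transcription is a
    hypothetical helper, the utterance ID in parentheses is taken to be the
    last path component (as in the transcription posted above), and the
    word-per-file mapping is copied from that same post.

```python
def build_transcription(fileids, words_by_utt):
    """Return one transcription line per fileids entry.

    fileids      -- entries such as "TOM1/file_1"
    words_by_utt -- maps the last path component to its transcript text
    """
    lines = []
    for fileid in fileids:
        fileid = fileid.strip()
        if not fileid:
            continue  # ignore blank lines in the control file
        utt = fileid.split("/")[-1]
        lines.append(f"{words_by_utt[utt]} ({utt})")
    return lines


# Data taken from the files shown in this thread.
fileids = [
    "TOM1/file_1", "TOM1/file_2", "TOM1/file_3",
    "TOM2/file_1", "TOM2/file_2", "TOM2/file_3",
    "Mary1/file_1", "Mary1/file_2", "Mary1/file_3",
]
words = {"file_1": "FAARSI", "file_2": "ENGHILIYSI", "file_3": "BAAZDIYD"}

transcription = build_transcription(fileids, words)
print("\n".join(transcription))
```

    The result has nine lines, one per fileids entry, in the same order as the
    control file.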

    I strongly suggest that you first go through the tutorial with the an4 test
    database, then set up your own data by analogy.

     
