Training errors

Help
Anati
2011-11-20
2012-09-22
1 2 > >> (Page 1 of 2)
  • Anati

    Anati - 2011-11-20

    perl ./scripts_pl/RunAll.pl
    .
    .
    .

      Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
        Phase 3: CTL - Check general format; utterance length (must be positive); files exist
    WARNING: CTL file, /home/anati/Corpus/Anati/feat/2-1.mfc, does not exist, or is empty
    

    .
    .
    .

    WARNING: Utterance ID mismatch on line 204: 9-20 vs 9-13
    Use of uninitialized value $_[0] in substitution (s///) at /usr/share/perl5/File/Basename.pm line 341, <TRN> line 205.
    fileparse(): need a valid pathname at /home/anati/Corpus/Anati/scripts_pl/00.verify/verify_all.pl line 389
    Something failed: (/home/anati/Corpus/Anati/scripts_pl/00.verify/verify_all.pl)
    

    How can I fix these errors?

     
  • Anati

    Anati - 2011-11-21

    WARNING: Utterance ID mismatch on line 204: 9-20 vs 9-13

    Use of uninitialized value $_[0] in substitution (s///) at /usr/share/perl5/File/Basename.pm line 341, <TRN> line 205.
    fileparse(): need a valid pathname at /home/anati/Corpus/Anati/scripts_pl/00.verify/verify_all.pl line 389
    

    Why?

     
  • Nickolay V. Shmyrev

    Anati, my friend. I'm sorry to say this, but a very clear reason for the problem
    is contained in the message you posted. It's written in plain text in big
    letters. You just need to read and understand it. If you don't understand
    something specific about this message, you are welcome to ask.

    It's not productive to create 20 topics about the same problem; I don't see
    how that can help you. Instead, try to understand the program output.
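
The mismatch warning quoted at the top of the thread pairs line N of the control file with line N of the transcript and compares their utterance IDs. A minimal Python sketch of that comparison, assuming the usual SphinxTrain layout (one path per line in the control file, `<s> words </s> (utterance-id)` per transcript line). The function name and the sample lines are hypothetical and not taken from the poster's files:

```python
import posixpath
import re

# Matches the "(utterance-id)" that closes a transcript line.
TRAILING_ID = re.compile(r"\(([^()\s]+)\)\s*$")


def utterance_id_mismatches(control_lines, transcript_lines):
    """Compare control-file and transcript lines pairwise.

    Returns a list of (line_number, control_id, transcript_id) tuples for
    every pair whose IDs differ. A missing line on either side is reported
    with None, which is what a difference in file length produces.
    """
    mismatches = []
    total = max(len(control_lines), len(transcript_lines))
    for index in range(total):
        control_id = None
        if index < len(control_lines):
            # Assumption: the ID is the last path component of the entry.
            control_id = posixpath.basename(control_lines[index].strip())
        transcript_id = None
        if index < len(transcript_lines):
            match = TRAILING_ID.search(transcript_lines[index])
            if match:
                # Assumption: compare basenames on both sides.
                transcript_id = posixpath.basename(match.group(1))
        if control_id != transcript_id:
            mismatches.append((index + 1, control_id, transcript_id))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample data: the transcript has one line more than the control file.
    control = ["9/9-12", "9/9-20"]
    transcript = ["<s> w1 </s> (9-12)", "<s> w2 </s> (9-13)", "<s> w3 </s> (9-14)"]
    for line_number, control_id, transcript_id in utterance_id_mismatches(control, transcript):
        print(f"line {line_number}: {control_id} vs {transcript_id}")
```

Run over the real control and transcript files (read with `open(path, encoding="utf-8")`), the first reported line is a reasonable place to look for a dropped or duplicated entry.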

     
  • Anati

    Anati - 2011-11-21

    Thanks for your advice,

    but I have some notes:

    -- A Sphinx warning halts code execution, which is not strange in a programming
    language ... so I ignored the warnings, which is wrong in Sphinx's case!

    -- Many important things are not clear in the tutorial. I promise I will
    write clear steps for Sphinx, if you allow me!


     Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
            Words in dictionary: 9
            Words in filler dictionary: 0
    WARNING: This word: <s> was in the transcript file, but is not in the dictionary (<s> ������ </s> ). Do cases match?
    

    This message appears for all the words in my train_tra....

    But I'm sure these words exist in the dictionary, so what is wrong?

     
  • Nickolay V. Shmyrev

    A Sphinx warning halts code execution, which is not strange in a programming
    language ... so I ignored the warnings, which is wrong in Sphinx's case!

    Sorry, I don't quite understand that.

    -- Many important things are not clear in the tutorial. I promise I will
    write clear steps for Sphinx, if you allow me!

    That would be much appreciated.

    This message appears for all the words in my train_tra....

    I trust the program more than I trust you, and I think the words really are missing.
    Most likely your filler dictionary is empty. In any case, if you can't find the
    issue yourself, you are always welcome to share the etc folder. Pack it into
    an archive, upload it to a file-sharing resource, and post a link here.

     
  • Anati

    Anati - 2011-11-22

    My filler is empty because I don't know where I should use it!

    Please see the etc file:

    http://www.2shared.com/file/EXT7U8xl/etctar.html

     
  • Nickolay V. Shmyrev

    My filler is empty because I don't know where I should use it!

    It's described in the tutorial:

    http://cmusphinx.sourceforge.net/wiki/tutorialam

    You just need to read it.

     
  • Anati

    Anati - 2011-11-22

    Thus, in addition to the speech signals, you will also be given a set of
    transcripts for the database (in a single file) and two dictionaries, one in
    which legitimate words in the language are mapped to sequences of sound units (or
    sub-word units), and another in which non-speech sounds are mapped to
    corresponding non-speech or speech-like sound units. We will refer to the
    former as the language dictionary and the latter as the filler dictionary.

    There are no "non-speech or speech-like sound units" in my commands, so I
    didn't use it.

    Please see my etc file.

     
  • Nickolay V. Shmyrev

    There are no "non-speech or speech-like sound units" in my commands, so I
    didn't use it.

    You still need to create a filler file as described in the tutorial.

     
  • Anati

    Anati - 2011-11-22

    I filled the filler file with:

    <s> SIL
    <sil>   SIL
    </s>    SIL
    !INH    +INH+
    !NOISE  +NOISE+
    
    
    
    
    
    [anati@Anati Anati]$ perl ./scripts_pl/RunAll.pl 
    MODULE: 00 verify training files
    O.S. is case sensitive ("A" != "a").
    Phones will be treated as case sensitive.
        Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
    WARNING: The phonelist (/home/anati/Corpus/Anati/etc/asr.phone) does not define the phone SIL (required!)
            Found 14 words using 23 phones
    WARNING: This phone (+INH+) occurs in the dictionary (/home/anati/Corpus/Anati/etc/asr.dic), but not in the phonelist (/home/anati/Corpus/Anati/etc/asr.phone)
    WARNING: This phone (+NOISE+) occurs in the dictionary (/home/anati/Corpus/Anati/etc/asr.dic), but not in the phonelist (/home/anati/Corpus/Anati/etc/asr.phone)
    WARNING: This phone (SIL) occurs in the dictionary (/home/anati/Corpus/Anati/etc/asr.dic), but not in the phonelist (/home/anati/Corpus/Anati/etc/asr.phone)
        Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
        Phase 3: CTL - Check general format; utterance length (must be positive); files exist
        Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
        Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
            Estimated Total Hours Training: 0.275394444444444
            This is a small amount of data, no comment at this time
        Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
            Words in dictionary: 9
            Words in filler dictionary: 5
        Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
    Something failed: (/home/anati/Corpus/Anati/scripts_pl/00.verify/verify_all.pl)
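
The Phase 1 warnings above say that phones used by dictionary entries (SIL, +INH+, +NOISE+) are absent from the phonelist. A rough Python sketch of the same cross-check, assuming the plain-text layout of one `WORD PHONE PHONE ...` entry per dictionary line and one phone per phonelist line. The function name and the two-phone sample phonelist are made up for illustration:

```python
def phone_mismatches(dictionary_lines, phonelist_lines):
    """Return (missing, unused) phone lists.

    missing: phones used in dictionary entries but absent from the phonelist.
    unused:  phones in the phonelist that no dictionary entry uses.
    """
    phonelist = {line.strip() for line in phonelist_lines if line.strip()}
    used = set()
    for line in dictionary_lines:
        fields = line.split()
        # The first field is the word; the rest are its phones.
        used.update(fields[1:])
    return sorted(used - phonelist), sorted(phonelist - used)


if __name__ == "__main__":
    # Filler entries as posted in the thread.
    filler = [
        "<s> SIL",
        "<sil>   SIL",
        "</s>    SIL",
        "!INH    +INH+",
        "!NOISE  +NOISE+",
    ]
    phonelist = ["A", "B"]  # hypothetical phonelist without the filler phones
    missing, unused = phone_mismatches(filler, phonelist)
    print("missing from phonelist:", missing)
    print("never used:", unused)
```

Each phone reported as missing would need its own line in the phonelist file (asr.phone in this thread). Note that Phase 7 of the real script checks the phonelist against the transcript rather than the dictionary, so `unused` here is only an approximation of that check.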
    
     
  • Anati

    Anati - 2011-11-22

    Thanks ... I resolved the problem in the filler ... and this is the training result:

    Training for 8 Gaussian(s) completed after 6 iterations
    MODULE: 60 Lattice Generation
    Skipped:  $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
    MODULE: 61 Lattice Pruning
    Skipped:  $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
    MODULE: 62 Lattice Format Conversion
    Skipped:  $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
    MODULE: 65 MMIE Training
    Skipped:  $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
    MODULE: 90 deleted interpolation
    Skipped for continuous models
    

    Is this correct?

     
  • Anati

    Anati - 2011-11-22

    But please check whether the LM in the etc folder is correct!

     
  • Anati

    Anati - 2011-11-22

    [anati@Anati Anati]$ ./scripts_pl/decode/slave.pl

    MODULE: DECODE Decoding using models previously trained
            Decoding 0 segments starting at 0 (part 1 of 1) 
            Aligning results to find error rate
    Can't open /home/anati/Corpus/Anati/result/asr-1-1.match
    Can't open /home/anati/Corpus/Anati/etc/Anati_test.fileids for reading
    

    There is no asr-1-1.match file. How can I generate it?

     
  • Nickolay V. Shmyrev

    Read the logdir for details of the problems you have.

     
  • Anati

    Anati - 2011-11-25

    look

    http://cmusphinx.sourceforge.net/wiki/tutoriallm

    Converting model into DMP format
    To quickly load large models you probably would like to convert them to binary format that will save your decoder initialization time. [b]That's not necessary with small models[/b]. Pocketsphinx and sphinx3 can handle both of them with -lm option. Sphinx4 requires you to submit DMP model into TrigramModel component and ARPA model to SimpleNGramModel component.
    

    But the error is:

    ERROR: "ngram_model_arpa.c", line 466: File /home/anati/Corpus/Anati/etc/Anati.lm.DMP not found
    ERROR: "ngram_model_dmp.c", line 106: Dump file /home/anati/Corpus/Anati/etc/Anati.lm.DMP not found
    ERROR: "ngram_search.c", line 208: Failed to read language model file: /home/anati/Corpus/Anati/etc/Anati.lm.DMP
    

    !!!

    After that I typed:

    sphinx_lm_convert -i ../../../Anati/etc/asr.lm -o model.dmp
    

    But the result was:

    INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
    ERROR: "ngram_model_dmp.c", line 121: Wrong magic header size number 234a5347: ../../../Anati/etc/asr.lm is not a dump file
    Segmentation fault (core dumped)
    

    Note: this is the LM file. Is it in the correct format? And how can I convert it to
    a DMP file?

    #JSGF V1.0;
    /**
     * JSGF Grammar for Hello World example
     */
    grammar asr;
    public <cmd> = (يمين | يسار |أعلى |أسفل |أمام |خلف|توقف|تحرك|أسرع );
    
     
  • Nickolay V. Shmyrev

    Note: this is the LM file. Is it in the correct format? And how can I convert it
    to a DMP file?

    This is not an LM; it's a grammar in JSGF format. You cannot convert
    it to DMP. You can still use it with a decoder, but you need to edit the
    script scripts_pl/decode/ps_decode.pl and change -lm to -jsgf there.
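
One way to see this from the file itself: the `234a5347` printed in the "Wrong magic header size number" error is the four ASCII bytes `#JSG`, which matches the start of the `#JSGF V1.0;` header, while an ARPA language model carries a `\data\` marker. A small Python sketch that tells the two apart. The function name is hypothetical and the check is a heuristic, not what the Sphinx tools do internally:

```python
def sniff_model_format(text):
    """Guess whether text is a JSGF grammar or an ARPA language model."""
    if text.lstrip().startswith("#JSGF"):
        return "jsgf"
    # ARPA files contain a line consisting of the \data\ marker.
    if any(line.strip() == "\\data\\" for line in text.splitlines()):
        return "arpa"
    return "unknown"


if __name__ == "__main__":
    # The four bytes printed in the error message, decoded as ASCII.
    print(bytes.fromhex("234a5347").decode("ascii"))

    # Tiny made-up samples of each format.
    grammar = "#JSGF V1.0;\ngrammar asr;\npublic <cmd> = ( a | b );\n"
    arpa = "\\data\\\nngram 1=3\n\n\\1-grams:\n-0.5 a\n\\end\\\n"
    print(sniff_model_format(grammar))
    print(sniff_model_format(arpa))
```

A file that comes back as "jsgf" belongs with the decoder's grammar option rather than with sphinx_lm_convert.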

     
  • Anati

    Anati - 2011-11-26

    OK, please, how do I write my commands

     يمين  |  يسار  |  أعلى  | أسفل  | أمام  | خلف | توقف | تحرك | أسرع
    

    in LM format?

     
  • Nickolay V. Shmyrev

    OK, please, how do I write my commands in LM format?

    Read the tutorial

    http://cmusphinx.sourceforge.net/wiki/tutoriallm
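
For a fixed command list like this one, the text corpus that language model tools consume can be as simple as one command per line. A Python sketch that writes such a file, assuming each utterance is a single command. The `<s>`/`</s>` wrapping follows CMUCLMTK-style input and can be switched off for tools that expect bare sentences. The function names are hypothetical:

```python
# The nine commands from the grammar posted earlier in the thread.
COMMANDS = ["يمين", "يسار", "أعلى", "أسفل", "أمام", "خلف", "توقف", "تحرك", "أسرع"]


def corpus_lines(commands, sentence_markers=True):
    """Return one corpus line per command, optionally wrapped in <s> ... </s>."""
    lines = []
    for command in commands:
        word = command.strip()
        lines.append(f"<s> {word} </s>" if sentence_markers else word)
    return lines


def write_corpus(path, commands, sentence_markers=True):
    """Write the corpus as UTF-8 so the Arabic words are stored unchanged."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for line in corpus_lines(commands, sentence_markers):
            handle.write(line + "\n")
```

Calling `write_corpus("corpus.txt", COMMANDS)` produces a nine-line file. Whether the markers are wanted depends on the tool the corpus is fed to, so treat the default as an assumption.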

     
  • Anati

    Anati - 2011-11-26

    I followed the steps in the tutorial, but the result was:

    Error in FORM header! []
    FORM error in form block (formtype)
    FORM error in form file (corpus)
    FORM error in form file (handdict)
    FORM error in form file (extrawords)
    FORM error in form block (phoneset)
    FORM error in form block (bracket)
    FORM error in form block (model)
    FORM error in form block (class)
    FORM error in form block (discount)
    Terminating process.
    

    Why?

    I put my Arabic words as-is in corpus.txt.

     
  • Anati

    Anati - 2011-11-27

    Please see:

    [anati@Anati Anati]$ perl ./scripts_pl/decode/slave.pl 
    MODULE: DECODE Decoding using models previously trained
            Decoding 204 segments starting at 0 (part 1 of 1) 
            0% 
            Aligning results to find error rate
            SENTENCE ERROR: 100.0% (204/204)   WORD ERROR RATE: 100.5% (204/204)
    

    My logdir and LM are here:

    http://www.2shared.com/file/uOwbmp5Y/checktar.html

     
  • Nickolay V. Shmyrev

    Everything is good in the archive. Check the files in the result folder too.

     
  • Anati

    Anati - 2011-11-27
     
  • Nickolay V. Shmyrev

    And did you look inside? I saw the reason within a minute.

     
  • Anati

    Anati - 2011-11-27

    I see there is a redundant word خلف at 6/6-12,

    although it isn't in the test files!

     
  • Anati

    Anati - 2011-11-28

    Please, what is wrong?

     