
This is why?

Help
yanpeng
2014-09-17
2017-06-09
1 2 > >> (Page 1 of 2)
  • yanpeng

    yanpeng - 2014-09-17

    WARNING: Utterance ID mismatch on line 1: voice_0001 vs voice_0001
    WARNING: This word:  was in the transcript file, but is not in the dictionary (      ). Do cases match?
    WARNING: This word: was in the transcript file, but is not in the dictionary (      ). Do cases match?
    WARNING: This word: was in the transcript file, but is not in the dictionary (              ). Do cases match?
    WARNING: This word: was in the transcript file, but is not in the dictionary (   ). Do cases match?
    WARNING: This word: was in the transcript file, but is not in the dictionary (         ). Do cases match?
    WARNING: This word: was in the transcript file, but is not in the dictionary (              ). Do cases match?
    WARNING: This word: was in the transcript file, but is not in the dictionary (      ). Do cases match?
    WARNING: This word: was in the transcript file, but is not in the dictionary (         ). Do cases match?
    WARNING: This word: was in the transcript file, but is not in the dictionary (         ). Do cases match?
    WARNING: This word:  was in the transcript file, but is not in the dictionary (         ). Do cases match?
    WARNING: This word: was in the transcript file, but is not in the dictionary (      ). Do cases match?
    WARNING: This word: was in the transcript file, but is not in the dictionary (    ). Do cases match?
    WARNING: This word: was in the transcript file, but is not in the dictionary (      ). Do cases match?
    WARNING: This word: was in the transcript file, but is not in the dictionary (      ). Do cases match?
    Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
    Something failed: (/home/yanrui/my_mnx/scripts_pl/00.verify/verify_all.pl)

     
  • yanpeng

    yanpeng - 2014-09-17

    These are my .dic and .phone files. I found no mistakes in them. Please help me, thank you.

     
  • yanpeng

    yanpeng - 2014-09-17

    Phones will be treated as case sensitive.
    Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
    Found 88 words using 22 phones
    Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
    Phase 3: CTL - Check general format; utterance length (must be positive); files exist
    WARNING: CTL file, /home/yanrui/my_mnx/feat/voice_0001.mfc, does not exist, or is empty
    Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
    Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
    Estimated Total Hours Training: 0.0161331196581197
    This is a small amount of data, no comment at this time
    Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
    Words in dictionary: 85
    Words in filler dictionary: 3

    This is a warning.
    ***WARNING: CTL file, /home/yanrui/my_mnx/feat/voice_0001.mfc, does not exist, or is empty******

     
    • Nickolay V. Shmyrev

      Your Unicode editor inserts an invisible byte order mark (BOM) character, U+FEFF (0xfe 0xff; stored in UTF-8 as the bytes 0xEF 0xBB 0xBF), at the beginning of the file, and it confuses the trainer. You need to use an editor that doesn't insert UTF-8 BOM symbols, and you need to remove them from the files.
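
To check whether a file starts with the BOM described above, something like the following could be used. This is only a minimal sketch: the function names are made up for illustration, and it only looks at the first three bytes of the file, so it will not notice a BOM that sits in the middle of a file.

```python
import codecs


def starts_with_utf8_bom(data):
    """Return True if the byte string begins with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def has_utf8_bom(path):
    """Return True if the file at `path` begins with the UTF-8 BOM."""
    # Read in binary mode so no decoding step can hide the BOM bytes.
    with open(path, "rb") as f:
        return starts_with_utf8_bom(f.read(3))
```

Calling `has_utf8_bom` on the transcription file and on the dictionary shows whether the editor added the mark on save.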

       
  • yanpeng

    yanpeng - 2014-09-18

    I have removed the UTF-8 BOM symbols from the files, but there are still some errors.

    MODULE: 00 verify training files
    O.S. is case sensitive ("A" != "a").
    Phones will be treated as case sensitive.
    Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
    Found 88 words using 22 phones
    Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
    Phase 3: CTL - Check general format; utterance length (must be positive); files exist
    Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
    Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
    Estimated Total Hours Training: 0.0168277777777778
    This is a small amount of data, no comment at this time
    Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
    Words in dictionary: 85
    Words in filler dictionary: 3
    WARNING: This word:  was in the transcript file, but is not in the dictionary (      ). Do cases match?
    Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
    Something failed: (/home/yanrui/my_mnx/scripts_pl/00.verify/verify_all.pl)

     
  • yanpeng

    yanpeng - 2014-09-18

    This word:  was in the transcript file, and also in the dictionary.I have checked it yet.

     
    • Nickolay V. Shmyrev

      This word:  was in the transcript file, and also in the dictionary.I have checked it yet.

      Computers rarely lie, they just follow the instructions. If computer says you that word is missing it is really missing. You need to doubt yourself, not computer results.

       
  • yanpeng

    yanpeng - 2014-09-18

    MODULE: 00 verify training files
    O.S. is case sensitive ("A" != "a").
    Phones will be treated as case sensitive.
    Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
    Found 88 words using 22 phones
    Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
    Phase 3: CTL - Check general format; utterance length (must be positive); files exist
    WARNING: CTL file missing a newline at end of file
    Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
    WARNING: Transcript file missing a newline at end of file
    Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
    Estimated Total Hours Training: 0.0168277777777778
    This is a small amount of data, no comment at this time
    Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
    Words in dictionary: 85
    Words in filler dictionary: 3
    WARNING: This word:  was in the transcript file, but is not in the dictionary (      ). Do cases match?
    WARNING: This word:  was in the transcript file, but is not in the dictionary (      ). Do cases match?
    Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
    Something failed: (/home/yanrui/my_mnx/scripts_pl/00.verify/verify_all.pl)

     
  • yanpeng

    yanpeng - 2014-09-18

    This is the log file.

     
    • Nickolay V. Shmyrev

      You need to share the etc folder to get help with this issue.

       
  • yanpeng

    yanpeng - 2014-09-19

    This is my etc folder.

     
  • Nickolay V. Shmyrev

    You still have BOM symbols (U+FEFF, 0xfe 0xff) in your files. You have one at the beginning of the transcription file. Another one is on line 56 of your dictionary file. This is why you get the corresponding warnings.

    You need to remove the BOM symbols or just use a newer sphinxtrain instead of the old one.
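
Removing the symbols by script is one option. The sketch below assumes the files are UTF-8 encoded; the function names are hypothetical. It removes every U+FEFF character, not only a leading one, because one of the marks reported above sits on line 56 of the dictionary rather than at the start of the file.

```python
BOM = "\ufeff"


def strip_bom_chars(text):
    """Return `text` with every U+FEFF character removed, wherever it occurs."""
    return text.replace(BOM, "")


def clean_file(path):
    """Rewrite the file at `path` without BOM characters; return how many were removed."""
    # encoding="utf-8" (not "utf-8-sig") keeps a leading BOM visible as U+FEFF,
    # and newline="" leaves the existing line endings untouched.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    removed = text.count(BOM)
    if removed:
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(strip_bom_chars(text))
    return removed
```

It is worth keeping a backup copy of the etc folder before rewriting files in place.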

     
  • yanpeng

    yanpeng - 2014-09-19

    Thanks. How can I remove the BOM symbols? I can't see the BOM symbols (0xfe 0xff) in my files.

     
    • Nickolay V. Shmyrev

      Use an editor that displays them, vi for example.
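
If no such editor is at hand, the positions can also be listed by script. A rough sketch, assuming the file content has already been read as UTF-8 text (with `encoding="utf-8"`, so that a leading BOM is not silently dropped); `find_bom_positions` is a made-up helper name.

```python
def find_bom_positions(text):
    """Return (line, column) pairs, both 1-based, for every U+FEFF in `text`."""
    positions = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        col = line.find("\ufeff")
        while col != -1:
            positions.append((lineno, col + 1))
            col = line.find("\ufeff", col + 1)
    return positions
```

The line numbers it reports can then be compared with what the editor shows.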

       
  • yanpeng

    yanpeng - 2014-09-20

    There are still some errors.

     
  • yanpeng

    yanpeng - 2014-09-20

    I have found one BOM symbol, shown as <feff>, on line 56 of my dictionary file. But I cannot find the one at the beginning of the transcription file.

     
  • yanpeng

    yanpeng - 2014-09-21

    How can I solve this?

     
  • yanpeng

    yanpeng - 2014-09-21

    I am now using the newer sphinxtrain-1.0.8 instead of the old one (sphinxtrain-1.0.7).

     
    • Nickolay V. Shmyrev

      You need to provide more information about your issues. You need to share the logdir folder, not just the log. The real error is described there in detail.

      Most likely you didn't install sphinxtrain properly. You need to edit ld.so.conf to include /usr/local/lib in the linker search path, or export the LD_LIBRARY_PATH environment variable.
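
A quick way to see whether the second option is already in effect is to inspect the environment variable. This is a small sketch with a hypothetical helper name; it only looks at LD_LIBRARY_PATH and says nothing about what ld.so.conf contains.

```python
import os


def on_ld_library_path(lib_dir="/usr/local/lib", environ=None):
    """Return True if `lib_dir` is listed in the LD_LIBRARY_PATH variable."""
    environ = os.environ if environ is None else environ
    # The variable is a colon-separated list; ignore empty entries and trailing slashes.
    entries = environ.get("LD_LIBRARY_PATH", "").split(":")
    wanted = lib_dir.rstrip("/")
    return any(entry.rstrip("/") == wanted for entry in entries if entry)
```

Passing a dictionary as `environ` allows checking a setting without changing the real environment.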

       
  • yanpeng

    yanpeng - 2014-09-21

    This is my logdir folder.

     
    • Nickolay V. Shmyrev

      The logdir suggests that you do not have enough data to train the model. You need to have at least one hour of recordings.

      You can segment an audiobook or a radio show, for example, to get that amount of data.
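
To put the numbers side by side, the "Estimated Total Hours Training" value from the verify output earlier in the thread can be converted to seconds and compared with the one-hour minimum. This is plain arithmetic on the figure from the log.

```python
# Value printed by the verify script in the logs above.
estimated_hours = 0.0168277777777778
# Minimum amount of recordings recommended in this reply.
required_hours = 1.0

seconds = estimated_hours * 3600
percent_of_required = 100 * estimated_hours / required_hours

print(f"{seconds:.1f} seconds of audio")
print(f"{percent_of_required:.1f}% of the recommended minimum")
```

That is roughly one minute of audio, under 2% of the recommended amount.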

       
  • yanpeng

    yanpeng - 2014-09-21

    First of all, thank you very much.
    I would like to ask how to use the logdir files to find errors.

     
  • yanpeng

    yanpeng - 2014-09-23

    Ten English words and ten Chinese words can be recognized. Why can't Mongolian?

     
  • Junaid Khan

    Junaid Khan - 2017-06-07

    Sir, I am trying to train an acoustic model on a small set of data, but during training it gives me the same error: This word: "mein" was in the transcript file, but is not in the dictionary. Do cases match?
    This word exists in the dictionary, but I don't know why it generates this error. Please help me out, sir. Thanks in advance.

     
    • Nickolay V. Shmyrev

      Your dictionary has invisible UTF-8 BOM symbols; you need to remove them.
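
The effect can be reproduced in a few lines. The dictionary entries below are invented for illustration (the phone strings are not taken from any real dictionary), and the lookup is only a simplified stand-in for what the verify script does, not its actual code.

```python
# Hypothetical dictionary lines; the first one carries an invisible U+FEFF.
dictionary_lines = ["\ufeffmein M EY N", "hai HH AY"]

# The first whitespace-separated field of each line is the word.
# U+FEFF is not whitespace, so it stays glued to the word.
words = {line.split()[0] for line in dictionary_lines}
found_before = "mein" in words

# After stripping the mark, the same lookup succeeds.
clean_words = {word.replace("\ufeff", "") for word in words}
found_after = "mein" in clean_words

print(found_before, found_after)
```

The entry looks identical on screen in most editors, which is why the word appears to be present although the comparison fails.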

       