CMU Sphinx / Forums / Help: This is why?

yanpeng - 2014-09-17

WARNING: Utterance ID mismatch on line 1: voice_0001 vs voice_0001
WARNING: This word:  was in the transcript file, but is not in the dictionary ( ~~    ~~ ). Do cases match?
WARNING: This word: was in the transcript file, but is not in the dictionary ( ~~    ~~ ). Do cases match?
WARNING: This word: was in the transcript file, but is not in the dictionary ( ~~            ~~ ). Do cases match?
WARNING: This word: was in the transcript file, but is not in the dictionary ( ~~ ~~ ). Do cases match?
WARNING: This word: was in the transcript file, but is not in the dictionary ( ~~       ~~ ). Do cases match?
WARNING: This word: was in the transcript file, but is not in the dictionary (              ). Do cases match?
WARNING: This word: was in the transcript file, but is not in the dictionary ( ~~    ~~ ). Do cases match?
WARNING: This word: was in the transcript file, but is not in the dictionary ( ~~       ~~ ). Do cases match?
WARNING: This word: was in the transcript file, but is not in the dictionary ( ~~       ~~ ). Do cases match?
WARNING: This word:  was in the transcript file, but is not in the dictionary ( ~~       ~~ ). Do cases match?
WARNING: This word: was in the transcript file, but is not in the dictionary ( ~~    ~~ ). Do cases match?
WARNING: This word: was in the transcript file, but is not in the dictionary ( ~~  ~~ ). Do cases match?
WARNING: This word: was in the transcript file, but is not in the dictionary ( ~~    ~~ ). Do cases match?
WARNING: This word: was in the transcript file, but is not in the dictionary ( ~~    ~~ ). Do cases match?
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
Something failed: (/home/yanrui/my_mnx/scripts_pl/00.verify/verify_all.pl)

yanpeng - 2014-09-17

These are my .dic and .phone files. I found no mistakes. Please help me, thank you.

my_mnx.dic

my_mnx.phone

yanpeng - 2014-09-17

Phones will be treated as case sensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
Found 88 words using 22 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
Phase 3: CTL - Check general format; utterance length (must be positive); files exist
WARNING: CTL file, /home/yanrui/my_mnx/feat/voice_0001.mfc, does not exist, or is empty
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
Estimated Total Hours Training: 0.0161331196581197
This is a small amount of data, no comment at this time
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
Words in dictionary: 85
Words in filler dictionary: 3

This is a warning:
WARNING: CTL file, /home/yanrui/my_mnx/feat/voice_0001.mfc, does not exist, or is empty

Nickolay V. Shmyrev - 2014-09-17

Your Unicode editor inserts the invisible character U+FEFF (the byte order mark, or BOM) at the beginning of the file, and it confuses the trainer. You need to use an editor that doesn't insert UTF-8 BOM symbols, and you need to remove them from the files.
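A minimal cleanup sketch, assuming Python 3 and UTF-8 files: in UTF-8 the character U+FEFF is always the three bytes EF BB BF, so removing that byte sequence deletes every BOM without touching line endings or any other content. The helper name `strip_bom` is hypothetical, not part of sphinxtrain.

```python
import codecs
from pathlib import Path


def strip_bom(path):
    """Remove every UTF-8-encoded U+FEFF (bytes EF BB BF) from a file in place.

    Works on raw bytes, so line endings and all other content are untouched.
    Returns the number of BOM sequences removed.
    """
    p = Path(path)
    data = p.read_bytes()
    count = data.count(codecs.BOM_UTF8)  # codecs.BOM_UTF8 == b"\xef\xbb\xbf"
    if count:
        p.write_bytes(data.replace(codecs.BOM_UTF8, b""))
    return count
```

It would be run once per file in the etc folder, for example `strip_bom("etc/my_mnx.dic")` (an assumed path based on the attachment name); a second run returning 0 confirms the file is clean.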
  
yanpeng - 2014-09-18

I have removed the UTF-8 BOM symbols from the files, but there are still some errors.

MODULE: 00 verify training files
O.S. is case sensitive ("A" != "a").
Phones will be treated as case sensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
Found 88 words using 22 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
Phase 3: CTL - Check general format; utterance length (must be positive); files exist
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
Estimated Total Hours Training: 0.0168277777777778
This is a small amount of data, no comment at this time
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
Words in dictionary: 85
Words in filler dictionary: 3
WARNING: This word:  was in the transcript file, but is not in the dictionary ( ~~    ~~ ). Do cases match?
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
Something failed: (/home/yanrui/my_mnx/scripts_pl/00.verify/verify_all.pl)

yanpeng - 2014-09-18

This word:  was in the transcript file, and also in the dictionary.I have checked it yet.

Nickolay V. Shmyrev - 2014-09-18

"The word in this warning is in the transcript file and also in the dictionary. I have already checked it."

Computers rarely lie; they just follow instructions. If the computer tells you that a word is missing, it really is missing. You need to doubt yourself, not the computer's results.
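One way to act on this is to compare the transcript with the dictionary yourself and print each missing word with `repr()`, which shows invisible characters such as U+FEFF as `\ufeff`. This is a hypothetical sketch, not the verify_all.pl logic; it assumes the usual sphinxtrain layouts: a dictionary line is a word followed by its phones, alternate pronunciations are written `WORD(2)`, and each transcript line ends with `(utterance_id)`.

```python
import re

ALT_PRON = re.compile(r"\(\d+\)$")       # "WORD(2)" -> "WORD"
UTT_ID = re.compile(r"\([^()]*\)\s*$")   # trailing "(voice_0001)"


def load_dict_words(lines):
    """Collect the headwords of a dictionary or filler dictionary."""
    words = set()
    for line in lines:
        fields = line.split()
        if fields:
            words.add(ALT_PRON.sub("", fields[0]))
    return words


def missing_words(transcript_lines, known_words):
    """Return (line_number, word) for every transcript word not in known_words."""
    missing = []
    for lineno, line in enumerate(transcript_lines, 1):
        for word in UTT_ID.sub("", line).split():
            # str.split() does not treat U+FEFF as whitespace, so a stray BOM
            # stays glued to its word and that word fails the lookup.
            if word not in known_words:
                missing.append((lineno, word))
    return missing


if __name__ == "__main__":
    known = load_dict_words(["HELLO HH AH L OW", "HELLO(2) HH EH L OW",
                             "<s> SIL", "</s> SIL"])
    transcript = ["\ufeff<s> HELLO </s> (voice_0001)"]
    for lineno, word in missing_words(transcript, known):
        print(lineno, repr(word))   # 1 '\ufeff<s>'
```

In practice `known` would be built from the lines of both the dictionary and the filler dictionary, read with `encoding="utf-8"`.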
  
yanpeng - 2014-09-18

MODULE: 00 verify training files
O.S. is case sensitive ("A" != "a").
Phones will be treated as case sensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
Found 88 words using 22 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
Phase 3: CTL - Check general format; utterance length (must be positive); files exist
WARNING: CTL file missing a newline at end of file
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
WARNING: Transcript file missing a newline at end of file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
Estimated Total Hours Training: 0.0168277777777778
This is a small amount of data, no comment at this time
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
Words in dictionary: 85
Words in filler dictionary: 3
WARNING: This word: was in the transcript file, but is not in the dictionary ( ~~    ~~ ). Do cases match?
WARNING: This word:  was in the transcript file, but is not in the dictionary ( ~~    ~~ ). Do cases match?
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
Something failed: (/home/yanrui/my_mnx/scripts_pl/00.verify/verify_all.pl)
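The two "missing a newline at end of file" warnings in this run (for the CTL file and the transcript file) are about the last line not being terminated. A sketch with a hypothetical helper, `ensure_trailing_newline`, that appends a single newline byte only when it is missing and leaves every other byte untouched:

```python
import os


def ensure_trailing_newline(path):
    """Append a newline to a non-empty file that does not already end with one.

    Returns True if the file was changed.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False              # empty file: nothing to terminate
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False              # already newline-terminated
        f.seek(0, os.SEEK_END)
        f.write(b"\n")
        return True
```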

yanpeng - 2014-09-18

This is the log file.

my_mnx.html

Nickolay V. Shmyrev - 2014-09-18

You need to share your etc folder to get help with this issue.
  
yanpeng - 2014-09-19

This is my etc folder.

etc.rar

Nickolay V. Shmyrev - 2014-09-19

You still have BOM symbols (U+FEFF) in your files: one at the beginning of the transcription file and another on line 56 of your dictionary file. That is why you get the corresponding warnings.

You need to remove the BOM symbols, or simply use a newer sphinxtrain instead of the old one.
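To locate them independently, a sketch like the following (hypothetical helper `find_boms`, Python 3) reports the line and column of every U+FEFF in a file. Reading with `encoding="utf-8"` rather than `"utf-8-sig"` matters here: the `utf-8-sig` codec silently drops a BOM at the start of the file, which is exactly the one that is hard to see.

```python
import sys


def find_boms(text):
    """Return 1-based (line, column) positions of every U+FEFF in text."""
    hits = []
    # Split on "\n" only; str.splitlines() also splits on other separators
    # and could shift line numbers relative to what an editor shows.
    for lineno, line in enumerate(text.split("\n"), 1):
        col = line.find("\ufeff")
        while col != -1:
            hits.append((lineno, col + 1))
            col = line.find("\ufeff", col + 1)
    return hits


if __name__ == "__main__":
    # Usage: python find_boms.py etc/my_mnx.dic etc/my_mnx_train.transcription
    # (file names are placeholders for your own etc files)
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8") as f:
            for lineno, col in find_boms(f.read()):
                print(f"{name}:{lineno}:{col}: U+FEFF")
```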

yanpeng - 2014-09-19

Thanks. How can I remove the BOM symbols? I can't see them (0xfe 0xff) in my files.

Nickolay V. Shmyrev - 2014-09-19

Use an editor that displays them, vi for example.
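Besides an editor, the raw bytes at the start of a file can be checked directly. Some editors, Vim among them, treat a BOM at the very start of a file as an encoding marker rather than text and do not display it (in Vim, `:set nobomb` followed by `:w` removes it), which may be why a leading BOM is harder to find than one in the middle of a file. A minimal sketch with hypothetical helpers `detect_bom` and `hex_head`, checking for the UTF-8 BOM (EF BB BF) and the UTF-16 ones (FE FF / FF FE):

```python
import codecs

# Checked in this order; the UTF-8 BOM cannot be confused with the UTF-16 ones.
KNOWN_BOMS = (
    ("utf-8", codecs.BOM_UTF8),          # EF BB BF
    ("utf-16-be", codecs.BOM_UTF16_BE),  # FE FF
    ("utf-16-le", codecs.BOM_UTF16_LE),  # FF FE
)


def detect_bom(path):
    """Return the encoding name of a BOM at the start of the file, or None."""
    with open(path, "rb") as f:
        head = f.read(3)
    for name, bom in KNOWN_BOMS:
        if head.startswith(bom):
            return name
    return None


def hex_head(path, n=16):
    """First n bytes of the file as space-separated hex (Python 3.8+)."""
    with open(path, "rb") as f:
        return f.read(n).hex(" ")
```

A transcription file that starts with a hidden BOM before `<s>` would show up in `hex_head` as `ef bb bf 3c 73 3e ...`.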
  
yanpeng - 2014-09-20

There are still some errors.

my_mnx.html

yanpeng - 2014-09-20

I have found one BOM symbol, <feff>, on line 56 of my dictionary file. But I cannot find the one at the beginning of the transcription file.

yanpeng - 2014-09-21

How can I solve this?

my_mnx.html

yanpeng - 2014-09-21

I have switched to the newer sphinxtrain-1.0.8 instead of the old one (sphinxtrain-1.0.7).

Nickolay V. Shmyrev - 2014-09-21

You need to provide more information about your issues. You need to share the logdir folder, not just the log; the real error is described there in detail.

Most likely you didn't install sphinxtrain properly. You need to edit ld.so.conf to include /usr/local/lib in the linker search path, or export the LD_LIBRARY_PATH environment variable.
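The LD_LIBRARY_PATH route can also be applied to a single child process from Python instead of being exported in the shell. A sketch, assuming a Linux system with the libraries installed under /usr/local/lib; `env_with_library_path` is a hypothetical helper. For the permanent fix, a change to /etc/ld.so.conf (or a file under /etc/ld.so.conf.d/) takes effect after running `ldconfig` as root.

```python
import os


def env_with_library_path(libdir="/usr/local/lib", base=None):
    """Return a copy of an environment with libdir prepended to LD_LIBRARY_PATH.

    The input environment (os.environ by default) is not modified, and libdir
    is not added twice if it is already present.
    """
    env = dict(os.environ if base is None else base)
    parts = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    if libdir not in parts:
        parts.insert(0, libdir)
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env
```

Usage would be `subprocess.run(cmd, env=env_with_library_path(), check=True)`, where `cmd` is whatever training command you normally run from the shell.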
  
yanpeng - 2014-09-21

This is my logdir folder.

logdir.tar.gz

Nickolay V. Shmyrev - 2014-09-21

The logdir suggests that you do not have enough data to train the model. You need at least one hour of recordings.

You could segment an audiobook or a radio show, for example, to get that amount of data.
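For scale, the verifier output earlier in the thread estimated 0.0168 hours of training data, which is about 61 seconds (0.0168277 × 3600 ≈ 60.6 s), under 2% of the suggested hour. A sketch for measuring a corpus before training, assuming the recordings are uncompressed PCM WAV files readable by Python's standard `wave` module; `total_hours` is a hypothetical helper, and it measures audio length directly, which is not necessarily how verify_all.pl computes its estimate.

```python
import wave


def total_hours(wav_paths):
    """Sum the durations of PCM WAV files, in hours."""
    seconds = 0.0
    for path in wav_paths:
        with wave.open(str(path), "rb") as w:
            # duration = number of sample frames / frames per second
            seconds += w.getnframes() / w.getframerate()
    return seconds / 3600.0
```

For example, `total_hours(pathlib.Path("wav").rglob("*.wav"))` would cover a `wav/` folder (an assumed layout); a result below 1.0 means more recordings are needed.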
  
yanpeng - 2014-09-21

First of all, thank you very much.
I would like to ask you: how do I use the logdir files to find errors?
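A rough sketch for scanning a logdir: walk every file under it and collect each line that mentions an error, with its file name and line number. It assumes the logs are plain text and that failure messages contain "error" or "fatal" in some letter case; that wording is an assumption, so widen the pattern if nothing turns up, and expect some harmless matches. `find_errors` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumption: failure lines contain "error" or "fatal" in some letter case.
PATTERN = re.compile(r"error|fatal", re.IGNORECASE)


def find_errors(logdir):
    """Return (file, line_number, text) for every matching line under logdir."""
    hits = []
    for path in sorted(Path(logdir).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going over non-UTF-8 bytes.
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if PATTERN.search(line):
                    hits.append((str(path), lineno, line.rstrip()))
    return hits
```

Usage: `for name, lineno, text in find_errors("logdir"): print(f"{name}:{lineno}: {text}")`, then open the first reported file around that line.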

yanpeng - 2014-09-23

With 10 words, English and Chinese can be recognized; why can't Mongolian?

Junaid Khan - 2017-06-07

Sir, I am trying to train an acoustic model on a small set of data, but during training it gives me the same error: This word: "mein" was in the transcript file, but is not in the dictionary. Do cases match?
This word exists in the dictionary, and I don't know why it generates this error. Please help me out. Thanks in advance.

etc.rar

Nickolay V. Shmyrev - 2017-06-07

Your dictionary has invisible UTF-8 BOM symbols; you need to remove them.
  