Hi everyone,
I am trying to create my own language model.
I'm using the lmtool:
http://www.speech.cs.cmu.edu/tools/lmtool.html
It created a .lm language model, so I converted it to a .DMP file using lm3g2dmp.
Here is my problem: when I start the sphinx_decoder application with Sphinx3 and my .DMP file, it gives me this error message:
FATAL_ERROR: "lextree.c", line 243: 0 active words in default
Do you have any idea what I might have done wrong, or whether I have missed a step?
Thanks
Best regards
You should update the dictionary with the words from your language model. Look into the .lm file and check that the words are listed properly there and have non-zero probabilities. Then update the dictionary with transcriptions of the words from the language model.
Paste the full log next time; hints about the problem are often found not only in the last line with the error, but earlier as well.
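For example, a quick consistency check along these lines would print any LM word that has no entry in the dictionary. This is only a rough Python sketch, assuming an ARPA-format .lm file and a Sphinx-style .dic file (WORD PH1 PH2 ...); the file names are placeholders.

# Rough sketch: report LM unigrams that are missing from the dictionary.
def read_dict_words(path):
    words = set()
    with open(path) as f:
        for line in f:
            parts = line.split()
            if parts:
                # Strip alternate-pronunciation suffixes such as "A(2)".
                words.add(parts[0].split('(')[0])
    return words

def read_lm_unigrams(path):
    unigrams = []
    in_unigrams = False
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line == '\\1-grams:':
                in_unigrams = True
                continue
            if in_unigrams and line.startswith('\\'):
                break
            if in_unigrams and line:
                unigrams.append(line.split()[1])  # format: prob WORD [backoff]
    return unigrams

dict_words = read_dict_words('1821.dic')
for word in read_lm_unigrams('test_message.lm'):
    if word not in ('<s>', '</s>') and word not in dict_words:
        print('missing from dictionary:', word)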
Hi Nickolay,
Thanks for answering.
The words from my language model are in my dictionary. Indeed, the dictionary was built from that model.
Still, I can't figure out what is going wrong.
Here are my logs:
INFO: dict.c(463): Reading main dictionary: ../../Sphinx3/sphinx3-0.6/model/hmm/tidigits/Message2/1821.dic
ERROR: "dict.c", line 251: Line 1: Bad ciphone: AX; word A ignored
ERROR: "dict.c", line 251: Line 2: Bad ciphone: EY; word A(2) ignored
ERROR: "dict.c", line 251: Line 3: Bad ciphone: AX; word ABOUT ignored
ERROR: "dict.c", line 251: Line 4: Bad ciphone: B; word BOOK ignored
...
...
INFO: lm.c(592): LM read('../../Sphinx3/sphinx3-0.6/model/hmm/tidigits/test_message.lm.DMP', lw= 9.50, wip= 0.70, uw= 0.70)
INFO: lm.c(594): Reading LM file ../../Sphinx3/sphinx3-0.6/model/hmm/tidigits/test_message.lm.DMP (LM name "default")
INFO: lm_3g_dmp.c(618): Reading LM in 16 bits format
INFO: lm_3g_dmp.c(674): Read 30 unigrams [in memory]
INFO: lm_3g_dmp.c(747): 47 bigrams [on disk]
INFO: lm_3g_dmp.c(820): 58 bigrams [on disk]
INFO: lm_3g_dmp.c(890): 15 bigram prob entries
INFO: lm_3g_dmp.c(924): 14 trigram bowt entries
INFO: lm_3g_dmp.c(955): 8 trigram prob entries
INFO: lm_3g_dmp.c(986): 1 trigram segtable entries (512 segsize)
INFO: lm_3g_dmp.c(1041): 30 word strings
INFO: lm.c(685): The LM routine is operating at 16 bits mode
ERROR: "wid.c", line 282: A is not a word in dictionary and it is not a class tag.
ERROR: "wid.c", line 282: ABOUT is not a word in dictionary and it is not a class tag.
ERROR: "wid.c", line 282: BOOK is not a word in dictionary and it is not a class tag.
ERROR: "wid.c", line 282: BUY is not a word in dictionary and it is not a class tag.
ERROR: "wid.c", line 282: CHECK is not a word in dictionary and it is not a class tag.
ERROR: "wid.c", line 282: DESTINATION is not a word in dictionary and it is not a class tag.
...
...
INFO: wid.c(292): 28 LM words not in dictionary; ignored
INFO: Initialization of fillpen_t, report:
INFO: Language weight =9.500000
INFO: Word Insertion Penalty =0.700000
INFO: Silence probability =0.100000
INFO: Filler probability =0.020000
INFO:
INFO: dict2pid.c(567): Building PID tables for dictionary
INFO: Initialization of dict2pid_t, report:
INFO: Dict2pid is in composite triphone mode
INFO: 3 composite states; 1 composite sseq
INFO:
INFO: kbcore.c(645): Inside kbcore: Verifying models consistency ......
INFO: kbcore.c(667): End of Initialization of Core Models:
INFO: Initialization of beam_t, report:
INFO: Parameters used in Beam Pruning of Viterbi Search:
INFO: Beam=-307006
INFO: PBeam=-230254
INFO: WBeam=-153503 (Skip=0)
INFO: WEndBeam=-7675
INFO: No of CI Phone assumed=34
INFO:
INFO: Initialization of fast_gmm_t, report:
INFO: Parameters used in Fast GMM computation:
INFO: Frame-level: Down Sampling Ratio 1, Conditional Down Sampling? 0, Distance-based Down Sampling? 0
INFO: GMM-level: CI phone beam -38375. MAX CD 100000
INFO: Gaussian-level: GS map would be used for Gaussian Selection? =1, SVQ would be used as Gaussian Score? =0 SubVQ Beam -15350
INFO:
INFO: Initialization of pl_t, report:
INFO: Parameters used in phoneme lookahead:
INFO: Phoneme look-ahead type = 0
INFO: Phoneme look-ahead beam size = 65945
INFO: No of CI Phones assumed=34
INFO:
INFO: Initialization of ascr_t, report:
INFO: No. of CI senone =102
INFO: No. of senone = 602
INFO: No. of composite senone = 3
INFO: No. of senone sequence = 308
INFO: No. of composite senone sequence=1
INFO: Parameters used in phoneme lookahead:
INFO: Phoneme lookahead window = 1
INFO:
INFO: vithist.c(167): Initializing Viterbi-history module
INFO: Initialization of vithist_t, report:
INFO: Word beam = -153503
INFO: Bigram Mode =0
INFO: Rescore Mode =1
INFO: Trace sil Mode =1
INFO:
INFO: srch.c(447): Search Initialization.
WARNING: "srch_time_switch_tree.c", line 166: -Nstalextree is omitted in TST search.
INFO: lextree.c(226): Creating Unigram Table for lm (name: default)
INFO: lextree.c(239): Size of word table after unigram + words in class: 0.
FATAL_ERROR: "lextree.c", line 243: 0 active words in default
I am also showing you my .lm file, since I do have 0.0000 probabilities for some bigrams; I don't know whether the problem could come from there:
Language model created by QuickLM on Tue Jun 12 04:55:33 EDT 2007
Copyright (c) 1996-2000
Carnegie Mellon University and Alexander I. Rudnicky
This model based on a corpus of 1 sentences and 30 words
The (fixed) discount mass is 0.5
\data\
ngram 1=30
ngram 2=47
ngram 3=58
\1-grams:
-2.4771 </s> -0.3010
-2.4771 <s> -0.2833
-1.5740 A -0.2893
-2.1761 ABOUT -0.2680
-1.8751 BOOK -0.2893
-2.1761 BUY -0.2893
-1.8751 CHECK -0.2863
-2.1761 DESTINATION -0.2680
-2.1761 DOES -0.2981
-2.1761 FLIGHT -0.2680
-1.6990 HAS -0.2923
-1.3979 I -0.2833
-2.1761 IF -0.2893
-2.1761 IN -0.2981
-2.1761 IS -0.2981
-1.8751 LANDED -0.2833
-1.4771 LIKE -0.2553
-1.6990 MADRID -0.2617
-1.6990 OFF -0.2680
-1.4771 PLANE -0.2803
-1.8751 SYDNEY -0.2818
-1.8751 TAKE -0.2923
-2.1761 TAKEN -0.2923
-1.5740 THE -0.2863
-2.1761 THIS -0.2863
-1.6990 TICKET -0.2680
-1.1347 TO -0.2648
-2.1761 WANT -0.2680
-2.1761 WHEN -0.2981
-1.4771 WOULD -0.2863
\2-grams:
-0.3010 <s> I -0.0669
-0.9031 A FLIGHT -0.1891
-0.4260 A TICKET -0.1891
-0.3010 ABOUT TO -0.2808
-0.3010 BOOK A -0.1891
-0.3010 BUY A -0.0969
-0.6021 CHECK IF -0.1891
-0.6021 CHECK THE -0.1891
-0.3010 DESTINATION TO -0.2374
-0.3010 DOES THIS -0.1891
-0.3010 FLIGHT TO -0.2374
-0.4771 HAS LANDED -0.1891
-0.7782 HAS TAKEN -0.1891
-1.0792 I WANT -0.1891
-0.3802 I WOULD -0.1891
-0.3010 IF THE -0.1891
-0.3010 IN DESTINATION -0.1891
-0.3010 IS ABOUT -0.1891
-0.3010 LANDED I -0.0669
-1.0000 LIKE A -0.0969
-0.3979 LIKE TO -0.1891
-0.7782 MADRID HAS -0.2218
-0.7782 MADRID I -0.2632
-0.7782 MADRID THE -0.1891
-0.7782 OFF I -0.0669
-0.7782 OFF THE -0.1891
-0.7782 OFF WHEN -0.1891
-0.6990 PLANE HAS -0.1249
-1.0000 PLANE IN -0.1891
-1.0000 PLANE IS -0.1891
-1.0000 PLANE TAKE -0.1891
-0.9031 SYDNEY </s> -0.3010
-0.4260 SYDNEY I -0.0669
-0.3010 TAKE OFF -0.1249
-0.3010 TAKEN OFF -0.2218
-0.3010 THE PLANE -0.0792
-0.3010 THIS PLANE -0.2553
-0.3010 TICKET TO -0.1891
-1.0414 TO BOOK -0.1891
-1.3424 TO BUY -0.1891
-1.0414 TO CHECK -0.1891
-0.8653 TO MADRID -0.1891
-1.0414 TO SYDNEY -0.1891
-1.3424 TO TAKE -0.1891
-0.3010 WANT TO -0.2596
-0.3010 WHEN DOES 0.0000
-0.3010 WOULD LIKE 0.0000
\3-grams:
-0.3010 <s> I WOULD
-0.3010 A FLIGHT TO
-0.3010 A TICKET TO
-0.3010 ABOUT TO TAKE
-0.6021 BOOK A FLIGHT
-0.6021 BOOK A TICKET
-0.3010 BUY A TICKET
-0.3010 CHECK IF THE
-0.3010 CHECK THE PLANE
-0.3010 DESTINATION TO MADRID
-0.3010 DOES THIS PLANE
-0.3010 FLIGHT TO MADRID
-0.3010 HAS LANDED I
-0.3010 HAS TAKEN OFF
-0.3010 I WANT TO
-0.3010 I WOULD LIKE
-0.3010 IF THE PLANE
-0.3010 IN DESTINATION TO
-0.3010 IS ABOUT TO
-0.3010 LANDED I WOULD
-0.3010 LIKE A TICKET
-0.6021 LIKE TO BOOK
-0.9031 LIKE TO BUY
-0.9031 LIKE TO CHECK
-0.3010 MADRID HAS TAKEN
-0.3010 MADRID I WANT
-0.3010 MADRID THE PLANE
-0.3010 OFF I WOULD
-0.3010 OFF THE PLANE
-0.3010 OFF WHEN DOES
-0.3010 PLANE HAS LANDED
-0.3010 PLANE IN DESTINATION
-0.3010 PLANE IS ABOUT
-0.3010 PLANE TAKE OFF
-0.3010 SYDNEY I WOULD
-0.6021 TAKE OFF I
-0.6021 TAKE OFF THE
-0.3010 TAKEN OFF WHEN
-0.6021 THE PLANE HAS
-0.9031 THE PLANE IN
-0.9031 THE PLANE IS
-0.3010 THIS PLANE TAKE
-0.7782 TICKET TO MADRID
-0.4771 TICKET TO SYDNEY
-0.3010 TO BOOK A
-0.3010 TO BUY A
-0.6021 TO CHECK IF
-0.6021 TO CHECK THE
-0.7782 TO MADRID HAS
-0.7782 TO MADRID I
-0.7782 TO MADRID THE
-0.9031 TO SYDNEY </s>
-0.4260 TO SYDNEY I
-0.3010 TO TAKE OFF
-0.3010 WANT TO CHECK
-0.3010 WHEN DOES THIS
-1.0000 WOULD LIKE A
-0.3979 WOULD LIKE TO
\end\
Best regards.
ERROR: "dict.c", line 251: Line 2: Bad ciphone: EY; word A(2) ignored
Look, is EY part of your phoneset? It is in etc/something.phone. Probably your phoneset is different or has the wrong format.
Sorry, I didn't find any ".phone" file.
But I guess that is not the problem: the error you quoted appears for every phone, not just EY.
I use this dictionary:
A AX
A(2) EY
ABOUT AX B AW T
BOOK B UH K
BUY B AY
CHECK CH EH K
DESTINATION D EH S T AX N EY SH AX N
DESTINATION(2) D EH S T IX N EY SH AX N
DOES D AH Z
DOES(2) D IX Z
FLIGHT F L AY T
HAS HH AE Z
HAS(2) HH AX Z
I AY
IF IH F
IF(2) IX F
IN IH N
IN(2) IX N
IS IH Z
IS(2) IX Z
LANDED L AE N D AX D
LANDED(2) L AE N D IX D
LIKE L AY K
MADRID M AX D R IH D
OFF AO F
PLANE P L EY N
SYDNEY S IH D N IY
TAKE T EY K
TAKEN T EY K AX N
THE DH AH
THE(2) DH AX
THE(3) DH IY
THIS DH IH S
THIS(2) DH IX S
TICKET T IH K AX T
TICKET(2) T IH K IX T
TO T AX
TO(2) T IX
TO(3) T UW
WANT W AA N T
WANT(2) W AO N T
WHEN HH W EH N
WHEN(2) HH W IH N
WHEN(3) W EH N
WHEN(4) W IH N
WOULD W UH D
But even when I use the CMU dictionary, it gives me the same kind of trouble.
I really don't know what's wrong. I've tried changing many things, but I haven't gotten any results.
Hm, I thought there should be a phone file. It should just contain the list of phones used. It is referenced in etc/sphinx_train.cfg:
$CFG_RAWPHONEFILE = "$CFG_LIST_DIR/$CFG_DB_NAME.phone";
See http://www.speech.cs.cmu.edu/sphinxman/scriptman1.html too:
phonelist, which is a list of all acoustic units that you want to train models for. SPHINX does not permit you to have units other than those in your dictionaries. All units in your two dictionaries must be listed here. In other words, your phonelist must have exactly the same units used in your dictionaries, no more and no less. Each phone must be listed on a separate line in the file, beginning from the left, with no extra spaces after the phone. An example:
AA
AE
OW
B
CH
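If it helps, a phone list that matches the dictionary exactly can be generated with a short script along these lines (again a rough Python sketch; the file names are placeholders):

# Rough sketch: build a phone list containing exactly the units used in a
# Sphinx-style dictionary, one phone per line with no trailing spaces.
phones = set()
with open('1821.dic') as f:
    for line in f:
        parts = line.split()
        if parts:
            phones.update(parts[1:])   # everything after the word is a phone

with open('1821.phone', 'w') as out:
    for phone in sorted(phones):
        out.write(phone + '\n')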
Oops, I thought you were working with SphinxTrain :( Are you trying your model with tidigits? The tidigits models are based on a different phoneset (from the .mdef file):
AX_one - - - n/a 0 0 1 2 N
AY_five - - - n/a 1 3 4 5 N
AY_nine - - - n/a 2 6 7 8 N
EH_seven - - - n/a 3 9 10 11 N
EY_eight - - - n/a 4 12 13 14 N
E_seven - - - n/a 5 15 16 17 N
F_five - - - n/a 6 18 19 20 N
F_four - -
So if you'd like to use the tidigits models, you need to transcribe your dictionary in terms of the phoneset above: AX_one and so on. If you need a generic phoneset, use the hub4 model in the config file.
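One way to see the mismatch directly is to compare the phones your dictionary uses against the phones the acoustic model defines. A rough sketch follows, assuming a text-format .mdef in which context-independent phones appear as rows like the ones above; the file names are placeholders.

# Rough sketch: list dictionary phones that the acoustic model does not define.
# These are the phones that produce the "Bad ciphone" errors at load time.
def model_ci_phones(mdef_path):
    phones = set()
    with open(mdef_path) as f:
        for line in f:
            parts = line.split()
            # CI phone rows have '-' in the left/right/position columns.
            if len(parts) >= 4 and parts[1:4] == ['-', '-', '-']:
                phones.add(parts[0])
    return phones

def dictionary_phones(dic_path):
    phones = set()
    with open(dic_path) as f:
        for line in f:
            parts = line.split()
            phones.update(parts[1:])
    return phones

unknown = dictionary_phones('1821.dic') - model_ci_phones('tidigits.mdef')
for phone in sorted(unknown):
    print('not in acoustic model:', phone)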
Yep, you were right.
I was using the tidigits model.
I was sure I wasn't, a newbie mistake ;)
Anyway, thanks for coming to my rescue.
It's working fine now.
Best regards.