CMU Sphinx / Forums / Help: create language model

Hi Nickolay,
Thanks for answering.
The words from my language model are in my dictionary. Indeed, the dictionary was built from that model.
However, I can't find out what is going wrong.
Here are my logs:

INFO: dict.c(463): Reading main dictionary: ../../Sphinx3/sphinx3-0.6/model/hmm/tidigits/Message2/1821.dic
ERROR: "dict.c", line 251: Line 1: Bad ciphone: AX; word A ignored
ERROR: "dict.c", line 251: Line 2: Bad ciphone: EY; word A(2) ignored
ERROR: "dict.c", line 251: Line 3: Bad ciphone: AX; word ABOUT ignored
ERROR: "dict.c", line 251: Line 4: Bad ciphone: B; word BOOK ignored
...
...
INFO: lm.c(592): LM read('../../Sphinx3/sphinx3-0.6/model/hmm/tidigits/test_message.lm.DMP', lw= 9.50, wip= 0.70, uw= 0.70)
INFO: lm.c(594): Reading LM file ../../Sphinx3/sphinx3-0.6/model/hmm/tidigits/test_message.lm.DMP (LM name "default")
INFO: lm_3g_dmp.c(618): Reading LM in 16 bits format
INFO: lm_3g_dmp.c(674): Read 30 unigrams [in memory]
INFO: lm_3g_dmp.c(747): 47 bigrams [on disk]
INFO: lm_3g_dmp.c(820): 58 bigrams [on disk]
INFO: lm_3g_dmp.c(890): 15 bigram prob entries
INFO: lm_3g_dmp.c(924): 14 trigram bowt entries
INFO: lm_3g_dmp.c(955): 8 trigram prob entries
INFO: lm_3g_dmp.c(986): 1 trigram segtable entries (512 segsize)
INFO: lm_3g_dmp.c(1041): 30 word strings
INFO: lm.c(685): The LM routine is operating at 16 bits mode
ERROR: "wid.c", line 282: A is not a word in dictionary and it is not a class tag.
ERROR: "wid.c", line 282: ABOUT is not a word in dictionary and it is not a class tag.
ERROR: "wid.c", line 282: BOOK is not a word in dictionary and it is not a class tag.
ERROR: "wid.c", line 282: BUY is not a word in dictionary and it is not a class tag.
ERROR: "wid.c", line 282: CHECK is not a word in dictionary and it is not a class tag.
ERROR: "wid.c", line 282: DESTINATION is not a word in dictionary and it is not a class tag.
...
...
INFO: wid.c(292): 28 LM words not in dictionary; ignored
INFO: Initialization of fillpen_t, report:
INFO: Language weight =9.500000
INFO: Word Insertion Penalty =0.700000
INFO: Silence probability =0.100000
INFO: Filler probability =0.020000
INFO:
INFO: dict2pid.c(567): Building PID tables for dictionary
INFO: Initialization of dict2pid_t, report:
INFO: Dict2pid is in composite triphone mode
INFO: 3 composite states; 1 composite sseq
INFO:
INFO: kbcore.c(645): Inside kbcore: Verifying models consistency ......
INFO: kbcore.c(667): End of Initialization of Core Models:
INFO: Initialization of beam_t, report:
INFO: Parameters used in Beam Pruning of Viterbi Search:
INFO: Beam=-307006
INFO: PBeam=-230254
INFO: WBeam=-153503 (Skip=0)
INFO: WEndBeam=-7675
INFO: No of CI Phone assumed=34
INFO:
INFO: Initialization of fast_gmm_t, report:
INFO: Parameters used in Fast GMM computation:
INFO: Frame-level: Down Sampling Ratio 1, Conditional Down Sampling? 0, Distance-based Down Sampling? 0
INFO: GMM-level: CI phone beam -38375. MAX CD 100000
INFO: Gaussian-level: GS map would be used for Gaussian Selection? =1, SVQ would be used as Gaussian Score? =0 SubVQ Beam -15350
INFO:
INFO: Initialization of pl_t, report:
INFO: Parameters used in phoneme lookahead:
INFO: Phoneme look-ahead type = 0
INFO: Phoneme look-ahead beam size = 65945
INFO: No of CI Phones assumed=34
INFO:
INFO: Initialization of ascr_t, report:
INFO: No. of CI senone =102
INFO: No. of senone = 602
INFO: No. of composite senone = 3
INFO: No. of senone sequence = 308
INFO: No. of composite senone sequence=1
INFO: Parameters used in phoneme lookahead:
INFO: Phoneme lookahead window = 1
INFO:
INFO: vithist.c(167): Initializing Viterbi-history module
INFO: Initialization of vithist_t, report:
INFO: Word beam = -153503
INFO: Bigram Mode =0
INFO: Rescore Mode =1
INFO: Trace sil Mode =1
INFO:
INFO: srch.c(447): Search Initialization.
WARNING: "srch_time_switch_tree.c", line 166: -Nstalextree is omitted in TST search.
INFO: lextree.c(226): Creating Unigram Table for lm (name: default)
INFO: lextree.c(239): Size of word table after unigram + words in class: 0.
FATAL_ERROR: "lextree.c", line 243: 0 active words in default
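
With a longer dictionary, the "Bad ciphone" lines are easier to read once they are grouped by phone. The following is only a minimal sketch: the function name summarize_bad_ciphones is made up for illustration, and the regular expression assumes the exact message wording shown in the log above.

```python
import re
from collections import defaultdict

# Assumed message shape, copied from the log above:
# ERROR: "dict.c", line 251: Line 1: Bad ciphone: AX; word A ignored
BAD_CIPHONE = re.compile(r"Bad ciphone: (\S+); word (\S+) ignored")


def summarize_bad_ciphones(log_lines):
    """Map each rejected phone to the dictionary words dropped because of it."""
    rejected = defaultdict(list)
    for line in log_lines:
        match = BAD_CIPHONE.search(line)
        if match:
            phone, word = match.groups()
            rejected[phone].append(word)
    return dict(rejected)


if __name__ == "__main__":
    sample = [
        'ERROR: "dict.c", line 251: Line 1: Bad ciphone: AX; word A ignored',
        'ERROR: "dict.c", line 251: Line 2: Bad ciphone: EY; word A(2) ignored',
        'ERROR: "dict.c", line 251: Line 3: Bad ciphone: AX; word ABOUT ignored',
        'ERROR: "dict.c", line 251: Line 4: Bad ciphone: B; word BOOK ignored',
    ]
    for phone, words in sorted(summarize_bad_ciphones(sample).items()):
        print(phone, "->", ", ".join(words))
```

If every phone in the dictionary shows up in this summary, the mismatch is probably between the dictionary's phone names and the acoustic model as a whole rather than in a few individual entries.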

I am also including my .lm file, since two of the bigrams have a value of 0.0000 in it. I don't know whether the problem could come from there:

Language model created by QuickLM on Tue Jun 12 04:55:33 EDT 2007
Copyright (c) 1996-2000
Carnegie Mellon University and Alexander I. Rudnicky

This model based on a corpus of 1 sentences and 30 words
The (fixed) discount mass is 0.5

\data\
ngram 1=30
ngram 2=47
ngram 3=58

\1-grams:
-2.4771 </s> -0.3010
-2.4771 <s> -0.2833
-1.5740 A -0.2893
-2.1761 ABOUT -0.2680
-1.8751 BOOK -0.2893
-2.1761 BUY -0.2893
-1.8751 CHECK -0.2863
-2.1761 DESTINATION -0.2680
-2.1761 DOES -0.2981
-2.1761 FLIGHT -0.2680
-1.6990 HAS -0.2923
-1.3979 I -0.2833
-2.1761 IF -0.2893
-2.1761 IN -0.2981
-2.1761 IS -0.2981
-1.8751 LANDED -0.2833
-1.4771 LIKE -0.2553
-1.6990 MADRID -0.2617
-1.6990 OFF -0.2680
-1.4771 PLANE -0.2803
-1.8751 SYDNEY -0.2818
-1.8751 TAKE -0.2923
-2.1761 TAKEN -0.2923
-1.5740 THE -0.2863
-2.1761 THIS -0.2863
-1.6990 TICKET -0.2680
-1.1347 TO -0.2648
-2.1761 WANT -0.2680
-2.1761 WHEN -0.2981
-1.4771 WOULD -0.2863

\2-grams:
-0.3010 <s> I -0.0669
-0.9031 A FLIGHT -0.1891
-0.4260 A TICKET -0.1891
-0.3010 ABOUT TO -0.2808
-0.3010 BOOK A -0.1891
-0.3010 BUY A -0.0969
-0.6021 CHECK IF -0.1891
-0.6021 CHECK THE -0.1891
-0.3010 DESTINATION TO -0.2374
-0.3010 DOES THIS -0.1891
-0.3010 FLIGHT TO -0.2374
-0.4771 HAS LANDED -0.1891
-0.7782 HAS TAKEN -0.1891
-1.0792 I WANT -0.1891
-0.3802 I WOULD -0.1891
-0.3010 IF THE -0.1891
-0.3010 IN DESTINATION -0.1891
-0.3010 IS ABOUT -0.1891
-0.3010 LANDED I -0.0669
-1.0000 LIKE A -0.0969
-0.3979 LIKE TO -0.1891
-0.7782 MADRID HAS -0.2218
-0.7782 MADRID I -0.2632
-0.7782 MADRID THE -0.1891
-0.7782 OFF I -0.0669
-0.7782 OFF THE -0.1891
-0.7782 OFF WHEN -0.1891
-0.6990 PLANE HAS -0.1249
-1.0000 PLANE IN -0.1891
-1.0000 PLANE IS -0.1891
-1.0000 PLANE TAKE -0.1891
-0.9031 SYDNEY </s> -0.3010
-0.4260 SYDNEY I -0.0669
-0.3010 TAKE OFF -0.1249
-0.3010 TAKEN OFF -0.2218
-0.3010 THE PLANE -0.0792
-0.3010 THIS PLANE -0.2553
-0.3010 TICKET TO -0.1891
-1.0414 TO BOOK -0.1891
-1.3424 TO BUY -0.1891
-1.0414 TO CHECK -0.1891
-0.8653 TO MADRID -0.1891
-1.0414 TO SYDNEY -0.1891
-1.3424 TO TAKE -0.1891
-0.3010 WANT TO -0.2596
-0.3010 WHEN DOES 0.0000
-0.3010 WOULD LIKE 0.0000

\3-grams:
-0.3010 <s> I WOULD
-0.3010 A FLIGHT TO
-0.3010 A TICKET TO
-0.3010 ABOUT TO TAKE
-0.6021 BOOK A FLIGHT
-0.6021 BOOK A TICKET
-0.3010 BUY A TICKET
-0.3010 CHECK IF THE
-0.3010 CHECK THE PLANE
-0.3010 DESTINATION TO MADRID
-0.3010 DOES THIS PLANE
-0.3010 FLIGHT TO MADRID
-0.3010 HAS LANDED I
-0.3010 HAS TAKEN OFF
-0.3010 I WANT TO
-0.3010 I WOULD LIKE
-0.3010 IF THE PLANE
-0.3010 IN DESTINATION TO
-0.3010 IS ABOUT TO
-0.3010 LANDED I WOULD
-0.3010 LIKE A TICKET
-0.6021 LIKE TO BOOK
-0.9031 LIKE TO BUY
-0.9031 LIKE TO CHECK
-0.3010 MADRID HAS TAKEN
-0.3010 MADRID I WANT
-0.3010 MADRID THE PLANE
-0.3010 OFF I WOULD
-0.3010 OFF THE PLANE
-0.3010 OFF WHEN DOES
-0.3010 PLANE HAS LANDED
-0.3010 PLANE IN DESTINATION
-0.3010 PLANE IS ABOUT
-0.3010 PLANE TAKE OFF
-0.3010 SYDNEY I WOULD
-0.6021 TAKE OFF I
-0.6021 TAKE OFF THE
-0.3010 TAKEN OFF WHEN
-0.6021 THE PLANE HAS
-0.9031 THE PLANE IN
-0.9031 THE PLANE IS
-0.3010 THIS PLANE TAKE
-0.7782 TICKET TO MADRID
-0.4771 TICKET TO SYDNEY
-0.3010 TO BOOK A
-0.3010 TO BUY A
-0.6021 TO CHECK IF
-0.6021 TO CHECK THE
-0.7782 TO MADRID HAS
-0.7782 TO MADRID I
-0.7782 TO MADRID THE
-0.9031 TO SYDNEY </s>
-0.4260 TO SYDNEY I
-0.3010 TO TAKE OFF
-0.3010 WANT TO CHECK
-0.3010 WHEN DOES THIS
-1.0000 WOULD LIKE A
-0.3979 WOULD LIKE TO

\end\
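
One way to rule out the .lm file itself is to check that each section contains as many entries as the \data\ header declares, and to list the unigram words that are absent from the dictionary. The sketch below assumes the plain-text ARPA layout shown above; the names read_arpa and words_missing_from_dict are hypothetical, and treating <s> and </s> as words that need no dictionary entry is also an assumption.

```python
import re


def read_arpa(lines):
    """Return (declared, found): the n-gram counts from the data header and
    the n-gram entries actually listed in each section."""
    declared = {}
    found = {}
    order = None  # order of the section currently being read
    for raw in lines:
        line = raw.strip()
        header = re.fullmatch(r"ngram (\d+)=(\d+)", line)
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if header:
            declared[int(header.group(1))] = int(header.group(2))
        elif section:
            order = int(section.group(1))
            found[order] = []
        elif line == "\\end\\":
            order = None
        elif order is not None and line:
            fields = line.split()
            # fields[0] is the log10 probability; the next `order` fields are
            # the words; an optional trailing field is the back-off weight.
            found[order].append(tuple(fields[1:1 + order]))
    return declared, found


def words_missing_from_dict(found, dict_words):
    """Unigram words (sentence markers excluded) absent from the dictionary."""
    unigrams = {entry[0] for entry in found.get(1, [])}
    return sorted(unigrams - set(dict_words) - {"<s>", "</s>"})
```

Comparing `declared[n]` with `len(found[n])` for each order shows whether the file is internally consistent, and `words_missing_from_dict` should return an empty list when the dictionary really covers the model.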

Best regards.

Sorry, I didn't find any ".phone" file.
But I guess this is not the problem: the error you quoted appears for every phone, not only EY.
I use this dictionary:
A AX
A(2) EY
ABOUT AX B AW T
BOOK B UH K
BUY B AY
CHECK CH EH K
DESTINATION D EH S T AX N EY SH AX N
DESTINATION(2) D EH S T IX N EY SH AX N
DOES D AH Z
DOES(2) D IX Z
FLIGHT F L AY T
HAS HH AE Z
HAS(2) HH AX Z
I AY
IF IH F
IF(2) IX F
IN IH N
IN(2) IX N
IS IH Z
IS(2) IX Z
LANDED L AE N D AX D
LANDED(2) L AE N D IX D
LIKE L AY K
MADRID M AX D R IH D
OFF AO F
PLANE P L EY N
SYDNEY S IH D N IY
TAKE T EY K
TAKEN T EY K AX N
THE DH AH
THE(2) DH AX
THE(3) DH IY
THIS DH IH S
THIS(2) DH IX S
TICKET T IH K AX T
TICKET(2) T IH K IX T
TO T AX
TO(2) T IX
TO(3) T UW
WANT W AA N T
WANT(2) W AO N T
WHEN HH W EH N
WHEN(2) HH W IH N
WHEN(3) W EH N
WHEN(4) W IH N
WOULD W UH D
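
Since the decoder rejects entries phone by phone, the dictionary can also be checked offline against whatever phone list the acoustic model uses. This is a rough sketch under stated assumptions: check_dictionary is a made-up helper, and the phone set has to be supplied by hand because the model's actual phone list does not appear in this thread.

```python
def check_dictionary(dict_lines, model_phones):
    """Split dictionary entries into those whose phones are all in
    model_phones and those that use at least one unknown phone."""
    usable = []
    rejected = {}
    for line in dict_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, phones = fields[0], fields[1:]
        unknown = [p for p in phones if p not in model_phones]
        if unknown:
            rejected[word] = unknown
        else:
            usable.append(word)
    return usable, rejected


if __name__ == "__main__":
    # Hypothetical phone set, for illustration only; replace it with the
    # phone names the acoustic model really defines.
    example_phones = {"EY", "B", "K"}
    usable, rejected = check_dictionary(
        ["A AX", "A(2) EY", "BOOK B UH K"], example_phones
    )
    print("usable:", usable)
    print("rejected:", rejected)
```

An empty `usable` list would be consistent with the "0 active words" failure in the log, since no dictionary word would survive loading.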

But even when I use the CMU dictionary, it gives me the same kind of trouble.

I really don't know what's wrong. I've tried changing many things, but I didn't get any result.
