CMU Sphinx / Forums / Help: Finite State Grammar woes

Dear All

Thanks for the earlier help. I am edging forward with using an FSG for my word recogniser, using the Python test script. I have a new error: it may be that I am missing something in the config/command-line arguments, or it may be that my acoustic model is no good.

Command Line:

Instead of giving an -lm argument, I am setting -mode to "fsg" and setting -fsg to the path to the fsg file. Apart from -hmm, -dict and -fdict, I am giving no other arguments.
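For clarity, the argument set described above could be assembled roughly like this. This is only a sketch: build_args is a hypothetical helper (not part of any Sphinx API), and the model and grammar paths are placeholders. Only the two dictionary paths are taken from the log output.

```python
def build_args(hmm_dir, dict_path, fdict_path, fsg_path):
    """Return the flag/value pairs described in the post as a flat list.

    No -lm flag is included: the grammar is supplied via -mode and -fsg.
    """
    return [
        "-hmm", hmm_dir,
        "-dict", dict_path,
        "-fdict", fdict_path,
        "-mode", "fsg",
        "-fsg", fsg_path,
    ]


args = build_args(
    "path/to/model",                # placeholder: acoustic model location
    "../etc/cyfrifiannell.dic",     # main dictionary, as shown in the log
    "../etc/cyfrifiannell.filler",  # filler dictionary, as shown in the log
    "path/to/grammar.fsg",          # placeholder: FSG file location
)
print(" ".join(args))
```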

Error Message:

Comparing the output with output from earlier attempts with an LM, Sphinx seems to read in the AM and the dictionaries OK. I notice I get the same error output if I give a bogus/non-existent filename for the FSG file, so the error can't be to do with the FSG itself. Perhaps I am missing some command-line arguments, although this looks like something more to do with the acoustic model.
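Since a non-existent grammar file produces the same output, it may be worth ruling out a bad path in the test script before looking elsewhere. A minimal sketch, assuming a hypothetical helper name and a placeholder path:

```python
import os


def fsg_file_readable(fsg_path):
    """Return True only if fsg_path names an existing, readable regular file."""
    return os.path.isfile(fsg_path) and os.access(fsg_path, os.R_OK)


fsg_path = "path/to/grammar.fsg"  # placeholder, not the real path
if not fsg_file_readable(fsg_path):
    print("FSG file not found or unreadable:", fsg_path)
```

This only confirms that the file can be opened; it says nothing about whether the grammar itself is well formed.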

Here it is reading in the dictionaries OK, then going pear-shaped:

INFO: dict.c(475): Reading main dictionary: ../etc/cyfrifiannell.dic
INFO: dict.c(478): 17 words read
INFO: dict.c(483): Reading filler dictionary: ../etc/cyfrifiannell.filler
INFO: dict.c(486): 3 words read
INFO:   Initialization of dict_t, report:
INFO:   No of CI phone: 0
INFO:   Max word: 4116
INFO:   No of word: 20
INFO:   
INFO:   Initialization of fillpen_t, report:
INFO:   Language weight =9.500000 
INFO:   Word Insertion Penalty =0.700000 
INFO:   Silence probability =0.100000 
INFO:   Filler probability =0.100000 
INFO:   
INFO: dict2pid.c(599): Building PID tables for dictionary
INFO:   Initialization of dict2pid_t, report:
INFO:   Dict2pid is in composite triphone mode
INFO:   126 composite states; 18 composite sseq
INFO:   
INFO: kbcore.c(632): Inside kbcore: Verifying models consistency ...... 
INFO: kbcore.c(654): End of Initialization of Core Models:
INFO:   Initialization of beam_t, report:
INFO:   Parameters used in Beam Pruning of Viterbi Search:
INFO:   Beam=-422133
INFO:   PBeam=-383758
INFO:   WBeam=-268630 (Skip=0)
INFO:   WEndBeam=-614012 
INFO:   No of CI Phone assumed=18 
INFO:   
INFO:   Initialization of fast_gmm_t, report:
INFO:   Parameters used in Fast GMM computation:
INFO:      Frame-level: Down Sampling Ratio 1, Conditional Down Sampling? 0, Distance-based Down Sampling? 0
INFO:        GMM-level: CI phone beam -614012. MAX CD 100000
INFO:   Gaussian-level: GS map would be used for Gaussian Selection? =1, SVQ would be used as Gaussian Score? =0 SubVQ Beam -19363
INFO:   
INFO:   Initialization of pl_t, report:
INFO:   Parameters used in phoneme lookahead:
INFO:   Phoneme look-ahead        type = 0
INFO:   Phoneme look-ahead beam   size = 65945
INFO:   No of CI Phones assumed=18 
INFO:   
INFO:   Initialization of ascr_t, report:
INFO:   No. of CI senone =126 
INFO:   No. of senone = 126
INFO:   No. of composite senone = 126
INFO:   No. of senone sequence = 18
INFO:   No. of composite senone sequence=18 
INFO:   Parameters used in phoneme lookahead:
INFO:   Phoneme lookahead window = 1
INFO:   
INFO: kb.c(306): SEARCH MODE INDEX 2
INFO: srch.c(373): Search Initialization. 
Assertion failed: (n_emit_state <= MAX_HMM_NSTATE), function hmm_context_init, file hmm.c, line 111.

Any help much appreciated. Once I get this working, I'll be sure to write it up and put a howto or similar on the web.

Thanks and best wishes

Ivan

Dear All

Further on! It's not the AM (yet).

The last line in the error message is:

Assertion failed: (n_emit_state <= MAX_HMM_NSTATE), function hmm_context_init, file hmm.c, line 111.

MAX_HMM_NSTATE is hardcoded in hmm.h to be 5:

    /** Hardcoded limit on the number of states (temporary) */
    #define MAX_HMM_NSTATE 5

However, the AM I've built is a word model, and the SphinxTrain docs [1] recommend using more HMM states for a word model (I'm using seven). Is there a command-line argument I can set for Sphinx to use more states per HMM, or must I edit and recompile the source?
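The figures in the log seem to bear this out. As a rough check, assuming every CI phone has the same number of emitting states and that each emitting state has its own CI senone, the state count per HMM can be recovered from the reported totals (126 CI senones, 18 CI phones) and compared with the hardcoded limit. The function name is made up for illustration:

```python
MAX_HMM_NSTATE = 5  # value quoted from hmm.h in the post


def emitting_states_per_hmm(n_ci_senones, n_ci_phones):
    """Estimate emitting states per HMM from the totals printed in the log.

    Assumes each CI phone contributes the same number of senones,
    one per emitting state.
    """
    if n_ci_phones <= 0 or n_ci_senones % n_ci_phones != 0:
        raise ValueError("senone count is not a whole multiple of phone count")
    return n_ci_senones // n_ci_phones


n_emit_state = emitting_states_per_hmm(126, 18)  # figures from the log
print(n_emit_state, n_emit_state <= MAX_HMM_NSTATE)  # prints: 7 False
```

Seven emitting states against a limit of five is consistent with the assertion failing in hmm_context_init.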

Thanks

Ivan

[1] from http://www.speech.cs.cmu.edu/sphinxman/FAQ.html

Q: How many states-per-hmm should I specify for my training?

A: If you have "difficult" speech (noisy/spontaneous/damaged), use 3-state hmms with a noskip topology. For clean speech you may choose to use any odd number of states, depending on the amount of data you have and the type of acoustic units you are training. If you are training word models, for example, you might be better off using 5 states or higher. 3-5 states are good for shorter acoustic units like phones. You cannot currently train 1 state hmms with the Sphinx.
