Finite State Grammar woes

2009-01-19
2012-09-22
  • Ivan Uemlianin

    Ivan Uemlianin - 2009-01-19

    Dear All

    Thanks for the earlier help. I am edging forward with using an FSG for my word recogniser, using the Python test script. I have a new error: it may be that I am missing something in the config/command-line arguments, or it may be that my acoustic model is no good.

    Command Line:

    Instead of giving an -lm argument, I am setting -mode to "fsg" and setting -fsg to the path to the fsg file. Apart from -hmm, -dict and -fdict, I am giving no other arguments.

    Error Message:

    Comparing the output with output from earlier attempts with an LM, Sphinx seems to read in the AM and the dictionaries OK. I notice I get the same error output if I give a bogus/non-existent filename for the fsg file, so the error can't be to do with the fsg itself. Perhaps I am missing some command-line arguments, although this looks more like something to do with the acoustic model.

    Here it is reading in the dictionaries OK, then going pear-shaped:

    INFO: dict.c(475): Reading main dictionary: ../etc/cyfrifiannell.dic
    INFO: dict.c(478): 17 words read
    INFO: dict.c(483): Reading filler dictionary: ../etc/cyfrifiannell.filler
    INFO: dict.c(486): 3 words read
    INFO:   Initialization of dict_t, report:
    INFO:   No of CI phone: 0
    INFO:   Max word: 4116
    INFO:   No of word: 20
    INFO:   
    INFO:   Initialization of fillpen_t, report:
    INFO:   Language weight =9.500000 
    INFO:   Word Insertion Penalty =0.700000 
    INFO:   Silence probability =0.100000 
    INFO:   Filler probability =0.100000 
    INFO:   
    INFO: dict2pid.c(599): Building PID tables for dictionary
    INFO:   Initialization of dict2pid_t, report:
    INFO:   Dict2pid is in composite triphone mode
    INFO:   126 composite states; 18 composite sseq
    INFO:   
    INFO: kbcore.c(632): Inside kbcore: Verifying models consistency ...... 
    INFO: kbcore.c(654): End of Initialization of Core Models:
    INFO:   Initialization of beam_t, report:
    INFO:   Parameters used in Beam Pruning of Viterbi Search:
    INFO:   Beam=-422133
    INFO:   PBeam=-383758
    INFO:   WBeam=-268630 (Skip=0)
    INFO:   WEndBeam=-614012 
    INFO:   No of CI Phone assumed=18 
    INFO:   
    INFO:   Initialization of fast_gmm_t, report:
    INFO:   Parameters used in Fast GMM computation:
    INFO:      Frame-level: Down Sampling Ratio 1, Conditional Down Sampling? 0, Distance-based Down Sampling? 0
    INFO:        GMM-level: CI phone beam -614012. MAX CD 100000
    INFO:   Gaussian-level: GS map would be used for Gaussian Selection? =1, SVQ would be used as Gaussian Score? =0 SubVQ Beam -19363
    INFO:   
    INFO:   Initialization of pl_t, report:
    INFO:   Parameters used in phoneme lookahead:
    INFO:   Phoneme look-ahead        type = 0
    INFO:   Phoneme look-ahead beam   size = 65945
    INFO:   No of CI Phones assumed=18 
    INFO:   
    INFO:   Initialization of ascr_t, report:
    INFO:   No. of CI senone =126 
    INFO:   No. of senone = 126
    INFO:   No. of composite senone = 126
    INFO:   No. of senone sequence = 18
    INFO:   No. of composite senone sequence=18 
    INFO:   Parameters used in phoneme lookahead:
    INFO:   Phoneme lookahead window = 1
    INFO:   
    INFO: kb.c(306): SEARCH MODE INDEX 2
    INFO: srch.c(373): Search Initialization. 
    Assertion failed: (n_emit_state <= MAX_HMM_NSTATE), function hmm_context_init, file hmm.c, line 111.
    

    Any help much appreciated. Once I get this working, I'll be sure to write it up and put a howto or similar on the web.

    Thanks and best wishes

    Ivan

     
    • Ivan Uemlianin

      Ivan Uemlianin - 2009-01-19

      Dear All

      Further on! It's not the AM (yet).

      The last line in the error message is:

      Assertion failed: (n_emit_state <= MAX_HMM_NSTATE), function hmm_context_init, file hmm.c, line 111.
      

      MAX_HMM_NSTATE is hardcoded in hmm.h to be 5:

      /** Hardcoded limit on the number of states (temporary) */
      #define MAX_HMM_NSTATE 5
      

      However, the AM I've built is a word model, and the SphinxTrain docs [1] recommend using more HMM states for a word model (I'm using seven). Is there a command-line argument I can set for Sphinx to use more states per HMM, or must I edit and recompile the source?
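
The mismatch described above can be reproduced in miniature. This is a minimal sketch, not the actual Sphinx3 source: only `MAX_HMM_NSTATE` and its value of 5 come from the quoted `hmm.h`. The `toy_hmm_t` type, the `toy_hmm_init` function, and the idea that the limit bounds a fixed-size per-HMM array are assumptions made for illustration. Instead of asserting, the sketch returns an error code so the two cases can be compared.

```c
/* Value quoted from hmm.h in the post above. */
#define MAX_HMM_NSTATE 5

/* Hypothetical stand-in for a per-HMM structure whose storage is
 * sized by the compile-time limit (an assumption, not Sphinx code). */
typedef struct {
    int score[MAX_HMM_NSTATE];
    int n_emit_state;
} toy_hmm_t;

/* Hypothetical initialiser. It tests the same condition as the failed
 * assertion (n_emit_state <= MAX_HMM_NSTATE), but returns -1 instead of
 * aborting when the model has more states than the array can hold. */
static int toy_hmm_init(toy_hmm_t *h, int n_emit_state)
{
    int i;

    if (n_emit_state > MAX_HMM_NSTATE)
        return -1;

    h->n_emit_state = n_emit_state;
    for (i = 0; i < n_emit_state; i++)
        h->score[i] = 0;
    return 0;
}
```

With the limit at 5, a call such as `toy_hmm_init(&h, 5)` succeeds, while `toy_hmm_init(&h, 7)` is rejected, which corresponds to the seven-state word model tripping the assertion.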

      Thanks

      Ivan

      [1] from http://www.speech.cs.cmu.edu/sphinxman/FAQ.html

      Q: How many states-per-hmm should I specify for my training?

      A: If you have "difficult" speech (noisy/spontaneous/damaged), use 3-state hmms with a noskip topology. For clean speech you may choose to use any odd number of states, depending on the amount of data you have and the type of acoustic units you are training. If you are training word models, for example, you might be better off using 5 states or higher. 3-5 states are good for shorter acoustic units like phones. You cannot currently train 1 state hmms with the Sphinx.

       
    • Ivan Uemlianin

      Ivan Uemlianin - 2009-01-19

      Dear All

      I've recompiled sphinx3 with a higher MAX_HMM_NSTATE in hmm.h and everything "just works"! Would still like to know if there's a command-line argument I can use instead.
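
One way to avoid editing the header for each model would be a compile-time override. The following is a hypothetical patch sketch, not something the quoted `hmm.h` provides: it wraps the definition in `#ifndef` so that the limit could be supplied with a compiler flag such as `-DMAX_HMM_NSTATE=7`, assuming the build passes such flags through to the compiler. The helper name `hmm_nstate_supported` is made up for illustration. Note that this is still a build-time setting, not a runtime command-line argument.

```c
/* Hypothetical variant of the hmm.h definition: keep the quoted default
 * of 5 unless the build supplies another value, e.g. -DMAX_HMM_NSTATE=7. */
#ifndef MAX_HMM_NSTATE
#define MAX_HMM_NSTATE 5
#endif

/* Illustrative helper (not part of Sphinx): reports whether a model with
 * n_emit_state emitting states fits within the compiled-in limit. */
static int hmm_nstate_supported(int n_emit_state)
{
    return n_emit_state <= MAX_HMM_NSTATE;
}
```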

      Sorry for thinking aloud in public!

      Best

      Ivan

       
