
Differences between Sphinx3 and SphinxTrain?

Help
Anonymous
2004-01-20
2012-09-22

    Anonymous - 2004-01-20

    Hi ,

    Actually, I am very confused about Sphinx3 and SphinxTrain. What is the difference between them? I know that Sphinx3 is a voice recognition engine.
    My questions are:

    1. What is the role of SphinxTrain alongside Sphinx3?

    2. What does it mean to train an acoustic model?

    3. Are there complete sample flag files for all the programs (mk_model_def, init_gau, etc.)? When I try to run mk_model_def to build the model definition file, it requires 5 flags as input parameters. But what should those inputs be? There is no example showing the correct input format. Can anyone help?

    4. Can Sphinx3 use an .arpa or .lm language model file instead of a .DMP file?

    5. Does anyone know of good references or websites on Sphinx3 or SphinxTrain, other than the official website, that would help me understand them better?

    Thanks in advance.

    Regards,
    Chee Leong

     

      Anonymous - 2004-01-20

      1+2. All modern HMM recognizers, such as Sphinx2 and Sphinx3, need an "acoustic model" consisting of numerical parameters for each triphone (a phone in a specific left and right context) that they need to recognize. These parameters are estimated by analyzing a large set (typically many hours) of transcribed speech utterances; this process is called "training" an acoustic model. You must also supply a dictionary that covers all the words in your dataset; a large dictionary is available from CMU, but you must convert it into a format suitable for Sphinx and SphinxTrain. SphinxTrain is a suite of programs that carries out the many steps needed to do this.
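To make "triphone" and the dictionary step a bit more concrete, here is a minimal Python sketch. It assumes a CMUdict-style line (`WORD  P1 P2 ...`, stress digits on vowels, `;;;` comment lines), strips the stress digits, and lists each phone together with its left and right neighbours. The function names and the `SIL` filler used at the word edges are hypothetical choices for illustration only; real trainers also take context from the neighbouring words and track word position, which this ignores, and this is not SphinxTrain's own conversion tool.

```python
import re


def strip_stress(phone: str) -> str:
    """Drop a trailing CMUdict-style stress digit, e.g. 'IY1' -> 'IY'."""
    return re.sub(r"\d+$", "", phone)


def parse_dict_line(line: str):
    """Parse 'WORD  P1 P2 ...' into (word, [phones]) with stress removed.

    Returns None for blank lines and ';;;' comment lines (assumed CMUdict conventions).
    """
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    word, *phones = line.split()
    return word, [strip_stress(p) for p in phones]


def triphones(phones, boundary="SIL"):
    """Expand a phone sequence into (left, base, right) context triples.

    The 'boundary' filler at the word edges is a hypothetical simplification.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


if __name__ == "__main__":
    word, phones = parse_dict_line("SPEECH  S P IY1 CH")
    print(word, phones)
    for left, base, right in triphones(phones):
        print(f"{base}({left},{right})")
```

Each printed line is one context-dependent unit; an acoustic model needs parameters for every such unit that appears in the task.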

      The open-source distributions of both Sphinx2 and Sphinx3 come with a usable acoustic model for adult speech, so you need to train your own model only if these models are not suitable for your application.

      3. SphinxTrain also includes a set of Perl scripts (in SphinxTrain/scripts_pl) that can be configured to carry out these steps on a speech dataset that you provide.  These scripts set default parameters for the training programs, and more importantly, they specify consistent file names (the outputs of some programs that are then used as inputs to the next programs).
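The point about consistent file names can be illustrated with a toy pipeline plan: every output path is derived from a single experiment name, and each step's input is simply the previous step's output, so no step has to be told by hand where to look. The step names, suffixes and directory layout below are hypothetical placeholders, not the actual names used by the scripts in SphinxTrain/scripts_pl.

```python
from pathlib import Path

# Hypothetical step names and file suffixes, for illustration only;
# they are NOT the real SphinxTrain script or file names.
STEPS = [
    ("make_mdef", "ci.mdef"),
    ("init_models", "ci_init"),
    ("train_ci", "ci_hmm"),
]


def plan(exp_name: str, root: Path):
    """Return (step, input_path, output_path) triples.

    The first step has no input (None); every later step reads the
    previous step's output, and all names derive from exp_name.
    """
    previous = None
    result = []
    for step, suffix in STEPS:
        out = root / exp_name / f"{exp_name}.{suffix}"
        result.append((step, previous, out))
        previous = out
    return result


if __name__ == "__main__":
    for step, inp, out in plan("my_exp", Path("model_parameters")):
        print(step, inp, "->", out)
```

Changing the experiment name changes every path consistently, which is the kind of bookkeeping the scripts handle for you.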

      The SphinxTrain scripts are *very* useful, but contain a few errors and even more inconsistencies. They carry out the steps outlined in the SphinxTrain documentation files found on the SphinxTrain page of the CMU website (but note that some of the model-generating programs described there have been supplanted by a single, more general one, mk_mdef_gen). The scripts are only minimally documented, so using them requires some understanding of the steps in the model training process.

      The SphinxTrain scripts are set up to train semi-continuous acoustic models in Sphinx3 file format, and the final step (09) converts them to Sphinx2 format.  I believe that the semi-continuous models are usable by Sphinx3, and you can train continuous models for Sphinx3 by changing one parameter, $CFG_TYPE, in etc/sphinx_train.cfg.
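The difference between semi-continuous and continuous models can be sketched as follows, assuming diagonal-covariance Gaussian mixtures: in a semi-continuous model all states share one codebook of Gaussians and differ only in their mixture weights, while in a continuous model each state has its own Gaussians. The two-dimensional numbers are made up, and the sketch uses a single feature stream, whereas Sphinx semi-continuous models typically split the features into several streams with separate codebooks. It does not show what $CFG_TYPE should be set to.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def mixture_likelihood(x, weights, gaussians):
    """Sum_k w_k * N(x; mean_k, var_k) over a list of (mean, var) pairs."""
    return sum(w * math.exp(log_gauss_diag(x, m, v)) for w, (m, v) in zip(weights, gaussians))


# Semi-continuous: ONE codebook shared by all states; states differ only in weights.
codebook = [([0.0, 0.0], [1.0, 1.0]), ([2.0, 2.0], [1.0, 1.0])]
semi_weights = {"state_a": [0.9, 0.1], "state_b": [0.2, 0.8]}

# Continuous: each state owns its own weights AND Gaussians.
cont_models = {
    "state_a": ([1.0], [([0.1, -0.2], [0.5, 0.5])]),
    "state_b": ([0.5, 0.5], [([1.8, 2.1], [0.7, 0.7]), ([2.5, 1.5], [1.0, 1.0])]),
}

if __name__ == "__main__":
    x = [0.2, 0.1]  # one made-up feature vector
    for state, w in semi_weights.items():
        print("semi", state, mixture_likelihood(x, w, codebook))
    for state, (w, g) in cont_models.items():
        print("cont", state, mixture_likelihood(x, w, g))
```

Sharing the codebook means far fewer Gaussian parameters to estimate and evaluate, which is one reason semi-continuous models suit smaller training sets.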

      4. I believe the answer is yes.
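For question 4, it may help to know what an ARPA file looks like: it is plain text, starting with a `\data\` header that lists how many n-grams of each order follow, whereas a .DMP file is a binary dump. The sketch below is a hypothetical helper (not part of Sphinx3) that reads that header as a quick sanity check on an ARPA file; the toy model and its log10 probabilities are made up, and it says nothing about how Sphinx3 itself loads language models.

```python
def arpa_ngram_counts(lines):
    """Read the '\\data\\' header of an ARPA-format language model.

    Returns {order: count}; stops at the first n-gram section header.
    """
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
        elif in_header and line.startswith("ngram "):
            order, count = line[len("ngram "):].split("=")
            counts[int(order)] = int(count)
        elif in_header and line.startswith("\\"):
            break  # reached e.g. '\1-grams:'
    return counts


# A made-up toy model in ARPA text format (log10 probabilities, backoff weights).
SAMPLE = """\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-0.4771 <s> -0.3010
-0.4771 hello -0.3010
-0.4771 </s>

\\2-grams:
-0.1761 <s> hello
-0.1761 hello </s>

\\end\\
"""

if __name__ == "__main__":
    print(arpa_ngram_counts(SAMPLE.splitlines()))
```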

      I hope this helps get you started.
      cheers,
        jerry

       

