Continuous HMMs in PocketSphinx

Help
creative64
2010-04-30
2012-09-22
  • creative64

    creative64 - 2010-04-30

    Hi,

    I'm using PocketSphinx 0.6 on Windows and have a couple of questions.

    1. What is needed to try a "continuous HMM" based acoustic model? Are there such models available for US English that
      can be tried with PocketSphinx?

    2. What changes are required to try the "Floating Point" implementation?

    3. What is the difference between the hub4.5000 and wsj0vp.5000 language models?

    4. Is there any reference that briefly describes the meanings of the various command line options and configuration
      parameters for PocketSphinx?

    5. Is there any reference that briefly describes the formats of the various databases used in PocketSphinx (acoustic
      models, language models, etc.)?

    6. For running the decoder, the sample rate setting has to be in sync with the model (correct?). How do I find the sample
      rates for the various acoustic models provided with the 0.6 and 0.5 versions?

    Thanks and regards,

     
  • Nickolay V. Shmyrev

    1. What is needed for trying a "continuous HMM" based acoustic model?

    Point to the continuous model with the -hmm option.

    Are there such models available for US English that can be tried with
    PocketSphinx ?

    At least

    http://www.speech.cs.cmu.edu/sphinx/models/

    and

    http://www.repository.voxforge1.org/downloads/Main/Trunk/AcousticModels/Sphinx/

    2. What changes are required for trying "Floating Point" implementation?

    You mean fixed point? On Linux it's just the --enable-fixed configure flag. On
    Windows you need to make sure ENABLE_FIXED is defined in sphinx_config.h.

    3. What is the difference between hub4.5000 and wsj0vp.5000 language models?

    One is trained on HUB4 (Broadcast news task) texts, the other on WSJ (Reading
    Wall Street Journal task) texts.

    4. Any reference which briefly describes meanings of various command line
      options and configuration parameters for PocketSphinx?

    Run pocketsphinx_batch without arguments or look into the sources.

    5. Any reference which briefly describes formats of various databases used in
      PocketSphinx (acoustic models, language models, etc.).

    An acoustic model should have the following files: feat.params, mdef, means,
    variances, transition_matrices, mixture_weights. Instead of mixture_weights
    there could be a sendump file. There could be other files like
    feature_transform, kdtrees, noisedict. Their names are self-descriptive, I
    think. Details of the formats can be found in the sources.
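
The file list above can be turned into a quick sanity check before pointing -hmm at a directory. This is only a minimal sketch: the split into "required" and "optional" names is a reading of the reply above, not an official specification, and `check_model_dir` is a hypothetical helper name.

```python
from pathlib import Path

# File names are taken from the reply above. Grouping them into required,
# either-or, and optional sets is an assumption of this sketch.
REQUIRED = ("feat.params", "mdef", "means", "variances", "transition_matrices")
WEIGHTS = ("mixture_weights", "sendump")  # one of the two is expected
OPTIONAL = ("feature_transform", "kdtrees", "noisedict")


def check_model_dir(model_dir):
    """Return (missing, optional_present) for an acoustic model directory."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED if not (root / name).is_file()]
    if not any((root / name).is_file() for name in WEIGHTS):
        missing.append("mixture_weights or sendump")
    optional_present = [name for name in OPTIONAL if (root / name).is_file()]
    return missing, optional_present
```

An empty `missing` list only says the expected file names are present; it does not validate the contents of any file.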

    A language model can be in ARPA format or in the compressed DMP format.

    The dictionary format is straightforward.
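
To illustrate what "straightforward" might look like in practice, here is a small sketch that reads a pronunciation dictionary. It assumes the CMUdict-style layout (one entry per line, a word followed by whitespace-separated phones, alternate pronunciations written as WORD(2), WORD(3), and comment lines starting with `;;;`); the thread itself does not spell the format out, and `parse_dictionary` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches an alternate-pronunciation marker such as "hello(2)".
_ALTERNATE = re.compile(r"^(.*)\((\d+)\)$")


def parse_dictionary(lines):
    """Map each word to a list of pronunciations (each a list of phones)."""
    pronunciations = defaultdict(list)
    for line in lines:
        if line.startswith(";;;"):  # assumed comment convention
            continue
        fields = line.split()
        if len(fields) < 2:  # blank line or a word without phones
            continue
        word, phones = fields[0], fields[1:]
        match = _ALTERNATE.match(word)
        if match:
            word = match.group(1)  # fold "hello(2)" into "hello"
        pronunciations[word].append(phones)
    return dict(pronunciations)
```

For example, the two lines `hello HH AH L OW` and `hello(2) HH EH L OW` would end up as two pronunciations under the single key `hello`.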

    6. For running the decoder, the sample rate setting has to be in sync
      with the model (correct?),

    Yes.

    How to find sample rates for various acoustic models provided with 0.6 and
    0.5 versions?

    Usually it's mentioned in the model description on the website. Most models are
    16 kHz or 8 kHz.
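
Since the audio has to match the model's rate, it can help to check a recording before decoding it. The sketch below uses only Python's standard `wave` module and assumes the input is a plain PCM WAV file; the model's rate (for example 16000 or 8000) still has to come from the model description, and both function names are hypothetical.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def matches_model(path, model_rate_hz):
    """True when the recording's rate equals the rate the model expects."""
    return wav_sample_rate(path) == model_rate_hz
```

If the rates differ, the recording would need to be resampled, or a model trained at the recording's rate used instead.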

     
  • creative64

    creative64 - 2010-05-02

    Thanks nshmyrev. A couple more basic ones:

    1. Does PocketSphinx allow changing the dimension of the MFCC feature vector? And does this too have to be in sync with
      the acoustic model?

    2. Is there a vendor that does or can supply PocketSphinx-compatible acoustic models (or recorded training corpora) for
      various English accents and possibly other languages?

    3. This one is very basic: does the acoustic model have to be tuned for the specific task at hand, or can it be fairly
      generic and still provide fairly accurate results?

    Thanks and regards,

     
  • Nickolay V. Shmyrev

    1. Does PocketSphinx allow changing dimension of MFCC feature vector?

    Yes, there are the -ceplen and -ncep options, as well as -feat for various
    feature types.

    And does this too have to be in sync with the acoustic model?

    Yes
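
A bit of arithmetic shows why the feature settings and the model have to agree. The sketch below assumes a single-stream layout of cepstra plus delta plus delta-delta (the feature type named 1s_c_d_dd) and 13 cepstral coefficients; both are assumptions made for illustration, not something stated in this thread, and other -feat types lay out their vectors differently.

```python
def feature_dimension_1s_c_d_dd(ceplen=13):
    """Length of one feature vector for cepstra + delta + delta-delta.

    The layout is three blocks of `ceplen` values each, so the vector
    length is 3 * ceplen.
    """
    if ceplen <= 0:
        raise ValueError("ceplen must be positive")
    return 3 * ceplen


def compatible(decoder_ceplen, model_dimension):
    """True when the decoder's vectors have the length the model expects."""
    return feature_dimension_1s_c_d_dd(decoder_ceplen) == model_dimension
```

Under these assumptions, changing -ceplen from 13 to 12 shrinks the vector from 39 to 36 values, which a model trained on 39-dimensional vectors could not score.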

    2. Is there vendor that does or can supply pocketsphinx compatible
      acoustic models (or recorded training corpora) for various English accents
      and possibly other languages.

    Most corpora are acquired from LDC
    http://www.ldc.upenn.edu/ or its European counterpart
    http://www.elra.info/ and they are quite expensive.
    There are some non-commercial corpora distributed by various organizations.
    Voxforge provides GPL corpora for a number of languages.

    Models are specific to the particular task, so it's unlikely any company has
    a generic one. Maybe only Google has one. Also, commercial companies tune their
    recognition process, and their models become incompatible with stock
    pocketsphinx or not easy to plug in.

    3. This one is very basic: Does acoustic model have to be tuned for
      specific tasks at hand or they could be fairly generic and still provide
      fairly accurate results?

    Yes, tuning is usually applied both to the model and to the way the recognizer
    is configured. Many components are specific to the particular task. For example,
    in far-distance microphone recognition it's critical to have a reverberation
    removal component, and that makes the model hardly compatible with telephone
    models.

     
  • Nickolay V. Shmyrev

    Is there vendor that does or can supply pocketsphinx compatible acoustic
    models

    I mean you are welcome to ask on our LinkedIn group about that,
    http://www.linkedin.com/groups?gid=2754506
    but be ready to negotiate using the whole recognizer instead of just a model.

     
  • creative64

    creative64 - 2010-05-03

    Q: Does PocketSphinx allow changing dimension of MFCC feature vector?

    Yes, there are -ceplen and -ncep options as well as -feat for various
    feature types

    Q: And does this too have to be in sync with the acoustic model?

    Yes

    1. For running PocketSphinx 0.6 with the models provided with it, do I need to specify the "acoustic vector", "sampling
      rate" and other model-specific parameters through the command line, or does the decoder automatically
      extract them from the model description? I'm currently
      using only -hmm, -lm, -dict and -samprate.

    Q: This one is very basic:
    Does acoustic model have to be tuned for specific tasks at hand or they could
    be fairly generic and still provide fairly accurate results?

    Yes, tuning is usually applied both for model and for the way recognizer is
    configured. Many components are specific for the particular task. For
    example in far-distance microphone recognition it's critical to have
    reverberation removal component and that makes model hardly compatible with
    telephone models

    1. My question was more about the vocabulary aspect of the application. E.g., if I have a 100-word vocabulary recognition
      task for, say, a "command and control" type of application (with an FSG grammar), then from an
      accuracy point of view, should I still go with a model
      like hub4wsj_sc_8k (or something similar trained on a bigger corpus), or do I need
      to create my own model, or is there a way to
      customize a bigger model for a smaller vocabulary task?

    Thanks for your prompt responses.

    Regards.

     
  • Nickolay V. Shmyrev

    1. For running PocketSphinx 0.6 with models provided with it, do I need to
      specify "acoustic vector", "sampling rate" and other model-specific parameters
      through command line or decoder automatically extracts it from the model
      description? I'm currently using only -hmm, -lm, -dict and -samprate

    Others are synchronized automatically since default values are used.
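
For anyone curious which feature settings a given model carries, one place to look is the feat.params file listed earlier in the thread among the model files. The sketch below assumes that file is plain text made of whitespace-separated option/value pairs such as `-nfilt 20`; that layout is an assumption here, and `parse_feat_params` is a hypothetical helper.

```python
def parse_feat_params(text):
    """Parse assumed "-option value" pairs into a dict of strings."""
    tokens = text.split()
    params = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-") and i + 1 < len(tokens):
            # Treat the next token as this option's value, even if the
            # value itself starts with "-" (a negative number, say).
            params[token] = tokens[i + 1]
            i += 2
        else:
            i += 1  # skip anything that does not look like an option
    return params
```

Values are kept as strings on purpose, since the sketch does not know which options are numeric.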

    1. My question was more on the vocabulary aspect of the application, e.g. if I
      have a 100 word vocabulary recognition task for say a "command and control"
      type of application (with an FSG grammar), from an accuracy point of view,
      should I still go with a model like hub4wsj_sc_8k (or something similar
      trained on a bigger corpus) or I need to create my own model or is there a way
      to customize a bigger model for a smaller vocabulary task?

    That's a rhetorical question without numbers; such a decision requires detailed
    analysis of accuracy, performance and available resources. We usually don't
    recommend training a model, just because it's an error-prone process that
    could take several months. The default models are reasonably good.

     
  • creative64

    creative64 - 2010-05-03

    Thanx a Ton.

     
