
Word Models using pocketSphinx

Anuj Kumar
2013-03-08
2013-03-08
  • Anuj Kumar

    Anuj Kumar - 2013-03-08

    Hello,

    I'm training a new acoustic model using the latest sphinxbase-0.8, pocketsphinx-0.8, and sphinxtrain-1.0.8. My recognition task is isolated English words, and I have roughly 325 words in the dictionary. As suggested on the "Building an Acoustic Model" page (http://cmusphinx.sourceforge.net/wiki/tutorialam), I am building a word-dependent phone dictionary for the isolated word recognition task, i.e.:

    Dictionary looks like:
    WORDA WORDA_1 WORDA_2
    WORDB WORDB_1 WORDB_2
    WORDC WORDC_1 WORDC_2
    ... and so on.

    Phoneset looks like:
    WORDA_1
    WORDA_2
    WORDB_1
    WORDB_2
    WORDC_1
    WORDC_2
    ... and so on.

    When I run the training with "sphinxtrain run", it completes the Context-Independent module stage without any error or warning, and then gets stuck in the Context-Dependent module at the "Initialization stage". At this point there are no errors, but it just does not proceed. Would you know why this is happening?

    As an alternative, I tried decoding with pocketsphinx_batch using the output of CI training in the "model_parameters" folder, but that failed with the following error:

    INFO: acmod.c(246): Parsed model-specific feature parameters from model_parameters/wordModel.ci_semi/feat.params
    INFO: feat.c(713): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: mdef.c(517): Reading model definition: model_parameters/wordModel.ci_semi/mdef
    ERROR: "bin_mdef.c", line 91: Number of phones exceeds limit: 699 > 255
    INFO: bin_mdef.c(336): Reading binary model definition: model_parameters/wordModel.ci_semi/mdef
    ERROR: "bin_mdef.c", line 359: File format version 1634887022 for model_parameters/wordModel.ci_semi/mdef is newer than library
    ERROR: "acmod.c", line 93: Failed to read acoustic model definition from model_parameters/wordModel.ci_semi/mdef
    FATAL_ERROR: "batch.c", line 819: PocketSphinx decoder init failed

    Any insights into how to resolve either of the above?

    All my data and configuration files from the etc and wav folder can be accessed at: http://db.tt/qA1yEjcM (~110 MB)

    Thanks very much.

     
    • Pranav Jawale

      Pranav Jawale - 2013-03-08

      Hi,

      Why are you using just two phonemes when a word has > 2 phones?
      e.g. ACCIDENT ACCIDENT_1 ACCIDENT_2

      The tutorial asks you to build a word-dependent phone dictionary. So it could be:
      ACCIDENT AE_ACCIDENT K_ACCIDENT S_ACCIDENT AH_ACCIDENT D_ACCIDENT AH_ACCIDENT N_ACCIDENT T_ACCIDENT
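A minimal sketch of how a dictionary line like the ACCIDENT example could be generated from an ordinary phone-based pronunciation. The function names `word_dependent_entry` and `distinct_phones` are made up for illustration and are not part of sphinxtrain; the base pronunciation is the one implied by the example.

```python
def word_dependent_entry(word, phones):
    """Build one dictionary line whose phones are suffixed with the word."""
    tagged = [f"{phone}_{word}" for phone in phones]
    return " ".join([word] + tagged)


def distinct_phones(entry):
    """Return the sorted set of phone names used in one dictionary line."""
    return sorted(set(entry.split()[1:]))


# Base pronunciation implied by the ACCIDENT example (assumed, not looked up).
base = ["AE", "K", "S", "AH", "D", "AH", "N", "T"]
entry = word_dependent_entry("ACCIDENT", base)
print(entry)
print(len(distinct_phones(entry)), "distinct phones for this word")
```

Because every phone is tied to its word, each new word adds its own phones to the phoneset, so the phoneset grows with the vocabulary rather than staying fixed.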

       
      • Anuj Kumar

        Anuj Kumar - 2013-03-08

        Yes, I should use as many phones as each word needs; however, if they are word-dependent phones, the total number of phones for 325 words exceeds the maximum allowed, i.e. 255.

         
  • Nickolay V. Shmyrev

    Please read the messages the software outputs for you. The message you posted says:

    ERROR: "bin_mdef.c", line 91: Number of phones exceeds limit: 699 > 255

    You need to reduce the number of phones in the phoneset.
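A rough way to check a dictionary against that limit before training, sketched under assumptions: the value 255 is taken from the error message above, `count_phones` is a hypothetical helper, and the 325 placeholder words only mimic the two-phones-per-word layout from the original post. The count covers dictionary phones only, so it need not match the 699 reported in the log.

```python
MAX_PHONES = 255  # limit quoted in the bin_mdef.c error message


def count_phones(dictionary_lines):
    """Count distinct phone names across 'WORD PHONE1 PHONE2 ...' lines."""
    phones = set()
    for line in dictionary_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and words without a pronunciation
        phones.update(fields[1:])
    return len(phones)


# Placeholder dictionary: 325 words, each with two word-dependent phones.
lines = [f"WORD{i} WORD{i}_1 WORD{i}_2" for i in range(325)]
n = count_phones(lines)
status = "exceeds" if n > MAX_PHONES else "is within"
print(f"{n} phones {status} the limit of {MAX_PHONES}")
```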

     
  • Anuj Kumar

    Anuj Kumar - 2013-03-08

    Is there a theoretical limit on the number of phones that can be used? Why is that?

    In this case, how should I cut the number of phones? I have 325 words in the dictionary, and for a word-based model I will have at least 325 phones. That is still more than the limit of 255 phones.

    Does this imply that a word model is not appropriate in this case? The reasons I was building a word model are: (A) I have a small amount of data (~2 hrs); (B) my data comes from users and a usage context completely different from those of most "off-the-shelf" models, i.e. children's speech, non-native speakers, 8 kHz data, noisy background, etc., so adapting an existing acoustic model was ruled out; and (C) I have training recordings for all the words that are in the test set.

    Thanks once again.

     
  • Nickolay V. Shmyrev

    > Is there a theoretical limit on the number of phones that can be used? Why is that?

    There is a practical limit, not a theoretical one.

    > I have 325 words in the dictionary, and for a word-based model, in the minimum case, I will have 325 phones.

    Word-dependent models only make sense for ten words or fewer. For 325 words you should use simple phone-based models.

    > Does this imply that a word model is not appropriate in this case?

    Yes.

    > have a small amount of data (~2 hrs)
    > I have training recordings for all the words that are in the test set.

    You need to have about 50 samples for each word. So your database must be about 30 hours, not 2 hours.
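The arithmetic behind that estimate can be sketched as follows. The figures 325, 50, 30, and 2 come from the thread; the average recording length is not stated anywhere, so the script only derives what the 30-hour figure would imply and, assuming the same length per recording, how many samples per word 2 hours would give.

```python
WORDS = 325            # dictionary size from the original post
SAMPLES_PER_WORD = 50  # suggested samples per word
SUGGESTED_HOURS = 30   # suggested database size
AVAILABLE_HOURS = 2    # data the poster actually has

# Total recordings needed at 50 samples for each of 325 words.
utterances = WORDS * SAMPLES_PER_WORD

# Average seconds per recording implied by the 30-hour figure (derived, not stated).
implied_seconds = SUGGESTED_HOURS * 3600 / utterances

# Samples per word that 2 hours would provide at that same average length.
samples_available = AVAILABLE_HOURS * 3600 / implied_seconds / WORDS

print(f"{utterances} recordings needed")
print(f"{implied_seconds:.2f} s per recording implied by {SUGGESTED_HOURS} h")
print(f"{samples_available:.1f} samples per word from {AVAILABLE_HOURS} h")
```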

     
