Contents of model definition file (mdef)

Help
Balaji
2019-10-13
2020-11-09
  • Balaji

    Balaji - 2019-10-13

    I am going through the mdef.txt file to understand its components and I have the following questions.

    Q1] How are the combinations of phones (base, left and right) in the mdef.txt file created? Is any linguistic knowledge applied?
    > For example, the mdef of cmusphinx-en-in-5.2 has
    > 46 n_base
    > 176918 n_tri
    > 707856 n_state_map
    > 5138 n_tied_state
    >
    > whereas the mdef of cmusphinx-en-us-5.2 has
    > 46 n_base
    > 137053 n_tri
    > 548396 n_state_map
    > 5138 n_tied_state

    My question is, after selecting 46 base phones, how are you determining how many trigrams have to be created?

    Q2] I have the same question about n_tied_state = 5138. These are the senones right? How is this number fixed?
    > Additionally, in the forum, I found one more:
    > 42 n_base
    > 137053 n_tri
    > 548380 n_state_map
    > 5126 n_tied_state
    This has exactly 4 fewer phones, which gives 5126 senones (5138 - 4*3).
    So, is there an arithmetic relationship between the number of CI phones and the number of senones?

    Q3] Is there a mapping between base/lft/rt phones to stateids 1/2/3?

    Thank you.
    Balaji.

     
    • Nickolay V. Shmyrev

      > My question is, after selecting 46 base phones, how are you determining how many trigrams have to be created?

      Triphones, not trigrams. Sphinxtrain lists all possible triphones from the dictionary.

      > I have the same question about n_tied_state = 5138. These are the senones right? How is this number fixed?

      5000 senones from sphinxtrain configuration + 42 * 3 for base phones = 5126
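The header counts quoted in this thread can be checked against that arithmetic. A minimal sketch, assuming 3 emitting states per phone and that `n_state_map` counts one extra non-emitting entry per model (the function names are illustrative and not part of SphinxTrain):

```python
# Assumption: 3 emitting states per phone, plus 1 non-emitting entry
# per model counted in n_state_map. Function names are hypothetical.
STATES_PER_PHONE = 3


def expected_tied_states(n_base, n_tied_cd):
    """Senones requested in the configuration plus 3 per base (CI) phone."""
    return n_tied_cd + n_base * STATES_PER_PHONE


def expected_state_map(n_base, n_tri):
    """Each CI phone and each triphone contributes 3 emitting + 1 non-emitting entry."""
    return (n_base + n_tri) * (STATES_PER_PHONE + 1)


# (n_base, n_tri, n_state_map, n_tied_state) as quoted in this thread
headers = [
    (46, 176918, 707856, 5138),
    (46, 137053, 548396, 5138),
    (42, 137053, 548380, 5126),
]

for n_base, n_tri, n_state_map, n_tied_state in headers:
    ok_tied = expected_tied_states(n_base, 5000) == n_tied_state
    ok_map = expected_state_map(n_base, n_tri) == n_state_map
    print(n_base, n_tri, ok_tied, ok_map)
```

All three headers agree with both formulas under these assumptions, so the difference of 12 senones between the 46-phone and 42-phone models is just 4 base phones times 3 states.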

      > Q3] Is there a mapping between base/lft/rt phones to stateids 1/2/3?

      I don't get this question.

       

      Last edit: Nickolay V. Shmyrev 2019-10-13
      • Balaji

        Balaji - 2019-10-14

        > Sphinxtrain lists all possible triphones from the dictionary.

        As I understand it, each word is taken from the dictionary and triphones are created from its transcription. For example, for the entry
        abductor AE B D AH K T ER
        we would create 7 triphones like this:
        ~~~
        <sil> AE B
        AE B D
        B D AH
        D AH K
        AH K T
        K T ER
        T ER <sil>
        ~~~

        Is my understanding correct?
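The enumeration described in this post can be sketched as a sliding window over the padded pronunciation. This only illustrates the reading given here; it does not attempt to reproduce what SphinxTrain actually does, for example any handling of contexts across word boundaries or the word-position column seen in the mdef. The function name is hypothetical.

```python
def triphones_from_pronunciation(phones, boundary="<sil>"):
    """Return (left, base, right) tuples for one dictionary pronunciation.

    Hypothetical helper: pads the phone sequence with a boundary symbol
    on both sides and slides a window of three phones over it.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


# Dictionary entry from the post: abductor AE B D AH K T ER
entry = "abductor AE B D AH K T ER".split()
word, phones = entry[0], entry[1:]

for left, base, right in triphones_from_pronunciation(phones):
    print(left, base, right)
```

Collecting these tuples into a set over every dictionary entry would give the list of distinct triphones under this simplified reading.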

         
      • Balaji

        Balaji - 2019-10-19

        I tried searching the sphinxtrain configuration for the 5000 senones. I got the following from sphinxtrain\etc\sphinx_train.cfg :

        # Number of tied states (senones) to create in decision-tree clustering
        $CFG_N_TIED_STATES = 200;
        

        Can you please point out where the 5000 senones are specified?
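For reference, here is a small sketch that reads the value from a configuration snippet like the one above and applies the arithmetic from the earlier reply. It assumes that `$CFG_N_TIED_STATES` is the setting meant by "5000 senones from sphinxtrain configuration", in which case 200 would simply be a different value from the one used for the released models. The helper name is hypothetical.

```python
import re


def read_n_tied_states(cfg_text):
    """Return the integer assigned to $CFG_N_TIED_STATES, or None if absent.

    Hypothetical helper; it only looks for a plain `$NAME = <int>;` line.
    """
    match = re.search(
        r"^\s*\$CFG_N_TIED_STATES\s*=\s*(\d+)\s*;", cfg_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


cfg_text = """\
# Number of tied states (senones) to create in decision-tree clustering
$CFG_N_TIED_STATES = 200;
"""

n_tied_cd = read_n_tied_states(cfg_text)
n_base = 46  # base phones, as in the mdef headers quoted earlier

# Assumption from the reply above: 3 extra senones per base phone.
print(n_tied_cd, n_tied_cd + 3 * n_base)
```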

         
  • Balaji

    Balaji - 2020-11-07

    Hello,

    I am trying to understand the contents of mdef.txt and need some help.
    Sample lines in mdef.txt look like this:

    #base lft  rt p attrib tmat      ... state id's ...
       AA  AA  AA s    n/a    6    152    205    221 N
       AA  AA  AE s    n/a    6    152    192    221 N
       AA  AA  AH b    n/a    6    156    193    221 N
    
    1. Could somebody please explain in words what each line means, particularly the state IDs?
    2. During training (the bw program), if a training word contains the phonemes AA, AE, AH, will bw update the parameter files (means, variances, mixture weights) for these state IDs (152, 156, 192, 193, 205 and 221)?
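The sample lines can be split into named fields to make the columns easier to discuss. This is a minimal sketch that assumes the column meanings suggested by the header comment: base phone, left context, right context, word position (`p`), an attribute, a transition-matrix ID, then one state ID per emitting state, with the trailing `N` taken to mark a non-emitting final state. The type and function names are illustrative.

```python
from typing import NamedTuple, Tuple


class MdefEntry(NamedTuple):
    base: str
    left: str
    right: str
    position: str
    attrib: str
    tmat: int
    state_ids: Tuple[int, ...]


def parse_mdef_line(line):
    """Split one mdef model line into fields (hypothetical helper)."""
    fields = line.split()
    base, left, right, position, attrib, tmat = fields[:6]
    # Remaining fields are state IDs; "N" is assumed to be the
    # non-emitting final state and is skipped here.
    state_ids = tuple(int(f) for f in fields[6:] if f != "N")
    return MdefEntry(base, left, right, position, attrib, int(tmat), state_ids)


sample = """\
   AA  AA  AA s    n/a    6    152    205    221 N
   AA  AA  AE s    n/a    6    152    192    221 N
   AA  AA  AH b    n/a    6    156    193    221 N
"""

entries = [parse_mdef_line(l) for l in sample.splitlines() if l.strip()]
for e in entries:
    print(e)

# Distinct state IDs referenced by the three sample lines
unique_ids = sorted({s for e in entries for s in e.state_ids})
print(unique_ids)
```

The last line prints the same six IDs listed in question 2, which shows that the three triphones share some states (152 and 221) rather than each owning three separate ones.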

    Thank you.

     

    Last edit: Balaji 2020-11-07
  • Balaji

    Balaji - 2020-11-09

    I haven't got any response, so I am making the question a little more specific:
    1. How is this mdef file generated? Is there any manual editing, or does a program generate it after reading the contents of the dict file?
    2. In the 3 model definition lines above, all three have AA as both the base and the left phone. Shouldn't the first state ID then be 152 on all three? Why is it 156 on the third line?

    Sorry if these are very basic questions. I understand that these state IDs are the keys for accessing acoustic model files such as means and variances, so I need to understand how these IDs are assigned to triphones.
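One thing visible in the three sample lines is that the row with 156 also differs in the `p` column (`b` rather than `s`), so the base and left phones are not the only fields that vary. The sketch below groups the first state ID by base phone, left context and `p`. It only reports what is in the sample and does not show how SphinxTrain assigns the IDs.

```python
from collections import defaultdict

# The three sample lines quoted earlier in this thread
sample_lines = [
    "AA  AA  AA s    n/a    6    152    205    221 N",
    "AA  AA  AE s    n/a    6    152    192    221 N",
    "AA  AA  AH b    n/a    6    156    193    221 N",
]

first_state_ids = defaultdict(set)
for line in sample_lines:
    fields = line.split()
    base, left, right, position = fields[:4]
    # fields[6] is the first state ID after the tmat column
    first_state_ids[(base, left, position)].add(int(fields[6]))

for key, ids in first_state_ids.items():
    print(key, sorted(ids))
```

With this grouping, the two rows with `p` equal to `s` share 152 as their first state ID and the row with `p` equal to `b` has 156.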

    Please help.

     
