
Build MLLR regression classes using Sphinx

2010-12-28
2012-09-22
  • Michael Betser

    Michael Betser - 2010-12-28

    When working with a large model, it is recommended to use more than one
    regression class for MLLR adaptation, depending on how much adaptation data
    is available. This requires a clustering technique that groups Gaussians in
    a "meaningful" way. Since Sphinx already implements a clustering algorithm,
    i.e. the one used for state tying, my question is: can this algorithm also
    be used for MLLR clustering?

    Here is my idea:

    1) Say we have a tied model, but not the tree used to build it. One first
    needs to recreate a tree using make_quests and bldtree. It seems that
    bldtree can accept tied models with a few minor modifications (1).

    2) Run bw on the adaptation data. All that is then needed is a function that
    cuts the previous tree according to the number of MLLR classes required and
    the amount of available data (summarized in the mixw_counts from the bw step).

    What do you think? Will this clustering give meaningful results in the MLLR
    adaptation sense? Personally, I don't see why it would be worse than any
    other clustering algorithm.

    If you agree, then with only one slightly complex function, one could add
    MLLR class building to Sphinx!

    Best wishes for the new year,

    Michael Betser

    (1) Here are the modifications I made to make bldtree work with tied models,
    in main.c, line 251 onward:

        smax = mdef->defn[p_s].state[0];
        for (p = p_s, i = mdef->defn[p_s].state[0]-1; p <= p_e; p++) {
            for (j = 0; j < mdef->defn[p].n_state; j++) {

                if (mdef->defn[p].state[j] != TYING_NON_EMITTING) {
                    /* MB: count the max senone index, since in tied models,
                     *     states are not consecutive numbers */
                    if (mdef->defn[p].state[j] > smax)
                        smax = mdef->defn[p].state[j];
                    /* MB: removed for tied state model definition */
                    /*if (mdef->defn[p].state[j] != i+1) {
                        E_ERROR("States in triphones for %s are not consecutive\n", phn);

                        return S3_ERROR;
                    }*/

                    i = mdef->defn[p].state[j];
                }
            }
        }

        /*ADDITION - CHECK FOR NUMBER OF OCCURANCES OF STATE */
        cntflag = (char *)ckd_calloc(p_e-p_s+1,sizeof(char));
        cntthreshold = *(float32 *)cmd_ln_access("-cntthresh");
        /* END ADDITION - CHECK FOR NUMBER OF OCCURANCES OF STATE */

        /* Find first and last mixing weight used for p_s through p_e */
        mixw_s = mdef->defn[p_s].state[0];
        mixw_e = smax; /* MB: in order to make it work with tied model definition */
        //mixw_e = mdef->defn[p_e].state[mdef->defn[p_e].n_state-2];
    
     
  • Nickolay V. Shmyrev

    I think it's a great idea. It would be very cool to turn it into a complete
    working patch.

     
