Using arbitrary features instead of MFC-based

Help
bat10
2010-09-14
2012-09-22
  • bat10

    bat10 - 2010-09-14

    Hi

    I almost know the answer to my question (namely no), but I nevertheless wanted
    to make sure I'm not missing something.

    Is there a way to get Sphinx 3 and SphinxTrain to train and decode using
    arbitrary custom features instead of features derived from the MFC
    coefficients? I would like to "abuse" CMU Sphinx to train and decode based on
    a feature set which is extracted from the sound files by my own routine. What
    I have in mind is something like HTK's ability to cope with any user-defined
    features, as long as they come in the HTK feature file format.

    Thanks for your advice!

     
  • Nickolay V. Shmyrev

    That is no problem. You need to convert your custom features to the
    CMUSphinx feature file format.
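
A minimal sketch of such a conversion, assuming the layout described later in this thread: the file stores only the total number of values, followed by the values themselves. The sketch assumes a 4-byte integer header and 32-bit floats, and the little-endian byte order is an assumption as well, so check what your Sphinx build expects. The function name `write_sphinx_features` is hypothetical and not part of any Sphinx tool.

```python
import struct


def write_sphinx_features(path, frames, byteorder="<"):
    """Write frames as: int32 total value count, then float32 values.

    frames    -- list of equal-length lists of numbers (one list per frame)
    byteorder -- struct byte-order prefix; "<" (little-endian) is an assumption
    Returns the total number of values written.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")

    # Flatten frame by frame: the file holds no frame boundaries.
    flat = [float(v) for frame in frames for v in frame]

    with open(path, "wb") as fh:
        fh.write(struct.pack(byteorder + "i", len(flat)))
        fh.write(struct.pack(byteorder + "%df" % len(flat), *flat))
    return len(flat)
```

Called as `write_sphinx_features("utt.mfc", frames)`, where `frames` holds one custom feature vector per time step, this writes a header with the total count and nothing about the vector length, which is why the tools need to be told the vector length separately.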

     
  • bat10

    bat10 - 2010-09-14

    Great! Wow, I'm glad I asked... :-)

    Well, what I don't understand is why I need to tell a bunch of routines
    (init_gau, norm, bw, sphinx3_decode) things like the number of cepstral coeffs
    (-ceplen) again, after I already passed them to wave2feat to extract the base
    features (when using MFCCs as features)? Is this just for factorizing the full number
    of features into number of vectors times number of features per vector? What
    will I have to set for -ceplen if I use any number of custom features, and
    what influence will -agc and -cmn (for sphinx3_decode for instance) have on
    them? Or, in general, what parameters need to be set to make the sphinx
    modules understand that they should use my custom features and not
    modify/transform them further, as they perhaps would do when the default MFC
    coeffs were used? Can I just set -feat 1s_c (instead of 1s_c_d_dd for example)
    and that's it?

    As you can see, I am confused, but full of hope. :-)

     
  • Nickolay V. Shmyrev

    Well, what I don't understand is why I need to tell a bunch of routines
    (init_gau, norm, bw, sphinx3_decode) things like the number of cepstral coeffs
    (-ceplen) again, after I already passed them to wave2feat to extract the base
    features (when using MFCCs as features)

    You don't need that; you can just specify the new feature vector length in
    the config file.

    Is this just for factorizing the full number of features into number of
    vectors times number of features per vector?

    The Sphinx feature file format doesn't store the number of vectors, only the
    total length of the data.

    What will I have to set for -ceplen if I use any number of custom features

    ceplen should be the length of your feature vector.

    influence will -agc and -cmn (for sphinx3_decode for instance) have on them?

    AGC is disabled by default. If needed, you can disable CMN in the config file.

    Can I just set -feat 1s_c (instead of 1s_c_d_dd for example) and that's it?

    $CFG_VECTOR_LENGTH = <vector dimension>;
    $CFG_CMN = 'none';
    $CFG_FEATURE = "1s_c";
    

    Should work

     
  • bat10

    bat10 - 2010-09-15

    You don't need that; you can just specify the new feature vector length in
    the config file.

    I don't use the config files, nor the provided Perl wrapper scripts (for
    various reasons). So my case is a bit trickier here.

    Sphinx format for feature files doesn't hold the number of vectors, only
    total length of the data

    I know that the Sphinx feature file format only stores the number of features
    in total. That's why I'm asking. How do the training tools know how to
    calculate delta and delta-delta coeffs, if e.g. 1s_c_d_dd is set? They will
    have to know how the total number of features factorizes into number of
    vectors and number of features per vector, because the deltas are calculated
    for each vector separately, if I get that right. Imagine there are 60 features
    in a feature file. Are these 10 vectors with 6 cepstral coeffs each, or 5
    vectors with 12 coeffs, or ...? That's not clear. This is the only thing that
    comes to my mind as reason why sphinx3_decode (among others) needs the -ceplen
    parameter. But when using custom features, there is no such thing as cepstral
    coefficients. Do you get my point? When dropping the -ceplen option, the
    (wrong) default value will be used and e.g. sphinx3_decode will complain:

    FATAL_ERROR: "kbcore.c", line 633: Feature streamlen(39) != mgau streamlen(60)

    ceplen should be the length of your feature vector

    But that doesn't seem to be true: I'm using feature files with 520 features,
    for instance, and ceplen (based on the online documentation) specifies the
    length of one base cepstral vector, which I MUST set to 20 for sphinx3_decode
    to work in this case. That yields 26 base vectors. But what if I wanted to use
    521 features?
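
The arithmetic in this post can be sketched as follows. This only illustrates how a total value count factorizes into frames for a given vector length; it is not how the Sphinx tools implement it, and the function name `frame_count` is hypothetical.

```python
def frame_count(total_values, ceplen):
    """Return how many frames a file with total_values values contains
    when each frame holds ceplen values.

    Raises ValueError if the total does not divide evenly, since the
    file would then end in the middle of a frame.
    """
    if ceplen <= 0:
        raise ValueError("ceplen must be positive")
    if total_values % ceplen != 0:
        raise ValueError(
            "%d values cannot be split into frames of length %d"
            % (total_values, ceplen)
        )
    return total_values // ceplen


# The examples from the post: the same total is ambiguous without ceplen.
print(frame_count(60, 6))    # 10 frames of 6 values
print(frame_count(60, 12))   # 5 frames of 12 values
print(frame_count(520, 20))  # 26 frames of 20 values
```

With 521 values the picture changes: 521 is prime, so the only even splits are one frame of 521 values or 521 frames of one value, and `frame_count(521, 20)` raises an error.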

     
  • bat10

    bat10 - 2010-09-15

    Hmmmm, wait a second, I think I got it:

    HMMs need a time-series of feature vectors (26 in my example above). The
    minimum number of them is 9, according to bw/main.c:

    if (n_frame < 9) {
    E_WARN("utt %s too short\n", corpus_utt());
    ...
    }

    So, this means that when using custom features, they must come as a time-
    series of length >= 9. Is that correct? So, I can't train/decode with 521
    arbitrary features, but I must use at least 9 vectors with any number of
    coefficients?

     
  • Nickolay V. Shmyrev

    So, I can't train/decode with 521 arbitrary features, but I must use at
    least 9 vectors with any number of coefficients?

    exactly

     
  • bat10

    bat10 - 2010-09-15

    OK, I see. Then all of the above makes sense.

    One last thing, though: What's the motive behind prohibiting frame counts
    (numbers of vectors) smaller than 9? Why 9? Could I just remove that
    restrictive check (n_frame < 9) in the code and happily work with fewer, or
    would that compromise the power of the underlying HMMs?

     
  • Nickolay V. Shmyrev

    It's just a heuristic, you can remove it.

     
  • bat10

    bat10 - 2010-09-15

    Alright, thank you for answering.

     
