
Strange behavior with number of coefficients

Help
2010-10-14
2012-09-22
  • daniella dias

    daniella dias - 2010-10-14

    Hi!

    I am having the following problem:

    If I increase the number of MFCCs, the error rate increases too. For example,
    if I use only 3 MFCCs, the sentence error rate is 2%. If I use 14 MFCCs, the
    error rate increases to 20%!

    Can anybody help me fix this problem? Thanks in advance!

     
  • Nickolay V. Shmyrev

    If I increase the number of MFCCs, the error rate increases too. For example,
    if I use only 3 MFCCs, the sentence error rate is 2%. If I use 14 MFCCs, the
    error rate increases to 20%!

    I don't think it's a problem. It's expected behavior.

     
  • daniella dias

    daniella dias - 2010-10-14

    Why is it expected behavior? As far as I know, the more coefficients, the
    better the results, right?

     
  • Nickolay V. Shmyrev

    Why is it expected behavior?

    Why wouldn't it be? Given the details you provided, I would expect such a
    result.

    As far as I know, the more coefficients, the better the results, right?

    No. It depends on the amount of data, for example, and on many other factors
    you didn't take into account.

     
  • daniella dias

    daniella dias - 2010-10-15

    But I used the same amount of data and the same configuration.

    The only configuration changed was the number of coefficients.

    s3decode.pl : -ceplen => 3

    sphinx_train.cfg : $CFG_VECTOR_LENGTH = 3

    make_feats.pl : -ncep 3

    So, which other factors should I take into account?
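Conceptually, a setting like -ncep controls how many coefficients of the cepstral transform are kept for each frame, so a 3-coefficient vector is just a truncation of the 14-coefficient one and retains only the coarsest spectral shape. A minimal sketch, assuming an orthonormal DCT-II applied to log mel filterbank energies; the Sphinx front end may differ in scaling and other details, and the 40-filter input below is made up for illustration.

```python
import numpy as np


def dct_ii(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II of a 1-D array, written out explicitly for clarity."""
    n_in = len(x)
    k = np.arange(n_in)[:, None]  # output (cepstral) index
    n = np.arange(n_in)[None, :]  # input (filterbank channel) index
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    scale = np.full(n_in, np.sqrt(2.0 / n_in))
    scale[0] = np.sqrt(1.0 / n_in)  # orthonormal scaling for the k = 0 term
    return scale * (basis @ x)


def cepstra_from_log_mel(log_mel: np.ndarray, ncep: int) -> np.ndarray:
    """Keep only the first `ncep` cepstral coefficients of one frame."""
    return dct_ii(log_mel)[:ncep]


# Hypothetical log energies from 40 mel filters for a single frame.
log_mel = np.log(np.linspace(1.0, 5.0, 40))
c3 = cepstra_from_log_mel(log_mel, 3)
c14 = cepstra_from_log_mel(log_mel, 14)
```

The first three entries of `c14` equal `c3`; the extra coefficients add finer spectral detail rather than replacing anything.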

     
  • Nickolay V. Shmyrev

    But I used the same amount of data and the same configurations.

    That's not relevant here. If you use more coefficients, you need more data to
    train and more data to test. The test data needs to be independent of the
    training data.
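A rough parameter count shows why more coefficients call for more training data. This sketch assumes diagonal-covariance Gaussians and a feature vector made of static, delta, and delta-delta cepstra (3 × ncep dimensions); the thread does not state the feature type, and the senone and Gaussian counts are hypothetical.

```python
def feature_dim(ncep: int, streams: int = 3) -> int:
    """Assumed layout: static + delta + delta-delta, each of length ncep."""
    return ncep * streams


def params_per_gaussian(dim: int) -> int:
    """Diagonal covariance: dim means + dim variances + 1 mixture weight."""
    return 2 * dim + 1


def total_params(ncep: int, n_senones: int, n_gauss: int) -> int:
    """Total Gaussian parameters for a hypothetical continuous model."""
    return n_senones * n_gauss * params_per_gaussian(feature_dim(ncep))


# Hypothetical model size: 1000 senones, 8 Gaussians each.
small = total_params(3, n_senones=1000, n_gauss=8)
large = total_params(14, n_senones=1000, n_gauss=8)
```

Under these assumptions the 14-coefficient model has roughly 4.5 times as many parameters, each estimated from the same frames, so an unchanged training set supports it less well.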

     
  • daniella dias

    daniella dias - 2010-10-19

    OK, thanks for your explanation. But I have two questions:

    1 - If fewer coefficients are better, why does most research use around 13
    coefficients?

    2 - Could you explain why recognition with 3 coefficients is so slow? With 14
    coefficients it takes around 4 min, but with 3 it takes around 20 min.

    Once again, thank you very much!

     
  • Nickolay V. Shmyrev

    1 - If fewer coefficients are better, why does most research use around 13
    coefficients?

    It's not better to use 3 coefficients. You got better results only because
    you didn't run sufficient tests: your test setup is not generic enough, or it
    is too biased.

    The choice of the number of MFCCs to include in an ASR system is largely
    empirical. Historically, people tried increasing the number of coefficients
    until a law of diminishing returns kicked in. In practice, the optimal number
    of coefficients depends on the quantity of training data, the details of the
    training algorithm (in particular how well the PDFs can be modelled as the
    dimensionality of the feature space increases), the number of Gaussian
    mixtures in the HMMs, the speaker and background noise characteristics, and
    sometimes the available computing resources.
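The empirical procedure described above amounts to a sweep with a stopping rule for diminishing returns. A minimal sketch, assuming a hypothetical `dev_error` callback that trains a model with a given number of coefficients and scores it on a held-out development set. The chosen value should then be evaluated once on a separate, independent test set. The error figures in the usage example are invented.

```python
from typing import Callable, Dict, Iterable


def choose_ncep(
    candidates: Iterable[int],
    dev_error: Callable[[int], float],
    tolerance: float = 0.005,
) -> int:
    """Pick the smallest ncep whose dev-set error is within `tolerance` of the best.

    `dev_error(ncep)` is a hypothetical callback: train with `ncep` coefficients,
    return the error rate on a held-out development set.
    """
    errors: Dict[int, float] = {n: dev_error(n) for n in candidates}
    best = min(errors.values())
    return min(n for n, e in errors.items() if e <= best + tolerance)


# Invented dev-set error rates showing diminishing returns.
fake_dev = {8: 0.12, 10: 0.09, 12: 0.075, 13: 0.072, 16: 0.071}
chosen = choose_ncep(fake_dev.keys(), fake_dev.__getitem__)
```

With these made-up numbers, 12 coefficients are within half a percentage point of the best, so the sweep stops there instead of paying for 16.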

    To understand why any specific number of cepstral coefficients is used, you
    could do worse than look at very early (pre-HMM) papers. When using DTW with
    Euclidean or even Mahalanobis distances, it quickly became apparent that the
    very high cepstral coefficients were not helpful for recognition, and to a
    lesser extent, neither were the very low ones. The most common solution was to
    "lifter" the MFCCs - i.e. apply a weighting function to them to emphasise the
    mid-range coefficients. These liftering functions were "optimised" by a number
    of researchers, but they almost always ended up being close to zero by the
    time you got to the 12th coefficient.
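The liftering idea can be illustrated with a weighted Euclidean frame distance of the kind used in DTW. The half-sine weighting here is chosen purely for illustration and is not any specific published lifter; L = 12 mirrors the remark that weights were near zero by the 12th coefficient. Both helper names are hypothetical.

```python
import numpy as np


def half_sine_lifter(num_coeffs: int, L: int) -> np.ndarray:
    """Illustrative lifter: w_n = sin(pi * n / L) for n <= L, and 0 beyond L."""
    n = np.arange(num_coeffs)
    return np.where(n <= L, np.sin(np.pi * n / L), 0.0)


def liftered_distance(a: np.ndarray, b: np.ndarray, w: np.ndarray) -> float:
    """Euclidean distance between two cepstral frames after liftering."""
    return float(np.linalg.norm(w * (a - b)))


w = half_sine_lifter(14, L=12)
frame_a = np.zeros(14)
frame_b = np.zeros(14)
frame_b[0], frame_b[12] = 5.0, 3.0  # differences only in c0 and c12
frame_c = np.zeros(14)
frame_c[6] = 2.0                    # difference only in a mid-range coefficient
```

Differences in c0 and c12 contribute essentially nothing to the distance, while a difference in c6 counts at full weight, which is the emphasis on mid-range coefficients described above.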

    2 - Could you explain why recognition with 3 coefficients is so slow? With 14
    coefficients it takes around 4 min, but with 3 it takes around 20 min. Once
    again, thank you very much!

    Decoding with 14 coefficients is faster because the 3-coefficient model
    doesn't discriminate between sounds well enough. The recognizer therefore has
    to explore far more possible decoding results, because pruning of bad
    hypotheses doesn't work. With 14 coefficients, wrong paths are quickly pruned
    and only the plausible paths survive.
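The pruning effect can be seen with generic beam pruning: keep only the hypotheses whose log score is within a fixed beam of the best one. This is a sketch of the general mechanism, not Sphinx's decoder code, and the scores are invented. A discriminative model spreads scores apart so that most hypotheses fall outside the beam; a weak model gives flat scores so that almost everything survives and must be expanded in the next frame.

```python
import numpy as np


def beam_prune(log_scores: np.ndarray, beam: float) -> np.ndarray:
    """Return indices of hypotheses whose log score is within `beam` of the best."""
    best = np.max(log_scores)
    return np.flatnonzero(log_scores >= best - beam)


# Invented per-hypothesis log scores for one frame.
sharp = np.array([-10.0, -40.0, -55.0, -60.0, -80.0])  # well-separated scores
flat = np.array([-10.0, -11.0, -12.0, -12.5, -13.0])   # poorly separated scores

survivors_sharp = beam_prune(sharp, beam=5.0)
survivors_flat = beam_prune(flat, beam=5.0)
```

With the same beam, one hypothesis survives in the sharp case and all five survive in the flat case, so decoding time grows with poor discrimination.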

     
