Using own base with sphinxtrain

Help
diogovn
2011-01-31
2012-09-22
  • diogovn

    diogovn - 2011-01-31

    Hello everybody,

    I'm new to Sphinx, but I'm trying to develop a small application with it. I
    would like it to recognize some commands in my language, so I am using a
    database that I recorded to train the acoustic models with sphinxtrain. But I
    do have some questions, can I bother you guys?

    I've read that even if the vocabulary is small, it's not advisable to train
    the models for entire words, although in those cases we could train like the
    digits example that comes with Sphinx. I've got less than 100 words; should I
    do that, or train with phonemes?

    Actually, I've already trained with phonemes and the training completed,
    but I got some warnings like:
    ERROR: "c:\tutorial\sphinxtrain\src\libs\libmodinv\gauden.c", line 1700: var
    (mgau= 131, feat= 0, density=3, component=38) < 0
    I saw in another topic here that this error is due to insufficient training
    data. I think not all phonemes got this error, and I was still able to modify
    the HelloWorld application to use my models. Many words went pretty well, but
    some words were not recognized at all.

    I recorded each word 20 times, 15 for training and 5 for testing. I would
    like to ask if that's an acceptable amount of data. Should I record more
    samples of all the words, or only of the ones I had problems with?

    Thanks in advance, and sorry for the long post.

     
  • Nickolay V. Shmyrev

    I've got less than 100 words; should I do that, or train with phonemes?

    Your "less than 100" doesn't give enough information to answer. As the
    tutorial says, everything below 20 words is a small vocabulary; everything
    above 30 is not.

    I would like to ask if that's an acceptable amount of data.

    The error is also caused by an overestimated number of tied states (senones).
    See the tutorial for details.

    I recorded each word 20 times, 15 for training and 5 for testing.

    It's enough.

     
  • diogovn

    diogovn - 2011-02-05

    Hey Nickolay, thanks for the answer!
    I improved my list of phonemes and transcribed the words again, and that
    seemed to improve the performance of the system. When I ran the test I got
    about 5% word error, and I could see it was just some 5 words that couldn't
    be recognized (like 4-5 errors in the 5 test recordings of each). I guess
    some phonemes weren't completely trained or something. By the way, I've got
    80 different words to train.
    I would like to bring up another question this time: sphinxtrain extracted 13
    MFCCs. Is there any way sphinxtrain can also obtain the MFCC derivatives
    (delta and delta-delta)?
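
    As a rough sanity check on the numbers above, the arithmetic can be sketched
    in Python. This assumes each of the 80 words has exactly 5 single-word test
    recordings, so every misrecognition counts as one word error;
    `word_error_rate` is a hypothetical helper, not part of SphinxTrain.

```python
def word_error_rate(num_errors, num_test_words):
    """Fraction of test words that were misrecognized."""
    if num_test_words <= 0:
        raise ValueError("need at least one test word")
    return num_errors / num_test_words


# Assumed setup taken from the post: 80 words, 5 test recordings each.
total_test = 80 * 5

# About 5 problem words, each failing in 4 to 5 of its 5 test recordings.
low = word_error_rate(5 * 4, total_test)
high = word_error_rate(5 * 5, total_test)
print(f"{low:.2%} to {high:.2%}")  # prints "5.00% to 6.25%"
```

    So a handful of consistently failing words is enough to account for the
    whole reported error rate.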

     
  • diogovn

    diogovn - 2011-02-05

    Oh, I would also like to ask: even with the 5% error on the test, when I run
    the application, Sphinx says it "couldn't hear me", so I speak louder until
    it recognizes. Does that message mean the microphone level was too low, or
    that it wasn't actually able to recognize the word? I ask because my mic was
    at max volume =/

     
  • Nickolay V. Shmyrev

    I guess some phonemes weren't completely trained or something. By the way,
    I've got 80 different words to train.

    That's a lot. You just need more audio to train. There is no point in
    experimenting with the phoneset.

    Does that message mean the microphone level was too low, or that it wasn't
    actually able to recognize the word?

    You need to compare your training audio with the audio you are using for the
    test. Maybe the audio in your training database was too loud, so the model is
    trained to recognize only loud speech. Volume doesn't actually matter as long
    as the recording isn't clipped.
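
    One way to compare the two sets of recordings is to look at peak level, RMS
    level and the share of clipped samples. The following is a minimal sketch,
    assuming 16-bit mono PCM WAV files; `level_stats` and `read_wav_samples` are
    hypothetical helper names, and treating any sample at full scale as clipped
    is a simplification.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32767  # magnitude at or above which a 16-bit sample counts as clipped


def level_stats(samples):
    """Return (peak, rms, clipped_fraction) for a sequence of 16-bit samples."""
    if not samples:
        return 0, 0.0, 0.0
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    clipped = sum(1 for s in samples if abs(s) >= FULL_SCALE)
    return peak, rms, clipped / len(samples)


def read_wav_samples(path):
    """Read a 16-bit mono WAV file into a list of ints (assumed format)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        data = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV sample data is little-endian
    return data.tolist()
```

    Running `level_stats(read_wav_samples(path))` over a few training and test
    files shows whether the levels differ much and whether either set is clipped.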

     
  • diogovn

    diogovn - 2011-02-05

    Thanks for the answer! I'm working on increasing the training set. I was
    thinking of getting 10 more recordings and using 5 for training and 5 for
    testing, or would it be better to use all 10 for training?
    Sorry to ask again, but is there any way to configure sphinxtrain to also
    obtain the MFCC derivatives?

     
  • Nickolay V. Shmyrev

    Thanks for the answer! I'm working on increasing the training set. I was
    thinking of getting 10 more recordings and using 5 for training and 5 for
    testing, or would it be better to use all 10 for training?

    Better to use all of them for testing. The training set can be smaller.

    Sorry to ask again, but is there any way to configure sphinxtrain to also
    obtain the MFCC derivatives?

    What do you mean by "to obtain"? To print on the screen? To save in a file?
    Something else? Sphinxtrain computes the derivatives on the fly, for example;
    they aren't stored in the feature file.

     
  • diogovn

    diogovn - 2011-02-06

    I mean, when we use make_feats it produces a sequence of 13-dimensional
    arrays with the MFCCs, and that's what we use in training, right? I would
    like to ask how to train using 13 MFCCs, 13 MFCC derivatives and 13 MFCC
    second-order derivatives, for example.

     
  • Nickolay V. Shmyrev

    and that's what we use in training, right?

    No. According to the feature setting "1s_c_d_dd" in sphinx_decode.cfg, it
    trains with derivatives; the "d" and "dd" mean that. Derivatives aren't
    stored in the feature files; they are computed on the fly.
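
    To illustrate what "computed on the fly" means, here is a minimal Python
    sketch that expands 13-dimensional cepstral frames into 39-dimensional
    vectors by appending first and second differences. The 2-frame window and
    the edge handling (reusing the first and last frame) are assumptions chosen
    for illustration and may not match what SphinxTrain does internally;
    `add_derivatives` is a hypothetical name.

```python
def add_derivatives(frames, window=2):
    """Append delta and delta-delta coefficients to each frame.

    frames: list of equal-length lists of floats (e.g. 13 MFCCs per frame).
    Returns frames three times as long, laid out as [c, d, dd].
    """
    n = len(frames)

    def at(seq, t):
        # Clamp the index so frames near the edges reuse the first/last frame.
        return seq[min(max(t, 0), n - 1)]

    def diff(seq):
        # Symmetric difference: `window` frames ahead minus `window` frames behind.
        return [
            [a - b for a, b in zip(at(seq, t + window), at(seq, t - window))]
            for t in range(n)
        ]

    deltas = diff(frames)
    delta_deltas = diff(deltas)
    return [c + d + dd for c, d, dd in zip(frames, deltas, delta_deltas)]
```

    The stored feature file keeps only the 13 cepstral values per frame; the
    wider vectors exist only while training or decoding runs.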

     
  • diogovn

    diogovn - 2011-02-06

    Ohh, now I understand. So, for example, if I didn't want to use the
    derivatives in training I should change that to:
    $CFG_FEATURE = "1s_c";
    And if I wanted to use 11 MFCCs instead, I should change:
    $CFG_VECTOR_LENGTH = 11;
    Is that right?
    By the way, I guess the "c" in "1s_c_d_dd" means cepstral, but what does the
    "1s" mean?

     
  • Nickolay V. Shmyrev

    So, for example, if I didn't want to use the derivatives in training I
    should change that to:
    $CFG_FEATURE = "1s_c";
    And if I wanted to use 11 MFCCs instead, I should change:
    $CFG_VECTOR_LENGTH = 11;
    Is that right?

    Yes.

    By the way, I guess the "c" in "1s_c_d_dd" means cepstral, but what does
    the "1s" mean?

    1s means one stream. The distribution can be modelled with a number of
    streams. That affects quantization: either you quantize separately or
    together. If the variable ranges are different, it's better to have multiple
    streams. In semicontinuous models, where quantization is used, 3-4 streams
    are usually employed. In continuous models with no quantization, 1 stream is
    enough.
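
    As a rough picture of the difference, the sketch below splits one
    39-dimensional feature vector into three streams (cepstra, deltas,
    delta-deltas), each of which would get its own codebook in a multi-stream
    setup. The 13/13/13 layout and the name `split_streams` are assumptions for
    illustration, not the exact stream layout Sphinx uses.

```python
def split_streams(vector, sizes=(13, 13, 13)):
    """Split a flat feature vector into consecutive streams of the given sizes."""
    if sum(sizes) != len(vector):
        raise ValueError("stream sizes must add up to the vector length")
    streams, start = [], 0
    for size in sizes:
        streams.append(vector[start:start + size])
        start += size
    return streams


vector = [float(i) for i in range(39)]
one_stream = split_streams(vector, sizes=(39,))  # "1s": all values together
three_streams = split_streams(vector)            # cepstra, deltas, delta-deltas
```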

     
  • diogovn

    diogovn - 2011-02-06

    Thanks for all the help, Nickolay!
    I'm going to work more on the system. Sorry for all the bothering =x

     
