Phonemes Model Question

2015-07-02
2015-07-03
  • Benjamin Gorman

    Benjamin Gorman - 2015-07-02

    Hi, I'm working with pocketsphinx. I have configured it as per the tutorial and I am getting decent accuracy on the test file. However, as you can see below, it is not a perfect fit (as expected). I'm looking to focus on phoneme recognition, but I only care about accuracy for particular phonemes, or just for the initial phonemes of words (in the case below, G, F, etc.).

    Is it possible to train a model to focus on particular phonemes, or just phonemes at the beginning of words? Or is there a particular configuration that would help me?

    Also, the confidence is 1.0 for each phoneme. Does pocketsphinx not deliver confidence for phonemes? I was calculating the confidence using the word-confidence code from pocketsphinx_continuous.c:

    ps_seg_frames(iter, &sf, &ef);
    pprob = ps_seg_prob(iter, NULL, NULL, NULL);
    conf = logmath_exp(ps_get_logmath(ps), pprob);
    

    Phonemes
    ~~~~

    config = cmd_ln_init(NULL, ps_args(), TRUE, "-hmm", MODELDIR "/en-us/en-us", "-allphone", MODELDIR "/en-us/en-us-phone.lm.dmp", "-backtrace", "yes", "-beam", "1e-20", "-pbeam", "1e-20", "-lw", "2.0", NULL);

    Recognized: SIL G OW F AO R W ER D T AE NG IY IH ZH ER Z S V SIL

    SIL 0.000 0.450 1.000000
    G 0.460 0.530 1.000000
    OW 0.540 0.630 1.000000
    F 0.640 0.770 1.000000
    AO 0.780 0.850 1.000000
    R 0.860 0.930 1.000000
    W 0.940 1.000 1.000000
    ER 1.010 1.110 1.000000
    D 1.120 1.160 1.000000
    T 1.170 1.300 1.000000
    AE 1.310 1.390 1.000000
    NG 1.400 1.560 1.000000
    IY 1.570 1.660 1.000000
    IH 1.670 1.700 1.000000
    ZH 1.710 1.750 1.000000
    ER 1.760 1.890 1.000000
    Z 1.900 1.950 1.000000
    S 1.960 2.100 1.000000
    V 2.110 2.150 1.000000
    SIL 2.160 2.600 1.000000

    **Words**
    

    config = cmd_ln_init(NULL, ps_args(), TRUE, "-hmm", MODELDIR "/en-us/en-us", "-lm", MODELDIR "/en-us/en-us.lm.dmp", "-dict", MODELDIR "/en-us/cmudict-en-us.dict", NULL);

    <s> 0.000 0.450 0.999900
    go 0.460 0.630 0.999600
    forward 0.640 1.160 0.999900
    ten 1.170 1.520 0.102605
    meters 1.530 2.110 0.297887
    </s> 2.120 2.600 1.000000
    ~~~~~~

     
    • Nickolay V. Shmyrev

      I'm looking to focus on phoneme recognition, but I only care about accuracy for particular phonemes, or just for the initial phonemes of words (in the case below, G, F, etc.).

      Not sure what you mean by "focus" here. Accurate phoneme recognition is a hard problem. Usually some phonemes are easier to recognize, while others are more confusable.

      Also, the confidence is 1.0 for each phoneme. Does pocketsphinx not deliver confidence for phonemes?

      Unfortunately, phone confidence is not supported yet.

       
  • Benjamin Gorman

    Benjamin Gorman - 2015-07-02

    Could you train a model so it was better at, say, consonant phonemes than vowel phonemes?

    Is it instead possible to trace back phonemes from recognised words? Using the word confidence to estimate the phoneme confidence?

     

    Last edit: Benjamin Gorman 2015-07-02
    • Nickolay V. Shmyrev

      Could you train a model so it was better at, say, consonant phonemes than vowel phonemes?

      No, there is no such thing. The main phoneme confusions are between acoustically confusable pairs like Z/S, AH/IH, B/P, or D/T. It is not about vowels versus consonants.

      Is it instead possible to trace back phonemes from recognised words?

      No, the word recognizer does not track phonemes.
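
Since the word decoder does not expose phones, one rough workaround is to line up the two outputs by time stamps, assuming both the allphone pass and the word pass are run on the same audio as in the first post. The sketch below is a heuristic only, not a pocketsphinx feature. `sketch_seg_t` and `sketch_initial_phone` are hypothetical names, and the start and end times are assumed to have been copied out of the segment iterators beforehand.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical container for one segment from either pass. */
typedef struct {
    const char *label;  /* phone or word label */
    double start;       /* start time in seconds */
    double end;         /* end time in seconds */
} sketch_seg_t;

/* Return the index of the phone segment whose start time is closest to the
 * start of the given word, or -1 if there are no phone segments. */
int sketch_initial_phone(const sketch_seg_t *word,
                         const sketch_seg_t *phones, int n_phones)
{
    int best = -1;
    double best_gap = 0.0;

    assert(word != NULL && phones != NULL);
    for (int i = 0; i < n_phones; i++) {
        double gap = fabs(phones[i].start - word->start);
        if (best < 0 || gap < best_gap) {
            best = i;
            best_gap = gap;
        }
    }
    return best;
}
```

On the numbers from the first post this picks G for "go" (0.460) and F for "forward" (0.640), but for "meters" (1.530) the nearest phone start is IY at 1.570 rather than an M, which shows how far the two independent passes can drift apart.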

      Using the word confidence to estimate the phoneme confidence?

      I do not think it is possible, sorry.

      If you want phoneme confidence, you can implement it from a phoneme lattice. You will have to write another search that keeps track of the phoneme lattice, though.

       
  • Benjamin Gorman

    Benjamin Gorman - 2015-07-03

    Implementing phoneme confidence could be interesting. Do you happen to know of any additional info that would get me started on this?

     
    • Nickolay V. Shmyrev

      You need to understand the theory of confidence scoring, at least from the following overview:

      http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.93.6890

      You also need to understand the code for lattice (DAG) construction from the decoder search, which is available in sphinx3 in the srch_allphone.c file.

       
