Hi, I'm working with pocketsphinx. I have configured it as per the tutorial and I am getting decent accuracy on the test file, though as you can see it's not a perfect fit (as expected). I'm looking to focus on phoneme recognition, but to limit it to particular phonemes, or just to the initial phonemes of words (in the example below that would be G, F, etc.).
Is it possible to train a model to focus on particular phonemes, or just phonemes at the beginning of words? Or is there a particular configuration that would help me?
Also, the confidence is 1.0 for every phoneme; does pocketsphinx not deliver confidence scores for phonemes? I was calculating the confidence using the word-confidence code from pocketsphinx_continuous.c.
Phonemes
~~~~
config = cmd_ln_init(NULL, ps_args(), TRUE, "-hmm", MODELDIR "/en-us/en-us", "-allphone", MODELDIR "/en-us/en-us-phone.lm.dmp", "-backtrace", "yes", "-beam", "1e-20", "-pbeam", "1e-20", "-lw", "2.0", NULL)
Recognized: SIL G OW F AO R W ER D T AE NG IY IH ZH ER Z S V SIL
SIL 0.000 0.450 1.000000
G 0.460 0.530 1.000000
OW 0.540 0.630 1.000000
F 0.640 0.770 1.000000
AO 0.780 0.850 1.000000
R 0.860 0.930 1.000000
W 0.940 1.000 1.000000
ER 1.010 1.110 1.000000
D 1.120 1.160 1.000000
T 1.170 1.300 1.000000
AE 1.310 1.390 1.000000
NG 1.400 1.560 1.000000
IY 1.570 1.660 1.000000
IH 1.670 1.700 1.000000
ZH 1.710 1.750 1.000000
ER 1.760 1.890 1.000000
Z 1.900 1.950 1.000000
S 1.960 2.100 1.000000
V 2.110 2.150 1.000000
SIL 2.160 2.600 1.000000
config = cmd_ln_init(NULL, ps_args(), TRUE, "-hmm", MODELDIR "/en-us/en-us", "-lm", MODELDIR "/en-us/en-us.lm.dmp", "-dict", MODELDIR "/en-us/cmudict-en-us.dict", NULL);
<s> 0.000 0.450 0.999900
go 0.460 0.630 0.999600
forward 0.640 1.160 0.999900
ten 1.170 1.520 0.102605
meters 1.530 2.110 0.297887
</s> 2.120 2.600 1.000000
~~~~~~
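For reference, this is roughly the harness I'm using to produce the listings above, modelled on the word-confidence code in pocketsphinx_continuous.c. It's a minimal sketch assuming the 5prealpha API (exact signatures differ a little between pocketsphinx versions) and a raw 16 kHz, 16-bit mono test file:

~~~~
/* Minimal sketch, assuming the 5prealpha API and that MODELDIR is set
 * on the compiler command line.  Decodes a raw 16 kHz, 16-bit mono file
 * in allphone mode and prints each phone segment with its times and
 * posterior, the way pocketsphinx_continuous.c prints word segments. */
#include <stdio.h>
#include <pocketsphinx.h>

int main(int argc, char *argv[])
{
    cmd_ln_t *config;
    ps_decoder_t *ps;
    ps_seg_t *seg;
    const char *hyp;
    FILE *fh;

    config = cmd_ln_init(NULL, ps_args(), TRUE,
                         "-hmm", MODELDIR "/en-us/en-us",
                         "-allphone", MODELDIR "/en-us/en-us-phone.lm.dmp",
                         "-backtrace", "yes",
                         "-beam", "1e-20",
                         "-pbeam", "1e-20",
                         "-lw", "2.0",
                         NULL);
    ps = ps_init(config);

    if (argc < 2 || (fh = fopen(argv[1], "rb")) == NULL)
        return 1;
    ps_decode_raw(ps, fh, -1);   /* -1 = read the whole file */
    fclose(fh);

    hyp = ps_get_hyp(ps, NULL);
    printf("Recognized: %s\n", hyp ? hyp : "(none)");

    /* Walk the segmentation.  ps_seg_prob() returns a log posterior,
     * which logmath_exp() turns into a linear probability -- this is
     * the value that comes back as 1.0 for every phone above. */
    for (seg = ps_seg_iter(ps); seg; seg = ps_seg_next(seg)) {
        int sf, ef;
        int32 post;

        ps_seg_frames(seg, &sf, &ef);
        post = ps_seg_prob(seg, NULL, NULL, NULL);
        printf("%s %.3f %.3f %f\n", ps_seg_word(seg),
               sf / 100.0, ef / 100.0,   /* default 100 frames/sec */
               logmath_exp(ps_get_logmath(ps), post));
    }

    ps_free(ps);
    cmd_ln_free_r(config);
    return 0;
}
~~~~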
Not sure what you mean by "focus" here. Accurate phoneme recognition is a hard problem; some phonemes are easier to recognize, while others are more easily confused.
Unfortunately, phone confidence is not supported yet.
Could you train a model so that it was better at, say, consonant phonemes than vowel phonemes?
Is it possible to trace phonemes back from recognised words instead? Could the word confidence be used to estimate the phoneme confidence?
No, there is no such thing. Phoneme confusion is mainly between acoustically similar pairs like Z/S, AH/IH, B/P or D/T; it is not a matter of vowels versus consonants.
No, the word recognizer does not track phonemes.
I do not think that is possible, sorry.
If you want to implement phoneme confidence, you can compute it from a phoneme lattice. You would have to write another search that keeps track of the phoneme lattice, though.
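To be concrete about what that would involve: the usual confidence measure in the literature is the posterior probability of each phone in the lattice, roughly

~~~~
                sum of P(O, path) over lattice paths that contain the phone
P(phone | O) = -------------------------------------------------------------
                sum of P(O, path) over all paths in the lattice
~~~~

computed with a forward-backward pass over the lattice with scaled-down acoustic scores. That is essentially what the word-level bestpath/posterior code already does to produce the word confidences in your second listing; the missing piece is an equivalent lattice over phones.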
Implementing phoneme confidence could be interesting. Do you happen to know of any additional info that would get me started on this?
You need to understand the theory of confidence scoring, at least from the following overview:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.93.6890
And you need to understand the code for lattice (DAG) construction in the decoder search, which is available in sphinx3 in the srch_allphone.c file.
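As a very rough starting point, the computation you would eventually run over such a phone lattice is ordinary forward-backward posterior estimation. Below is a toy sketch; the arc and lattice structures in it are invented purely for illustration (building a real phone lattice out of the allphone search is exactly the part that does not exist yet), so treat it as the shape of the algorithm rather than working pocketsphinx code.

~~~~
/* Toy forward-backward posterior computation over a hypothetical phone
 * lattice.  The arc_t type and the example lattice are made up for
 * illustration only. */
#include <math.h>
#include <stdio.h>

#define MAX_NODES 64
#define MAX_ARCS  256

typedef struct {                /* hypothetical phone arc */
    int from, to;               /* lattice node indices */
    const char *phone;          /* phone label, e.g. "G" */
    double score;               /* scaled acoustic+LM log score */
} arc_t;

static double logadd(double a, double b)    /* log(exp(a) + exp(b)) */
{
    if (a < b) { double t = a; a = b; b = t; }
    return a + log1p(exp(b - a));
}

/* alpha/beta hold the total log score of all partial paths reaching or
 * leaving each node; an arc's posterior is the mass of complete paths
 * through it divided by the mass of all paths. */
static void arc_posteriors(const arc_t *arcs, int n_arcs, int n_nodes,
                           int start, int end, double *post)
{
    double alpha[MAX_NODES], beta[MAX_NODES];
    int i, n;

    for (n = 0; n < n_nodes; n++)
        alpha[n] = beta[n] = -INFINITY;
    alpha[start] = 0.0;
    beta[end] = 0.0;

    /* forward pass: assumes arcs are listed in topological order */
    for (i = 0; i < n_arcs; i++)
        alpha[arcs[i].to] = logadd(alpha[arcs[i].to],
                                   alpha[arcs[i].from] + arcs[i].score);
    /* backward pass */
    for (i = n_arcs - 1; i >= 0; i--)
        beta[arcs[i].from] = logadd(beta[arcs[i].from],
                                    arcs[i].score + beta[arcs[i].to]);

    for (i = 0; i < n_arcs; i++)        /* posterior of each phone arc */
        post[i] = exp(alpha[arcs[i].from] + arcs[i].score
                      + beta[arcs[i].to] - alpha[end]);
}

int main(void)
{
    /* toy lattice: two competing phones (Z vs S) between nodes 1 and 2 */
    arc_t arcs[] = {
        { 0, 1, "ER",  -1.0 },
        { 1, 2, "Z",   -2.0 },
        { 1, 2, "S",   -2.5 },
        { 2, 3, "SIL", -1.0 },
    };
    double post[MAX_ARCS];
    int i, n_arcs = (int)(sizeof(arcs) / sizeof(arcs[0]));

    arc_posteriors(arcs, n_arcs, 4, 0, 3, post);
    for (i = 0; i < n_arcs; i++)
        printf("%-3s %.3f\n", arcs[i].phone, post[i]);
    return 0;
}
~~~~

Run over a real phone lattice with properly scaled acoustic and language scores, the per-arc posteriors are the phone confidences being asked about here.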