Karim - 2020-12-11

Hi,

I plan to use PocketSphinx to prototype a basic command/control language for controlling items in a room (screen, lighting, speaker...).

So I've created a basic corpus file, such as:

do
down
reset
one
two
light
lights
screen
screens
set
...

Then, following the tutorials, I created the dictionary using:

g2p-seq2seq --decode /media/cms/corpus.txt --model_dir ./g2p-seq2seq-model-6.2-cmudict-nostress/ --output /media/cms/cms.dic

Finally, I created the LM file using:

ngram-count -interpolate -text cms.dic -lm cms.lm

First of all, I hope I did not make any mistakes in the previous steps.

I tried recognition using pocketsphinx_continuous on Windows:

pocketsphinx_continuous -inmic yes -hmm en-us\en-us -lm cms\cms.lm.bin -dict cms\cms.dic

It works, but there is some confusion with the word "do": PocketSphinx always detects "two" instead.
The plural tokens "screens" and "lights" are always detected as "screen set" and "light set".
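
To make these confusions concrete, here is a minimal Python sketch that lists dictionary words whose phoneme sequences differ by at most one phoneme. The pronunciations in SAMPLE_DICT are assumed CMUdict-style entries without stress marks, not the actual content of my cms.dic, and the helper names (parse_dict, edit_distance, confusable_pairs) are hypothetical. It only shows which words are acoustically close on paper; it does not explain what the decoder actually does.

```python
from itertools import combinations

# Assumed CMUdict-style entries (no stress marks); the real cms.dic may differ.
SAMPLE_DICT = """\
do D UW
two T UW
light L AY T
lights L AY T S
screen S K R IY N
screens S K R IY N Z
set S EH T
"""


def parse_dict(text):
    """Map each word to its phoneme list, one 'word PH PH ...' entry per line."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            entries[parts[0]] = parts[1:]
    return entries


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def confusable_pairs(entries, max_distance=1):
    """Return word pairs whose pronunciations are within max_distance edits."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(entries.items()), 2):
        if edit_distance(p1, p2) <= max_distance:
            pairs.append((w1, w2))
    return pairs


pairs = confusable_pairs(parse_dict(SAMPLE_DICT))
print(pairs)
```

With these assumed pronunciations, "do"/"two" differ only in the first phoneme, and each plural differs from its singular by a single trailing phoneme, which matches the words I am having trouble with.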

My question: is there a way to tune the creation of the LM so that I could get much more accurate results?

Thanks for your feedback.

K.