I tried to use a finite state grammar with pocketsphinx v0.6.
I used the hub4wsj_sc_8k acoustic model supplied with the pocketsphinx source.
The grammar file was pizza.gram. Its contents are as follows:
#JSGF V1.0;
grammar pizza;
public <startpizza> = i want to order a <size> pizza with <topping>;
<size> = small | medium | large;
<topping> = pepperoni | mushrooms | anchovies;
This grammar was converted to an FSG file with the sphinx_jsgf2fsg script available in sphinxbase.
The contents of the dictionary file are:
A AH
A(2) EY
ANCHOVIES AE N CH OW V IY Z
I AY
LARGE L AA R JH
MEDIUM M IY D IY AH M
MUSHROOMS M AH SH R UW M Z
ORDER AO R D ER
PEPPERONI P EH P ER OW N IY
PIZZA P IY T S AH
SMALL S M AO L
TO T UW
TO(2) T IH
TO(3) T AH
WANT W AA N T
WANT(2) W AO N T
WITH W IH DH
WITH(2) W IH TH
I used the command:
./pocketsphinx_continuous -hmm ../../model/hmm/en_US/hub4wsj_sc_8k/ -fsg .pizza.fsg -dict pizza.dic -samprate 8000
I received the following error:
ERROR: "fsg_search.c", line 322: The word 'i' is missing in the dictionary
This happens although the dictionary file pizza.dic contains the word 'i'.
Any clue as to what is wrong with the above procedure?
Regards
Pankaj
There was no need to do that: there is a -jsgf option, so you can use the JSGF file directly.
The dictionary is case-sensitive. It doesn't contain i, it contains I. You can try -dictcase yes to overcome this.
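The case mismatch described above can be checked before starting the decoder. The following is only a minimal sketch in Python, not part of pocketsphinx: the helper names load_dictionary_words and missing_words are hypothetical, and the dictionary is assumed to use the "WORD PHONES" layout shown in the first post, with "(n)" marking alternate pronunciations.

```python
def load_dictionary_words(lines):
    """Return the set of base words in a Sphinx-style dictionary.

    Alternate pronunciations such as "A(2)" are folded into "A".
    """
    words = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        # Strip a trailing "(n)" alternate-pronunciation marker.
        words.add(fields[0].split("(")[0])
    return words


def missing_words(grammar_words, dict_words, ignore_case=False):
    """List grammar words that have no dictionary entry."""
    if ignore_case:
        folded = {w.lower() for w in dict_words}
        return [w for w in grammar_words if w.lower() not in folded]
    return [w for w in grammar_words if w not in dict_words]


# A few entries copied from pizza.dic and a few words from pizza.gram.
dictionary = load_dictionary_words([
    "A AH",
    "A(2) EY",
    "I AY",
    "WANT W AA N T",
    "WANT(2) W AO N T",
])
grammar = ["i", "want", "a"]

print(missing_words(grammar, dictionary))                    # ['i', 'want', 'a']
print(missing_words(grammar, dictionary, ignore_case=True))  # []
```

With an exact comparison every lowercase grammar word is reported as missing, which matches the "The word 'i' is missing in the dictionary" error. The case-folded comparison only illustrates the idea behind -dictcase yes; it does not claim to reproduce how the decoder implements that option.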
Thanks nshmyrev.
When I run
./pocketsphinx_continuous -hmm ../../model/hmm/en_US/hub4wsj_sc_8k -jsgf pizza.gram -dict pizza.dic -dictcase yes -samprate 8000
OR
./pocketsphinx_continuous -hmm ../../model/hmm/en_US/hub4wsj_sc_8k -fsg pizza.fsg -dict pizza.dic -dictcase yes -samprate 8000
I get the following error for anything I speak:
Listening...
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 62.12 1.75 0.63 -0.38 0.41
-0.22 -0.04 0.36 0.51 -0.06 0.41 0.04 0.01 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 61.43 1.62 0.71 -0.33 0.39
-0.16 0.05 0.37 0.53 -0.06 0.42 0.06 0.03 >
INFO: fsg_search.c(1015): 60 frames, 746 HMMs (12/fr), 2425 senones (40/fr),
137 history entries (2/fr)
ERROR: "fsg_search.c", line 1091: Final state not reached in frame 60
000000007: (null) (9481844)
Now what could be going wrong? I am getting good results with the n-gram model but not with the finite state grammar.
Regards
Pankaj
Well, it just doesn't recognize what you say. Add the option "-rawlogdir ." (note the dot) to dump the recorded audio, and share the recording.
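A raw dump has no header, so whoever receives it has to guess the format. One option is to wrap it in a WAV container before sharing. The sketch below is in Python and rests on assumptions that are not confirmed in this thread: that the dump is 16-bit mono PCM, and that its sample rate matches the decoder setting (16000 here, 8000 if -samprate 8000 was used). The function name raw_to_wav and the file name demo.wav are hypothetical.

```python
import wave


def raw_to_wav(raw_bytes, wav_path, sample_rate=16000):
    """Wrap headerless PCM samples in a WAV container.

    Assumes 16-bit mono samples; adjust if the dump differs.
    """
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)            # mono (assumption)
        wav.setsampwidth(2)            # 2 bytes = 16-bit samples (assumption)
        wav.setframerate(sample_rate)  # must match the recording rate
        wav.writeframes(raw_bytes)


# Demo with one second of synthetic silence instead of a real recording.
silence = b"\x00\x00" * 16000
raw_to_wav(silence, "demo.wav")

with wave.open("demo.wav", "rb") as check:
    print(check.getnframes(), check.getframerate())  # 16000 16000
```

For a real recording, read the dumped file in binary mode and pass its bytes in place of the synthetic silence.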
Hi nshmyrev,
How do I send the recorded audio? I don't see a button for attaching a file.
With regards
Pankaj
You can upload the file to any public file hosting service. Don't forget to post a link to it here.
Hello nshmyrev,
I have created two files testlm.raw and testfsg.raw.
testlm was created by the command:
./pocketsphinx_continuous -hmm ../../model/hmm/en_US/hub4wsj_sc_8k -lm
pizza.lm -dict pizza.dic -dictcase yes -rawlogdir .
testfsg was created by the command:
./pocketsphinx_continuous -hmm ../../model/hmm/en_US/hub4wsj_sc_8k -jsgf
pizza.gram -dict pizza.dic -dictcase yes -rawlogdir .
I spoke the words "I want to order a pizza".
With the n-gram model the utterance is recognized properly, but with the FSG model the following error is displayed:
ERROR: "fsg_search.c", line 1091: Final state not reached in frame 60
000000007: (null) (9481844)
The links for the files are:
http://www.mediafire.com/file/mnoknmy4vog/testlm.raw
http://www.mediafire.com/file/t3yj0yzgizg/testfsg.raw
http://www.mediafire.com/file/mozjymftmzu/pizza.lm
http://www.mediafire.com/file/mjyxjnnyezz/pizza.gram
http://www.mediafire.com/file/zdnn3dyngnk/pizza.dic
With regards
Pankaj
Hi
Well, your accent is too different from US English. Our models are trained to
recognize native US speakers. It will not recognize your speech without
adaptation.
Hi nshmyrev,
I suspected that but when I use n-gram model the recognition is quite good. I
am facing problems with fsg model only.
With regards
Pankaj
Hi nshmyrev,
I agree that my accent is very different from US English, but I don't find it to be a problem when I use pocketsphinx 0.6 with an n-gram model, where I get an accuracy of about 80% or so. Only with the FSG model do I see errors, and the decoder is not able to recognize any of the words.
The following errors are displayed:
INFO: cmn_prior.c(121): cmn_prior_update: from < 62.12 1.75 0.63 -0.38 0.41
-0.22 -0.04 0.36 0.51 -0.06 0.41 0.04 0.01 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 61.43 1.62 0.71 -0.33 0.39
-0.16 0.05 0.37 0.53 -0.06 0.42 0.06 0.03 >
INFO: fsg_search.c(1015): 60 frames, 746 HMMs (12/fr), 2425 senones (40/fr),
137 history entries (2/fr)
ERROR: "fsg_search.c", line 1091: Final state not reached in frame 60
000000007: (null) (9481844)
Any other clue?
Regards
Pankaj
Sorry, I need to look more closely at your recordings. That will take some time.
My guess is:
According to your FSG, the decoder expects a complete sentence:
"I want to order a small/medium/large pizza with pepperoni/mushrooms/anchovies"
So you are allowed to say
"I want to order a small pizza with pepperoni"
or
"I want to order a medium pizza with mushrooms"
etc.
However, you are only saying "I want to order a pizza", so the decoder cannot match it and exits without a result (an n-gram model handles this better).
For your utterance to be recognized, the grammar should be as follows:
#JSGF V1.0;
grammar pizza;
public <startpizza> = i want to order a pizza;
<size> = small | medium | large;
<topping> = pepperoni | mushrooms | anchovies;
Try it out.
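To see why the original grammar rejects the short utterance, it can help to list every sentence it accepts. The following is a rough Python sketch that enumerates the sentences by hand rather than parsing the JSGF file; the function names are hypothetical. The "relaxed" variant models a grammar in which size and topping are both optional, which in JSGF would be written with square brackets, e.g. i want to order a [<size>] pizza [with <topping>]. That rule has not been tested against pocketsphinx 0.6 here.

```python
from itertools import product

SIZES = ["small", "medium", "large"]
TOPPINGS = ["pepperoni", "mushrooms", "anchovies"]


def strict_sentences():
    """Sentences accepted when size and topping are both required."""
    return {
        f"i want to order a {size} pizza with {topping}"
        for size, topping in product(SIZES, TOPPINGS)
    }


def relaxed_sentences():
    """Sentences accepted when size and topping are both optional."""
    sentences = set()
    # None stands for "this optional part was left out".
    for size, topping in product([None] + SIZES, [None] + TOPPINGS):
        words = ["i want to order a"]
        if size:
            words.append(size)
        words.append("pizza")
        if topping:
            words.append(f"with {topping}")
        sentences.add(" ".join(words))
    return sentences


utterance = "i want to order a pizza"
print(len(strict_sentences()), utterance in strict_sentences())    # 9 False
print(len(relaxed_sentences()), utterance in relaxed_sentences())  # 16 True
```

The strict grammar accepts exactly nine sentences and the spoken one is not among them, which is consistent with the "Final state not reached" error. Making the two parts optional keeps the full orders valid while also accepting the short form.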