Hi, I'm currently trying to use pocketsphinx to create a dictionary that lets
users input words by spelling them.
The ideal goal is to be able to put all the words in, which is about 16,000
words.
I've looked through a lot of this forum's related posts and tried the
following:
1. Kind of a hack: using FSG, with
dict files looking like this:
W-O-R-D //pronunciation omitted
W-O-R-L-D
...
and grammar files looking like this:
<testgrammar> = W-O-R-D | W-O-R-L-D | ...;
Result:
Putting all the words in results in a crash at load time.
When I tried putting fewer words in, the user may utter the letters too slowly
for it to count as "reading" a word (as I understand it, the pronunciation is
designed for reading a word, not spelling it).
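For reference, here is a rough sketch of how the grammar file for approach 1 could be generated from a word list instead of being written by hand. It assumes the grammar is in JSGF format (the `#JSGF V1.0;` header and `public` rule); the grammar name `spelling` and the helper names are made up for illustration, and it does nothing about the load-time crash with the full word list.

```python
def hyphenate(word):
    """Turn 'word' into the hyphenated entry 'W-O-R-D' used in the dict file."""
    return "-".join(ch.upper() for ch in word if ch.isalpha())


def build_jsgf(words, grammar_name="spelling", rule_name="testgrammar"):
    """Build a JSGF grammar whose single public rule lists every word as an alternative.

    grammar_name and rule_name are illustrative defaults, not required values.
    """
    alternatives = " | ".join(hyphenate(w) for w in words)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )


if __name__ == "__main__":
    # Placeholder vocabulary; the real list would have about 16,000 words.
    print(build_jsgf(["word", "world"]))
```

Starting from a small subset of the vocabulary and growing it makes it easier to see at what size loading starts to fail.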
2. Using a language model created from a corpus.txt like this:
W O R D
W O R L D
...
with dict files looking like this (the full one has all 26 letters):
W //pronunciation omitted
O
R
L
D
Result:
The recognition error rate is very high and using n-best doesn't help very
much. Most of the time it hears words that are some letters off, and a lot of
the time it is way off. Implementing a spell checker didn't help much either...
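For anyone trying to reproduce approach 2, a minimal sketch of how the corpus.txt lines could be produced from a plain word list. The function names and the placeholder vocabulary are hypothetical; the output still has to go through whatever language model tool you use.

```python
def spell_out(word):
    """Return the word as space-separated uppercase letters, e.g. 'word' -> 'W O R D'."""
    return " ".join(ch.upper() for ch in word if ch.isalpha())


def build_corpus(words):
    """Return one spelled-out line per unique word, keeping first-seen order."""
    seen = set()
    lines = []
    for word in words:
        line = spell_out(word)
        if line and line not in seen:
            seen.add(line)
            lines.append(line)
    return lines


if __name__ == "__main__":
    # Placeholder vocabulary; the real list would have about 16,000 words.
    vocabulary = ["word", "world"]
    print("\n".join(build_corpus(vocabulary)))
```

Each output line is one training sentence, so the language model only sees letter sequences that spell words in the vocabulary.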
And I hope I don't have to implement it the way some posts mentioned, using
"alpha", "beta", etc. to improve the recognition rate.
Can anyone please point me in a better direction? I'm quite stuck...
I'm a newbie at this and really appreciate the effort! This is a great
project!
Thanks a lot!
Approach 2 makes more sense, but it also requires attention and probably
reimplementation of some algorithms. Pocketsphinx is not really suitable for
short-phone recognition. The algorithm itself requires modification.
So if you really want to do this, you are on the right track, but you need to
spend more time identifying the accuracy issues and analyzing
ways to fix them.
The first step would be to collect a speech database and measure the accuracy
you get with the current model.
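One possible way to do the measurement, sketched under the assumption that for each recording you have the reference spelling and the decoder's hypothesis as space-separated letters. The metric here is a letter error rate based on edit distance; the function names are made up for illustration and this is not a pocketsphinx tool.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def letter_error_rate(pairs):
    """pairs: iterable of (reference, hypothesis) strings of space-separated letters."""
    errors = 0
    total = 0
    for ref, hyp in pairs:
        ref_letters = ref.split()
        errors += edit_distance(ref_letters, hyp.split())
        total += len(ref_letters)
    return errors / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical results: one correct utterance, one with a dropped letter.
    results = [("W O R D", "W O R D"), ("W O R L D", "W O L D")]
    print(f"letter error rate: {letter_error_rate(results):.3f}")
```

Running the same test set after every change to the model or dictionary shows whether the change actually helped.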