Hi - So thanks in part to numerous posts here and help from CMU folks, I've
got a nice pocketsphinx solution working on a Virtex5 PowerPC system running
QNX. It is doing recognition on a small grammar (12 commands), using a custom
trained acoustic model.
The recognition is excellent - like 98.5% accurate, but it gives false
positives for basically everything. By false positive, I mean that I can speak
words that aren't in the grammar at all and it will recognize them as words in
the grammar (even if they're not even close).
I'm sure there is something that can be done here by tweaking parameters or
manipulating the language model. I guess I'm looking for the obvious stuff. I
can post the LM if need be; it is pretty small.
> The recognition is excellent - like 98.5% accurate
Congratulations
> is something that can be done here by tweaking parameters
In your acoustic model you need to introduce a "garbage" phone
that will represent everything else. You will probably need a few phones for
common specific types of sounds. Then you need to include the corresponding
"garbage" words in the grammar with a low probability. You also need to get the
posterior probability from pocketsphinx with ps_get_prob and compare it with
some threshold. Unfortunately, everything here needs tuning.
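To make the two rejection tests in the reply above concrete, here is a minimal sketch in Python. It does not call pocketsphinx. It assumes you already have the hypothesis string and the integer value returned by ps_get_prob, and that this value is a log posterior in the decoder's log base (1.0001 is assumed here as the usual default; check your -logbase setting). The function names, the GARBAGE word, and the 0.5 threshold are made up for illustration and would need tuning.

```python
import math

# Assumption: posteriors arrive as integer log values in this base.
LOG_BASE = 1.0001


def log_to_prob(log_value, base=LOG_BASE):
    """Convert an integer log-domain score to a linear probability."""
    return math.exp(log_value * math.log(base))


def accept(hypothesis, log_posterior, threshold=0.5, garbage_word="GARBAGE"):
    """Return the hypothesis if it passes both rejection tests, else None."""
    words = hypothesis.split()
    if not words or garbage_word in words:
        return None  # the decoder matched the garbage model
    if log_to_prob(log_posterior) < threshold:
        return None  # a command was matched, but with low confidence
    return hypothesis
```

The first test only helps once garbage words exist in the grammar. The second test depends on how well the posterior separates in-grammar from out-of-grammar speech on your own data, which is why a held-out set of garbage recordings is useful for picking the threshold.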
So maybe you can help me with how this works. My .phone file would have to
look like:
...
T
V
W
Z
SIL
XXX
And my .dic file would look like:
SLEEP S L IY P
STEP S T EH P
TO T AH
WAKEUP W EY K AH P
GARBAGE XXX
I'll paste my LM in the next post, cuz I have no clue how to manipulate it.
But then what? Once I've made these three changes, do I have to rebuild my
acoustic models?
M
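One thing worth checking before retraining is that the phone list and the dictionary agree, since every phone used in a pronunciation has to be in the phoneset. A rough sketch of such a check follows. The file formats are assumed to be as shown in the post (one phone per line, and "WORD PH1 PH2 ..." per dictionary line). The PHONES string is a hypothetical subset that just covers the dictionary excerpt, not the poster's real .phone file, and the function names are made up.

```python
def parse_dictionary(text):
    """Map each word to its list of phones (one 'WORD PH1 PH2 ...' per line)."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            entries[fields[0]] = fields[1:]
    return entries


def missing_phones(phone_text, dict_text):
    """Return phones used in the dictionary but absent from the phone list."""
    phones = set(phone_text.split())
    used = {p for pron in parse_dictionary(dict_text).values() for p in pron}
    return sorted(used - phones)


# Hypothetical phone list covering only the dictionary excerpt below.
PHONES = "AH EH EY IY K L P S T W SIL XXX"
DICT = """SLEEP S L IY P
STEP S T EH P
TO T AH
WAKEUP W EY K AH P
GARBAGE XXX"""

print(missing_phones(PHONES, DICT))
```

An empty list means every pronunciation, including the new GARBAGE entry, only uses phones that are declared.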
Turns out posting the LM here is kind of a trainwreck. It doesn't respect
newlines and it renders <s> </s>, which is all over the LM, as strikethrough.
I can email it or post it someplace else if need be.
M
Hi Mike. Well, yes, after you add a garbage phone to the phoneset you need to
retrain the model. Then you can either add garbage words to the dictionary and
model them with the LM, or add them to a filler dictionary and model them as
fillers that are automatically inserted after other words. You still need to
select the weight carefully in order to get stable recognition results.
I think we really need to prepare a demo on this and train a production model
that could be used with pocketsphinx. It will take some time for me though.
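For the filler-dictionary route mentioned above, the file is plain text with one "WORD PHONE" pair per line, alongside the usual silence entries. A small sketch that generates one is below. The ++NAME++ spelling for filler words follows the common convention for noise words and is an assumption here, as is the function name. XXX is the garbage phone from the earlier post.

```python
def filler_dictionary(garbage_phones):
    """Build filler-dictionary text: silence entries plus one filler per phone."""
    lines = ["<s> SIL", "</s> SIL", "<sil> SIL"]
    for phone in garbage_phones:
        # One filler word per garbage phone, e.g. "++XXX++ XXX".
        lines.append("++%s++ %s" % (phone, phone))
    return "\n".join(lines) + "\n"


print(filler_dictionary(["XXX"]), end="")
```

With this route the LM stays untouched, and the weight to tune is the decoder's filler probability (the -fillprob option in the pocketsphinx builds I know of; check the option list for your version).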
Hi Nickolay-
So after a couple of months' hiatus, I'm back into looking at this. Just wanted
to check and see if you guys have done any work on a demo for this in the
meantime, or written some documentation about how to do it?
Manipulating the phoneset and dictionary will be easy for me, but changing the
LM is going to be a pain. I used your online tool to make the LM file and
haven't touched it since and can't seem to find any documentation about what
the format of the data in the file actually is.
Another question - right now in my training data all I have is recordings of
valid phrases, so do I now have to go off and record a bunch of stuff I don't
really care about to train as garbage? Essentially everything else that can be
said is garbage as far as I'm concerned...
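On the file format question: assuming the online tool produced a text LM in ARPA format, the layout is a \data\ header with n-gram counts, then one section per order (\1-grams:, \2-grams:, ...) where each line is a log10 probability, the words, and an optional backoff weight, closed by \end\. The sketch below reads the unigram section. The sample text and its numbers are invented for illustration and are not taken from the poster's LM.

```python
def read_unigrams(arpa_text):
    """Return {word: (log10_prob, backoff_or_None)} from the 1-grams section."""
    unigrams = {}
    in_section = False
    for raw in arpa_text.splitlines():
        line = raw.strip()
        if line.startswith("\\"):
            # Section markers: \data\, \1-grams:, \2-grams:, \end\
            in_section = (line == "\\1-grams:")
            continue
        if in_section and line:
            fields = line.split()
            backoff = float(fields[2]) if len(fields) > 2 else None
            unigrams[fields[1]] = (float(fields[0]), backoff)
    return unigrams


# Invented sample; real files have more entries and different values.
SAMPLE = """\\data\\
ngram 1=4
ngram 2=3

\\1-grams:
-0.7782 </s> -0.3010
-0.7782 <s> -0.2218
-0.7782 SLEEP -0.2218
-0.7782 WAKEUP -0.2218

\\2-grams:
-0.3010 <s> SLEEP 0.0000
-0.3010 SLEEP </s> -0.3010
-0.3010 WAKEUP </s> -0.3010

\\end\\
"""

for word, (logp, backoff) in read_unigrams(SAMPLE).items():
    print(word, logp, backoff)
```

Adding a GARBAGE word by hand would mean inserting a unigram line and bumping the "ngram 1=" count, but the probabilities would then no longer be normalized. Regenerating the LM from a corpus that includes the garbage word is probably the safer route.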
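> see if you guys have done any work on a demo for this
No, it's still pending on the to-do list.
> do I now have to go off and record a bunch of stuff I don't really care about to train as garbage?
Yes, it would be nice to record garbage, at least for the testing part of the
database. For training it also makes sense to record typical noises.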