As recommended at https://sourceforge.net/projects/cmusphinx/forums/forum/5470/topic/3962205, I am now using the hub4_wsj_sc_3s_8k.cd_semi_5000 acoustic model from http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/trunk/pocketsphinx-extra/?view=tar
Tests were done on the hub4_wsj_sc_3s_8k.cd_semi_5000 and adapted hub4_wsj_sc_3s_8k.cd_semi_5000 models. The results are as follows.

hub4_wsj_sc_3s_8k.cd_semi_5000
TOTAL Words: 91  Correct: 56  Errors: 46
TOTAL Percent correct = 61.54%  Error = 50.55%  Accuracy = 49.45%
TOTAL Insertions: 11  Deletions: 3  Substitutions: 32
Test log: https://docs.google.com/leaf?id=0B7cYCzEYybDSYWI3OWEyODgtMDY3My00ODc5LWI0MzEtY2FhNTZlMGExZTM4&sort=name&layout=list&num=50

hub4_wsj_sc_3s_8k.cd_semi_5000adapt
TOTAL Words: 91  Correct: 71  Errors: 25
TOTAL Percent correct = 78.02%  Error = 27.47%  Accuracy = 72.53%
TOTAL Insertions: 5  Deletions: 9  Substitutions: 11
Test log: https://docs.google.com/leaf?id=0B7cYCzEYybDSYjk0ZTc3ZjMtYzE4Yi00NDk1LWFmMDAtNDZmYmJiNGRiOThm&sort=name&layout=list&num=50
It would be great if anyone could throw some light on the following:
1. What configuration changes should be made to achieve better accuracy?
2. Since I only need to recognize a few commands, e.g. Next, Back, Menu, Up, Down, would using a JSGF grammar provide better accuracy? (A sketch of such a grammar follows this list.)
3. The controller application receives 16K PCM data on a TCP stream. This is fed to ps_process_raw and then checked for a hypothesis. What would be the best way to use PocketSphinx in this scenario?
4. Any other recommendations?
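For reference, the kind of grammar meant in question 2 is tiny in JSGF; a minimal sketch, with the grammar name and layout chosen arbitrarily:

    #JSGF V1.0;
    grammar commands;
    public <command> = next | back | menu | up | down;

A file like this is passed to the decoder with the -jsgf option; every word in it must also have an entry in the dictionary.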
Audio files @ https://docs.google.com/leaf?id=0B7cYCzEYybDSMTRjMDVkNDAtYjEwZi00Mjc4LWExY2YtMDA3YjRlMWYyZGQ4&sort=name&layout=list&num=50
Files used for training and testing (testing options, list of files, transcription etc.) @ https://docs.google.com/leaf?id=0B7cYCzEYybDSMjJmY2RhNzAtYjI5Ny00ZTc4LWIyOTctNTdjYjU0OWMzN2U1&sort=name&layout=list&num=50
Adapted acoustic model @ https://docs.google.com/leaf?id=0B7cYCzEYybDSNTVlNDNkYzEtOTU3OS00NGNmLThmZDctNjJiZTRkOWZlYmI4&sort=name&layout=list&num=50
Thanks in advance
Rethish
The issue with your database is that it has overly long silences in the middle of each recording. Basically you are recognizing isolated words, not words in context as you specified in the prompts.
Since you have an accent, you need to adapt the dictionary, for example add:
the(3) DH AE
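For context, the dictionary format numbers alternate pronunciations of the same word, so an accent-specific variant sits next to the stock entries. A short sketch (the first two lines follow the standard cmudict; the third is the addition suggested above, and the exact phones in your dictionary may differ):

    the        DH AH
    the(2)     DH IY
    the(3)     DH AE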
If you want more accuracy you can use more adaptation data; with 10 times more adaptation prompts you'll get better accuracy as well.
Since you have long silences and the number of insertions is large, you can try lowering the word insertion probability with:
-wip 1e-3
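For illustration, the same option can be set programmatically when the decoder is created; a minimal sketch using the PocketSphinx C API (model and dictionary paths are placeholders):

    cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
        "-hmm", "hub4_wsj_sc_3s_8k.cd_semi_5000",  /* placeholder path */
        "-dict", "commands.dic",                   /* placeholder path */
        "-wip", "1e-3",                            /* lower word insertion probability */
        NULL);
    ps_decoder_t *ps = ps_init(config);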
Hi Nickolay,
Thanks for your attention to the issue.
The 'wip' option did not give a jump in accuracy, but the dictionary adaptation gave a 6% improvement in accuracy.
Since these adaptations are not part of the standard CMU dictionary, where can I get them? Or do I have to find them by observing my own pronunciation?
Since my application needs to recognize just a few words, should I still train with whole sentences as before, or just with the words I need to recognize?
As mentioned before, my controller application uses PocketSphinx in the same manner as in FreeSWITCH.
I find two issues during each test:
1. The first utterance gives a wrong hypothesis every time.
2. The hypothesis accuracy degrades after a few recognitions.
Any recommendations on how to solve these? I found the same issue mentioned in other forums, and it seems an improved language model can solve the latter issue. How can I create an improved language model?
Thanks in advance
Rethish
> Since these adaptations are not part of the standard CMU dictionary, where can I get them? Or do I have to find them by observing my own pronunciation?

Unfortunately we don't have any data to provide dictionaries for the various regional dialects of English, so you have to create such a dictionary yourself. The standard dictionary only covers US English.
> 1. The first utterance gives a wrong hypothesis every time.
> 2. The hypothesis accuracy degrades after a few recognitions.
> Any recommendations on how to solve these?

It looks like an issue with normalization estimation, which is currently being discussed in this bug: https://sourceforge.net/tracker/?func=detail&atid=101904&aid=3117707&group_id=1904
I can look into this issue, but you need to provide an example I can reproduce locally; I don't observe such behaviour here. The example needs to be a self-contained application reading audio from a WAV file.
> I found the same issue mentioned in other forums, and it seems an improved language model can solve the latter issue. How can I create an improved language model?

I'm not sure what you are talking about, sorry.
> Since my application needs to recognize just a few words, should I still train with whole sentences as before, or just with the words I need to recognize?

You should adapt with sentences containing the words you will use, but you can use more sentences to get better adaptation. It's preferable to have 20-30 samples of each word.
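For illustration, adaptation data of that kind is normally a list of recording ids plus a matching transcription file; a minimal sketch with hypothetical file names and sentences:

commands.fileids:

    command_001
    command_002

commands.transcription:

    <s> go to the next menu </s> (command_001)
    <s> go back and then up </s> (command_002)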
Hi Nickolay,
Thanks a lot for the clarification.
Currently my application is not a standalone one; it receives PCM data from a TCP stream. I shall try to come up with a standalone demo.
Rethish
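For reference, a skeleton of such a standalone demo might look like the sketch below: it reads 16-bit PCM from a file (standing in for the TCP stream) and feeds it to ps_process_raw() in chunks. Model, dictionary, grammar and audio paths are placeholders, and it assumes the current PocketSphinx C API (ps_start_utt() and ps_get_hyp() take an extra argument in older releases):

    #include <stdio.h>
    #include <pocketsphinx.h>

    int main(void)
    {
        cmd_ln_t *config;
        ps_decoder_t *ps;
        FILE *fh;
        int16 buf[512];
        size_t nread;
        int32 score;
        const char *hyp;

        config = cmd_ln_init(NULL, ps_args(), TRUE,
                             "-hmm", "hub4_wsj_sc_3s_8k.cd_semi_5000", /* placeholder */
                             "-dict", "commands.dic",                  /* placeholder */
                             "-jsgf", "commands.gram",                 /* placeholder */
                             NULL);
        ps = ps_init(config);
        if (ps == NULL)
            return 1;

        /* Headerless 16-bit PCM; each fread() below stands in for one
         * read from the TCP stream in the real controller application. */
        fh = fopen("test.raw", "rb");
        if (fh == NULL)
            return 1;

        ps_start_utt(ps);
        while ((nread = fread(buf, sizeof(int16), 512, fh)) > 0)
            ps_process_raw(ps, buf, nread, FALSE, FALSE);
        ps_end_utt(ps);

        hyp = ps_get_hyp(ps, &score);
        printf("Hypothesis: %s\n", hyp ? hyp : "(none)");

        fclose(fh);
        ps_free(ps);
        return 0;
    }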