Hi People,
I have been working on getting pocketsphinx to run on Nokia (Symbian) devices, and everything runs well. I created my own acoustic and language (JSGF) models for command words. During the training phase I got a WER of 5%, and I got the same WER running the batch program on the device, which is very good. But when I use the models to recognize from the microphone, the WER increases to 80%.
Do I need to configure something to adapt the decoder to the hardware? Is there a way to find out what is going on with the WER?
Thanks a lot for your help!
Most likely you are using 'current' CMN; in that case you need to select the cmninit value carefully so that it matches the typical mean. If there is a mismatch, accuracy issues can appear. Alternatively, you can try 'prior' CMN, which doesn't depend on the initial value.
And please avoid the words "improving accuracy"; you have nothing to improve yet, you have just used strange parameters.
Hi,
I think CMN 'prior' depends on the initial value, while CMN 'current' depends only on the mean of the current utterance.
Pankaj
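For reference, both the CMN mode and its initial value are selected through decoder flags; a sketch of a live-decoding invocation (flag names as in sphinxbase-era pocketsphinx; the model paths and sample rate here are placeholders):

```shell
# Live decoding from the microphone with 'prior' CMN, seeded with an
# explicit initial channel mean via -cmninit.
pocketsphinx_continuous \
    -hmm my_acoustic_model \
    -jsgf commands.gram \
    -samprate 8000 \
    -cmn prior \
    -cmninit 40 \
    -inmic yes
```

The value passed to -cmninit can be read off from the CMN means printed in the decoder log of a well-recognized session.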
Hi Nickolay and Pankaj,
I followed your recommendations and tested several cmninit values with prior CMN, but the WER is still high. Since accuracy is good when using pocketsphinx_batch, I collected the raw audio files during live decoding and then used them as input to pocketsphinx_batch. Even with that raw data the WER was high again. I don't know what could be wrong; I'm using the same parameters as in the decode/slave.pl script.
Thanks a lot for your help!
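Replaying device captures through the batch decoder, as described above, might look like this (a sketch; directory names, the control file, and the sample rate are placeholders):

```shell
# Decode the raw audio captured on the device with the batch tool,
# using the same models and feature parameters as training.
# -adcin yes tells pocketsphinx_batch the inputs are raw audio,
# not precomputed cepstra.
pocketsphinx_batch \
    -hmm my_acoustic_model \
    -jsgf commands.gram \
    -adcin yes \
    -cepdir raw_captures \
    -cepext .raw \
    -ctl captures.fileids \
    -samprate 8000 \
    -hyp device.hyp
```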
That usually means that the feature-extraction parameters used during training and database test decoding don't match the feature-extraction parameters used during live decoding; for example, the sample rate was not set properly. I suggest you check the feature values and the feature extraction with sphinx_cepview and sphinx_fe, and by using mfclogdir. Also check the feature-extraction parameters in the training and decoding logs.
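The check suggested above can be done on a single captured file; a sketch (exact flag names may vary between sphinxbase versions; the file names and sample rate are placeholders):

```shell
# Extract features from one captured raw file with the same parameters
# as training, then dump the cepstra so they can be compared against
# the values seen in the training logs.
sphinx_fe -i capture.raw -raw yes -input_endian little \
    -samprate 8000 -o capture.mfc
sphinx_cepview -f capture.mfc -d 13
```

If the first cepstral coefficient (C0) differs wildly between training features and device captures, the sample rate or filterbank settings are the usual suspects.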
Hi Nickolay,
The problem was that most of the training/testing files had a silence interval at the beginning and at the end, but the utterances captured with pocketsphinx don't have such silence. I removed the silence from all the files and rebuilt the models. Now I get 6% WER with pocketsphinx_batch and 10% on the mobile phone, which I think is really good.
One other thing: I have a couple of modifications to the build file for Symbian. Can I send you the diff file so it can be added to the trunk?
Thanks a lot for your help!
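The silence removal described above can be done offline before training; a minimal sketch in Python (pure amplitude gating on 16-bit PCM samples; the threshold of 500 is an arbitrary assumption, real preprocessing would use an energy- or VAD-based criterion):

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose absolute amplitude is
    below the threshold; returns the trimmed list (empty if the whole
    utterance is below threshold)."""
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

# Example: quiet lead-in, a short speech burst, quiet tail.
pcm = [0, 12, -30, 4000, -5200, 3100, 25, -8, 0]
print(trim_silence(pcm))  # -> [4000, -5200, 3100]
```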
Ok, good. This is a problem which would be nice to solve one day.
Sure