Hi,
I'm running PocketSphinx 0.6 on Windows 7 with the following parameters (default floating-point build):
pocketsphinx_continuous -hmm hub4wsj_sc_8k -lm wsj0vp.5000.dmp -dict cmu07a.dic -samprate 8000 -rawlogdir .
I speak the following phrases one by one:
stop, welcome, turn left, turn right, go home, bye bye.
The following dumps created by PocketSphinx are uploaded:
http://www.mediafire.com/file/2yxkzzj2l4i/go_home.raw
http://www.mediafire.com/file/zqwzr2jmatw/stop.raw
http://www.mediafire.com/file/uwqzljmmgnj/turn_left.raw
http://www.mediafire.com/file/tytom2fymmn/turn_right.raw
http://www.mediafire.com/file/4zjxmlaze1w/welcome.raw
http://www.mediafire.com/file/gmztiyvimky/bye_bye.raw
The following is a recording made in parallel in Windows:
http://www.mediafire.com/file/yminqwmyzjz/full.wma
I'm consistently getting none of these words/phrases (and many others that I tried) recognized correctly.
It could be an accent issue, but even with my accent (full.wma), is a 100% error rate normal, or am I missing something?
Is there something I can do to improve recognition accuracy (other than adaptation and reducing the vocabulary)?
Why do the dumps generated by PocketSphinx have repetitions of words/parts of words?
Thanks and regards.
Why do you put -samprate 8000? Try without it. Actually, you can just run pocketsphinx_continuous without any arguments.
A 100% error rate couldn't be due to the accent. It's a bug or an incorrect setup.
It's most likely a bug in the Windows code that doesn't properly capture audio at 8 kHz.
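One hedged way to check for a capture-rate mismatch like this is to compare the duration a raw dump implies at each candidate rate against how long you actually spoke. A minimal Python sketch, assuming the dump is headerless 16-bit mono PCM (which is what -rawlogdir appears to write):

```python
def raw_durations(num_bytes, sample_width=2, channels=1):
    """Return the clip duration (seconds) implied by common sample rates.

    A headerless PCM dump carries no rate information, so the only
    sanity check is whether the implied duration matches reality.
    """
    frames = num_bytes // (sample_width * channels)
    return {rate: frames / rate for rate in (8000, 16000, 44100)}

# Example: a 32000-byte dump is 2 s of speech at 8 kHz but only 1 s at
# 16 kHz. If you spoke for about 2 s yet the dump plays back doubled,
# choppy, or with repetitions, the capture rate is probably wrong.
durations = raw_durations(32000)
```

If the dump's implied duration at 8 kHz is far off from the wall-clock length of your utterance, that points at the capture path rather than the recognizer.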
Yes, first of all you could try to fix the bug.
Due to the bug, I think.
I was putting it there to match the input sampling rate to the acoustic model; it looks like it is not required.
I tried without it too. Nothing changed on the accuracy front; however, the dump generated now sounds a little better. It still has echoes, clicks, and repetitions, though.
Original (as recorded by Windows in parallel): http://www.mediafire.com/file/zd4zm0dtkjm/turn_right_original.wma
Dumped (as dumped by pocketsphinx_continuous): http://www.mediafire.com/file/v1m2hdntauy/turn_right_dumped.raw
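To make A/B comparisons like this easier, a headerless dump can be wrapped in a WAV container so any player will open it. A sketch assuming 16-bit little-endian mono PCM (adjust `rate` to match the -samprate used; the paths are placeholders):

```python
import wave

def raw_to_wav(raw_path, wav_path, rate=8000, width=2, channels=1):
    """Wrap a headerless PCM dump in a WAV container for playback.

    Assumes 16-bit little-endian mono, which appears to be what
    pocketsphinx_continuous writes with -rawlogdir.
    """
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(width)
        w.setframerate(rate)
        w.writeframes(pcm)
```

Converting the same dump at both 8000 and 16000 and listening to each is a quick way to hear which rate the audio was really captured at.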
The bug seems to be present for 16 kHz as well.
Is the PocketSphinx 0.6 build tested in live input mode?
Thanks and Regards,
Hi,
Just to bypass any possible issues associated with the live input mode, I tried pocketsphinx_batch with the following parameters:
Acoustic model: hub4wsj_sc_8k
Language model: wsj0vp.5000.dmp
Dictionary: cmu07a.dic
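For reference, the three models above are typically wired into a batch run through a control file listing the utterances. A rough sketch (the flag names are as I recall them from pocketsphinx_batch and should be verified against pocketsphinx_batch -help; the file names are placeholders):

```text
# test.ctl -- one utterance id per line, without the extension
numbers
goforward
something

# then, with the .raw files in the current directory:
pocketsphinx_batch -hmm hub4wsj_sc_8k -lm wsj0vp.5000.dmp -dict cmu07a.dic \
    -ctl test.ctl -adcin yes -cepdir . -cepext .raw -hyp results.hyp
```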
I used the audio clips present in the \test\data directory of the PocketSphinx 0.6 package (numbers.raw, goforward.raw, something.raw); these clips are in a native American accent.
Here are the results:
numbers.raw ("Thirty three four six ninety two") Recognized output: "Thirty three four six nine to two"
goforward.raw ("Go forward ten meters") Recognized output: "Go forward and users"
something.raw ("Go somewhere and do something") Recognized output: "Though some wear and you something"
Is this the expected accuracy level, or is there still something amiss?
How can I improve on this ?
Thanks and regards,
This is not accuracy. Accuracy is a number measured on a test database that represents the acoustic properties. The acoustic test database doesn't need to be large, but it should be bigger than 3 sentences.
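As an illustration of measuring error rate rather than eyeballing it, here is a small word error rate (WER) sketch applied to the three utterances above. The function is a standard Levenshtein distance over words, not anything PocketSphinx-specific:

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the first i-1 ref words and first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution / match
        prev = cur
    return prev[len(hyp)] / len(ref)

# the reference/hypothesis pairs from the batch test in this thread
pairs = [
    ("thirty three four six ninety two", "thirty three four six nine to two"),
    ("go forward ten meters", "go forward and users"),
    ("go somewhere and do something", "though some wear and you something"),
]
```

Even on these three clips the per-utterance WER varies a lot, which is exactly why a meaningful accuracy figure needs a larger test set.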
The result itself is expected; I get exactly the same here on Linux.
The obvious thing here is that you are trying to decode commands and number sequences with a language model that's not very suitable for that. The WSJ model is trained for a dictation task on newspaper texts. Careful system design, better language and acoustic models, new features, adaptation, and postprocessing: those are the actions that make a system usable.
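For a command-and-control task like the phrases in this thread, a small grammar is usually a much better fit than the WSJ dictation model. A sketch in JSGF covering the original commands (newer PocketSphinx releases accept a grammar via a -jsgf option; for 0.6 it may need conversion to an FSG with the sphinx_jsgf2fsg tool; treat the flag and tool names as assumptions to verify against your build):

```jsgf
#JSGF V1.0;

grammar commands;

// the exact phrases from the original post
public <command> = stop | welcome | turn left | turn right
                 | go home | bye bye;
```

Restricting the search space this way typically improves command recognition far more than tuning the dictation setup.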
Hi Nshmyrev,
Thanks for validating my output. At least I now know that PocketSphinx is properly up and running on my system. Yes, improving accuracy is a big task with many tweaking handles. I will be trying some of those.
Thanks and regards,