Hello :),
I have been testing pocketsphinx and have had pretty good results so far.
Around 60% accuracy :). My main problem now is that pocketsphinx seems to be
skipping entire sentences. I have tried with several recordings, and that
doesn't seem to be the problem. Here are my test parameters:
pocketsphinx_continuous -infile 668.wav -hmm communicator -lm
tools/lm_giga_64k_nvp_3gram.arpa.DMP -beam 1e-48 -wbeam 1e-40 -ci_pbeam 1e-16
-subvqbeam 1e-2 -maxhmmpf 5000 -maxcdsenpf 1000 -maxwpf 9 -ds 2 -lpbeam 4e-30
-lponlybeam 7e-60 -kdmaxdepth 7 -lw 6 -kdmaxbbi 11 -wip .2 -topn 6 -silprob 2
-samprate 8000
Here is a sample of one of my recordings:
http://www.mediafire.com/?2bbltkwv3lf1tfz
I am using PocketSphinx v0.5.99 snapshot and the communicator model.
Any suggestions are welcome :)
It doesn't skip anything on your recording. If you think it does, you'd better
provide more information about that.
This doesn't sound like a reasonable value for silence probability.
Thanks for your reply :)
What I mean when I say I think it's skipping parts of my recording is that
when I run pocketsphinx with my recording, I assume it will try to guess every
word. And even if it doesn't get it right, I should have roughly the same
number of words. So when my recording says something like this:
{Municipal bonds next financial landmines as wall street nervously watches
the sovereign debt crisis unfold in Greece another potential landmine is
looming closer to home. One that could bring us cities and towns to their
knees and force the federal to cough up another bail out package and
potentially send the unemployment rate much higher. the danger this time
municipal debt. and local governments are frantically scrambling to meet
budget shortfalls with high unemployment and shaky consumer confidence in
less income tax and smaller sales tax revenues for government offers. At the
same time falling home prices and rising foreclosure.}
I get:
{municipal bond next airlines at wall street auburn decorative some tawdry
another agree u. s. city to capture the heat forced federal to cough up
another bail out package of government for frantically scrambling to meet
budget shortfall of high unemployment in cheeky consumer confidence eat less
income tax and smaller scale tax revenues for government caucus at the same
time on home prices and rising foreclosure will start}
It seems like I am missing words where I added "X":
{municipal bond next airlines at wall street auburn decorative some tawdry
XXXXXX XXXXXXX XXXXXXX XXXXXXX XX XXXXX XXXX XXXX XXXXX XXXXXX XXXXXXX XX
XXXXXX. XXX XXX another agree u. s. city to capture the heat forced federal to
cough up another bail out package of government for XXXXXXX XXXXXX XXXXXXX
XXXXXX XXXXX XXXX. XXX XXXXX XXX XXXXXX XXXXX. XXX XXXX XXXXX XXXX frantically
scrambling to meet budget shortfall of high unemployment in cheeky consumer
confidence eat less income tax and smaller scale tax revenues for government
caucus at the same time on home prices and rising foreclosure will start}
Is there something I can do to improve this?
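One way to check whether words are really being dropped, rather than just misrecognized, is to align the expected text against the decoder output word by word and count the deletions separately from the substitutions. The following is only a minimal sketch of that idea in Python, using a plain minimum-edit-distance alignment; the helper names `normalize` and `align_counts` are made up for illustration and are not part of pocketsphinx or its scoring scripts.

```python
import re


def normalize(text):
    """Lowercase the text and keep only word-like tokens (hypothetical helper)."""
    return re.findall(r"[a-z']+", text.lower())


def align_counts(ref_words, hyp_words):
    """Return (substitutions, deletions, insertions) for one minimum-edit alignment."""
    n, m = len(ref_words), len(hyp_words)
    # cost[i][j] = minimum number of edits turning ref_words[:i] into hyp_words[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(ref_words[i - 1] != hyp_words[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # reference word deleted
                cost[i][j - 1] + 1,             # extra word inserted
            )

    # Walk back from the end to classify each edit.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(ref_words[i - 1] != hyp_words[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                subs += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


if __name__ == "__main__":
    # A short fragment taken from the transcripts quoted above.
    ref = normalize("local governments are frantically scrambling to meet budget shortfalls")
    hyp = normalize("government for frantically scrambling to meet budget shortfall")
    subs, dels, ins = align_counts(ref, hyp)
    print("reference words:", len(ref), "hypothesis words:", len(hyp))
    print("substitutions:", subs, "deletions:", dels, "insertions:", ins)
```

A large deletion count concentrated in one stretch of the recording would support the "skipped sentences" reading, while mostly substitutions would point to ordinary recognition errors. Note that when several alignments have the same cost, the split between the three categories can differ from what another scoring tool reports.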
I'm still experimenting, so I'm not quite sure what's reasonable yet :p. I
tried "-silprob 5e-03" but got the exact same result. I will keep
experimenting, but am I getting closer to something appropriate for my
configuration?
Sorry, I still can't reproduce your results. Here it gives a less accurate but
more consistent result. The communicator model used is the semi-continuous
model from the downloads, and the language model was converted from
lm_giga_64k with sphinx_lm_convert. The command line is the same as yours.
Pocketsphinx and sphinxbase are from today's subversion trunk.
*** *** MUNICIPAL BONDS next *** *** FINANCIAL LANDMINES AS wall *** STREET NERVOUSLY WATCHES the SOVEREIGN DEBT CRISIS UNFOLD IN GREECE ANOTHER POTENTIAL LANDMINE IS LOOMING CLOSER to home one THAT could bring us CITIES AND TOWNS TO THEIR KNEES AND FORCE THE FEDERAL TO COUGH up another BAIL OUT package and POTENTIALLY SEND THE UNEMPLOYMENT RATE much HIGHER THE DANGER THIS TIME MUNICIPAL DEBT AND local governments are frantically SCRAMBLING TO MEET budget shortfalls WITH HIGH UNEMPLOYMENT AND SHAKY consumer confidence IN LESS INCOME TAX and smaller sales tax revenues for government OFFERS AT THE SAME time FALLING home prices *** *** AND RISING FORECLOSURE (B)
UNITA WILL NOT AND next BY NATURAL WHEN OUR INSIDE wall THREE NEVER STOOD
WATCHING the STARTER AND A HARD TIME HOLDING READS ANOTHER DETENTION WHEN
MCGINLEY CLOSE to home one NIGHT could bring us *** *** *** *** *** SEE
CONTRAPTION ERNIE FORCED A PENALTY CALL up another *** BAILOUT package and ***
INTENSELY TATIANA TWENTY READ much FIRE COULD BE TREATED CHINESE HOPE THAT THE
local governments are frantically *** *** TRIUMPHANTLY budget shortfalls ***
*** GUYANA POIGNANCY TO consumer confidence SEEMED LIKE THE COMPACT and
smaller sales tax revenues for government *** COFFERS ARE TICKING time BOMB
home prices IN WRITING FOR POLITICAL STIR (B)
Words: 100 Correct: 32 Errors: 75 Percent correct = 32.00% Error = 75.00%
Accuracy = 25.00%
Insertions: 7 Deletions: 12 Substitutions: 56
TOTAL Words: 100 Correct: 32 Errors: 75
TOTAL Percent correct = 32.00% Error = 75.00% Accuracy = 25.00%
TOTAL Insertions: 7 Deletions: 12 Substitutions: 56
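For anyone reading the summary above, the figures are consistent with the usual word-error bookkeeping. The sketch below is an assumption about how such a summary can be derived from the raw counts, not a copy of the scoring script's code; the function name `scoring_summary` is made up for illustration. It reproduces the reported numbers from 100 reference words, 7 insertions, 12 deletions and 56 substitutions.

```python
def scoring_summary(ref_words, insertions, deletions, substitutions):
    """Derive summary figures from alignment counts (assumed formulas)."""
    # A reference word is correct if it was neither deleted nor substituted.
    correct = ref_words - deletions - substitutions
    # Every insertion, deletion and substitution counts as one error.
    errors = insertions + deletions + substitutions
    return {
        "correct": correct,
        "errors": errors,
        "percent_correct": 100.0 * correct / ref_words,
        "error_rate": 100.0 * errors / ref_words,
        # Accuracy also penalizes insertions, so it can be lower than percent correct.
        "accuracy": 100.0 * (ref_words - errors) / ref_words,
    }


if __name__ == "__main__":
    summary = scoring_summary(ref_words=100, insertions=7, deletions=12, substitutions=56)
    for name, value in summary.items():
        print(name, value)
```

This also explains why percent correct (32%) and error (75%) do not add up to 100%: the 7 insertions count as errors but do not reduce the number of correct reference words.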