because I used the exact one from the tutorial website.
What's the difference when adapting for Sphinx4? Do I download the file
hub4opensrc.cd_continuous_8gau and use it instead of hub4wsj_sc_8k?
If I adapt the acoustic model to my voice with arctic, can I later improve
the adaptation by starting from this adapted folder and adding more
recordings of my voice?
Regarding issue number 1, I shouldn't have asked before trying it myself,
sorry about that. I see now that bw doesn't accept some of the parameters
that are in the feat.params file, so I think I should use the command exactly
as it is on the tutorial page.
I updated the wiki page anyway.
What's the difference when adapting for Sphinx4? Do I download the file
hub4opensrc.cd_continuous_8gau and use it instead of hub4wsj_sc_8k?
You can use the hub4 or WSJ models that ship with the Sphinx4 distribution.
If I adapt the acoustic model to my voice with arctic, can I later improve
the adaptation by starting from this adapted folder and adding more
recordings of my voice?
Yes, you can.
As for your adaptation, there are several issues here, both on your side and
in pocketsphinx, for example:
1) The audio files you recorded shouldn't contain a lot of silence; you need
to trim the silence at the beginning and at the end.
2) Recent pocketsphinx versions require a language weight of about 1.0 with
JSGF grammars.
3) There are other bugs.
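Trimming can be scripted. Below is a minimal Python sketch, assuming 16-bit mono PCM WAV input, 10 ms frames and an RMS threshold of 500 (both guesses you would tune per recording). It is a plain energy threshold, not what pocketsphinx or any tool mentioned in this thread does internally.

```python
import array
import sys
import wave


def frame_rms(frame):
    """Root-mean-square level of one frame of integer samples."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def trim_silence(samples, frame_len, threshold):
    """Drop leading and trailing frames whose RMS is below `threshold`.

    Works on any sliceable sequence of samples (list or array.array).
    Returns an empty slice if every frame is below the threshold.
    """
    n_frames = (len(samples) + frame_len - 1) // frame_len
    loud = [frame_rms(samples[i * frame_len:(i + 1) * frame_len]) >= threshold
            for i in range(n_frames)]
    if not any(loud):
        return samples[:0]
    first = loud.index(True)
    last = n_frames - 1 - loud[::-1].index(True)
    return samples[first * frame_len:(last + 1) * frame_len]


def trim_wav(in_path, out_path, threshold=500, frame_ms=10):
    """Trim silence from a 16-bit mono PCM WAV file (threshold is a guess; tune it)."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        samples = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    frame_len = max(1, params.framerate * frame_ms // 1000)
    trimmed = trim_silence(samples, frame_len, threshold)
    if sys.byteorder == "big":
        trimmed.byteswap()  # back to little-endian before writing
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)  # the wave module fixes the frame count on close
        dst.writeframes(trimmed.tobytes())
```

Trimming whole frames keeps the logic simple; the cost is up to one frame (10 ms here) of lost or retained audio at each end.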
If you do everything properly, you'll get the following results.
Before adaptation:
TOTAL Words: 102 Correct: 102 Errors: 2
TOTAL Percent correct = 100.00% Error = 1.96% Accuracy = 98.04%
TOTAL Insertions: 2 Deletions: 0 Substitutions: 0
After adaptation:
TOTAL Words: 102 Correct: 102 Errors: 1
TOTAL Percent correct = 100.00% Error = 0.98% Accuracy = 99.02%
TOTAL Insertions: 1 Deletions: 0 Substitutions: 0
The bug was fixed in pocketsphinx today. If you update to trunk and run
pocketsphinx with -fwdflat no -bestpath no, you'll get the best results even
without adaptation:
TOTAL Words: 102 Correct: 102 Errors: 0
TOTAL Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
TOTAL Insertions: 0 Deletions: 0 Substitutions: 0
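The TOTAL lines follow ordinary word-error accounting, and the percentages can be checked by hand: with 102 reference words and 2 insertions, Error = 2/102 ≈ 1.96% and Accuracy = 100% - 1.96% = 98.04%, while Percent correct stays at 100% because insertions are not counted against it. A small Python sketch that reproduces the summaries above (the definitions are inferred from these numbers, not taken from the scoring tool's source):

```python
def summarize(n_words, insertions, deletions, substitutions):
    """Recompute the TOTAL summary figures from raw alignment counts.

    Assumed definitions (standard word-error accounting, consistent with the
    numbers in this thread): insertions do not lower "percent correct", but
    every error type lowers "accuracy".
    """
    errors = insertions + deletions + substitutions
    correct = n_words - deletions - substitutions
    error_pct = 100.0 * errors / n_words
    return {
        "correct": correct,
        "errors": errors,
        "percent_correct": round(100.0 * correct / n_words, 2),
        "error": round(error_pct, 2),
        "accuracy": round(100.0 - error_pct, 2),
    }


# Before adaptation, after adaptation, and trunk without the extra passes.
for ins in (2, 1, 0):
    print(summarize(102, ins, 0, 0))
```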
I got the Sphinx4 distribution. Are you talking about these files?
WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.jar
WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar
As for your adaptation, there are several issues here, both on your side and
in pocketsphinx, for example:
1) The audio files you recorded shouldn't contain a lot of silence; you need
to trim the silence at the beginning and at the end.
2) Recent pocketsphinx versions require a language weight of about 1.0 with
JSGF grammars.
3) There are other bugs.
Thanks, I'll take this into consideration.
What's this "language weight" parameter?
The bug was fixed in pocketsphinx today. If you update to trunk and run
pocketsphinx with -fwdflat no -bestpath no, you'll get the best results even
without adaptation:
"-fwdflat" and "-bestpath": those are the kind of parameters whose purpose I
have yet to understand. Care to explain what they control?
All I found was:
-fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)
-bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)
I can't thank you enough for taking the time to look into this. I hope to
repay it in the future.
I got the Sphinx4 distribution. Are you talking about these files?
WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.jar
WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar
Yes
What's this "language weight" parameter?
-lw
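-lw scales the language model (here, grammar) score against the acoustic score. Decoders conventionally rank hypotheses by acoustic log-likelihood + lw × LM log-probability + one log(wip) penalty per word, where wip is the word insertion penalty (the -wip option in the pocketsphinx_batch command later in this thread). The Python sketch below applies that textbook formula to made-up scores to show how changing lw can flip which hypothesis wins; it is an illustration under that assumption, not pocketsphinx's code.

```python
import math


def combined_score(acoustic, lm, n_words, lw=1.0, wip=1.0):
    """Log-domain score under the common decoder scoring rule:
    acoustic log-likelihood + lw * language-model log-probability
    + one log(wip) penalty per word. An illustration, not pocketsphinx code.
    """
    return acoustic + lw * lm + n_words * math.log(wip)


def best_hypothesis(hyps, lw, wip=1.0):
    """Pick the text of the (text, acoustic, lm) tuple with the highest combined score."""
    return max(hyps, key=lambda h: combined_score(h[1], h[2], len(h[0].split()), lw, wip))[0]


# Hypothetical scores for two competing hypotheses from a card grammar.
hyps = [
    ("two of hearts", -100.0, -6.0),
    ("ten of hearts", -98.0, -9.0),
]

print(best_hypothesis(hyps, lw=1.0))  # grammar weighs in fully: two of hearts
print(best_hypothesis(hyps, lw=0.5))  # acoustics dominate: ten of hearts
```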
"-fwdflat" and "-bestpath": those are the kind of parameters whose purpose I
have yet to understand. Care to explain what they control?
pocketsphinx analyzes the audio in multiple passes to get the best
recognition result. After the first approximation is calculated, pocketsphinx
recognizes all the words that were detected once more, and then it searches
the result of that second pass for the best combination of words. This
improves accuracy on a large vocabulary. For a small grammar like yours it's
not needed.
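The third pass searches the word lattice for the cheapest path, which the -bestpath help text calls a Dijkstra search. The sketch below shows that idea on a toy lattice in Python; the words-on-edges representation and the costs (think negated log scores, so they are non-negative) are hypothetical simplifications, not pocketsphinx's lattice structures.

```python
import heapq


def best_path(edges, start, end):
    """Lowest-cost word sequence through a word lattice, via Dijkstra.

    edges maps node -> list of (next_node, word, cost); costs must be
    non-negative. Returns (words, cost), or (None, inf) if `end` is unreachable.
    """
    dist = {start: 0.0}
    back = {}  # node -> (previous node, word on the edge used to reach it)
    heap = [(0.0, start)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if node == end:
            break
        for nxt, word, cost in edges.get(node, ()):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                back[nxt] = (node, word)
                heapq.heappush(heap, (nd, nxt))
    if end not in dist:
        return None, float("inf")
    words, node = [], end
    while node != start:
        node, word = back[node]
        words.append(word)
    return words[::-1], dist[end]


# Hypothetical lattice: integer nodes, words on edges.
lattice = {
    0: [(1, "ace", 2.0), (1, "eight", 2.5)],
    1: [(2, "of", 0.5)],
    2: [(3, "spades", 1.0), (3, "clubs", 1.2)],
}
print(best_path(lattice, 0, 3))  # (['ace', 'of', 'spades'], 3.5)
```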
By the way, I did what you recommended and got the same result before
adaptation (98%) and the same after adaptation (98% instead of 99% like you).
I don't understand why.
I got to 100% with the latest version and the parameters you suggested.
I unpacked your files and ran the adapt.sh script. The result: the same 98%
with and without adaptation. Although 98% seems pretty good, I find it odd
that I can't get the same results as you did.
- In pocketsphinx_continuous I can also use -fwdflat and -bestpath, right?
- I was re-reading past posts, and you mentioned:
"Many issues need to be implemented, like noise skipping."
How can I try implementing noise skipping? Or does that mean going deep into
the code?
- What tool did you use to cut the silence at the beginning and end of my
wav files?
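On the noise-skipping question: short of real noise cancellation, a common first step is an energy-based gate that skips stretches staying near the background level. A minimal Python sketch follows. It is not how pocketsphinx handles noise, and the 10th-percentile noise-floor estimate, the power ratio of 3 and the three-frame minimum are assumptions to tune on your own recordings.

```python
def speech_segments(samples, frame_len, factor=3.0, min_frames=3):
    """Find (start, end) sample ranges that rise above an estimated noise floor.

    The noise floor is the 10th-percentile frame energy (mean square), so the
    recording is assumed to contain some noise-only frames. Frames whose energy
    exceeds factor * floor (a power ratio) count as speech; runs shorter than
    `min_frames` (clicks, bumps) are skipped.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    energies = [sum(s * s for s in f) / len(f) for f in frames]
    if not energies:
        return []
    floor = sorted(energies)[len(energies) // 10]
    threshold = max(floor * factor, 1e-9)
    segments, start = [], None
    for i, energy in enumerate(energies + [0.0]):  # trailing 0.0 closes an open run
        if energy > threshold and start is None:
            start = i
        elif energy <= threshold and start is not None:
            if i - start >= min_frames:
                segments.append((start * frame_len, min(i * frame_len, len(samples))))
            start = None
    return segments
```

Only the returned segments would then be passed to the decoder; everything else is treated as noise and skipped.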
I forgot to ask previously:
I am on a Windows host, using VirtualBox with Ubuntu Linux.
In Sound Preferences I can see my microphone input, but when running
pocketsphinx_continuous this error appears:
"ad_oss.c(103): Failed to open audio device(/dev/dsp): No such file or
directory"
When running
arecord -l
I get this:
"* List of CAPTURE Hardware Devices *
card 1: Device , device 0: USB Audio
Subdevices: 1/1
Subdevice #0: subdevice #0"
Can't I refer to the microphone with -adcdev? Or maybe this is a limitation
of running Linux in a virtual machine?
Thank you.
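`arecord -l` lists capture hardware by card and device number, and ALSA conventionally addresses such a device as hw:CARD,DEVICE (plughw:CARD,DEVICE adds automatic format conversion). The small Python diagnostic below, a sketch and not part of pocketsphinx, extracts those names from the listing quoted above and checks for the /dev/dsp node that the OSS backend (ad_oss.c) tries to open. Whether -adcdev accepts an ALSA name depends on which audio backend sphinxbase was built with, so treat that mapping as an assumption to test.

```python
import os
import re

# Matches the "card N: ..., device M: ..." lines printed by `arecord -l`.
CARD_LINE = re.compile(r"^card (\d+):.*?, device (\d+):", re.MULTILINE)


def alsa_capture_names(arecord_output):
    """Turn `arecord -l` output into ALSA device names of the form hw:CARD,DEVICE."""
    return [f"hw:{card},{device}" for card, device in CARD_LINE.findall(arecord_output)]


def oss_device_present(path="/dev/dsp"):
    """True if the OSS device node named in the ad_oss.c error message exists."""
    return os.path.exists(path)


listing = """* List of CAPTURE Hardware Devices *
card 1: Device , device 0: USB Audio
Subdevices: 1/1
Subdevice #0: subdevice #0"""
print(alsa_capture_names(listing))  # ['hw:1,0']
print(oss_device_present())
```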
Regarding the last issue, I was able to solve it: the ALSA drivers must be
installed, and the USB sound device must be added in the virtual machine
settings. It seems all set now.
I don't want to push too hard, but I need to ask this:
I have now tried pocketsphinx_continuous with the parameters discussed before
(-lw, -fwdflat, -bestpath). First impression: it does seem to decode faster.
But am I losing data in the output? Before, I got info from fsg_search.c and
ps_lattice.c.
I also read this in the API documentation for getting the posterior
probability:
"Note:
Unless the -bestpath option is enabled, this function will always return zero
(corresponding to a posterior probability of 1.0). Even if -bestpath is
enabled, it will also return zero when called on a partial result. Ongoing
research into effective confidence annotation for partial hypotheses may
result in these restrictions being lifted in future versions."
1. Is there any documentation for the "phone-based confidence" of sphinx2?
I'll check out the noise cancellation algorithms you mentioned.
I tried:
pocketsphinx_continuous.exe -hmm hub4wsj_sc_8k_adapt -dict cmu07a.dic -jsgf cards.gram -lw 1 -fwdflat no -bestpath no
Even though it identifies what I am saying, it also recognizes a lot of
"garbage" :S. While I was talking to a friend, it kept picking up random
things I was saying, and without a confidence level to measure, I can't
eliminate these utterances. I'm kind of lost here now.
Hello, I hope you got the files.
Three questions not related to that issue:
./bw \
-hmmdir hub4wsj_sc_8k \
-moddeffn hub4wsj_sc_8k/mdef.txt \
-ts2cbfn .semi. \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-cmn current \
-agc none \
-dictfn arctic20.dic \
-ctlfn arctic20.fileids \
-lsnfn arctic20.transcription \
-accumdir .
"Make sure the arguments here match the parameters in feat.params file inside
the acoustic model folder."
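The advice quoted above can be partly mechanized. feat.params holds one "-name value" pair per line, so a short Python sketch can pull out the values to pass to bw. Which options to copy is an assumption here: only those that the tutorial's bw command itself uses (-feat, -svspec, -cmn, -agc). Front-end settings such as -nfilt or -lowerf describe feature extraction and are left out; this is not a verified list of everything bw accepts.

```python
# Options passed by the tutorial's bw command; assumed here to be the feature
# settings worth mirroring from feat.params (not a full list of bw options).
BW_FEATURE_OPTIONS = ("-feat", "-svspec", "-cmn", "-agc")


def read_feat_params(text):
    """Parse feat.params content: one '-name value' pair per line."""
    params = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("-"):
            params[parts[0]] = " ".join(parts[1:])
    return params


def bw_feature_args(params):
    """Return [option, value, ...] for the feature options bw should mirror."""
    args = []
    for option in BW_FEATURE_OPTIONS:
        if option in params:
            args += [option, params[option]]
    return args


example = """-nfilt 20
-lowerf 1
-upperf 4000
-svspec 0-12/13-25/26-38
-feat 1s_c_d_dd
-agc none
-cmn current
-varnorm no"""
print(" ".join(bw_feature_args(read_feat_params(example))))
# -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -cmn current -agc none
```

In practice the text would come from reading hub4wsj_sc_8k/feat.params instead of the inline example.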
So if I'm using the hub4wsj_sc_8k model, should I make sure the command is:
./bw \
-hmmdir hub4wsj_sc_8k \
-moddeffn hub4wsj_sc_8k/mdef.txt \
-ts2cbfn .semi. \
-nfilt 20 \
-lowerf 1 \
-upperf 4000 \
-wlen 0.025 \
-transform dct \
-round_filters no \
-remove_dc yes \
-svspec 0-12/13-25/26-38 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-cmninit 56,-3,1 \
-varnorm no \
-dictfn arctic20.dic \
-ctlfn arctic20.fileids \
-lsnfn arctic20.transcription \
-accumdir .
because I used the exact one from the tutorial website.
What's the difference when adapting for Sphinx4? Do I download the file
hub4opensrc.cd_continuous_8gau and use it instead of hub4wsj_sc_8k?
If I adapt the acoustic model to my voice with arctic, can I later improve
the adaptation by starting from this adapted folder and adding more
recordings of my voice?
See http://cmusphinx.sourceforge.net/wiki/tutorialadapt#other_acoustic_models
You can use the hub4 or WSJ models that ship with the Sphinx4 distribution.
Yes, you can.
You can download my sample scripts here:
http://rapidshare.com/files/430472798/adapt_test_cornelius.tar.gz
pocketsphinx analyzes the audio in multiple passes to get the best
recognition result. After the first approximation is calculated, pocketsphinx
recognizes all the words that were detected once more, and then it searches
the result of that second pass for the best combination of words. This
improves accuracy on a large vocabulary. For a small grammar like yours it's
not needed.
Does that mean a faster recognition cycle?
By the way, I did what you recommended and got the same result before
adaptation (98%) and the same after adaptation (98% instead of 99% like you).
I don't understand why.
I got to 100% with the latest version and the parameters you suggested.
Yes
I don't understand why either; you can check the adapted model in my files.
But I suppose it doesn't matter.
I got the sphinxbase + pocketsphinx snapshot, compiled both, and ran:
pocketsphinx_batch \
-hmm hub4wsj_sc_8k_adapt \
-dict lm/cmu07a.dic \
-jsgf cards.gram \
-ctl transcription.fileids \
-adcin yes \
-cepext .wav \
-cepdir wav \
-wip 0.1 \
-lw 1 \
-hyp after_example.hyp
and I'm getting 98%. Weird :(
With -fwdflat no -bestpath no I get 100%, like you.
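In the results earlier in this thread, 98% corresponded to two inserted words out of 102. Counts like these come from aligning each decoded hypothesis against its reference transcription. A minimal Levenshtein sketch in Python (not the scoring tool used in this thread) that returns substitution, deletion and insertion counts for one utterance, assuming the reference and the hypothesis have already been reduced to word lists:

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for the cheapest
    Levenshtein alignment of hypothesis words `hyp` against reference `ref`.
    """
    n, m = len(ref), len(hyp)
    # best[i][j] = (total_errors, subs, dels, ins) aligning ref[:i] with hyp[:j]
    best = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        best[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = best[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, k)  # match
            else:
                diag = (t + 1, s + 1, d, k)  # substitution
            t, s, d, k = best[i - 1][j]
            up = (t + 1, s, d + 1, k)  # reference word deleted
            t, s, d, k = best[i][j - 1]
            left = (t + 1, s, d, k + 1)  # extra hypothesis word inserted
            best[i][j] = min(diag, up, left)  # fewest total errors wins
    _, subs, dels, ins = best[n][m]
    return subs, dels, ins


print(align_counts("two of hearts".split(), "two of hearts the".split()))  # (0, 0, 1)
```

Summing these counts over all utterances gives the TOTAL Insertions/Deletions/Substitutions figures.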
I unpacked your files and ran the adapt.sh script. The result: the same 98%
with and without adaptation. Although 98% seems pretty good, I find it odd
that I can't get the same results as you did.
- In pocketsphinx_continuous I can also use -fwdflat and -bestpath, right?
- I was re-reading past posts, and you mentioned:
"Many issues need to be implemented, like noise skipping."
How can I try implementing noise skipping? Or does that mean going deep into
the code?
- What tool did you use to cut the silence at the beginning and end of my
wav files?
I forgot to ask previously:
I am on a Windows host, using VirtualBox with Ubuntu Linux.
In Sound Preferences I can see my microphone input, but when running
pocketsphinx_continuous this error appears:
"ad_oss.c(103): Failed to open audio device(/dev/dsp): No such file or
directory"
When running
arecord -l
I get this:
"* List of CAPTURE Hardware Devices *
card 1: Device , device 0: USB Audio
Subdevices: 1/1
Subdevice #0: subdevice #0"
Can't I refer to the microphone with -adcdev? Or maybe this is a limitation
of running Linux in a virtual machine?
Thank you.
Regarding the last issue, I was able to solve it: the ALSA drivers must be
installed, and the USB sound device must be added in the virtual machine
settings. It seems all set now.
I don't want to push too hard, but I need to ask this:
I have now tried pocketsphinx_continuous with the parameters discussed before
(-lw, -fwdflat, -bestpath). First impression: it does seem to decode faster.
But am I losing data in the output? Before, I got info from fsg_search.c and
ps_lattice.c.
Example:
Also, before, I was using
to get the confidence of the recognized utterance, but now with (-lw 1,
-fwdflat no, -bestpath no) the confidence always seems to be 1 (??!!).
I thought I could get a confidence percentage and eliminate utterances that
were recognized below a defined threshold (for example 50%).
Thanks again.
There are many noise cancellation algorithms; you can find something small on
Google and implement it.
Not really.
Audacity.
Posterior confidence only makes sense with a large vocabulary. You have two
choices:
Hey,
before reading your post I read this in the API documentation:
"Get posterior probability.
Note:
Unless the -bestpath option is enabled, this function will always return zero
(corresponding to a posterior probability of 1.0). Even if -bestpath is
enabled, it will also return zero when called on a partial result. Ongoing
research into effective confidence annotation for partial hypotheses may
result in these restrictions being lifted in future versions."
1. Is there any documentation for the "phone-based confidence" of sphinx2?
I'll check out the noise cancellation algorithms you mentioned.
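The note quoted above means that with -bestpath no the value is always zero, i.e. a probability of 1.0, so a threshold cannot reject anything; that matches the "always 1" observation in this thread. With -bestpath enabled, turning the value into a probability for a cutoff like the 50% mentioned earlier could look like the Python sketch below. It assumes the value is a logarithm in pocketsphinx's log base and that the base is the default -logbase of 1.0001 (check your configuration); the decoded results are hypothetical.

```python
LOG_BASE = 1.0001  # assumption: pocketsphinx's default -logbase; check your configuration


def posterior_probability(log_value, log_base=LOG_BASE):
    """Convert a log-domain posterior expressed in `log_base` units to a probability.

    A value of 0 maps to 1.0, the case the API note describes when
    -bestpath is disabled.
    """
    return log_base ** log_value


def keep_confident(results, threshold=0.5, log_base=LOG_BASE):
    """Keep hypotheses whose posterior probability reaches `threshold`.

    results: iterable of (hypothesis_text, log_posterior) pairs, where
    log_posterior is whatever the decoder reported (hypothetical input here).
    """
    return [text for text, log_value in results
            if posterior_probability(log_value, log_base) >= threshold]


decoded = [("ace of spades", 0), ("two of hearts", -20000), ("king of clubs", -1000)]
print(keep_confident(decoded))  # ['ace of spades', 'king of clubs']
```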
I tried:
pocketsphinx_continuous.exe -hmm hub4wsj_sc_8k_adapt -dict cmu07a.dic -jsgf cards.gram -lw 1 -fwdflat no -bestpath no
Even though it identifies what I am saying, it also recognizes a lot of
"garbage" :S. While I was talking to a friend, it kept picking up random
things I was saying, and without a confidence level to measure, I can't
eliminate these utterances. I'm kind of lost here now.
Is there any documentation to tell me where to look?
While searching through the forums I found this:
https://sourceforge.net/projects/cmusphinx/forums/forum/5471/topic/3400265
It seems I am running into the same problem. Any news on that issue?
On a side note: can't I edit previous posts here?
No, there is no documentation, only the sphinx2 sources.
No, you can't.
I will dig into the sphinx2 sources.
Any tips on the other questions?
Once again, thanks for your time.