CMU Sphinx / Forums / Speech Recognition Theory: Help begginer to choose direction to take.

cornelyus - 2010-11-12

Hello.. hope you got the files..

3 questions not regarding that issue..

went to the tutorial for adaptation again, when :
./bw \
-hmmdir hub4wsj_sc_8k \
-moddeffn hub4wsj_sc_8k/mdef.txt \
-ts2cbfn .semi. \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-cmn current \
-agc none \
-dictfn arctic20.dic \
-ctlfn arctic20.fileids \
-lsnfn arctic20.transcription \
-accumdir .

"Make sure the arguments here match the parameters in feat.params file inside
the acoustic model folder."

so if using hub4wsj_sc_8k model, i should make sure the command is :

./bw \
-hmmdir hub4wsj_sc_8k \
-moddeffn hub4wsj_sc_8k/mdef.txt \
-ts2cbfn .semi. \
-nfilt 20 \
-lowerf 1 \
-upperf 4000 \
-wlen 0.025 \
-transform dct \
-round_filters no \
-remove_dc yes \
-svspec 0-12/13-25/26-38 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-cmninit 56,-3,1 \
-varnorm no \
-dictfn arctic20.dic \
-ctlfn arctic20.fileids \
-lsnfn arctic20.transcription \
-accumdir .

because i used the exact one from the tutorial website..

What's the difference for adapting to sphinx4? Is it to download the file
hub4opensrc.cd_continuous_8gau , and use it instead of hub4wsj_sc_8k?

If i adapt the acoustic model for myself with arctic.. i can improve this
acoustic adaptation by using this adapted folder, and add more recordings of
my voice?

cornelyus - 2010-11-12

regarding issue number 1, i shouldn't have asked before trying it myself..
sorry for that.. i see now that bw doesn't have some parameters that are in
the feat.params file.. so i think i should use the command exactly as it
is on the tutorial page..
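
A small sketch of the check the tutorial asks for, assuming feat.params holds one "-flag value" pair per line (as the values listed in the question suggest). It reads that text and keeps only the feature flags that already appear in the tutorial's bw command. The flag subset and the function names are illustrative assumptions, not part of SphinxTrain.

```python
import shlex

# Flags that appear in the tutorial's bw command. Treating these as the only
# ones worth comparing is an assumption made for this sketch.
BW_FEATURE_FLAGS = ("-feat", "-svspec", "-cmn", "-agc")

# Values copied from the question; a real script would read feat.params instead.
SAMPLE = """\
-nfilt 20
-lowerf 1
-upperf 4000
-wlen 0.025
-transform dct
-round_filters no
-remove_dc yes
-svspec 0-12/13-25/26-38
-feat 1s_c_d_dd
-agc none
-cmn current
-cmninit 56,-3,1
-varnorm no
"""


def parse_feat_params(text):
    """Parse feat.params-style text ("-flag value" per line) into a dict."""
    params = {}
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)
        if len(tokens) >= 2 and tokens[0].startswith("-"):
            params[tokens[0]] = tokens[1]
    return params


def bw_feature_args(params, flags=BW_FEATURE_FLAGS):
    """Return the chosen feat.params entries as a flat argument list."""
    args = []
    for flag in flags:
        if flag in params:
            args += [flag, params[flag]]
    return args


if __name__ == "__main__":
    print(" ".join(bw_feature_args(parse_feat_params(SAMPLE))))
```

Comparing the printed flags with the ones in the tutorial command is enough to confirm that the two agree for this model.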


Nickolay V. Shmyrev - 2010-11-12

> regarding issue number 1, i shouldn't have asked before trying it myself..
> sorry for that.. i see now that bw doesn't have some parameters that are in
> the feat.params file.. so i think i should use the command exactly as it
> is on the tutorial page..

I updated wiki page anyway

> What's the difference for adapting to sphinx4? Is it to download the file
> hub4opensrc.cd_continuous_8gau , and use it instead of hub4wsj_sc_8k?

See http://cmusphinx.sourceforge.net/wiki/tutorialadapt#other_acoustic_models

You can use hub4 or wsj which goes with sphinx4 distribution

> If i adapt the acoustic model for myself with arctic.. i can improve this
> acoustic adaptation by using this adapted folder, and add more recordings of
> my voice?

Yes, you can


Nickolay V. Shmyrev - 2010-11-12

As for your adaptation, there are several issues here, both on your side and
in pocketsphinx, for example:

1) The audio files you recorded shouldn't have a lot of silence; you need to
trim the silence at the beginning and at the end
2) Recent pocketsphinx requires you to use a language weight of about 1.0 with jsgf
3) There are other bugs
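
Trimming leading and trailing silence can be done by hand in an audio editor, but a rough automated version is easy to sketch. The example assumes 16-bit mono WAV input and uses a plain amplitude threshold; the threshold value, the padding, and the function names are all hypothetical choices for illustration, not anything pocketsphinx provides.

```python
import array
import wave


def trim_silence(samples, threshold=500, pad=0):
    """Drop leading/trailing samples whose absolute amplitude is below threshold.

    `pad` keeps that many extra samples on each side of the detected speech.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return samples[:0]  # nothing above threshold: return an empty sequence
    start = max(loud[0] - pad, 0)
    end = min(loud[-1] + 1 + pad, len(samples))
    return samples[start:end]


def trim_wav(src_path, dst_path, threshold=500, pad_seconds=0.1):
    """Trim a 16-bit mono WAV file (assumes a little-endian host)."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("expected 16-bit mono audio")
        samples = array.array("h", src.readframes(params.nframes))
    pad = int(pad_seconds * params.framerate)
    trimmed = trim_silence(samples, threshold, pad)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count is corrected when the file closes
        dst.writeframes(trimmed.tobytes())
```

A fixed threshold is crude: with a noisy microphone it may need to be raised, and a short pad keeps the first and last phones from being clipped.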

If you do everything properly, you'll get the following result:

TOTAL Words: 102 Correct: 102 Errors: 2
TOTAL Percent correct = 100.00% Error = 1.96% Accuracy = 98.04%
TOTAL Insertions: 2 Deletions: 0 Substitutions: 0

Insertions: 0 Deletions: 0 Substitutions: 0
TOTAL Words: 102 Correct: 102 Errors: 1
TOTAL Percent correct = 100.00% Error = 0.98% Accuracy = 99.02%
TOTAL Insertions: 1 Deletions: 0 Substitutions: 0
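
The totals show 100% correct together with a nonzero error rate because insertions count as errors but do not reduce the number of correctly recognized reference words. A minimal sketch of that arithmetic, assuming the usual word-error-rate definitions (the function name is hypothetical):

```python
def score(words, insertions, deletions, substitutions):
    """Return (percent_correct, error_rate, accuracy), all in percent."""
    errors = insertions + deletions + substitutions
    correct = words - deletions - substitutions  # insertions do not lower this
    percent_correct = 100.0 * correct / words
    error_rate = 100.0 * errors / words
    return percent_correct, error_rate, 100.0 - error_rate


if __name__ == "__main__":
    for ins in (2, 1):
        pc, err, acc = score(102, ins, 0, 0)
        print(f"insertions={ins}: correct {pc:.2f}%, error {err:.2f}%, accuracy {acc:.2f}%")
```

With 102 reference words, two insertions give 98.04% accuracy and one insertion gives 99.02%, matching the figures reported.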

You can download my sample scripts here:

http://rapidshare.com/files/430472798/adapt_test_cornelius.tar.gz

Nickolay V. Shmyrev - 2010-11-14

The bug was fixed in pocketsphinx today. If you update to trunk and run
pocketsphinx with -fwdflat no -bestpath no, you'll get the best results even
without adaptation:

TOTAL Words: 102 Correct: 102 Errors: 0
TOTAL Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
TOTAL Insertions: 0 Deletions: 0 Substitutions: 0

cornelyus - 2010-11-15

> See http://cmusphinx.sourceforge.net/wiki/tutorialadapt#other_acoustic_models
> You can use hub4 or wsj which goes with sphinx4 distribution

I got the sphinx4 distribution.. are you talking about these files?
WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.jar
WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar

> As for your adaptation, there are several issues here, both on your side and
> in pocketsphinx, for example:
>
> 1) The audio files you recorded shouldn't have a lot of silence; you need to
> trim the silence at the beginning and at the end
> 2) Recent pocketsphinx requires you to use a language weight of about 1.0 with jsgf
> 3) There are other bugs

thanks, will take this into consideration..

what's this "language weight" parameter?

> The bug was fixed in pocketsphinx today. If you update to trunk and run
> pocketsphinx with -fwdflat no -bestpath no, you'll get the best results even
> without adaptation:

" -fwdflat " and "-bestpath"
those are the kind of parameters i am yet to understand what they "control"..
care to explain?
all i got was :
-fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)
-bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)

Can't thank you enough for your time to attend this subject... hope to repay it
in the future..

Nickolay V. Shmyrev - 2010-11-15

> I got the sphinx4 distribution.. are you talking about these files?
> WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.jar
> WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar

Yes

> what's this "language weight" parameter?

-lw

" -fwdflat " and "-bestpath" those are the kind of parameters i am yet to
understand what they "control".. care to explain?

pocketsphinx analyzes the audio multiple times to get the best recognition
result. After a first approximation is calculated, pocketsphinx tries to
recognize all the words that were detected again; then, from the result of the
second pass, it tries to find the best combination of words. This improves
accuracy on a large vocabulary. For a small grammar like yours it's not needed.
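
To make the last pass more concrete: the option help describes -bestpath as a Dijkstra search over the word lattice. The toy sketch here shows that idea on a hand-made lattice. It is not pocketsphinx's implementation; the node numbers, words, and costs are invented for illustration, with each cost standing in for a negated log-probability.

```python
import heapq


def best_path(edges, start, end):
    """Dijkstra search over a word lattice.

    `edges` maps a node to a list of (next_node, word, cost) tuples, where cost
    is non-negative (for example a negated log-probability).
    Returns (total_cost, [words]) or None if `end` is unreachable.
    """
    queue = [(0.0, start, [])]
    done = set()
    while queue:
        cost, node, words = heapq.heappop(queue)
        if node == end:
            return cost, words
        if node in done:
            continue
        done.add(node)
        for nxt, word, edge_cost in edges.get(node, []):
            if nxt not in done:
                heapq.heappush(queue, (cost + edge_cost, nxt, words + [word]))
    return None


# Hypothetical lattice: "EIGHT" is locally cheaper than "ACE", but the full
# path through "ACE" has the lower total cost.
LATTICE = {
    0: [(1, "ACE", 10), (2, "EIGHT", 8)],
    1: [(3, "OF", 2)],
    2: [(3, "OF", 10)],
    3: [(4, "SPADES", 5), (4, "HEARTS", 9)],
}

if __name__ == "__main__":
    print(best_path(LATTICE, 0, 4))
```

With a small grammar the lattice has few competing paths, which is consistent with the advice that the extra passes are unnecessary there.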

cornelyus - 2010-11-15

does that mean a faster recognition cycle?


cornelyus - 2010-11-15

btw.. did what you recommended and got same results for before adaptation (98)
and got the same after adaptation ( 98 instead of 99 like you ) don't
understand why..

Got to 100% with last version and those parameters you told...


Nickolay V. Shmyrev - 2010-11-15

> does that mean a faster recognition cycle?

Yes

> don't understand why.

I don't understand why either; you can check the adapted model in my files.
But it doesn't matter, I suppose.


cornelyus - 2010-11-15

> I don't understand why either; you can check the adapted model in my files.
> But it doesn't matter, I suppose.

got the sphinxbase + pocketsphinx snapshot .. compiled both.. did
pocketsphinx_batch \
-hmm hub4wsj_sc_8k_adapt \
-dict lm/cmu07a.dic \
-jsgf cards.gram \
-ctl transcription.fileids \
-adcin yes \
-cepext .wav \
-cepdir wav \
-wip 0.1 \
-lw 1 \
-hyp after_example.hyp

and am getting 98%.. weird :(

with -fwdflat no -bestpath no get 100% like you...


cornelyus - 2010-11-15

unpacked your files, and ran the adapt.sh script.. result.. the same 98% for
no adaptation and adaptation...although 98% seems pretty good , i find awkward
i can't get same results as you did..

-in pocketsphinx_continuous i can use -fwdflat and -bestpath right?!

-was re-reading past posts and you mention

> Many issues needs to be implemented like noise skipping.

how can I try implementing noise skipping? Or is this going deep on the code?

-what tool did you use to "cut" the silence on the beginning and end of my wav files?


cornelyus - 2010-11-15

forgot to ask previously..
I am on a windows host, and using virtualbox with linux ubuntu...
on sound preferences i can see my microphone input, but when running
pocketsphinx_continuous this error appears:
"ad_oss.c(103): Failed to open audio device(/dev/dsp): No such file or
directory" ..

when doing

arecord -l

i get this :
"* List of CAPTURE Hardware Devices *
card 1: Device , device 0: USB Audio
Subdevices: 1/1
Subdevice #0: subdevice #0"

Can't I refer the microphone with -adcdev? maybe this is a limitation of using
linux on a virtual machine?

thank you

Regarding last issue, was able to solve it... must have alsa drivers
installed.
Must add USB sound device on virtual machine settings.. and all set it
seems...
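
On the -adcdev question: ALSA names capture hardware by card and device number, so the listing from arecord -l can be turned into a device string mechanically. The sketch assumes pocketsphinx was built against the ALSA backend (the error quoted earlier came from the OSS backend, ad_oss.c) and that a "plughw:card,device" name is acceptable there; both are assumptions, and the function name is hypothetical.

```python
import re

# Matches lines such as "card 1: Device , device 0: USB Audio".
CARD_RE = re.compile(r"card (\d+):.*?device (\d+):")


def alsa_capture_devices(arecord_output):
    """Return ALSA device strings such as 'plughw:1,0' for each listed device."""
    return [f"plughw:{card},{dev}" for card, dev in CARD_RE.findall(arecord_output)]


if __name__ == "__main__":
    listing = (
        "card 1: Device , device 0: USB Audio\n"
        "  Subdevices: 1/1\n"
        "  Subdevice #0: subdevice #0\n"
    )
    print(alsa_capture_devices(listing))
```

The resulting string is what one would try passing as the -adcdev value.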

Don't wanna be pushing harder but need to ask this...
Now tried pocketsphinx_continuous with the parameters talked before ( -lw,
-fwdflat, -bestpath) . First impressions, it does seem faster to decode...
But am I losing data on the output? Because before I had info about
fsg_search.c and ps_lattice.c

example :

INFO: cmn_prior.c(121): cmn_prior_update: from < 48.20 -4.79  0.51  0.66 -0.83  1.18 -1.27  0.87 -0.93  1.14 -1.04  0.53 -0.62 >
INFO: cmn_prior.c(139): cmn_prior_update: to   < 45.29 -5.28  0.35  0.71 -0.85  1.39 -1.31  1.00 -0.78  0.98 -0.83  0.47 -0.53 >
INFO: fsg_search.c(1026): 426 frames, 44157 HMMs (103/fr), 48666 senones (114/fr), 20759 history entries (48/fr)

INFO: fsg_search.c(1403): Start node ACE.0:5:7
INFO: fsg_search.c(1403): Start node TWO.0:5:12
INFO: fsg_search.c(1403): Start node EIGHT.0:5:14
INFO: fsg_search.c(1403): Start node <sil>.0:2:30
INFO: fsg_search.c(1442): End node <sil>.354:358:425 (-1014)
INFO: fsg_search.c(1658): lattice start node <s>.0 end node <sil>.354
INFO: ps_lattice.c(1351): Normalizer P(O) = alpha(<sil>:354:425) = -2119991
INFO: ps_lattice.c(1389): Joint P(O,S) = -2153008 P(S|O) = -33017
000000004: ACE OF SPADES ACE OF CLUBS ACE OF HEARTS

Also, i was using before

prob = ps_get_prob(ps, &uttid);
conf = logmath_exp(ps_get_logmath(ps), prob);

to get confidence on recognized utterance... but now with ( -lw 1, -fwdflat
no, -bestpath no) confidence this way seems to be always 1 (??!!)

Thought i could get a confidence percentage, and eliminate utterances that
were recognized below a threshold defined ( for example 50%)...
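
The conversion and the threshold idea can be sketched outside the decoder. The example assumes the log-probabilities are integers in base 1.0001 (taken here to be pocketsphinx's default -logbase, which is an assumption about the configuration in use) and mirrors what logmath_exp does: raise the base to the given power. The function names and the 0.5 threshold are illustrative.

```python
import math

LOG_BASE = 1.0001  # assumed: pocketsphinx's default -logbase value


def log_to_prob(logprob, base=LOG_BASE):
    """Convert an integer log-probability in the given base to a linear value."""
    return math.exp(logprob * math.log(base))


def accept(logprob, threshold=0.5):
    """Keep a hypothesis only if its posterior reaches the threshold."""
    return log_to_prob(logprob) >= threshold


if __name__ == "__main__":
    for lp in (0, -33017):
        print(lp, log_to_prob(lp), accept(lp))
```

A log-probability of 0 converts to exactly 1.0, which is what a confidence that is "always 1" looks like, while the P(S|O) = -33017 from the log above converts to a few percent and would be rejected at a 50% threshold.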

Thanks again..

Nickolay V. Shmyrev - 2010-11-16

> how can I try implementing noise skipping?

There are many algorithms for noise cancellation; you can find something small
on Google and implement it.

> Or is this going deep on the code?

Not really

> -what tool did you use to "cut" the silence on the beginning and end of my
> wav files?

Audacity

> to get confidence on recognized utterance... b

Posterior confidence only makes sense with a large vocabulary. You have two
choices:

Implement phone-based confidence like in sphinx2

Don't use confidence at all.

cornelyus - 2010-11-16

Hey..

before reading your post read on the API this

"Get posterior probability.

Note:
Unless the -bestpath option is enabled, this function will always return zero
(corresponding to a posterior probability of 1.0). Even if -bestpath is
enabled, it will also return zero when called on a partial result. Ongoing
research into effective confidence annotation for partial hypotheses may
result in these restrictions being lifted in future versions."

1.Any documentation for that "phone-base confidence" of sphinx2?

I'll check for those noise cancellation algorithms you are talking about...


cornelyus - 2010-11-16

tried pocketsphinx_continuous

pocketsphinx_continuous.exe -hmm hub4wsj_sc_8k_adapt -dict cmu07a.dic -jsgf
cards.gram -lw 1 -fwdflat no -bestpath no

Even though it identifies what i am saying, it recognizes a lot of "garbage"
:S.. while i was talking to a friend, it was picking random stuff i was
saying, and without confidence level to measure, i can't eliminate these
utterances... kind of lost here now..


cornelyus - 2010-11-16

> Implement phone-based confidence like in sphinx2

any documentation to know where to look?

cornelyus - 2010-11-16

when searching thru the forums found this:
https://sourceforge.net/projects/cmusphinx/forums/forum/5471/topic/3400265

seems i am crashing on the same problem... any news concerning that issue?

on a side note.. can't i edit previous posts here?


Nickolay V. Shmyrev - 2010-11-16

> any documentation to know where to look?

No, there is no documentation, only sphinx2 sources

> on a side note.. can't i edit previous posts here?

No, you can't


cornelyus - 2010-11-16

will "dig in" on sphinx2 sources..

any tips on the other questions?

Once again..thanks for your time..

