I am using the Kaldi toolkit to train an HMM to classify acoustic events. I have only isolated acoustic events, so I model each event as one word with a single monophone. For example, I have the word "knock" and its monophone is KN. But when I train my HMM and then move to decoding, I get a WER above 40%. In the exp/mono0a/decode/scoring directory I see that some acoustic files have been scored with more than one event.
For example: "file1 knockdoor phoreringing", but I would like it to be scored with one event or none.
As a result I get insertion and deletion errors, which lead to a bad WER.
I define my grammar in the task.arpabo file like this:
\data\
ngram 1=8
-1 knock
-1 doorslam
-1 steps
-1 chairmoving
-1 spoon # these are the words #
-1 paperwork
-1 keyjingle
-1 speech
-99 <S>
-1 </S>
\end\
I also tried adding a word insertion penalty at the decoding stage with the option "--word_ins_penalty". I tried the values 5, 10, 100 and 200, but nothing seems to eliminate the insertion and deletion errors.
My question: How can I force my grammar to produce sentences of only one word?
Your help would mean a lot, as I have been stuck on this for days!
Thanks in advance!
(P.S. I followed the yesno example.)
Last edit: Konstantinos Themelis 2015-07-01
When you say "scoring", I think what you are really talking about is
decoding. Scoring is the process of computing WERs from the outputs.
If you want a grammar that outputs exactly one word per sentence, you
should construct it as an FST (it will actually be a finite state
acceptor as the input and output symbols will be the same). Something
like:
0 1 knock knock 0.0
0 1 ring ring 0.0
[etc.]
1 0.0
The 0.0 means zero cost; these values are interpreted as negative
log-probs and would normally be positive.
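To make the one-word grammar concrete, here is a minimal sketch that writes out the text-form acceptor for the event list from the question. The function name `one_word_grammar` and the `uniform_probs` switch are made up for illustration and are not part of Kaldi or OpenFst. With `uniform_probs=True` each arc gets the cost -ln(1/N) = ln(N), i.e. the negative log-prob of a uniform choice among N words; otherwise every arc has cost 0.0 as in the reply above.

```python
import math

def one_word_grammar(words, uniform_probs=False):
    """Return text-format FST lines that accept exactly one word from `words`."""
    # Costs are negative log-probabilities: 0.0 means "free", while
    # -ln(1/N) = ln(N) spreads the probability evenly over the N words.
    cost = math.log(len(words)) if uniform_probs else 0.0
    # One arc per word from start state 0 to final state 1;
    # input and output symbols are identical, so this is an acceptor.
    lines = [f"0 1 {w} {w} {cost:.4f}" for w in words]
    # State 1 is final with zero cost and has no outgoing arcs,
    # so no second word can follow the first one.
    lines.append("1 0.0")
    return lines

events = ["knock", "doorslam", "steps", "chairmoving",
          "spoon", "paperwork", "keyjingle", "speech"]
grammar_lines = one_word_grammar(events)
print("\n".join(grammar_lines))
```

Because state 1 has no arc back to state 0, every path through this grammar contains exactly one word, which is what rules out the multi-word outputs.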
BTW, I am showing the text-form FST with words instead of integer
symbols. The RM setup might be a good example to show you how to turn
this into a binary FST using fstcompile. Also see the tutorial at
openfst.org.
Dan
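As a rough sketch of the compile step mentioned above: the helper below builds a word symbol table and the fstcompile command line without running it. The file names (words.txt, G.txt, G.fst) and both helper functions are assumptions for illustration. In a real Kaldi lang directory you would use the existing words.txt, so that the integer ids match the lexicon FST, rather than generating a new table.

```python
import shlex

def symbol_table_lines(words):
    """Build an OpenFst-style symbol table: one 'symbol id' pair per line."""
    # Id 0 is reserved for epsilon by OpenFst convention.
    lines = ["<eps> 0"]
    lines += [f"{w} {i}" for i, w in enumerate(words, start=1)]
    return lines

def fstcompile_command(symbols_path, text_fst_path, binary_fst_path):
    """Return the argument list for compiling a text FST to binary form."""
    return [
        "fstcompile",
        f"--isymbols={symbols_path}",   # maps input words to integer ids
        f"--osymbols={symbols_path}",   # same table, since this is an acceptor
        "--keep_isymbols=false",        # do not embed the tables in the output
        "--keep_osymbols=false",
        text_fst_path,
        binary_fst_path,
    ]

events = ["knock", "doorslam", "steps", "chairmoving",
          "spoon", "paperwork", "keyjingle", "speech"]
table = symbol_table_lines(events)
command = fstcompile_command("words.txt", "G.txt", "G.fst")
print("\n".join(table))
print(shlex.join(command))
```

The printed command can be pasted into a shell once words.txt and G.txt exist. Kaldi recipes usually also sort the arcs of G.fst by input label (fstarcsort) before building the decoding graph, so check the RM scripts for the exact pipeline.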
On Wed, Jul 1, 2015 at 6:31 PM, Daniel Povey danielpovey@users.sf.net wrote: