I read here (https://github.com/watsonbox/pocketsphinx-ruby/issues/10) about the issue where the decoder gives an incorrect hyp the first time it decodes a file, while from the second decoding on the hyps are correct.
However, I’ve observed a more complex pattern, in which the decoder fluctuates between two hyps over and over.
To reproduce the issue, I did the following:
1) I changed the main function in continuous.c so that the same audio file is decoded 20 times:
    if (cmd_ln_str_r(config, "-infile") != NULL) {
        int i;
        /* decode the same -infile 20 times in a row */
        for (i = 1; i <= 20; i++) {
            recognize_from_file();
        }
    } …
2) I created this grammar (1.jsgf):
    #JSGF V1.0;
    grammar grammar1;
    public <rule1> = (
        /1/ i like to play football |
        /50000000000/ they like to play football
    );
3) I ran pocketsphinx_continuous with that grammar to decode an audio file (attached):
    pocketsphinx_continuous -hmm hmm -dict cmu.dict -jsgf 1.jsgf -infile 1.wav -logfn log.txt
The output is:
they like to play football
they like to play football
they like to play football
they like to play football
they like to play football
they like to play football
i like to play football
i like to play football
they like to play football
i like to play football
i like to play football
they like to play football
i like to play football
i like to play football
they like to play football
i like to play football
i like to play football
they like to play football
i like to play football
i like to play football
Log is attached. I would like to ask:
1. Why does it happen?
2. Can I alter this behavior?
Last edit: Oren G. 2018-01-31
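For scale, the weights in the grammar make "they like to play football" 5×10^10 times more likely a priori than "i like to play football". Assuming the weights act as relative likelihoods normalized over the rule's alternatives (the JSGF specification describes weights as relative likelihoods), "i like…" gets a prior of about 2×10^-11, roughly 24.6 natural-log units (ln 5×10^10) of handicap before any language-weight (-lw) scaling. The helper normalize_weights() below is hypothetical, not a pocketsphinx function; it only shows the arithmetic.

```c
#include <assert.h>

/* Hypothetical helper, not part of pocketsphinx.
 * Assumption: JSGF weights are relative likelihoods that get normalized
 * over the alternatives of one rule. How the decoder scales these
 * probabilities during search (e.g. by -lw) is not modelled here. */
static void normalize_weights(const double *w, double *p, int n)
{
    double total = 0.0;
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        assert(w[i] >= 0.0);   /* weights must be non-negative */
        total += w[i];
    }
    assert(total > 0.0);
    for (int i = 0; i < n; i++)
        p[i] = w[i] / total;   /* probabilities now sum to 1 */
}
```

With w = {1, 50000000000}, p[0] is about 2×10^-11 and p[1] is within 10^-10 of 1, so whenever "i like…" wins, the acoustic evidence (and therefore the normalized features) must differ substantially between passes.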
Judging from your log file, the CMN vector is being updated from utterance to utterance. Eventually it settles into a 3-utterance cycle, which is why you see the repeating pattern in your output.
This doesn't make sense, since you have '-cmn current' in your config; it's acting more like 'prior'. Look at the code and check that the CMN logic is correct...
The 'first utterance is garbage' problem happens in prior mode, when the default CMN values are simply wrong for the utterance at hand.
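The difference between the two modes can be illustrated with a small, self-contained C model. Everything in it is hypothetical: a one-dimensional c0 track stands in for the cepstral vectors, the initial mean of 8.0 and the PRIOR_HWM/PRIOR_WIN decay constants are made up, and cmn_prior_sketch() is a simplified running-mean update, not the actual sphinxbase prior-CMN code. It shows that with 'current'-style normalization the same file yields the same normalized features on every pass, while with 'prior'-style normalization the subtracted mean depends on earlier utterances, so the decoder effectively sees a different input each time.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative model only; not the sphinxbase cmn_prior() implementation. */
#define NFRAMES    300   /* hypothetical utterance length (frames) */
#define PRIOR_HWM  800   /* hypothetical: decay the history past this many frames */
#define PRIOR_WIN  500   /* hypothetical: frames of history kept after decaying */

/* Hypothetical c0 track; the same array is "decoded" on every pass. */
static void make_utterance(double *c0, int n)
{
    for (int i = 0; i < n; i++)
        c0[i] = 10.0 + 0.1 * (i % 20);
}

/* 'current'-style CMN: subtract this utterance's own mean.
 * Returns the mean that was subtracted. */
static double cmn_current(const double *in, double *out, int n)
{
    double sum = 0.0;
    assert(n > 0);
    for (int i = 0; i < n; i++)
        sum += in[i];
    double mean = sum / n;
    for (int i = 0; i < n; i++)
        out[i] = in[i] - mean;
    return mean;
}

/* State carried from one utterance to the next in the 'prior'-style model. */
typedef struct {
    double sum;    /* decayed sum of past frames */
    int nframe;    /* number of frames represented by sum */
    double mean;   /* mean subtracted from the next utterance */
} prior_state;

static void prior_init(prior_state *st, double init_mean)
{
    st->mean = init_mean;
    st->sum = init_mean * PRIOR_WIN;   /* initial value counts as PRIOR_WIN frames */
    st->nframe = PRIOR_WIN;
}

/* Simplified 'prior'-style CMN: subtract the mean estimated from earlier
 * utterances, then fold this utterance into the estimate for the next one.
 * Returns the mean that was subtracted. */
static double cmn_prior_sketch(prior_state *st, const double *in, double *out, int n)
{
    double used = st->mean;
    for (int i = 0; i < n; i++) {
        out[i] = in[i] - used;
        st->sum += in[i];
        st->nframe++;
    }
    st->mean = st->sum / st->nframe;
    if (st->nframe >= PRIOR_HWM) {     /* forget old history gradually */
        st->sum *= (double)PRIOR_WIN / st->nframe;
        st->nframe = PRIOR_WIN;
    }
    return used;
}

/* Print the mean each mode subtracts on each pass over the same utterance. */
static void compare_modes(int passes, double init_mean)
{
    double c0[NFRAMES], out[NFRAMES];
    prior_state st;
    make_utterance(c0, NFRAMES);
    prior_init(&st, init_mean);
    for (int p = 1; p <= passes; p++) {
        double cur = cmn_current(c0, out, NFRAMES);
        double pri = cmn_prior_sketch(&st, c0, out, NFRAMES);
        printf("pass %2d: current subtracts %.4f, prior subtracts %.4f\n", p, cur, pri);
    }
}
```

Calling compare_modes(5, 8.0) prints one line per pass: the 'current' column stays fixed at the utterance mean (about 10.95), while the 'prior' column starts at 8.0 and moves toward it. This simple model only converges; it does not reproduce the 3-utterance cycle in the log, which presumably depends on details of the real implementation that the sketch leaves out.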
Note that none of this is my own code. It's just the pocketsphinx_continuous program, with the change I described in my post.
Last edit: Oren G. 2018-02-01