I'm training and decoding with audio files containing single words with
trailing silences. Therefore (please correct me if this is wrong) my
transcription files for training and decoding are composed of lines such as
word4 (audio361)
During decoding, I get a few WARNINGs saying:
WARNING: "vithist.c", line 1696: When <sil> is used as final word, audio123: Search didn't end in </s>
Does that increase my classification error rate? I noticed that some of the
final matches for files that produced this warning are ambiguous, like:
word7 word8 (audio123)
Does that have a connection to the warning message? What are the reasons why
decoding an audio file may lead to multiple classification candidates, such as
"word7 word8"? How is that output to be interpreted? And what if no match is
found, as in
(audio123)
? What could be the reasons for that? Shouldn't Sphinx just name the most
probable word?
Thanks for helping me with this!
The application doesn't do classification. It does speech decoding, that is,
it tries to find the most likely sequence of words according to the language
model. If it fails to find such a sequence, it gives you a warning.
It's an indication of a problem, of course, but I'm not sure "affects error
rate" applies here. Obviously, if you fix the problem, the error rate will
be smaller.
How is that output to be interpreted?
As a hypothesis sequence.
Shouldn't Sphinx just name the most probable word?
No. To reduce the decoding problem to a classification problem, you need to
use a much stricter language model during decoding. It needs to be a finite
state grammar with one entry, one exit, and no loops.
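A minimal sketch of such a grammar, in the text FSG format that Sphinx reads (the grammar name is made up, the word1…word13 names are the vocabulary from this thread, and the transition probability is just the uniform 1/13 ≈ 0.0769):

```
FSG_BEGIN single_word
NUM_STATES 2
START_STATE 0
FINAL_STATE 1
TRANSITION 0 1 0.0769 word1
TRANSITION 0 1 0.0769 word2
TRANSITION 0 1 0.0769 word3
TRANSITION 0 1 0.0769 word4
TRANSITION 0 1 0.0769 word5
TRANSITION 0 1 0.0769 word6
TRANSITION 0 1 0.0769 word7
TRANSITION 0 1 0.0769 word8
TRANSITION 0 1 0.0769 word9
TRANSITION 0 1 0.0769 word10
TRANSITION 0 1 0.0769 word11
TRANSITION 0 1 0.0769 word12
TRANSITION 0 1 0.0769 word13
FSG_END
```

Every path goes from the single entry state 0 to the single exit state 1 through exactly one word, with no loops, so the decoder is forced to emit exactly one word per utterance.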
To reduce the decoding problem to a classification problem, you need to
use a much stricter language model during decoding. It needs to be a finite
state grammar with one entry, one exit, and no loops.
And how would that be done? Currently, my language model file essentially
looks like this:
\1-grams:
-1.1461 </s> -99.0000
-99.0000 <s> 0.0000
-1.1461 word1 0.0000
-1.1461 word2 0.0000
-1.1461 word3 0.0000
-1.1461 word4 0.0000
-1.1461 word5 0.0000
-1.1461 word6 0.0000
-1.1461 word7 0.0000
-1.1461 word8 0.0000
-1.1461 word9 0.0000
-1.1461 word10 0.0000
-1.1461 word11 0.0000
-1.1461 word12 0.0000
-1.1461 word13 0.0000
\2-grams:
0.0000
\end\
and the dictionary is just
I'm decoding with the default mode (fwdtree), because that gives the best
results. How would I change my model to be a FSG with one entry, one exit, and
without loops, like you said?
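As an aside on the ARPA file above: the repeated -1.1461 scores are just a uniform unigram distribution. ARPA language models store base-10 log probabilities, and with the 13 words plus the sentence-end token sharing the probability mass equally, each entry gets log10(1/14) ≈ -1.1461 (the -99.0000 entry is the conventional "never predict this" score for the sentence-start token). A quick sanity check in plain Python:

```python
import math

# 13 words plus the end-of-sentence token share the unigram
# probability mass uniformly; ARPA files use base-10 logs.
logprob = math.log10(1.0 / 14.0)
print(round(logprob, 4))  # -1.1461
```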
Never mind, figured it out. (-fsg option and -mode fsg)
Great!