I'm having trouble understanding the output from the sample program at https://github.com/cmusphinx/sphinx4/blob/master/sphinx4-samples/src/main/java/edu/cmu/sphinx/demo/transcriber/TranscriberDemo.java (using my own input file).
I expected to get a transcript of the speech of my file, but instead I get something like the following. Not sure how to interpret it. Am I doing something wrong?
17:10:18.734 INFO unitManager CI Unit: Z
17:10:18.735 INFO unitManager CI Unit: ZH
...
17:10:20.536 INFO lexTreeLinguist Max CI Units 43
17:10:20.536 INFO lexTreeLinguist Unit table size 79507
17:10:20.537 INFO speedTracker # ----------------------------- Timers----------------------------------------
17:10:20.538 INFO speedTracker # Name Count CurTime MinTime MaxTime AvgTime TotTime
17:10:20.538 INFO speedTracker Load LM 1 0.3900s 0.3900s 0.3900s 0.3900s 0.3900s
17:10:20.538 INFO speedTracker Compile 1 0.7170s 0.7170s 0.7170s 0.7170s 0.7170s
17:10:20.538 INFO speedTracker Load Dictionary 1 0.1110s 0.1110s 0.1110s 0.1110s 0.1110s
17:10:20.539 INFO speedTracker Load AM 1 1.3660s 1.3660s 1.3660s 1.3660s 1.3660s
17:10:28.138 INFO liveCMN 90.96 -22.96 -5.64 -8.06 -3.98 -2.80 -3.11 -1.31 -2.86 -0.46 -1.11 -2.23 -1.54
17:10:31.842 INFO liveCMN 90.95 -22.94 -5.64 -7.98 -3.91 -2.90 -3.11 -1.26 -2.84 -0.44 -1.14 -2.16 -1.49
17:10:36.605 INFO liveCMN 91.09 -22.91 -5.78 -7.85 -3.60 -3.24 -2.89 -1.68 -2.64 -0.80 -1.22 -2.29 -1.92
...
Hypothesis: up
List of recognized words and their times:
{<sil>, 1.000, [87150:87240]}
{up, 1.000, [87250:89580]}
Hi, what you are interested in is the Hypothesis: ... log entry. The recognizer is called in a loop because there is a stage which splits the speech by silence into segments, and you get one Result with a hypothesis for each segment. The INFO lines above it are only diagnostic logging.
You need to concatenate those hypotheses and write them to a file yourself.
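For illustration, here is a minimal sketch of the concatenation step. It assumes you collect each segment's text (what the demo gets from result.getHypothesis() inside its getResult() loop) into a list of strings. The class name TranscriptWriter, the helper methods, the output file name transcript.txt, and the sample strings other than "up" are all made up for this example and are not part of sphinx4.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of sphinx4.
public class TranscriptWriter {

    // Joins the per-segment hypotheses with single spaces, skipping
    // null or blank entries (for example, segments that were only silence).
    public static String joinHypotheses(Iterable<String> hypotheses) {
        StringJoiner transcript = new StringJoiner(" ");
        for (String hypothesis : hypotheses) {
            if (hypothesis != null && !hypothesis.trim().isEmpty()) {
                transcript.add(hypothesis.trim());
            }
        }
        return transcript.toString();
    }

    // Writes the joined transcript to a file as UTF-8, followed by a newline.
    public static void writeTranscript(Iterable<String> hypotheses, Path out)
            throws IOException {
        String text = joinHypotheses(hypotheses) + System.lineSeparator();
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // In the demo you would fill this list inside the recognition loop,
        // roughly: hypotheses.add(result.getHypothesis());
        // The values below are placeholder sample data.
        List<String> hypotheses = Arrays.asList("up", "", "and running");

        Path out = Paths.get("transcript.txt"); // assumed output location
        writeTranscript(hypotheses, out);
        System.out.println(joinHypotheses(hypotheses)); // prints "up and running"
    }
}
```

Keeping the joining logic separate from the recognizer loop means you can change how segments are separated (spaces, newlines, timestamps) without touching the recognition code.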