CMU Sphinx / Forums / Help: Get Pocketsphinx performance info for grammar mode? Understanding LM mode output?

I am running pocketsphinx on a Raspberry Pi 3 / Raspbian Jessie-lite and want to compare its recognition performance with that of a Raspberry Pi B+.

I have set -verbose yes and -logfn to psphinx.log, and I am getting lots of output, but in "grammar mode" nothing looks like what I get in "lm mode":

INFO: ngram_search_fwdtree.c(1567): fwdtree 0.88 CPU 0.355 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 2.63 wall 1.062 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 36 words
INFO: ngram_search_fwdflat.c(945):     1162 words recognized (5/fr)
INFO: ngram_search_fwdflat.c(947):   105917 senones evaluated (427/fr)
INFO: ngram_search_fwdflat.c(949):    83719 channels searched (337/fr)
INFO: ngram_search_fwdflat.c(951):     4624 words searched (18/fr)
INFO: ngram_search_fwdflat.c(954):     3438 word transitions (13/fr)
INFO: ngram_search_fwdflat.c(957): fwdflat 0.47 CPU 0.190 xRT
INFO: ngram_search_fwdflat.c(960): fwdflat 0.47 wall 0.191 xRT
INFO: ngram_search.c(1252): lattice start node <s>.0 end node </s>.213
INFO: ngram_search.c(1278): Eliminated 1 nodes before end node
INFO: ngram_search.c(1383): Lattice has 331 nodes, 499 links
INFO: ps_lattice.c(1380): Bestpath score: -6142
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:213:246) = -414419
INFO: ps_lattice.c(1441): Joint P(O,S) = -439496 P(S|O) = -25077
INFO: ngram_search.c(874): bestpath 0.01 CPU 0.004 xRT
INFO: ngram_search.c(877): bestpath 0.00 wall 0.001 xRT
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.88 CPU 0.356 xRT
INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 2.63 wall 1.067 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.47 CPU 0.190 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.47 wall 0.191 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.01 CPU 0.004 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.001 xRT

Q1) Is it possible to get timing for grammar mode?

Q2) for "lm mode" how do I interpret how long was the utterance, and how long did the reco take?

BTW: This is my reco listener loop: (from Neil Davenport https://github.com/bynds/makevoicedemo )

                    # We tell PocketSphinx that the user is finished saying what they wanted
                    # to say, and that it should make its best guess as to what that was.
                    self.decoder.end_utt()
                    # The following will get a hypothesis object with, amongst other things,
                    # the string of words that PocketSphinx thinks the user said.
                    self.hypothesis = self.decoder.hyp()
                    if self.hypothesis is not None:
                        bestGuess = self.hypothesis.hypstr
                        # print() with a single argument works on both Python 2 and 3.
                        print('I just heard you say:"{}"'.format(bestGuess))
                        # We are done with the microphone for now so we'll close the stream.
                        self.stream.stop_stream()
                        self.stream.close()
                        # We have what we came for! A string representing what the user said.
                        # We'll now return it to the runMain function so that it can be
                        # processed and some meaning can be gleaned from it.
                        return bestGuess

Nickolay V. Shmyrev - 2016-05-22

Q1) Is it possible to get timing for grammar mode?

Grammar mode should print similar numbers if you use the latest pocketsphinx:

INFO: fsg_search.c(869): fsg 0.13 CPU 0.023 xRT
INFO: fsg_search.c(871): fsg 0.16 wall 0.028 xRT
INFO: fsg_search.c(265): TOTAL fsg 0.13 CPU 0.023 xRT
INFO: fsg_search.c(268): TOTAL fsg 0.16 wall 0.028 xRT

Q2) for "lm mode" how do I interpret how long was the utterance, and how long did the reco take?

From these lines:

INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.88 CPU 0.356 xRT
INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 2.63 wall 1.067 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.47 CPU 0.190 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.47 wall 0.191 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.01 CPU 0.004 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.001 xRT

The three stages (fwdtree, fwdflat and bestpath) took 0.88 + 0.47 + 0.01 = 1.36 seconds of CPU time. The decoding speed was 0.356 + 0.190 + 0.004 = 0.55 xRT, which means 1 second of speech was decoded in 0.55 seconds of CPU time. The total length of the audio is 1.36 / 0.55, or about 2.47 seconds, but that is probably irrelevant. The xRT number is what matters.
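
The same arithmetic can be scripted against the log file. The following is only a sketch: the regular expression is an assumption based on the TOTAL lines quoted in this thread, and `summarize` is a hypothetical helper, not part of pocketsphinx. Since the pattern matches any stage name, it should also pick up the "TOTAL fsg" lines printed in grammar mode.

```python
import re

# Log lines copied from the output quoted in this thread.
LOG = """\
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.88 CPU 0.356 xRT
INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 2.63 wall 1.067 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.47 CPU 0.190 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.47 wall 0.191 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.01 CPU 0.004 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.001 xRT
"""

# Assumed line shape: "TOTAL <stage> <seconds> <CPU|wall> <ratio> xRT".
TOTAL_RE = re.compile(r"TOTAL (\w+) ([\d.]+) (CPU|wall) ([\d.]+) xRT")


def summarize(log_text):
    """Hypothetical helper: sum seconds and xRT per clock type over all TOTAL lines."""
    totals = {"CPU": [0.0, 0.0], "wall": [0.0, 0.0]}
    for match in TOTAL_RE.finditer(log_text):
        _stage, seconds, clock, xrt = match.groups()
        totals[clock][0] += float(seconds)
        totals[clock][1] += float(xrt)
    return totals


totals = summarize(LOG)
cpu_seconds, cpu_xrt = totals["CPU"]
# Audio length follows from seconds spent divided by seconds-per-second-of-speech.
audio_seconds = cpu_seconds / cpu_xrt
print("CPU time %.2f s, speed %.2f xRT, audio about %.2f s"
      % (cpu_seconds, cpu_xrt, audio_seconds))
```

To use it on a real run, read the file given to -logfn (psphinx.log in the original post) and pass its contents to `summarize` instead of the embedded `LOG` string.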

amcdonley - 2016-05-23

Superb - Thank You! Exactly what I needed.


amcdonley - 2016-05-27

xRT is a measure only "in-speech", correct?

total-wall-time is the sum of times from start of speech to end of speech, including inter-word silences?

Is there a term for total-cpu-time / total-wall-time?

result_14.txt
::::::::::::::
Utterances=14
CpuTime=56.42 seconds
CPU xRealTime=0.926 or 92.6% of one core
Actual Speech=60.9287 seconds
Utterances=92.89 seconds total
66% of utterances were speech
::::::::::::::
result_63.txt
::::::::::::::
Utterances=63
CpuTime=75.7 seconds
CPU xRealTime=0.52 or 52% of one core
Actual Speech=145.577 seconds
Utterances=189.77 seconds total
77% of utterances were speech
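
Judging from the numbers alone (the post does not state the formulas, so this is an inference), the derived figures in both result files appear to be CpuTime divided by Actual Speech, and Actual Speech divided by total utterance time. A minimal sketch with a hypothetical `ratios` helper:

```python
def ratios(cpu_seconds, speech_seconds, utterance_seconds):
    """Hypothetical helper: return (CPU xRT, fraction of utterance time that was speech)."""
    cpu_xrt = cpu_seconds / speech_seconds
    speech_fraction = speech_seconds / utterance_seconds
    return cpu_xrt, speech_fraction


# Values copied from result_14.txt and result_63.txt above.
runs = [
    ("result_14.txt", 56.42, 60.9287, 92.89),
    ("result_63.txt", 75.7, 145.577, 189.77),
]
for name, cpu, speech, total in runs:
    cpu_xrt, speech_fraction = ratios(cpu, speech, total)
    print("%s: CPU xRT %.3f, %.0f%% of utterances were speech"
          % (name, cpu_xrt, 100 * speech_fraction))
```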

- Nickolay V. Shmyrev - 2016-05-27
  
  xRT is a measure only "in-speech", correct?
  
  Yes
  
  total-wall-time is the sum of times from start of speech to end of speech, including inter-word silences?
  
  Well, not exactly with silence included. Silence is always filtered out during processing and is not counted in the performance computation.
  
  It's more about how much real (wall-clock) time the system took to process the speech. When you process from a microphone, yes, it waits for input, and the time it simply waits is included. When you process from a file, it is just the time taken to process the speech. This time also accounts for the machine doing something else: for example, if you are running some other computation, that will be included in the wall time.
  
  Is there a term for total-cpu-time / total-wall-time?
  
  Not really
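
The difference between CPU time and wall time described above can be seen in a generic Python 3 sketch. This does not use pocketsphinx or reproduce how it measures time internally; `wait_for_input` and `busy_decode` are hypothetical stand-ins for blocking on a microphone and for decoding work.

```python
import time


def measure(task):
    """Run task() and return (cpu_seconds, wall_seconds) for this process."""
    cpu_start = time.process_time()   # CPU time consumed by this process only
    wall_start = time.perf_counter()  # elapsed real time
    task()
    return time.process_time() - cpu_start, time.perf_counter() - wall_start


def wait_for_input():
    # Stand-in for blocking on a microphone read: sleeping uses almost no CPU.
    time.sleep(0.5)


def busy_decode():
    # Stand-in for decoding work: keeps the CPU busy the whole time.
    sum(i * i for i in range(2000000))


for name, task in [("waiting", wait_for_input), ("computing", busy_decode)]:
    cpu, wall = measure(task)
    print("%-9s CPU %.2f s, wall %.2f s" % (name, cpu, wall))
```

While waiting, wall time grows and CPU time stays near zero; while computing, the two should be close unless other processes compete for the core.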
  

amcdonley - 2016-06-05

Nickolay - you are credited in Alan's Pi 3 Road Test Using CMU PocketSphinx: https://goo.gl/RrGgCm

and the video: https://vimeo.com/169445418

Thanks for your help.

- Nickolay V. Shmyrev - 2016-06-09
  
  Hey, Alan, many thanks, this is a very important publication. I shared it on our blog too:
  http://cmusphinx.sourceforge.net/2016/06/should-you-select-raspberry-pi-3-or-raspberry-pi-b-for-cmusphinx/
  
  Actually, it would be very interesting to evaluate keyword spotting mode too, which is supposed to be the primary operation mode for IoT. It might also be interesting to play with the decoding parameters for LVCSR; it might be reasonably fast and accurate after tuning.
  