Hi list,
I am trying to develop a speech recognition system that can recognize pilots' commands in an aviation environment.
I am able to run the offline mode, but when I try to run the live mode of the speech recognizer, it behaves strangely.
I have 33 utterances and am using the WSJ acoustic model, a simple word-list grammar, the flat linguist, and my own dictionary.
Can you please explain what the following output means?
[root@mali cibac]# ant tidigits_wordlist_live
Buildfile: build.xml
tidigits_wordlist_live:
[java] # ----------------------------- Timers----------------------------------------
[java] # Name Count CurTime MinTime MaxTime AvgTime TotTime
[java] AM_Load 1 11.9910s 11.9910s 11.9910s 11.9910s 11.9910s
[java] DictionaryLoad 1 0.0100s 0.0100s 0.0100s 0.0100s 0.0100s
[java] grammarLoad 1 0.0130s 0.0130s 0.0130s 0.0130s 0.0130s
[java] compile 1 2.6130s 2.6130s 2.6130s 2.6130s 2.6130s
[java] createGStates 1 0.0370s 0.0370s 0.0370s 0.0370s 0.0370s
[java] collectContex 1 0.0610s 0.0610s 0.0610s 0.0610s 0.0610s
[java] expandStates 1 2.2310s 2.2310s 2.2310s 2.2310s 2.2310s
[java] connectNodes 1 0.2800s 0.2800s 0.2800s 0.2800s 0.2800s
[java] # ----------- linguist stats ------------
[java] # Total states: 16261
[java] # class edu.cmu.sphinx.linguist.flat.NonEmittingHMMState: 2925
[java] # class edu.cmu.sphinx.linguist.flat.PronunciationState: 1198
[java] # class edu.cmu.sphinx.linguist.flat.BranchState : 437
[java] # class edu.cmu.sphinx.linguist.flat.GrammarState: 1
[java] # class edu.cmu.sphinx.linguist.flat.HMMStateState: 8775
[java] # class edu.cmu.sphinx.linguist.flat.ExtendedUnitState: 2925
[java] NonSpeechDataFilter: ALERT: getting a SpeechStartSignal while in speech, removing it.
[java] This Time Audio: 73.04s Proc: 71.36s Speed: 0.98 X real time
[java] Total Time Audio: 73.04s Proc: 71.36s Speed: 0.98 X real time
[java] Response Time: Avg: 0.207s Max: 0.207s Min: 0.207s
[java] Mem Total: 126.62 Mb Free: 41.79 Mb
[java] Used: This: 84.83 Mb Avg: 84.83 Mb Max: 84.83 Mb
BUILD SUCCESSFUL
Total time: 1 minute 30 seconds
The hypothesis text [HYP] is generated by itself; it is not something I spoke into the microphone.
It also generated Align.txt, which contains:
and type id tag id tag field type field type type exit tunnel tunnel type type thirty id type
NAV RANGE DECREASE NAV RANGE DECREASE NAV RANGE TWENTY NAV RANGE TEN NAV RANGE DECREASE NAV RANGE DECREASE NAV RANGE DOWN NAV RANGE DOWN NAV RANGE DOWN PFD PFD FIELD OF VIEW UNITY PFD FLIR DECLUTTER CANCEL PFD FLIR ON PFD FLIR DECLUTTER PFD FLIR ON NAV RANGE UP NAV RANGE DOWN NAV RANGE UP HUD ALL DECLUTTER HUD ALL ON NAV RANGE ZERO POINT FIVE NAV RANGE TEN PFD FIELD OF VIEW UNITY PFD FIELD OF VIEW UNITY PFD FIELD OF VIEW UNITY NAV RANGE ONE NAV RANGE TEN NAV RANGE TEN NAV RANGE TWENTY PFD FIELD OF VIEW SIXTY PFD FIELD OF VIEW SIXTY NAV RANGE TWENTY
The words in capitals are the 33 utterances from my raw files, and the first line is the HYP.
Can you please respond to this email?
I am stuck here and need some information to carry on.
Thanks a lot in advance,
Sincerely,
Raghu.