Raghu - 2005-10-07

Hi list,

I am trying to develop a speech recognition system that can recognize a pilot's commands in an aviation environment.
I am able to run the offline mode, but when I run the live mode of the recognizer it behaves strangely: only 1 of my 33 utterances is found, and the accuracy is under 2%.

My setup: 33 utterances, the WSJ acoustic model, a simple word-list grammar, the flat linguist, and my own dictionary.
Can you please explain what the following output means?

[root@mali cibac]# ant tidigits_wordlist_live
Buildfile: build.xml
tidigits_wordlist_live:
 [java] # ----------------------------- Timers----------------------------------------
 [java] # Name          Count   CurTime   MinTime   MaxTime   AvgTime   TotTime
 [java] AM_Load         1       11.9910s  11.9910s  11.9910s  11.9910s  11.9910s
 [java] DictionaryLoad  1       0.0100s   0.0100s   0.0100s   0.0100s   0.0100s
 [java] grammarLoad     1       0.0130s   0.0130s   0.0130s   0.0130s   0.0130s
 [java] compile         1       2.6130s   2.6130s   2.6130s   2.6130s   2.6130s
 [java]   createGStates 1       0.0370s   0.0370s   0.0370s   0.0370s   0.0370s
 [java]   collectContex 1       0.0610s   0.0610s   0.0610s   0.0610s   0.0610s
 [java]   expandStates  1       2.2310s   2.2310s   2.2310s   2.2310s   2.2310s
 [java]   connectNodes  1       0.2800s   0.2800s   0.2800s   0.2800s   0.2800s
 [java] # ----------- linguist stats ------------
 [java] # Total states: 16261
 [java] #   class edu.cmu.sphinx.linguist.flat.NonEmittingHMMState: 2925
 [java] #   class edu.cmu.sphinx.linguist.flat.PronunciationState: 1198
 [java] #   class edu.cmu.sphinx.linguist.flat.BranchState: 437
 [java] #   class edu.cmu.sphinx.linguist.flat.GrammarState: 1
 [java] #   class edu.cmu.sphinx.linguist.flat.HMMStateState: 8775
 [java] #   class edu.cmu.sphinx.linguist.flat.ExtendedUnitState: 2925
 [java] NonSpeechDataFilter: ALERT: getting a SpeechStartSignal while in speech, removing it.
 [java]    This  Time Audio: 73.04s  Proc: 71.36s  Speed: 0.98 X real time
 [java]    Total Time Audio: 73.04s  Proc: 71.36s  Speed: 0.98 X real time
 [java]    Response Time:  Avg: 0.207s  Max: 0.207s  Min: 0.207s
 [java]    Mem  Total: 126.62 Mb  Free: 41.79 Mb
 [java]    Used: This: 84.83 Mb  Avg: 84.83 Mb  Max: 84.83 Mb

 [java] HYP: and type id tag id tag field type field type type exit tunnel tunnel type type thirty id type
 [java]    Sentences: 1

 [java] Aligning results...
 [java]    Utterances: Found: 1   Actual: 33
 [java]  ...done aligning

 [java] # ------------- Summary Statistics -------------
 [java]    Accuracy: 1.835%    Errors: 107  (Sub: 17  Ins: 0  Del: 90)
 [java]    Words: 109   Matches: 2    WER: 98.165%
 [java]    Sentences: 1   Matches: 0   SentenceAcc: 0.000%
 [java] # ----------------------------- Timers----------------------------------------
 [java] # Name          Count   CurTime   MinTime   MaxTime   AvgTime   TotTime
 [java] concatDataSourc 14612   0.0000s   0.0000s   0.0410s   0.0000s   0.5330s
 [java] nonSpeechDataFi 14612   0.0000s   0.0000s   0.0060s   0.0000s   0.0890s
 [java] premphasizer    14612   0.0000s   0.0000s   0.0140s   0.0000s   0.0560s 
 [java] windower        7305    0.0000s   0.0000s   0.0900s   0.0001s   0.8800s
 [java] fft             7307    0.0000s   0.0000s   0.1570s   0.0002s   1.5140s
 [java] melFilterBank   7307    0.0000s   0.0000s   0.0320s   0.0000s   0.1160s
 [java] dct             7307    0.0000s   0.0000s   0.0130s   0.0000s   0.2120s
 [java] liveCMN         7307    0.0000s   0.0000s   0.0260s   0.0000s   0.0620s
 [java] featureExtracti 7302    0.0000s   0.0000s   0.0220s   0.0000s   0.1420s
 [java] AM_Load         1       11.9910s  11.9910s  11.9910s  11.9910s  11.9910s
 [java] DictionaryLoad  1       0.0100s   0.0100s   0.0100s   0.0100s   0.0100s 
 [java] grammarLoad     1       0.0130s   0.0130s   0.0130s   0.0130s   0.0130s
 [java] compile         1       2.6130s   2.6130s   2.6130s   2.6130s   2.6130s
 [java]   createGStates 1       0.0370s   0.0370s   0.0370s   0.0370s   0.0370s
 [java]   collectContex 1       0.0610s   0.0610s   0.0610s   0.0610s   0.0610s
 [java]   expandStates  1       2.2310s   2.2310s   2.2310s   2.2310s   2.2310s
 [java]   connectNodes  1       0.2800s   0.2800s   0.2800s   0.2800s   0.2800s
 [java] scoring         7306    0.0000s   0.0000s   0.2100s   0.0045s   32.6130s
 [java] pruning         7304    0.0000s   0.0000s   0.1350s   0.0007s   4.8020s 
 [java] growing         7306    0.0010s   0.0000s   0.6410s   0.0046s   33.8480s
 [java] Align           1       0.0080s   0.0080s   0.0080s   0.0080s   0.0080s

 [java] # --------------- Summary statistics ---------
 [java]    Total Time Audio: 73.04s  Proc: 71.36s  Speed: 0.98 X real time
 [java]    Response Time:  Avg: 0.207s  Max: 0.207s  Min: 0.207s
 [java]    Mem  Total: 126.62 Mb  Free: 41.61 Mb
 [java]    Used: This: 85.02 Mb  Avg: 84.92 Mb  Max: 85.02 Mb
 [java]    Utterances:  Actual: 33  Found: 1
 [java]    Gap Insertions: 18

BUILD SUCCESSFUL
Total time: 1 minute 30 seconds
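For what it is worth, the summary numbers seem to follow the usual definitions. Here is a quick Python check of the figures above, assuming WER = (Sub + Ins + Del) / Words and Accuracy = Matches / Words (my assumption about how the scorer computes them, not taken from the Sphinx-4 source):

```python
# Counts copied from the "Summary Statistics" section of the log above.
matches, substitutions, insertions, deletions = 2, 17, 0, 90

# Assumed definitions: every reference word is matched, substituted or deleted;
# every hypothesis word is matched, substituted or inserted.
ref_words = matches + substitutions + deletions   # words in the reference
hyp_words = matches + substitutions + insertions  # words in the HYP
errors = substitutions + insertions + deletions

wer = 100.0 * errors / ref_words
accuracy = 100.0 * matches / ref_words

print(f"Words: {ref_words}  HYP words: {hyp_words}  Errors: {errors}")
print(f"WER: {wer:.3f}%  Accuracy: {accuracy:.3f}%")

# Real-time factor: processing time divided by audio duration.
audio_s, proc_s = 73.04, 71.36
print(f"Speed: {proc_s / audio_s:.2f} X real time")
```

If that is right, the reference has 109 words but the HYP only 19, so most of the errors are deletions; the recognizer seems to have treated all 73 seconds of audio as a single utterance (Found: 1, Actual: 33).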

The hypothesis text (HYP) was generated on its own; it is not something I spoke into the microphone.

It also generated Align.txt, which contains:

and type id tag id tag field type field type type exit tunnel tunnel type type thirty id type
NAV RANGE DECREASE NAV RANGE DECREASE NAV RANGE TWENTY NAV RANGE TEN NAV RANGE DECREASE NAV RANGE DECREASE NAV RANGE DOWN NAV RANGE DOWN NAV RANGE DOWN PFD PFD FIELD OF VIEW UNITY PFD FLIR DECLUTTER CANCEL PFD FLIR ON PFD FLIR DECLUTTER PFD FLIR ON NAV RANGE UP NAV RANGE DOWN NAV RANGE UP HUD ALL DECLUTTER HUD ALL ON NAV RANGE ZERO POINT FIVE NAV RANGE TEN PFD FIELD OF VIEW UNITY PFD FIELD OF VIEW UNITY PFD FIELD OF VIEW UNITY NAV RANGE ONE NAV RANGE TEN NAV RANGE TEN NAV RANGE TWENTY PFD FIELD OF VIEW SIXTY PFD FIELD OF VIEW SIXTY NAV RANGE TWENTY

The words in capitals are the transcripts of the 33 utterances in my raw audio files, and the first line is the HYP.
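As far as I understand, the Sub/Ins/Del counts come from aligning the HYP words against the reference words with a minimum edit distance. Here is a generic word-level version in Python to illustrate the idea; it is not Sphinx-4's own aligner, and the helper name align_counts and the case-insensitive word comparison are my own assumptions:

```python
def align_counts(ref_text, hyp_text):
    """Word-level minimum-edit alignment of a hypothesis against a reference.

    Hypothetical helper for illustration only. Words are compared
    case-insensitively. Returns (substitutions, insertions, deletions, matches).
    """
    ref, hyp = ref_text.split(), hyp_text.split()
    # best[i][j] holds (errors, subs, ins, dels) for ref[:i] aligned to hyp[:j].
    # Tuples compare on total errors first, so min() picks a cheapest path.
    best = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        e, s, n, d = best[i - 1][0]
        best[i][0] = (e + 1, s, n, d + 1)      # reference words with no HYP: deletions
    for j in range(1, len(hyp) + 1):
        e, s, n, d = best[0][j - 1]
        best[0][j] = (e + 1, s, n + 1, d)      # HYP words with no reference: insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, s, n, d = best[i - 1][j - 1]
            if ref[i - 1].lower() == hyp[j - 1].lower():
                via_diag = (e, s, n, d)            # match
            else:
                via_diag = (e + 1, s + 1, n, d)    # substitution
            e, s, n, d = best[i][j - 1]
            via_ins = (e + 1, s, n + 1, d)         # extra HYP word
            e, s, n, d = best[i - 1][j]
            via_del = (e + 1, s, n, d + 1)         # missing reference word
            best[i][j] = min(via_diag, via_ins, via_del)
    _, subs, ins, dels = best[len(ref)][len(hyp)]
    return subs, ins, dels, len(ref) - subs - dels


print(align_counts("NAV RANGE DECREASE NAV RANGE TEN", "type RANGE"))
# -> (1, 0, 4, 1)
```

With only one short HYP for the whole recording, almost every reference word ends up as a deletion, which matches the 90 deletions in my log.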

Can you please respond to this email?

I am stuck here and need some guidance to continue.

Thanks a lot in advance,

Sincerely,

Raghu.