CMU Sphinx / Forums / Help: KaldiLoader + LexTreeLinguist

Matt Robinson - 2016-02-24

I am attempting to load Kaldi GMMs into sphinx4, using the current sphinx4 git repository on Ubuntu 14.04.3 LTS.

git log |head
commit ffa62f865b258623926ed2bd23d5015570161c9d
Author: nshmyrev nshmyrev@94700074-3cef-4d97-a70e-9c8c206c02f5
Date: Wed Feb 3 19:03:31 2016 +0000

There is a branch out there which successfully demonstrates an example Kaldi model (with tree) for the yesno task via a JSGF grammar and the FlatLinguist:
https://sourceforge.net/p/cmusphinx/code/12177/tree/branches/hl-interface/

Kaldi Training Configuration
1) You must use a topo file that gives "silence phones" the same 3-state topology as "non-silence" phones. There is a TODO in the code to handle any HMM topology dynamically, but it is currently hardcoded to 3 states.
<Topology>
<TopologyEntry>
<ForPhones>
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
<State> 3 </State>
</TopologyEntry>
</Topology>
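The 3-state constraint above can be checked before handing a model to Sphinx4. The following is a minimal sketch, not part of Sphinx4 or Kaldi: the class `TopoCheck` and its method are hypothetical names, and it assumes a text-format topo file in which every emitting state carries a `<PdfClass>` token (tag names are matched case-insensitively). It counts the emitting states in each topology entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts emitting states per <TopologyEntry> in a text topo file.
class TopoCheck {

    // Small sample in the same layout as the topology shown above (phone list shortened).
    static final String SAMPLE =
          "<Topology>\n<TopologyEntry>\n<ForPhones>\n1 2 3\n</ForPhones>\n"
        + "<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>\n"
        + "<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>\n"
        + "<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>\n"
        + "<State> 3 </State>\n</TopologyEntry>\n</Topology>\n";

    private static final Pattern ENTRY =
        Pattern.compile("(?is)<TopologyEntry>(.*?)</TopologyEntry>");
    private static final Pattern PDF_CLASS = Pattern.compile("(?i)<PdfClass>");

    /** Returns, for each topology entry, the number of states that have a pdf class. */
    static List<Integer> emittingStatesPerEntry(String topoText) {
        List<Integer> counts = new ArrayList<>();
        Matcher entry = ENTRY.matcher(topoText);
        while (entry.find()) {
            Matcher pdf = PDF_CLASS.matcher(entry.group(1));
            int n = 0;
            while (pdf.find()) {
                n++;
            }
            counts.add(n);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Integer> counts = emittingStatesPerEntry(SAMPLE);
        System.out.println(counts); // prints [3]
        for (int n : counts) {
            if (n != 3) {
                System.out.println("Entry with " + n + " emitting states; 3 are expected here");
            }
        }
    }
}
```

A topo file that gives silence phones a different topology would show up as a second entry with a count other than 3.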

2) Sphinx4 needs the transition model to contain <DiagGMM>, which I believe is the result of a discriminatively trained model.

Then, to prepare for use with Sphinx4, the transition model ("final.mdl") and the "tree" must be converted from binary to ASCII (using the Kaldi tools copy-tree and copy-transition-model).

When I point my task at the Kaldi model (using an ARPA language model), the LexTreeLinguist fails while initializing the HMMPool, because ultimately LazyHmmManager.get() is called for a triphone with a 'null' right context. Attempting to get the HMM for the triphone "SIL[SIL,null]" causes:
[java] java.lang.NullPointerException
[java] at linguist.acoustic.tiedstate.LazyHmmManager.get(LazyHmmManager.java:115)
[java] at linguist.acoustic.tiedstate.TiedStateAcousticModel.lookupNearestHMM(TiedStateAcousticModel.java:179)
[java] at linguist.acoustic.HMMPool.<init>(HMMPool.java:98)
[java] at linguist.lextree.LexTreeLinguist.generateHmmTree(LexTreeLinguist.java:427)
[java] at linguist.lextree.LexTreeLinguist.compileGrammar(LexTreeLinguist.java:416)
[java] at linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:335)
[java] at decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:243)
[java] at decoder.AbstractDecoder.allocate(AbstractDecoder.java:103)
[java] at recognizer.Recognizer.allocate(Recognizer.java:164)
[java] at tools.batch.BatchModeRecognizer.decode(BatchModeRecognizer.java:208)
[java] at tools.batch.BatchModeRecognizer.main(BatchModeRecognizer.java:621)

I modified the LazyHmmManager to check for either context being null and back off to "SIL" (as it already does when unit.isContextDependent() is false), like this:

if (null == left) {
    ids[0] = symbolTable.get("SIL");
    System.out.println("linguist.acoustic.tiedstate.LazyHmmManager: why null LEFT");
} else {
    ids[0] = symbolTable.get(left.getName());
}

if (null == right) {
    ids[2] = symbolTable.get("SIL");
    System.out.println("linguist.acoustic.tiedstate.LazyHmmManager: why null RIGHT");
} else {
    ids[2] = symbolTable.get(right.getName());
}

But this just causes the HMMPool to choke when it tries to synthesizeUnit SIL SIL SIL:
[java] at edu.cmu.sphinx.linguist.acoustic.HMMPool.synthesizeUnit(HMMPool.java:140)
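For readers who want to see the back-off idea in isolation, here is a minimal, self-contained sketch. It is not the Sphinx4 code: a plain `Map<String, Integer>` stands in for the loader's symbol table, the class and method names are hypothetical, and placing the base phone in `ids[1]` is an assumption (the snippet above only shows `ids[0]` and `ids[2]`). It only illustrates the null-context back-off to "SIL" and does not address the later failure in synthesizeUnit.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the back-off logic; not the actual LazyHmmManager.
class ContextBackoff {
    static final String SILENCE = "SIL";

    /** Looks up a phone id, substituting silence when the context name is null. */
    static int contextId(Map<String, Integer> symbolTable, String contextName) {
        String name = (contextName == null) ? SILENCE : contextName;
        Integer id = symbolTable.get(name);
        if (id == null) {
            throw new IllegalArgumentException("Unknown phone: " + name);
        }
        return id;
    }

    /** Builds the {left, base, right} id triple; only the contexts may be null. */
    static int[] triphoneIds(Map<String, Integer> symbolTable,
                             String left, String base, String right) {
        if (base == null) {
            throw new IllegalArgumentException("Base phone must not be null");
        }
        return new int[] {
            contextId(symbolTable, left),   // ids[0]: left context, SIL if null
            contextId(symbolTable, base),   // ids[1]: base phone (assumed position)
            contextId(symbolTable, right)   // ids[2]: right context, SIL if null
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> table = new HashMap<>();
        table.put("SIL", 1);
        table.put("AA", 2);
        // SIL[SIL,null] from the stack trace above becomes SIL[SIL,SIL].
        System.out.println(Arrays.toString(triphoneIds(table, "SIL", "SIL", null))); // prints [1, 1, 1]
    }
}
```

Throwing on an unknown phone, rather than returning null, makes a missing symbol fail at the lookup instead of surfacing later as a NullPointerException.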

Due to the topology constraint, there are no online Kaldi models that I can point to. I am considering reaching out to Dan Povey and committing a build to Kaldi for this purpose. Will it make a difference to the Sphinx4 KaldiLoader if the Kaldi model has been trained on speaker-adapted features?

Matt
- Nickolay V. Shmyrev - 2016-02-24
  
  Will it make a difference to the Sphinx4 KaldiLoader if the Kaldi model has been trained on speaker-adapted features?
  
  You need plain features, not speaker-adapted ones, since speaker adaptation is not supported in s4.
  

Matt Robinson - 2016-02-25

The model is trained on fMMR features, but should be used in speaker-independent mode. In any event, I built a model last night without adapting features. In Kaldi's vernacular it is labeled tri2b_mpe. It fails in exactly the same way. I don't think it is a features issue, but for completeness: LexTreeLinguist requires a feat.params file in the model dir, in which I put:

-dither yes
-transform kaldi
-remove_dc yes
-feat 1s_c_d_dd
-agc none
-cmn current
-varnorm no
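As an illustration of the file layout above (one `-name value` pair per line), here is a small sketch that reads such text into a map. It is not Sphinx4's own parser; the class name `FeatParams` and the strictness about malformed lines are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for "-name value" lines such as those in feat.params.
class FeatParams {

    /** Parses lines of the form "-name value" into an ordered map keyed by name. */
    static Map<String, String> parse(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length != 2 || !parts[0].startsWith("-")) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            params.put(parts[0].substring(1), parts[1]);
        }
        return params;
    }

    public static void main(String[] args) {
        String text = "-dither yes\n-transform kaldi\n-remove_dc yes\n-feat 1s_c_d_dd\n"
                    + "-agc none\n-cmn current\n-varnorm no\n";
        Map<String, String> params = parse(text);
        System.out.println(params.get("feat")); // prints 1s_c_d_dd
    }
}
```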

The LazyHmmManager gets SIL due to the offending null about 160 times for the right context, and then about as many times for the left, before the HMMPool.synthesizeUnit method chokes on the SIL SIL SIL triphone. My model has about 160 phonemes with 7946 triples.

[java] Error during decoding:
[java]
[java] java.lang.NullPointerException
[java] at edu.cmu.sphinx.linguist.acoustic.HMMPool.synthesizeUnit(HMMPool.java:140)
[java] at edu.cmu.sphinx.linguist.acoustic.HMMPool.<init>(HMMPool.java:95)
[java] at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.generateHmmTree(LexTreeLinguist.java:427)
[java] at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.compileGrammar(LexTreeLinguist.java:416)
[java] at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:335)
[java] at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:243)
[java] at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:103)
[java] at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:164)
[java] at edu.cmu.sphinx.tools.batch.BatchModeRecognizer.decode(BatchModeRecognizer.java:208)
[java] at edu.cmu.sphinx.tools.batch.BatchModeRecognizer.main(BatchModeRecognizer.java:621)

BUILD SUCCESSFUL
Total time: 8 seconds
- Nickolay V. Shmyrev - 2016-02-25
  
  If this is the exception you want to resolve, you could share your setup so I can take a look.
  
  - Matt Robinson - 2016-02-29
    
    As mentioned at the beginning of this thread, I am using a recent version of Sphinx4 as maintained on GitHub.
    
    The issue can be replicated with the Voxforge language model, in a setup that works when it uses the Voxforge acoustic model found here:
    
    http://www.repository.voxforge1.org/downloads/Main/Trunk/AcousticModels/Sphinx/voxforge-en-r0_1_3.tar.gz
    
    Using acoustic.tiedstate.Sphinx3Loader the batch finishes as expected with this result:
    
    [java] REF: AND THERE'S NO CHIVALRY NO QUARTER SHOWN IN THIS FIGHT
    [java] HYP: IN HIS NO CHIVALRY NO QUARTER SHOWN IN THIS FIGHT
    [java] Accuracy: 80.000% Errors: 2 (Sub: 2 Ins: 0 Del: 0)
    [java] Words: 10 Matches: 8 WER: 20.000%
    [java] Sentences: 1 Matches: 0 SentenceAcc: 0.000%
    [java] Accuracy: 80.000% Errors: 2 (Sub: 2 Ins: 0 Del: 0)
    [java] Words: 10 Matches: 8 WER: 20.000%
    [java] Sentences: 1 Matches: 0 SentenceAcc: 0.000%
    [java] Tokens created: 1.3189692E7
    
    Using acoustic.tiedstate.KaldiLoader, it gives the error message described above.
    
    b0027.wav
    
    build.xml
    
    kaldi.config.xml
    
    tri2b_3state_mpe.zip
    
    voxforge.config.xml
    
    voxforge_en_sphinx_16kHz.1uttTest.batch
    