I have collected 30 hours of Indian English speech data and run experiments on both Sphinx and Kaldi, keeping the experimental conditions the same: feature extraction, number of Gaussians, tied states, and basic EM training, with no additional techniques such as SAT, fMLLR, or MMI.
The test set contains 8,000 utterances.
I am getting a WER of 6.5% on Sphinx and 4.3% on Kaldi.
I can't figure out the reason for the difference in accuracy.
Am I missing something important in the experiments?
Thanks in advance,
Bhargav
Last edit: bhargav 2016-02-29
To compare decoders fairly you need to tune the decoding and training parameters - number of Gaussians, beams, language weights - for each system separately. Those must be different for Kaldi and CMUSphinx, not the same. For CMUSphinx you generally need more Gaussians than for Kaldi, since CMUSphinx assigns them uniformly. A small difference in accuracy is normal. Also, for CMUSphinx you need different language weights (in fwdflat, fwdtree, and bestpath) due to the score normalizations inside the decoder.
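As a rough sketch of what tuning the three CMUSphinx language weights separately might look like (model paths, file names, and weight values here are placeholders, not recommendations - check `pocketsphinx_batch` in your build for the exact flags it supports):

```shell
#!/bin/sh
# Hypothetical sweep of the fwdtree language weight while holding the
# fwdflat and bestpath weights fixed; each pass writes its own hypothesis
# file so the resulting WERs can be compared afterwards.
for lw in 8 10 12; do
  pocketsphinx_batch \
      -hmm model/en-in \
      -lm model/lm.bin \
      -dict model/dict.dic \
      -ctl test.fileids \
      -hyp test_lw${lw}.hyp \
      -lw ${lw} \
      -fwdflatlw 8.5 \
      -bestpathlw 9.5
done
```

In practice each of the three weights would be swept in turn against a held-out development set, not the test set.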
Hi everyone,
I am testing the LIUM/CMU French language model on Kaldi, but the format doesn't seem to be the same.
Here is what I get when trying to evaluate its perplexity on my corpus:
$ ngram -lm ModelsRef/3gLIUM/trigram_LM.DMP.gz -ppl data/ACSYNT/Mix1/ACSYNTMix1_${p}_ts/text
ModelsRef/3gLIUM/trigram_LM.DMP.gz: line 182630: reached EOF before \end\
format error in lm file ModelsRef/3gLIUM/trigram_LM.DMP.gz
Is there a way to convert the model into a format that Kaldi can evaluate?
When I try with SphinxBase, here is the error:
$ sphinx_lm_convert -i ModelsRef/3gLIUM/trigram_LM.DMP -o model.lm -ofmt bin
Current configuration:
[NAME][DEFLT][VALUE]
-case
-debug 0
-help no no
-i ModelsRef/3gLIUM/trigram_LM.DMP
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o model.lm
-ofmt bin
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(365): Header doesn't match
INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
INFO: ngram_model_trie.c(70): No \data\ mark in LM file
INFO: ngram_model_trie.c(445): Trying to read LM in dmp format
INFO: ngram_model_trie.c(527): ngrams 1=65533, 2=18408667, 3=22235344
INFO: lm_trie.c(474): Training quantizer
INFO: lm_trie.c(482): Building LM trie
ERROR: "ngram_model_trie.c", line 323: Error reading word strings (904402888 doesn't match n_unigrams 65533)
$ sphinx_lm_convert -i ModelsRef/3gLIUM/trigram_LM.DMP -o model.lm -ofmt arpa
Current configuration:
[NAME][DEFLT][VALUE]
-case
-debug 0
-help no no
-i ModelsRef/3gLIUM/trigram_LM.DMP
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o model.lm
-ofmt arpa
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(365): Header doesn't match
INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
INFO: ngram_model_trie.c(70): No \data\ mark in LM file
INFO: ngram_model_trie.c(445): Trying to read LM in dmp format
INFO: ngram_model_trie.c(527): ngrams 1=65533, 2=18408667, 3=22235344
INFO: lm_trie.c(474): Training quantizer
INFO: lm_trie.c(482): Building LM trie
ERROR: "ngram_model_trie.c", line 323: Error reading word strings (904402888 doesn't match n_unigrams 65533)