Forum thread; the last reply (posted as Anonymous) is dated 2004-06-09.
Hi,
I am trying to run the Hub4 system (tests/performance/hub4) using the trigram LM and find that some words in the LM are missing from the dictionary. I have checked "abscond" and "<unk>" against the dictionary in HUB4_8gau_13dCep_16k_40mel_133Hz_6855Hz.jar, and those two at least are indeed missing.
The system then attempts to run recognition but, possibly as a consequence of the missing words, I get empty HYP output in a very short time!
[java] 04:45.300 WARNING dictionary Missing word: <unk>
[java] in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
[java] 04:45.340 WARNING dictionary Missing word: abidjan
[java] in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
[java] 04:45.343 WARNING dictionary Missing word: abimael
[java] in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
[java] 04:45.344 WARNING dictionary Missing word: abiquiu
[java] in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
[java] 04:45.364 WARNING dictionary Missing word: abridging
[java] in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
[java] 04:45.366 WARNING dictionary Missing word: abscond
[java] in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
[java] 04:45.367 WARNING dictionary Missing word: absconded
[java] in edu.cmu.sphinx.linguist.dictionary.FastDictionary:getWord-dictionary
[java] 04:45.367 WARNING dictionary Missing word: absconding
...
[java] 04:45.995 WARNING trigramModel Dictionary is missing 711 words that are contained in the language model.
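For anyone who wants to check by hand which language-model words the dictionary lacks, something like the following could serve as a starting point. It is only a sketch under assumptions: the LM vocabulary is taken to be already extracted as a list of words, and the dictionary lines are assumed to be in the usual CMU style (the word, then its phones, with alternate pronunciations marked like `WORD(2)`). The class and method names are made up for illustration and are not part of Sphinx-4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class MissingWords {
    /** Returns the LM words that have no dictionary entry, in LM order. */
    static List<String> missing(List<String> lmWords, List<String> dictLines) {
        Set<String> known = new HashSet<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // The first whitespace-separated token is the word itself.
            String word = trimmed.split("\\s+")[0];
            // Drop an alternate-pronunciation suffix such as "(2)".
            int paren = word.indexOf('(');
            if (paren > 0) {
                word = word.substring(0, paren);
            }
            known.add(word.toLowerCase(Locale.ROOT));
        }
        List<String> result = new ArrayList<>();
        for (String w : lmWords) {
            if (!known.contains(w.toLowerCase(Locale.ROOT))) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Tiny made-up inputs, just to show the call.
        List<String> lm = Arrays.asList("<unk>", "abscond", "yes");
        List<String> dict = Arrays.asList("YES  Y EH S", "NO  N OW");
        System.out.println(missing(lm, dict));
    }
}
```

The comparison is case-insensitive on the assumption that the LM and the dictionary may use different casing; if they do not, that step can be dropped.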
Keith:
Thanks for using Sphinx-4! What you are seeing, the report of 711 missing words, is normal behavior for the hub4 test: it is expected to report the missing words and then proceed with recognition. We see about an 18% WER with hub4. You can view the latest test results here:
http://cmusphinx.sourceforge.net/LargeVocabResults.html
The fact that you get an empty HYP in a very short time does indicate a problem. Are you running the test against the hub4 data? Are you running in live mode, or against some other data set?
Paul:
Thanks very much -- I was using an internal data set, which I had massaged to get into big-endian 16 kHz form. I have just tried with the AN4 data and am getting sensible recognition results, so it's back to the data prep drawing board!
P.S. I am impressed by the flexible Java/XML setup to put together your choice of recognizer "on the fly".
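Since the byte order of the input came up, here is a minimal sketch of one data-prep step: swapping the two bytes of every 16-bit sample to convert between little-endian and big-endian raw PCM. It assumes headerless 16-bit audio and does no resampling; the class name and the command-line usage are hypothetical and have nothing to do with the Sphinx-4 tools.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class PcmByteOrder {
    /** Swaps each pair of bytes, i.e. flips the endianness of 16-bit samples. */
    static byte[] swap16(byte[] pcm) {
        if (pcm.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of bytes, not 16-bit PCM");
        }
        byte[] out = new byte[pcm.length];
        for (int i = 0; i < pcm.length; i += 2) {
            out[i] = pcm[i + 1];
            out[i + 1] = pcm[i];
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PcmByteOrder <in.raw> <out.raw>");
            return;
        }
        byte[] in = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), swap16(in));
    }
}
```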
Thanks for the kudos ... have fun, and let us know how it works out for you.
Paul
Curiouser and curiouser. I think I have finally pinned down my empty output. If I run using as input the file an4/an4_clstk/fash/an251-fash-b.raw, then I get the correct recognition of "yes". If I append one tenth of a second of zero waveform to that file, it behaves exactly as my data is behaving -- it returns almost immediately with an empty hypothesis:
[java] REF: yes
[java] HYP: yes
[java] Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
[java] Words: 1 Matches: 1 WER: 0.000%
[java] Sentences: 1 Matches: 1 SentenceAcc: 100.000%
[java] This Time Audio: 1.00s Proc: 9.68s Speed: 9.68 X real time
[java] Total Time Audio: 1.00s Proc: 9.68s Speed: 9.68 X real time
[java] Mem Total: 379.75 Mb Free: 164.26 Mb
[java] Used: This: 215.49 Mb Avg: 215.49 Mb Max: 215.49 Mb
[java] REF: yes
[java] HYP:
[java] Accuracy: 50.000% Errors: 1 (Sub: 0 Ins: 0 Del: 1)
[java] Words: 2 Matches: 1 WER: 50.000%
[java] Sentences: 2 Matches: 1 SentenceAcc: 50.000%
[java] This Time Audio: 1.10s Proc: 0.04s Speed: 0.04 X real time
[java] Total Time Audio: 2.10s Proc: 9.72s Speed: 4.63 X real time
[java] Mem Total: 379.75 Mb Free: 152.62 Mb
[java] Used: This: 227.13 Mb Avg: 221.31 Mb Max: 227.13 Mb
Keith
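To reproduce the experiment above, the zero padding can be generated along these lines. This is a sketch that assumes headerless 16-bit mono audio at 16 kHz, in which case one tenth of a second is 1,600 samples, or 3,200 zero bytes. The class name, method name, and command-line usage are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class AppendSilence {
    /** Returns a copy of 16-bit PCM data with the given duration of all-zero samples appended. */
    static byte[] appendZeros(byte[] pcm, int millis, int sampleRate) {
        int samples = sampleRate * millis / 1000;
        int extraBytes = samples * 2; // two bytes per 16-bit sample
        // Arrays.copyOf pads the new tail with zero bytes.
        return Arrays.copyOf(pcm, pcm.length + extraBytes);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java AppendSilence <in.raw> <out.raw>");
            return;
        }
        byte[] in = Files.readAllBytes(Paths.get(args[0]));
        // 100 ms at an assumed 16 kHz sample rate.
        Files.write(Paths.get(args[1]), appendZeros(in, 100, 16000));
    }
}
```

Because zero samples are the same bytes in either byte order, the padding itself does not depend on the endianness of the file.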
I suggest that the "one tenth of a second of zero waveform" may be a problem. In Sphinx2 and Sphinx3, the features are cepstra, and computing them involves taking the log of the power spectrum. If you feed it frames that are all zero, it causes overflow errors in the log computation. This not only messes up those frames, but it also writes very large numbers into the cepstral mean used for normalization, which will mess up subsequent speech frames as well.
*I do not know* whether the Sphinx4 front end has a similar vulnerability to all-zero signals, but it may. Try splicing some "actual silence" in the front instead of artificial silence.
cheers,
jerry wolf
soliloquy learning, inc.
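The numeric issue described in the post above can be illustrated in a few lines of Java. This is a simplified stand-in, not code from any Sphinx front end: it takes the log of a frame's energy rather than computing real cepstra, and averages the results the way a mean-normalization step would. In Java the log of zero comes out as negative infinity rather than raising an error, and that value then takes over the mean. All names here are made up for illustration.

```java
class ZeroFrameLog {
    /** Log energy of one frame: the natural log of the sum of squared samples. */
    static double logEnergy(short[] frame) {
        double sum = 0.0;
        for (short s : frame) {
            sum += (double) s * s;
        }
        return Math.log(sum); // Math.log(0.0) is -Infinity
    }

    /** Plain average, standing in for the mean used in normalization. */
    static double mean(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total / values.length;
    }

    /** True if every sample in the frame is exactly zero. */
    static boolean isAllZero(short[] frame) {
        for (short s : frame) {
            if (s != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        short[] speech = {1200, -800, 300, -150}; // made-up sample values
        short[] zeros = new short[4];             // an all-zero frame
        double[] logs = {logEnergy(speech), logEnergy(zeros)};
        System.out.println(logs[1]);    // prints -Infinity
        System.out.println(mean(logs)); // prints -Infinity: one zero frame ruins the mean
    }
}
```

A check like `isAllZero` is one way to spot such frames before feature extraction; whether the Sphinx-4 front end needs it is exactly the open question in the post.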