CMU Sphinx / Forums / Help: How to start

Hello,

I have spent a few days reading documentation. I am a complete newbie to speech recognition, and I now have some knowledge of how Sphinx 4 and speech recognition work.

I want to create a dictionary recognition program: you say one word, and the system recognizes it.

The problem is that I am short of time. I have read a lot, but I am not sure where to start. I would really appreciate it if someone could guide me a little: which example should I use as a base, and what documentation should I read to customize it for this purpose?

Thanks in advance,

Ruben - 2009-08-04

I am making some progress, but I am not sure if this is the best way to do this.

I use hellongram.config.xml as the base config, replaced SimpleNGramModel with large.LargeTrigramModel, and use models/language/wsj/wsj5kc.Z.DMP.

All other values are the defaults from hellongram. Could anyone please tell me how I can improve the recognition? Normally the sound is going to be one word; can I use this to make recognition better?

Thanks in advance,
Ruben Rubio Rey

Nate - 2009-07-31

How comprehensive is this dictionary intended to be?

Fundamentally, you need an acoustic model, a language model (or grammar for simple applications, hence my complexity question), and a dictionary file that maps each word represented in the language model/grammar to its set of acoustic phonemes.
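To make the dictionary part concrete, here is a minimal sketch in plain Java (no Sphinx classes) that reads lines in the CMU-dictionary style, where each line is a word followed by its phonemes and a suffix such as (2) marks an alternate pronunciation. The class name PronunciationDictionarySketch and the sample entries are made up for illustration; this is not how Sphinx loads its dictionary internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of Sphinx-4).
final class PronunciationDictionarySketch {

    /** Parses lines of the form "WORD  PH1 PH2 ..." into word -> list of pronunciations. */
    static Map<String, List<List<String>>> parse(List<String> lines) {
        Map<String, List<List<String>>> dict = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length < 2) {
                continue; // a word without phonemes is not usable
            }
            // Strip an alternate-pronunciation suffix such as "(2)".
            String word = fields[0].replaceAll("\\(\\d+\\)$", "");
            List<String> phones =
                new ArrayList<>(Arrays.asList(fields).subList(1, fields.length));
            dict.computeIfAbsent(word, k -> new ArrayList<>()).add(phones);
        }
        return dict;
    }

    public static void main(String[] args) {
        // Sample entries written in the assumed format.
        List<String> sample = Arrays.asList(
            "HELLO  HH AH L OW",
            "HELLO(2)  HH EH L OW",
            "WORLD  W ER L D");
        Map<String, List<List<String>>> dict = parse(sample);
        System.out.println("HELLO -> " + dict.get("HELLO"));
        System.out.println("WORLD -> " + dict.get("WORLD"));
    }
}
```

Every word that appears in the grammar or language model needs an entry like this, otherwise the recognizer has no phoneme sequence to match against.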

Probably the best thing to do is to check out the <a href="http://cmusphinx.sourceforge.net/sphinx4/src/apps/edu/cmu/sphinx/demo/helloworld/README.html">Hello World!</a> and <a href="http://cmusphinx.sourceforge.net/sphinx4/src/apps/edu/cmu/sphinx/demo/hellongram/README.html">Hello NGram</a> demos in the /apps folder of your sphinx download. These demos recognize phrases contained in their grammars and echo them back. For any of the demo programs you play with, pay special attention to their config.xml files, which specify which models are used, the dictionary, etc.

Also make sure you read the <a href="http://cmusphinx.sourceforge.net/sphinx4/doc/ProgrammersGuide.html">Programmer's Guide</a> and <a href="http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4-faq.html">FAQ</a>.

Hopefully this helps. I can't and shouldn't really provide anything more concrete since I also am a sphinx noob. :)

Nate - 2009-07-31

Apparently those HTML tags were not required. >.<


Thanks for your response! Finally I am making some progress.

I am working with voicedict http://personales.ya.com/javiercl/voicedict/index.html

I have two problems, both related.

In general, the test does not work with my voice. The microphone seems to be working, but recognition still fails. At first I thought it was because I am Spanish, but after trying with a native English speaker I realized that the problem is something else. Could you help me find it?
VoiceDict does not work properly. There are three recognizers; two of them work "more or less", and the third, the one I am most interested in, does not work. It is called "wordListRecognizer".

The config.xml follows. Thanks in advance!

<?xml version="1.0" encoding="UTF-8"?>

<config>

<!-- ******************************************************** -->
<!-- frequently tuned properties                              -->
<!-- ******************************************************** -->

<property name="logLevel" value="WARNING"/>

<property name="absoluteBeamWidth"  value="-1"/>
<property name="relativeBeamWidth"  value="1E-80"/>
<property name="wordInsertionProbability" value="1E-36"/>
<property name="languageWeight"     value="8"/>

<property name="frontend" value="epFrontEnd"/>
<property name="recognizer" value="recognizer"/>
<property name="showCreations" value="false"/>

<component name="activeList"
         type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
    <property name="logMath" value="logMath"/>
    <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
    <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
</component>

<component name="trivialPruner"
            type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

<component name="threadedScorer"
            type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
    <property name="frontend" value="${frontend}"/>
    <property name="isCpuRelative" value="true"/>
    <property name="numThreads" value="0"/>
    <property name="minScoreablesPerThread" value="10"/>
    <property name="scoreablesKeepFeature" value="true"/>
</component>


<!-- ******************************************************** -->
<!-- word recognizer configuration                            -->
<!-- ******************************************************** -->

<component name="lettersRecognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
    <property name="decoder" value="lettersDecoder"/>
    <propertylist name="monitors">
        <!--<item>accuracyTracker </item>-->
        <item>speedTracker </item>
        <item>memoryTracker </item>
    </propertylist>
</component>

<component name="wordsRecognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
    <property name="decoder" value="wordsDecoder"/>
    <propertylist name="monitors">
        <!--<item>accuracyTracker </item>-->
        <item>speedTracker </item>
        <item>memoryTracker </item>
    </propertylist>
</component>

<component name="wordListRecognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
    <property name="decoder" value="wordListDecoder"/>
    <propertylist name="monitors">
        <item>accuracyTracker </item>
        <item>speedTracker </item>
        <item>memoryTracker </item>
    </propertylist>
</component>


<!-- ******************************************************** -->
<!-- The Decoder   configuration                              -->
<!-- ******************************************************** -->

<component name="lettersDecoder" type="edu.cmu.sphinx.decoder.Decoder">
    <property name="searchManager" value="lettersSearchManager"/>
</component>

<component name="lettersSearchManager"
    type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
    <property name="logMath" value="logMath"/>
    <property name="linguist" value="lettersLinguist"/>
    <property name="pruner" value="trivialPruner"/>
    <property name="scorer" value="threadedScorer"/>
    <property name="activeListFactory" value="activeList"/>
</component>

<component name="wordsDecoder" type="edu.cmu.sphinx.decoder.Decoder">
    <property name="searchManager" value="wordsSearchManager"/>
</component>

<component name="wordsSearchManager"
    type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
    <property name="logMath" value="logMath"/>
    <property name="linguist" value="wordsLinguist"/>
    <property name="pruner" value="trivialPruner"/>
    <property name="scorer" value="threadedScorer"/>
    <property name="activeListFactory" value="activeList"/>
</component>

<component name="wordListDecoder" type="edu.cmu.sphinx.decoder.Decoder">
    <property name="searchManager" value="wordListSearchManager"/>
</component>

<component name="wordListSearchManager"
    type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
    <property name="logMath" value="logMath"/>
    <property name="linguist" value="wordListLinguist"/>
    <property name="pruner" value="trivialPruner"/>
    <property name="scorer" value="threadedScorer"/>
    <property name="activeListFactory" value="activeList"/>
</component>


<!-- ******************************************************** -->
<!-- The linguist  configuration                              -->
<!-- ******************************************************** -->

<component name="lettersLinguist"
            type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
    <property name="logMath" value="logMath"/>
    <property name="grammar" value="lettersGrammar"/>
    <property name="acousticModel" value="wsj"/>
    <property name="wordInsertionProbability"
            value="${wordInsertionProbability}"/>
    <property name="languageWeight" value="${languageWeight}"/>
    <property name="unitManager" value="unitManager"/>
</component>

<component name="wordsLinguist"
            type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
    <property name="logMath" value="logMath"/>
    <property name="grammar" value="wordsGrammar"/>
    <property name="acousticModel" value="wsj"/>
    <property name="wordInsertionProbability"
            value="${wordInsertionProbability}"/>
    <property name="languageWeight" value="${languageWeight}"/>
    <property name="unitManager" value="unitManager"/>
</component>

<component name="wordListLinguist"
            type="edu.cmu.sphinx.linguist.lextree.LexTreeLinguist">
    <property name="logMath" value="logMath"/>
    <property name="languageModel" value="ngramLanguageModel"/>
    <property name="acousticModel" value="wsj"/>
    <property name="dictionary" value="dictionaryWords"/>
    <property name="wordInsertionProbability"
            value="${wordInsertionProbability}"/>
    <property name="languageWeight" value="${languageWeight}"/>
    <property name="unitManager" value="unitManager"/>
</component>

<!-- ******************************************************** -->
<!-- The Language Model configuration                         -->
<!-- ******************************************************** -->
<component name="ngramLanguageModel"
            type="edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel">
    <property name="location"
            value="resource:/edu.cmu.sphinx.voicedict.VoiceDict!/edu/cmu/sphinx/voicedict/voicedict_unigram.lm"/>
    <property name="logMath" value="logMath"/>
    <property name="dictionary" value="dictionaryWords"/>
    <property name="maxDepth" value="1"/>
</component>

<!-- ******************************************************** -->
<!-- The Grammar  configuration                               -->
<!-- ******************************************************** -->

<component name="lettersGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
    <property name="dictionary" value="dictionaryLetters"/>
    <property name="grammarLocation"
         value="resource:/edu.cmu.sphinx.voicedict.VoiceDict!/edu/cmu/sphinx/voicedict/"/>
    <property name="grammarName" value="voicedictLetters"/>
    <property name="logMath" value="logMath"/>
</component>

<component name="wordsGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
    <property name="dictionary" value="dictionaryWords"/>
    <property name="grammarLocation"
         value="resource:/edu.cmu.sphinx.voicedict.VoiceDict!/edu/cmu/sphinx/voicedict/"/>
    <property name="grammarName" value="voicedictSentences"/>
    <property name="logMath" value="logMath"/>
</component>

<component name="wordListGrammar"
        type="edu.cmu.sphinx.linguist.language.grammar.SimpleWordListGrammar">
    <property name="path"
        value="voicedict.wordlist"/>
    <property name="dictionary" value="dictionaryWords"/>
    <property name="isLooping" value="false"/>
    <property name="logMath" value="logMath"/>
</component>

<!-- ******************************************************** -->
<!-- The Dictionary configuration                            -->
<!-- ******************************************************** -->

<component name="dictionaryLetters"
    type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <property name="dictionaryPath"
 value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/alpha.dict"/>
    <property name="fillerPath"
 value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict"/>
    <property name="addSilEndingPronunciation" value="false"/>
    <property name="allowMissingWords" value="false"/>
    <property name="unitManager" value="unitManager"/>
</component>

<component name="dictionaryWords"
    type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <property name="dictionaryPath"
 value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>
    <property name="fillerPath"
 value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict"/>
    <property name="addSilEndingPronunciation" value="true"/>
    <property name="allowMissingWords" value="true"/>
    <property name="unitManager" value="unitManager"/>
</component>


<!-- ******************************************************** -->
<!-- The acoustic model configuration                         -->
<!-- ******************************************************** -->
<component name="wsj"
  type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">
    <property name="loader" value="wsjLoader"/>
    <property name="unitManager" value="unitManager"/>
</component>

<component name="wsjLoader"
           type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
</component>


<!-- ******************************************************** -->
<!-- The unit manager configuration                           -->
<!-- ******************************************************** -->

<component name="unitManager"
    type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

<!-- ******************************************************** -->
<!-- The frontend configuration                               -->
<!-- ******************************************************** -->

<component name="frontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
    <propertylist name="pipeline">
        <item>microphone </item>
        <item>premphasizer </item>
        <item>windower </item>
        <item>fft </item>
        <item>melFilterBank </item>
        <item>dct </item>
        <item>liveCMN </item>
        <item>featureExtraction </item>
    </propertylist>
</component>

<!-- ******************************************************** -->
<!-- The live frontend configuration                          -->
<!-- ******************************************************** -->
<component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
    <propertylist name="pipeline">
        <item>microphone </item>
        <item>speechClassifier </item>
        <item>speechMarker </item>
        <item>nonSpeechDataFilter </item>
        <item>premphasizer </item>
        <item>windower </item>
        <item>fft </item>
        <item>melFilterBank </item>
        <item>dct </item>
        <item>liveCMN </item>
        <item>featureExtraction </item>
    </propertylist>
</component>

<!-- ******************************************************** -->
<!-- The frontend pipelines                                   -->
<!-- ******************************************************** -->

<component name="speechClassifier"
           type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
    <property name="threshold" value="13"/>
</component>

<component name="nonSpeechDataFilter"
           type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

<component name="speechMarker"
           type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker" >
    <property name="speechTrailer" value="50"/>
</component>


<component name="premphasizer"
           type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

<component name="windower"
           type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
</component>

<component name="fft"
        type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
</component>

<component name="melFilterBank"
    type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
</component>

<component name="dct"
        type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

<component name="liveCMN"
           type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

<component name="featureExtraction"
           type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

<component name="microphone"
           type="edu.cmu.sphinx.frontend.util.Microphone">
    <property name="closeBetweenUtterances" value="false"/>
</component>


<!-- ******************************************************* -->
<!--  monitors                                               -->
<!-- ******************************************************* -->

<!--- Does not work compile
<component name="accuracyTracker"
            type="edu.cmu.sphinx.instrumentation.AccuracyTracker">
    <property name="recognizer" value="lettersRecognizer"/>
    <property name="showAlignedResults" value="false"/>
    <property name="showRawResults" value="false"/>
</component>
-->

<component name="accuracyTracker"
            type="edu.cmu.sphinx.instrumentation.BestPathAccuracyTracker">
    <property name="recognizer" value="${recognizer}"/>
    <property name="showAlignedResults" value="false"/>
    <property name="showRawResults" value="false"/>
</component>

<component name="memoryTracker"
            type="edu.cmu.sphinx.instrumentation.MemoryTracker">
    <property name="recognizer" value="lettersRecognizer"/>
    <property name="showSummary" value="false"/>
    <property name="showDetails" value="false"/>
</component>

<component name="speedTracker"
            type="edu.cmu.sphinx.instrumentation.SpeedTracker">
    <property name="recognizer" value="lettersRecognizer"/>
    <property name="frontend" value="${frontend}"/>
    <property name="showSummary" value="true"/>
    <property name="showDetails" value="false"/>
</component>


<!-- ******************************************************* -->
<!--  Miscellaneous components                               -->
<!-- ******************************************************* -->

<component name="logMath" type="edu.cmu.sphinx.util.LogMath">
    <property name="logBase" value="1.0001"/>
    <property name="useAddTable" value="true"/>
</component>

</config>

Nickolay V. Shmyrev - 2009-08-01

> In general, the test does not work with my voice. The microphone seems to be working, but recognition still fails. At first I thought it was because I am Spanish, but after trying with a native English speaker I realized that the problem is something else. Could you help me find it?
> VoiceDict does not work properly. There are three recognizers; two of them work "more or less", and the third, the one I am most interested in, does not work. It is called "wordListRecognizer".

When you report problems, always try to provide:

1. Versions of the software you are using.
2. Ways to reproduce your problem, a test case for example. The description "it doesn't work for me" never allows anyone to help you.
3. Describe the expected results.
4. Describe the results you actually get.

In particular, sphinx4-beta2 doesn't work with a microphone. If you are using a nightly build, it should work better.

Ruben - 2009-08-02

> When you report about problems always try to provide:

Sorry! You are right. I'll try to explain myself better.

> In general, the test does not work with my voice. The microphone seems to be working, but recognition still fails. At first I thought it was because I am Spanish, but after trying with a native English speaker I realized that the problem is something else. Could you help me find it?

> 1. Versions of the software you are using:
sphinx4-1.0beta2
> 2. Ways to reproduce your problem, test case for example.

Run any demo, for example "HelloNGram.jar", and speak into the microphone. I am using two: the laptop's built-in microphone and an external one that you can attach.

> 3. Describe the expected results

When speaking sentences that are in the http://cmusphinx.sourceforge.net/sphinx4/src/apps/edu/cmu/sphinx/demo/hellongram/hellongram.test file, the system should recognize the sentences in some percentage of the cases.

> 4. Describe the results you actually get.

The system recognizes sentences in almost 0% of the cases. I tried with three people and two microphones; one of the speakers is a native English speaker. In most cases it matches some of the words of the sentence (there is usually some output), but never the complete phrase.

> VoiceDict does not work properly. There are three recognizers; two of them work "more or less", and the third, the one I am most interested in, does not work. It is called "wordListRecognizer".

I am using voicedict as a code base because it is very similar to what we need to do.

> 1. Versions of the software you are using.
sphinx4-1.0beta2 and voicedict that you can download from here: http://personales.ya.com/javiercl/voicedict/index.html

I had to modify the Java source code a bit because it was not compiling, but the problems were very small. I also modified it to use only the wordListRecognizer recognizer. (As the system does not recognize me very well, I cannot use the original "what's the meaning of" phrase.)

> 2. Ways to reproduce your problem, a test case for example. The description "it doesn't work for me" never allows anyone to help you.

The program has three recognizers. The interesting one is wordListRecognizer, which can recognize any word in the dictionary.

> 3. Describe the expected results
When you speak, the system should recognize your voice. If you say any word, that word should appear.

> 4. Describe the results you actually get.
Always an empty string.

I really don't know how to fix either of the problems. Any kind of help will be appreciated.

- Nickolay V. Shmyrev - 2009-08-02
  
  > sphinx4-1.0beta2
  
  So start with the upgrade to the nightly build/svn trunk
  
Ruben - 2009-08-02

> So start with the upgrade to the nightly build/svn trunk

So sweet! The demos work now!

Only the second problem is left.

The main goal is to have a system that is able to understand any word in a big dictionary. To do so, I found a similar program called VoiceDict.

> 1. Versions of the software you are using:
svn code from a few hours ago

> 2. Ways to reproduce your problem
Download VoiceDict from http://personales.ya.com/javiercl/voicedict/index.html
If you execute the jar, you will get this error:

java -jar VoiceDict.jar
main(); Couldn't start the recognizers.Property Exception component:'accuracyTracker' property:'null' - Can't instantiate class class edu.cmu.sphinx.instrumentation.AccuracyTracker
edu.cmu.sphinx.util.props.InternalConfigurationException: java.lang.InstantiationException
Exception in thread "main" java.lang.NullPointerException
at demo.sphinx.voicedict.VoiceDict.main(VoiceDict.java:155)

To get past it, comment out the accuracyTracker item from the monitors of the recognizers in voicedict.config.xml
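The log above reports java.lang.InstantiationException for edu.cmu.sphinx.instrumentation.AccuracyTracker. One common cause of that exception is an attempt to create an abstract class reflectively. Whether that is what happens here is an assumption, although it would fit the config posted earlier, which points accuracyTracker at BestPathAccuracyTracker instead. The sketch below reproduces the mechanism with made-up classes (BaseTracker and ConcreteTracker are hypothetical stand-ins, not Sphinx classes).

```java
import java.lang.reflect.Modifier;

// Illustration only: shows how reflective creation of an abstract class fails.
final class AbstractInstantiationSketch {

    /** Hypothetical stand-in for an abstract monitor base class. */
    static abstract class BaseTracker {
        public BaseTracker() {}
    }

    /** Hypothetical concrete subclass that can be created normally. */
    static class ConcreteTracker extends BaseTracker {
        public ConcreteTracker() {}
    }

    /** Tries to create an instance reflectively and describes the outcome. */
    static String tryInstantiate(Class<?> type) {
        try {
            Object instance = type.getDeclaredConstructor().newInstance();
            return "created " + instance.getClass().getSimpleName();
        } catch (InstantiationException e) {
            // Thrown when the class is abstract and so has no instantiable form.
            return "InstantiationException (abstract="
                + Modifier.isAbstract(type.getModifiers()) + ")";
        } catch (ReflectiveOperationException e) {
            return "other reflection failure: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryInstantiate(BaseTracker.class));
        System.out.println(tryInstantiate(ConcreteTracker.class));
    }
}
```

If this is indeed the cause, naming a concrete tracker class in the component's type attribute would be an alternative to commenting the monitor out.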

Run the program and try to get the "wordListRecognizer" recognizer working (you have to say "what's the meaning of" and then, after a beep, the word)

> 3. Describe the expected results
The program using recognizer "wordListRecognizer" should be able to understand the word in a big dictionary.

> 4. Describe the results you actually get.
What I get is always an empty string.

Thanks in advance!!!

- Nickolay V. Shmyrev - 2009-08-02
  
  To be honest, the way voice-dictionary works makes me think it will never produce reliable results. Is your goal to make it work or to implement something else?
  
Ruben - 2009-08-03

My goal is to make the dictionary recognition work. I need to take sound files as input and output the text. Each sound file will contain one word.

- Nickolay V. Shmyrev - 2009-08-03
  
  The accuracy will be 60% at most on a 60k-word vocabulary. Is that OK?
  
Ruben - 2009-08-03

With everything else (given the demo examples and JavaDocs) I think I'll have no problem, but I don't know yet how to do the dictionary recognition.

Ruben - 2009-08-03

That should be enough (the higher the better, but it is enough). Could you give me some details about how to do that?

This program will be for a telephone application. Is it possible to "transform" the original sound file before processing it to get better results?

Thanks in advance!!
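On the question of transforming telephone audio: telephone recordings are commonly sampled at 8 kHz, while the acoustic model named in the config above carries "16k" in its name, which suggests (as an assumption, not something confirmed in this thread) that it expects 16 kHz input. A first step could be to check what format the files actually have. The sketch below uses only the standard javax.sound.sampled API; the class name WavFormatCheckSketch and the generated silent test clip are made up for illustration, and the code does no resampling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Illustration only: inspects the format of WAV data before recognition.
final class WavFormatCheckSketch {

    /** Builds an in-memory WAV of silence: 16-bit signed little-endian mono PCM. */
    static byte[] silentWav(float sampleRate, int frames) {
        AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
        byte[] pcm = new byte[frames * format.getFrameSize()];
        AudioInputStream stream =
            new AudioInputStream(new ByteArrayInputStream(pcm), format, frames);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            AudioSystem.write(stream, AudioFileFormat.Type.WAVE, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reads the audio format (sample rate, channels, sample size) from WAV bytes. */
    static AudioFormat readFormat(byte[] wavBytes) {
        try (AudioInputStream audio =
                 AudioSystem.getAudioInputStream(new ByteArrayInputStream(wavBytes))) {
            return audio.getFormat();
        } catch (IOException | UnsupportedAudioFileException e) {
            throw new IllegalArgumentException("not a readable audio file", e);
        }
    }

    /** True when the sample rate is within half a hertz of the expected rate. */
    static boolean matchesRate(AudioFormat format, float expectedRate) {
        return Math.abs(format.getSampleRate() - expectedRate) < 0.5f;
    }

    public static void main(String[] args) {
        // Stand-in for a telephone recording: 0.1 s of silence at 8 kHz.
        AudioFormat format = readFormat(silentWav(8000f, 800));
        System.out.println("format: " + format);
        System.out.println("matches 16 kHz: " + matchesRate(format, 16000f));
    }
}
```

If the rates differ, the usual options are to resample the audio or to use an acoustic model trained on audio at the same rate; which one gives better accuracy here would need testing.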

- eliasmajic - 2009-08-03
  
  So you want to write a telephone app that just uses simple grammars. You do not need sphinx. Google for voicexml or twilio. Both do exactly what you want from what I can tell.
  
  If you still choose to use sphinx, you will need a pbx. A simple asterisk setup w/ sphinx can be found by searching for scribblej sphinx. If you plan on larger scale stuff, you will want some sort of MRCP such as Zanzibar that uses sphinx4.
  
Ruben - 2009-08-03

We may also try the "Medium (1000 words - RM1)" option; we would get better results, but we have to study whether it is enough. Anyway, if you could suggest any approach to try, we'll switch between both dictionaries and test.

Ruben - 2009-08-03

It is not that complex: no PBX, no Asterisk. It is a custom application. It will only process WAV files and extract their content. It is going to be used in telephone applications, but the implementation is clear. We only need the dictionary recognition. I'll have a look at VoiceXML.

Ruben - 2009-08-03

eliasmajic, I really do not think we need VoiceXML. The dictionary recognition is perfect for our purposes: feeding the application a sound file and returning the text.

I just need some guidance to achieve this goal.
