
My application doesn't recognize anything

Help
Omarmex
2005-06-11
2012-09-22
  • Omarmex

    Omarmex - 2005-06-11

    Hi,

    I finally managed to train the acoustic model, but I have a problem with my application in Sphinx 4. These are my settings and files for SphinxTrain and for Sphinx 4:

    I recorded about 3000 utterances of:

    HOLA AMIGO COMO ESTAS ADIOS

    and I ran scripts 02 through 07. I have a small phone list:

    A
    E
    I
    O
    L
    M
    C
    S
    T
    D
    SIL

    And my dictionary is also small:
    HOLA O L A
    AMIGO A M I G O
    COMO C O M O
    ESTAS E S T A S
    ADIOS A D I O S

    My filler dictionary is:

    <s> SIL
    </s> SIL
    SIL SIL
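
One quick sanity check on the files above is that every phone used in the dictionary and the filler dictionary also appears in the phone list. The sketch below is a minimal illustration of that check; the class name `PhoneCoverageCheck` and its method are hypothetical helpers, not part of SphinxTrain or Sphinx-4, and the data is simply copied from the lists as posted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PhoneCoverageCheck {

    /**
     * For each dictionary line ("WORD PHONE PHONE ..."), collects the phones
     * that do not appear in the phone list. Words with no missing phones are
     * left out of the result.
     */
    public static Map<String, List<String>> missingPhones(Set<String> phoneList,
                                                          List<String> dictionaryLines) {
        Map<String, List<String>> missing = new LinkedHashMap<>();
        for (String line : dictionaryLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // blank line or a word without a pronunciation
            }
            List<String> unknown = new ArrayList<>();
            for (int i = 1; i < fields.length; i++) {
                if (!phoneList.contains(fields[i]) && !unknown.contains(fields[i])) {
                    unknown.add(fields[i]);
                }
            }
            if (!unknown.isEmpty()) {
                missing.put(fields[0], unknown);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Phone list, dictionary and filler dictionary exactly as posted above.
        Set<String> phones = new LinkedHashSet<>(Arrays.asList(
                "A", "E", "I", "O", "L", "M", "C", "S", "T", "D", "SIL"));
        List<String> lines = Arrays.asList(
                "HOLA O L A",
                "AMIGO A M I G O",
                "COMO C O M O",
                "ESTAS E S T A S",
                "ADIOS A D I O S",
                "<s> SIL",
                "</s> SIL",
                "SIL SIL");
        System.out.println(missingPhones(phones, lines)); // prints {AMIGO=[G]}
    }
}
```

With the lists exactly as posted, the check flags G in AMIGO, so it may be worth comparing against the files actually used for training.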

    When I finished training I did not build a language model; I only wrote a grammar, very similar to the grammar of the TDIGITS example:

    grammar cincopalabras;

    public <palabras> (HOLA | AMIGO | COMO | ESTAS | ADIOS)*;

    In addition I wrote a .java file for my application, modified the demo.xml, and then built my application's .jar file, but when I run it, it does not recognize anything. It only shows:

    Por favor comience a hablar:
    Usted dijo:

    Usted dijo:

    Can anybody tell me what could be wrong (the acoustic model, the grammar used in place of a language model, the application's Java file, or the config file)?

    I checked all of these but I can't find a mistake, and compiling does not show any errors.

    Omar

     
    • Anonymous

      Anonymous - 2005-06-11

      Omar -- you have given us important information, but not enough! It's like saying "My automobile won't start. Can anyone tell me why?" We need more information in order to be able to help you.

      1. Is your microphone connected correctly? Can you run the standard live Sphinx-4 demos?

      2. You said you have made your own .java file for your application. Perhaps you made a mistake in it. Show it to us.

      3. Show us the Sphinx-4 configuration file.

      4. You should add some instrumentation to your configuration file to display more detail about what your application is doing. See "Understanding Sphinx-4 Instrumentation" on the Sphinx-4 home page. You should be able to track down the problem from the resulting output, but if not, then show us what it prints when you run the application.

      5. Are all of your 3000 training utterances exactly the same text? If so, then some of the beginning- and end-of-word triphones that you need for your grammar and application are not trained. Sphinx-4 will substitute other triphones or uniphones in those cases, but the result will not be optimal. However, I don't think this is the reason for your no-recognitions.
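
To make point 5 concrete, here is a rough sketch that compares the cross-word triphones seen in the single training sentence with those the `( ... )*` grammar can produce. The class `BoundaryTriphones` and the `base(left,right)` notation are made up for this illustration; silence contexts are ignored, every word is assumed to have at least two phones, and nothing here models how Sphinx-4 actually picks substitute units.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class BoundaryTriphones {

    /** Pronunciations copied from the dictionary posted in this thread. */
    public static Map<String, String[]> dictionary() {
        Map<String, String[]> d = new LinkedHashMap<>();
        d.put("HOLA", "O L A".split(" "));
        d.put("AMIGO", "A M I G O".split(" "));
        d.put("COMO", "C O M O".split(" "));
        d.put("ESTAS", "E S T A S".split(" "));
        d.put("ADIOS", "A D I O S".split(" "));
        return d;
    }

    /** Adds the two triphones created at the boundary when w2 follows w1. */
    static void addBoundary(Map<String, String[]> dict, String w1, String w2,
                            Set<String> out) {
        String[] a = dict.get(w1);
        String[] b = dict.get(w2);
        // Last phone of w1: its right context is the first phone of w2.
        out.add(a[a.length - 1] + "(" + a[a.length - 2] + "," + b[0] + ")");
        // First phone of w2: its left context is the last phone of w1.
        out.add(b[0] + "(" + a[a.length - 1] + "," + b[1] + ")");
    }

    /** Boundary triphones that occur in one fixed word sequence. */
    public static Set<String> seenInSentence(Map<String, String[]> dict,
                                             String[] sentence) {
        Set<String> seen = new LinkedHashSet<>();
        for (int i = 0; i + 1 < sentence.length; i++) {
            addBoundary(dict, sentence[i], sentence[i + 1], seen);
        }
        return seen;
    }

    /** Boundary triphones for every ordered word pair the grammar allows. */
    public static Set<String> neededByGrammar(Map<String, String[]> dict) {
        Set<String> needed = new LinkedHashSet<>();
        for (String w1 : dict.keySet()) {
            for (String w2 : dict.keySet()) {
                addBoundary(dict, w1, w2, needed);
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        Map<String, String[]> dict = dictionary();
        Set<String> seen = seenInSentence(dict,
                new String[] {"HOLA", "AMIGO", "COMO", "ESTAS", "ADIOS"});
        Set<String> untrained = new LinkedHashSet<>(neededByGrammar(dict));
        untrained.removeAll(seen);
        System.out.println("seen in training: " + seen);
        System.out.println("never seen:       " + untrained);
    }
}
```

Under these assumptions the fixed sentence covers 8 of the 35 boundary triphones the grammar can generate, which is why varying the word order in the training text should help.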

      cheers,
      jerry

       
    • Omarmex

      Omarmex - 2005-06-17

      Hi,

      I checked, and the microphone is connected correctly. I can run the demos and they recognize my speech, but my application does not.

      I made my own .java file just by modifying the Java file of the TDIGITS application. This is the Java file:

      /*
       * Test for five words
       */

      package demo.sphinx.cincopalabras;

      import edu.cmu.sphinx.frontend.util.Microphone;
      import edu.cmu.sphinx.recognizer.Recognizer;
      import edu.cmu.sphinx.result.Result;
      import edu.cmu.sphinx.util.props.ConfigurationManager;
      import edu.cmu.sphinx.util.props.PropertyException;

      import java.io.File;
      import java.io.IOException;
      import java.net.URL;

      public class cincopalabras {

          public static void main(String[] args) {
              try {
                  URL url;
                  if (args.length > 0) {
                      url = new File(args[0]).toURI().toURL();
                  } else {
                      url = cincopalabras.class.getResource("cincopalabras.config.xml");
                  }

                  ConfigurationManager cm = new ConfigurationManager(url);

                  Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
                  Microphone microphone = (Microphone) cm.lookup("microphone");

                  /* allocate the resource necessary for the recognizer */
                  recognizer.allocate();

                  /* the microphone will keep recording until the program exits */
                  if (microphone.startRecording()) {

                      System.out.println
                          ("Diga alguna de las palabras hola amigo como estas adios: ");

                      while (true) {
                          System.out.println
                              ("Comience a hablar. Presione Ctrl-C para salir.\n");

                          /*
                           * This method will return when the end of speech
                           * is reached. Note that the endpointer will determine
                           * the end of speech.
                           */
                          Result result = recognizer.recognize();

                          if (result != null) {
                              String resultText = result.getBestResultNoFiller();
                              System.out.println("Usted dijo: " + resultText + "\n");
                          } else {
                              System.out.println("No puedo escuchar lo que usted dijo.\n");
                          }
                      }
                  } else {
                      System.out.println("No se pudo iniciar el microfono.");
                      recognizer.deallocate();
                      System.exit(1);
                  }
              } catch (IOException e) {
                  System.err.println("Problemas cuando se cargo cincopalabras: " + e);
                  e.printStackTrace();
              } catch (PropertyException e) {
                  System.err.println("Problemas configurando cincopalabras: " + e);
                  e.printStackTrace();
              } catch (InstantiationException e) {
                  System.err.println("Problemas creando cincopalabras: " + e);
                  e.printStackTrace();
              }
          }
      }

      This is the configuration file:

      <?xml version="1.0" encoding="UTF-8"?>

      <!--
      Sphinx-4 Configuration file
      -->

      <!-- ******** -->
      <!-- an4 configuration file -->
      <!-- ******** -->

      <config>

      <!-- ******************************************************** -->
      <!-- frequently tuned properties                              -->
      <!-- ******************************************************** -->

      <property name="logLevel" value="WARNING"/>

      <property name="absoluteBeamWidth"  value="-1"/>
      <property name="relativeBeamWidth"  value="1E-80"/>
      <property name="wordInsertionProbability" value="1E-36"/>
      <property name="languageWeight"     value="8"/>

      <property name="frontend" value="epFrontEnd"/>
      <property name="recognizer" value="recognizer"/>
      <property name="showCreations" value="false"/>

      <!-- ******************************************************** -->
      <!-- word recognizer configuration                            -->
      <!-- ******************************************************** -->

      <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
          <property name="decoder" value="decoder"/>
          <propertylist name="monitors">
              <item>accuracyTracker </item>
              <item>speedTracker </item>
              <item>memoryTracker </item>
          </propertylist>
      </component>

      <!-- ******************************************************** -->
      <!-- The Decoder   configuration                              -->
      <!-- ******************************************************** -->

      <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
          <property name="searchManager" value="searchManager"/>
      </component>

      <component name="searchManager"
          type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
          <property name="logMath" value="logMath"/>
          <property name="linguist" value="flatLinguist"/>
          <property name="pruner" value="trivialPruner"/>
          <property name="scorer" value="threadedScorer"/>
          <property name="activeListFactory" value="activeList"/>
      </component>

      <component name="activeList"
               type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
          <property name="logMath" value="logMath"/>
          <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
          <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
      </component>

      <component name="trivialPruner"
                  type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

      <component name="threadedScorer"
                  type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
          <property name="frontend" value="${frontend}"/>
          <property name="isCpuRelative" value="true"/>
          <property name="numThreads" value="0"/>
          <property name="minScoreablesPerThread" value="10"/>
          <property name="scoreablesKeepFeature" value="true"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The linguist  configuration                              -->
      <!-- ******************************************************** -->

      <component name="flatLinguist"
                  type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
          <property name="logMath" value="logMath"/>
          <property name="grammar" value="jsgfGrammar"/>
          <property name="acousticModel" value="cincopalabras"/>
          <property name="wordInsertionProbability"
                  value="${wordInsertionProbability}"/>
          <property name="languageWeight" value="${languageWeight}"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The Grammar  configuration                               -->
      <!-- ******************************************************** -->

      <component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
          <property name="dictionary" value="dictionary"/>
          <property name="grammarLocation"
               value="resource:/demo.sphinx.cincopalabras.cincopalabras!/demo/sphinx/cincopalabras/"/>
          <property name="grammarName" value="cincopalabras"/>
          <property name="logMath" value="logMath"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The Dictionary configuration                            -->
      <!-- ******************************************************** -->

      <component name="dictionary"
          type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
          <property name="dictionaryPath"
       value="resource:/edu.cmu.sphinx.model.acoustic.cincopalabras_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/cincopalabras_8gau_13dCep_16k_40mel_130Hz_6800Hz/dic/espmexacmo.dic"/>
          <property name="fillerPath"
       value="resource:/edu.cmu.sphinx.model.acoustic.cincopalabras_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/cincopalabras_8gau_13dCep_16k_40mel_130Hz_6800Hz/dic/espmexacmo.filler"/>
          <property name="addSilEndingPronunciation" value="false"/>
          <property name="wordReplacement" value="&lt;sil&gt;"/>
          <property name="allowMissingWords" value="false"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The acoustic model configuration                         -->
      <!-- ******************************************************** -->
      <component name="cincopalabras"
        type="edu.cmu.sphinx.model.acoustic.cincopalabras_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">
          <property name="loader" value="sphinx3Loader"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <component name="sphinx3Loader"
                 type="edu.cmu.sphinx.model.acoustic.cincopalabras_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">
          <property name="logMath" value="logMath"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The unit manager configuration                           -->
      <!-- ******************************************************** -->

      <component name="unitManager"
          type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

      <!-- ******************************************************** -->
      <!-- The live frontend configuration                          -->
      <!-- ******************************************************** -->

      <component name="frontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
          <propertylist name="pipeline">
              <item>microphone </item>
              <item>premphasizer </item>
              <item>windower </item>
              <item>fft </item>
              <item>melFilterBank </item>
              <item>dct </item>
              <item>liveCMN </item>
              <item>featureExtraction </item>
          </propertylist>
      </component>

      <!-- ******************************************************** -->
      <!-- The live frontend configuration                          -->
      <!-- ******************************************************** -->
      <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
          <propertylist name="pipeline">
              <item>microphone </item>
              <item>speechClassifier </item>
              <item>speechMarker </item>
              <item>nonSpeechDataFilter </item>
              <item>premphasizer </item>
              <item>windower </item>
              <item>fft </item>
              <item>melFilterBank </item>
              <item>dct </item>
              <item>liveCMN </item>
              <item>featureExtraction </item>
          </propertylist>
      </component>

      <!-- ******************************************************** -->
      <!-- The frontend pipelines                                   -->
      <!-- ******************************************************** -->

      <component name="speechClassifier"
                 type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
          <property name="threshold" value="13"/>
      </component>

      <component name="nonSpeechDataFilter"
                 type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

      <component name="speechMarker"
                 type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker" >
          <property name="speechTrailer" value="50"/>
      </component>

      <component name="premphasizer"
                 type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

      <component name="windower"
                 type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
      </component>

      <component name="fft"
              type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
      </component>

      <component name="melFilterBank"
          type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
      </component>

      <component name="dct"
              type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

      <component name="liveCMN"
                 type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

      <component name="featureExtraction"
                 type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

      <component name="microphone"
                 type="edu.cmu.sphinx.frontend.util.Microphone">
          <property name="closeBetweenUtterances" value="false"/>
      </component>

      <!-- ******************************************************* -->
      <!--  monitors                                               -->
      <!-- ******************************************************* -->

      <component name="accuracyTracker"
                  type="edu.cmu.sphinx.instrumentation.AccuracyTracker">
          <property name="recognizer" value="${recognizer}"/>
          <property name="showAlignedResults" value="false"/>
          <property name="showRawResults" value="false"/>
      </component>

      <component name="memoryTracker"
                  type="edu.cmu.sphinx.instrumentation.MemoryTracker">
          <property name="recognizer" value="${recognizer}"/>
          <property name="showSummary" value="false"/>
          <property name="showDetails" value="false"/>
      </component>

      <component name="speedTracker"
                  type="edu.cmu.sphinx.instrumentation.SpeedTracker">
          <property name="recognizer" value="${recognizer}"/>
          <property name="frontend" value="${frontend}"/>
          <property name="showSummary" value="true"/>
          <property name="showDetails" value="false"/>
      </component>

      <!-- ******************************************************* -->
      <!--  Miscellaneous components                               -->
      <!-- ******************************************************* -->

      <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
          <property name="logBase" value="1.0001"/>
          <property name="useAddTable" value="true"/>
      </component>

      </config>

      All 3000 training utterances have the same text: HOLA AMIGO COMO ESTAS ADIOS. Do I need other kinds of text?

      I tried changing the phone list to a unique set of phones per word, that is:

      HOLA O_hola L_hola A_hola
      AMIGO A_amigo M_amigo I_amigo G_amigo O_amigo
      COMO K_como O_como M_como O_como_2
      ESTAS E_estas S_estas T_estas A_estas S_estas_2
      ADIOS A_adios D_adios I_adios S_adios

      I trained the acoustic model again with this change (also made in the dictionary), but my application still doesn't recognize anything.
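
The word-specific naming scheme shown above can be generated mechanically, which avoids typing mistakes when the dictionary and phone list are rewritten by hand. This is only a sketch of the naming convention used in this post; `WordPhones` and `wordSpecific` are hypothetical names, and the rule for repeated phones (a numeric suffix starting at `_2`) is inferred from the COMO and ESTAS entries.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class WordPhones {

    /**
     * Builds one dictionary line with word-specific phone names, for example
     * COMO with phones K O M O becomes "COMO K_como O_como M_como O_como_2".
     */
    public static String wordSpecific(String word, String[] phones) {
        String suffix = word.toLowerCase(Locale.ROOT);
        Map<String, Integer> occurrences = new HashMap<>();
        StringBuilder line = new StringBuilder(word);
        for (String phone : phones) {
            // Count how many times this phone has appeared in the word so far.
            int count = occurrences.merge(phone, 1, Integer::sum);
            line.append(' ').append(phone).append('_').append(suffix);
            if (count > 1) {
                // A repeated phone gets its own unit: O_como, then O_como_2.
                line.append('_').append(count);
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(wordSpecific("HOLA", new String[] {"O", "L", "A"}));
        System.out.println(wordSpecific("COMO", new String[] {"K", "O", "M", "O"}));
        System.out.println(wordSpecific("ESTAS", new String[] {"E", "S", "T", "A", "S"}));
    }
}
```

Every name produced this way would also have to be listed in the phone list, alongside SIL, before retraining.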

      I have another question: the model.props file for the acoustic model has the options featureType and vectorLength, and I read in the manual that cepstra_delta_doubledelta and 39 are the common values for these parameters, but I extracted the features as cepstra. Is this a problem?

      My model.props is:

      description = cincopalabras acoustic models

      modelClass = edu.cmu.sphinx.model.acoustic.cincopalabras_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
      modelLoader = edu.cmu.sphinx.model.acoustic.cincopalabras_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader
      dataLocation = espmexacmo_cd_cont_500_8
      modelDefinition = etc/espmexacmo.500.mdef

      isBinary = true
      featureType = cepstra_delta_doubledelta
      vectorLength = 39
      sparseForm = false

      numberFftPoints = 512
      filters = 40
      gaussians = 8
      maxFreq = 6800
      minFreq. = 130
      sampleRate = 16000
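
A rough sketch of how 13 cepstral coefficients per frame relate to vectorLength = 39, assuming the usual layout of cepstra, deltas, and double deltas (3 x 13). The class `DeltaFeatures`, the difference formulas, and the edge handling below are assumptions for illustration; they may not match exactly what SphinxTrain or the `featureExtraction` component in the configuration compute.

```java
public class DeltaFeatures {

    /** Returns frame t, repeating the first or last frame past the edges (an assumption). */
    static float[] frameAt(float[][] cepstra, int t) {
        int clamped = Math.max(0, Math.min(cepstra.length - 1, t));
        return cepstra[clamped];
    }

    /** Delta at frame t, taken here as c[t+2] - c[t-2]. */
    static float[] delta(float[][] cepstra, int t) {
        float[] ahead = frameAt(cepstra, t + 2);
        float[] behind = frameAt(cepstra, t - 2);
        float[] d = new float[ahead.length];
        for (int i = 0; i < d.length; i++) {
            d[i] = ahead[i] - behind[i];
        }
        return d;
    }

    /** Concatenates cepstra, delta and double delta for frame t. */
    public static float[] feature(float[][] cepstra, int t) {
        int n = cepstra[t].length;
        float[] d = delta(cepstra, t);
        float[] dNext = delta(cepstra, t + 1);
        float[] dPrev = delta(cepstra, t - 1);
        float[] out = new float[3 * n];
        for (int i = 0; i < n; i++) {
            out[i] = cepstra[t][i];             // static cepstra
            out[n + i] = d[i];                  // delta
            out[2 * n + i] = dNext[i] - dPrev[i]; // double delta
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] cepstra = new float[10][13]; // 10 frames of 13 coefficients
        System.out.println(feature(cepstra, 5).length); // prints 39
    }
}
```

If this assumption holds, files holding plain 13-dimensional cepstra would be consistent with a 39-dimensional model, as long as the deltas are derived the same way during training and during recognition.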

      omar

       
