Only recognizes the original trained wav file

Help
gustavobap
2007-06-20
2012-09-22
  • gustavobap

    gustavobap - 2007-06-20

    Hi, I'm trying to convert audio from a wav file to text using Sphinx4. I've built a very small database for testing and adapted the WavFile demo to do the task. When I decode the same files used in training, everything works, but if I record the same word in another file, using the same method I used for the training files, the result is always <SIL>.
    How can I solve this? Thank you.
    Gustavo

     
    • Nickolay V. Shmyrev

      A "very small" database for testing will not work. Usually the test set is 10% of the training data. Is that your case?

      If you need more detailed help, paste the complete log.

       
    • gustavobap

      gustavobap - 2007-06-21

      Hello again Nickolay ^^, thanks for answering.

      The database consists of only 4 words. My program doesn't need to transcribe long speech; it will execute tasks associated with the spoken words.
      Do you mean the program's output log? The output is the audio file attributes and the decoded text, as in the WavFile demo.

      WITH THE ORIGINAL FILE USED FOR TRAINING--------------------

      Loading Recognizer...

      Decoding /C:/Documents/TesteSphinx2/src/teste.wav
      WAVE (.wav) file, byte length: 40048, data format: PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian, frame length: 20002

      RESULT: teste


      WITH THE NEW FILE, WITH THE SAME WORD RECORDED--------------

      Loading Recognizer...

      Decoding /C:/Documents/TesteSphinx2/src/teste2.wav
      WAVE (.wav) file, byte length: 48058, data format: PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian, frame length: 24000

      RESULT:


      THIS IS MY CONFIGURATION FILE-------------------------------

      <?xml version="1.0" encoding="UTF-8"?>

      <!--
      Sphinx-4 Configuration file
      -->

      <!-- ******** -->
      <!-- g_teste configuration file -->
      <!-- ******** -->

      <config>

      <!-- ******************************************************** -->
      <!-- frequently tuned properties                              -->
      <!-- ******************************************************** -->
      <property name="logLevel"                    value="WARNING"/>

      <property name="absoluteBeamWidth"           value="-1"/>
      <property name="relativeBeamWidth"           value="1E-80"/>
      <property name="wordInsertionProbability"    value="1E-36"/>
      <property name="languageWeight"              value="8"/>

      <property name="frontend"   value="mfcFrontEnd"/>
      <property name="recognizer" value="recognizer"/>

      <!-- ******************************************************** -->
      <!-- The connectedDigitsRecognizer configuration              -->
      <!-- ******************************************************** -->

      <component name="recognizer"
                 type="edu.cmu.sphinx.recognizer.Recognizer">
          <property name="decoder" value="decoder"/>
          <propertylist name="monitors">
          </propertylist>
      </component>

      <!-- ******************************************************** -->
      <!-- The Decoder   configuration                              -->
      <!-- ******************************************************** -->

      <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
          <property name="searchManager" value="searchManager"/>
      </component>

      <component name="searchManager"
          type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
          <property name="logMath" value="logMath"/>
          <property name="linguist" value="flatLinguist"/>
          <property name="pruner" value="trivialPruner"/>
          <property name="scorer" value="threadedScorer"/>
          <property name="activeListFactory" value="activeList"/>
      </component>

      <component name="activeList"
               type="edu.cmu.sphinx.decoder.search.SortingActiveListFactory">
          <property name="logMath" value="logMath"/>
          <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
          <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
      </component>

      <component name="trivialPruner"
                  type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

      <component name="threadedScorer"
                  type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
          <property name="frontend" value="${frontend}"/>
          <property name="isCpuRelative" value="true"/>
          <property name="numThreads" value="0"/>
          <property name="minScoreablesPerThread" value="10"/>
          <property name="scoreablesKeepFeature" value="true"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The linguist  configuration                              -->
      <!-- ******************************************************** -->

      <component name="flatLinguist"
                  type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
          <property name="logMath" value="logMath"/>
          <property name="grammar" value="jsgfGrammar"/>
          <property name="acousticModel" value="g_teste"/>
          <property name="wordInsertionProbability"
                  value="${wordInsertionProbability}"/>
          <property name="silenceInsertionProbability"
                  value="${silenceInsertionProbability}"/>
          <property name="languageWeight" value="${languageWeight}"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The Grammar  configuration                               -->
      <!-- ******************************************************** -->

      <component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
          <property name="dictionary" value="dictionary"/>
          <property name="grammarLocation"
               value="src/"/>
          <property name="grammarName" value="g_teste_grammar"/>
          <property name="logMath" value="logMath"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The Dictionary configuration                             -->
      <!-- ******************************************************** -->

      <component name="dictionary"
          type="edu.cmu.sphinx.linguist.dictionary.FullDictionary">
          <property name="dictionaryPath"
          value="resource:/edu.cmu.sphinx.model.acoustic.g_teste_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/g_teste_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/g_teste_adb.dic"/>
          <property name="fillerPath"
          value="resource:/edu.cmu.sphinx.model.acoustic.g_teste_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/g_teste_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/g_teste_adb.filler"/>
          <property name="addSilEndingPronunciation" value="false"/>
          <property name="wordReplacement" value="&lt;sil&gt;"/>
          <property name="allowMissingWords" value="false"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The acoustic model configuration                         -->
      <!-- ******************************************************** -->

      <component name="g_teste"
        type="edu.cmu.sphinx.model.acoustic.g_teste_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">
          <property name="loader" value="sphinx3Loader"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <component name="sphinx3Loader"
             type="edu.cmu.sphinx.model.acoustic.g_teste_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">
          <property name="logMath" value="logMath"/>
          <property name="unitManager" value="unitManager"/>
      </component>

      <!-- ******************************************************** -->
      <!-- The unit manager configuration                           -->
      <!-- ******************************************************** -->

      <component name="unitManager"
          type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

      <!-- ******************************************************** -->
      <!-- The frontend configuration                               -->
      <!-- ******************************************************** -->

      <component name="mfcFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
          <propertylist name="pipeline">
              <item>streamDataSource</item>
              <item>premphasizer</item>
              <item>windower</item>
              <item>fft</item>
              <item>melFilterBank</item>
              <item>dct</item>
              <item>batchCMN</item>
              <item>featureExtraction</item>
          </propertylist>
      </component>

      <component name="streamDataSource"
                  type="edu.cmu.sphinx.frontend.util.StreamDataSource">
          <property name="sampleRate" value="16000"/>
          <property name="bitsPerSample" value="16"/>
          <property name="bigEndianData" value="false"/>
          <property name="signedData" value="true"/>
      </component>

      <component name="premphasizer"
                 type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

      <component name="windower"
                 type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower"/>

      <component name="fft"
              type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform"/>

      <component name="melFilterBank"
            type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank"/>

      <component name="dct"
              type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

      <component name="batchCMN"
                 type="edu.cmu.sphinx.frontend.feature.BatchCMN"/>

      <component name="featureExtraction"
                 type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

      <!-- ******************************************************* -->
      <!--  monitors                                               -->
      <!-- ******************************************************* -->

      <component name="memoryTracker"
                  type="edu.cmu.sphinx.instrumentation.MemoryTracker">
          <property name="recognizer" value="${recognizer}"/>
      </component>

      <component name="speedTracker"
                  type="edu.cmu.sphinx.instrumentation.SpeedTracker">
          <property name="recognizer" value="${recognizer}"/>
          <property name="frontend" value="${frontend}"/>
          <property name="showTimers" value="false"/>
      </component>

      <!-- ******************************************************* -->
      <!--  Miscellaneous components                               -->
      <!-- ******************************************************* -->

      <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
          <property name="logBase" value="1.0001"/>
          <property name="useAddTable" value="true"/>
      </component>

      </config>


      THIS IS MY GRAMMAR-------------------------------------------

      JSGF V1.0;

      grammar g_teste_grammar;

      public <teste> = (alo | lontras | teste | focas) * ;

       
      • Nickolay V. Shmyrev

        Can you please paste the prompts you used for recording and the dictionary? It's recommended to have at least 10 examples of each word in the dictionary, so your prompts should contain at least 40 words. Is that the case?

        Another way to proceed is an iterative process: whenever a test word is not recognized, add it to the training set and record another test word. This process should converge to a working system.

        I also wonder why the absolute beam is -1:

        <property name="absoluteBeamWidth" value="-1"/>

         
        • Robbie

          Robbie - 2007-06-22

          AbsoluteBeamWidth of -1 means that no absolute beam width is used. I frequently use this setting and rely solely on the RelativeBeamWidth--so do many of the example configurations in the code base.
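
To make the two beam settings concrete, here is a minimal Java sketch of beam pruning. It is not Sphinx-4's active-list code. It assumes hypotheses are reduced to natural-log scores, that a non-positive absolute width means "no limit" (the -1 convention described above), and that the relative beam is a linear ratio such as 1E-80 applied to the best score. The names `BeamSketch` and `prune` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeamSketch {

    /**
     * Keeps the hypotheses that survive both beams.
     *
     * @param logScores         natural-log scores, higher is better
     * @param absoluteBeamWidth maximum number of survivors; <= 0 disables this limit
     * @param relativeBeamWidth linear ratio to the best score, e.g. 1E-80
     */
    public static List<Double> prune(List<Double> logScores, int absoluteBeamWidth,
                                     double relativeBeamWidth) {
        List<Double> sorted = new ArrayList<>(logScores);
        sorted.sort(Collections.reverseOrder()); // best score first
        List<Double> kept = new ArrayList<>();
        if (sorted.isEmpty()) {
            return kept;
        }
        // Relative beam: drop anything scoring below best * relativeBeamWidth.
        double threshold = sorted.get(0) + Math.log(relativeBeamWidth);
        for (double score : sorted) {
            boolean overAbsolute = absoluteBeamWidth > 0 && kept.size() >= absoluteBeamWidth;
            if (overAbsolute || score < threshold) {
                break;
            }
            kept.add(score);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Double> scores = List.of(-10.0, -12.0, -50.0, -400.0);
        System.out.println(prune(scores, -1, 1E-80)); // relative beam only
        System.out.println(prune(scores, 2, 1E-80));  // at most two survivors
    }
}
```

With an absolute width of -1 only the relative threshold removes hypotheses, so the number of survivors varies from frame to frame; a positive absolute width additionally caps it.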

           
    • gustavobap

      gustavobap - 2007-06-22

      Thanks a lot Nickolay.
      I had only 1 wav file for each word, now I use 10 and the results are much better, though not perfect. How many different voices and wave files per voice do I need to use, in order to make the recognition "speaker independent" ? Do I need to use male and female voices ?

      I don't know what value AbsoluteBeamWidth should have; I use the same value as the demos, as Robbie said. What are AbsoluteBeamWidth and RelativeBeamWidth used for?

      I have one more problem now (looks like they will never end ^^): the recognition still returns <SIL> or erroneous results if I use the microphone. I guess my microphone output SampleRate is not 16KHz as it should be, do you know how to change the microphone properties in Windows XP?
      And shouldn't the result always be a word in the dictionary?

      Nickolay, if you need more information to discover what is happening would be possible for us to talk through MSN ? This would speed things up. If you can add me, my MSN e-mail is [gustavobap#yahoo.com.br].

      The dictionary uses English phonemes to simulate the Portuguese ones,
      since there is no Portuguese language model available.
      THIS IS THE DICTIONARY=============================

      ALO AH L OW
      FOCAS F AA K AH S
      LONTRAS L OW N T R AH S
      TESTE T EH S T EH

      ===================================================

      THIS IS THE FILLER DICTIONARY======================

      <s> SIL
      </s> SIL
      <sil> SIL

      ==================================================

      THIS IS THE TRANSCRIPTION=========================

      <s> ALO </s> (alo0)
      <s> ALO </s> (alo1)
      <s> ALO </s> (alo2)
      <s> ALO </s> (alo3)
      <s> ALO </s> (alo4)
      <s> ALO </s> (alo5)
      <s> ALO </s> (alo6)
      <s> ALO </s> (alo7)
      <s> ALO </s> (alo8)
      <s> ALO </s> (alo9)
      <s> FOCAS </s> (focas0)
      <s> FOCAS </s> (focas1)
      <s> FOCAS </s> (focas2)
      <s> FOCAS </s> (focas3)
      <s> FOCAS </s> (focas4)
      <s> FOCAS </s> (focas5)
      <s> FOCAS </s> (focas6)
      <s> FOCAS </s> (focas7)
      <s> FOCAS </s> (focas8)
      <s> FOCAS </s> (focas9)
      <s> LONTRAS </s> (lontras0)
      <s> LONTRAS </s> (lontras1)
      <s> LONTRAS </s> (lontras2)
      <s> LONTRAS </s> (lontras3)
      <s> LONTRAS </s> (lontras4)
      <s> LONTRAS </s> (lontras5)
      <s> LONTRAS </s> (lontras6)
      <s> LONTRAS </s> (lontras7)
      <s> LONTRAS </s> (lontras8)
      <s> LONTRAS </s> (lontras9)
      <s> TESTE </s> (teste0)
      <s> TESTE </s> (teste1)
      <s> TESTE </s> (teste2)
      <s> TESTE </s> (teste3)
      <s> TESTE </s> (teste4)
      <s> TESTE </s> (teste5)
      <s> TESTE </s> (teste6)
      <s> TESTE </s> (teste7)
      <s> TESTE </s> (teste8)
      <s> TESTE </s> (teste9)

      ==================================================
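
The guideline quoted earlier in the thread (at least 10 examples of each word) can be checked mechanically against a transcription like the one above. The following is a minimal sketch, assuming each line has the form `<s> WORD </s> (utteranceId)`, that tokens starting with `<` are fillers, and that the token in parentheses is the utterance id. The class name `TranscriptCheck` is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TranscriptCheck {

    /** Counts how often each non-filler word occurs across transcription lines. */
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                // Skip empty tokens, <s>-style markers and the trailing (utteranceId).
                if (token.isEmpty() || token.startsWith("<") || token.startsWith("(")) {
                    continue;
                }
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "<s> ALO </s> (alo0)",
                "<s> ALO </s> (alo1)",
                "<s> TESTE </s> (teste0)");
        int minExamples = 10; // guideline quoted in the thread
        countWords(lines).forEach((word, n) ->
                System.out.println(word + ": " + n + (n < minExamples ? " (too few)" : "")));
    }
}
```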

      THIS IS THE MAIN JAVA CLASS=======================

      package src;

      import java.io.File;
      import java.io.IOException;
      import java.net.URL;

      import javax.sound.sampled.AudioInputStream;
      import javax.sound.sampled.AudioSystem;
      import javax.sound.sampled.UnsupportedAudioFileException;

      import edu.cmu.sphinx.frontend.util.StreamDataSource;
      import edu.cmu.sphinx.recognizer.Recognizer;
      import edu.cmu.sphinx.result.Result;
      import edu.cmu.sphinx.util.props.ConfigurationManager;
      import edu.cmu.sphinx.util.props.PropertyException;

      public class GTeste {

          /**
           * Main method for running the WavFile demo.
           */
          public static void main(String[] args) {
              try {

                  URL audioFileURL;

                  if (args.length > 0) {
                      audioFileURL = new File(args[0]).toURI().toURL();
                  } else {
                      audioFileURL = GTeste.class.getResource("testeR.wav");
                  }

                  URL configURL = GTeste.class.getResource("/g_teste_config.xml");

                  System.out.println("Loading Recognizer...\n");

                  ConfigurationManager cm = new ConfigurationManager(configURL);

                  Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

                  /* allocate the resource necessary for the recognizer */
                  recognizer.allocate();

                  System.out.println("Decoding " + audioFileURL.getFile());
                  System.out.println(AudioSystem.getAudioFileFormat(audioFileURL));

                  StreamDataSource reader
                      = (StreamDataSource) cm.lookup("streamDataSource");

                  AudioInputStream ais
                      = AudioSystem.getAudioInputStream(audioFileURL);

                  /* set the stream data source to read from the audio file */
                  reader.setInputStream(ais, audioFileURL.getFile());

                  /* decode the audio file */
                  Result result = recognizer.recognize();

                  /* print out the results */
                  if (result != null) {
                      System.out.println("\nRESULT: " +
                                         result.getBestFinalResultNoFiller() + "\n");
                  } else {
                      System.out.println("Result: null\n");
                  }
              } catch (IOException e) {
                  System.err.println("Problem when loading WavFile: " + e);
                  e.printStackTrace();
              } catch (PropertyException e) {
                  System.err.println("Problem configuring WavFile: " + e);
                  e.printStackTrace();
              } catch (InstantiationException e) {
                  System.err.println("Problem creating WavFile: " + e);
                  e.printStackTrace();
              } catch (UnsupportedAudioFileException e) {
                  System.err.println("Audio file format not supported: " + e);
                  e.printStackTrace();
              }
          }

      }

      ===============================================================

       
      • Nickolay V. Shmyrev

        > I had only 1 wav file for each word, now I use 10 and the results are much better, though not perfect. How many different voices and wave files per voice do I need to use, in order to make the recognition "speaker independent" ? Do I need to use male and female voices ?

        Well, I suggest you look at the TIDIGITS database design:

        http://www.ldc.upenn.edu/Catalog/docs/LDC93S10/tidigits.readme.html

        Of course you don't need 300 speakers, but at least 100 are required for true independence. Another detail you can take from TIDIGITS (see the sphinx4 example) is the word-oriented dictionary:

        ALO A_1 L_1 O_1
        FOCAS F_2 O_2 K_2 A_2 S_2
        LONTRAS L_3 O_3 N_3 T_3 R_3 A_3 S_3
        TESTE T_4 E_4 S_4 T_4 E_4

        or something like that. Check if such dictionary/phoneset will improve WER.
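
One way to produce a word-oriented dictionary of this kind is to tag every unit with the index of the word it belongs to. The sketch below is only an illustration under assumptions: it starts from phone-based entries of the form `WORD P1 P2 ...` (rather than the letter-like units shown above) and numbers the words in input order. `WordUnits` and `toWordDependent` are hypothetical names, not part of Sphinx-4 or SphinxTrain.

```java
import java.util.ArrayList;
import java.util.List;

public class WordUnits {

    /** Rewrites "WORD P1 P2 ..." entries so every unit carries its word's index. */
    public static List<String> toWordDependent(List<String> entries) {
        List<String> out = new ArrayList<>();
        int index = 1; // words are numbered in input order, starting at 1
        for (String entry : entries) {
            String[] parts = entry.trim().split("\\s+");
            StringBuilder line = new StringBuilder(parts[0]); // the word itself
            for (int i = 1; i < parts.length; i++) {
                line.append(' ').append(parts[i]).append('_').append(index);
            }
            out.add(line.toString());
            index++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of(
                "ALO AH L OW",
                "FOCAS F AA K AH S",
                "LONTRAS L OW N T R AH S",
                "TESTE T EH S T EH");
        toWordDependent(dictionary).forEach(System.out::println);
    }
}
```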

        > if you need more information to discover what is happening would be possible for us to talk through MSN ?

        Well, it's possible, but I suppose we should create an IRC channel like #cmusphinx on freenode, although I don't know if people here will support this. It should probably be discussed in a separate thread.

        > I guess my microphone output SampleRate is not 16KHz as it should be, do you know how to change the microphone properties in Windows XP?

        I suppose you should just set the microphone properties:

        <component name="microphone" type="edu.cmu.sphinx.frontend.util.Microphone">
            <property name="sampleRate" value="${sampleRate}"/>
        </component>
        

        But of course it's better to set 16000 rate somehow, not sure if all hardware supports this.
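
Before changing any settings, it can help to list exactly how the capture format differs from the training format. The following minimal sketch uses only the standard `javax.sound.sampled.AudioFormat` class; the class name `FormatCheck`, the method `mismatches`, and the 44.1 kHz stereo capture format are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioFormat;

public class FormatCheck {

    /** Lists the properties in which the captured format differs from the training format. */
    public static List<String> mismatches(AudioFormat training, AudioFormat captured) {
        List<String> diffs = new ArrayList<>();
        if (training.getSampleRate() != captured.getSampleRate()) {
            diffs.add("sampleRate");
        }
        if (training.getSampleSizeInBits() != captured.getSampleSizeInBits()) {
            diffs.add("bitsPerSample");
        }
        if (training.getChannels() != captured.getChannels()) {
            diffs.add("channels");
        }
        if (!training.getEncoding().equals(captured.getEncoding())) {
            diffs.add("encoding");
        }
        if (training.isBigEndian() != captured.isBigEndian()) {
            diffs.add("endianness");
        }
        return diffs;
    }

    public static void main(String[] args) {
        // 16 kHz, 16 bit, mono, signed, little-endian: the training format from the thread.
        AudioFormat training = new AudioFormat(16000f, 16, 1, true, false);
        // Hypothetical capture format, chosen only to illustrate a mismatch.
        AudioFormat captured = new AudioFormat(44100f, 16, 2, true, false);
        System.out.println(mismatches(training, captured));
    }
}
```

For a real file, the training format can be read with `AudioSystem.getAudioFileFormat(file).getFormat()`, the same call family the WavFile-based class in this thread already uses to print the file attributes.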

         
    • gustavobap

      gustavobap - 2007-06-22

      Hi again,

      Correcting the last post: when I use the microphone, the recognition ALWAYS returns <SIL>, not erroneous results.

      The configuration file when I'm using the microphone is like the one before, but the property "frontend" is changed
      <property name="frontend" value="epFrontEnd"/>
      and the part
      <!-- ******** -->
      <!-- The frontend configuration -->
      <!-- ******** -->
      is replaced by this:
      ================================================

      <!-- ******************************************************** -->
      <!-- The live frontend configuration                          -->
      <!-- ******************************************************** -->
      <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
          <propertylist name="pipeline">
              <item>microphone</item>
              <item>dataBlocker</item>
              <item>speechClassifier</item>
              <item>speechMarker</item>
              <item>nonSpeechDataFilter</item>
              <item>premphasizer</item>
              <item>windower</item>
              <item>fft</item>
              <item>melFilterBank</item>
              <item>dct</item>
              <item>liveCMN</item>
              <item>featureExtraction</item>
          </propertylist>
      </component>

      <!-- ******************************************************** -->
      <!-- The frontend pipelines                                   -->
      <!-- ******************************************************** -->

      <component name="dataBlocker" type="edu.cmu.sphinx.frontend.DataBlocker">
          <!--<property name="blockSizeMs" value="10"/>-->
      </component>

      <component name="speechClassifier"
                 type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
          <property name="threshold" value="13"/>
      </component>

      <component name="nonSpeechDataFilter"
                 type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

      <component name="speechMarker"
                 type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker">
          <property name="speechTrailer" value="50"/>
      </component>

      <component name="premphasizer"
                 type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

      <component name="windower"
                 type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
      </component>

      <component name="fft"
              type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
      </component>

      <component name="melFilterBank"
          type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
      </component>

      <component name="dct"
              type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

      <component name="liveCMN"
                 type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

      <component name="featureExtraction"
                 type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

      <component name="microphone"
                 type="edu.cmu.sphinx.frontend.util.Microphone">
          <property name="closeBetweenUtterances" value="false"/>
      </component>
      

      =======================================================
      THIS IS THE JAVA MAIN CLASS, USING MICROPHONE==========

      /*
       * Copyright 1999-2004 Carnegie Mellon University.
       * Portions Copyright 2004 Sun Microsystems, Inc.
       * Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
       * All Rights Reserved. Use is subject to license terms.
       *
       * See the file "license.terms" for information on usage and
       * redistribution of this file, and for a DISCLAIMER OF ALL
       * WARRANTIES.
       *
       */

      package src;

      import edu.cmu.sphinx.frontend.util.Microphone;
      import edu.cmu.sphinx.recognizer.Recognizer;
      import edu.cmu.sphinx.result.Result;
      import edu.cmu.sphinx.util.props.ConfigurationManager;
      import edu.cmu.sphinx.util.props.PropertyException;

      import java.io.File;
      import java.io.IOException;
      import java.net.URL;

      /**
       * A simple HelloWorld demo showing a simple speech application
       * built using Sphinx-4. This application uses the Sphinx-4 endpointer,
       * which automatically segments incoming audio into utterances and silences.
       */
      public class GTeste2 {

          /**
           * Main method for running the HelloWorld demo.
           */
          public static void main(String[] args) {
              try {
                  URL url;
                  if (args.length > 0) {
                      url = new File(args[0]).toURI().toURL();
                  } else {
                      url = GTeste2.class.getResource("/g_teste_config2.xml");
                  }

                  System.out.println("Loading...");

                  ConfigurationManager cm = new ConfigurationManager(url);

                  Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
                  Microphone microphone = (Microphone) cm.lookup("microphone");

                  /* allocate the resource necessary for the recognizer */
                  recognizer.allocate();

                  /* the microphone will keep recording until the program exits */
                  if (microphone.startRecording()) {

                      System.out.println("Say: (alo | lontras | teste | focas)");

                      while (true) {
                          System.out.println("Start speaking. Press Ctrl-C to quit.\n");

                          /*
                           * This method will return when the end of speech
                           * is reached. Note that the endpointer will determine
                           * the end of speech.
                           */
                          Result result = recognizer.recognize();

                          if (result != null) {
                              String resultText = result.getBestFinalResultNoFiller();
                              System.out.println("You said: " + resultText + "\n");
                          } else {
                              System.out.println("I can't hear what you said.\n");
                          }
                      }
                  } else {
                      System.out.println("Cannot start microphone.");
                      recognizer.deallocate();
                      System.exit(1);
                  }
              } catch (IOException e) {
                  System.err.println("Problem when loading HelloWorld: " + e);
                  e.printStackTrace();
              } catch (PropertyException e) {
                  System.err.println("Problem configuring HelloWorld: " + e);
                  e.printStackTrace();
              } catch (InstantiationException e) {
                  System.err.println("Problem creating HelloWorld: " + e);
                  e.printStackTrace();
              }
          }

      }

      ==========================================================================

       
    • gustavobap

      gustavobap - 2007-06-26

      I discovered what is wrong with the microphone. I've set the properties this way:

      <component name="microphone" type="edu.cmu.sphinx.frontend.util.Microphone">
          <property name="closeBetweenUtterances" value="false"/>
          <property name="sampleRate" value="16000"/>
          <property name="bitsPerSample" value="16"/>
          <property name="bigEndianData" value="false"/>
          <property name="signedData" value="true"/>
      </component>
      

      ====================================================================================

      as it should be, to be used with a database where the files are:

      WAVE (.wav) file, byte length: 32048, data format: PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian, frame length: 16002

      the problem is, even with this property set
      <property name="bigEndianData" value="false"/>
      the microphone is initialized like this:
      ================================================================
      PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, big-endian
      ================================================================
      it is BIG-ENDIAN, and my trained files are LITTLE-ENDIAN.

      Do you know how to solve this?
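
To illustrate why this mismatch matters, the following minimal sketch decodes the same two bytes as one signed 16-bit PCM sample in both byte orders, using only `java.nio`. The class name `EndianDemo` and the sample bytes are hypothetical; the point is only that audio read with the wrong byte order yields entirely different sample values.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {

    /** Interprets two bytes as one signed 16-bit PCM sample in the given byte order. */
    public static short decode(byte first, byte second, ByteOrder order) {
        return ByteBuffer.wrap(new byte[] {first, second}).order(order).getShort();
    }

    public static void main(String[] args) {
        byte first = 0x10;
        byte second = 0x00;
        System.out.println(decode(first, second, ByteOrder.LITTLE_ENDIAN)); // prints 16
        System.out.println(decode(first, second, ByteOrder.BIG_ENDIAN));    // prints 4096
    }
}
```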

       
    • gustavobap

      gustavobap - 2007-06-26

      Just discovered the property should be set like this:

      <property name="bigEndian" value="false"/>

      The audio format is correct now, but the results are still wrong =[.
      The recognizer takes too long to return, and the recognition
      result is <SIL> or erroneous.

      Something weird is happening.. I guess I'm doing something really wrong.

       
