Hi, I'm trying to convert audio from a WAV file to text using Sphinx4. I've built a very small database for testing and adapted the WavFile demo to do the task. When I decode the same files that were used in training, everything goes right, but if I record the same word in another file, using the same method I used for the training files, the result is always <SIL>.
How can I solve this? Thank you.
Gustavo
"very small" database for testing will not work. Usually test set is 10% of training data. Is it your case?
If you need more details, paste complete log.
Hello again Nickolay ^^, thanks for answering.
The database consists of only 4 words; my program doesn't need to transcribe long speech, it will just execute some tasks associated with the spoken words.
You mean the program output log? The output is the audio file attributes and the decoded text, as in the WavFile demo.
WITH THE ORIGINAL FILE USED FOR TRAINING--------------------
Loading Recognizer...
Decoding /C:/Documents/TesteSphinx2/src/teste.wav
WAVE (.wav) file, byte length: 40048, data format: PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian, frame length: 20002
RESULT: teste
WITH THE NEW FILE, WITH THE SAME WORD RECORDED--------------
Loading Recognizer...
Decoding /C:/Documents/TesteSphinx2/src/teste2.wav
WAVE (.wav) file, byte length: 48058, data format: PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian, frame length: 24000
RESULT:
THIS IS MY CONFIGURATION FILE-------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!--
Sphinx-4 Configuration file
-->
<!-- ******** -->
<!-- g_teste configuration file -->
<!-- ******** -->
<config>
</config>
THIS IS MY GRAMMAR-------------------------------------------
#JSGF V1.0;
grammar g_teste_grammar;
public <teste> = (alo | lontras | teste | focas) * ;
Can you please paste the prompts you used for recording, and the dictionary? It's recommended to have at least 10 examples of each word in the dictionary, so your prompts should contain at least 40 words. Is that the case?
Another way to proceed is an iterative process: whenever a test word is not recognized, add it to the training set and record another test word. Such a process should converge to a working system.
I also wonder why absolute beam is -1:
<property name="absoluteBeamWidth" value="-1"/>
An AbsoluteBeamWidth of -1 means that no absolute beam width is used. I frequently use this setting and rely solely on the RelativeBeamWidth, as do many of the example configurations in the code base.
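Both beam widths control pruning during the search: the absolute beam caps how many active hypotheses are kept per frame, while the relative beam discards any hypothesis whose score falls too far below the current best one. In the stock demo configurations they are set on the active list factory; a typical fragment (component and property names taken from the standard Sphinx-4 demo configs, so treat this as a sketch rather than your exact file) looks like:

<component name="activeList"
           type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
    <property name="logMath" value="logMath"/>
    <!-- -1 disables the absolute cap, so only the relative beam prunes -->
    <property name="absoluteBeamWidth" value="-1"/>
    <!-- drop hypotheses scoring below best * 1E-80 -->
    <property name="relativeBeamWidth" value="1E-80"/>
</component>

A vocabulary as small as yours rarely needs an absolute cap, which is why -1 is a common default.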
Thanks a lot Nickolay.
I had only 1 wav file for each word; now I use 10 and the results are much better, though not perfect. How many different voices, and how many wave files per voice, do I need in order to make the recognition "speaker independent"? Do I need to use both male and female voices?
I don't know what value AbsoluteBeamWidth should have; I use the same value as the demos, as Robbie said. What are AbsoluteBeamWidth and RelativeBeamWidth used for?
I have one more problem now (looks like they will never end ^^): the recognition still returns <SIL> or erroneous results when I use the microphone. I guess my microphone output sample rate is not 16 kHz as it should be. Do you know how to change the microphone properties in Windows XP?
And shouldn't the result always be a word in the dictionary?
Nickolay, if you need more information to discover what is happening, would it be possible for us to talk through MSN? This would speed things up. If you can add me, my MSN e-mail is [gustavobap#yahoo.com.br].
The dictionary uses English phonemes to simulate the Portuguese ones,
since there is no Portuguese language model available.
THIS IS THE DICTIONARY=============================
ALO AH L OW
FOCAS F AA K AH S
LONTRAS L OW N T R AH S
TESTE T EH S T EH
===================================================
THIS IS THE FILLER DICTIONARY======================
<s> SIL
</s> SIL
<sil> SIL
==================================================
THIS IS THE TRANSCRIPTION=========================
<s> ALO </s> (alo0)
<s> ALO </s> (alo1)
<s> ALO </s> (alo2)
<s> ALO </s> (alo3)
<s> ALO </s> (alo4)
<s> ALO </s> (alo5)
<s> ALO </s> (alo6)
<s> ALO </s> (alo7)
<s> ALO </s> (alo8)
<s> ALO </s> (alo9)
<s> FOCAS </s> (focas0)
<s> FOCAS </s> (focas1)
<s> FOCAS </s> (focas2)
<s> FOCAS </s> (focas3)
<s> FOCAS </s> (focas4)
<s> FOCAS </s> (focas5)
<s> FOCAS </s> (focas6)
<s> FOCAS </s> (focas7)
<s> FOCAS </s> (focas8)
<s> FOCAS </s> (focas9)
<s> LONTRAS </s> (lontras0)
<s> LONTRAS </s> (lontras1)
<s> LONTRAS </s> (lontras2)
<s> LONTRAS </s> (lontras3)
<s> LONTRAS </s> (lontras4)
<s> LONTRAS </s> (lontras5)
<s> LONTRAS </s> (lontras6)
<s> LONTRAS </s> (lontras7)
<s> LONTRAS </s> (lontras8)
<s> LONTRAS </s> (lontras9)
<s> TESTE </s> (teste0)
<s> TESTE </s> (teste1)
<s> TESTE </s> (teste2)
<s> TESTE </s> (teste3)
<s> TESTE </s> (teste4)
<s> TESTE </s> (teste5)
<s> TESTE </s> (teste6)
<s> TESTE </s> (teste7)
<s> TESTE </s> (teste8)
<s> TESTE </s> (teste9)
==================================================
THIS IS THE MAIN JAVA CLASS=======================
package src;
import java.io.File;
import java.io.IOException;
import java.net.URL;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import edu.cmu.sphinx.frontend.util.StreamDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import edu.cmu.sphinx.util.props.PropertyException;
public class GTeste {

    /** Main method for running the WavFile demo. */
    public static void main(String[] args) {
        try {
            URL audioFileURL;
            if (args.length > 0) {
                audioFileURL = new File(args[0]).toURI().toURL();
            } else {
                audioFileURL = GTeste.class.getResource("testeR.wav");
            }
            URL configURL = GTeste.class.getResource("/g_teste_config.xml");

            System.out.println("Loading Recognizer...\n");
            ConfigurationManager cm = new ConfigurationManager(configURL);
            Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

            /* allocate the resources necessary for the recognizer */
            recognizer.allocate();

            System.out.println("Decoding " + audioFileURL.getFile());
            System.out.println(AudioSystem.getAudioFileFormat(audioFileURL));

            StreamDataSource reader = (StreamDataSource) cm.lookup("streamDataSource");
            AudioInputStream ais = AudioSystem.getAudioInputStream(audioFileURL);

            /* set the stream data source to read from the audio file */
            reader.setInputStream(ais, audioFileURL.getFile());

            /* decode the audio file */
            Result result = recognizer.recognize();

            /* print out the result */
            if (result != null) {
                System.out.println("\nRESULT: " + result.getBestFinalResultNoFiller() + "\n");
            } else {
                System.out.println("Result: null\n");
            }
        } catch (IOException e) {
            System.err.println("Problem when loading WavFile: " + e);
            e.printStackTrace();
        } catch (PropertyException e) {
            System.err.println("Problem configuring WavFile: " + e);
            e.printStackTrace();
        } catch (InstantiationException e) {
            System.err.println("Problem creating WavFile: " + e);
            e.printStackTrace();
        } catch (UnsupportedAudioFileException e) {
            System.err.println("Audio file format not supported: " + e);
            e.printStackTrace();
        }
    }
}
===============================================================
> I had only 1 wav file for each word; now I use 10 and the results are much better, though not perfect. How many different voices, and how many wave files per voice, do I need in order to make the recognition "speaker independent"? Do I need to use both male and female voices?
Well, I suggest you take a look at the TIDIGITS database design:
http://www.ldc.upenn.edu/Catalog/docs/LDC93S10/tidigits.readme.html
Of course you don't need 300 speakers, but at least 100 are required for true independence. Another detail you can take from TIDIGITS (see the sphinx4 example) is a word-oriented dictionary:
ALO A_1 L_1 O_1
FOCAS F_2 O_2 K_2 A_2 S_2
LONTRAS L_3 O_3 N_3 T_3 R_3 A_3 S_3
TESTE T_4 E_4 S_4 T_4 E_4
or something like that. Check whether such a dictionary/phoneset improves the WER.
> if you need more information to discover what is happening, would it be possible for us to talk through MSN?
Well, it's possible, but I suppose we should create an IRC channel like #cmusphinx on freenode instead. I don't know whether people here would support this, though; it should probably be discussed in a separate thread.
> I guess my microphone output sample rate is not 16 kHz as it should be. Do you know how to change the microphone properties in Windows XP?
I suppose you should just set the microphone properties:
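For example, something along these lines (the property names belong to the standard edu.cmu.sphinx.frontend.util.Microphone component as I remember them, so double-check them against your Sphinx-4 version):

<component name="microphone" type="edu.cmu.sphinx.frontend.util.Microphone">
    <!-- request 16 kHz, 16-bit, signed, mono, little-endian capture -->
    <property name="sampleRate" value="16000"/>
    <property name="bitsPerSample" value="16"/>
    <property name="channels" value="1"/>
    <property name="signed" value="true"/>
    <property name="bigEndian" value="false"/>
</component>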
But of course it's better to set the 16000 Hz rate somehow; I'm not sure whether all hardware supports it.
Hi again,
Correcting my last post: when I use the microphone, the recognition ALWAYS returns <SIL>, not erroneous results.
The configuration file I use with the microphone is like the one before, but the property "frontend" is changed to
<property name="frontend" value="epFrontEnd"/>
and the part
<!-- ******** -->
<!-- The frontend configuration -->
<!-- ******** -->
is replaced by this:
================================================
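(essentially the endpointed front end from the HelloWorld demo; the pipeline item names below follow that stock configuration, so take this as a sketch of the shape rather than an exact copy of my file)

<component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
    <propertylist name="pipeline">
        <item>microphone</item>
        <!-- endpointer stages: classify frames, mark speech, drop silence -->
        <item>speechClassifier</item>
        <item>speechMarker</item>
        <item>nonSpeechDataFilter</item>
        <!-- the usual MFCC feature extraction chain -->
        <item>preemphasizer</item>
        <item>windower</item>
        <item>fft</item>
        <item>melFilterBank</item>
        <item>dct</item>
        <item>liveCMN</item>
        <item>featureExtraction</item>
    </propertylist>
</component>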
=======================================================
THIS IS THE JAVA MAIN CLASS, USING MICROPHONE==========
/*
 * Copyright 1999-2004 Carnegie Mellon University.
 * Portions Copyright 2004 Sun Microsystems, Inc.
 * Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
 * All Rights Reserved. Use is subject to license terms.
 * See the file "license.terms" for information on usage and
 * redistribution of this file, and for a DISCLAIMER OF ALL
 * WARRANTIES.
 */
package src;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import edu.cmu.sphinx.util.props.PropertyException;
import java.io.File;
import java.io.IOException;
import java.net.URL;
/*
 * A simple HelloWorld demo showing a simple speech application
 * built using Sphinx-4. This application uses the Sphinx-4 endpointer,
 * which automatically segments incoming audio into utterances and silences.
 */
public class GTeste2 {

    /** Main method for running the HelloWorld demo. */
    public static void main(String[] args) {
        try {
            URL url;
            if (args.length > 0) {
                url = new File(args[0]).toURI().toURL();
            } else {
                url = GTeste2.class.getResource("/g_teste_config2.xml");
            }

            System.out.println("Loading...");
            ConfigurationManager cm = new ConfigurationManager(url);
            Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
            Microphone microphone = (Microphone) cm.lookup("microphone");

            /* allocate the resources necessary for the recognizer */
            recognizer.allocate();

            /* the microphone will keep recording until the program exits */
            if (microphone.startRecording()) {
                System.out.println("Say: (alo | lontras | teste | focas)");
                while (true) {
                    System.out.println("Start speaking. Press Ctrl-C to quit.\n");
                    /*
                     * This method returns when the end of speech is reached.
                     * Note that the endpointer determines the end of speech.
                     */
                    Result result = recognizer.recognize();
                    if (result != null) {
                        String resultText = result.getBestFinalResultNoFiller();
                        System.out.println("You said: " + resultText + "\n");
                    } else {
                        System.out.println("I can't hear what you said.\n");
                    }
                }
            } else {
                System.out.println("Cannot start the microphone.");
                recognizer.deallocate();
                System.exit(1);
            }
        } catch (IOException e) {
            System.err.println("Problem when loading HelloWorld: " + e);
            e.printStackTrace();
        } catch (PropertyException e) {
            System.err.println("Problem configuring HelloWorld: " + e);
            e.printStackTrace();
        } catch (InstantiationException e) {
            System.err.println("Problem creating HelloWorld: " + e);
            e.printStackTrace();
        }
    }
}
==========================================================================
I discovered what is wrong with the microphone. I've set the properties this way:
====================================================================================
as it should be, to be used with a database where the files are:
WAVE (.wav) file, byte length: 32048, data format: PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian, frame length: 16002
The problem is, even with this property set
<property name="bigEndianData" value="false"/>
the microphone is initialized like this:
================================================================
PCM_SIGNED 16000.0 Hz, 16 bit, mono, 2 bytes/frame, big-endian
================================================================
It is BIG-ENDIAN, while my training files are LITTLE-ENDIAN.
Do you know how to solve this?
I just discovered that the property should be set like this:
<property name="bigEndian" value="false"/>
The audio format is correct now, but the results are still wrong =[.
The recognizer is taking too long to return, and the recognition is <SIL> or erroneous.
Something weird is happening... I guess I'm doing something really wrong.