Aligner demo - french_f0 - poor results

Help
Anonymous
2011-03-03
2012-09-22
  • Anonymous

    Anonymous - 2011-03-03

    sphinx4-1.0beta5-src on Windows 7
    compiled with Ant
    jdk1.6.0_24

    I modified Aligner.java and its config.xml to use the french_f0 dictionary, a
    sample wav, and a sample sentence. The result is poor. Is this the best I can
    expect or should I be doing this differently?

    CONFIG:

    <?xml version="1.0" encoding="UTF-8"?>
    
    <config>
    
        <property name="logLevel" value="WARNING"/>
    
        <property name="absoluteBeamWidth"  value="-1"/>
        <property name="relativeBeamWidth"  value="1E-80"/>
        <property name="wordInsertionProbability" value="1E-36"/>
        <property name="languageWeight"     value="8"/>
    
        <property name="frontend" value="epFrontEnd"/>
        <property name="recognizer" value="recognizer"/>
        <property name="showCreations" value="false"/>
    
        <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
            <property name="decoder" value="decoder"/>
       </component>
    
        <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
            <property name="searchManager" value="searchManager"/>
        </component>
    
        <component name="searchManager" 
            type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
            <property name="logMath" value="logMath"/>
            <property name="linguist" value="flatLinguist"/>
            <property name="pruner" value="trivialPruner"/>
            <property name="scorer" value="threadedScorer"/>
            <property name="activeListFactory" value="activeList"/>
        </component>
    
    
        <component name="activeList" 
                 type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
            <property name="logMath" value="logMath"/>
            <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
            <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
        </component>
    
        <component name="trivialPruner" 
                    type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>
    
        <component name="threadedScorer" 
                    type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
            <property name="frontend" value="${frontend}"/>
        </component>
    
        <component name="flatLinguist"
                    type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
            <property name="logMath" value="logMath"/>
            <property name="grammar" value="textAlignGrammar"/>
            <property name="acousticModel" value="frenchAM"/>
            <property name="wordInsertionProbability"
                    value="${wordInsertionProbability}"/>
            <property name="languageWeight" value="${languageWeight}"/>
            <property name="unitManager" value="unitManager"/>
        </component>
    
        <component name="textAlignGrammar" type="edu.cmu.sphinx.linguist.language.grammar.TextAlignerGrammar">
            <property name="dictionary" value="dictionary"/>
        <property name="logMath" value="logMath"/>
        </component>
    
    
        <component name="trigramModel" type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel"> 
        <property name="unigramWeight" value="0.7"/> 
        <property name="maxDepth" value="3"/> 
        <property name="logMath" value="logMath"/> 
        <property name="dictionary" value="dictionary"/> 
        <property name="location" value="models/acoustic/french_f0/etc/french3g62K.DMP"/> 
        </component>
    
    
    
        <component name="dictionary" 
            type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
            <property name="dictionaryPath"
                      value="models/acoustic/french_f0/etc/frenchWords62K.dic"/>
            <property name="fillerPath" 
                  value="models/acoustic/french_f0/etc/frenchFillers.dic"/>
            <property name="addSilEndingPronunciation" value="true"/>
            <property name="wordReplacement" value="&lt;sil&gt;"/>
            <property name="unitManager" value="unitManager"/>
        </component>
    
      <component name="frenchAMLoader"
             type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
            <property name="location" value="models/acoustic/french_f0/model_parameters/french_f0.cd_cont_5725_22/"/>
        <property name="modelDefinition" value="french_f0.5725.mdef"/>
      </component>
    
      <component name="frenchAM" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
        <property name="loader" value="frenchAMLoader"/>
        <property name="unitManager" value="unitManager"/>
      </component>
    
    
        <component name="wsj"
                   type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
            <property name="loader" value="wsjLoader"/>
            <property name="unitManager" value="unitManager"/>
        </component>
    
        <component name="wsjLoader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
            <property name="logMath" value="logMath"/>
            <property name="unitManager" value="unitManager"/>
            <property name="location" value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz"/>
            <property name="modelDefinition" value="etc/WSJ_clean_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef"/>
            <property name="dataLocation" value="cd_continuous_8gau/"/>
        </component>
    
        <component name="unitManager" 
            type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>
    
        <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
            <propertylist name="pipeline">
                <item>audioFileDataSource </item>
                <item>dataBlocker </item>
                <item>speechClassifier </item>
                <item>speechMarker </item>
                <item>nonSpeechDataFilter </item>
                <item>preemphasizer </item>
                <item>windower </item>
                <item>fft </item>
                <item>melFilterBank </item>
                <item>dct </item>
                <item>liveCMN </item>
                <item>featureExtraction </item>
            </propertylist>
        </component>
    
        <component name="audioFileDataSource" type="edu.cmu.sphinx.frontend.util.AudioFileDataSource"/>
    
        <component name="dataBlocker" type="edu.cmu.sphinx.frontend.DataBlocker"/>
    
        <component name="speechClassifier" type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier"/>
    
        <component name="nonSpeechDataFilter" 
                   type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>
    
        <component name="speechMarker" type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker" />
    
        <component name="preemphasizer"
                   type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>
    
        <component name="windower" 
                   type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
        </component>
    
        <component name="fft" 
                type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
        </component>
    
        <component name="melFilterBank" 
            type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
        </component>
    
        <component name="dct" 
                type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>
    
        <component name="liveCMN" 
                   type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>
    
        <component name="featureExtraction" 
                   type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>
    
        <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
            <property name="logBase" value="1.0001"/>
            <property name="useAddTable" value="true"/>
        </component>
    
    </config>
    

    ALIGNER.JAVA

    /*
     * Copyright 1999-2004 Carnegie Mellon University.
     * Portions Copyright 2004 Sun Microsystems, Inc.
     * Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
     * All Rights Reserved.  Use is subject to license terms.
     *
     * See the file "license.terms" for information on usage and
     * redistribution of this file, and for a DISCLAIMER OF ALL
     * WARRANTIES.
     *
     */
    
    package edu.cmu.sphinx.demo.aligner;
    
    import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
    import edu.cmu.sphinx.recognizer.Recognizer;
    import edu.cmu.sphinx.result.Result;
    import edu.cmu.sphinx.util.props.ConfigurationManager;
    import edu.cmu.sphinx.linguist.language.grammar.TextAlignerGrammar;
    
    import javax.sound.sampled.UnsupportedAudioFileException;
    import java.io.IOException;
    import java.net.URL;
    
    /**
     * A simple example that shows how to align speech to existing transcription to
     * get times.
     */
    public class Aligner {
    
        public static void main(String[] args) throws IOException, UnsupportedAudioFileException {
    
            ConfigurationManager cm = new ConfigurationManager("src/sphinx4/edu/cmu/sphinx/config/aligner.xml");
            Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
    
            TextAlignerGrammar grammar = (TextAlignerGrammar) cm.lookup("textAlignGrammar");
            grammar.setText("Dans le faubourg une rue assourdissante populeuse où du matin au soir les vitres tremblaient au fracas des camions et des omnibus tout le monde connaissait estimait et respectait la petite papetière");
            recognizer.addResultListener(grammar);
    
            /* allocate the resource necessary for the recognizer */
            recognizer.allocate();
    
            // configure the audio input for the recognizer
            AudioFileDataSource dataSource = (AudioFileDataSource) cm.lookup("audioFileDataSource");
            dataSource.setAudioFile(new URL("file:src/apps/edu/cmu/sphinx/demo/transcriber/10001-90210-01803.wav"), null);
    
            Result result;
            while ((result = recognizer.recognize()) != null) {
    
                String resultText = result.getTimedBestResult(false, true);
                System.out.println(resultText);
            }
        }
    }
    

    RESULT:

    10:42:04.807 WARNING dictionary        Missing word: assourdissante
    10:42:04.807 WARNING dictionary        Missing word: populeuse
    10:42:04.807 WARNING dictionary        Missing word: tremblaient
    connaissait(0.85,2.69)
    la(4.4,4.91)
    petite(7.6,8.15)
    

    The three aligned words are completely wrong.
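
    The three "Missing word" warnings suggest that those transcript words have
    no entry in the dictionary. A minimal sketch of checking a transcript
    against a dictionary before running the aligner, assuming the usual Sphinx
    text format of one "word PH1 PH2 ..." entry per line, with alternates
    written as "word(2)". DictionaryCheck, its methods, and the toy entries are
    hypothetical and not part of Sphinx-4; the lowercase comparison is also an
    assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Sphinx-4.
class DictionaryCheck {

    // Collects the headwords of a pronunciation dictionary.
    // Assumed line format: "word PH1 PH2 ...", alternates as "word(2) PH1 ...".
    static Set<String> headwords(List<String> dictionaryLines) {
        Set<String> words = new HashSet<>();
        for (String line : dictionaryLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String head = trimmed.split("\\s+")[0];
            int paren = head.indexOf('(');
            if (paren > 0) {
                head = head.substring(0, paren); // "le(2)" -> "le"
            }
            words.add(head.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    // Returns the transcript words (lowercased, first occurrence only)
    // that have no headword in the dictionary.
    static List<String> missingWords(String transcript, Set<String> headwords) {
        List<String> missing = new ArrayList<>();
        for (String word : transcript.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            String key = word.toLowerCase(Locale.ROOT);
            if (!headwords.contains(key) && !missing.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Toy entries with made-up phone strings, for illustration only.
        List<String> toyDictionary = Arrays.asList("dans d an", "le l ee", "le(2) l", "rue rr uu");
        Set<String> heads = headwords(toyDictionary);
        System.out.println(missingWords("Dans le faubourg une rue", heads));
    }
}
```

    Words reported this way would need to be added to the dictionary or
    removed from the transcript before alignment.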

     
  • Nickolay V. Shmyrev

    The French model uses AGC, so you need to include the BatchAGC component in
    the frontend pipeline. You can search this forum for details.

    It's also recommended to use sphinx4-1.0beta6, not beta5.

     
  • Anonymous

    Anonymous - 2011-03-03

    It's not any better but it might help if I showed you the results using the
    correct audio file (I thought those results were weird).

    Still, none of the words are correctly located. I also tried with an audio
    file containing someone counting from 0-9 in French. It was bad except the
    zero was perfect.

    11:32:07.100 WARNING dictionary        Missing word: assourdissante
    11:32:07.100 WARNING dictionary        Missing word: populeuse
    11:32:07.100 WARNING dictionary        Missing word: tremblaient
    au(3.73,4.48) soir(4.48,4.97) les(4.97,5.16) vitres(5.16,5.82) au(5.98,6.12) fracas(6.12,6.47) des(6.47,6.85)
    tout(8.8,8.98) le(8.98,9.09) monde(9.09,9.44) connaissait(9.44,10.41) estimait(10.41,11.35) et(11.35,13.71)
    
     
  • Anonymous

    Anonymous - 2011-03-03

    Thank you for the reply. sphinx4-1.0beta6 with the following addition to the
    config.xml:

        <component name="BatchAGC" type="edu.cmu.sphinx.frontend.feature.BatchAGC"/>
    

    The results on the counting test are poor:

    zero(0.92163265,1.7796825)
    un(2.5353289,4.5507483)
    un(5.87161,6.2906575)
    un(7.606213,8.174921)
    un(9.828481,10.038005)
    un(11.493016,11.782358)
    un(13.227075,13.5164175)
    un(14.571701,15.190295)
    un(16.78399,17.562222)
    
     
  • Nickolay V. Shmyrev

    You need to add it to the frontend pipeline, not just to the list of
    components.

     
  • Anonymous

    Anonymous - 2011-03-04

    Do I understand correctly that to add BatchAGC into the frontend pipeline, I
    make the following changes to the configuration?

        <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
            <propertylist name="pipeline">
                <item>audioFileDataSource </item>
                <item>dataBlocker </item>
                <item>speechClassifier </item>
                <item>speechMarker </item>
                <item>nonSpeechDataFilter </item>
                <item>preemphasizer </item>
                <item>windower </item>
                <item>fft </item>
                <item>melFilterBank </item>
                <item>dct </item>
                <item>liveCMN </item>
                <item>featureExtraction </item>
                <item>BatchAGC </item>
            </propertylist>
        </component>
    

    and

    <component name="BatchAGC" 
            type="edu.cmu.sphinx.frontend.feature.BatchAGC"/>
    

    I also added the following line to Aligner.java:

    import edu.cmu.sphinx.frontend.feature.BatchAGC;
    

    I also scoured these forums for "BatchAGC" but did not see anything more
    detailed than what I have written above.

     
  • marekl

    marekl - 2011-03-04

    I have searched the forums and found
    "you just need to add BatchAGC into the frontend pipeline after the BatchCMN"
    (https://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/3360385?message=7556457)

    So according to this search result, the BatchCMN component is missing from
    your pipeline and BatchAGC is misplaced.

     
  • Anonymous

    Anonymous - 2011-03-04

    I saw that as well. I didn't assume that I needed BatchCMN, but I've added it
    as you have suggested and the result remains exactly the same. To make sure I
    understand correctly, I will post my files again. I very much appreciate your
    attention.

    ALIGNER.JAVA

    /*
     * Copyright 1999-2004 Carnegie Mellon University.
     * Portions Copyright 2004 Sun Microsystems, Inc.
     * Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
     * All Rights Reserved.  Use is subject to license terms.
     *
     * See the file "license.terms" for information on usage and
     * redistribution of this file, and for a DISCLAIMER OF ALL
     * WARRANTIES.
     *
     */
    
    package edu.cmu.sphinx.demo.aligner;
    
    import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
    import edu.cmu.sphinx.recognizer.Recognizer;
    import edu.cmu.sphinx.result.Result;
    import edu.cmu.sphinx.util.props.ConfigurationManager;
    import edu.cmu.sphinx.linguist.language.grammar.TextAlignerGrammar;
    import edu.cmu.sphinx.frontend.feature.BatchCMN;
    import edu.cmu.sphinx.frontend.feature.BatchAGC;
    
    import javax.sound.sampled.UnsupportedAudioFileException;
    import java.io.IOException;
    import java.net.URL;
    
    /**
     * A simple example that shows how to align speech to existing transcription to
     * get times.
     */
    public class Aligner {
    
        public static void main(String[] args) throws IOException, UnsupportedAudioFileException {
    
            ConfigurationManager cm = new ConfigurationManager("src/sphinx4/edu/cmu/sphinx/config/aligner.xml");
            Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
    
            TextAlignerGrammar grammar = (TextAlignerGrammar) cm.lookup("textAlignGrammar");
            grammar.setText("zero un deux trois quatre cinq six sept huit neuf");
            recognizer.addResultListener(grammar);
    
            /* allocate the resource necessary for the recognizer */
            recognizer.allocate();
    
            // configure the audio input for the recognizer
            AudioFileDataSource dataSource = (AudioFileDataSource) cm.lookup("audioFileDataSource");
            dataSource.setAudioFile(new URL("file:src/apps/edu/cmu/sphinx/demo/0-9.wav"), null);
    
            Result result;
            while ((result = recognizer.recognize()) != null) {
    
                String resultText = result.getTimedBestResult(false, true);
                System.out.println(resultText);
            }
        }
    }
    

    ALIGNER.XML

    <?xml version="1.0" encoding="UTF-8"?>
    
    <config>
    
        <property name="logLevel" value="WARNING"/>
    
        <property name="absoluteBeamWidth"  value="-1"/>
        <property name="relativeBeamWidth"  value="1E-80"/>
        <property name="wordInsertionProbability" value="1E-36"/>
        <property name="languageWeight"     value="8"/>
    
        <property name="frontend" value="epFrontEnd"/>
        <property name="recognizer" value="recognizer"/>
        <property name="showCreations" value="false"/>
    
        <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
            <property name="decoder" value="decoder"/>
       </component>
    
        <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
            <property name="searchManager" value="searchManager"/>
        </component>
    
        <component name="searchManager" 
            type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
            <property name="logMath" value="logMath"/>
            <property name="linguist" value="flatLinguist"/>
            <property name="pruner" value="trivialPruner"/>
            <property name="scorer" value="threadedScorer"/>
            <property name="activeListFactory" value="activeList"/>
        </component>
    
    
        <component name="activeList" 
                 type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
            <property name="logMath" value="logMath"/>
            <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
            <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
        </component>
    
        <component name="trivialPruner" 
                    type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>
    
        <component name="threadedScorer" 
                    type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
            <property name="frontend" value="${frontend}"/>
        </component>
    
        <component name="flatLinguist"
                    type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
            <property name="logMath" value="logMath"/>
            <property name="grammar" value="textAlignGrammar"/>
            <property name="acousticModel" value="frenchAM"/>
            <property name="wordInsertionProbability"
                    value="${wordInsertionProbability}"/>
            <property name="languageWeight" value="${languageWeight}"/>
            <property name="unitManager" value="unitManager"/>
        </component>
    
        <component name="textAlignGrammar" type="edu.cmu.sphinx.linguist.language.grammar.TextAlignerGrammar">
            <property name="dictionary" value="dictionary"/>
        <property name="logMath" value="logMath"/>
        </component>
    
    
        <component name="trigramModel" type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel"> 
        <property name="unigramWeight" value="0.7"/> 
        <property name="maxDepth" value="3"/> 
        <property name="logMath" value="logMath"/> 
        <property name="dictionary" value="dictionary"/> 
        <property name="location" value="models/acoustic/french_f0/etc/french3g62K.DMP"/> 
        </component>
    
    
    
        <component name="dictionary" 
            type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
            <property name="dictionaryPath"
                      value="models/acoustic/french_f0/etc/frenchWords62K.dic"/>
            <property name="fillerPath" 
                  value="models/acoustic/french_f0/etc/frenchFillers.dic"/>
            <property name="addSilEndingPronunciation" value="true"/>
            <property name="wordReplacement" value="&lt;sil&gt;"/>
            <property name="unitManager" value="unitManager"/>
        </component>
    
      <component name="frenchAMLoader"
             type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
            <property name="location" value="models/acoustic/french_f0/model_parameters/french_f0.cd_cont_5725_22/"/>
        <property name="modelDefinition" value="french_f0.5725.mdef"/>
        <property name="properties_file" value="am.props"/>
      </component>
    
      <component name="frenchAM" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
        <property name="loader" value="frenchAMLoader"/>
        <property name="unitManager" value="unitManager"/>
      </component>
    
    
        <component name="wsj"
                   type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
            <property name="loader" value="wsjLoader"/>
            <property name="unitManager" value="unitManager"/>
        </component>
    
        <component name="wsjLoader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
            <property name="logMath" value="logMath"/>
            <property name="unitManager" value="unitManager"/>
            <property name="location" value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz"/>
            <property name="modelDefinition" value="etc/WSJ_clean_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef"/>
            <property name="dataLocation" value="cd_continuous_8gau/"/>
        </component>
    
        <component name="unitManager" 
            type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>
    
        <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
            <propertylist name="pipeline">
                <item>audioFileDataSource </item>
                <item>dataBlocker </item>
                <item>speechClassifier </item>
                <item>speechMarker </item>
                <item>nonSpeechDataFilter </item>
                <item>preemphasizer </item>
                <item>windower </item>
                <item>fft </item>
                <item>melFilterBank </item>
                <item>dct </item>
                <item>liveCMN </item>
                <item>featureExtraction </item>
                <item>BatchCMN </item>
                <item>BatchAGC </item>
            </propertylist>
        </component>
    
        <component name="audioFileDataSource" type="edu.cmu.sphinx.frontend.util.AudioFileDataSource"/>
    
        <component name="dataBlocker" type="edu.cmu.sphinx.frontend.DataBlocker"/>
    
        <component name="speechClassifier" type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier"/>
    
        <component name="nonSpeechDataFilter" 
                   type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>
    
        <component name="speechMarker" type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker" />
    
        <component name="preemphasizer"
                   type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>
    
        <component name="windower" 
                   type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
        </component>
    
        <component name="fft" 
                type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
        </component>
    
        <component name="melFilterBank" 
            type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
        </component>
    
        <component name="dct" 
                type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>
    
        <component name="liveCMN" 
                   type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>
    
        <component name="featureExtraction" 
                   type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>
    
        <component name="BatchCMN" 
            type="edu.cmu.sphinx.frontend.feature.BatchCMN"/>
    
        <component name="BatchAGC" 
            type="edu.cmu.sphinx.frontend.feature.BatchAGC"/>
    
        <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
            <property name="logBase" value="1.0001"/>
            <property name="useAddTable" value="true"/>
        </component>
    
    
    </config>
    

    RESULT (counting from 0-9 in French):

    zero(0.92163265,1.7796825)
    un(2.5353289,4.5507483)
    un(5.87161,6.2906575)
    un(7.606213,8.174921)
    un(9.828481,10.038005)
    un(11.493016,11.782358)
    un(13.227075,13.5164175)
    un(14.571701,15.190295)
    un(16.78399,17.562222)
    
     
  • Anonymous

    Anonymous - 2011-03-04

    I just learned that the placement of BatchAGC is important. With the frontend
    configured in the following way, I get mostly correct results, a lot of junk,
    and one missing word (quatre). Using BatchCMN makes things worse, so I removed
    it.

        <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
            <propertylist name="pipeline">
                <item>audioFileDataSource </item>
                <item>dataBlocker </item>
                <item>speechClassifier </item>
                <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
                <item>preemphasizer </item>
                <item>windower </item>
                <item>fft </item>
                <item>melFilterBank </item>
                <item>dct </item>
            <item>BatchAGC </item>
                <item>featureExtraction </item>  
            </propertylist>
        </component>
    

    A sample from the end of the results. Now I need to figure out whether I can
    filter the results like LiveCMN did, and why quatre is not recognized. If you
    could throw me a bone in that respect, I would appreciate it.

    -15.664374929302463
    -15.022731697289538
    -14.516154451947292
    -14.720797109791558
    -13.907056959837554
    -12.94244373489699
    -13.429857169098572
    -13.889048605326327
    -13.708396484638216
    -13.037687166300769
    -12.533037083927763
    -8.747878894606437
    -10.340122049435841
    -18.641665217680323
    -20.486957055362794
    -18.255194840401604
    neuf(15.995782,17.562222)
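
    For post-processing, one way to keep only the aligned words and skip the
    bare numeric lines is to match the word(start,end) shape seen in the output
    above. This sketch only filters the text after the fact; it does not
    address where the numeric lines come from. TimedWords is a hypothetical
    helper, not a Sphinx-4 class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Sphinx-4.
class TimedWords {

    // One aligned word with its start and end time in seconds.
    static final class TimedWord {
        final String word;
        final double start;
        final double end;

        TimedWord(String word, double start, double end) {
            this.word = word;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return word + " [" + start + " .. " + end + "]";
        }
    }

    // Matches tokens shaped like "neuf(15.995782,17.562222)".
    private static final Pattern TOKEN =
            Pattern.compile("(\\S+?)\\((\\d+(?:\\.\\d+)?),(\\d+(?:\\.\\d+)?)\\)");

    // Extracts every word(start,end) token; anything else is ignored.
    static List<TimedWord> parse(String output) {
        List<TimedWord> words = new ArrayList<>();
        Matcher m = TOKEN.matcher(output);
        while (m.find()) {
            words.add(new TimedWord(
                    m.group(1),
                    Double.parseDouble(m.group(2)),
                    Double.parseDouble(m.group(3))));
        }
        return words;
    }

    public static void main(String[] args) {
        String output = "-18.641665217680323\n"
                + "-20.486957055362794\n"
                + "neuf(15.995782,17.562222)\n";
        for (TimedWord w : parse(output)) {
            System.out.println(w);
        }
    }
}
```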
    
     
  • Anonymous

    Anonymous - 2011-03-04

    Unfortunately for my longer example, which is a story rather than a simple
    count from 0-9, the recognition is horribly poor, despite being so accurate
    for the count example. It seems that there may be some deeply embedded tuning
    here for the English language which doesn't work for the French language
    model.

    I believe there is possibly some tuning that could be done for French, and I
    would be interested in learning from those that have already blazed the trail.
    Please let me know if you can provide some insight.

     
  • Nickolay V. Shmyrev

    Hello

    Please be more accurate and try to understand how things work. Your frontend
    pipeline is wrong. The proper pipeline is cited in the forum thread we
    referenced; you just need to read it carefully. The proper pipeline is:

            <propertylist name="pipeline">
                <item>audioFileDataSource </item>
                <item>dataBlocker </item>
                <item>speechClassifier </item>
                <item>speechMarker </item>
                <item>nonSpeechDataFilter </item>
                <item>preemphasizer </item>
                <item>windower </item>
                <item>fft </item>
                <item>melFilterBank </item>
                <item>dct </item>
                <item>BatchCMN </item>
                <item>BatchAGC </item>
                <item>featureExtraction </item>
            </propertylist>
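
    The ordering constraint in this pipeline (BatchCMN, then BatchAGC, both
    after dct and before featureExtraction) can be checked mechanically. A
    small sketch, assuming the item names have already been read from the
    configuration and trimmed; PipelineOrderCheck is a hypothetical helper and
    not part of Sphinx-4.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Sphinx-4.
class PipelineOrderCheck {

    // Returns true if every expected name occurs in the pipeline
    // and the names appear in the given relative order.
    static boolean inOrder(List<String> pipeline, String... expected) {
        int last = -1;
        for (String name : expected) {
            int index = pipeline.indexOf(name);
            if (index <= last) {
                return false; // missing (-1) or placed too early
            }
            last = index;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> pipeline = Arrays.asList(
                "audioFileDataSource", "dataBlocker", "speechClassifier",
                "speechMarker", "nonSpeechDataFilter", "preemphasizer",
                "windower", "fft", "melFilterBank", "dct",
                "BatchCMN", "BatchAGC", "featureExtraction");
        System.out.println(
                inOrder(pipeline, "dct", "BatchCMN", "BatchAGC", "featureExtraction"));
    }
}
```

    A pipeline that lists BatchCMN and BatchAGC after featureExtraction fails
    this check, because both names are found later than featureExtraction.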
    
     
  • Anonymous

    Anonymous - 2011-03-04

    It's clear that you also did not read post number 10, where I indicated that I
    did have BatchCMN in the pipeline before BatchAGC and it made things worse. Go
    back up in the thread and see for yourself. Just in case I was wrong, I
    double-checked using your recommended pipeline. It's worse. The BatchAGC-alone
    pipeline is so far the best result I have seen (as I already wrote). Reread
    10.

    Simply put: what you have suggested is not the solution to this problem,
    although the BatchAGC did improve matters. Don't blame me for a lack of
    information or for lack of attention, I read everything carefully, which you
    would know if you had read carefully yourself.

    I have given you all of the information you need to reproduce the situation on
    your end with a simple copy and paste. It would be one thing if you had this
    set up and it was working for you but I understand that this is some educated
    guesswork. I've given you the complete contents of all of my files and the
    result. The one thing that is missing is the audio file.

    The audio is here:
    about.com

    It works reasonably well with this pipeline (BatchAGC only):

            <propertylist name="pipeline">
                <item>audioFileDataSource </item>
                <item>dataBlocker </item>
                <item>speechClassifier </item>
                <item>speechMarker </item>
                <item>nonSpeechDataFilter </item>
                <item>preemphasizer </item>
                <item>windower </item>
                <item>fft </item>
                <item>melFilterBank </item>
                <item>dct </item>
                <item>BatchAGC </item>
                <item>featureExtraction </item>   
            </propertylist>
    

    If you have the inclination, you could use the file and code I provided to
    reproduce the problem. I just modified the included Aligner.java, aligner.xml
    files and added french_f0 into the mix.

    My guess is that this is not enough to provide for accurate French recognition
    and that some further tuning is needed. It seems that you believe otherwise,
    but maybe you or an experienced user are willing to reproduce the problem
    using the information I have provided and show me that it is just a simple
    tweak as you have indicated.

     
  • Nickolay V. Shmyrev

    Hello

    I tried your audio and indeed the results are not so good. There is some
    issue with the long silences between digits; if you remove them, everything
    is much better. The result with the silences cut is:

    zero(0.19,1.08) un(1.08,1.23) deux(1.23,2.22) trois(2.22,2.86)
    quatre(2.86,3.51) cinq(3.51,4.24) six(4.24,4.83) sept(4.83,5.28)
    huit(5.28,6.18) neuf(6.18,6.85)

    It seems the aligner algorithm needs some work to deal with this particular
    case.

    But that doesn't change the proper frontend configuration listed above, since
    the configuration is based on prior knowledge, not on experiments. If the
    experiments were based on a larger amount of data, they could show which one
    performs better.
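
    The post does not say how the silences were cut. One possible way, sketched
    under the assumption that the audio is already available as 16-bit PCM
    samples, is to cap each run of low-energy frames at a fixed length.
    SilenceTrimmer, the frame size, and the RMS threshold are illustrative
    choices, not Sphinx-4 API and not tuned values.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Sphinx-4.
class SilenceTrimmer {

    // Keeps at most maxSilentFrames consecutive frames whose RMS energy is
    // below rmsThreshold and drops the rest of each silent run.
    static short[] shortenSilences(short[] samples, int frameSize,
                                   double rmsThreshold, int maxSilentFrames) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be positive");
        }
        short[] out = new short[samples.length];
        int outLength = 0;
        int silentRun = 0; // consecutive silent frames seen so far

        for (int start = 0; start < samples.length; start += frameSize) {
            int end = Math.min(start + frameSize, samples.length);

            // Root-mean-square energy of this frame.
            double sumSquares = 0;
            for (int i = start; i < end; i++) {
                sumSquares += (double) samples[i] * samples[i];
            }
            double rms = Math.sqrt(sumSquares / (end - start));

            silentRun = rms < rmsThreshold ? silentRun + 1 : 0;

            // Copy speech frames and the first few frames of each silence.
            if (silentRun <= maxSilentFrames) {
                System.arraycopy(samples, start, out, outLength, end - start);
                outLength += end - start;
            }
        }
        return Arrays.copyOf(out, outLength);
    }

    public static void main(String[] args) {
        // Toy signal: 4 loud samples, 12 silent samples, 4 loud samples.
        short[] samples = new short[20];
        Arrays.fill(samples, 0, 4, (short) 1000);
        Arrays.fill(samples, 16, 20, (short) 1000);

        short[] trimmed = shortenSilences(samples, 4, 100.0, 1);
        System.out.println(samples.length + " samples in, " + trimmed.length + " samples out");
    }
}
```

    Leaving a short stretch of silence between words, rather than removing it
    entirely, keeps word boundaries audible; how much to leave is a guess that
    would need testing on real recordings.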

     
  • marekl

    marekl - 2011-03-07

    According to previous discussions on this forum, the problem with proper
    silence classification might be caused by wrong positioning of the
    dataBlocker component in the pipeline. Placing this component after VAD
    (after nonSpeechDataFilter, to be more exact) or removing it (if applicable)
    may solve this problem (see
    https://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/3894779/index/page/2).

     
