CMU Sphinx / Forums / Help: Aligner demo - french_f0

sphinx4-1.0beta5-src on Windows 7
compiled with Ant
jdk1.6.0_24

I modified Aligner.java and its config.xml to use the french_f0 dictionary, a
sample wav, and a sample sentence. The result is poor. Is this the best I can
expect or should I be doing this differently?

CONFIG:

<?xml version="1.0" encoding="UTF-8"?>

<config>

    <property name="logLevel" value="WARNING"/>

    <property name="absoluteBeamWidth"  value="-1"/>
    <property name="relativeBeamWidth"  value="1E-80"/>
    <property name="wordInsertionProbability" value="1E-36"/>
    <property name="languageWeight"     value="8"/>

    <property name="frontend" value="epFrontEnd"/>
    <property name="recognizer" value="recognizer"/>
    <property name="showCreations" value="false"/>

    <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="decoder"/>
   </component>

    <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="searchManager"/>
    </component>

    <component name="searchManager" 
        type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="flatLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListFactory" value="activeList"/>
    </component>


    <component name="activeList" 
             type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>

    <component name="trivialPruner" 
                type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

    <component name="threadedScorer" 
                type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
        <property name="frontend" value="${frontend}"/>
    </component>

    <component name="flatLinguist"
                type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
        <property name="logMath" value="logMath"/>
        <property name="grammar" value="textAlignGrammar"/>
        <property name="acousticModel" value="frenchAM"/>
        <property name="wordInsertionProbability"
                value="${wordInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="textAlignGrammar" type="edu.cmu.sphinx.linguist.language.grammar.TextAlignerGrammar">
        <property name="dictionary" value="dictionary"/>
    <property name="logMath" value="logMath"/>
    </component>


    <component name="trigramModel" type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel"> 
    <property name="unigramWeight" value="0.7"/> 
    <property name="maxDepth" value="3"/> 
    <property name="logMath" value="logMath"/> 
    <property name="dictionary" value="dictionary"/> 
    <property name="location" value="models/acoustic/french_f0/etc/french3g62K.DMP"/> 
    </component>



    <component name="dictionary" 
        type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
        <property name="dictionaryPath"
                  value="models/acoustic/french_f0/etc/frenchWords62K.dic"/>
        <property name="fillerPath" 
              value="models/acoustic/french_f0/etc/frenchFillers.dic"/>
        <property name="addSilEndingPronunciation" value="true"/>
        <property name="wordReplacement" value="&lt;sil&gt;"/>
        <property name="unitManager" value="unitManager"/>
    </component>

  <component name="frenchAMLoader"
         type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
        <property name="location" value="models/acoustic/french_f0/model_parameters/french_f0.cd_cont_5725_22/"/>
    <property name="modelDefinition" value="french_f0.5725.mdef"/>
  </component>

  <component name="frenchAM" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
    <property name="loader" value="frenchAMLoader"/>
    <property name="unitManager" value="unitManager"/>
  </component>


    <component name="wsj"
               type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
        <property name="loader" value="wsjLoader"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="wsjLoader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
        <property name="location" value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz"/>
        <property name="modelDefinition" value="etc/WSJ_clean_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef"/>
        <property name="dataLocation" value="cd_continuous_8gau/"/>
    </component>

    <component name="unitManager" 
        type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>audioFileDataSource </item>
            <item>dataBlocker </item>
            <item>speechClassifier </item>
            <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
            <item>preemphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>liveCMN </item>
            <item>featureExtraction </item>
        </propertylist>
    </component>

    <component name="audioFileDataSource" type="edu.cmu.sphinx.frontend.util.AudioFileDataSource"/>

    <component name="dataBlocker" type="edu.cmu.sphinx.frontend.DataBlocker"/>

    <component name="speechClassifier" type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier"/>

    <component name="nonSpeechDataFilter" 
               type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

    <component name="speechMarker" type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker" />

    <component name="preemphasizer"
               type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

    <component name="windower" 
               type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
    </component>

    <component name="fft" 
            type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
    </component>

    <component name="melFilterBank" 
        type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
    </component>

    <component name="dct" 
            type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

    <component name="liveCMN" 
               type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

    <component name="featureExtraction" 
               type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

    <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
        <property name="logBase" value="1.0001"/>
        <property name="useAddTable" value="true"/>
    </component>

</config>

ALIGNER.JAVA

/*
 * Copyright 1999-2004 Carnegie Mellon University.
 * Portions Copyright 2004 Sun Microsystems, Inc.
 * Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
 * All Rights Reserved.  Use is subject to license terms.
 *
 * See the file "license.terms" for information on usage and
 * redistribution of this file, and for a DISCLAIMER OF ALL
 * WARRANTIES.
 *
 */

package edu.cmu.sphinx.demo.aligner;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import edu.cmu.sphinx.linguist.language.grammar.TextAlignerGrammar;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.IOException;
import java.net.URL;

/**
 * A simple example that shows how to align speech to existing transcription to
 * get times.
 */
public class Aligner {

    public static void main(String[] args) throws IOException, UnsupportedAudioFileException {

        ConfigurationManager cm = new ConfigurationManager("src/sphinx4/edu/cmu/sphinx/config/aligner.xml");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        TextAlignerGrammar grammar = (TextAlignerGrammar) cm.lookup("textAlignGrammar");
        grammar.setText("Dans le faubourg une rue assourdissante populeuse où du matin au soir les vitres tremblaient au fracas des camions et des omnibus tout le monde connaissait estimait et respectait la petite papetière");
        recognizer.addResultListener(grammar);

        /* allocate the resource necessary for the recognizer */
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(new URL("file:src/apps/edu/cmu/sphinx/demo/transcriber/10001-90210-01803.wav"), null);

        Result result;
        while ((result = recognizer.recognize()) != null) {

            String resultText = result.getTimedBestResult(false, true);
            System.out.println(resultText);
        }
    }
}

RESULT:

10:42:04.807 WARNING dictionary        Missing word: assourdissante
10:42:04.807 WARNING dictionary        Missing word: populeuse
10:42:04.807 WARNING dictionary        Missing word: tremblaient
connaissait(0.85,2.69)
la(4.4,4.91)
petite(7.6,8.15)

The three aligned words are completely wrong.
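
To see at a glance which transcript words the aligner dropped, the `word(start,end)` tokens in the output can be parsed and compared against the text passed to `grammar.setText`. The following is only a rough sketch: the class and method names are hypothetical, it is not part of sphinx4, and the greedy in-order matching can mislabel words when the same word occurs several times in the transcript.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of sphinx4.
final class AlignmentCheck {

    // Matches tokens such as "petite(7.6,8.15)" in the timed output.
    private static final Pattern TOKEN =
            Pattern.compile("(\\S+?)\\(([0-9.]+),([0-9.]+)\\)");

    /** Returns the aligned words in the order they appear in the output. */
    static List<String> alignedWords(String timedOutput) {
        List<String> words = new ArrayList<String>();
        Matcher m = TOKEN.matcher(timedOutput);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    /** Returns transcript words with no counterpart in the output (greedy, in order). */
    static List<String> missingWords(String transcript, String timedOutput) {
        List<String> aligned = alignedWords(timedOutput);
        List<String> missing = new ArrayList<String>();
        int next = 0; // index of the next aligned word still to be matched
        for (String word : transcript.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (next < aligned.size() && aligned.get(next).equals(word)) {
                next++;
            } else {
                missing.add(word);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Fragment of the transcript and the three tokens from the result above.
        String transcript = "connaissait estimait et respectait la petite";
        String output = "connaissait(0.85,2.69)\nla(4.4,4.91)\npetite(7.6,8.15)";
        System.out.println("Missing: " + missingWords(transcript, output));
    }
}
```

This only reports which words are absent; it says nothing about whether the reported times are right, which still has to be checked by listening.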

Nickolay V. Shmyrev - 2011-03-03

The French model uses AGC, so you need to include the BatchAGC component in
the frontend pipeline. You can search this forum for details.

It's also recommended to use sphinx4-1.0beta6, not beta5.
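
For readers unfamiliar with the term: as far as I understand it, batch AGC (automatic gain control) rescales the energy term of each frame, the first cepstral coefficient c0, so that the loudest frame of the utterance sits at zero. The sketch below illustrates that idea only. It is not the sphinx4 BatchAGC source, and the class and method names are made up for the example.

```java
// Illustrative sketch of max-based gain normalization on c0; not sphinx4 code.
final class AgcSketch {

    /**
     * Subtracts the largest c0 found in the utterance from the c0 of every
     * frame, in place. cepstra[t][0] is assumed to be the energy term of frame t.
     */
    static void normalizeEnergy(double[][] cepstra) {
        if (cepstra.length == 0) {
            return;
        }
        double maxC0 = cepstra[0][0];
        for (double[] frame : cepstra) {
            maxC0 = Math.max(maxC0, frame[0]);
        }
        for (double[] frame : cepstra) {
            frame[0] -= maxC0; // other coefficients are left untouched
        }
    }

    public static void main(String[] args) {
        double[][] cepstra = {{2.0, 1.0}, {5.0, 3.0}, {-1.0, 0.5}};
        normalizeEnergy(cepstra);
        for (double[] frame : cepstra) {
            System.out.println(frame[0] + " " + frame[1]);
        }
    }
}
```

Because the maximum is taken over the whole utterance, the step needs all frames before it can emit any, which is why it is a "batch" component.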


Anonymous - 2011-03-03

It's not any better but it might help if I showed you the results using the
correct audio file (I thought those results were weird).

Still, none of the words are correctly located. I also tried with an audio
file containing someone counting from 0-9 in French. It was bad except the
zero was perfect.

11:32:07.100 WARNING dictionary Missing word: assourdissante
11:32:07.100 WARNING dictionary Missing word: populeuse
11:32:07.100 WARNING dictionary Missing word: tremblaient
au(3.73,4.48)
soir(4.48,4.97)
les(4.97,5.16)
vitres(5.16,5.82)
au(5.98,6.12)
fracas(6.12,6.47)
des(6.47,6.85)
tout(8.8,8.98)
le(8.98,9.09)
monde(9.09,9.44)
connaissait(9.44,10.41)
estimait(10.41,11.35)
et(11.35,13.71)


Anonymous - 2011-03-03

Thank you for the reply. sphinx4-1.0beta6 with the following addition to the
config.xml:

<component name="BatchAGC" type="edu.cmu.sphinx.frontend.feature.BatchAGC"/>

The results on the counting test are poor:

zero(0.92163265,1.7796825)
un(2.5353289,4.5507483)
un(5.87161,6.2906575)
un(7.606213,8.174921)
un(9.828481,10.038005)
un(11.493016,11.782358)
un(13.227075,13.5164175)
un(14.571701,15.190295)
un(16.78399,17.562222)


Nickolay V. Shmyrev - 2011-03-03

You need to add it to the frontend pipeline, not just to the list of
components.


Do I understand correctly that to add BatchAGC into the frontend pipeline, I
make the following changes to the configuration?

    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>audioFileDataSource </item>
            <item>dataBlocker </item>
            <item>speechClassifier </item>
            <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
            <item>preemphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>liveCMN </item>
            <item>featureExtraction </item>
            <item>BatchAGC </item>
        </propertylist>
    </component>

and

<component name="BatchAGC" 
        type="edu.cmu.sphinx.frontend.feature.BatchAGC"/>

I also added the following line to Aligner.java:

import edu.cmu.sphinx.frontend.feature.BatchAGC;

I also scoured these forums for "BatchAGC" but did not see anything more
detailed than what I have written above.

marekl - 2011-03-04

I searched the forums and found:
"you just need to add BatchAGC into the frontend pipeline after the BatchCMN"
(https://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/3360385?message=7556457)
So according to this search result, the BatchCMN component is missing from
your pipeline and BatchAGC is misplaced.
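
As background on BatchCMN: my understanding is that batch cepstral mean normalization subtracts, for each cepstral dimension, the mean of that dimension over the whole utterance. A minimal sketch of that operation follows. It is an illustration with hypothetical names, not the sphinx4 implementation.

```java
// Illustrative sketch of per-utterance cepstral mean normalization; not sphinx4 code.
final class CmnSketch {

    /** Subtracts the per-dimension mean over all frames from every frame, in place. */
    static void subtractMean(double[][] cepstra) {
        if (cepstra.length == 0) {
            return;
        }
        int dims = cepstra[0].length;
        double[] mean = new double[dims];
        for (double[] frame : cepstra) {
            for (int d = 0; d < dims; d++) {
                mean[d] += frame[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            mean[d] /= cepstra.length;
        }
        for (double[] frame : cepstra) {
            for (int d = 0; d < dims; d++) {
                frame[d] -= mean[d];
            }
        }
    }

    public static void main(String[] args) {
        double[][] cepstra = {{1.0, 2.0}, {3.0, 6.0}};
        subtractMean(cepstra);
        System.out.println(cepstra[0][0] + " " + cepstra[0][1]);
        System.out.println(cepstra[1][0] + " " + cepstra[1][1]);
    }
}
```

Both this step and AGC operate on the static cepstra, so they would normally sit between `dct` and `featureExtraction`, where the deltas are computed.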


I saw that as well. I didn't assume that I needed BatchCMN, but I've added it
as you have suggested and the result remains exactly the same. To make sure I
understand correctly, I will post my files again. I very much appreciate your
attention.

ALIGNER.JAVA

/*
 * Copyright 1999-2004 Carnegie Mellon University.
 * Portions Copyright 2004 Sun Microsystems, Inc.
 * Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
 * All Rights Reserved.  Use is subject to license terms.
 *
 * See the file "license.terms" for information on usage and
 * redistribution of this file, and for a DISCLAIMER OF ALL
 * WARRANTIES.
 *
 */

package edu.cmu.sphinx.demo.aligner;

import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import edu.cmu.sphinx.linguist.language.grammar.TextAlignerGrammar;
import edu.cmu.sphinx.frontend.feature.BatchCMN;
import edu.cmu.sphinx.frontend.feature.BatchAGC;

import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.IOException;
import java.net.URL;

/**
 * A simple example that shows how to align speech to existing transcription to
 * get times.
 */
public class Aligner {

    public static void main(String[] args) throws IOException, UnsupportedAudioFileException {

        ConfigurationManager cm = new ConfigurationManager("src/sphinx4/edu/cmu/sphinx/config/aligner.xml");
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

        TextAlignerGrammar grammar = (TextAlignerGrammar) cm.lookup("textAlignGrammar");
        grammar.setText("zero un deux trois quatre cinq six sept huit neuf");
        recognizer.addResultListener(grammar);

        /* allocate the resource necessary for the recognizer */
        recognizer.allocate();

        // configure the audio input for the recognizer
        AudioFileDataSource dataSource = (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(new URL("file:src/apps/edu/cmu/sphinx/demo/0-9.wav"), null);

        Result result;
        while ((result = recognizer.recognize()) != null) {

            String resultText = result.getTimedBestResult(false, true);
            System.out.println(resultText);
        }
    }
}

ALIGNER.XML

<?xml version="1.0" encoding="UTF-8"?>

<config>

    <property name="logLevel" value="WARNING"/>

    <property name="absoluteBeamWidth"  value="-1"/>
    <property name="relativeBeamWidth"  value="1E-80"/>
    <property name="wordInsertionProbability" value="1E-36"/>
    <property name="languageWeight"     value="8"/>

    <property name="frontend" value="epFrontEnd"/>
    <property name="recognizer" value="recognizer"/>
    <property name="showCreations" value="false"/>

    <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="decoder"/>
   </component>

    <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="searchManager"/>
    </component>

    <component name="searchManager" 
        type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="flatLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListFactory" value="activeList"/>
    </component>


    <component name="activeList" 
             type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>

    <component name="trivialPruner" 
                type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

    <component name="threadedScorer" 
                type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
        <property name="frontend" value="${frontend}"/>
    </component>

    <component name="flatLinguist"
                type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
        <property name="logMath" value="logMath"/>
        <property name="grammar" value="textAlignGrammar"/>
        <property name="acousticModel" value="frenchAM"/>
        <property name="wordInsertionProbability"
                value="${wordInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="textAlignGrammar" type="edu.cmu.sphinx.linguist.language.grammar.TextAlignerGrammar">
        <property name="dictionary" value="dictionary"/>
    <property name="logMath" value="logMath"/>
    </component>


    <component name="trigramModel" type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel"> 
    <property name="unigramWeight" value="0.7"/> 
    <property name="maxDepth" value="3"/> 
    <property name="logMath" value="logMath"/> 
    <property name="dictionary" value="dictionary"/> 
    <property name="location" value="models/acoustic/french_f0/etc/french3g62K.DMP"/> 
    </component>



    <component name="dictionary" 
        type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
        <property name="dictionaryPath"
                  value="models/acoustic/french_f0/etc/frenchWords62K.dic"/>
        <property name="fillerPath" 
              value="models/acoustic/french_f0/etc/frenchFillers.dic"/>
        <property name="addSilEndingPronunciation" value="true"/>
        <property name="wordReplacement" value="&lt;sil&gt;"/>
        <property name="unitManager" value="unitManager"/>
    </component>

  <component name="frenchAMLoader"
         type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
        <property name="location" value="models/acoustic/french_f0/model_parameters/french_f0.cd_cont_5725_22/"/>
    <property name="modelDefinition" value="french_f0.5725.mdef"/>
    <property name="properties_file" value="am.props"/>
  </component>

  <component name="frenchAM" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
    <property name="loader" value="frenchAMLoader"/>
    <property name="unitManager" value="unitManager"/>
  </component>


    <component name="wsj"
               type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
        <property name="loader" value="wsjLoader"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="wsjLoader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
        <property name="location" value="resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz"/>
        <property name="modelDefinition" value="etc/WSJ_clean_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef"/>
        <property name="dataLocation" value="cd_continuous_8gau/"/>
    </component>

    <component name="unitManager" 
        type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>audioFileDataSource </item>
            <item>dataBlocker </item>
            <item>speechClassifier </item>
            <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
            <item>preemphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>liveCMN </item>
            <item>featureExtraction </item>
            <item>BatchCMN </item>
            <item>BatchAGC </item>
        </propertylist>
    </component>

    <component name="audioFileDataSource" type="edu.cmu.sphinx.frontend.util.AudioFileDataSource"/>

    <component name="dataBlocker" type="edu.cmu.sphinx.frontend.DataBlocker"/>

    <component name="speechClassifier" type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier"/>

    <component name="nonSpeechDataFilter" 
               type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

    <component name="speechMarker" type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker" />

    <component name="preemphasizer"
               type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

    <component name="windower" 
               type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
    </component>

    <component name="fft" 
            type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
    </component>

    <component name="melFilterBank" 
        type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
    </component>

    <component name="dct" 
            type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

    <component name="liveCMN" 
               type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

    <component name="featureExtraction" 
               type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

    <component name="BatchCMN" 
        type="edu.cmu.sphinx.frontend.feature.BatchCMN"/>

    <component name="BatchAGC" 
        type="edu.cmu.sphinx.frontend.feature.BatchAGC"/>

    <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
        <property name="logBase" value="1.0001"/>
        <property name="useAddTable" value="true"/>
    </component>


</config>

RESULT (counting from 0-9 in French):

zero(0.92163265,1.7796825)
un(2.5353289,4.5507483)
un(5.87161,6.2906575)
un(7.606213,8.174921)
un(9.828481,10.038005)
un(11.493016,11.782358)
un(13.227075,13.5164175)
un(14.571701,15.190295)
un(16.78399,17.562222)

I just learned that the placement of BatchAGC is important. With the frontend
configured in the following way, I get mostly correct results, a lot of junk,
and one missing word (quatre). Using BatchCMN makes things worse, so I removed
it.

    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>audioFileDataSource </item>
            <item>dataBlocker </item>
            <item>speechClassifier </item>
            <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
            <item>preemphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>BatchAGC </item>
            <item>featureExtraction </item>
        </propertylist>
    </component>

Below is a sample from the end of the results. Now I need to figure out
whether I can filter the results like LiveCMN did, and why quatre is not
recognized. If you could throw me a bone in that respect, I would appreciate it.

-15.664374929302463
-15.022731697289538
-14.516154451947292
-14.720797109791558
-13.907056959837554
-12.94244373489699
-13.429857169098572
-13.889048605326327
-13.708396484638216
-13.037687166300769
-12.533037083927763
-8.747878894606437
-10.340122049435841
-18.641665217680323
-20.486957055362794
-18.255194840401604
neuf(15.995782,17.562222)

Anonymous - 2011-03-04

Unfortunately for my longer example, which is a story rather than a simple
count from 0-9, the recognition is horribly poor, despite being so accurate
for the count example. It seems that there may be some deeply embedded tuning
here for the English language which doesn't work for the French language
model.

I believe there is possibly some tuning that could be done for French, and I
would be interested in learning from those that have already blazed the trail.
Please let me know if you can provide some insight.


Hello

Please be more accurate and try to understand how things work. Your frontend
pipeline is wrong. The proper pipeline is cited in the forum thread we
referenced; you just need to read it carefully. The proper pipeline is:

        <propertylist name="pipeline">
            <item>audioFileDataSource </item>
            <item>dataBlocker </item>
            <item>speechClassifier </item>
            <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
            <item>preemphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>BatchCMN </item>
            <item>BatchAGC </item>
            <item>featureExtraction </item>
        </propertylist>
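
The ordering in the pipeline above (BatchCMN, then BatchAGC, both after `dct` and before `featureExtraction`) is easy to get wrong when editing the XML by hand. A small check like the following can catch that before a run. It is a hypothetical helper working on a plain list of item names, not a sphinx4 API, and it assumes the names have already been trimmed of the trailing spaces used in the config.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of sphinx4.
final class PipelineOrderCheck {

    /** Returns true if every required name occurs in the pipeline, in the given order. */
    static boolean appearInOrder(List<String> pipeline, String... required) {
        int from = 0; // position after the previously matched item
        for (String name : required) {
            int idx = pipeline.subList(from, pipeline.size()).indexOf(name);
            if (idx < 0) {
                return false;
            }
            from += idx + 1;
        }
        return true;
    }

    public static void main(String[] args) {
        // Tail of the pipeline posted earlier in the thread (normalizers after the deltas).
        List<String> earlier = Arrays.asList(
                "dct", "liveCMN", "featureExtraction", "BatchCMN", "BatchAGC");
        // Tail of the pipeline suggested above.
        List<String> suggested = Arrays.asList(
                "dct", "BatchCMN", "BatchAGC", "featureExtraction");
        String[] order = {"dct", "BatchCMN", "BatchAGC", "featureExtraction"};
        System.out.println("earlier ok:   " + appearInOrder(earlier, order));
        System.out.println("suggested ok: " + appearInOrder(suggested, order));
    }
}
```

The check only verifies relative order; it does not say whether a given combination of components actually gives better alignment on a particular recording.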

Anonymous - 2011-03-04

It's clear that you also did not read post number 10, where I indicated that I
did have BatchCMN in the pipeline before BatchAGC and it made things worse. Go
back up in the thread and see for yourself. Just in case I was wrong, I
double-checked using your recommended pipeline. It's worse. The BatchAGC-only
pipeline is so far the best result I have seen (as I already wrote). Reread
post 10.

Simply put: what you have suggested is not the solution to this problem,
although the BatchAGC did improve matters. Don't blame me for a lack of
information or for lack of attention, I read everything carefully, which you
would know if you had read carefully yourself.

I have given you all of the information you need to reproduce the situation on
your end with a simple copy and paste. It would be one thing if you had this
set up and it was working for you but I understand that this is some educated
guesswork. I've given you the complete contents of all of my files and the
result. The one thing that is missing is the audio file.

The audio is here:
about.com

It works reasonably well with this pipeline (BatchAGC only):

        <propertylist name="pipeline">
            <item>audioFileDataSource </item>
            <item>dataBlocker </item>
            <item>speechClassifier </item>
            <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
            <item>preemphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>BatchAGC </item>
            <item>featureExtraction </item>
        </propertylist>

If you have the inclination, you could use the file and code I provided to
reproduce the problem. I just modified the included Aligner.java, aligner.xml
files and added french_f0 into the mix.

My guess is that this is not enough to provide for accurate French recognition
and that some further tuning is needed. It seems that you believe otherwise,
but maybe you or an experienced user are willing to reproduce the problem
using the information I have provided and show me that it is just a simple
tweak as you have indicated.

Nickolay V. Shmyrev - 2011-03-06

Hello

I tried your audio and indeed the results are not very good. There is some
issue with the long silences between digits; if you remove them, everything is
much better. The result with the silences cut is:

zero(0.19,1.08) un(1.08,1.23) deux(1.23,2.22) trois(2.22,2.86)
quatre(2.86,3.51) cinq(3.51,4.24) six(4.24,4.83) sept(4.83,5.28)
huit(5.28,6.18) neuf(6.18,6.85)

It seems the aligner algorithm needs some work to deal with this particular
case.

But that doesn't change the proper frontend configuration listed above, since
that configuration is based on prior knowledge, not on experiments. Experiments
on a larger amount of data could show which one performs better.
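
For anyone who wants to try the same workaround of cutting the long silences, here is one possible way to do it on raw 16-bit PCM samples. This is only a sketch under my own assumptions: the frame size, RMS threshold, and number of silent frames to keep are hypothetical parameters that would need tuning per recording, and it is not necessarily how the audio for the result above was edited.

```java
import java.util.Arrays;

// Hypothetical helper, not part of sphinx4.
final class SilenceTrimmer {

    /**
     * Copies the samples frame by frame, dropping any low-energy frame once
     * more than maxSilentFrames consecutive low-energy frames have been seen.
     */
    static short[] trimLongSilences(short[] samples, int frameSize,
                                    double rmsThreshold, int maxSilentFrames) {
        short[] out = new short[samples.length];
        int outLen = 0;
        int silentRun = 0; // consecutive frames below the threshold
        for (int start = 0; start < samples.length; start += frameSize) {
            int end = Math.min(start + frameSize, samples.length);
            double sumSquares = 0.0;
            for (int i = start; i < end; i++) {
                sumSquares += (double) samples[i] * samples[i];
            }
            double rms = Math.sqrt(sumSquares / (end - start));
            silentRun = (rms < rmsThreshold) ? silentRun + 1 : 0;
            if (silentRun <= maxSilentFrames) {
                System.arraycopy(samples, start, out, outLen, end - start);
                outLen += end - start;
            }
        }
        return Arrays.copyOf(out, outLen);
    }

    public static void main(String[] args) {
        short[] samples = new short[20];
        Arrays.fill(samples, 0, 4, (short) 1000);   // speech-like burst
        Arrays.fill(samples, 16, 20, (short) 1000); // second burst after a long gap
        short[] trimmed = trimLongSilences(samples, 4, 100.0, 1);
        System.out.println(samples.length + " -> " + trimmed.length + " samples");
    }
}
```

Note that times reported by the aligner then refer to the trimmed audio, so they would have to be mapped back if positions in the original file are needed.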


marekl - 2011-03-07

According to previous discussions on this forum, the problem with proper
silence classification might be caused by the wrong position of the
dataBlocker component in the pipeline. Placing this component after the VAD
(after nonSpeechDataFilter, to be more exact) or removing it (if applicable)
may solve this problem (see
https://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/3894779/index/page/2).

