CMU Sphinx / Forums / Help: Dynamic updating of Sphinx4 language model

Shredder Woods - 2011-02-22

I am quite new to the Sphinx4 system, so please bear with me.
I am trying to understand how one can change the language model dynamically
during the course of a dialogue. By language model I mean the model that
Sphinx generates from the grammar, dictionary, etc.
Please help me understand exactly how Sphinx takes the grammar, dictionary,
word list, etc. into account to create the model/structure/tree used for
processing dialogue.

Please point me to relevant references or descriptions so that I can move
further.

Nickolay V. Shmyrev - 2011-02-22

If you keep the vocabulary the same, you can change language model
probabilities even during decoding. The rest of the system has no dependency
on the language model except for what the getProb call returns for a word
sequence.

You can learn about the sphinx4 architecture from the whitepaper:

http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4Whitepaper.pdf

Shredder Woods - 2011-03-20

Thanks for your response nshmyrev.
I read the reference provided by you.
I understand that the components provided for decoding will be:

The decoder source code

The language dictionary

The filler dictionary

The language model

The test data

Now, for decoding, sphinx4 forms a trellis, which is nothing but a product of
the language HMM and time. This trellis is an acyclic graph (or a search
graph, as one might call it). What I am interested in is reducing the size of
this structure (the search graph), so as to increase the recognition
capability of the system. What I am doing now is passing the decoder an
updated language model (read: grammar file). I would like to verify whether
passing a smaller grammar file (a pruned grammar file that suffices for the
sample to be decoded) will help reduce the size of the search graph.
Please make suggestions, or respond if I am not making myself clear.
I am looking forward to this discussion, and I will be quicker to respond
now!

Shredder Woods - 2011-03-20

I was reading about the Sphinx 4 decoder model. Under the topic "GRAPH
CONSTRUCTION MODULE" I read:

The word graph can be converted to a language HMM either dynamically or
statically. In dynamic construction, word HMMs are constructed on demand -
when the search reaches the terminal state for a word, the HMMs for words that
can follow it are constructed if they have not already been instantiated.
During construction, appropriate context dependent sub-word units are used at
the word boundaries. In static construction, the entire language HMM is
constructed statically. HMMs are constructed for all words in the vocabulary.
Each word HMM is composed with several word-beginning and word-ending context
dependent phones, each corresponding to a possible crossword context. Each
word is connected to every other word by linking appropriate context dependent
crossword units.

What changes do I need to make in my config.xml file to construct the search
graph dynamically?

Nickolay V. Shmyrev - 2011-03-20

If we are specifically talking about finite grammars (grammar files), there
are two linguists: FlatLinguist and DynamicFlatLinguist. In config.xml you can
choose between them by configuring the component class of the "linguist"
component.

FlatLinguist is static: it constructs the graph statically, whereas
DynamicFlatLinguist constructs the graph dynamically. If the graphs aren't too
large, you can dump them using the edu.cmu.sphinx.linguist.util.GDLDumper
class. After that you can view them in aisee3. Or you can dump to a dot file
(it is easy to write this class yourself) and visualize it in graphviz.
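
A minimal sketch of that switch, as a config.xml fragment. The component names
(dflatLinguist, searchManager, jsgfGrammar, wsj, logMath, unitManager, and the
pruner, scorer and active list names) are illustrative assumptions and must
match whatever your own file defines; only the class names come from sphinx4.

```xml
<!-- The search manager refers to the linguist by component name, so switching
     linguists means pointing this property at a different component. -->
<component name="searchManager"
    type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
    <property name="logMath" value="logMath"/>
    <property name="linguist" value="dflatLinguist"/>
    <property name="pruner" value="trivialPruner"/>
    <property name="scorer" value="threadedScorer"/>
    <property name="activeListFactory" value="activeList"/>
</component>

<!-- Dynamic construction: the graph is expanded on demand during search.
     Component and grammar names here are assumed, not prescribed. -->
<component name="dflatLinguist"
    type="edu.cmu.sphinx.linguist.dflat.DynamicFlatLinguist">
    <property name="logMath" value="logMath"/>
    <property name="grammar" value="jsgfGrammar"/>
    <property name="acousticModel" value="wsj"/>
    <property name="wordInsertionProbability"
        value="${wordInsertionProbability}"/>
    <property name="languageWeight" value="${languageWeight}"/>
    <property name="unitManager" value="unitManager"/>
</component>
```

To go back to static construction, the same "linguist" property would point at
a component whose type is edu.cmu.sphinx.linguist.flat.FlatLinguist.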

Shredder Woods - 2011-03-20

If we are specifically talking about finite grammars (grammar files), there
are two linguists: FlatLinguist and DynamicFlatLinguist. In config.xml you can
choose between them by configuring the component class of the "linguist"
component.

Finite grammar... meaning the size of the grammar is finite? Yes, that is the
case for me. I have grammar files which contain either digits or words.

FlatLinguist is static, it constructs graph statically and
DynamicFlatLinguist constructs graph dynamically. If graphs aren't too large,
you can dump them using the edu.cmu.sphinx.linguist.util.GDLDumper class.
After that you can see them in aisee3. Or you can dump to dot file (easy to
write this class yourself) and visualize it in graphviz.

Can you suggest some source/documentation covering this? That would be very
helpful.
Thanks for your quick response.
Regards,
Shredder

Nickolay V. Shmyrev - 2011-03-20

Can you suggest some source/documentation covering this? That would be
very helpful.

You can find the source of the linguists and the dumper, along with the
accompanying javadoc, in the sphinx4 sources.

Shredder Woods - 2011-03-21

Thanks for your response nshmyrev.
I tried using the LinguistStats dumper to check the total number of states in
the search space, but I am getting an error (a problem with config.xml). Can
you help me resolve the error, please?

Property Exception component:'recognizer' property:'monitors' - Not all
elements have required type interface edu.cmu.sphinx.instrumentation.Monitor
Found one of type class edu.cmu.sphinx.linguist.util.LinguistStats
edu.cmu.sphinx.util.props.InternalConfigurationException
Property Exception component:'recognizer' property:'monitors' - Not all
elements have required type interface edu.cmu.sphinx.instrumentation.Monitor
Found one of type class edu.cmu.sphinx.linguist.util.LinguistStats
edu.cmu.sphinx.util.props.InternalConfigurationException

Shall I attach (or post the contents of) my config.xml file here?

source.config.xml:

<?xml version="1.0" encoding="UTF-8"?>

<!--
   Sphinx-4 Configuration file
-->

<!-- ******************************************************** -->
<!--  an4 configuration file                             -->
<!-- ******************************************************** -->

<config>

    <!-- ******************************************************** -->
    <!-- frequently tuned properties                              -->
    <!-- ******************************************************** -->

    <property name="logLevel" value="WARNING"/>

    <property name="absoluteBeamWidth"  value="-1"/>
    <property name="relativeBeamWidth"  value="1E-80"/>
    <property name="wordInsertionProbability" value="1E-36"/>
    <property name="languageWeight"     value="8"/>

    <property name="frontend" value="epFrontEnd"/>
    <property name="recognizer" value="recognizer"/>
    <property name="showCreations" value="false"/>


    <!-- ******************************************************** -->
    <!-- word recognizer configuration                            -->
    <!-- ******************************************************** -->

    <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="decoder"/>
        <propertylist name="monitors">
            <item>accuracyTracker </item>
            <item>speedTracker </item>
            <item>memoryTracker </item>
            <item>recognizerMonitor </item> 
            <item>linguistStats </item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The Decoder   configuration                              -->
    <!-- ******************************************************** -->

    <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="searchManager"/>
    </component>

    <component name="searchManager" 
        type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="flatLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListFactory" value="activeList"/>
    </component>

    <!--component name="activeList" 
             type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">-->


     <component name="activeList" 
             type="edu.cmu.sphinx.decoder.search.SortingActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>

    <!--<component name="activeList" 
             type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>-->

    <component name="trivialPruner" 
                type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

    <component name="threadedScorer" 
                type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
        <property name="frontend" value="${frontend}"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The linguist  configuration                              -->
    <!-- ******************************************************** -->

    <component name="flatLinguist"
                type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
        <property name="logMath" value="logMath"/>
        <property name="grammar" value="jsgfGrammar"/>
        <property name="acousticModel" value="wsj"/>
        <property name="wordInsertionProbability"
                value="${wordInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
        <property name="unitManager" value="unitManager"/>
    </component>


    <!-- ******************************************************** -->
    <!-- The Grammar  configuration                               -->
    <!-- ******************************************************** -->

    <component name="jsgfGrammar" type="edu.cmu.sphinx.jsgf.JSGFGrammar">
        <property name="dictionary" value="dictionary"/>
        <property name="grammarLocation" 
             value="resource:/edu/cmu/sphinx/demo/application/"/>
        <property name="grammarName" value="digits"/>
        <property name="logMath" value="logMath"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The Dictionary configuration                            -->
    <!-- ******************************************************** -->
    <component name="dictionary" 
        type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
        <property name="dictionaryPath"
                  value="resource:/an4_80_0_8gau_13dCep_8k_40mel_130Hz_3500Hz/dict/an4_80_0.dic"/>
        <property name="fillerPath" 
              value="resource:/an4_80_0_8gau_13dCep_8k_40mel_130Hz_3500Hz/dict/an4_80_0.filler"/>
        <property name="addSilEndingPronunciation" value="false"/>
        <property name="wordReplacement" value="&lt;sil&gt;"/>
        <property name="unitManager" value="unitManager"/>
    </component>


    <!-- ******************************************************** -->
    <!-- The acoustic model configuration                         -->
    <!-- ******************************************************** -->
    <component name="wsj"
               type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
        <property name="loader" value="wsjLoader"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="wsjLoader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
        <property name="location" value="resource:/an4_80_0_8gau_13dCep_8k_40mel_130Hz_3500Hz"/>
        <property name="modelDefinition" value="etc/an4_80_0.1000.mdef"/>
        <property name="dataLocation" value="cd_continuous_8gau/"/>
    </component>


    <!-- ******************************************************** -->
    <!-- The unit manager configuration                           -->
    <!-- ******************************************************** -->

    <component name="unitManager" 
        type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

    <!-- ******************************************************** -->
    <!-- The live frontend configuration                          -->
    <!-- ******************************************************** -->
    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>streamCepstrumSource </item>
            <!--<item>dataBlocker </item>
            <item>speechClassifier </item>
            <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
            <item>preemphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>-->
            <item>liveCMN </item>
            <item>featureExtraction </item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The frontend pipelines                                   -->
    <!-- ******************************************************** -->

    <component name="streamCepstrumSource" type="edu.cmu.sphinx.frontend.util.StreamCepstrumSource"/>

    <!--<component name="dataBlocker" type="edu.cmu.sphinx.frontend.DataBlocker"/>

    <component name="speechClassifier" type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier"/>

    <component name="nonSpeechDataFilter" 
               type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

    <component name="speechMarker" type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker" />

    <component name="preemphasizer"
               type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

    <component name="windower" 
               type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
    </component>

    <component name="fft" 
            type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
    </component>

    <component name="melFilterBank" 
        type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
    </component>

    <component name="dct" 
            type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>-->

    <component name="liveCMN" 
               type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

    <component name="featureExtraction" 
               type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>


    <!-- ******************************************************* -->
    <!--  monitors                                               -->
    <!-- ******************************************************* -->

    <component name="accuracyTracker" 
                type="edu.cmu.sphinx.instrumentation.BestPathAccuracyTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showAlignedResults" value="false"/>
        <property name="showRawResults" value="false"/>
    </component>

    <component name="memoryTracker" 
                type="edu.cmu.sphinx.instrumentation.MemoryTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showSummary" value="false"/>
        <property name="showDetails" value="false"/>
    </component>

    <component name="speedTracker" 
                type="edu.cmu.sphinx.instrumentation.SpeedTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="frontend" value="${frontend}"/>
        <property name="showSummary" value="true"/>
        <property name="showDetails" value="false"/>
    </component>

    <component name="recognizerMonitor" 
                type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">
        <property name="recognizer" value="${recognizer}"/>
        <propertylist name="allocatedMonitors">
            <item>linguistStats </item>
        </propertylist>
    </component>

    <component name="linguistStats" 
                type="edu.cmu.sphinx.linguist.util.LinguistStats">
        <property name="linguist" value="flatLinguist"/>
    </component>

    <!-- ******************************************************* -->
    <!--  Miscellaneous components                               -->
    <!-- ******************************************************* -->

    <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
        <property name="logBase" value="1.0001"/>
        <property name="useAddTable" value="true"/>
    </component>

</config>

Shredder Woods - 2011-03-21

Figured out the problem!
Please ignore the above post.
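
The fix itself was not posted. One change consistent with the exception above,
offered as a sketch rather than as what was actually done: the exception says
every item in the recognizer's "monitors" list must implement
edu.cmu.sphinx.instrumentation.Monitor, and LinguistStats does not. Dropping
linguistStats from that list, and leaving it only in the allocatedMonitors list
of recognizerMonitor where the posted config already has it, should avoid the
type check. That allocatedMonitors accepts LinguistStats is an assumption here.

```xml
<component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
    <property name="decoder" value="decoder"/>
    <propertylist name="monitors">
        <item>accuracyTracker </item>
        <item>speedTracker </item>
        <item>memoryTracker </item>
        <item>recognizerMonitor </item>
        <!-- linguistStats removed from this list: it is not a Monitor -->
    </propertylist>
</component>
```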

Shredder Woods - 2011-03-22

Hello nshmyrev,
Does Sphinx allow us to use multiple configuration files?
By multiple config files, I mean that my application has different
fields to be recognized. Can I have a different configuration for each of them
by using a separate config file for each?
As far as I understand, the decoder takes the details of the config file (the
.gram file, dictionary, filler, etc.) into account at run time, so this
should not really be a problem!

Nickolay V. Shmyrev - 2011-03-22

You can use multiple configuration files managed by multiple configuration
managers, or you can use a single file with multiple recognizers configured
and switch between them. Both ways work without issue.
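
A rough sketch of the single-file variant, as a config.xml fragment. The names
digitsRecognizer, namesRecognizer and the "names" grammar are hypothetical. The
shared components (logMath, dictionary, wsj, unitManager, activeList,
trivialPruner, threadedScorer) are assumed to be defined once elsewhere in the
same file, and the sharing assumes only one recognizer is in use at a time.

```xml
<!-- Recognizer for a digits field -->
<component name="digitsRecognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
    <property name="decoder" value="digitsDecoder"/>
</component>
<component name="digitsDecoder" type="edu.cmu.sphinx.decoder.Decoder">
    <property name="searchManager" value="digitsSearchManager"/>
</component>
<component name="digitsSearchManager"
    type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
    <property name="logMath" value="logMath"/>
    <property name="linguist" value="digitsLinguist"/>
    <property name="pruner" value="trivialPruner"/>
    <property name="scorer" value="threadedScorer"/>
    <property name="activeListFactory" value="activeList"/>
</component>
<component name="digitsLinguist"
    type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
    <property name="logMath" value="logMath"/>
    <property name="grammar" value="digitsGrammar"/>
    <property name="acousticModel" value="wsj"/>
    <property name="unitManager" value="unitManager"/>
</component>
<component name="digitsGrammar" type="edu.cmu.sphinx.jsgf.JSGFGrammar">
    <property name="dictionary" value="dictionary"/>
    <property name="grammarLocation"
        value="resource:/edu/cmu/sphinx/demo/application/"/>
    <property name="grammarName" value="digits"/>
    <property name="logMath" value="logMath"/>
</component>

<!-- Recognizer for a names field: same chain, different grammar -->
<component name="namesRecognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
    <property name="decoder" value="namesDecoder"/>
</component>
<component name="namesDecoder" type="edu.cmu.sphinx.decoder.Decoder">
    <property name="searchManager" value="namesSearchManager"/>
</component>
<component name="namesSearchManager"
    type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
    <property name="logMath" value="logMath"/>
    <property name="linguist" value="namesLinguist"/>
    <property name="pruner" value="trivialPruner"/>
    <property name="scorer" value="threadedScorer"/>
    <property name="activeListFactory" value="activeList"/>
</component>
<component name="namesLinguist"
    type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
    <property name="logMath" value="logMath"/>
    <property name="grammar" value="namesGrammar"/>
    <property name="acousticModel" value="wsj"/>
    <property name="unitManager" value="unitManager"/>
</component>
<component name="namesGrammar" type="edu.cmu.sphinx.jsgf.JSGFGrammar">
    <property name="dictionary" value="dictionary"/>
    <property name="grammarLocation"
        value="resource:/edu/cmu/sphinx/demo/application/"/>
    <property name="grammarName" value="names"/>
    <property name="logMath" value="logMath"/>
</component>
```

The application would then look up each recognizer by its component name from
the configuration manager and use the one that matches the field currently
being filled in.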

Shredder Woods - 2011-03-23

I would like to implement some techniques that can help improve the
recognition accuracy for the Hindi language on the sphinx4 platform.
Is there something that I can do in Sphinx itself, like changing or updating
some parts of the Sphinx code, to improve the recognition capability for this
language?
I am looking forward to suggestions. I hope that the work can be useful for
the Sphinx community in general.

Nickolay V. Shmyrev - 2011-03-24

Hello

Specifically for Hindi, the main issue is to collect enough voice data.
The recognizer source code has no language specifics.

If you are looking for something to implement in Sphinx4, there is a feature
that is critically required by any practical system: returning a proper
confidence score in the grammar recognizer and rejecting OOV words reliably. If
you could implement this part, the recognizer would move to the next level.

Shredder Woods - 2011-03-25

Hello nshmyrev,
I would like to implement something that can help improve the
recognition accuracy for the Hindi language.
I have enough voice data collected for recognition of names and numbers. I am
still looking for ideas.

Can you point me to some applications built on Sphinx4, like a voice
reservation system (for trains, buses...), or something of that sort? Do you
have some samples like that?
Regards,
Shredder

Shredder Woods - 2011-03-25

If you are looking for something to implement in Sphinx4, there is a
feature that is critically required by any practical system: returning a
proper confidence score in the grammar recognizer and rejecting OOV words
reliably. If you could implement this part, the recognizer would move to the
next level.

Sorry if I sound ignorant, but what does sphinx do currently for OOV words?

Nickolay V. Shmyrev - 2011-03-26

Can you point me to some applications built on Sphinx4, like a voice
reservation system (for trains, buses...), or something of that sort? Do you
have some samples like that?

http://wiki.speech.cs.cmu.edu/olympus/index.php/Olympus

Sorry if I sound ignorant, but what does sphinx do currently for OOV words?

http://cmusphinx.sourceforge.net/wiki/sphinx4:rejectionhandling
