Dynamic updating of Sphinx4 language model

2011-02-22
2012-09-22
  • Shredder Woods

    Shredder Woods - 2011-02-22

    I am quite new to Sphinx4, so please bear with me.
    I am trying to understand how one can change the language model dynamically
    during the course of a dialogue. By language model I mean the model that
    Sphinx generates from the grammar, dictionary, etc.
    Please help me understand how exactly Sphinx takes the grammar, dictionary,
    word list, etc. into account to create the model/structure/tree used for
    processing dialogue.

    Please point me to relevant references or descriptions so that I can move
    further.

     
  • Nickolay V. Shmyrev

    If you keep the vocabulary the same, you can change the language model
    probabilities even during decoding. The rest of the system has no dependency
    on the language model except for what the getProb call returns for a word
    sequence.

    You can learn about the sphinx4 architecture from the whitepaper:

    http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4Whitepaper.pdf

     
  • Shredder Woods

    Shredder Woods - 2011-03-20

    Thanks for your response, nshmyrev.
    I read the reference you provided.
    I understand that the components provided for decoding will be:

    1. The decoder source code
    2. The language dictionary
    3. The filler dictionary
    4. The language model
    5. The test data

    Now, for decoding, sphinx4 forms a trellis, which is nothing but a product of
    the language HMM and time. This trellis is an acyclic graph (or a search
    graph, as one might call it). What I am interested in is reducing the size of
    this structure (the search graph) so as to increase the recognition
    capability of the system. What I am doing now is passing the decoder an
    updated language model (read: grammar file). I would like to verify whether
    passing a smaller grammar file (a pruned grammar file that is sufficient for
    the sample to be decoded) will help reduce the size of the search graph.
    Please make suggestions, or respond if I am not making myself clear.
    I am looking forward to this discussion, and I will be quicker to respond
    now!
     
  • Shredder Woods

    Shredder Woods - 2011-03-20

    I was reading about the Sphinx 4 decoder model. Under the topic "GRAPH
    CONSTRUCTION MODULE" I read:

    The word graph can be converted to a language HMM either dynamically or
    statically. In dynamic construction, word HMMs are constructed on demand -
    when the search reaches the terminal state for a word, the HMMs for words that
    can follow it are constructed if they have not already been instantiated.
    During construction, appropriate context dependent sub-word units are used at
    the word boundaries. In static construction, the entire language HMM is
    constructed statically. HMMs are constructed for all words in the vocabulary.
    Each word HMM is composed with several word-beginning and word-ending context
    dependent phones, each corresponding to a possible crossword context. Each
    word is connected to every other word by linking appropriate context dependent
    crossword units.

    What changes do I need to make in my config.xml file to get dynamic
    construction of the search graph?

     
  • Nickolay V. Shmyrev

    If we are specifically talking about finite grammars (grammar files), there
    are two linguists: FlatLinguist and DynamicFlatLinguist. In config.xml you can
    choose between them by configuring the class of the component named
    "linguist".

    FlatLinguist is static: it constructs the graph statically, while
    DynamicFlatLinguist constructs the graph dynamically. If the graphs aren't too
    large, you can dump them using the edu.cmu.sphinx.linguist.util.GDLDumper
    class. After that you can view them in aisee3. Or you can dump to a dot file
    (it is easy to write this class yourself) and visualize it in Graphviz.
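
    The choice between the two linguists could be sketched in config.xml roughly
    as follows. This is a minimal sketch, not a verified configuration: the
    component names (dflatLinguist, jsgfGrammar, wsj, unitManager, logMath and
    the search manager's helpers) are placeholders for whatever your own file
    defines, the ${...} references assume matching global properties, and the
    class and property names should be checked against the javadoc of your
    sphinx4 version.

    ```xml
    <!-- Sketch only: a dynamic linguist in place of the static FlatLinguist.
         All referenced component names are placeholders. -->
    <component name="dflatLinguist"
               type="edu.cmu.sphinx.linguist.dflat.DynamicFlatLinguist">
        <property name="logMath" value="logMath"/>
        <property name="grammar" value="jsgfGrammar"/>
        <property name="acousticModel" value="wsj"/>
        <property name="wordInsertionProbability"
                  value="${wordInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <!-- The search manager selects the linguist through its "linguist"
         property, so switching linguists is a one-line change here. -->
    <component name="searchManager"
               type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="dflatLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListFactory" value="activeList"/>
    </component>
    ```

    Keeping both linguist components defined and changing only the "linguist"
    property makes it easy to compare static and dynamic construction on the
    same grammar.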

     
  • Shredder Woods

    Shredder Woods - 2011-03-20

    If we are specifically talking about finite grammars (grammar files), there
    are two linguists: FlatLinguist and DynamicFlatLinguist. In config.xml you can
    choose between them by configuring the class of the component named
    "linguist".

    Finite grammar... meaning that the size of the grammar is finite? Yes, that is
    my case. My grammar files contain either digits or words.

    FlatLinguist is static, it constructs graph statically and
    DynamicFlatLinguist constructs graph dynamically. If graphs aren't too large,
    you can dump them using the edu.cmu.sphinx.linguist.util.GDLDumper class.
    After that you can see them in aisee3. Or you can dump to dot file (easy to
    write this class yourself) and visualize it in graphviz.

    Can you suggest some source/documentation covering this? That would be very
    helpful.
    Thanks for your quick response.
    Regards,
    Shredder

     
  • Nickolay V. Shmyrev

    Can you suggest some source/documentation covering this? That would be
    very helpful.

    You can find the source of the linguists and the dumper, along with the
    accompanying javadoc, in the sphinx4 sources.
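
    For orientation, wiring the dumper into a configuration might look like the
    sketch below. The names gdlDumper and linguist.gdl are hypothetical, and the
    property names ("linguist", "logMath", "filename") and the use of the
    recognizer monitor's "allocatedMonitors" list are assumptions based on how
    the linguist utilities are usually configured; verify them against the
    javadoc before relying on them.

    ```xml
    <!-- Sketch only: "gdlDumper" and the output file name are hypothetical. -->
    <component name="gdlDumper" type="edu.cmu.sphinx.linguist.util.GDLDumper">
        <property name="linguist" value="flatLinguist"/>
        <property name="logMath" value="logMath"/>
        <property name="filename" value="linguist.gdl"/>
    </component>

    <!-- Assumed hook: run the dumper once the recognizer has been allocated.
         The recognizerMonitor itself must be listed among the recognizer's
         monitors for this to take effect. -->
    <component name="recognizerMonitor"
               type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">
        <property name="recognizer" value="recognizer"/>
        <propertylist name="allocatedMonitors">
            <item>gdlDumper</item>
        </propertylist>
    </component>
    ```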

     
  • Shredder Woods

    Shredder Woods - 2011-03-21

    Thanks for your response, nshmyrev.
    I tried using LinguistStats to check the total number of states in the
    search space, but I am getting an error (a problem with config.xml). Can you
    help me resolve it, please?

    Property Exception component:'recognizer' property:'monitors' - Not all
    elements have required type interface edu.cmu.sphinx.instrumentation.Monitor
    Found one of type class edu.cmu.sphinx.linguist.util.LinguistStats
    edu.cmu.sphinx.util.props.InternalConfigurationException

    Shall I attach (or post the contents of) my config.xml file here?

     
  • Shredder Woods

    Shredder Woods - 2011-03-21

    source.config.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    
    <!--
       Sphinx-4 Configuration file
    -->
    
    <!-- ******************************************************** -->
    <!--  an4 configuration file                             -->
    <!-- ******************************************************** -->
    
    <config>
    
        <!-- ******************************************************** -->
        <!-- frequently tuned properties                              -->
        <!-- ******************************************************** -->
    
        <property name="logLevel" value="WARNING"/>
    
        <property name="absoluteBeamWidth"  value="-1"/>
        <property name="relativeBeamWidth"  value="1E-80"/>
        <property name="wordInsertionProbability" value="1E-36"/>
        <property name="languageWeight"     value="8"/>
    
        <property name="frontend" value="epFrontEnd"/>
        <property name="recognizer" value="recognizer"/>
        <property name="showCreations" value="false"/>
    
    
        <!-- ******************************************************** -->
        <!-- word recognizer configuration                            -->
        <!-- ******************************************************** -->
    
        <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
            <property name="decoder" value="decoder"/>
            <propertylist name="monitors">
                <item>accuracyTracker </item>
                <item>speedTracker </item>
                <item>memoryTracker </item>
                <item>recognizerMonitor </item> 
                <item>linguistStats </item>
            </propertylist>
        </component>
    
        <!-- ******************************************************** -->
        <!-- The Decoder   configuration                              -->
        <!-- ******************************************************** -->
    
        <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
            <property name="searchManager" value="searchManager"/>
        </component>
    
        <component name="searchManager" 
            type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
            <property name="logMath" value="logMath"/>
            <property name="linguist" value="flatLinguist"/>
            <property name="pruner" value="trivialPruner"/>
            <property name="scorer" value="threadedScorer"/>
            <property name="activeListFactory" value="activeList"/>
        </component>
    
        <!--component name="activeList" 
                 type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">-->
    
    
        <component name="activeList" 
                 type="edu.cmu.sphinx.decoder.search.SortingActiveListFactory">
            <property name="logMath" value="logMath"/>
            <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
            <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
        </component>
    
        <!--<component name="activeList" 
                 type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
            <property name="logMath" value="logMath"/>
            <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
            <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
        </component>-->
    
        <component name="trivialPruner" 
                    type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>
    
        <component name="threadedScorer" 
                    type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
            <property name="frontend" value="${frontend}"/>
        </component>
    
        <!-- ******************************************************** -->
        <!-- The linguist  configuration                              -->
        <!-- ******************************************************** -->
    
        <component name="flatLinguist"
                    type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
            <property name="logMath" value="logMath"/>
            <property name="grammar" value="jsgfGrammar"/>
            <property name="acousticModel" value="wsj"/>
            <property name="wordInsertionProbability"
                    value="${wordInsertionProbability}"/>
            <property name="languageWeight" value="${languageWeight}"/>
            <property name="unitManager" value="unitManager"/>
        </component>
    
    
        <!-- ******************************************************** -->
        <!-- The Grammar  configuration                               -->
        <!-- ******************************************************** -->
    
        <component name="jsgfGrammar" type="edu.cmu.sphinx.jsgf.JSGFGrammar">
            <property name="dictionary" value="dictionary"/>
            <property name="grammarLocation" 
                 value="resource:/edu/cmu/sphinx/demo/application/"/>
            <property name="grammarName" value="digits"/>
            <property name="logMath" value="logMath"/>
        </component>
    
        <!-- ******************************************************** -->
        <!-- The Dictionary configuration                            -->
        <!-- ******************************************************** -->
        <component name="dictionary" 
            type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
            <property name="dictionaryPath"
                      value="resource:/an4_80_0_8gau_13dCep_8k_40mel_130Hz_3500Hz/dict/an4_80_0.dic"/>
            <property name="fillerPath" 
                  value="resource:/an4_80_0_8gau_13dCep_8k_40mel_130Hz_3500Hz/dict/an4_80_0.filler"/>
            <property name="addSilEndingPronunciation" value="false"/>
            <property name="wordReplacement" value="&lt;sil&gt;"/>
            <property name="unitManager" value="unitManager"/>
        </component>
    
    
        <!-- ******************************************************** -->
        <!-- The acoustic model configuration                         -->
        <!-- ******************************************************** -->
        <component name="wsj"
                   type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
            <property name="loader" value="wsjLoader"/>
            <property name="unitManager" value="unitManager"/>
        </component>
    
        <component name="wsjLoader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
            <property name="logMath" value="logMath"/>
            <property name="unitManager" value="unitManager"/>
            <property name="location" value="resource:/an4_80_0_8gau_13dCep_8k_40mel_130Hz_3500Hz"/>
            <property name="modelDefinition" value="etc/an4_80_0.1000.mdef"/>
            <property name="dataLocation" value="cd_continuous_8gau/"/>
        </component>
    
    
        <!-- ******************************************************** -->
        <!-- The unit manager configuration                           -->
        <!-- ******************************************************** -->
    
        <component name="unitManager" 
            type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>
    
        <!-- ******************************************************** -->
        <!-- The live frontend configuration                          -->
        <!-- ******************************************************** -->
        <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
            <propertylist name="pipeline">
                <item>streamCepstrumSource </item>
                <!--<item>dataBlocker </item>
                <item>speechClassifier </item>
                <item>speechMarker </item>
                <item>nonSpeechDataFilter </item>
                <item>preemphasizer </item>
                <item>windower </item>
                <item>fft </item>
                <item>melFilterBank </item>
                <item>dct </item>-->
                <item>liveCMN </item>
                <item>featureExtraction </item>
            </propertylist>
        </component>
    
        <!-- ******************************************************** -->
        <!-- The frontend pipelines                                   -->
        <!-- ******************************************************** -->
    
        <component name="streamCepstrumSource" type="edu.cmu.sphinx.frontend.util.StreamCepstrumSource"/>
    
        <!--<component name="dataBlocker" type="edu.cmu.sphinx.frontend.DataBlocker"/>
    
        <component name="speechClassifier" type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier"/>
    
        <component name="nonSpeechDataFilter" 
                   type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>
    
        <component name="speechMarker" type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker" />
    
        <component name="preemphasizer"
                   type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>
    
        <component name="windower" 
                   type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
        </component>
    
        <component name="fft" 
                type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
        </component>
    
        <component name="melFilterBank" 
            type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
        </component>
    
        <component name="dct" 
                type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>-->
    
        <component name="liveCMN" 
                   type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>
    
        <component name="featureExtraction" 
                   type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>
    
    
        <!-- ******************************************************* -->
        <!--  monitors                                               -->
        <!-- ******************************************************* -->
    
        <component name="accuracyTracker" 
                    type="edu.cmu.sphinx.instrumentation.BestPathAccuracyTracker">
            <property name="recognizer" value="${recognizer}"/>
            <property name="showAlignedResults" value="false"/>
            <property name="showRawResults" value="false"/>
        </component>
    
        <component name="memoryTracker" 
                    type="edu.cmu.sphinx.instrumentation.MemoryTracker">
            <property name="recognizer" value="${recognizer}"/>
            <property name="showSummary" value="false"/>
            <property name="showDetails" value="false"/>
        </component>
    
        <component name="speedTracker" 
                    type="edu.cmu.sphinx.instrumentation.SpeedTracker">
            <property name="recognizer" value="${recognizer}"/>
            <property name="frontend" value="${frontend}"/>
            <property name="showSummary" value="true"/>
            <property name="showDetails" value="false"/>
        </component>
    
        <component name="recognizerMonitor" 
                    type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">
            <property name="recognizer" value="${recognizer}"/>
            <propertylist name="allocatedMonitors">
                <item>linguistStats </item>
            </propertylist>
        </component>
    
        <component name="linguistStats" 
                    type="edu.cmu.sphinx.linguist.util.LinguistStats">
            <property name="linguist" value="flatLinguist"/>
        </component>
    
        <!-- ******************************************************* -->
        <!--  Miscellaneous components                               -->
        <!-- ******************************************************* -->
    
        <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
            <property name="logBase" value="1.0001"/>
            <property name="useAddTable" value="true"/>
        </component>
    
    </config>
    
     
  • Shredder Woods

    Shredder Woods - 2011-03-21

    Figured out the problem!
    Please ignore the above post.
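
    The thread does not record what the fix was. One reading that is consistent
    with the exception message is that linguistStats is not a Monitor and so
    cannot appear in the recognizer's "monitors" list, while it may still be
    listed under the recognizer monitor's "allocatedMonitors". A sketch under
    that assumption, reusing the component names from the configuration above:

    ```xml
    <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="decoder"/>
        <propertylist name="monitors">
            <item>accuracyTracker</item>
            <item>speedTracker</item>
            <item>memoryTracker</item>
            <item>recognizerMonitor</item>
            <!-- linguistStats is left out here: according to the exception
                 it does not implement the Monitor interface. -->
        </propertylist>
    </component>

    <!-- Assumption: linguistStats is run by the recognizer monitor instead. -->
    <component name="recognizerMonitor"
               type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">
        <property name="recognizer" value="${recognizer}"/>
        <propertylist name="allocatedMonitors">
            <item>linguistStats</item>
        </propertylist>
    </component>
    ```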

     
  • Shredder Woods

    Shredder Woods - 2011-03-22

    Hello nshmyrev,
    Does Sphinx allow us to use multiple configuration files?
    By multiple config files, I mean that my application has different fields
    to be recognized. Can I give them different configurations by using a
    separate config file for each?
    As far as I understand, the decoder takes the details of the config file
    (the .gram file, dictionary, filler, etc.) into account at run time, so this
    should not really be a problem!

     
  • Nickolay V. Shmyrev

    You can use multiple configuration files managed by multiple config managers,
    or you can use a single file with multiple recognizers configured and switch
    between them. Both ways work without issue.
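
    The single-file variant could be sketched as below. This is an illustration
    under assumptions, not a tested setup: digitsRecognizer, namesRecognizer and
    the "names" grammar are hypothetical, the shared components (logMath,
    dictionary, wsj, unitManager, activeList, trivialPruner, threadedScorer) are
    taken to be defined as in the configuration posted earlier in this thread,
    and whether helpers such as the scorer can safely be shared between
    recognizers is itself an assumption. Monitors are omitted for brevity.

    ```xml
    <!-- Recognizer 1: digits grammar -->
    <component name="digitsRecognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="digitsDecoder"/>
    </component>
    <component name="digitsDecoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="digitsSearchManager"/>
    </component>
    <component name="digitsSearchManager"
               type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="digitsLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListFactory" value="activeList"/>
    </component>
    <component name="digitsLinguist" type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
        <property name="logMath" value="logMath"/>
        <property name="grammar" value="digitsGrammar"/>
        <property name="acousticModel" value="wsj"/>
        <property name="wordInsertionProbability" value="${wordInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
        <property name="unitManager" value="unitManager"/>
    </component>
    <component name="digitsGrammar" type="edu.cmu.sphinx.jsgf.JSGFGrammar">
        <property name="dictionary" value="dictionary"/>
        <property name="grammarLocation"
                  value="resource:/edu/cmu/sphinx/demo/application/"/>
        <property name="grammarName" value="digits"/>
        <property name="logMath" value="logMath"/>
    </component>

    <!-- Recognizer 2: same chain, hypothetical "names" grammar -->
    <component name="namesRecognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="namesDecoder"/>
    </component>
    <component name="namesDecoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="namesSearchManager"/>
    </component>
    <component name="namesSearchManager"
               type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="namesLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListFactory" value="activeList"/>
    </component>
    <component name="namesLinguist" type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
        <property name="logMath" value="logMath"/>
        <property name="grammar" value="namesGrammar"/>
        <property name="acousticModel" value="wsj"/>
        <property name="wordInsertionProbability" value="${wordInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
        <property name="unitManager" value="unitManager"/>
    </component>
    <component name="namesGrammar" type="edu.cmu.sphinx.jsgf.JSGFGrammar">
        <property name="dictionary" value="dictionary"/>
        <property name="grammarLocation"
                  value="resource:/edu/cmu/sphinx/demo/application/"/>
        <property name="grammarName" value="names"/>
        <property name="logMath" value="logMath"/>
    </component>
    ```

    The application would then look up each recognizer by its component name
    from the configuration manager and use the one that matches the field
    currently being filled in. If sharing the scorer or pruner causes trouble,
    giving each recognizer its own copies is the safer variant.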

     
  • Shredder Woods

    Shredder Woods - 2011-03-23

    I would like to implement some techniques that can help improve the
    recognition accuracy for the Hindi language on the sphinx4 platform.
    Is there something I can do in Sphinx itself, like changing or updating some
    parts of the Sphinx code, to improve the recognition capability for this
    language?
    I am looking forward to suggestions. I hope that the work can be useful for
    the Sphinx community in general.

     
  • Nickolay V. Shmyrev

    Hello

    Specifically for Hindi, the main issue is collecting enough voice data.
    The recognizer source code has nothing language-specific in it.

    If you are looking for something to implement in Sphinx4, there is a feature
    that is critically required by any practical system: returning a proper
    confidence score in the grammar recognizer and rejecting OOV words reliably.
    If you could implement this part, the recognizer would move to the next level.

     
  • Shredder Woods

    Shredder Woods - 2011-03-25

    Hello nshmyrev,
    I would like to implement something that can help improve the
    recognition accuracy for the Hindi language.
    I have enough voice data collected for recognition of names and numbers. I am
    still looking for ideas.

    Can you point me to some applications built on Sphinx4, such as a voice
    reservation system (for trains, buses, ...) or something of that sort? Do you
    have any samples like that?
    Regards,
    Shredder

     
  • Shredder Woods

    Shredder Woods - 2011-03-25

    If you are looking for something to implement in Sphinx4, there is a
    feature that is critically required by any practical system: returning a
    proper confidence score in the grammar recognizer and rejecting OOV words
    reliably. If you could implement this part, the recognizer would move to the
    next level.

    Sorry if I sound ignorant, but what does Sphinx currently do with OOV words?

     
