Hybrid approach for Sphinx 4 HMM using semantic statistical language model

George
2013-02-08
2013-09-17
  • George
    2013-02-08

    Hi all,

    Could someone give me some extra info, materials, or tips for using a semantic statistical language model in Sphinx 4? I want to start reading about this field in depth: how can a hybrid approach with semantic knowledge improve Sphinx 4's scores?

    Any tips are welcome.

    Thanks in advance,
    George

     
    • Bhiksha Raj
      2013-02-08

      Hi George

      What do you mean by a semantic language model? What you must do will
      depend on that.

      -Bhiksha


      --
      Bhiksha Raj
      Associate Professor
      Carnegie Mellon University
      Pittsburgh, PA, USA
      Tel: 412 268 9826

       
  • George
    2013-02-10

    Hi Bhiksha,

    What I am trying to accomplish is the following:
    - At run time (before getting the recognized word from Sphinx-4), when Sphinx-4 recognizes a word from my pronunciation, I want to be able to generate all possible words that could match the pronounced word, and I would like to use semantic knowledge for this. I have the semantic representation, and I know that the word "One" (just an example) has a probability of 0.60 given some prior words, some global history, a prior semantic meaning. I want to use this 0.60 to influence the Sphinx-4 HMM state whenever what I pronounce matches the history of the word "One" in the semantic knowledge (of course this will have to respect the grammar and other constraints). I hope you get my idea; if not, please let me know and I will try to explain it better.
    For example:
    w1 w2 w3 w4 (these are some words recognized by Sphinx-4, pronounced by me; they represent a semantic event, a prior semantic meaning, and the next most probable word matching this event could be) Animals/Alligator/Abalone/Aidi/Airedale... (these are animal names that all start with the letter "A". At the phoneme state "A", I want to be able to generate all these animals, knowing that they are all in the semantic knowledge, and I also want to use the semantic knowledge score to influence the HMM state of Sphinx-4; suppose the user will most probably pronounce the word "Alligator", given the semantic knowledge).
    I'm trying to get such behavior out of Sphinx-4, and I would like tips and materials to read; any idea is welcome.
    Kind regards,
    George

     
    • Bhiksha Raj
      2013-02-10

      I'll give you the bad news first:

      You're taking on a difficult problem.

      a) Semantics are very hard to characterize and represent symbolically.
      I'm ccing Ben Lambert, who is completing a PhD thesis on this, and
      will have a list of prior papers to read.

      b) Historically, and in his research too, incorporating semantics
      doesn't really provide much over Ngram LMs. Ngrams are very good at
      shortlisting words, and at that point acoustics tend to be pretty good
      at making the right choice from this subset. More importantly, most
      semantic models are whole-sentence models. For instance, if
      someone says "after Zubin Mehta completed his study of music in
      Vienna, he joined the graduated from the Julliard college of music, he
      joined the Royal Liverpool Philharmonic as a conductor".

      The relationships are deep and distant.
      i) You need to know that Zubin Mehta is male. This, presumably, is
      obtained from some knowledge base.
      ii) Since Zubin's male, the sentence must use the word "he" to
      refer to him. Conversely, if the word "he" is used to refer to this
      person, then the sentence is probably about a male person. This means
      "Zubin Mehta" and "he", which are 8 word apart in the sentence provide
      evidence for one another.
      iii) Going deeper is the relationship between Zubin Mehta and the
      Royal Liverpool Philharmonic, and Zubin Mehta and "conductor", all of
      which are semantically related and very very distant from one another
      in the sentence. The fact that these are semantically related terms
      is itself unknown and can only be inferred by referring to a knowledge
      base of some kind. The inference is, itself, non-trivial and a
      research problem.

      c) The long-distance nature of semantic relationships means you cannot
      use them in a dynamic programming decoder which progresses left
      to right. The best we can do is to come up with some initial
      hypotheses for sentences and reweight their scores according to the
      semantic and syntactic consistency of the hypothesized sentences.
      This means your performance is limited by the accuracy of the initial
      hypotheses that are reweighted.

      d) There are neural-network based approaches to modelling language
      which supposedly encode some level of semantics. But they have not
      proved to be effective at improving speech recognition results by very
      much.

      Now for the good news:

      Ben Lambert's been working on this. He needs to encode some of what
      he's doing into a recognizer. He built one of his own in LISP, but
      it's too slow to be useful, so doing it in Sphinx4 may be of use to
      him. He's cced. He may have ideas on how to proceed. I'll let him
      speak.

      -Bhiksha


       
      • Bhiksha Raj
        2013-02-10

        someone says "after Zubin Mehta completed his study of music in
        Vienna, he joined the graduated from the Julliard college of music, he
        joined the Royal Liverpool Philharmonic as a conductor".

        That should be:

        "after Zubin Mehta completed his study of music in Vienna, he joined
        the Royal Liverpool Philharmonic as a conductor"

        My gmail editor messed it up (I had another example originally, which I
        replaced, and the two got mixed...)

        -B


         
  • George
    2013-02-12

    Hi Bhiksha,

    Please, can you help me get in contact with Mr. Ben Lambert? And second, can you tell me the improvement rate of the "semantic language model" approach proposed last year, this one: http://cmusphinx.sourceforge.net/wiki/semanticlanguagemodel ?

    I'm in a master's program now, and this is my topic: I do research on semantics and pragmatics in speech recognition systems (Sphinx-4), and any help would be appreciated.

    Kind regards,
    George

     
    • Bhiksha Raj
      2013-02-12

      Hi George

      Ben is cced.

      -Bhiksha


       
      • Ben Lambert
        2013-02-12

        Hi George and Bhiksha,
        I attempted to send the message below yesterday, but it bounced back
        since I wasn't on this mailing list. (I think I am now).

        Anyway, feel free to contact me directly. I'm not sure who did the work
        described in the link George sent earlier, so I don't know any of its
        details.
        -Ben

        Hello!

        Bhiksha is right, this problem is in general very challenging. I think
        what Bhiksha mentioned under 'c' earlier in this thread is probably the
        most challenging aspect of incorporating this sort of information into
        a decoder.

        However, you may be able to handle some simpler cases using just
        a class-based language model and/or a grammar. I'm not sure I
        understand exactly what you're trying to do here. Perhaps you could
        explain again?
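
        For the animal-names example from earlier in the thread, a toy JSGF
        grammar might look like the sketch below (the weights use JSGF's
        /weight/ syntax; Alligator's 0.60 comes from George's example, the
        other weights are made up for illustration, and whether your Sphinx-4
        setup honors weights at all is worth verifying):

            #JSGF V1.0;
            grammar animals;

            // toy grammar: semantic scores folded in as JSGF weights
            public <animal> = /0.60/ alligator
                            | /0.10/ abalone
                            | /0.10/ aidi
                            | /0.20/ airedale;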

        Best,
        Ben


         
  • George
    2013-02-17

    Hi Ben,

    I want to do something similar to what is in the link I sent earlier (http://cmusphinx.sourceforge.net/wiki/semanticlanguagemodel). I'm a novice in this area, and this is why I asked for your help; in order to understand and start doing something practical, I need tips and materials from you.

    Can you please tell me why, at point 6 ("What codes are needed to change?"), it says "* Trainer a)"? (This is in the link I gave you.) Can't the Linguist HMM DAG be updated at run time with semantic probabilities?

    For the moment, let's assume that I have 10,000 possible semantic probabilities for a vocabulary V, and that the grammar I have in Sphinx-4 exists in my semantic knowledge. I want to take the probability from the semantic part and update/adjust the syntactic probabilities of the Linguist HMM DAG. When I start the recognition process, I want to print all possible words based on the semantic knowledge (i.e. Animals | Alligator | Abalone | Aidi | Airedale | ...), and when the system recognizes the phoneme "A", I want to print all words from the list; I know from the semantic knowledge that Alligator has a 0.60 chance of being the next word and the others have less than 0.50, so the language probabilities of the Linguist HMM DAG should be updated. Why do I need to change the trainer, when I should be able to update/adjust the HMM graph (I believe this can be done)?

    Kind regards,
    George

     
  • Can you please tell me why, at point 6 ("What codes are needed to change?"), it says "* Trainer a)"? (This is in the link I gave you.)

    By "trainer" that page meant a trainer for the semantic language model, not for the acoustic model.

     
  • George
    2013-02-18

    Hi Nickolay,

    Thank you for your answer. Ben, I need help with the Scorer, the SearchGraph, and the HMM language probabilities. This is what I want to acquire. I have read some reports about the Sphinx-4 framework, and I want tips, any starting point, that would lead me to a full understanding of this probabilistic language model (my supervising teacher will give me a semantic representation and a semantic language model, and he expects me to combine the syntactic model with the semantic one; to keep it simple, this is what I have to do; further on I will see, I need to dig...). Can you help me with tips, any starting point?

    Kind regards,
    George

     
    • Ben Lambert
      2013-02-18

      Hi George,

      I think I understand what you're saying now:
      Your teacher is giving you a semantic representation and a semantic language model, and you want to incorporate that model's scores into Sphinx 4's decoder search process.

      First, I think you should consider approaching this a little differently to start, before trying to integrate directly into Sphinx4. What I would suggest to start with is using Sphinx to generate n-best lists and then rescoring and reranking those with your semantic language model, as in the sketch below. (You could also use Sphinx to generate word hypothesis lattices and then re-score those.)
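
      Schematically, the rescore-and-rerank step might look something like this (a rough sketch only: "SemanticModel" and its logScore method are hypothetical stand-ins for your model, and how you pull the n-best strings and their decoder scores out of Sphinx depends on your version):

          import java.util.Comparator;
          import java.util.List;

          interface SemanticModel {
              double logScore(String words); // hypothetical: semantic LM score, log domain
          }

          class Hyp {
              final String words;        // hypothesized word sequence from the n-best list
              final double decoderScore; // combined acoustic + LM score from the decoder
              Hyp(String words, double decoderScore) {
                  this.words = words;
                  this.decoderScore = decoderScore;
              }
          }

          class NbestRescorer {
              // Rerank: new score = decoder score + weight * semantic score (log domain).
              static Hyp rerank(List<Hyp> nbest, SemanticModel sem, double weight) {
                  return nbest.stream()
                          .max(Comparator.comparingDouble(
                                  h -> h.decoderScore + weight * sem.logScore(h.words)))
                          .orElse(null);
              }
          }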

      If you do want to jump right into the Sphinx4 decoder... I'm not sure how much I can help; most of my experience is with Sphinx3. But if you already have a language model, then from this page:
      http://cmusphinx.sourceforge.net/wiki/semanticlanguagemodel
      you can skip parts 1-4. You can also skip "6- What codes are needed to change? * Trainer a)" since you already have a language model (that I assume can produce its own scores).

      The key parts for you I think would be these two:

      5) How to combine different models together?

      Only by combining the semantic language model with a traditional model like an N-gram can we get better performance. In [9] or LSA [3], the semantic model itself is part of the language model, so there is no need to worry about how to combine them. However, in [2], we need to combine them. Here, the EM algorithm can be used.

      6) What codes are needed to change?
      ....
      * Recognizer (Take Sphinx4 for example) In particular, we can implement a LanguageModel class, just like LargeTrigramModel or SimpleNGramModel. The most important one is the function getProbability(WordSequence wordSequence), based on the semantic information introduced above. Here, the wordSequence need not be adjacent words, but can be any history words, concept or topic words.

      In addition, since the lattice structure might be different, we also need to write a Linguist to manage the lattice, such as constructing and scoring it.
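
      For instance, a skeleton of the LanguageModel class described in that excerpt might look like this (a sketch only: the real LanguageModel interface in edu.cmu.sphinx.linguist.language.ngram declares several more methods -- allocate, deallocate, getVocabulary and so on -- whose exact set depends on the Sphinx-4 version, and the two scoring helpers are hypothetical):

          import edu.cmu.sphinx.linguist.WordSequence;

          public class SemanticNGramModel /* implements LanguageModel */ {

              public float getProbability(WordSequence wordSequence) {
                  // Log-domain scores add, so this multiplies the two probabilities;
                  // an interpolation weight trained with EM (point 5 above) is another option.
                  return ngramLogProb(wordSequence) + semanticLogProb(wordSequence);
              }

              private float ngramLogProb(WordSequence ws) { return 0f; }    // placeholder
              private float semanticLogProb(WordSequence ws) { return 0f; } // placeholder
          }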

      I discussed some of these issues in my thesis proposal; it might be helpful to take a look at that (especially chapter 6) and some of the citations:
      http://www.cs.cmu.edu/~belamber/Papers/Lambert_ThesisProposal.pdf
      That's from a few years ago; unfortunately, I don't have anything more recent. I'm sure someone else has written this up more coherently, possibly Roni Rosenfeld, or in one of the several speech recognition textbooks.

      Hopefully this helps a bit.

      Best,
      Ben

      --
      Benjamin Lambert
      Ph.D. Student of Computer Science
      Carnegie Mellon University
      www.cs.cmu.edu/~belamber
      Mobile: 617-869-1844

       
  • George
    2013-02-19

    Hi Ben,

    Every word in this thread helps me. Thank you. When I have results, I will post them here.

    Kind regards,
    George

     
  • George
    2013-08-06

    Can someone help me with the AlternateHypothesisManager? What I want to achieve is to get a list of predecessor tokens for a token. I looked over the API and found this class, but when I try to get the list of lower-scored tokens from a Result object, I always get a NULL value. What am I missing here? How should this class be used? Isn't it populated dynamically inside the Result class? The searchManager used in the config file is "WordPruningBreadthFirstSearchManager". Any tips on how to get the list of predecessor tokens (with a lower score than the one returned by getPredecessor) for a token? Should the token be in a specific search state in order to get the full/partial list of predecessor tokens?

    Regards,
    George

     
  • What am I missing here?

    Alternatives are only available for word tokens, and only if buildWordLattice is enabled in the configuration of the search manager (a config sketch follows these answers).

    How should this class be used?

    See the lattice demo and the way a Lattice is constructed from a result.

    Isn't it populated dynamically inside the Result class?

    No

    The searchManager used in the config file is "WordPruningBreadthFirstSearchManager".

    Yes

    Any tips on how to get the list of predecessor tokens (with a lower score than the one returned by getPredecessor) for a token?

    See the Lattice class.

    Should the token be in a specific search state in order to get the full/partial list of predecessor tokens?

    Yes
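
    Putting those answers together: enabling lattices is a search-manager property, roughly like this fragment (the property name is as above; the rest of the component definition is whatever your existing config already has):

        <component name="searchManager"
                   type="edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager">
            <property name="buildWordLattice" value="true"/>
            <!-- other properties as in your existing config -->
        </component>

    and the lattice, with the alternative/predecessor structure, is then built on demand from the result, roughly as the lattice demo does (a sketch; check it against your Sphinx-4 version):

        Result result = recognizer.recognize();
        if (result != null) {
            Lattice lattice = new Lattice(result); // not populated automatically
        }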

     
  • George
    2013-08-08

    Hi Nickolay,

    I looked over the Lattice class, and what I need can be found in the allPaths() method and other methods of the Lattice class (nodes, edges). I see the Lattice class as an external data structure, a graph that is used just for representation and debugging (it isn't used for searching; the token tree is used for that instead). I read that Sphinx 4 uses "a token tree ... to manage the active paths during the search", which can't be found or applied on the lattice structure. My goal is: extract the search graph (the best would be the lattice, or a structure that offers what the Lattice class offers; maybe I should do some work for this?); the graph shouldn't be pruned (or at least this should be configurable, along with a configurable number of "neighborhood" edges for each node; I think this is the maxLatticeEdges property of WordPruningBreadthFirstSearchManager); and I need to manipulate the language model score (weighting) of each edge and observe the results.

    Re-scoring the lattice is what I want, but how? The token tree is used as the search space.
    So far I couldn't see a way of using the lattice as the search space, and I don't know how I can use the lattice if I re-score all the edges, since the token tree is used for the search space...

    How can I achieve this?
    All I did was read all the papers about this framework, run the demo examples, and look inside the framework code...
    Can you please give me a few tips? Where should I look in order to achieve what I want?

    Kind regards,
    George

     
  • So far I couldn't see a way of using the lattice as the search space, and I don't know how I can use the lattice if I re-score all the edges, since the token tree is used for the search space...

    I'm not sure you understand all the concepts, and I'm also not sure you carefully read what was written to you earlier in this thread. In the simple case you want rescoring of the n-best list, not the lattice, so the sequence of steps would be:

    1. Create the n-best list result with scores
    2. Rescore it using the semantic language model
    3. Select the best rescored result according to the new score

    The next step would be the lattice:

    1. Create the lattice with scores
    2. Change the language model scores in the lattice
    3. Retrieve the new best path in the lattice with the getViterbiPath() method (sketched below)

    To become familiar with n-best lists and lattices, read a textbook on speech recognition.
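
    A minimal sketch of the lattice sequence above (the n-best variant was sketched earlier in the thread; the Lattice, Edge and getViterbiPath calls follow the Sphinx-4 result package, but verify the exact signatures against your version, and semanticLmScore is a hypothetical hook for the semantic model):

        import edu.cmu.sphinx.result.Edge;
        import edu.cmu.sphinx.result.Lattice;
        import edu.cmu.sphinx.result.Result;

        class LatticeRescorer {
            static void rescore(Result result) {
                Lattice lattice = new Lattice(result);        // 1. lattice with scores
                for (Edge edge : lattice.getEdges()) {        // 2. replace LM scores
                    edge.setLMScore(semanticLmScore(edge));   //    (log-domain value expected)
                }
                System.out.println(lattice.getViterbiPath()); // 3. new best path
            }

            static double semanticLmScore(Edge edge) {
                return edge.getLMScore(); // placeholder: plug the semantic model in here
            }
        }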

     
  • George
    2013-09-16

    Hi Nickolay,

    I managed to build the n-best list, but I ran into a small problem.
    After building the n-best list from the lattice, I found that all of the scores have the following form (I used LogMath from the Sphinx 4 lib to generate the linear values):

    0.00000000000000000000011044066960
    0.00000000000000000000012975
    0.0000000000000000000000942

    Why are the values not real numbers between 0 and 1?


    The Java code I use to generate the linear values:

    import java.math.BigDecimal;
    import edu.cmu.sphinx.util.LogMath;

    // the lattice from which I build the n-best list
    LogMath log = lattice.getLogMath();

    double score = log.logToLinear((float) edge.getLMScore());

    // I used BigDecimal just to display the value nicely
    BigDecimal bd = new BigDecimal(score);
    System.out.println(bd.toPlainString());

    // My idea was to iterate through the n-best list and weight the nodes of
    // each n-best entry. I wanted to truncate the score value to 3-5 decimals
    // using BigDecimal, but I found that all scores have the first 14-20
    // decimals zero.

    // Some output:
    [edge: Edge(Node(,0|0)-->Node(one,0|21)[-2199378.75,-503905.375]), score: -503905.375]
    linear value: 0.000000000000000000000129756660450880268359917002168448002541250374405878042931325224651484262494705035351216793060302734375

    // LogMath configuration
    logBase = 1.0001
    useAddTable = true


    I used the LatticeDemo where the input is a stream (transcriber demo, the 10001-90210-01803.wav file), with minor changes inside the configuration file (e.g. tuning parameters based on a comment you made on a forum) plus a new n-gram model.

    Could you please show me the right way? I don't know where to look for the explanation of this.

    Regards,
    George

     
    Last edit: George 2013-09-16
  • Why are the values not real numbers between 0 and 1?

    They are real numbers between zero and 1, actually. Probably you were wondering why there are so many zeros. You need to understand what probability space you are working in. The probability of an observation given a word sequence can actually be pretty small, since the observation space is large and there are many sequences.
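
    To make the scale concrete, here is a quick sanity check using the logBase = 1.0001 from your config and the LM score from your output (linear = logBase ^ logScore):

        double linear = Math.pow(1.0001, -503905.375);
        System.out.println(linear); // ~1.3e-22 -- the same scale as the value you printed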

    If you are looking for confidence scores, you probably want to look at the confidence demo instead.

     
  • George
    2013-09-17

    Hi Nickolay,

    Thank you for the info. I will have a look at the confidence scores.

    How are the edges' language model scores computed (n-gram probs?)?

    Regards,
    George

     
  • How are the edges' language model scores computed (n-gram probs?)?

    By asking the language model for the probability of the sequence.
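
    In code terms, that is the getProbability(WordSequence) call quoted earlier in this thread; schematically:

        // Schematic only -- the real call site is inside the linguist/search manager:
        float lmLogScore = languageModel.getProbability(wordSequence);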