Hybrid approach for Sphinx 4 HMM using semantic statistical language model

Hi all,
Could someone give me some extra info, materials, or tips for using a semantic statistical language model in Sphinx 4? I want to start reading about this field in depth: how can a hybrid approach with semantic knowledge improve Sphinx 4 scores?
Any tips are welcome.
Thanks in advance,
George
Hi George,
What do you mean by a semantic language model? What you must do will depend on that.
-Bhiksha
Hi Bhiksha,
What I am trying to accomplish is the following:
- At run time (before Sphinx-4 returns the recognized word), when Sphinx-4 recognizes a word from my pronunciation, I want to be able to generate all possible words that could match the pronounced word, and I would like to use semantic knowledge for this. I have the semantic representation, and I know that the word "one" (just an example) has a probability of 0.60 given some prior words, some global history, and prior semantic meaning. I want to use this 0.60 to influence the Sphinx-4 HMM state whenever what I pronounce matches the history of the word "one" in the semantic knowledge (of course, this will have to respect the grammar and other constraints). I hope you get my idea; if not, please let me know and I will try to explain it better.
For example:
w1 w2 w3 w4 (these are words recognized by Sphinx-4, pronounced by me, and they represent a semantic event with prior semantic meaning; the next most probable words that match this event could be) Animals/Alligator/Abalone/Aidi/Airedale... (these are animal names that all start with the letter "A". I want, at the phoneme state "A", to be able to generate all of these animals, knowing that they are all in the semantic knowledge. I also want to use the semantic knowledge score to influence the HMM state of Sphinx-4, supposing that the user will most probably pronounce the word "Alligator" given the semantic knowledge).
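A tiny, self-contained sketch of the reweighting idea described above; the candidate list and the 0.60 prior come from the example, while the interpolation weight and the baseline probability are invented for illustration (this is not Sphinx-4 code):

import java.util.LinkedHashMap;
import java.util.Map;

public class SemanticPriorExample {
    public static void main(String[] args) {
        // Semantic priors over the candidate animal words from the example.
        Map<String, Double> semanticPrior = new LinkedHashMap<>();
        semanticPrior.put("alligator", 0.60);
        semanticPrior.put("abalone",   0.15);
        semanticPrior.put("aidi",      0.15);
        semanticPrior.put("airedale",  0.10);

        double lambda = 0.5;      // invented interpolation weight
        double baselineLm = 0.25; // placeholder uniform n-gram probability
        for (Map.Entry<String, Double> e : semanticPrior.entrySet()) {
            // Mix the baseline language model probability with the semantic prior.
            double combined = lambda * baselineLm + (1 - lambda) * e.getValue();
            System.out.printf("%s -> %.3f%n", e.getKey(), combined);
        }
    }
}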
I'm trying to get this behavior out of Sphinx-4, and I would welcome tips, materials to read, and any ideas.
Kind regards,
George
I'll give you the bad news first:
You're taking on a difficult problem.
a) Semantics are very hard to characterize and represent symbolically. I'm cc'ing Ben Lambert, who is completing a PhD thesis on this and will have a list of prior papers to read.
b) Historically, and in his research too, incorporating semantics doesn't really provide much over N-gram LMs. N-grams are very good at shortlisting words, and at that point acoustics tend to be pretty good at making the right choice from this subset. More importantly, most semantic models are whole-sentence models. For instance, suppose someone says "after Zubin Mehta completed his study of music in Vienna, he joined the graduated from the Julliard college of music, he joined the Royal Liverpool Philharmonic as a conductor". The relationships are deep and distant.
i) You need to know that Zubin Mehta is male. This, presumably, is obtained from some knowledge base.
ii) Since Zubin is male, the sentence must use the word "he" to refer to him. Conversely, if the word "he" is used to refer to this person, then the sentence is probably about a male person. This means "Zubin Mehta" and "he", which are 8 words apart in the sentence, provide evidence for one another.
iii) Going deeper, there is the relationship between Zubin Mehta and the Royal Liverpool Philharmonic, and between Zubin Mehta and "conductor", all of which are semantically related and very, very distant from one another in the sentence. The fact that these are semantically related terms is itself unknown and can only be inferred by referring to a knowledge base of some kind. The inference is, itself, non-trivial and a research problem.
c) The long-distance nature of semantic relationships means you cannot use them in a dynamic programming decoder which progresses left to right. The best we can do is to come up with some initial hypotheses for sentences and reweight their scores according to the semantic and syntactic consistency of the hypothesized sentences. This means your performance is limited by the accuracy of the initial hypotheses that are reweighted.
d) There are neural-network based approaches to modelling language which supposedly encode some level of semantics, but they have not proved to improve speech recognition results by very much.
Now for the good news:
Ben Lambert's been working on this. He needs to encode some of what he's doing into a recognizer. He built one of his own in LISP, but it's too slow to be useful, so doing it in Sphinx4 may be of use to him. He's cc'ed. He may have ideas on how to proceed. I'll let him speak.
-Bhiksha
someone says "after Zubin Mehta completed his study of music in
Vienna, he joined the graduated from the Julliard college of music, he
joined the Royal Liverpool Philharmonic as a conductor".
THta should be
after Zubin Mehta completed his study of music in Vienna, he joined
Royal Liverpool Philharmonic as a conductor"
My gmail editor messed up (I had another example originally which I
replaced and the two got mixed..)
-B
--
Bhiksha Raj
Associate Professor
Carnegie Mellon University
Pittsburgh, PA, USA
Tel: 412 268 9826
Hi Bhiksha,
Please, can you help me get in contact with Mr. Ben Lambert? And the second thing: can you tell me the improvement rate of the "semantic language model" approach proposed last year, this one: http://cmusphinx.sourceforge.net/wiki/semanticlanguagemodel
I'm in a master's program now, and this is my topic; I do research on semantics and pragmatics in speech recognition systems (Sphinx-4), and any help would be appreciated.
Kind regards,
George
Hi George,
Ben is cced.
-Bhiksha
Hi George and Bhiksha,
I attempted to send the message below yesterday, but it bounced back
since I wasn't on this mailing list. (I think I am now).
Anyway, feel free to contact me directly. I'm not sure who wrote or worked on the work described in the link George sent, so I don't know any of the details of that.
-Ben
Hello!
Bhiksha is right, this problem in general is very challenging. I think what Bhiksha mentioned under 'c' above is probably the most challenging aspect of incorporating this sort of information into a decoder. However, you may be able to handle some simpler cases using just a class-based language model and/or a grammar. I'm not sure I understand exactly what you're trying to do here. Perhaps you could explain again?
Best,
Ben
Hi Ben,
I want to do something similar to what is in the link I sent earlier (http://cmusphinx.sourceforge.net/wiki/semanticlanguagemodel). I'm a novice in this area, which is why I asked for your help; in order to understand and start doing something practical, I need tips and materials from you. Can you please tell me why, at point 6 ("What codes are needed to change?") on that page, it says "* Trainer a)"? Can't the Linguist HMM DAG be updated at run time with semantic probabilities? For the moment, let's assume that I have 10,000 possible semantic probabilities for a vocabulary V, and that the grammar I have in Sphinx-4 exists in my semantic knowledge. I want to take the probability from the semantic part and use it to update and adjust the syntactic probabilities in the Linguist HMM DAG. When I start the recognition process, I want to print all possible words based on the semantic knowledge (i.e. Animals | Alligator | Abalone | Aidi | Airedale, an OR list), and when the system recognizes the phoneme A, I want to print all words from the list; I know from the semantic knowledge that Alligator has a 0.60 chance of being the next word and the others have less than 0.50, so the Linguist HMM DAG language probabilities should be updated. Why do I need to change the trainer when I should be able to update and adjust the HMM graph (I believe this can be done)?
Kind regards,
George
By "trainer" that page meant a trainer for the semantic language model, not for the acoustic model.
Hi Nickolay,
Thank you for your answer. Ben, I need help with the Scorer, the SearchGraph, and the HMM language probabilities. This is what I want to acquire. I have read some reports about the Sphinx-4 framework, and I want tips, any starting point that would lead me to a full understanding of this probabilistic language model (my coordinating teacher will give me a semantic representation and a semantic language model, and he expects me to combine the syntactic model with the semantic one; to keep it simple, this is what I have to do; further on I will see, I need to dig...). Can you help me with tips, any starting point?
Kind regards,
George
Hi George,
I think I understand what you're saying now: your teacher is giving you a semantic representation and a semantic language model, and you want to incorporate that model's scores into Sphinx 4's decoder search process.
First, I think you should consider approaching this a little differently to start, before trying to integrate directly into Sphinx4. What I would suggest to start with is using Sphinx to generate n-best lists, and then rescoring and reranking those with your semantic language model. (You could also use Sphinx to generate word hypothesis lattices, and then re-score those.)
If you do want to jump right into the Sphinx4 decoder... I'm not sure how much I can help; most of my experience is with Sphinx3. But if you already have a language model, then from this page: http://cmusphinx.sourceforge.net/wiki/semanticlanguagemodel
you can skip parts 1-4. You can also skip "6- What codes are needed to change? * Trainer a)" since you already have a language model (that I assume can produce its own scores).
The key parts for you, I think, would be these two:
5) How to combine different models together?
Only by combining the semantic language model with a traditional model like an N-gram can we get better performance. In [9] or LSA [3], the semantic model itself is part of the language model, so there is no need to worry about how to combine them. However, in [2], we need to combine them together. Here, the EM algorithm can be used.
6) What codes are needed to change?
....
* Recognizer (take Sphinx4 for example): in particular, we can implement a LanguageModel class, just like LargeTrigramModel or SimpleNGramModel. The most important part is the function getProbability(WordSequence wordSequence), based on the semantic information introduced above. Here, the wordSequence does not need to be adjacent words, but can be any history words, concept or topic words.
In addition, since the lattice structure might be different, we also need to write a Linguist to manage the lattice (constructing it, scoring it, and so on).
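A rough sketch of the combination the quoted page describes, done as linear interpolation; the nested SemanticModel interface is a hypothetical stand-in for whatever model produces the semantic scores, and this is not a complete Sphinx-4 LanguageModel implementation (the real interface has more methods than shown here):

import edu.cmu.sphinx.linguist.WordSequence;
import edu.cmu.sphinx.linguist.language.ngram.LanguageModel;
import edu.cmu.sphinx.util.LogMath;

public class HybridLanguageModel {

    // Hypothetical stand-in for the semantic model supplied elsewhere.
    public interface SemanticModel {
        double probabilityOf(WordSequence history); // plain probability in [0, 1]
    }

    private final LanguageModel ngram;    // e.g. LargeTrigramModel or SimpleNGramModel
    private final SemanticModel semantic;
    private final LogMath logMath;
    private final double lambda;          // n-gram weight; the wiki suggests estimating such weights with EM

    public HybridLanguageModel(LanguageModel ngram, SemanticModel semantic,
                               LogMath logMath, double lambda) {
        this.ngram = ngram;
        this.semantic = semantic;
        this.logMath = logMath;
        this.lambda = lambda;
    }

    // Linear interpolation of the two models: convert the n-gram's log-domain
    // score to linear, mix, and convert back to Sphinx-4's log domain.
    public float getProbability(WordSequence wordSequence) {
        double pNgram = logMath.logToLinear(ngram.getProbability(wordSequence));
        double pSemantic = semantic.probabilityOf(wordSequence);
        double mixed = lambda * pNgram + (1.0 - lambda) * pSemantic;
        return logMath.linearToLog(mixed);
    }
}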
I discussed some of these issues in my thesis proposal; it might be helpful to take a look at that (especially chapter 6) and some of the citations: http://www.cs.cmu.edu/~belamber/Papers/Lambert_ThesisProposal.pdf
That's from a few years ago; unfortunately, I don't have anything more recent. I'm sure someone else has written this up more coherently, possibly Roni Rosenfeld, or one of the several speech recognition textbooks.
Hopefully this helps a bit.
Best,
Ben
--
Benjamin Lambert
Ph.D. Student of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~belamber
Mobile: 617-869-1844
Hi Ben,
Every word in this thread helps me. Thank you. When I have results, I will post them here.
Kind regards,
George
Can someone help me with AlternateHypothesisManager? What I want is to get a list of predecessor tokens for a token. I looked over the API and found this class, but when I try to get the list of lower-scored tokens from a Result object, I always get a NULL value. What am I missing here? How should this class be used? Isn't it populated dynamically inside the Result class? The searchManager used in the config file is "WordPruningBreadthFirstSearchManager". Any tips on how to obtain the list of predecessor tokens (with a lower score than the one returned by the getPredecessor method) for a token? Should the token be in a specific search state in order to get the full/partial list of predecessor tokens?
Regards,
George
Alternatives are only available for word tokens, and only if buildWordLattice is enabled in the configuration of the search manager.
See the lattice demo and the way a Lattice is constructed from a result.
To the remaining questions, in order: no; yes; see the Lattice class; yes.
Hi Nickolay,
I looked over the Lattice class, and what I need can be found in the allPaths() method and other methods of the Lattice class (nodes, edges). I see the Lattice class as an external data structure, a graph that is used just for representation and debugging (it isn't used for searching; the token tree is used for that instead). I read that in Sphinx 4 "a token tree is used to manage the active paths during the search", which can't be found in or applied to the lattice structure. My goal is to extract the search graph (the best would be the lattice, or a structure that offers what the Lattice class offers; maybe I need to do some work for this?). The graph shouldn't be pruned (or at least pruning should be configurable, with a configurable "neighborhood" size for each node; I think this is the maxLatticeEdges property of WordPruningBreadthFirstSearchManager). I need to manipulate the language model score (weighting) of each edge and observe the results.
Re-scoring the lattice is what I want, but how? So far I haven't seen a way of using the lattice as the search space, and I don't know how I can use the lattice if I re-score all the edges, since the token tree is used for the search space... How can I achieve this?
All I have done so far is read all the papers about this framework, run the demo examples, and look inside the framework code... Can you please give me a few tips? Where should I look in order to achieve what I want?
Kind regards,
George
I'm not sure you understand all the concepts, and I'm also not sure you carefully read what was written to you earlier in this thread. In the simple case you want rescoring of the n-best list, not the lattice, so the sequence of steps would be:
1. Create the n-best list result with scores.
2. Rescore it using the semantic language model.
3. Select the best rescored result according to the new score.
The next step would be the lattice:
1. Create the lattice with scores.
2. Change the language model scores in the lattice.
3. Retrieve a new best path in the lattice with the getViterbiPath() method.
To become familiar with n-best lists and lattices, read a textbook on speech recognition.
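A minimal sketch of that n-best rescoring loop; the Hypothesis holder and the SemanticModel interface are assumptions, and how you obtain the entries and their scores depends on your setup:

import java.util.List;

public class NBestRescorer {

    // Minimal holder for one n-best entry; fill these from your decoding
    // results (this class itself is not part of Sphinx-4).
    public static class Hypothesis {
        public final String words;
        public final double acousticScore; // log-domain, from the decoder
        public final double lmScore;       // log-domain, baseline LM
        public double rescored;

        public Hypothesis(String words, double acousticScore, double lmScore) {
            this.words = words;
            this.acousticScore = acousticScore;
            this.lmScore = lmScore;
        }
    }

    // Hypothetical hook for the semantic language model's whole-sentence score.
    public interface SemanticModel {
        double logScore(String words);
    }

    // Steps 2 and 3 above: rescore every hypothesis with the semantic model,
    // then pick the best one under the new combined score.
    public static Hypothesis rescore(List<Hypothesis> nbest, SemanticModel sem,
                                     double semanticWeight) {
        Hypothesis best = null;
        for (Hypothesis h : nbest) {
            h.rescored = h.acousticScore + h.lmScore
                       + semanticWeight * sem.logScore(h.words);
            if (best == null || h.rescored > best.rescored) {
                best = h;
            }
        }
        return best;
    }
    // The lattice variant follows the same pattern: change the language model
    // score on each lattice edge, then ask the lattice for getViterbiPath().
}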
Hi Nickolay,
I managed to build the n-best list, but I encountered a small problem.
After building the n-best list from the lattice, I found that all of the scores have the following form (I used LogMath from the Sphinx 4 lib to generate the linear values):
* 0.00000000000000000000011044066960
* 0.00000000000000000000012975
* 0.0000000000000000000000942
Why are the values not real numbers between 0 and 1?
The Java code I use to generate the linear values:
LogMath log = lattice.getLogMath(); // the lattice from which I build the n-best list
double score = log.logToLinear((float) edge.getLMScore());
// I used BigDecimal just to display it nicely
BigDecimal bd = new BigDecimal(score);
System.out.println(bd.toPlainString());
// My idea was to iterate through the n-best list and weight the nodes of each entry.
// I wanted to truncate the score value to 3-5 decimals using BigDecimal, but I found
// that all scores have the first 14-20 decimals equal to zero.
Some output:
[edge: Edge(Node(,0|0)-->Node(one,0|21)[-2199378.75,-503905.375]), score: -503905.375]
linear value: 0.000000000000000000000129756660450880268359917002168448002541250374405878042931325224651484262494705035351216793060302734375
The LogMath configuration:
logBase=1.0001
useAddTable=true
I used the LatticeDemo where the input is a stream (the transcriber demo, 10001-90210-01803.wav file), with minor changes inside the configuration file (e.g. tuning parameters based on a comment you made on a forum) plus a new n-gram model.
Could you please show me the right way? I don't know where to look for an explanation of this.
Regards,
George
They are real numbers between zero and 1, actually; probably you were wondering why there are so many zeros. You need to understand what probability space you are working in. The probability of an observation given a word sequence can actually be pretty small, since the observation space is large and there are many sequences.
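A quick arithmetic check shows the magnitude is expected given the logBase of 1.0001 shown above:

public class ScoreMagnitudeCheck {
    public static void main(String[] args) {
        double logBase = 1.0001;      // from the LogMath configuration above
        double lmScore = -503905.375; // the LM score printed on the lattice edge
        // Sphinx-4 log scores are exponents of logBase, so the linear value is
        // logBase^score = e^(score * ln(logBase)) ~ e^(-50.4), roughly 1.3e-22:
        // 21 zeros after the decimal point, just like the output above.
        System.out.println(Math.pow(logBase, lmScore));
    }
}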
If you are looking for confidence scores, you probably want to look at the confidence demo instead.
Hi Nickolay,
Thank you for the info. I will have a look at the confidence scores.
How are the edges' language model scores computed (n-gram probabilities?)?
Regards,
George

By asking the language model for the probability of the sequence.