Hybrid approach for Sphinx 4 HMM using semantic statistical language model

Hi all,
Could someone give me some extra info, materials, or tips for using a semantic statistical language model in Sphinx 4? I want to start reading about this field in depth: how can a hybrid approach with semantic knowledge improve Sphinx 4 scores?
Any tips are welcome.
Thanks in advance,
George
Hi George,
What do you mean by a semantic language model? What you must do will depend on that.
-Bhiksha
Hi Bhiksha,
What I am trying to accomplish is the following:
- At run time (before Sphinx-4 returns the recognized word), when Sphinx-4 recognizes a word from my pronunciation, I want to be able to generate all possible words that could match the pronounced word, and I would like to use semantic knowledge for this. I have the semantic representation, and I know that the word "one" (just an example) has a probability of 0.60 given some prior words, some global history, and prior semantic meaning. I want to use this 0.60 to influence the Sphinx-4 HMM state whenever what I pronounce matches the history of the word "one" in the semantic knowledge (of course, this will have to respect the grammar and other constraints). I hope you get my idea; if not, please let me know and I will try to explain it better.
For example:
w1 w2 w3 w4 (these are words recognized by Sphinx-4, pronounced by me, and they represent a semantic event with prior semantic meaning; the next most probable words that match this event could be) Animals/Alligator/Abalone/Aidi/Airedale... (these are animal names that all start with the letter "A". I want, at the phoneme state "A", to be able to generate all of these animals, knowing that they are all in the semantic knowledge. I also want to use the semantic knowledge score to influence the HMM state of Sphinx-4, supposing that the user will most probably pronounce the word "Alligator" given the semantic knowledge).
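A tiny, self-contained sketch of the reweighting idea described above; the candidate list and the 0.60 prior come from the example, while the interpolation weight and the baseline probability are invented for illustration (this is not Sphinx-4 code):

import java.util.LinkedHashMap;
import java.util.Map;

public class SemanticPriorExample {
    public static void main(String[] args) {
        // Semantic priors over the candidate animal words from the example.
        Map<String, Double> semanticPrior = new LinkedHashMap<>();
        semanticPrior.put("alligator", 0.60);
        semanticPrior.put("abalone",   0.15);
        semanticPrior.put("aidi",      0.15);
        semanticPrior.put("airedale",  0.10);

        double lambda = 0.5;      // invented interpolation weight
        double baselineLm = 0.25; // placeholder uniform n-gram probability
        for (Map.Entry<String, Double> e : semanticPrior.entrySet()) {
            // Mix the baseline language model probability with the semantic prior.
            double combined = lambda * baselineLm + (1 - lambda) * e.getValue();
            System.out.printf("%s -> %.3f%n", e.getKey(), combined);
        }
    }
}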
I'm trying to get this behavior out of Sphinx-4, and I would welcome tips, materials to read, and any ideas.
Kind regards,
George
I'll give you the bad news first:
You're taking on a difficult problem.
a) Semantics are very hard to characterize and represent symbolically. I'm cc'ing Ben Lambert, who is completing a PhD thesis on this and will have a list of prior papers to read.
b) Historically, and in his research too, incorporating semantics doesn't really provide much over N-gram LMs. N-grams are very good at shortlisting words, and at that point acoustics tend to be pretty good at making the right choice from this subset. More importantly, most semantic models are whole-sentence models. For instance, suppose someone says "after Zubin Mehta completed his study of music in Vienna, he joined the graduated from the Julliard college of music, he joined the Royal Liverpool Philharmonic as a conductor". The relationships are deep and distant.
i) You need to know that Zubin Mehta is male. This, presumably, is obtained from some knowledge base.
ii) Since Zubin is male, the sentence must use the word "he" to refer to him. Conversely, if the word "he" is used to refer to this person, then the sentence is probably about a male person. This means "Zubin Mehta" and "he", which are 8 words apart in the sentence, provide evidence for one another.
iii) Going deeper, there is the relationship between Zubin Mehta and the Royal Liverpool Philharmonic, and between Zubin Mehta and "conductor", all of which are semantically related and very, very distant from one another in the sentence. The fact that these are semantically related terms is itself unknown and can only be inferred by referring to a knowledge base of some kind. The inference is, itself, non-trivial and a research problem.
c) The long-distance nature of semantic relationships means you cannot use them in a dynamic programming decoder which progresses left to right. The best we can do is to come up with some initial hypotheses for sentences and reweight their scores according to the semantic and syntactic consistency of the hypothesized sentences. This means your performance is limited by the accuracy of the initial hypotheses that are reweighted.
d) There are neural-network based approaches to modelling language which supposedly encode some level of semantics, but they have not proved to improve speech recognition results by very much.
Now for the good news:
Ben Lambert's been working on this. He needs to encode some of what he's doing into a recognizer. He built one of his own in LISP, but it's too slow to be useful, so doing it in Sphinx4 may be of use to him. He's cc'ed. He may have ideas on how to proceed. I'll let him speak.
-Bhiksha
someone says "after Zubin Mehta completed his study of music in
Vienna, he joined the graduated from the Julliard college of music, he
joined the Royal Liverpool Philharmonic as a conductor".
THta should be
after Zubin Mehta completed his study of music in Vienna, he joined
Royal Liverpool Philharmonic as a conductor"
My gmail editor messed up (I had another example originally which I
replaced and the two got mixed..)
-B
--
Bhiksha Raj
Associate Professor
Carnegie Mellon University
Pittsburgh, PA, USA
Tel: 412 268 9826
Hi Bhiksha,
Please, can you help me get in contact with Mr. Ben Lambert? And the second thing: can you tell me the improvement rate of the "semantic language model" approach proposed last year, this one: http://cmusphinx.sourceforge.net/wiki/semanticlanguagemodel
I'm in a master's program now, and this is my topic; I do research on semantics and pragmatics in speech recognition systems (Sphinx-4), and any help would be appreciated.
Kind regards,
George
Hi George,
Ben is cced.
-Bhiksha
Hi George and Bhiksha,
I attempted to send the message below yesterday, but it bounced back
since I wasn't on this mailing list. (I think I am now).
Anyway, feel free to contact me directly. I'm not sure who wrote or worked on the work described in the link George sent, so I don't know any of the details of that.
-Ben
Hello!
Bhiksha is right, this problem in general is very challenging. I think what Bhiksha mentioned under 'c' above is probably the most challenging aspect of incorporating this sort of information into a decoder. However, you may be able to handle some simpler cases using just a class-based language model and/or a grammar. I'm not sure I understand exactly what you're trying to do here. Perhaps you could explain again?
Best,
Ben
Hi Ben,
I want to do something similar to what is in the link I sent earlier (http://cmusphinx.sourceforge.net/wiki/semanticlanguagemodel). I'm a novice in this area, which is why I asked for your help; in order to understand and start doing something practical, I need tips and materials from you. Can you please tell me why, at point 6 ("What codes are needed to change?") on that page, it says "* Trainer a)"? Can't the Linguist HMM DAG be updated at run time with semantic probabilities? For the moment, let's assume that I have 10,000 possible semantic probabilities for a vocabulary V, and that the grammar I have in Sphinx-4 exists in my semantic knowledge. I want to take the probability from the semantic part and use it to update and adjust the syntactic probabilities in the Linguist HMM DAG. When I start the recognition process, I want to print all possible words based on the semantic knowledge (i.e. Animals | Alligator | Abalone | Aidi | Airedale, an OR list), and when the system recognizes the phoneme A, I want to print all words from the list; I know from the semantic knowledge that Alligator has a 0.60 chance of being the next word and the others have less than 0.50, so the Linguist HMM DAG language probabilities should be updated. Why do I need to change the trainer when I should be able to update and adjust the HMM graph (I believe this can be done)?
Kind regards,
George
By "trainer" that page meant a trainer for the semantic language model, not for the acoustic model.
Hi Nickolay,
Thank you for your answer. Ben, I need help with the Scorer, the SearchGraph, and the HMM language probabilities. This is what I want to acquire. I have read some reports about the Sphinx-4 framework, and I want tips, any starting point that would lead me to a full understanding of this probabilistic language model (my coordinating teacher will give me a semantic representation and a semantic language model, and he expects me to combine the syntactic model with the semantic one; to keep it simple, this is what I have to do; further on I will see, I need to dig...). Can you help me with tips, any starting point?
Kind regards,
George
Hi George,
I think I understand what you're saying now: your teacher is giving you a semantic representation and a semantic language model, and you want to incorporate that model's scores into Sphinx 4's decoder search process.
First, I think you should consider approaching this a little differently to start, before trying to integrate directly into Sphinx4. What I would suggest to start with is using Sphinx to generate n-best lists, and then rescoring and reranking those with your semantic language model. (You could also use Sphinx to generate word hypothesis lattices, and then re-score those.)
If you do want to jump right into the Sphinx4 decoder... I'm not sure how much I can help; most of my experience is with Sphinx3. But if you already have a language model, then from this page: http://cmusphinx.sourceforge.net/wiki/semanticlanguagemodel
you can skip parts 1-4. You can also skip "6- What codes are needed to change? * Trainer a)" since you already have a language model (that I assume can produce its own scores).
The key parts for you, I think, would be these two:
5) How to combine different models together?
Only by combining the semantic language model with a traditional model like an N-gram can we get better performance. In [9] or LSA [3], the semantic model itself is part of the language model, so there is no need to worry about how to combine them. However, in [2], we need to combine them together. Here, the EM algorithm can be used.
6) What codes are needed to change?
....
* Recognizer (take Sphinx4 for example): in particular, we can implement a LanguageModel class, just like LargeTrigramModel or SimpleNGramModel. The most important part is the function getProbability(WordSequence wordSequence), based on the semantic information introduced above. Here, the wordSequence does not need to be adjacent words, but can be any history words, concept or topic words.
In addition, since the lattice structure might be different, we also need to write a Linguist to manage the lattice (constructing it, scoring it, and so on).
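A rough sketch of the combination the quoted page describes, done as linear interpolation; the nested SemanticModel interface is a hypothetical stand-in for whatever model produces the semantic scores, and this is not a complete Sphinx-4 LanguageModel implementation (the real interface has more methods than shown here):

import edu.cmu.sphinx.linguist.WordSequence;
import edu.cmu.sphinx.linguist.language.ngram.LanguageModel;
import edu.cmu.sphinx.util.LogMath;

public class HybridLanguageModel {

    // Hypothetical stand-in for the semantic model supplied elsewhere.
    public interface SemanticModel {
        double probabilityOf(WordSequence history); // plain probability in [0, 1]
    }

    private final LanguageModel ngram;    // e.g. LargeTrigramModel or SimpleNGramModel
    private final SemanticModel semantic;
    private final LogMath logMath;
    private final double lambda;          // n-gram weight; the wiki suggests estimating such weights with EM

    public HybridLanguageModel(LanguageModel ngram, SemanticModel semantic,
                               LogMath logMath, double lambda) {
        this.ngram = ngram;
        this.semantic = semantic;
        this.logMath = logMath;
        this.lambda = lambda;
    }

    // Linear interpolation of the two models: convert the n-gram's log-domain
    // score to linear, mix, and convert back to Sphinx-4's log domain.
    public float getProbability(WordSequence wordSequence) {
        double pNgram = logMath.logToLinear(ngram.getProbability(wordSequence));
        double pSemantic = semantic.probabilityOf(wordSequence);
        double mixed = lambda * pNgram + (1.0 - lambda) * pSemantic;
        return logMath.linearToLog(mixed);
    }
}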
I discussed some of these issues in my thesis proposal; it might be helpful to take a look at that (especially chapter 6) and some of the citations: http://www.cs.cmu.edu/~belamber/Papers/Lambert_ThesisProposal.pdf
That's from a few years ago; unfortunately, I don't have anything more recent. I'm sure someone else has written this up more coherently, possibly Roni Rosenfeld, or one of the several speech recognition textbooks.
Hopefully this helps a bit.
Best,
Ben
--
Benjamin Lambert
Ph.D. Student of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~belamber
Mobile: 617-869-1844
Hi Ben,
Every word in this thread helps me. Thank you. When I have results, I will post them here.
Kind regards,
George
Can someone help me with AlternateHypothesisManager? What I want is to get a list of predecessor tokens for a token. I looked over the API and found this class, but when I try to get the list of lower-scored tokens from a Result object, I always get a NULL value. What am I missing here? How should this class be used? Isn't it populated dynamically inside the Result class? The searchManager used in the config file is "WordPruningBreadthFirstSearchManager". Any tips on how to obtain the list of predecessor tokens (with a lower score than the one returned by the getPredecessor method) for a token? Should the token be in a specific search state in order to get the full/partial list of predecessor tokens?
Regards,
George
Alternatives are only available for word tokens, and only if buildWordLattice is enabled in the configuration of the search manager.
See the lattice demo and the way a Lattice is constructed from a result.
To the remaining questions, in order: no; yes; see the Lattice class; yes.
Hi Nickolay,
I looked over the Lattice class, and what I need can be found in the allPaths() method and other methods of the Lattice class (nodes, edges). I see the Lattice class as an external data structure, a graph that is used just for representation and debugging (it isn't used for searching; the token tree is used for that instead). I read that in Sphinx 4 "a token tree is used to manage the active paths during the search", which can't be found in or applied to the lattice structure. My goal is to extract the search graph (the best would be the lattice, or a structure that offers what the Lattice class offers; maybe I need to do some work for this?). The graph shouldn't be pruned (or at least pruning should be configurable, with a configurable "neighborhood" size for each node; I think this is the maxLatticeEdges property of WordPruningBreadthFirstSearchManager). I need to manipulate the language model score (weighting) of each edge and observe the results.
Re-scoring the lattice is what I want, but how? So far I haven't seen a way of using the lattice as the search space, and I don't know how I can use the lattice if I re-score all the edges, since the token tree is used for the search space... How can I achieve this?
All I have done so far is read all the papers about this framework, run the demo examples, and look inside the framework code... Can you please give me a few tips? Where should I look in order to achieve what I want?
Kind regards,
George
I'm not sure you understand all the concepts, and I'm also not sure you carefully read what was written to you earlier in this thread. In the simple case you want rescoring of the n-best list, not the lattice, so the sequence of steps would be:
1. Create the n-best list result with scores.
2. Rescore it using the semantic language model.
3. Select the best rescored result according to the new score.
The next step would be the lattice:
1. Create the lattice with scores.
2. Change the language model scores in the lattice.
3. Retrieve a new best path in the lattice with the getViterbiPath() method.
To become familiar with n-best lists and lattices, read a textbook on speech recognition.
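A minimal sketch of that n-best rescoring loop; the Hypothesis holder and the SemanticModel interface are assumptions, and how you obtain the entries and their scores depends on your setup:

import java.util.List;

public class NBestRescorer {

    // Minimal holder for one n-best entry; fill these from your decoding
    // results (this class itself is not part of Sphinx-4).
    public static class Hypothesis {
        public final String words;
        public final double acousticScore; // log-domain, from the decoder
        public final double lmScore;       // log-domain, baseline LM
        public double rescored;

        public Hypothesis(String words, double acousticScore, double lmScore) {
            this.words = words;
            this.acousticScore = acousticScore;
            this.lmScore = lmScore;
        }
    }

    // Hypothetical hook for the semantic language model's whole-sentence score.
    public interface SemanticModel {
        double logScore(String words);
    }

    // Steps 2 and 3 above: rescore every hypothesis with the semantic model,
    // then pick the best one under the new combined score.
    public static Hypothesis rescore(List<Hypothesis> nbest, SemanticModel sem,
                                     double semanticWeight) {
        Hypothesis best = null;
        for (Hypothesis h : nbest) {
            h.rescored = h.acousticScore + h.lmScore
                       + semanticWeight * sem.logScore(h.words);
            if (best == null || h.rescored > best.rescored) {
                best = h;
            }
        }
        return best;
    }
    // The lattice variant follows the same pattern: change the language model
    // score on each lattice edge, then ask the lattice for getViterbiPath().
}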
Hi Nickolay,
I managed to build the n-best list, but I encountered a small problem.
After building the n-best list from the lattice, I found that all of the scores have the following form (I used LogMath from the Sphinx 4 lib to generate the linear values):
* 0.00000000000000000000011044066960
* 0.00000000000000000000012975
* 0.0000000000000000000000942
Why are the values not real numbers between 0 and 1?
The Java code I use to generate the linear values:
LogMath log = lattice.getLogMath(); // the lattice from which I build the n-best list
double score = log.logToLinear((float) edge.getLMScore());
// I used BigDecimal just to display it nicely
BigDecimal bd = new BigDecimal(score);
System.out.println(bd.toPlainString());
// My idea was to iterate through the n-best list and weight the nodes of each entry.
// I wanted to truncate the score value to 3-5 decimals using BigDecimal, but I found
// that all scores have the first 14-20 decimals equal to zero.
Some output:
[edge: Edge(Node(,0|0)-->Node(one,0|21)[-2199378.75,-503905.375]), score: -503905.375]
linear value: 0.000000000000000000000129756660450880268359917002168448002541250374405878042931325224651484262494705035351216793060302734375
The LogMath configuration:
logBase=1.0001
useAddTable=true
I used the LatticeDemo where the input is a stream (the transcriber demo, 10001-90210-01803.wav file), with minor changes inside the configuration file (e.g. tuning parameters based on a comment you made on a forum) plus a new n-gram model.
Could you please show me the right way? I don't know where to look for an explanation of this.
Regards,
George
They are real numbers between zero and 1, actually; probably you were wondering why there are so many zeros. You need to understand what probability space you are working in. The probability of an observation given a word sequence can actually be pretty small, since the observation space is large and there are many sequences.
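A quick arithmetic check shows the magnitude is expected given the logBase of 1.0001 shown above:

public class ScoreMagnitudeCheck {
    public static void main(String[] args) {
        double logBase = 1.0001;      // from the LogMath configuration above
        double lmScore = -503905.375; // the LM score printed on the lattice edge
        // Sphinx-4 log scores are exponents of logBase, so the linear value is
        // logBase^score = e^(score * ln(logBase)) ~ e^(-50.4), roughly 1.3e-22:
        // 21 zeros after the decimal point, just like the output above.
        System.out.println(Math.pow(logBase, lmScore));
    }
}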
If you are looking for confidence scores, you probably want to look at the confidence demo instead.
Hi Nickolay,
Thank you for the info. I will have a look at the confidence scores.
How are the edges' language model scores computed (n-gram probabilities?)?
Regards,
George

By asking the language model for the probability of the sequence.