I am quite new to the Sphinx4 system, so please bear with me.
I am trying to understand how one can change the language model dynamically
during the course of a dialogue. By language model I mean the model generated
by the Sphinx system from the grammar, dictionary, etc.
Please help me understand how exactly Sphinx takes the grammar, dictionary,
word list, etc. into account to create the model/structure/tree used for
processing dialogue.
Please point me to relevant references or descriptions so that I can move
further.
If you keep the vocabulary the same, you can change language model
probabilities even during decoding. The rest of the system has no dependency
on the language model except for what is returned by the getProb call on the
word sequence.
You can learn about the sphinx4 architecture from the whitepaper.
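To illustrate the point about getProb, here is a minimal sketch (not Sphinx4 code) of a model whose probabilities can be replaced while the vocabulary stays fixed. The class SwappableModel, its swap method, and the exact getProb signature are hypothetical names chosen for this example; the real Sphinx4 language model interface may look different.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a language model; this is NOT the Sphinx4 API.
final class SwappableModel {
    private final Set<String> vocabulary;
    private final double floor;
    // volatile so a decoding thread sees a table swapped in by another thread
    private volatile Map<List<String>, Double> logProbs;

    SwappableModel(Set<String> vocabulary,
                   Map<List<String>, Double> logProbs,
                   double floor) {
        this.vocabulary = Set.copyOf(vocabulary);
        this.floor = floor;
        swap(logProbs);
    }

    // The only call the search makes: log probability of a word sequence.
    // Sequences missing from the table get the floor value.
    double getProb(List<String> words) {
        return logProbs.getOrDefault(words, floor);
    }

    // Replace the probabilities. Tables that mention a word outside the
    // fixed vocabulary are rejected, so the vocabulary never changes.
    void swap(Map<List<String>, Double> newLogProbs) {
        for (List<String> seq : newLogProbs.keySet()) {
            if (!vocabulary.containsAll(seq)) {
                throw new IllegalArgumentException(
                        "out-of-vocabulary word in " + seq);
            }
        }
        logProbs = Map.copyOf(newLogProbs);
    }
}
```

Because callers only ever see getProb, swapping the table between two calls changes the scores without touching anything else in this sketch.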
Thanks for your response, nshmyrev.
I read the reference you provided.
I understand that the components provided for decoding will be:
The decoder source code
The language dictionary
The filler dictionary
The language model
The test data
Now, for decoding, sphinx4 forms a trellis, which is nothing but a product of
the language HMM and time. This trellis is an acyclic graph (or a search
graph, as one might call it). What I am interested in is reducing the size of
this structure (the search graph), so as to increase the recognition
capability of the system. What I am doing now is passing the decoder an
updated language model (read: grammar file). I would like to verify whether
passing a smaller grammar file (a pruned grammar file that suffices for the
sample to be decoded) will help reduce the size of the search graph.
Please make suggestions, or respond if I am not making myself clear.
I am looking forward to this discussion, and I will be quicker to respond
now!
I was reading about the Sphinx 4 decoder model. Under the topic "GRAPH
CONSTRUCTION MODULE" I read:
The word graph can be converted to a language HMM either dynamically or
statically. In dynamic construction, word HMMs are constructed on demand -
when the search reaches the terminal state for a word, the HMMs for words that
can follow it are constructed if they have not already been instantiated.
During construction, appropriate context dependent sub-word units are used at
the word boundaries. In static construction, the entire language HMM is
constructed statically. HMMs are constructed for all words in the vocabulary.
Each word HMM is composed with several word-beginning and word-ending context
dependent phones, each corresponding to a possible crossword context. Each
word is connected to every other word by linking appropriate context dependent
crossword units.
What changes do I need to make in my config.xml file to get dynamic
construction of the search graph?
If we are specifically talking about finite grammars (grammar files), there
are two linguists: FlatLinguist and DynamicFlatLinguist. In config.xml you can
choose between them by configuring the class of the component "linguist".
FlatLinguist is static: it constructs the graph statically, while
DynamicFlatLinguist constructs the graph dynamically. If the graphs aren't too
large, you can dump them using the edu.cmu.sphinx.linguist.util.GDLDumper
class. After that you can view them in aisee3. Or you can dump to a dot file
(it is easy to write this class yourself) and visualize it in graphviz.
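As a rough idea of what such a dot dumper could look like, here is a minimal sketch. It does not use any Sphinx4 class: the graph is assumed to have already been collected into a plain map from a state name to the names of its successors, and DotDumper and toDot are hypothetical names. Walking the linguist's actual search graph to fill that map is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; it works on a plain adjacency map, not on Sphinx4 types.
final class DotDumper {

    // Render the adjacency map as Graphviz DOT text.
    static String toDot(String graphName, Map<String, List<String>> successors) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph ").append(graphName).append(" {\n");
        // TreeMap gives a stable, sorted output order.
        for (Map.Entry<String, List<String>> e
                : new TreeMap<>(successors).entrySet()) {
            String from = quote(e.getKey());
            // Declare the node so states without outgoing arcs still show up.
            sb.append("  ").append(from).append(";\n");
            for (String to : e.getValue()) {
                sb.append("  ").append(from)
                  .append(" -> ").append(quote(to)).append(";\n");
            }
        }
        sb.append("}\n");
        return sb.toString();
    }

    // DOT identifiers are quoted; backslashes and quotes are escaped.
    private static String quote(String id) {
        return '"' + id.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }
}
```

The returned text can be written to a file such as graph.dot and rendered with Graphviz, for example with `dot -Tpng graph.dot -o graph.png`.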
If we are specifically talking about finite grammars (grammar files), there
are two linguists: FlatLinguist and DynamicFlatLinguist. In config.xml you can
choose between them by configuring the class of the component "linguist".
Finite grammar... meaning the size of the grammar is finite? Yes, that is the
case for me. I have grammar files that contain either digits or words.
FlatLinguist is static: it constructs the graph statically, while
DynamicFlatLinguist constructs the graph dynamically. If the graphs aren't too
large, you can dump them using the edu.cmu.sphinx.linguist.util.GDLDumper
class. After that you can view them in aisee3. Or you can dump to a dot file
(it is easy to write this class yourself) and visualize it in graphviz.
Can you suggest some source/documentation covering this? That would be very
helpful.
Thanks for your quick response.
Regards,
Shredder
Thanks for your response, nshmyrev.
I tried using the Linguist Stats Dumper to check the total number of states in
the search space, but I am getting an error (a problem with config.xml). Can
you help me resolve the error, please?
Property Exception component:'recognizer' property:'monitors' - Not all
elements have required type interface edu.cmu.sphinx.instrumentation.Monitor
Found one of type class edu.cmu.sphinx.linguist.util.LinguistStats
edu.cmu.sphinx.util.props.InternalConfigurationException
Property Exception component:'recognizer' property:'monitors' - Not all
elements have required type interface edu.cmu.sphinx.instrumentation.Monitor
Found one of type class edu.cmu.sphinx.linguist.util.LinguistStats
edu.cmu.sphinx.util.props.InternalConfigurationException
Shall I attach (or post the contents of) my config.xml file here?
Hello nshmyrev,
Does Sphinx allow us to use multiple configuration files?
By multiple config files, I mean that my application has different fields to
be recognized. Can I have a different configuration for each of them by using
separate config files?
As far as I understand, the decoder reads the details (such as the .gram
file, dictionary, filler, etc.) from the config files at run time, so this
should not really be a problem!
You can use multiple configuration files managed by multiple config managers,
or you can use a single file with multiple recognizers configured and switch
between them. Either way works.
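A minimal sketch of the first option, one configuration per field, is below. It is deliberately independent of Sphinx4: RecognizerPool is a hypothetical helper, R stands for whatever recognizer object your loader builds, and the loader function (which in a real application would presumably create a config manager for the given file and look up the recognizer) is supplied by the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: maps an input field to a recognizer built from its
// own config file. R is whatever recognizer type the loader returns.
final class RecognizerPool<R> {
    private final Map<String, String> configByField = new HashMap<>();
    private final Map<String, R> loadedByConfig = new HashMap<>();
    private final Function<String, R> loader;

    RecognizerPool(Function<String, R> loader) {
        this.loader = loader;
    }

    // Associate a field (e.g. "digits") with a config file path.
    void register(String field, String configPath) {
        configByField.put(field, configPath);
    }

    // Build the recognizer for a field on first use, then reuse it.
    R forField(String field) {
        String path = configByField.get(field);
        if (path == null) {
            throw new IllegalArgumentException(
                    "no config registered for field " + field);
        }
        return loadedByConfig.computeIfAbsent(path, loader);
    }
}
```

Loading lazily and caching by config path means two fields that share one config file also share one recognizer instance.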
I am looking to implement some techniques that can help improve the
recognition accuracy for the Hindi language on the sphinx4 platform.
Is there something that I can do in Sphinx itself, like changing/updating some
parts of the Sphinx code, to improve the recognition capability for this
language?
I am looking forward to suggestions. I hope that the work can be useful for
the Sphinx community in general.
Specifically for Hindi, the main issue is to collect enough voice data.
The recognizer source code has no language specifics.
If you are looking for something to implement in Sphinx4, there is a feature
that is critically required by any practical system: returning a proper
confidence score in the grammar recognizer and rejecting OOV words reliably.
If you could implement this part, the recognizer would move to the next level.
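To make the idea of a confidence score with rejection concrete, here is a minimal sketch of one common approach: compare the score of the grammar path with the score of a background path (for example a phone loop) over the same audio, normalize by the number of frames, and reject below a threshold. This is an assumption for illustration only, not something Sphinx4 provides; Rejector, confidence, and accept are hypothetical names, and the threshold would have to be tuned on real data.

```java
// Hypothetical rejection rule; not part of Sphinx4.
final class Rejector {
    private final double threshold;

    Rejector(double threshold) {
        this.threshold = threshold;
    }

    // Per-frame log-likelihood ratio between the grammar path and a
    // background path scored over the same frames.
    static double confidence(double grammarLogScore,
                             double backgroundLogScore,
                             int frames) {
        if (frames <= 0) {
            throw new IllegalArgumentException("frames must be positive");
        }
        return (grammarLogScore - backgroundLogScore) / frames;
    }

    // Accept the hypothesis only if its confidence reaches the threshold;
    // otherwise treat the utterance as out-of-vocabulary.
    boolean accept(double grammarLogScore, double backgroundLogScore, int frames) {
        return confidence(grammarLogScore, backgroundLogScore, frames) >= threshold;
    }
}
```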
Hello nshmyrev,
I am looking to implement something that can help improve the recognition
accuracy for the Hindi language.
I have enough voice data collected for recognition of names and numbers. I am
still looking for ideas.
Can you point me to some applications built on Sphinx4, such as a voice
reservation system (for trains, buses, ...) or something of that sort? Do you
have any samples like that?
Regards,
Shredder
If you are looking for something to implement in Sphinx4, there is a feature
that is critically required by any practical system: returning a proper
confidence score in the grammar recognizer and rejecting OOV words reliably.
If you could implement this part, the recognizer would move to the next level.
Sorry if I sound ignorant, but what does sphinx do currently for OOV words?
Can you point me to some applications built on Sphinx4, such as a voice
reservation system (for trains, buses, ...) or something of that sort? Do you
have any samples like that?
Sphinx4 whitepaper: http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4Whitepaper.pdf
You can find the source of the linguists and the dumper, with the accompanying
javadoc, in the sphinx4 sources.
source.config.xml:
Figured out the problem!
Please ignore the above post.
http://wiki.speech.cs.cmu.edu/olympus/index.php/Olympus
http://cmusphinx.sourceforge.net/wiki/sphinx4:rejectionhandling