Hello,
I have a fairly simple question that I cannot find the answer to. Does your
Acoustic model HAVE to contain the exact spoken utterances that your language
model dictionary defines? In other words, if I need the word:
commodity
in my language model (so that it can be recognized), does the associated
acoustic model need to be recorded with that word somewhere in it, OR will
the speech recognition system recognize 'commodity' based on its similarity
to other words that are in the acoustic model?
Other questions I have are:
- What matters most in terms of speech recognition speed -- acoustic model size, language model size, or both? In other words, if I want the fastest recognition time possible, should my language model just contain the words I need, or should I use a much larger pre-defined (open source) language model? Same goes for the acoustic model; would I get faster recognition with an acoustic model that only had the words I need?
- One can use lmtool to generate a smaller language model dict/grammar; however, what tools would I use to develop my own acoustic models (based on the limited vocabulary I need)? Does it make sense to create my own acoustic model and train it myself in order to get higher accuracy for the limited vocabulary I need (i.e. commodity trading terms)?
Thank You,
Eric
Does your Acoustic model HAVE to contain the exact spoken utterances that
your language model dictionary defines?
The acoustic model describes phone pronunciations; it is unrelated to words.
It's incorrect to say that an "acoustic model contains spoken utterances".
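For example, it is the dictionary, not the acoustic model, that maps a word to phones; the acoustic model only has to cover those phones. A CMU-style dictionary entry for 'commodity' looks roughly like this (ARPAbet phones; take the exact pronunciation as illustrative):

    COMMODITY    K AH M AA D AH T IY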
does the associated acoustic model need to be recorded with that word
somewhere in it
No. The dictionary tells the decoder which phones make up 'commodity', and the acoustic model only has to model those phones.
What matters most in terms of speech recognition speed -- acoustic model
size, language model size, or both?
Both. A bigger acoustic model means more Gaussians to evaluate per frame, and a bigger language model means a larger search space.
If I want the fastest recognition time possible, should my language model
just contain the words I need, or should I use a much larger pre-defined
(open source) language model?
If the language model is small, recognition is faster.
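In the extreme case, if the task is a small fixed set of phrases, a grammar can replace the statistical language model entirely. A hypothetical JSGF grammar for a few commodity-trading commands might look like this (all rule names and words below are made up for illustration):

    #JSGF V1.0;
    grammar trading;

    public <command> = <action> <commodity>;
    <action>    = buy | sell | quote;
    <commodity> = gold | crude oil | wheat | coffee;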
Same goes for the acoustic model; would I get faster recognition with an
acoustic model that only had the words I need?
The acoustic model doesn't contain any words; see above.
what tools would I use to develop my own acoustic models (based on the
limited vocabulary I need)?
You don't need to develop your own acoustic models, but if you still want to,
you can use sphinxtrain, distributed here.
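If you do go down that road, the workflow with sphinxtrain (following the CMU Sphinx training tutorial; the task name below is a placeholder) is roughly:

    # Set up a new training task (directory layout, config files).
    sphinxtrain -t mytask setup

    # After placing audio, transcripts, dictionary and fileids under
    # etc/ and wav/, run the full training pipeline.
    sphinxtrain run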
does it make sense to create my own acoustic model and train it myself in
order to get higher accuracy for the limited vocabulary I need (i.e. commodity
trading terms)?
It sometimes makes sense, but you would do better to avoid it.
P.S. Avoid crossposting on various forums - VoxForge, here, and elsewhere.
Hello,
When training an acoustic model, it is suggested that we train it on a large
number of utterances from different environments, different people, etc. It
makes sense that this introduces variability (the model adapts).
Suppose I took 40 words spoken in 80 utterance files: 40 utterances by a man
and 40 by a child, so that each word is spoken twice, once by the man and
once by the child. Their phonetic dictionary uses the 40-phone US English
ARPAbet, the same one used by CMU Sphinx, for the pronunciations.
My questions are:
- Does the acoustic model train words by breaking them down phonetically?
- What are the properties of a good acoustic model?
- Do I need to train the acoustic model for each phoneme separately, like AA, AE, EH, IH, ...?
- While doing the above experiment, are the MFCCs calculated by taking one word utterance (file) at a time, separately? And what about the means, variances, mixture_weights, and transition_matrices?
- What do the transition matrices contain?
- Are they a kind of signature kept for identifying a word or phoneme?
- Can the HUB4 acoustic model and language model available for download at CMU SourceForge be used for general-purpose speech recognition? For example, can they recognize my utterances, recorded in a standard home environment on my laptop, in batch mode?
Does the acoustic model train words by breaking them down phonetically?
No. Words are never modeled as units; the dictionary breaks them into phones, and it is the phones that are trained.
What are the properties of a good acoustic model?
It recognizes speech with good accuracy.
Do I need to train the acoustic model for each phoneme separately, like AA,
AE, EH, IH, ...?
No
While doing the above experiment, are the MFCCs calculated by taking one
word utterance (file) at a time, separately? And what about the means,
variances, mixture_weights, and transition_matrices?
What about them? MFCCs are computed separately for each utterance file; the means, variances, mixture weights, and transition matrices are the parameters of the phone HMMs, estimated from all the training data together.
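To illustrate just the per-file feature extraction step, here is a minimal sketch; it assumes librosa as a stand-in front end (CMU Sphinx has its own, sphinx_fe) and a placeholder file name:

    # Minimal sketch: MFCCs are computed one utterance file at a time.
    # librosa is an assumed stand-in; "utt_01.wav" is a placeholder.
    import librosa

    y, sr = librosa.load("utt_01.wav", sr=16000)        # one utterance
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # (13, n_frames)
    print(mfcc.shape)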
What do the transition matrices contain?
Transition matrices hold transition probabilities between HMM states
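For example, a 3-state left-to-right phone HMM could have a transition matrix like the one below; the numbers are made up purely for illustration:

    import numpy as np

    # Hypothetical 3-state left-to-right (Bakis) phone HMM.
    # A[i, j] is the probability of going from state i to state j;
    # each row sums to 1, and the zeros forbid jumping backwards.
    A = np.array([
        [0.6, 0.4, 0.0],  # state 0: stay, or advance to state 1
        [0.0, 0.7, 0.3],  # state 1: stay, or advance to state 2
        [0.0, 0.0, 1.0],  # state 2: final state
    ])
    assert np.allclose(A.sum(axis=1), 1.0)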
Are they a kind of signature kept for identifying a word or phoneme?
No
Can the HUB4 acoustic model and language model available for download at
CMU SourceForge be used for general-purpose speech recognition? For example,
can they recognize my utterances, recorded in a standard home environment
on my laptop, in batch mode?
Yes
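For batch mode the usual tool is pocketsphinx_batch; an invocation along the following lines should work (all file and directory names here are placeholders you would replace with your own):

    # All paths/names below are placeholders.
    # -adcin yes : input is audio files rather than precomputed cepstra
    # -ctl       : list of utterance ids to decode
    # -hyp       : file where recognition hypotheses are written
    pocketsphinx_batch -adcin yes -cepdir wav -cepext .wav \
        -ctl test.fileids -hmm hub4_model_dir \
        -lm hub4.lm -dict hub4.dic -hyp test.hyp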