Hi,
I'm using PocketSphinx 0.6 on Windows and have a couple of questions.
What is needed to try a "continuous HMM" based acoustic model? Are there such models available for US English that can be used with PocketSphinx?
What changes are required to try the "floating point" implementation?
What is the difference between the hub4.5000 and wsj0vp.5000 language models?
Is there any reference that briefly describes the meanings of the various command-line options and configuration parameters for PocketSphinx?
Is there any reference that briefly describes the formats of the various databases used in PocketSphinx (acoustic models, language models, etc.)?
For running the decoder, the sample rate settings have to be in sync with the model (correct?). How do I find the sample rates for the various acoustic models provided with the 0.6 and 0.5 versions?
Thanks and regards,
What is needed for trying a "continuous HMM" based acoustic model? Are there such models available for US English?
Just point the -hmm option at a continuous model. US English models are available at least at
http://www.speech.cs.cmu.edu/sphinx/models/
and
http://www.repository.voxforge1.org/downloads/Main/Trunk/AcousticModels/Sphinx/
What changes are required for trying the "floating point" implementation?
You mean fixed point? On Linux it's just the --enable-fixed configure flag. On Windows you need to make sure ENABLE_FIXED is defined in sphinx_config.h.
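For illustration, a fixed-point build from source on Linux could look roughly like the sketch below. The directory layout is the standard sphinxbase/pocketsphinx autotools tree and the install prefix is just an assumption, so adjust to your setup.

    # build sphinxbase with fixed-point arithmetic enabled
    cd sphinxbase
    ./configure --enable-fixed --prefix=/usr/local
    make && make install
    # then build pocketsphinx against that sphinxbase
    cd ../pocketsphinx
    ./configure --prefix=/usr/local
    make && make install

On Windows the equivalent check is simply that the sphinx_config.h used by your project defines ENABLE_FIXED before you build.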
What is the difference between hub4.5000 and wsj0vp.5000 language models?
One is trained on HUB4 (broadcast news task) texts, the other on WSJ (read Wall Street Journal task) texts.
Any reference which briefly describes meanings of various command line
options and configuration parameters for PocketSphinx ?
Run pocketsphinx_batch without arguments or look into the sources.
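Running pocketsphinx_batch with no arguments prints the full list of options together with their default values and short descriptions, which is the closest thing to a reference. For orientation, a typical batch run might look like the sketch below; the file and directory names are hypothetical, only the option names are real:

    pocketsphinx_batch \
        -hmm model/hmm/hub4wsj_sc_8k \
        -lm model/lm/hub4.5000.DMP \
        -dict model/lm/cmu07a.dic \
        -ctl test.fileids \
        -adcin yes -cepdir wav -cepext .wav \
        -hyp test.hyp

Here test.fileids lists the utterance files to decode (one per line, without extension) and test.hyp receives the recognition hypotheses.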
Any reference which briefly describes formats of various databases used in PocketSphinx (acoustic models, language models, etc.)?
An acoustic model should have the following files: feat.params, mdef, means, variances, transition_matrices, mixture_weights. Instead of mixture_weights there could be a sendump file. There could be other files like feature_transform, kdtrees, noisedict. Their names are self-descriptive, I think. Details of the formats can be found in the sources.
The language model can be in ARPA format or in compressed DMP format.
The dictionary format is straightforward.
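As a rough illustration of the dictionary format (the entries below are made up, not taken from any shipped dictionary): each line is a word followed by its phones, with alternate pronunciations numbered in parentheses:

    hello HH AH L OW
    hello(2) HH EH L OW
    read R IY D
    read(2) R EH D

The ARPA language model format is the usual plain-text n-gram format with \data\ and \N-grams: sections; the DMP file is a binary version of the same model that the decoder can load faster.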
For running the decoder, the sample rate settings have to be in sync with the model (correct?)
Yes
How to find the sample rates for the various acoustic models provided with the 0.6 and 0.5 versions?
Usually it's mentioned in the model description on the website. Most models are 16kHz or 8kHz.
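As a concrete example (using a model name that comes up later in this thread), hub4wsj_sc_8k is an 8 kHz model, so the audio fed to the decoder should be recorded at 8 kHz and the decoder run with a matching rate, e.g.

    pocketsphinx_batch -hmm model/hmm/hub4wsj_sc_8k -samprate 8000 ...

A 16 kHz model would instead be used with 16 kHz recordings and -samprate 16000.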
Thanks nshmyrev. A couple more basic ones:
Does PocketSphinx allow changing the dimension of the MFCC feature vector? And does this too have to be in sync with the acoustic model?
Is there a vendor that does or can supply pocketsphinx-compatible acoustic models (or recorded training corpora) for various English accents and possibly other languages?
This one is very basic: does the acoustic model have to be tuned for the specific task at hand, or can it be fairly generic and still provide fairly accurate results?
Thanks and regards,
Does PocketSphinx allow changing the dimension of the MFCC feature vector?
Yes, there are -ceplen and -ncep options as well as -feat for various feature types.
And does this too have to be in sync with the acoustic model?
Yes
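For instance, if you did override the feature settings on the command line, they would have to be exactly what the model was trained with; the values below are only the common defaults for continuous models and are shown just to name the options, not as a recommendation:

    pocketsphinx_batch -hmm <your_model_dir> -feat 1s_c_d_dd -ceplen 13 ...

In practice you rarely set these by hand, since the models ship a feat.params file describing how their features were extracted.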
Is there a vendor that does or can supply pocketsphinx-compatible acoustic models (or recorded training corpora) for various English accents and possibly other languages?
Most corpora are acquired from the LDC http://www.ldc.upenn.edu/ or its European counterpart http://www.elra.info/ and they are quite expensive. There are some non-commercial corpora distributed by various organizations. Voxforge provides GPL corpora for a number of languages.
Models are specific to the particular task; it's unlikely any company has a generic one. Maybe only Google has one. Also, commercial companies tune their recognition process, and their models become incompatible with stock pocketsphinx or are not easy to plug in.
This one is very basic: does the acoustic model have to be tuned for the specific task at hand, or can it be fairly generic and still provide fairly accurate results?
Yes, tuning is usually applied both to the model and to the way the recognizer is configured. Many components are specific to the particular task. For example, in far-field microphone recognition it's critical to have a reverberation removal component, and that makes the model hardly compatible with telephone models.
Is there a vendor that does or can supply pocketsphinx-compatible acoustic models
I mean, you are welcome to ask on our LinkedIn group about that, http://www.linkedin.com/groups?gid=2754506 , but be ready to negotiate for the whole recognizer instead of just a model.
Q: Does PocketSphinx allow changing the dimension of the MFCC feature vector?
Yes, there are -ceplen and -ncep options as well as -feat for various feature types
Q: And does this too have to be in sync with the acoustic model?
Yes
For running PocketSphinx 0.6 with the models provided with it, do I need to specify the "acoustic vector", "sampling rate" and other model-specific parameters through the command line, or does the decoder automatically extract them from the model description? I'm currently using only -hmm, -lm, -dict and -samprate.
Q: This one is very basic: Does the acoustic model have to be tuned for the specific task at hand or can it be fairly generic and still provide fairly accurate results?
Yes, tuning is usually applied both to the model and to the way the recognizer is configured. Many components are specific to the particular task. For example, in far-field microphone recognition it's critical to have a reverberation removal component and that makes the model hardly compatible with telephone models.
My question was more on the vocabulary aspect of the application, e.g. if I have a 100-word vocabulary recognition task for, say, a "command and control" type of application (with an FSG grammar), from an accuracy point of view, should I still go with a model like hub4wsj_sc_8k (or something similar trained on a bigger corpus), or do I need to create my own model, or is there a way to customize a bigger model for a smaller vocabulary task?
Thanks for your prompt responses.
Regards.
For running PocketSphinx 0.6 with the models provided with it, do I need to specify the "acoustic vector", "sampling rate" and other model-specific parameters through the command line, or does the decoder automatically extract them from the model description? I'm currently using only -hmm, -lm, -dict and -samprate.
The others are synchronized automatically, since the default values are used.
My question was more on the vocabulary aspect of the application, e.g. if I have a 100-word vocabulary recognition task for, say, a "command and control" type of application (with an FSG grammar), from an accuracy point of view, should I still go with a model like hub4wsj_sc_8k (or something similar trained on a bigger corpus), or do I need to create my own model, or is there a way to customize a bigger model for a smaller vocabulary task?
That's a rhetorical question without numbers; such a decision requires a detailed analysis of accuracy, performance and available resources. We usually don't recommend training your own model, simply because it's an error-prone process that could take several months. The default models are reasonably good.
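To make the command-and-control setup above concrete, a small task like that is usually expressed as a JSGF grammar and decoded with one of the stock acoustic models. The grammar and file names below are made up for illustration, and the -jsgf option assumes a build with JSGF support (otherwise the grammar can be converted to an FSG first and passed with -fsg):

    #JSGF V1.0;
    grammar commands;
    public <command> = <action> the <object>;
    <action> = turn on | turn off | open | close;
    <object> = light | fan | door | window;

with a decoder run along the lines of

    pocketsphinx_continuous -hmm model/hmm/hub4wsj_sc_8k -dict commands.dic -jsgf commands.gram -samprate 8000

where commands.dic contains pronunciations for just the words used in the grammar.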
Thanks a ton.