Hi Nickolay,
Here's a simple one. I've gotten requests to support the following
accents/languages with OpenEars:
UK English
French
Spanish
German
I've just spent a half hour Googling and searching this site but I'm missing
it somehow and can't quite find anything but the full-sized Sphinx models that
are too big to be shipped with a device. Are there Pocketsphinx-compatible
language models for these accents and languages, similar to the hub4wsj_sc_8k
hmm that ships with the 0.6.1 package, and if so where? I'll keep an eye on
whether they are licensed in a way that would make them a match for iPhone App
development.
Thanks very much,
Halle
Hello Halle
There is a very easy way to build a device-compatible LM for the iPhone for a
few dozen languages: build the language model from Wikipedia articles. You can
download a Wikipedia dump, convert it to text with
http://medialab.di.unipi.it/wiki/Wikipedia_Extractor
and build an LM of 4000 words!
One should definitely automate this thing!
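As a rough sketch of how the vocabulary step could be automated: the Python below counts word frequencies in the extracted text and keeps the most common 4000. The function name top_words is hypothetical, and skipping lines that start with <doc or </doc assumes the extractor wraps each article in such tags. Building the LM itself from this word list is left to whichever LM toolkit you use.

```python
import re
from collections import Counter


def top_words(lines, limit=4000):
    """Return the `limit` most frequent words from an iterable of text lines.

    Hypothetical helper: assumes plain text as produced by the extractor,
    with each article wrapped in <doc ...> ... </doc> lines.
    """
    counts = Counter()
    for line in lines:
        # Skip the article wrapper lines (assumed format, see lead-in).
        if line.startswith("<doc") or line.startswith("</doc"):
            continue
        # Match runs of letters only; the pattern is Unicode-aware,
        # so accented characters in French, Spanish or German are kept.
        counts.update(re.findall(r"[^\W\d_]+", line.lower()))
    return [word for word, _ in counts.most_common(limit)]
```

To use it, pass an open text file (opened with encoding="utf-8") as lines and write the returned words out one per line.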
Wow! I will definitely be checking that out shortly. But I think the users
already have lm/dic data and want me to add new HMMs. Are there sources
for those?
At least for German and Spanish, the Voxforge data is good enough.
Okeydoke, and my last probably-obvious question: do I have to do anything to
the Voxforge data in order to make it work with Pocketsphinx or is it a
drop-in?
You can download the data and train semi-continuous models for a mobile device,
or use the pretrained models, though I'm not sure they will work for you:
https://sourceforge.net/projects/cmusphinx/files/Acoustic and Language Models/
Yeah, I started out by checking out
https://sourceforge.net/projects/cmusphinx/files/Acoustic and Language Models/
but I think the models there are too large in filesize to ship with a mobile
app, unfortunately.
So, in order to get started with the process of downloading data and training
a semi-continuous model for a mobile device, where would I look to begin the
learning process?
The English Voxforge model includes a script, build.sh, that automates the
setup process. Look inside the model. You can use the same script for other
languages.
Thanks very much, I've passed that on. I don't suppose there is anything as
easy for them to use to create small command and control grammars for other
languages as the CMU Language Tool, is there?
Hello, I'm not sure what issue you have.
The easiest way is to write a JSGF grammar. You can reuse the dictionary
provided with the model.
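To illustrate, here is a minimal sketch in Python that writes a small command-and-control grammar in JSGF and checks which words are missing from a dictionary. The function names, the rule name <command> and the sample phrases are all hypothetical. The dictionary check assumes the usual layout of one word per line followed by its phones, with alternate pronunciations marked as word(2).

```python
def build_jsgf(name, phrases):
    """Return JSGF text with one public rule listing the phrases as alternatives."""
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n\n"
        "grammar {0};\n\n"
        "public <command> = {1};\n"
    ).format(name, alternatives)


def missing_words(phrases, dictionary_lines):
    """Return the grammar words that have no entry in the dictionary."""
    known = set()
    for line in dictionary_lines:
        parts = line.split()
        if parts:
            # Alternate pronunciations look like "word(2)"; strip the suffix.
            known.add(parts[0].split("(")[0].lower())
    needed = {word.lower() for phrase in phrases for word in phrase.split()}
    return sorted(needed - known)
```

For example, build_jsgf("commands", ["avanza", "gira a la izquierda"]) gives a grammar with two alternatives, and missing_words tells you which of those words still need pronunciations before the recognizer can use the grammar.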
That answered my question, thanks.
Hello Mr. Nickolay,
I want to know how I can use Pocketsphinx to recognize Indian English. I have
used Voxforge to upload a few wav files in Indian English, but I don't know how
to use the processed Voxforge data in Pocketsphinx.
Can you please tell me how to achieve Indian English recognition using
Pocketsphinx?
Thanks,
Swathi
Hi Swathi,
For that, you have to train a CMUSphinx acoustic model for Indian English. For further answers, please have a look at this thread:
https://sourceforge.net/p/cmusphinx/discussion/sphinx4/thread/b59b7962/