Hello,
I have a question about language support in Pocketsphinx.
For English, Pocketsphinx seems to use 39 phonemes. I want to get a
feel for the number of phonemes that Pocketsphinx would need to support
each of these other popular languages of the world:
Spanish
Italian
French
German
Russian
Arabic
Chinese
Japanese
Hindi
Thanks,
Li
Phonemes depend on the language and not on the recognizer. If your language
has 100 phonemes, you will have to use 100. The more phonemes you want to
train, the more training data will be required. If two phonemes sound similar,
you may decide to combine them (call them by the same name), and the number
comes down.
Read http://cmusphinx.sourceforge.net/wiki/tutorialconcepts
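To illustrate the point about combining similar phonemes, here is a minimal sketch in Python. It assumes a pronunciation lexicon held as a plain dict from word to phone list. The three toy entries and the AO -> AA merge are hypothetical examples, not a recommendation for any real model, and `merge_phones` and `phone_set` are illustrative helper names, not part of Pocketsphinx.

```python
from typing import Dict, List, Set


def merge_phones(lexicon: Dict[str, List[str]],
                 merge_map: Dict[str, str]) -> Dict[str, List[str]]:
    """Return a copy of the lexicon with each phone renamed via merge_map.

    Phones that are not keys of merge_map are left unchanged.
    """
    return {
        word: [merge_map.get(phone, phone) for phone in phones]
        for word, phones in lexicon.items()
    }


def phone_set(lexicon: Dict[str, List[str]]) -> Set[str]:
    """Collect the distinct phones used anywhere in the lexicon."""
    return {phone for phones in lexicon.values() for phone in phones}


# Toy lexicon (hypothetical entries, CMUdict-style phone names).
lexicon = {
    "cot": ["K", "AA", "T"],
    "caught": ["K", "AO", "T"],
    "cat": ["K", "AE", "T"],
}

# Hypothetical decision: treat AO as the same unit as AA.
merge_map = {"AO": "AA"}

merged = merge_phones(lexicon, merge_map)
print(len(phone_set(lexicon)), "->", len(phone_set(merged)))
```

Note that after the merge, "cot" and "caught" share one pronunciation, so the recognizer can only tell them apart through the language model. That is the trade-off of a smaller phone set.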
Thanks pranavj.
OK, let me rephrase the question slightly. Pocketsphinx has a model named
hub4wsj_sc_8k which gives reasonably good accuracy (English). If one were
to create acoustic models of similar accuracy for each of the languages
listed above, what would be a reasonable number of phonemes for each
language? I'm not looking for exact numbers, just rough estimates.
Any ideas?
thanks, Li
I don't know if anybody has done such a study, but Hindi has around 50
phonemes; see
http://books.google.co.in/books?id=XYgdrA1CIhgC&pg=PA261&lpg=PA261&dq=hindi+phone+set&source=bl&ots=gM1Crifd8-&sig=wX4NXivuF9NwBXP2A5SAFWUDrX8&hl=en&ei=wyIDTrGfNo60rAe5p-yRDg&sa=X&oi=book_result&ct=result&resnum=5&ved=0CEoQ6AEwBA#v=onepage&q=hindi%20phone%20set&f=false
Actually, why do you want phones in all these languages? Are you interested in
all of them?
Thanks for the link, pranavj. I needed phoneme information for the above
languages to explore a possible HW accelerator. This info will help in keeping
it generic enough. Thanks, Li
Choosing a proper phoneme set for any speech application (ASR / TTS) is
tricky. From a linguistic point of view, a phoneme is the smallest
contrastive sound unit within a particular language, and a linguistically
defined phoneme set is derived from perceptual and articulatory studies. But
from a speech application's point of view, the phoneme set need not be the
same as the one defined in any linguistics / phonetics book. What matters more
is how frequently each phone occurs, and whether there is any value in
modeling each phoneme distinctly.
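As a rough way to look at phone frequency, the sketch below counts how often each phone appears in a pronunciation dictionary. It assumes one entry per line in the form "word PH1 PH2 ...". The sample lines, the `count_phones` helper, and the threshold of 2 are hypothetical. Counting over dictionary entries is only a proxy: frequencies in running speech would have to come from the transcripts of the training data.

```python
from collections import Counter
from typing import Iterable


def count_phones(dict_lines: Iterable[str]) -> Counter:
    """Count phone occurrences in lines of the form 'word PH1 PH2 ...'."""
    counts: Counter = Counter()
    for line in dict_lines:
        parts = line.split()
        if len(parts) < 2:
            # Skip blank lines and entries without a pronunciation.
            continue
        counts.update(parts[1:])  # parts[0] is the word itself
    return counts


# Toy dictionary lines (hypothetical data for illustration only).
sample = [
    "hello HH AH L OW",
    "world W ER L D",
    "low L OW",
]

counts = count_phones(sample)

# Phones seen fewer than 2 times (arbitrary threshold) may have too little
# data to be worth modeling separately.
rare = sorted(phone for phone, n in counts.items() if n < 2)

for phone, n in counts.most_common():
    print(phone, n)
print("rare:", rare)
```

Phones that end up in the rare list are candidates either for merging with a similar phone or for collecting more training data.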
Another important point is that one need not necessarily use phonemes as the
units of the acoustic model. The units could be words or syllables (or even
something else) as well. Which unit to use depends on the application, the
amount of training data, the vocabulary size, etc.