Pronunciation generation, non english

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others



Forum: Speech Recognition Theory

Created: 2011-10-24

Updated: 2012-09-22

Li3 - 2011-10-24

Hello,

With my limited knowledge of the Sphinx tool set, I understand that for the
English language, Sphinx has a predefined set of 40 phones and provides online
tools to generate pronunciation dictionaries (for a given set of words) based
on this phone set and its proprietary English dictionary.

I am wondering how one should go about choosing a phone set and generating a
pronunciation dictionary (in Sphinx format) for any other language, so that
one could use Pocketsphinx for speech recognition in that language
(FSG based, no language model).

thanks, Li


Nickolay V. Shmyrev - 2011-10-25

> provides online tools to generate pronunciation dictionaries (for given set
> of words) based

Offline tools are also provided.

> on this phone set and its proprietary english dictionary.

The dictionary is open:

http://www.speech.cs.cmu.edu/cgi-bin/cmudict

> Wondering how one needs to go about deciding a phone set and generating a
> pronunciation dictionary (in Sphinx format) for any other language so that
> he/she could use Pocketsphinx for doing speech recognition in that language.
> (FSG based. No language model).

See

http://cmusphinx.sourceforge.net/wiki/tutorialdict
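
For a language with fairly regular spelling, one possible starting point is a small rule table that maps graphemes to phones and writes out lines in the Sphinx dictionary layout (a word followed by its space-separated phones). The sketch below is only an illustration under that assumption: the rule table `G2P_RULES`, the phone names, and the helper functions are hypothetical and Spanish-like, not a validated phone set and not a tool shipped with Sphinx.

```python
# Hypothetical grapheme-to-phone rules for a regularly spelled, Spanish-like language.
# Both the graphemes and the phone names are illustrative assumptions.
G2P_RULES = {
    "ch": "CH", "ll": "Y", "rr": "RR",
    "a": "A", "e": "E", "i": "I", "o": "O", "u": "U",
    "b": "B", "c": "K", "d": "D", "l": "L", "m": "M", "n": "N",
    "p": "P", "r": "R", "s": "S", "t": "T",
}


def word_to_phones(word):
    """Convert a word to phones, trying the longest grapheme first at each position."""
    phones = []
    max_len = max(len(g) for g in G2P_RULES)
    i = 0
    while i < len(word):
        for size in range(max_len, 0, -1):
            chunk = word[i:i + size]
            if chunk in G2P_RULES:
                phones.append(G2P_RULES[chunk])
                i += len(chunk)
                break
        else:
            # No rule matched at this position: report it instead of guessing.
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones


def build_dictionary(words):
    """Return lines of the form 'word PH1 PH2 ...', one per distinct word, sorted."""
    return [f"{w} {' '.join(word_to_phones(w))}" for w in sorted(set(words))]


def phone_set(words):
    """Return the sorted set of phones that the given words actually use."""
    return sorted({p for w in words for p in word_to_phones(w)})


if __name__ == "__main__":
    for line in build_dictionary(["perro", "casa", "noche"]):
        print(line)
```

Words that the rules cannot cover raise an error rather than producing a guessed pronunciation, so exceptions (loanwords, irregular spellings) can be added to the dictionary by hand.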


Joseph S. Wisniewski - 2011-11-01

Look for existing phone sets in the multi-language models for Sphinx on this
site:

CMU Chinese

LIUM French

Mexican Spanish, origin unknown

VoxForge Dutch

VoxForge Russian

VoxForge German

VoxForge Spanish

Also look at the VoxForge site, where there are additional models and phone
sets for:

Italian

Hebrew

Greek

Bulgarian

Portuguese

French, with a different phone set than LIUM, I believe

The Julius site for Japanese.

The iAtros site for Spanish, which, I believe, uses a different set than the
VoxForge one.

Then look at the speech synthesis sites like Festival and Flite, to see what
phone sets they use for different languages. That's largely what VoxForge did,
using phone sets and dictionaries from speech synthesis projects.

Li3 - 2011-11-02

Thanks for the pointers, wiz. Li


vera - 2012-01-09

Hello, please help me.
I am trying to build a Japanese speech-to-text system with Pocketsphinx, but I
can't make a dump file, so I can't train it.
This is a sample of my corpus text:
ああおいあかいあしたあつい

あなたあのありがとういいいえ

あのひとはだれですかいきますいくら

いくらですかいただきますいちいちがついつ

いまうえおおいしいおおいおげんきですか

おなかおなかすいたおねがいします

おはようおやすみかかぎかさかぜ

When I compile the sentences at
http://www.speech.cs.cmu.edu/tools/lmtool-new.html
there are errors in .log_pronounce:
WARN> found an empty line
WARN> found an empty line
WARN> found an empty line
WARN> found an empty line
WARN> found an empty line
WARN> found an empty line
WARN> found an empty line
WARN> found an empty line
WARN> found an empty line
WARN> found an empty line
WARN> found an empty line
WARN> found an empty line

I don't know whether the Sphinx knowledge base tool is unable to handle
Japanese characters. But Pocketsphinx has a Mandarin version, so I think I can
make a Japanese version.

What should I do?


Nickolay V. Shmyrev - 2012-01-09

The online lmtool works only for English. See the tutorial for other options:

http://cmusphinx.sourceforge.net/wiki/tutoriallm
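
Whichever offline option is used, the corpus usually needs some preparation first. The sketch below assumes the text has already been split into words separated by spaces (the sample posted above has no word boundaries, and the segmentation shown here is only a guess for illustration), and that the language-model toolkit wants one sentence per line wrapped in `<s>` and `</s>` markers, which should be checked against the tutorial. The function names are hypothetical.

```python
def prepare_corpus(lines):
    """Drop blank lines, collapse whitespace, and wrap each sentence in <s> ... </s>."""
    prepared = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            # Blank or whitespace-only lines carry no sentence, so skip them.
            continue
        prepared.append("<s> " + " ".join(tokens) + " </s>")
    return prepared


def vocabulary(lines):
    """Collect the distinct words; each one needs an entry in the pronunciation dictionary."""
    return sorted({token for line in lines for token in line.split()})


if __name__ == "__main__":
    # Assumed segmentation of two phrases from the sample corpus, plus blank lines.
    raw = ["あの ひと は だれ です か", "", "いくら です か", "   "]
    for sentence in prepare_corpus(raw):
        print(sentence)
    print(vocabulary(raw))
```

The vocabulary list is the set of words the pronunciation dictionary has to cover before training or decoding can work.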


vera - 2012-01-10

nshmyrev, thanks for your answer.
Is it possible to combine Julius (Japanese open source) with Pocketsphinx?
Do you know of an online lmtool for Japanese? Pocketsphinx can only use the
Arpa system, is that right?


Nickolay V. Shmyrev - 2012-01-15

> Is it possible that julius (japanese open source) can be combined with
> pocketsphinx?

No

> Do you know online lmtool for Japanese?

No

> Pocketsphinx can only use Arpa system, is that right?

Pocketsphinx can use any trigram language model in ARPA format. Model, not
system.
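
To make the "model, not system" point concrete: an ARPA model is a plain text file that begins with a `\data\` header listing how many n-grams of each order it contains, followed by one section per order and a closing `\end\` line. The sketch below reads that header from a tiny made-up bigram model; the probabilities in `SAMPLE_ARPA` are invented placeholders and the function names are hypothetical. A trigram model would additionally declare an `ngram 3=...` line and a `\3-grams:` section.

```python
import re

# A tiny, made-up model in ARPA layout; the numbers are placeholders, not trained values.
SAMPLE_ARPA = r"""\data\
ngram 1=3
ngram 2=2

\1-grams:
-0.4771 </s>
-99.0000 <s> -0.3010
-0.4771 hello -0.3010

\2-grams:
-0.3010 <s> hello
-0.3010 hello </s>

\end\
"""


def read_ngram_counts(text):
    """Return {order: count} parsed from the 'ngram N=count' lines of the header."""
    counts = {}
    in_header = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "\\data\\":
            in_header = True
        elif in_header:
            match = re.fullmatch(r"ngram (\d+)=(\d+)", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line.startswith("\\"):
                # The first section marker (e.g. \1-grams:) ends the header.
                break
    return counts


def max_order(text):
    """Return the highest n-gram order declared in the header, or 0 if none is found."""
    counts = read_ngram_counts(text)
    return max(counts) if counts else 0


if __name__ == "__main__":
    print(read_ngram_counts(SAMPLE_ARPA))
    print(max_order(SAMPLE_ARPA))
```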

