Turkish Speech Recognition

Forum: Sphinx4 Help

Creator: Anonymous

Created: 2012-02-13

Updated: 2012-09-21

Anonymous - 2012-02-13

We (me and some of my friends) have decided to create a mobile application to
do Turkish speech recognition. We read the tutorials and information about
pocketsphinx and tested pocketsphinx for English.

For Turkish, we started by creating a simple language model. Then we tried
to train the acoustic model by following this link. Now I have some
questions:

Turkish has some letters like ş, ç, Ö, Ü. Would these letters be a problem for the acoustic model and the language model?

We couldn't really come up with a phone set for Turkish. Turkish is read the same way it is written, but I am not sure whether this property helps in defining a phone set. The link above says this about phones:

If you don't have a phonetic book, you can just use the word's spelling and
it gives very good results:
ONE O N E
TWO T W O

Would this work for us? For example, we have both the letters c and ç. c could
be represented with C, and ç with CC.

Last but not least, are we on the right track in using pocketsphinx to recognize Turkish speech?

Nickolay V. Shmyrev - 2012-02-13

We (me and some of my friends) have decided to create a mobile application
to do Turkish speech recognition. We read the tutorials and information about
pocketsphinx and tested pocketsphinx for English.

That's great

Turkish has some letters like ş, ç, Ö, Ü. Would these letters be a problem
for the acoustic model and the language model?

No

Would this work for us? For example, we have both the letters c and ç. c could
be represented with C, and ç with CC.

Yes
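
As a rough illustration of the spelling-based dictionary discussed above, here is a minimal Python sketch that emits one phone per letter. Only the C / CC pair comes from this thread; the other ASCII phone names (GG, II, OO, SS, UU) and the function names are assumptions made for the example. The dotted and dotless i are mapped explicitly because Python's default case conversion is not Turkish-aware.

```python
# Hypothetical ASCII phone names for the Turkish letters outside basic Latin.
# Only "CC" for ç is taken from the thread; the rest follow the same doubling idea.
SPECIAL = {"ç": "CC", "ğ": "GG", "ı": "II", "ö": "OO", "ş": "SS", "ü": "UU"}

# Turkish casing: "I" lowercases to dotless "ı", and "İ" lowercases to "i".
_TR_CASE = str.maketrans("Iİ", "ıi")


def to_phones(word):
    """Return one phone per letter of the word, skipping non-letters."""
    phones = []
    for ch in word.translate(_TR_CASE).lower():
        if ch in SPECIAL:
            phones.append(SPECIAL[ch])
        elif ch.isalpha():
            phones.append(ch.upper())
    return phones


def dict_line(word):
    """Format one dictionary entry: the word followed by its phones."""
    return word + " " + " ".join(to_phones(word))


if __name__ == "__main__":
    for w in ["çocuk", "ışık", "gün"]:
        print(dict_line(w))
```

Whether the word column of the dictionary should keep the Turkish letters or also be transliterated depends on the encoding used for the transcripts, which this sketch does not address.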

Last but not least, are we on the right track in using pocketsphinx to
recognize Turkish speech?

Yes

Anonymous - 2012-02-19

Well, we completed our very first tests and everything went well. We created a
small vocabulary of around 50 words that covers all characters of the
alphabet. As nshmyrev said, special characters didn't create a problem. We
also realized how important training data is, and how much parameters such as
the number of senones matter.

Now I have some other things to ask. We are planning to create a vocabulary
of 400-500 words (for a mobile application) to cover daily conversations.
Then we will try to record as much data as possible to train the acoustic
model (I am guessing around 100 people with 7-8 hours of recording).

How many different sentences should we use for recordings (of course they should cover all the vocabulary)? Does the number of sentences really matter, or is it the number of different combinations?

Should we record all these sentences in a quiet environment with good quality, or should we record some of them in noisy environments?

What can you recommend different pronunciations due to accents?

Finally, what should we use for CFG_FINAL_NUM_DENSITIES (4 or 8) and CFG_N_TIED_STATES (2000 - 4000)?

Nickolay V. Shmyrev - 2012-02-19

First of all, I recommend that you read the tutorial

http://cmusphinx.sourceforge.net/wiki/tutorialam

It will answer some of your questions beforehand

How many different sentences should we use for recordings (of course they
should cover all the vocabulary)?

Ideally they should all be different; that would help to increase diversity

Does the number really matter or the number of different combinations?

There is no strict dependency; however, more diversity is better than less
diversity.
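
One way to check a prompt list against the two points above (full vocabulary coverage and distinct sentences) is sketched below in Python. The function name and the sample words are hypothetical, and the sketch assumes prompts are already normalized to the same casing as the vocabulary.

```python
def check_prompts(vocabulary, prompts):
    """Return (missing_words, unique_sentence_count) for a prompt list.

    missing_words: vocabulary words that never occur in any prompt.
    unique_sentence_count: number of distinct prompt strings.
    """
    seen = set()
    for prompt in prompts:
        seen.update(prompt.split())
    missing = sorted(set(vocabulary) - seen)
    return missing, len(set(prompts))


if __name__ == "__main__":
    vocab = ["bir", "iki", "on"]
    prompts = ["bir iki", "iki bir", "bir iki"]
    missing, unique = check_prompts(vocab, prompts)
    print("missing words:", missing)
    print("unique sentences:", unique, "of", len(prompts))
```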

Should we record all these sentences in a quiet environment with good
quality or should we record some of them in such noisy environments?

Noisy recordings are better

What can you recommend different pronunciations due to accents?

Sorry, it's hard to understand this question

Finally, what should we use for CFG_FINAL_NUM_DENSITIES (4 or 8) and
CFG_N_TIED_STATES (2000 - 4000)?

You need to try all combinations and see which works better
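
Trying all combinations can be organized as a small grid. The Python sketch below only generates one configuration text per (densities, senones) pair; it does not run training. It assumes the settings appear in the training configuration as Perl-style assignments such as `$CFG_N_TIED_STATES = 200;`, and the sample text and helper names are made up for the example.

```python
import itertools
import re


def set_cfg(text, name, value):
    """Replace the value of a `$NAME = ...;` assignment in the config text."""
    pattern = re.compile(r"^(\$" + re.escape(name) + r"\s*=\s*)[^;]*;", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value) + ";", text)
    if count == 0:
        raise KeyError(name)  # the setting was not found in the text
    return new_text


def config_grid(text, densities, senones):
    """Yield ((density, senone_count), config_text) for every combination."""
    for d, s in itertools.product(densities, senones):
        variant = set_cfg(text, "CFG_FINAL_NUM_DENSITIES", d)
        variant = set_cfg(variant, "CFG_N_TIED_STATES", s)
        yield (d, s), variant


# Made-up stand-in for the relevant lines of a training config.
SAMPLE = "$CFG_FINAL_NUM_DENSITIES = 8;\n$CFG_N_TIED_STATES = 200;\n"

variants = dict(config_grid(SAMPLE, [4, 8], [2000, 3000, 4000]))

if __name__ == "__main__":
    for key in sorted(variants):
        print(key)
```

Each variant would then be trained and decoded on the same held-out test set so the resulting accuracies can be compared.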

Anonymous - 2012-02-28

Thanks for the help. Everything is going well, but I have one more question
about recording. Should we record the speech as if we were talking normally in
a daily conversation, or should we emphasize each word and pause a little
between words?

Nickolay V. Shmyrev - 2012-02-28

You should speak normally, as you would in everyday conversation

Anonymous - 2012-04-11

We have created an acoustic model for Turkish by following this link.
Currently we have about 500 words, 30 phones, and about 2.5 hours of
recordings. After preparing the files, running the acoustic model training
script took about 4 minutes.

As suggested in the "Using the Model" section, we looked at the folder named
<your_db_name>.cd_semi_<number_of_senones>. This folder is about 20 KB, and
the whole model_parameters directory is about 2 MB. This worries me, because
the accuracy is not as good as we expected. Would you suggest something for
this issue? We have checked logdir, but nothing really stands out.

Nickolay V. Shmyrev - 2012-04-13

Because the accuracy is not as good as we expected. Would you suggest
something for this issue? We have checked logdir, but nothing really pops out.

The tutorial has a troubleshooting section; please read it.

The tutorial also has recommendations for the amount of audio required to
train the system. Please read them.
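
To compare a corpus against the tutorial's recommended amount of audio, the total recording time can be measured directly. This is a minimal Python sketch using the standard `wave` module; it assumes the recordings are plain WAV files, and the function name and the one-second demo file are made up for the example.

```python
import os
import tempfile
import wave


def total_hours(paths):
    """Sum the duration of the given WAV files and return it in hours."""
    seconds = 0.0
    for path in paths:
        with wave.open(path, "rb") as w:
            seconds += w.getnframes() / w.getframerate()
    return seconds / 3600.0


# Demo: write one second of 16 kHz, 16-bit mono silence and measure it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "silence.wav")
    with wave.open(demo_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * 16000)
    demo_hours = total_hours([demo_path])

if __name__ == "__main__":
    print("demo duration in seconds:", demo_hours * 3600)
```

In practice the list of paths would come from the training file list rather than a temporary file.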
