Hi
I am confused about three things in this tutorial: http://cmusphinx.sourceforge.net/wiki/tutorialam
The tutorial is titled "Building an acoustic model", but it talks about adapting an existing trained model or training an existing model!
So is there no toolkit to build a model, and we should either adapt or train an existing model?
If I want to build a model for another language (Persian), should I adapt an existing model, or train or build a new one? How?
Sorry if the question is very basic or has been asked before!
Thank you Nikolay,
But I asked about the language model there, and thought it was better to create a new thread for the acoustic model! Sorry if that was against the rules.
By the way,
So I should find audio in Persian and write it out in Persian too (like creating a Persian subtitle for a Persian movie, or using a textbook together with a recording of it being read), then I must train the CMUSphinx engine with these materials. Is that right?
Also, I can find any sound files in Persian and write out their contents by hand, true?
And the words that are in the recordings should also exist in my grammar file.
I read in the tutorial that a few hours of audio is enough to make a command-and-control system, so I think I should make, for example, 4 to 5 hours of voice recordings of different people who say only the sentences that I have in my language model. Is that right?
And I cannot create Persian subtitles for 3 or 4 movies and use them for my system! It must be only the sentences that I want in my command-and-control system!
Oh, sorry if my English is very horrible!
Also, I can find any sound files in Persian and write out their contents by hand, true?
Yes
Also, I can find any sound files in Persian and write out their contents by hand, true?
Also yes
I should make, for example, 4 to 5 hours of voice recordings of different people who say only the sentences that I have in my language model. Is that right?
There is no need to reconfirm what is already written
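To make the point about transcribed recordings and the word list concrete, here is a minimal sketch in Python. It assumes the training transcripts are kept as a mapping from utterance ID to text, that the pronunciation dictionary has one entry per line starting with the word, and that transcription lines follow the `<s> text </s> (utterance_id)` layout shown in the acoustic model training tutorial. The helper names, utterance IDs, and romanized words are hypothetical placeholders, not part of CMUSphinx.

```python
def parse_dictionary(lines):
    """Collect the words (first field of each line) from a pronunciation dictionary."""
    words = set()
    for line in lines:
        parts = line.split()
        if parts:  # skip blank lines
            words.add(parts[0])
    return words


def missing_words(transcripts, dictionary_words):
    """Return the sorted transcript words that have no dictionary entry."""
    missing = set()
    for text in transcripts.values():
        for word in text.split():
            if word not in dictionary_words:
                missing.add(word)
    return sorted(missing)


def transcription_lines(transcripts):
    """Format each utterance as '<s> text </s> (utterance_id)', ordered by ID."""
    return [f"<s> {text} </s> ({utt_id})"
            for utt_id, text in sorted(transcripts.items())]


if __name__ == "__main__":
    # Hypothetical placeholder data for illustration only.
    dictionary = parse_dictionary(["roshan R O SH A N", "kon K O N"])
    transcripts = {"spk1_0001": "roshan kon", "spk1_0002": "khamosh kon"}
    print(missing_words(transcripts, dictionary))
    for line in transcription_lines(transcripts):
        print(line)
```

Any word reported by `missing_words` would need a dictionary entry (or a corrected transcript) before training.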
Adaptation is known to work well when you are using different recording environments (close-distance or far microphone or telephone channel), or when a slightly different accent is present (UK English or even Indian English) or even another language. Adaptation, for example, works well if you need to quickly add support for some new language just by mapping acoustic model phoneset to target phoneset with the dictionary.
You need to train a new model for Farsi
I already answered this question from you recently:
https://sourceforge.net/p/cmusphinx/discussion/speech-recognition/thread/7ddf3bef/
There is no need to start a new thread on the same subject; you can continue in the old thread.
I couldn't understand: should I adapt or train a model for Farsi?
You should train. If you collect enough data (> 20 hours of transcribed speech), we'll train the model for you.
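To check a collection against the 20-hour figure mentioned above, a rough sketch like the following can sum the duration of the recordings. It assumes the audio is stored as uncompressed WAV files under one directory; the directory name `persian_corpus/wav` and the function names are hypothetical, and only Python's standard `wave` module is used.

```python
import wave
from pathlib import Path


def wav_seconds(path):
    """Duration of one WAV file in seconds (frames divided by sample rate)."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def total_hours(wav_dir):
    """Total duration, in hours, of all .wav files found under wav_dir."""
    seconds = sum(wav_seconds(p) for p in sorted(Path(wav_dir).rglob("*.wav")))
    return seconds / 3600.0


if __name__ == "__main__":
    corpus_dir = "persian_corpus/wav"  # hypothetical location of the recordings
    hours = total_hours(corpus_dir)
    print(f"{hours:.2f} hours of audio found")
    if hours <= 20:
        print("Below the 20 hours of transcribed speech mentioned in the reply")
```

This only measures audio length; it does not verify that every file has a matching transcript.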