Hi, pals.
I am currently working on my final college project on speech recognition, and I am a newcomer to this topic. My idea is to use speech recognition in a mobile app (Windows Phone) to recognize a number spoken by the user. My target users are children and teenagers. The recognizer is for a color blindness self-assessment test: instead of typing on a keyboard, the user says the number (numbers only) that he or she sees on the Ishihara plate shown on screen.
My questions are:
1. How do I build a language model for Bahasa Indonesia with such constraints (a finite set of numbers, say 1-100)?
2. With such target users, do I also have to train on children's voices, or do I only need to train on random people's voices?
3. Is it possible to build such an application for Windows Phone, using CMUSphinx as the LM builder and including it in Visual Studio, my IDE?
Thank you.
Last edit: Hamdan Prakoso 2016-04-13
As described in the tutorial, you can create a grammar from the list of words. You can also train a language model; for that, you need to read the tutorial:
http://cmusphinx.sourceforge.net/wiki/tutoriallm
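For a closed set such as the numbers 1-99, the "grammar from the list of words" option can be generated by a short script. The following is only a rough sketch: it assumes the standard Indonesian number words and writes them out as alternatives in a JSGF grammar, a format that pocketsphinx accepts. The function names (`number_to_words`, `build_jsgf`) and the grammar name `numbers` are made up for illustration and are not part of CMUSphinx.

```python
# Sketch: build a JSGF grammar that lists every Indonesian number phrase 1-99.
# All names here are illustrative assumptions, not CMUSphinx APIs.

UNITS = ["satu", "dua", "tiga", "empat", "lima",
         "enam", "tujuh", "delapan", "sembilan"]


def number_to_words(n):
    """Return the Indonesian phrase for an integer in the range 1-99."""
    if not 1 <= n <= 99:
        raise ValueError("only 1-99 is supported")
    if n < 10:
        return UNITS[n - 1]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        # 12-19 are "<unit> belas", e.g. 12 -> "dua belas"
        return UNITS[n - 11] + " belas"
    tens, unit = divmod(n, 10)
    words = UNITS[tens - 1] + " puluh"  # e.g. 20 -> "dua puluh"
    if unit:
        words += " " + UNITS[unit - 1]
    return words


def build_jsgf(lo=1, hi=99):
    """Return JSGF text whose public rule accepts exactly one number phrase."""
    phrases = [number_to_words(n) for n in range(lo, hi + 1)]
    return ("#JSGF V1.0;\n"
            "grammar numbers;\n"
            "public <number> = " + " | ".join(phrases) + ";\n")


if __name__ == "__main__":
    print(build_jsgf())
```

The whole vocabulary comes down to 13 distinct words (the nine units plus sepuluh, sebelas, belas and puluh), so the pronunciation dictionary for such a grammar stays very small.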
You need to train on children's voices if you are going to recognize children.
Yes, we have the sample project here:
https://github.com/cmusphinx/pocketsphinx-wp-demo
Thank you for the immediate answer.
For #2
What is the minimum number of children that I should record, and for how long, given such constraints? Do I only have to record them speaking numbers?
For #3
I just discovered that there is such a project. Thank you very much.
5 hours of recordings from 200 kids. You can find more details in the acoustic model training tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialam
You don't have to, but you could.
Wow, that is a large number of kids. Can I just sample 50-100 kids with 5 hours of recordings in total?
What is your grammar? To recognize just a single-word grammar with 100 words, I used 8 hours of data from 120 people and it didn't work well. I then used approximately 200 hours, and that seems promising. Also, don't record only the words; record some other text as well.
--
Best regards,
K.Rohith Gowtham
I don't even know which grammar I should use, because I don't know anything about 'grammar' yet. The point is that my application only uses the number as a single-shot command. So the application will recognize one word, and that word is a number. Because the numbers in the Ishihara test are all under 100, maybe I'll restrict the recognized numbers to 1-99.
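To illustrate the single-shot use described above: once the recognizer returns a phrase, the app has to turn it back into an integer before comparing it with the number on the plate. Below is a minimal sketch of that step, assuming the recognizer output is a plain string of standard Indonesian number words separated by spaces. The function name `words_to_number` is hypothetical, and anything outside 1-99 is rejected by returning `None`.

```python
# Sketch: map a recognized Indonesian number phrase back to an integer (1-99).
# The function name and the None-on-failure convention are assumptions.

UNITS = {"satu": 1, "dua": 2, "tiga": 3, "empat": 4, "lima": 5,
         "enam": 6, "tujuh": 7, "delapan": 8, "sembilan": 9}


def words_to_number(text):
    """Return the integer for a phrase such as 'dua puluh satu', else None."""
    tokens = text.lower().split()
    if len(tokens) == 1:
        word = tokens[0]
        if word in UNITS:
            return UNITS[word]
        if word == "sepuluh":
            return 10
        if word == "sebelas":
            return 11
        return None
    # The first word of a multi-word phrase must be a unit from 2 to 9
    # ("satu belas" and "satu puluh" are not valid number phrases).
    if len(tokens) in (2, 3) and UNITS.get(tokens[0], 0) >= 2:
        first = UNITS[tokens[0]]
        if len(tokens) == 2 and tokens[1] == "belas":
            return 10 + first              # 12-19
        if tokens[1] == "puluh":
            if len(tokens) == 2:
                return 10 * first          # 20, 30, ..., 90
            if tokens[2] in UNITS:
                return 10 * first + UNITS[tokens[2]]
    return None


if __name__ == "__main__":
    plate_answer = 74  # hypothetical number shown on the current plate
    heard = words_to_number("tujuh puluh empat")
    print("correct" if heard == plate_answer else "incorrect")
```

Returning `None` for unparseable input lets the app ask the user to repeat the answer instead of scoring it as wrong.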
I have worked on Bahasa ASR, and its data requirements are entirely different: we may need three times the data we expected, or more. Each word has multiple pronunciations, depending on where the child is from.
Finally, choose the phoneset carefully.
Are you saying that I need three times more data than that? This is for my final project, and I don't think I have enough research time to gather that much data.