My language doesn't use the English alphabet, and with practice I have realised that I can create a more restrictive and 'true' phone set. Although it works, I would like to ask: is this practice okay, or do I need to use English letters? If I use English letters, a lot of words will have the same phones but are pronounced differently.
For example, the dictionary entry
समूह sa mu ha
can be written as
समूह स मू ह
and the phoneset entries
sa
mu
ha
can be written as
स
मू
ह
Hi Nikolay,
I have gone through the material. My language is Nepali, not Hindi (although they are similar). I may not have explained myself properly, so I will try again.
The document states
Is pocketsphinx already trained for the English language by default?
I believe pocketsphinx is the base for developing a speech recognition system from scratch. What I have done is separate each character (letter) of my language. The only things I have combined are diacritics (with the letter they attach to).
Basically, I am using my own phoneme set (for the Nepali language) rather than linking or converting it to English. I have also changed the code so that it now accepts around 350 phones, which is sufficient for my whole language.
I want to know whether it is good practice to use a phoneme set, dictionary, etc. made entirely of my own characters rather than English letters.
Please check my attached file if I have not made myself clear.
This is the link to the whole project
Last edit: pannam 2017-03-21
If you need to find a phonetic dictionary, read Wikipedia or a book on phonetics. If you are using an existing phonetic dictionary, do not use case-sensitive variants like “e” and “E”; all your phones must be different even when compared case-insensitively. Sphinxtrain doesn't support some special characters like '*' or '/', but supports most others like “+”, “-” or “:”. To be safe, however, we recommend an alphanumeric-only phone-set.
Replace special characters in the phone-set, like colons or dashes or tildes, with something alphanumeric. For example, replace “a~” with “aa” to make it alphanumeric only. Nowadays, even cell phones have gigabytes of memory on board. There is no sense in trying to save space with cryptic special characters.
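The two rules above (every phone name alphanumeric, and no two names that differ only in case) are easy to check mechanically before training. A minimal Python sketch follows; the function name `check_phoneset` is hypothetical, and reading "alphanumeric" as ASCII letters and digits only is an assumption. Under that reading, Devanagari phone names such as स would be flagged as well.

```python
import re

# Assumption: "alphanumeric" is read here as ASCII letters and digits only.
ALNUM = re.compile(r"[A-Za-z0-9]+")


def check_phoneset(phones: list[str]) -> list[str]:
    """Report phone names that break the two rules above:
    non-alphanumeric names, and names that collide when case is ignored."""
    problems: list[str] = []
    seen: dict[str, str] = {}  # lowercased name -> first spelling seen
    for phone in phones:
        if not ALNUM.fullmatch(phone):
            problems.append(f"non-alphanumeric phone: {phone!r}")
        key = phone.lower()
        if key in seen:
            problems.append(f"duplicate (ignoring case): {seen[key]!r} vs {phone!r}")
        else:
            seen[key] = phone
    return problems


print(check_phoneset(["a", "A", "a~", "aa"]))
# ["duplicate (ignoring case): 'a' vs 'A'", "non-alphanumeric phone: 'a~'"]
```

An empty result means the list passes both checks. It says nothing about whether the phones are linguistically sensible.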
A proper Hindi phoneset is covered in publications like this one:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.6058
This is a syllable set, not a phoneset. It is not going to work well in training, since the algorithm expects individual phones.
You can also read
https://en.wikipedia.org/wiki/Hindustani_phonology
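To illustrate the difference between a syllable set and a phoneset, here is a minimal Python sketch that splits a Devanagari word into consonant and vowel phones, so a syllable like मू becomes two phones (`m` and `uu`) instead of one unit. The letter-to-phone tables, the phone names and the function `to_phones` are hypothetical and cover only a handful of characters. The sketch also always keeps the inherent vowel (written `a` here) after a consonant with no vowel sign, so it does not model schwa deletion, nasalization or other real Nepali pronunciation rules.

```python
# Hypothetical, deliberately tiny tables: NOT a complete or authoritative Nepali phoneset.
CONSONANTS = {"क": "k", "न": "n", "म": "m", "स": "s", "ह": "h"}
VOWEL_SIGNS = {           # dependent vowel signs (matras)
    "\u093e": "aa",       # ा
    "\u093f": "i",        # ि
    "\u0940": "ii",       # ी
    "\u0941": "u",        # ु
    "\u0942": "uu",       # ू
    "\u0947": "e",        # े
    "\u094b": "o",        # ो
}
INDEPENDENT_VOWELS = {"अ": "a", "आ": "aa", "इ": "i", "उ": "u"}
VIRAMA = "\u094d"         # ् suppresses the inherent vowel
INHERENT_VOWEL = "a"      # kept after every bare consonant (no schwa deletion)


def to_phones(word: str) -> list[str]:
    """Split a Devanagari word into individual consonant and vowel phones."""
    phones: list[str] = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:      # consonant + matra -> two phones
                phones.append(VOWEL_SIGNS[nxt])
                i += 2
            elif nxt == VIRAMA:         # consonant with no vowel
                i += 2
            else:                       # bare consonant -> inherent vowel
                phones.append(INHERENT_VOWEL)
                i += 1
        elif ch in INDEPENDENT_VOWELS:
            phones.append(INDEPENDENT_VOWELS[ch])
            i += 1
        else:
            raise ValueError(f"no mapping for {ch!r} (U+{ord(ch):04X})")
    return phones


word = "समूह"
print(word, " ".join(to_phones(word)))  # समूह s a m uu h a
```

The resulting dictionary line `समूह s a m uu h a` uses six phones drawn from a small, reusable inventory, whereas `समूह स मू ह` uses three units that each stand for a whole syllable.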
Your phoneset is not really a phoneset.
350 is not the number of phones in your language; it is something different. There are languages with hundreds of phones, but Nepali is not one of them.
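One plausible way a syllable-based inventory reaches a few hundred units (an interpretation, not something stated in the thread): if every consonant-vowel combination is its own unit, the inventory grows roughly as the product of the consonant and vowel counts, while a phoneset grows as their sum. The function name `cv_inventory_sizes` and the counts used below are hypothetical round numbers for illustration, not a claim about Nepali.

```python
def cv_inventory_sizes(consonants: int, vowels: int) -> tuple[int, int]:
    """Rough inventory sizes for two ways of choosing training units.

    Syllable units: one unit per consonant+vowel pair, plus each vowel on its own.
    Phone units:    one unit per consonant and one per vowel.
    """
    syllable_units = consonants * vowels + vowels
    phone_units = consonants + vowels
    return syllable_units, phone_units


# Hypothetical round numbers, not a count of Nepali consonants or vowels.
print(cv_inventory_sizes(30, 10))  # (310, 40)
```

Beyond the raw count, every unit needs enough examples in the training audio, so a few hundred syllable units spread the same data much more thinly than a few dozen phones.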
No, it is a bad practice.
So, should I use the International Phonetic Alphabet (IPA) or ISO 15919 from this link? https://en.wikipedia.org/wiki/Help:IPA_for_Hindi_and_Urdu
You need to follow the tutorial:
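Whichever reference is used (IPA or ISO 15919), the tutorial text quoted earlier asks for alphanumeric-only phone names, so symbols such as ʈ, ɦ or the length mark ː have to be given plain ASCII names. A minimal Python sketch of that renaming step follows. It uses greedy longest-match so that "uː" is read as one phone rather than "u" plus a stray length mark. The table `IPA_TO_PHONE` and the function `ipa_to_phones` are hypothetical; the symbols chosen and their ASCII names are examples, not a recommended Nepali phoneset.

```python
# Hypothetical IPA -> ASCII phone-name table (illustrative subset only).
IPA_TO_PHONE = {
    "ə": "a",
    "aː": "aa",
    "i": "i",
    "u": "u",
    "uː": "uu",
    "s": "s",
    "m": "m",
    "n": "n",
    "ɦ": "h",
    "t": "t",
    "tʰ": "th",
    "ʈ": "tt",
    "ʈʰ": "tth",
}


def ipa_to_phones(ipa: str, table: dict[str, str] = IPA_TO_PHONE) -> list[str]:
    """Convert an IPA string to alphanumeric phone names by greedy longest match."""
    max_len = max(len(symbol) for symbol in table)
    phones: list[str] = []
    i = 0
    while i < len(ipa):
        # Try the longest possible symbol first, e.g. "uː" before "u".
        for size in range(min(max_len, len(ipa) - i), 0, -1):
            chunk = ipa[i:i + size]
            if chunk in table:
                phones.append(table[chunk])
                i += size
                break
        else:  # no symbol of any length matched at position i
            raise ValueError(f"no phone name for {ipa[i]!r} at position {i}")
    return phones


print(ipa_to_phones("səmuːɦ"))  # ['s', 'a', 'm', 'uu', 'h']
```

Raising an error on unknown symbols, rather than skipping them, keeps gaps in the table from silently producing wrong dictionary entries.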