I am trying to create a new acoustic model for my language. I am using the latest versions of sphinxtrain, sphinxbase and pocketsphinx: sphinxtrain-5prealpha, sphinxbase-5prealpha and pocketsphinx-5prealpha. When I train my model, sphinxtrain stops after phase 6, "Checking that all the words in the transcript are in the dictionary". My dictionary has a lot of duplicated phones because of the nature of our language, Myanmar. I have shared my logdir folder:
https://www.dropbox.com/s/6qcl9ndl6wpqcuy/logdir.rar?dl=0
That folder contains only the 000.comp_feat folder. Please advise and help me.
This is my logdir folder.
Phones must be unique. You should also fix the other warnings reported.
There is nothing in Myanmar that is different from other languages. You can still select a proper phoneset according to the Wikipedia page:
https://en.wikipedia.org/wiki/Burmese_language
You need to share your etc folder if you need help with the database setup. The logdir folder is not enough, since all your mistakes are in the etc folder.
Some words and syllables have two different phones, and I get a lot of warnings in those cases.
This is etc folder.
Please open the wikipedia page linked above and read about phonemes of your language. Please do not use syllables as phones.
There is no phonetic dictionary for Myanmar. Therefore, I tried to create a mapping of one syllable to one phone. The MLC (Myanmar Language Commission) dictionary has over 200000 words, and I cannot write phonetic entries for all of those words manually. I also used the Phonetisaurus tools to create a g2p mapping, but the results were wrong. So I created a phonetic dictionary for Myanmar syllables; there are over 2000 syllables in Myanmar. Please guide me in creating a phonetic dictionary.
Your dictionary should look like this:
Phones must be separated by spaces. Syllables are not phones.
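A quick way to catch both problems before training is to check the phone list for duplicates and to check that every symbol used in the dictionary is a known phone. The following is only a rough sketch, not part of sphinxtrain: the function names and the sample entries are made up for illustration, and it assumes the usual layout of one phone per line in the phone file and `WORD PHONE PHONE ...` per line in the dictionary.

```python
from collections import Counter


def duplicate_phones(phone_lines):
    """Return the phones that occur more than once in a phone listing."""
    counts = Counter(line.strip() for line in phone_lines if line.strip())
    return sorted(phone for phone, n in counts.items() if n > 1)


def undefined_phones(dict_lines, phones):
    """Map each dictionary word to the symbols in its entry that are not known phones."""
    known = set(phones)
    problems = {}
    for line in dict_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries without a pronunciation
        word, pronunciation = parts[0], parts[1:]
        missing = [p for p in pronunciation if p not in known]
        if missing:
            problems[word] = missing
    return problems


# Hypothetical sample data, only to show the checks at work.
sample_phones = ["SIL", "K", "A", "M", "A"]        # "A" is listed twice
sample_dict = ["KAMA K A M A", "KAM KAM"]          # second entry uses a syllable as a phone

print(duplicate_phones(sample_phones))
print(undefined_phones(sample_dict, ["SIL", "K", "A", "M"]))
```

An entry whose pronunciation is a whole syllable written as one token will show up in the second check, because that token is not in the phone list.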
Now I have created new phonetic dictionary and phone files. I set up my model on Ubuntu 12.04 in VMware. But when I train my model, sphinxtrain still stops after phase 6, "Checking that all the words in the transcript are in the dictionary". At this stage I get many warnings. Can those warnings stop the training process?
I have posted my etc folder and logdir folder.
All the words from the training transcription must be in the dictionary.
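To find out which words are missing, you can compare the two files yourself. This is only a rough sketch under some assumptions: the helper names and sample lines are made up, each transcription line is assumed to look like `<s> word word </s> (utterance_id)`, and alternate pronunciations in the dictionary are assumed to be written as `word(2)`.

```python
import re

# Markers that normally live in the filler dictionary, not the main one.
FILLERS = {"<s>", "</s>", "<sil>"}


def transcript_words(transcript_lines):
    """Collect the distinct words used in the transcription lines."""
    words = set()
    for line in transcript_lines:
        # Drop the trailing "(utterance_id)" if there is one.
        text = re.sub(r"\([^()]*\)\s*$", "", line)
        words.update(tok for tok in text.split() if tok not in FILLERS)
    return words


def dictionary_words(dict_lines):
    """Collect the distinct head words of the dictionary lines."""
    words = set()
    for line in dict_lines:
        parts = line.split()
        if parts:
            # "word(2)" is an alternate pronunciation of "word".
            words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def missing_words(transcript_lines, dict_lines):
    """Return the transcript words that have no dictionary entry, sorted."""
    return sorted(transcript_words(transcript_lines) - dictionary_words(dict_lines))


# Hypothetical sample data, only to show the check at work.
sample_transcript = [
    "<s> hello world </s> (utt_001)",
    "<s> hello there </s> (utt_002)",
]
sample_dict = ["hello HH AH L OW", "world W ER L D", "world(2) W ER L"]

print(missing_words(sample_transcript, sample_dict))
```

Every word this reports needs an entry in the dictionary before the check in training can pass.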
This is logdir folder.