Hello
Let's say we want to make a phonetic dictionary for the Farsi digits from 1 to 10. These are the results from espeak with this command: espeak -v fa -x
I need to map them to a phoneset that I can use in a CMUSphinx language model, so I built this mapping for them:
j y
e e
k k
d d
o o
s s
tS ch
A aa
h h
R r
p p
a a
n n
dZ j
S sh
f f
t t
Finally I should write my dictionary like this:
یک y e k
دو d o
سه s e
چهار ch aa h aa r
پنج p a n j
شش sh e sh
هفت h a f t
هشت h a sh t
نه n o h
ده d a h
OK, am I on the right track?
What's the next step?
Do I have the phonetic dictionary and phoneset file for my project? Should I go forward to training my system with these two files to get my language model?
Last edit: rezaee 2016-10-08
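The mapping step above can be sketched as a small script. This is only an illustration: the table is the one from the post, the espeak strings passed in at the end are hypothetical inputs chosen to match the dictionary entries, and dropping the marks ' , : is an assumption rather than a rule.

```python
# espeak symbol -> phone name, copied from the mapping in the post above.
ESPEAK_TO_SPHINX = {
    "j": "y", "e": "e", "k": "k", "d": "d", "o": "o", "s": "s",
    "tS": "ch", "A": "aa", "h": "h", "R": "r", "p": "p", "a": "a",
    "n": "n", "dZ": "j", "S": "sh", "f": "f", "t": "t",
}

# Stress and length marks are simply dropped here (an assumption).
IGNORED = set("',:")


def convert(espeak_phonemes):
    """Convert an espeak phoneme string using greedy longest-match lookup."""
    # Try two-character symbols (tS, dZ) before one-character ones.
    symbols = sorted(ESPEAK_TO_SPHINX, key=len, reverse=True)
    phones = []
    i = 0
    while i < len(espeak_phonemes):
        ch = espeak_phonemes[i]
        if ch in IGNORED or ch.isspace():
            i += 1
            continue
        for sym in symbols:
            if espeak_phonemes.startswith(sym, i):
                phones.append(ESPEAK_TO_SPHINX[sym])
                i += len(sym)
                break
        else:
            raise ValueError(f"unmapped espeak symbol at {espeak_phonemes[i:]!r}")
    return " ".join(phones)


# Hypothetical espeak strings for "four" and "five".
print(convert("tSAh'AR"))
print(convert("p'andZ"))
```

Raising an error on an unmapped symbol makes gaps in the table visible as soon as new words are added.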
So the mapping file isn't important for the project; it's only a guide for ourselves when writing the dictionary. And the phoneset is the file that consists of the symbols we used to build our dictionary.
I have written the following symbols in a file with a ".phone" extension and am writing my dictionary with them. They are all the phones we need to write Persian words in the dictionary. So this is my phoneset file, I think?
a
e
o
aa
i
u
b
p
t
s
j
ch
h
kh
d
z
r
z
zh
s
sh
s
z
t
z
gh
f
gh
k
g
l
m
n
v
h
y
ss
Last edit: rezaee 2016-10-09
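The list above repeats several phones (z, s, t, gh and h appear more than once). One way to avoid repeats is to derive the phoneset from the dictionary itself. A minimal sketch, assuming one dictionary entry per line in the form "word phone phone ..." and assuming, as the acoustic model training tutorial describes, that the phoneset lists each phone once and also contains the silence phone SIL:

```python
def build_phoneset(dictionary_lines):
    """Collect the unique phones used in dictionary entries, plus SIL."""
    phones = set()
    for line in dictionary_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        phones.update(parts[1:])  # everything after the word is a phone
    phones.add("SIL")  # silence phone, assumed required by the trainer
    return sorted(phones)


# Three entries taken from the dictionary earlier in the thread.
entries = [
    "یک y e k",
    "دو d o",
    "پنج p a n j",
]

# One phone per line, as in a .phone file.
print("\n".join(build_phoneset(entries)))
```

Building the file this way also guarantees that it only contains phones the dictionary actually uses.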
What about the filler dictionary?
How should I build that?
I read the acoustic training part of the tutorial, but it wasn't enough for me!
Can you explain more, please?
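For reference, the filler dictionary in the acoustic model training tutorial is a short file that maps non-speech tokens to filler phones. A minimal sketch, assuming only silence is needed and no noise fillers; the function name and the idea of generating the file from Python are illustrative, not something the tutorial prescribes:

```python
from pathlib import Path

# Sentence-start, sentence-end and silence tokens, all mapped to SIL.
FILLER_ENTRIES = [("<s>", "SIL"), ("</s>", "SIL"), ("<sil>", "SIL")]


def filler_text(entries=FILLER_ENTRIES):
    """Return the filler dictionary contents, one 'token phone' pair per line."""
    return "".join(f"{token} {phone}\n" for token, phone in entries)


def write_filler(path):
    """Write the filler dictionary to the given path (hypothetical helper)."""
    Path(path).write_text(filler_text(), encoding="utf-8")


print(filler_text(), end="")
```

The phone used here (SIL) also has to appear in the phoneset file.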
Another question
I used the online language modeling service to wrap the sentences of my text file in <s> and </s> by downloading its ".sent" file. Is there a command that does this offline as easily as the online service?
I read the acoustic training part of the tutorial, but it wasn't enough for me!
You need to ask a more detailed question, then.
I used the online language modeling service to wrap the sentences of my text file in <s> and </s> by downloading its ".sent" file. Is there a command that does this offline as easily as the online service?
SRILM does not require you to insert <s>; it adds them automatically. Otherwise, you can write a simple Python script.
Last edit: Nickolay V. Shmyrev 2016-10-09
OK, am I on the right track?
Yes
What's the next step?
Continue with speech data collection and training.
Do I have the phonetic dictionary and phoneset file for my project? Should I go forward to training my system with these two files to get my language model?
You have the dictionary; the phoneset must be compiled. The phoneset should list the phones.
What do you mean by this? How should I compile it? What is the list of phones?
Do you mean I cannot do my project with these 2 files?
Could you post a phoneset file here so I can see what it looks like?
Do I need the phoneset for the next steps?
This question is answered in the "data preparation" section of the acoustic model training tutorial.
Another question: what must I do with these symbols:
' , :
I didn't consider them when writing my phoneset mapping. Will this cause a problem?
Last edit: rezaee 2016-10-08
You need to provide the context: in which particular words do symbols like , or : occur? The symbol : usually means a prolonged phone, which you can include in the phoneset or ignore, depending on how frequently it occurs.
Why should we map the phonemes from espeak? Is it because it has a standard that CMUSphinx knows?
There are no standards, but there are rules described in the tutorial: spaces between phonemes, lowercase, no punctuation in phonemes. These make it easier for software to process the input files.
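The three rules can be checked mechanically before training. A minimal sketch, assuming each dictionary line is a word followed by space-separated phones; the function name and the message texts are made up for illustration:

```python
import string


def check_entry(line):
    """Return a list of rule violations found in one dictionary line."""
    parts = line.split()
    if len(parts) < 2:
        return ["entry needs a word followed by at least one phone"]
    problems = []
    for phone in parts[1:]:
        if phone != phone.lower():
            problems.append(f"phone {phone!r} is not lowercase")
        if any(ch in string.punctuation for ch in phone):
            problems.append(f"phone {phone!r} contains punctuation")
    return problems


# "word" is a placeholder; the phones show raw espeak symbols left unmapped.
for problem in check_entry("word tS 'A h"):
    print(problem)
```

An entry written with the mapped phones, such as "ch aa h aa r", passes all checks and returns an empty list.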
So the list above is my phoneset file, I think?
Yes
Unfortunately I don't know Python! Is there an existing script that adds <s> and </s> around sentences?
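A simple script of the kind mentioned earlier might look like the sketch below. It assumes the input has one sentence per line and is UTF-8 encoded, and that the desired output is one "<s> sentence </s>" line per sentence; the function names and the sample lines are placeholders.

```python
from pathlib import Path


def wrap_sentences(lines):
    """Wrap each non-empty line in <s> ... </s>, one sentence per line."""
    wrapped = []
    for line in lines:
        sentence = " ".join(line.split())  # collapse runs of whitespace
        if sentence:
            wrapped.append(f"<s> {sentence} </s>")
    return wrapped


def wrap_file(src, dst):
    """Read src (UTF-8, one sentence per line) and write wrapped lines to dst."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    Path(dst).write_text("\n".join(wrap_sentences(lines)) + "\n", encoding="utf-8")


# In-memory example with placeholder sentences; blank lines are skipped.
for line in wrap_sentences(["one two", "", "  three  "]):
    print(line)
```

To process a file, call wrap_file with the input and output paths, for example a plain text corpus and a ".sent" file of your choosing.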
hi
plz check your mail...