I am developing an Arabic acoustic model. During my research I found the
following:
The structure of the syllable in Arabic is, of course, based on the phonemic
system of Arabic. The peak, or nucleus, is always the most prominent element
of the Arabic syllable. It must be composed of a vowel, either long or short.
Literal Arabic has six syllable patterns: one open-short syllable, one
open-long syllable and four closed-long ones.
Arabic Syllable Patterns
Open Short : CV
Open Long : CVV
Closed Long : CVC, CVVC, CVCC, CVVCC
The Arabic syllable system has the following features:
1. The syllable must begin with a consonant followed by a vowel.
2. The syllable never begins with two consonants.
3. No one phoneme syllable exists in Arabic.
The CVCC pattern occurs only finally or in isolation, the CVVCC pattern occurs
only finally, while the other four patterns occur initially, medially and
finally.
The CV, CVV and CVC patterns occur much more often than others; CV being the
most frequent of all.
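A minimal sketch of the pattern inventory and the positional restriction described above, in Python. The names ALLOWED, FINAL_ONLY, is_valid_syllable and is_valid_word are made up for illustration and are not part of Sphinx; "in isolation" is treated here as a one-syllable word.

```python
# The six syllable shapes listed above (C = consonant, V = short vowel,
# VV = long vowel).
ALLOWED = {"CV", "CVV", "CVC", "CVVC", "CVCC", "CVVCC"}

# Shapes that, per the description above, occur only finally
# (a one-syllable word counts as final here).
FINAL_ONLY = {"CVCC", "CVVCC"}


def is_valid_syllable(shape, is_final):
    """True if `shape` is one of the six patterns and is allowed in this position."""
    if shape not in ALLOWED:
        return False
    if shape in FINAL_ONLY and not is_final:
        return False
    return True


def is_valid_word(shapes):
    """Check a whole word given as a list of syllable shapes, e.g. ["CV", "CVC"]."""
    last = len(shapes) - 1
    return bool(shapes) and all(
        is_valid_syllable(shape, index == last)
        for index, shape in enumerate(shapes)
    )
```

For example, is_valid_word(["CV", "CVC"]) holds, while ["CVCC", "CV"] is rejected because CVCC is not in final position.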
We can only begin (a sequence) with a vowel, just as we can only make a pause
with consonants, and two consonants cannot come together except at the pause
(which brings in a supporting sound) and when the first letter has a long
segment.
Rules of the Arabic Phonological System
The first basic rule that operates in the phonological system of Arabic
without exception is that the number of syllables in an utterance is equal to
the number of vowels. The issue, then, is not the number of syllables in an
utterance, since this is automatic, but rather the boundaries that are
signaled either by zero, one or two consonants.
The second basic rule of Arabic phonology is that the onset of the syllable
equals the beginning of an utterance. Thus, both can begin with a single
consonant.
The third rule is that the coda of the syllable is identical with the end of
an utterance, coinciding with the codas of the six syllable types previously
postulated.
Accordingly, syllables in Arabic can be either open or closed, i.e., they can
end in a vowel, or in one or two consonants, respectively.
Clearly, then, one should use the three rules just stated to begin the process
of segmentation in Arabic. When properly applied, these rules enable one to
segment almost any utterance in Arabic correctly and easily, for they make the
division between the coda and the onset of nearly all contiguous syllables
clear-cut.
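The rules above can be sketched as a small segmentation routine. This is only an illustration under assumptions: the phone symbols are a hypothetical romanization in which a long vowel is one symbol ("aa", "ii", "uu"), and the function name syllabify is invented. Each vowel is taken as one nucleus, and every syllable takes exactly one consonant as its onset, so any other consonants between two vowels go to the coda of the preceding syllable. Coda length is not checked.

```python
# Hypothetical vowel inventory: short vowels and long vowels as single symbols.
VOWELS = {"a", "i", "u", "aa", "ii", "uu"}


def syllabify(phones, vowels=VOWELS):
    """Split a list of phone symbols into syllables (lists of phones).

    Rule 1: one syllable per vowel.
    Rule 2: every syllable starts with exactly one consonant.
    Rule 3: whatever follows the last vowel is the final coda.
    """
    nuclei = [i for i, p in enumerate(phones) if p in vowels]
    if not nuclei:
        raise ValueError("no vowel, so no syllable")
    if nuclei[0] != 1:
        raise ValueError("must start with one consonant followed by a vowel")

    syllables = []
    start = 0
    for k, nucleus in enumerate(nuclei):
        if k + 1 < len(nuclei):
            next_nucleus = nuclei[k + 1]
            if next_nucleus == nucleus + 1:
                raise ValueError("adjacent vowels leave a syllable without onset")
            # The consonant right before the next vowel is that syllable's onset.
            end = next_nucleus - 1
        else:
            end = len(phones)
        syllables.append(phones[start:end])
        start = end
    return syllables


print(syllabify("m a d r a s a".split()))
```

With this romanization, "madrasa" comes out as mad / ra / sa and "kitaab" as ki / taab, and the number of syllables always equals the number of vowels.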
Do you think that I should follow a new training strategy rather than the one
used for English (the English language theoretically has about 300 syllables)?
Hello,
You explained the syllable system of Arabic, but what is the training
strategy you are talking about? It is not quite clear to me.
Thank you, Nicolay.
To be honest with you, right now I don't have a clear idea, but the first
question that came to my mind is: are the training algorithms used in Sphinx
tailored to the English language only, or are they independent of language?
If they are language dependent, then I might explore a new approach (not
concrete yet):
Training a syllable-based context-independent (CI) model on segmented
syllables, using a syllable-based lexicon as the dictionary.
The identification of syllables will be done by an algorithm that leverages
the linguistic rules of Arabic to segment words into prosodic syllables.
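As a rough sketch of what a syllable-based lexicon could look like: a Sphinx pronunciation dictionary has one entry per line, the word followed by its units, and the phone list names one unit per line and includes SIL. The helper names syllable_unit and build_lexicon, the underscore naming of syllable units, and the example words are all assumptions made for illustration.

```python
def syllable_unit(syllable):
    """Name one syllable as a single unit, e.g. ["t", "aa", "b"] -> "t_aa_b".

    The underscore convention is hypothetical, chosen only for this sketch.
    """
    return "_".join(syllable)


def build_lexicon(words):
    """Build dictionary lines and a unit list from pre-segmented words.

    `words` maps each word to its list of syllables, where every syllable
    is a list of phone symbols.
    """
    dict_lines = []
    units = set()
    for word in sorted(words):
        names = [syllable_unit(s) for s in words[word]]
        units.update(names)
        dict_lines.append(word + " " + " ".join(names))
    # The silence unit is added because the phone list is expected to have it.
    phone_list = sorted(units | {"SIL"})
    return dict_lines, phone_list


example = {
    "kataba": [["k", "a"], ["t", "a"], ["b", "a"]],
    "kitaab": [["k", "i"], ["t", "aa", "b"]],
}
lines, phones = build_lexicon(example)
print("\n".join(lines))
print(phones)
```

Note that the unit list grows with every distinct syllable in the vocabulary, which is how a phone list of about a thousand entries can arise.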
CMUSphinx technology doesn't depend on the language.
Your approach will work, but I don't see anything unique or specific to your
language in it. It is just another approach that will be better in some cases
and worse in others. It's interesting to try if you have the time and passion.
A syllable-based approach has some advantages for fast spontaneous speech, but
it's not that simple.
There are fundamental ASR issues that cause problems with accuracy, but
investigating them is a far more complex problem than just trying another
segmentation in the hope that it will be a few percent better. If you are
looking for a promising direction of research, you could start by defining
the issue you need to solve and then investigate the reasons behind it.
Dear hiyassat,
Did you generate a syllable-based Arabic acoustic model? If yes, does it have
better accuracy than a phoneme-based acoustic model?
Hi Nikolay,
I wrote a new algorithm to segment utterances into syllables, and I have about
1040 phones in the phone list. When I start training I get this error:
/home/hiyassat/tutorial/an4/bin/mk_mdef_gen \
-phnlstfn /home/hiyassat/tutorial/an4/etc/an4.phone \
-dictfn /home/hiyassat/tutorial/an4/etc/an4.dic \
-fdictfn /home/hiyassat/tutorial/an4/etc/an4.filler \
-lsnfn /home/hiyassat/tutorial/an4/etc/an4_train.transcription \
-ountiedmdef /home/hiyassat/tutorial/an4/model_architecture/an4.untied.mdef \
-n_state_pm 1
-help no no
-example no no
-phnlstfn /home/hiyassat/tutorial/an4/etc/an4.phone
-inCImdef
-triphnlstfn
-inCDmdef
-dictfn /home/hiyassat/tutorial/an4/etc/an4.dic
-fdictfn /home/hiyassat/tutorial/an4/etc/an4.filler
-lsnfn /home/hiyassat/tutorial/an4/etc/an4_train.transcription
-n_state_pm 3 1
-ocountfn
-ocimdef
-oalltphnmdef
-ountiedmdef /home/hiyassat/tutorial/an4/model_architecture/an4.untied.mdef
-minocc 1 1
-maxtriphones 100000 100000
INFO: main.c(92): Will write untied mdef file
/home/hiyassat/tutorial/an4/model_architecture/an4.untied.mdef
INFO: mk_mdef_gen.c(183): 0 single word triphones in input phone list
INFO: mk_mdef_gen.c(184): 0 word beginning triphones in input phone list
INFO: mk_mdef_gen.c(185): 0 word internal triphones in input phone list
INFO: mk_mdef_gen.c(186): 0 word ending triphones in input phone list
INFO: mk_mdef_gen.c(272): Reading dict /home/hiyassat/tutorial/an4/etc/an4.dic
INFO: mk_mdef_gen.c(304): 1800 words in dict
/home/hiyassat/tutorial/an4/etc/an4.dic
INFO: mk_mdef_gen.c(272): Reading dict
/home/hiyassat/tutorial/an4/etc/an4.filler
INFO: mk_mdef_gen.c(304): 3 words in dict
/home/hiyassat/tutorial/an4/etc/an4.filler
INFO: mk_mdef_gen.c(421): 1803 words in dictionary
INFO: mk_mdef_gen.c(422): 12220824 unique single word triphones in dictionary
INFO: mk_mdef_gen.c(423): 623271 unique word beginning triphones in dictionary
INFO: mk_mdef_gen.c(424): 2647 unique word internal triphones in dictionary
INFO: mk_mdef_gen.c(425): 523594 unique word ending triphones in dictionary
WARNING: "mk_mdef_gen.c", line 455: Out of vocabulary words in transcript will
be mapped to SIL!
INFO: mk_mdef_gen.c(561): 228502 words in transcripts
INFO: mk_mdef_gen.c(562): 8397 single word triphones in transcripts
INFO: mk_mdef_gen.c(563): 110962 word beginning triphones in transcripts
INFO: mk_mdef_gen.c(564): 189772 word internal triphones in transcripts
INFO: mk_mdef_gen.c(565): 110962 word ending triphones in transcripts
INFO: mk_mdef_gen.c(608): 420093 triphones extracted from transcripts
INFO: mk_mdef_gen.c(609): 6528 unique triphones extracted from transcripts
INFO: mk_mdef_gen.c(610): 350 triphones occur once in the transcripts
INFO: mk_mdef_gen.c(611): 191 triphones occur twice in the transcripts
INFO: mk_mdef_gen.c(612): 102 triphones occur thrice in the transcripts
INFO: mk_mdef_gen.c(613): The rest of the triphones occur more than three
times
INFO: mk_mdef_gen.c(614): Count threshold is 1
INFO: mk_mdef_gen.c(835): 1031 n_base, 6528 n_tri
FATAL_ERROR: "ckd_alloc.c", line 79: Calloc failed from itree.c(64)
Fri Dec 30 06:54:45 2011
I thought the problem was caused by -maxtriphones 100000.
I tried to increase it to 12220824 in mk_mdef_gen, but it did not build the
triphone list and started bw without building it.
Do you have any suggestions?
When training with syllables, I found that we have 12220824 unique single
word triphones in the dictionary, while the configuration file says
-maxtriphones 100000 100000
I tried to increase it to -maxtriphones 12220824 in mk_mdef_gen, but it did
not build the triphone list and started bw without building it.
Do you have any suggestions?
I'm not sure you need to build a tied-state acoustic model with syllables.
Instead, you might want to add more states to the HMM and just use CI models
for decoding.
As for the mk_mdef_gen bug, I need the data to reproduce it.
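To put numbers on the point about context-dependent models: with syllables as base units, the space of possible left/right contexts is far larger than with a phoneme set. The quick count below uses the n_base and n_tri figures from the log in this thread; the 40-unit phoneme set is only an assumed reference point. It does not establish that this is what triggers the Calloc failure.

```python
n_base = 1031          # base units, from "1031 n_base" in the log
seen_triphones = 6528  # from "6528 n_tri" in the log

# Every base unit can in principle appear with any left and right neighbour.
possible_contexts = n_base ** 3
print(possible_contexts)

# Fraction of that space that actually occurs in the transcripts.
coverage = seen_triphones / possible_contexts
print(coverage)

# Assumed comparison point: a phoneme set of about 40 units.
phoneme_contexts = 40 ** 3
print(possible_contexts // phoneme_contexts)
```

The possible context space is over a billion combinations against 64000 for the assumed phoneme set, while only a few thousand contexts are observed, which fits the advice to stay with CI models.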
Thank you, Nikolay, and happy new year.
I will try your suggestions.
Please find the resources at this link:
https://docs.google.com/open?id=0BwYORLUwIzfZNTY1MGJmMGQtZjNhYy00ZmJhLWE3MWUtNzY1ZTZhMTI2N2Y3
Dears,
Sorry for interrupting you. I'm facing the same issue :(
I think the failure in the mk_mdef_gen step is a memory issue. Did you monitor
the machine's RAM usage during the training process?
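One way to check the memory question, sketched in Python for a Unix system: run the failing step as a child process and read its peak resident memory afterwards. The function name run_and_report_peak is invented for this example; resource.getrusage reports ru_maxrss in kilobytes on Linux and in bytes on macOS.

```python
import resource
import subprocess
import sys


def run_and_report_peak(cmd):
    """Run `cmd`, wait for it, and return (exit_code, peak_rss).

    peak_rss is the largest resident set size among the child processes
    waited for so far, in the platform's ru_maxrss unit.
    """
    proc = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, peak


# Placeholder command; substitute the mk_mdef_gen invocation from the post.
code, peak = run_and_report_peak([sys.executable, "-c", "pass"])
print(code, peak)
```

If the reported peak comes close to the machine's physical RAM just before the Calloc failure, that would support the memory explanation.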
Dear Nikolay,
Sorry for disturbing you. Did you manage to download the files?
Yes, I did. Thanks for the data.