Hello,
I haven't found any info with Google, so I will try my luck here.
My question is about MAP adaptation when the dictionary sometimes has more than one phonetic variant per word. Does adaptation take this into account somehow? Or do I need a dictionary with just one phonetic variant per word?
I am asking because I am trying to recognize accented speech, and I will use a dictionary with added variants.
There are three options I could pursue:
1. Do adaptation with the baseline dictionary (1 variant per word) and apply my rules later.
2. Do adaptation with a new extended dictionary (approx. 1.3-1.5 variants per word). I am not sure whether that works with adaptation, which is also part of my question.
3. Do adaptation with a new dictionary to which the same rules as in 2. are applied, but without creating additional variants; instead the baseline variant is edited repeatedly according to the chosen rules (1 variant per word).
My rules are in the form of triphones, for example: every word containing the triphone "Y n s" will be changed to "I n s" (or, in the case of 2., a variant of that word will be created with "I n s").
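To make options 2 and 3 concrete, here is a rough Python sketch of how such a rule could be applied. It is only an illustration under assumptions: the rule is treated as a plain phone-sequence replacement, the dictionary is a mapping from word to a list of phone lists, the helper names (`apply_rule`, `rewrite_dictionary`, `extend_dictionary`, `format_entries`) are made up for this post, and the sample word and its phones are invented. Extra variants are numbered as `word(2)`, `word(3)`, and so on.

```python
def apply_rule(phones, source, target):
    """Replace every occurrence of the phone sequence `source` with `target`."""
    source = list(source)
    out = []
    i = 0
    n = len(source)
    while i < len(phones):
        if phones[i:i + n] == source:
            out.extend(target)
            i += n
        else:
            out.append(phones[i])
            i += 1
    return out


def rewrite_dictionary(lexicon, rules):
    """Option 3: edit the baseline pronunciation, keeping one variant per word."""
    result = {}
    for word, variants in lexicon.items():
        phones = list(variants[0])
        for source, target in rules:
            phones = apply_rule(phones, source, target)
        result[word] = [phones]
    return result


def extend_dictionary(lexicon, rules):
    """Option 2: keep the baseline and add a variant whenever a rule changes it."""
    result = {}
    for word, variants in lexicon.items():
        new_variants = [list(v) for v in variants]
        for source, target in rules:
            # Iterate over a snapshot so appending does not disturb the loop.
            for v in list(new_variants):
                changed = apply_rule(v, source, target)
                if changed not in new_variants:
                    new_variants.append(changed)
        result[word] = new_variants
    return result


def format_entries(lexicon):
    """Render dictionary lines, numbering extra variants as word(2), word(3), ..."""
    lines = []
    for word in sorted(lexicon):
        for idx, phones in enumerate(lexicon[word], start=1):
            label = word if idx == 1 else f"{word}({idx})"
            lines.append(f"{label} {' '.join(phones)}")
    return lines


# Hypothetical one-word dictionary and the single example rule from above.
baseline = {"example": [["E", "Y", "n", "s"]]}
rules = [(["Y", "n", "s"], ["I", "n", "s"])]

option2 = extend_dictionary(baseline, rules)
option3 = rewrite_dictionary(baseline, rules)
```

With this toy input, option 2 yields two entries for the word (the baseline plus a `(2)` variant), while option 3 yields a single edited entry.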
I would be very thankful if anyone could give me some advice.
Adaptation only considers dictionary variants if they are explicitly marked in the transcription file, like this:
<s> i have the(2) question </s> (utt1)
If your transcription has no pronunciation variants marked, you can align it with sphinx3_align to assign them in this form. The trainer does this during the force_align step, for example.
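If it helps to check which variants ended up in a transcript, a small Python sketch like the one below could split a marked line into words and variant numbers. It assumes tokens are separated by spaces, that `<s>` and `</s>` delimit the utterance, and that the line ends with an `(utt_id)` token; `parse_transcript_line` is a name made up here and is not part of any Sphinx tool.

```python
import re

# A word optionally followed by a variant number in parentheses, e.g. "the(2)".
TOKEN = re.compile(r"^(?P<word>.+?)(?:\((?P<variant>\d+)\))?$")


def parse_transcript_line(line):
    """Return (utt_id, [(word, variant_number), ...]) for one transcript line."""
    tokens = line.split()
    utt_id = None
    # The trailing "(utt1)"-style token is the utterance id, not a word.
    if tokens and tokens[-1].startswith("(") and tokens[-1].endswith(")"):
        utt_id = tokens.pop()[1:-1]
    words = []
    for tok in tokens:
        if tok in ("<s>", "</s>"):
            continue
        match = TOKEN.match(tok)
        # An unmarked word is taken to mean the first (baseline) pronunciation.
        variant = int(match.group("variant")) if match.group("variant") else 1
        words.append((match.group("word"), variant))
    return utt_id, words


utt_id, words = parse_transcript_line("<s> i have the(2) question </s> (utt1)")
```

Here an unmarked word is reported as variant 1, which is an assumption of this sketch about how to read the baseline entry.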
As for adaptation strategy, the phonetic representation of the transcripts must be as consistent with the audio as possible, both in terms of spectrum and with other usages of the same phoneme. So if you think that "Y n s" will be more consistent than "I n s", there is no need to change Y to I. Otherwise it is better to change it. Everything depends on how close the target Y is to the original Y and to the original I.
Thank you, I will run some experiments and see what gives the best results :)