In general, words in a training transcript don't have a pronunciation variant
appended (e.g., DID(2)). I've read some old notes online about creating
force-aligned transcripts. Could you please verify that the following steps
are correct and that I'm not missing anything?
1. Build the CI models using the training transcript without pronunciation variants.
2. Create the dictionary and filler dictionary used for forced alignment. The filler dictionary only includes <s>, </s>, and <sil>. Noise words such as ++AH++ are merged into the main dictionary.
3. Run s3align to generate the force-aligned transcript.
4. Use the force-aligned transcript to retrain starting from the CI models, and eventually to build CD models.
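Step 2 can be sketched as below. This is a hypothetical helper, not part of SphinxTrain: the function names are made up for illustration, and it assumes one dictionary entry per line in the form "WORD PHONE [PHONE ...]". It keeps only the silence entries in the filler dictionary and moves every other filler (such as ++AH++) into the main dictionary.

```python
# Hypothetical helper (not part of SphinxTrain): keep only the silence entries
# in the filler dictionary and move every other filler, such as ++AH++, into
# the main dictionary. Assumes one entry per line: "WORD PHONE [PHONE ...]".

SILENCE_WORDS = {"<s>", "</s>", "<sil>"}


def parse_dict(lines):
    """Return (word, phones) tuples, skipping blank lines."""
    entries = []
    for line in lines:
        parts = line.split()
        if parts:
            entries.append((parts[0], parts[1:]))
    return entries


def format_dict(entries):
    """Turn (word, phones) tuples back into dictionary lines."""
    return [f"{word} {' '.join(phones)}" for word, phones in entries]


def split_fillers(main_lines, filler_lines):
    """Return (new_main_lines, new_filler_lines)."""
    main = parse_dict(main_lines)
    fillers = parse_dict(filler_lines)
    kept = [e for e in fillers if e[0] in SILENCE_WORDS]
    moved = [e for e in fillers if e[0] not in SILENCE_WORDS]
    return format_dict(main + moved), format_dict(kept)


# Example with made-up entries.
main = ["DID D IH D", "DID(2) D AH D"]
filler = ["<s> SIL", "</s> SIL", "<sil> SIL", "++AH++ +AH+"]
new_main, new_filler = split_fillers(main, filler)
print(new_main)    # ['DID D IH D', 'DID(2) D AH D', '++AH++ +AH+']
print(new_filler)  # ['<s> SIL', '</s> SIL', '<sil> SIL']
```

The merged entries are simply appended; if your tools expect a sorted dictionary, sort the result before writing it out.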
I realized that the force-aligned transcript will not only mark pronunciation
variants but also insert noise words. How do I verify whether they are correct?
Or should I just trust that it works?
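One possible sanity check, sketched under assumptions: each transcript line is taken to look like "<s> WORD WORD ... </s> (utterance_id)", and a variant is marked with a numeric suffix such as DID(2). The hypothetical `check_alignment` function drops filler tokens, strips variant suffixes, confirms the word sequence still matches the original line, and confirms every variant the aligner chose actually exists in the dictionary.

```python
import re

# Assumed transcript format: "<s> WORD WORD ... </s> (utterance_id)".
VARIANT = re.compile(r"\(\d+\)$")


def tokens(line):
    """Split a transcript line into words, dropping a trailing (utt_id)."""
    words = line.split()
    if words and words[-1].startswith("(") and words[-1].endswith(")"):
        words = words[:-1]
    return words


def base_word(word):
    """Map a pronunciation variant such as DID(2) back to DID."""
    return VARIANT.sub("", word)


def check_alignment(original, aligned, fillers, dictionary):
    """Return a list of problems found in one aligned transcript line."""
    problems = []
    orig_words = [w for w in tokens(original) if w not in fillers]
    aligned_words = [w for w in tokens(aligned) if w not in fillers]
    if [base_word(w) for w in aligned_words] != orig_words:
        problems.append("word sequence differs from the original transcript")
    for w in aligned_words:
        if w not in dictionary:
            problems.append(f"{w} is not in the dictionary")
    return problems


# Example with made-up data.
fillers = {"<s>", "</s>", "<sil>"}
dictionary = {"DID", "DID(2)", "YOU", "++AH++"}
original = "<s> ++AH++ DID YOU </s> (utt1)"
aligned = "<s> <sil> ++AH++ DID(2) <sil> YOU </s> (utt1)"
print(check_alignment(original, aligned, fillers, dictionary))  # []
```

This only catches structural problems (changed words, unknown variants); it cannot tell whether the chosen variant is acoustically the right one.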
Thanks for looking.
Hi,
The steps are correct.
Because you left only silences in the filler dictionary, forced alignment will
insert ONLY SILENCES. It will not insert other fillers.
Yes, you need to trust it.
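The claim that only silences get inserted can still be spot-checked across a whole corpus. This is a hypothetical sketch assuming the same transcript format (words, then a trailing "(utterance_id)"): it strips variant suffixes, subtracts the original word counts from the aligned ones with `collections.Counter`, and reports any added token that is not in the filler set. It compares counts only, not word order.

```python
import re
from collections import Counter

VARIANT = re.compile(r"\(\d+\)$")


def words(line):
    """Words of one transcript line, without (utt_id) or variant suffixes."""
    parts = line.split()
    if parts and parts[-1].startswith("("):
        parts = parts[:-1]
    return [VARIANT.sub("", w) for w in parts]


def inserted_tokens(original, aligned):
    """Tokens the aligner added: aligned counts minus original counts."""
    return Counter(words(aligned)) - Counter(words(original))


def only_fillers_inserted(pairs, fillers):
    """pairs: iterable of (original_line, aligned_line) tuples."""
    total = Counter()
    for original, aligned in pairs:
        total += inserted_tokens(original, aligned)
    unexpected = {w: n for w, n in total.items() if w not in fillers}
    return total, unexpected


# Example with made-up data.
pairs = [("<s> DID YOU </s> (utt1)", "<s> <sil> DID(2) <sil> YOU </s> (utt1)")]
total, unexpected = only_fillers_inserted(pairs, {"<s>", "</s>", "<sil>"})
print(total)       # Counter({'<sil>': 2})
print(unexpected)  # {}
```

A non-empty `unexpected` would mean the aligner added something other than a filler-dictionary entry.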