forced-aligned transcripts

Speech Recognition Toolkit



Forum: Help

Created: 2011-05-13

Updated: 2012-09-22

Jake - 2011-05-13

In general, words in a training transcript don't have an appended pronunciation
variant marker (e.g., DID(2)). I've read some old notes online about creating
force-aligned transcripts. Could you please verify that the following steps are
correct and that nothing is missing?
1. Build the CI models using the training transcript without pronunciation variants.
2. Create the dictionary and filler dictionary used for forced alignment. The filler dictionary includes only <s>, </s>, and <sil>. Noise words such as ++AH++ are merged into the main dictionary.
3. Run s3align to generate the force-aligned transcript.
4. Use the force-aligned transcript to retrain, starting from the CI models and eventually building the CD models.
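
Step 2 could be sketched in Python roughly as follows. This is only an illustration under assumptions: both dictionaries are taken to be plain text with one `WORD PHONE ...` entry per line, the function name `split_fillers` is hypothetical, and the noise phone `+AH+` in the usage note is made up for the example.

```python
# Words that stay in the filler dictionary for forced alignment.
SILENCE_WORDS = {"<s>", "</s>", "<sil>"}


def split_fillers(dict_lines, filler_lines):
    """Build the dictionary and filler dictionary used for alignment.

    Silence entries stay in the filler dictionary; every other filler
    entry (noise words such as ++AH++) is appended to the main dictionary.
    Returns (alignment_dict, alignment_filler) as lists of lines.
    """
    align_dict = [ln.strip() for ln in dict_lines if ln.strip()]
    align_filler = []
    for ln in filler_lines:
        ln = ln.strip()
        if not ln:
            continue  # skip blank lines
        word = ln.split()[0]
        if word in SILENCE_WORDS:
            align_filler.append(ln)
        else:
            align_dict.append(ln)  # noise word becomes an ordinary word
    return align_dict, align_filler
```

For example, calling `split_fillers(["DID D IH D"], ["<sil> SIL", "++AH++ +AH+"])` leaves only the `<sil>` entry in the filler list and moves the `++AH++` entry into the main dictionary.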

I realized that the force-aligned transcript will not only mark pronunciation
variants but also insert noise words. How do I verify whether they are correct?
Or do I just trust that it will work?

Thanks for looking.


Nickolay V. Shmyrev - 2011-05-13

Hi

> the following steps are correct or not missing anything?

The steps are correct.

> I realized that the force-aligned transcript will not only mark
> pronunciation variant, but also insert noise words.

Because you left only silences in the filler dictionary, forced alignment will
insert ONLY SILENCES. It will not insert other fillers.

> Or I just trust it will work?

Yes, you need to trust it.
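
The alignment itself has to be taken on trust, but one thing that can be checked mechanically is that the aligner changed nothing except variant markers and silences. The sketch below is an assumption-laden illustration, not part of s3align: it assumes one utterance per line, variant markers of the form `WORD(2)`, and silence tokens `<s>`, `</s>`, and `<sil>`; the helper names `base_words` and `same_words` are hypothetical.

```python
import re

SILENCE_WORDS = {"<s>", "</s>", "<sil>"}
# Matches a word followed by a numeric variant marker, e.g. DID(2).
VARIANT = re.compile(r"^(.+)\(\d+\)$")


def base_words(line):
    """Return the words of a transcript line without silences or variant markers."""
    words = []
    for tok in line.split():
        if tok in SILENCE_WORDS:
            continue  # silences may be inserted freely by the aligner
        m = VARIANT.match(tok)
        words.append(m.group(1) if m else tok)
    return words


def same_words(original, aligned):
    """True if the aligned line differs only by variant markers and silences."""
    return base_words(original) == base_words(aligned)
```

Running `same_words` over each pair of original and force-aligned lines would flag any utterance where a word was dropped, added, or replaced, which is worth a manual look before retraining.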
