I am looking to compare the forced alignments I generated for the TIMIT
dataset with the ground truth provided in the corpus. However, the 'text'
file from the data preparation step and the PHN files provided in the
corpus (which hold the ground truth) provide differing phoneme sequences
for each utterance. For instance, take the sample utterance FAEM0_SX42:
phoneme sequence in the *text* file:
sil b ih vcl b l ih cl k el s cl k aa l er z sil aa r vcl g y uw hh ih s cl
t r iy sil
phoneme sequence in *PHN* file (ground truth):
h# b ih bcl b l ih kcl k el s kcl k aa l er z pau q aa r gcl g y ux hv ih s
tcl t r iy h#
As you can see, the two sequences differ in several phonemes, even when the
h#/sil labels are disregarded.
1. Is this normal? How can I accurately test the validity of my alignments
when the ground truth specifies different phoneme sequences than my
generated alignments?
2. Is there a script that would provide the phoneme error rate for the
generated alignments?
3. What kind of metric can I use to compare my forced alignments to the
ground truth?
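
To make the comparison concrete, here is a minimal sketch of the kind of check I have in mind: fold the PHN labels onto the labels used in the *text* file, then compute a Levenshtein-based phoneme error rate. The mapping table is an assumption inferred only from the example utterance above (it is not taken from the recipe and covers only the labels seen here), and the function names are hypothetical.

```python
# Assumed label folding, inferred only from FAEM0_SX42 above.
# None means the label is dropped before comparison.
ASSUMED_MAP = {
    "h#": "sil", "pau": "sil",
    "bcl": "vcl", "gcl": "vcl",
    "kcl": "cl", "tcl": "cl",
    "ux": "uw", "hv": "hh",
    "q": None,
}


def normalize(phones, mapping=ASSUMED_MAP):
    """Map each label through the table; unknown labels pass through."""
    out = []
    for p in phones:
        mapped = mapping.get(p, p)
        if mapped is not None:
            out.append(mapped)
    return out


def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def phone_error_rate(ref, hyp):
    """Edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference sequence is empty")
    return edit_distance(ref, hyp) / len(ref)


text_seq = ("sil b ih vcl b l ih cl k el s cl k aa l er z sil aa r vcl g y uw "
            "hh ih s cl t r iy sil").split()
phn_seq = ("h# b ih bcl b l ih kcl k el s kcl k aa l er z pau q aa r gcl g y "
           "ux hv ih s tcl t r iy h#").split()

print("raw edit distance:", edit_distance(phn_seq, text_seq))
print("PER after folding:", phone_error_rate(normalize(phn_seq), text_seq))
```

With this assumed folding the two sequences for FAEM0_SX42 become identical, so the mismatch in this utterance looks like a difference in label sets rather than in content. This only compares label sequences; it says nothing about how close the aligned boundaries are in time.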