Daniel Wolf - 2020-11-09

I'm thinking of training a custom acoustic model. The dictionary contains words with multiple pronunciations, like this:

either a ɪ ð ə
either(2) i ð ə

Now let's suppose one of the training samples is the phrase "You say either [i ð ə] and I say either [a ɪ ð ə]". What should the transcript file contain? Is the trainer smart enough to determine the correct pronunciation from the audio, so that the transcript can simply be "<s> you say either and i say either </s>"? Or do I need to give it the exact word alternatives, like this: "<s> you say either(2) and i say either </s>"?

 

Last edit: Daniel Wolf 2020-11-09