I'm thinking of training a custom acoustic model. The dictionary contains words with multiple pronunciations, like this:
either aɪðə
either(2) iðə
Now let's suppose one of the training samples is the phrase "You say either [i ð ə] and I say either [a ɪ ð ə]". What should the transcript file contain? Is the trainer smart enough to determine the correct pronunciation from context, so that the transcript can be "<s> you say either and i say either </s>"? Or do I need to give it the exact word alternatives, like this: "<s> you say either(2) and i say either </s>"?
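To make the difference between the two candidate transcripts concrete, here is a minimal Python sketch. It is only an illustration: the function names are hypothetical and not part of any training tool, and it assumes that a trailing "(N)" on a token marks the N-th dictionary alternative, as in the entries above. It says nothing about which form the trainer actually expects.

```python
import re

# Assumption: a trailing "(N)" marks the N-th pronunciation alternative,
# matching the "either(2)" dictionary entry shown above.
ALT_MARKER = re.compile(r"^(?P<word>.+?)\((?P<index>\d+)\)$")


def split_token(token):
    """Return (base_word, alternate_index); the index is 1 when no marker is present."""
    match = ALT_MARKER.match(token)
    if match:
        return match.group("word"), int(match.group("index"))
    return token, 1


def describe_transcript(line):
    """Return (base_word, alternate_index) for each word, skipping <s> and </s>."""
    return [split_token(t) for t in line.split() if t not in ("<s>", "</s>")]


# The two transcript candidates from the question.
plain = "<s> you say either and i say either </s>"
explicit = "<s> you say either(2) and i say either </s>"

for name, line in (("plain", plain), ("explicit", explicit)):
    print(name, describe_transcript(line))
```

Both transcripts contain the same base words; they differ only in whether the first "either" is pinned to alternative 2 or left unmarked.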
Last edit: Daniel Wolf 2020-11-09