From: Nathan D. <nd...@ca...> - 2013-02-26 21:30:19
I'll definitely need children's corpora regardless. However, I'm wondering if I can skip the decoding step so that I will not need a language / word model, as I am only trying to match phonemes, not actual words. Is there a way to do that with this system? My intuition is "no", and even if there were a way I would probably be losing valuable statistics, but I would be happy to be wrong.

Thanks,

Nathan

On Feb 26, 2013, at 1:18 PM, Daniel Povey wrote:

> It seems to me that what you need to do is to create a suitable language model for sequences of phones. E.g. get examples of the kind of phone sequences that children doing these exercises will typically produce, and build a language model on those the same way you would for word sequences. You could accomplish this using a lexicon that was trivial, with one word for each phone.
> It will be difficult to get good results without matched training data, as children's speech is quite different from adults'.
> Dan
>
>
> On Tue, Feb 26, 2013 at 4:15 PM, Nathan Dunn <nd...@ca...> wrote:
>
> I'm trying to create a tool to recognize spoken phonemes for children's reading comprehension, i.e., children speaking phonemes only, not whole words and of course not sentences.
>
> After looking a bit more, it looks like there are a couple of good options:
>
> 1 - (thanks Dan) Create a lexicon consisting of just phones that you can use at test time, removing the word-position dependency.
> 2 - Extract phones directly from transitions prior to word alignment (i.e., directly from the acoustic model).
>
> For #2, I would worry that the lack of information might be problematic. The advantage is that I only need enough data for the acoustic model. Anyway, I would be very happy to share whatever I come up with.
>
> Any thoughts on this would be helpful.
>
> Thanks,
>
> Nathan Dunn, PhD.
> 541-221-2418
> CAS Scientific Programmer
> http://blogs.uoregon.edu/casspr/
> nd...@ca...
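
To make Dan's suggestion concrete, below is a minimal sketch (not from the thread) of how one might generate the trivial one-"word"-per-phone lexicon and a simple unigram phone language model in ARPA format from example phone sequences. The input file name (phone_sequences.txt), output file names, and the assumption of one space-separated phone sequence per line are all hypothetical choices for illustration; the phone set itself would come from whatever corpus is actually used.

    # Sketch: build a trivial one-"word"-per-phone lexicon and a unigram
    # phone LM in ARPA format from example phone sequences.
    # Assumed input (hypothetical): phone_sequences.txt, one space-separated
    # phone sequence per line, e.g.
    #   k ae t
    #   sh ih p
    # Outputs: lexicon.txt (each "word" is a phone that maps to itself)
    # and phone_lm.arpa.

    import math
    from collections import Counter

    def build_lexicon(phones, path="lexicon.txt"):
        # Each phone becomes a "word" whose pronunciation is itself.
        with open(path, "w") as f:
            for p in sorted(phones):
                f.write(f"{p} {p}\n")

    def build_unigram_arpa(sequences, path="phone_lm.arpa"):
        counts = Counter()
        for seq in sequences:
            counts.update(seq)
            counts["</s>"] += 1          # one sentence-end per sequence
        total = sum(counts.values())
        with open(path, "w") as f:
            f.write("\\data\\\n")
            f.write(f"ngram 1={len(counts) + 1}\n\n")  # +1 for <s>
            f.write("\\1-grams:\n")
            f.write("-99\t<s>\n")        # <s> is context only, never predicted
            for tok, c in sorted(counts.items()):
                f.write(f"{math.log10(c / total):.6f}\t{tok}\n")
            f.write("\n\\end\\\n")

    if __name__ == "__main__":
        with open("phone_sequences.txt") as f:
            sequences = [line.split() for line in f if line.strip()]
        phones = {p for seq in sequences for p in seq}
        build_lexicon(phones)
        build_unigram_arpa(sequences)

The resulting lexicon.txt and phone_lm.arpa would then be fed through the usual dictionary and language-model preparation steps of whichever Kaldi recipe is being adapted (e.g. utils/prepare_lang.sh followed by the recipe's LM-formatting step; exact script names depend on the Kaldi version). Option #2 from the thread, reading phones straight off the alignments or acoustic model, would avoid the LM entirely but, as Nathan notes, gives up the phone-sequence statistics that the LM provides.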