Re: [Kaldi-developers] methods for phoneme recognitions

I don't think you can ever do without a language model, but you could start
off with a simple phone bigram trained on phone sequences extracted from
words.
Dan
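
As a rough illustration of the phone-bigram suggestion above, here is a minimal Python sketch that expands words into phone sequences and estimates bigram probabilities with add-one smoothing. The toy pronunciation dictionary, the function names, and the smoothing choice are all assumptions made for the example and do not come from the thread or from Kaldi. A real setup would more likely use an existing language-modeling toolkit and produce a model in a format the decoder can read.

```python
from collections import Counter, defaultdict

# Hypothetical toy pronunciation dictionary (word -> phone sequence).
PRON = {
    "cat": ["k", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "tab": ["t", "ae", "b"],
}


def phone_sequences(words, pron):
    """Yield one phone sequence per word, wrapped in boundary markers."""
    for w in words:
        yield ["<s>"] + pron[w] + ["</s>"]


def train_bigram(sequences, add_k=1.0):
    """Return P(next | prev) as nested dicts, using add-k smoothing."""
    sequences = list(sequences)
    # Every symbol that can follow another one ("<s>" only ever starts).
    vocab = sorted({p for seq in sequences for p in seq if p != "<s>"})
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, following in counts.items():
        total = sum(following.values()) + add_k * len(vocab)
        model[prev] = {p: (following[p] + add_k) / total for p in vocab}
    return model


model = train_bigram(phone_sequences(["cat", "bat", "tab"], PRON))
print(model["ae"]["t"], model["ae"]["b"])
```

With this toy data, "t" follows "ae" twice and "b" follows it once, so the model gives "t" the higher probability after "ae", and each context's distribution sums to one.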

On Tue, Feb 26, 2013 at 4:29 PM, Nathan Dunn <nd...@ca...> wrote:

>
> I'll definitely need children's corpora regardless.
>
> However, I'm wondering whether I can skip the decoding step so that I do
> not need a language / word model, since I am only trying to match phonemes,
> not actual words. Is there a way to do that with this system? My
> intuition is "no", and that if there were I would probably be losing
> valuable statistical information, but I would be happy to be wrong.
>
> Thanks,
>
> Nathan
>
> On Feb 26, 2013, at 1:18 PM, Daniel Povey wrote:
>
> It seems to me that what you need to do is to create a suitable language
> model for sequences of phones.  E.g. get examples of the kind of phone
> sequences that children doing these exercises will typically produce, and
> build a language model on those the same way you would for word sequences.
>  You could accomplish this using a lexicon that was trivial, with one word
> for each phone.
> It will be difficult to get good results without matched training data, as
> children's speech is quite different from adults'.
> Dan
>
>
> On Tue, Feb 26, 2013 at 4:15 PM, Nathan Dunn <nd...@ca...> wrote:
>
>>
>> I'm trying to create a tool to recognize spoken phonemes for
>> children's reading comprehension, i.e., children speaking phonemes only,
>> not words and certainly not sentences.
>>
>> After looking a bit more, it looks like there are a couple of good options:
>>
>> 1 - (thanks Dan) Create a lexicon consisting of just phones that you can
>> use at test time, removing the word-position dependency.
>> 2 - Extract phones directly from transitions prior to word alignment
>> (i.e., directly from the acoustic model).
>>
>> For #2, I would worry that the lack of information might be problematic.
>> The advantage is that I only need enough data for the acoustic model.
>> Anyway, I would be very happy to share whatever I come up with.
>>
>> Any thoughts on this would be helpful.
>>
>> Thanks,
>>
>> Nathan Dunn, PhD.
>> 541-221-2418
>> CAS Scientific Programmer
>> http://blogs.uoregon.edu/casspr/
>> nd...@ca...
>>
>>
>>
>
>