Re: [Kaldi-developers] Multistream ASR

Hi Daniel,

Thanks, we are working on that.

Best,

Felipe Espic

Quoting Daniel Povey <dp...@gm...>:

> The scoring is normally based on words so the confusion matrix is output as
> a sequence of words.  There are ways to do what you want, involving the
> program ali-to-phones, that would involve aligning the training data with
> steps/align_fmllr.sh or align_si.sh and comparing with the best alignment
> from the decode, then putting it into compute-wer and asking it to output
> the detailed information.  But I don't have time right now to explain it in
> detail.
> Dan
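
The comparison Dan outlines (reference phones from the training-style alignment versus phones from the best decoding path) can be sketched outside Kaldi once both phone sequences are available as text. The following is only a minimal illustration, not Kaldi code: it assumes the phone sequences have already been dumped per utterance (for example from ali-to-phones output) and loaded into Python dictionaries mapping utterance IDs to lists of phone symbols. The names `align`, `confusion_counts` and the `<eps>` placeholder are hypothetical. It aligns each pair with a plain edit-distance alignment and counts (reference, hypothesis) phone pairs.

```python
from collections import Counter

EPS = "<eps>"  # hypothetical placeholder marking an insertion or deletion


def align(ref, hyp):
    """Edit-distance alignment of two phone sequences.

    Returns a list of (ref_phone, hyp_phone) pairs, using EPS where a
    phone was deleted (hyp side) or inserted (ref side).
    """
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], EPS))  # deletion
            i -= 1
        else:
            pairs.append((EPS, hyp[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


def confusion_counts(ref_by_utt, hyp_by_utt):
    """Accumulate (ref_phone, hyp_phone) counts over all utterances."""
    counts = Counter()
    for utt, ref in ref_by_utt.items():
        counts.update(align(ref, hyp_by_utt.get(utt, [])))
    return counts


if __name__ == "__main__":
    refs = {"utt1": ["k", "ae", "t"], "utt2": ["d", "ao", "g"]}
    hyps = {"utt1": ["k", "eh", "t"], "utt2": ["d", "ao"]}
    for (r, h), c in sorted(confusion_counts(refs, hyps).items()):
        print(r, h, c)
```

Off-diagonal entries such as ("ae", "eh") are the confusions of interest; pairs involving `<eps>` are deletions and insertions. Grouping phones into classes before counting would give the "types of phonemes" view asked about in the quoted message.
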
>
>
>
> On Mon, Mar 17, 2014 at 1:39 PM, <fe...@in...> wrote:
>
>> Hi Daniel,
>>
>> Thanks for your quick reply.
>>
>> We want to use confusion matrices to see which phonemes (or types of
>> phonemes) are misclassified.
>>
>> Is there any other way you can suggest to do this?
>>
>> Thanks,
>>
>> Felipe Espic
>>
>>
>>
>>
>> Quoting Daniel Povey <dp...@gm...>:
>>
>>> Hi,
>>> There is no explicit support for multi-stream ASR in Kaldi; you'll have to
>>> try to understand the codebase and code something yourself [although if you
>>> build separate models with the same tree, you can use the DecodableSum
>>> class to help you decode with scores summed over the models; you'll need to
>>> write code for this though.]
>>> Regarding a phone confusion matrix: if you build a system to decode phones,
>>> I think the program compute-wer has an option to output confusion data, but
>>> I doubt it is in the format you want.  However, I would advise against
>>> this.  Phone confusion matrices are a little old-fashioned.
>>> Dan
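
The idea of decoding with scores summed over models can be illustrated independently of Kaldi. The sketch below is not the DecodableSum class and does not use any Kaldi API; it only shows the arithmetic under stated assumptions: each stream's model yields a per-frame list of log-likelihoods indexed by the same pdf IDs (which is why the models need to share a tree), and the function name `sum_stream_scores` and the per-stream weights are hypothetical.

```python
def sum_stream_scores(stream_loglikes, weights):
    """Weighted sum of per-frame log-likelihoods across streams.

    stream_loglikes: one entry per stream, each a list of frames, each
        frame a list of log-likelihoods indexed by pdf ID.
    weights: one scale factor per stream.
    Returns a single frames-by-pdfs list of combined scores.
    """
    if len(stream_loglikes) != len(weights):
        raise ValueError("need exactly one weight per stream")
    num_frames = len(stream_loglikes[0])
    num_pdfs = len(stream_loglikes[0][0]) if num_frames else 0
    for stream in stream_loglikes:
        # All streams must agree on frame count and pdf indexing.
        if len(stream) != num_frames or any(len(f) != num_pdfs for f in stream):
            raise ValueError("streams must have matching shapes")

    combined = []
    for t in range(num_frames):
        frame = [
            sum(w * stream[t][pdf] for stream, w in zip(stream_loglikes, weights))
            for pdf in range(num_pdfs)
        ]
        combined.append(frame)
    return combined


if __name__ == "__main__":
    stream_a = [[-1.0, -2.0], [-0.5, -3.0]]  # 2 frames, 2 pdfs
    stream_b = [[-3.0, -1.0], [-1.5, -1.0]]
    print(sum_stream_scores([stream_a, stream_b], [0.5, 0.5]))
```

A decoder would then consume the combined scores in place of a single model's acoustic scores; how the weights are chosen is left open here.
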
>>>
>>>
>>>
>>> On Mon, Mar 17, 2014 at 9:20 AM, <fe...@in...> wrote:
>>>
>>>> Dear Sirs,
>>>>
>>>> I am with the Speech Processing and Transmission Lab at the University
>>>> of Chile.
>>>> We are working on multistream speech recognition in Kaldi, and we
>>>> have a couple of questions:
>>>>
>>>> - We want to create a phoneme confusion matrix to assess the
>>>> performance of the acoustic features alone. How could we address this in
>>>> Kaldi? I think we have to build a phoneme recognizer (without word-position
>>>> dependency), so we read these posts
>>>> http://sourceforge.net/p/kaldi/discussion/1355348/thread/51258bf4/
>>>> and http://sourceforge.net/p/kaldi/discussion/1355348/thread/2294d269/
>>>> from 2013, but we did not find any specific solution.
>>>>
>>>> - Is there any recipe for multistream ASR in Kaldi? Any help with this?
>>>>
>>>>
>>>> Best Regards,
>>>>
>>>> Felipe Espic