Re: [Kaldi-users] Poll about Kaldi


I agree with this :-)
1. The ultimate goal I can think of is a playable dialogue system in
Kaldi. For this, implementing a TTS module is essential.
2. Besides, I am curious whether there will be a Kaldi workshop in the
future, so that more people can volunteer to learn and to develop
specific modules.

Thanks !

Haihua

On Sat, Oct 25, 2014 at 1:05 AM, Matthew Aylett <mat...@gm...>
wrote:

> Finishing the kaldi TTS would be high up my list :-)
>
> Otherwise the main feedback I get from designers and HCI professionals,
> whom I think speech technology needs to engage with, concerns the
> interactive and portable nature of an ASR system.
>
> A lot of people I know are using PocketSphinx, and I think a pocket Kaldi
> would be really useful. However, that would mean a load of not very
> research-like work:
>
> i.e. porting a realtime decoder to Android/iOS, allowing arbitrary halting
> and restarting (in response to feedback), and a simple means of making open
> and closed language models which could be used by engineers outside
> speech technology.
>
> Also some binary releases that don't require compilation (more for the
> decoder than the training I would say), and some decent models built on
> data like Librivox etc. which could be freely shared. I think I saw some
> posts about work on this but it would be interesting to know what sort of
> WER a non-technical person might get from such a freely available model.
> For these people WER is not nearly as important as usability and
> flexibility.
>
> Maybe some of this is already available, in which case a doc page or so
> called Kaldi for Dummies would be really useful for non-ASR people like me.
>
> Matthew
>
> On Fri, Oct 24, 2014 at 4:15 PM, Nagendra Goel <nag...@go...> wrote:
>
>> I second the VAD proposal. I feel that something that
>>
>> 1)      Looks at the “max” or “sum” of the likelihoods of ALL the silence
>> phones, including NSN and BRH, and compares it with a configurable
>> threshold (no decoding here)
>>
>> 2)      Places a constraint on minimum (configurable) silence length
>>
>> 3)      Shrinks the silence region boundaries by a (configurable) number
>> of frames
>>
>> 4)      Inverts the resulting silence frame results to get voice results
>>
>>
>>
>> will suffice as a first round of VAD. The compute is already small because
>> we are not calculating all the likelihoods and not decoding anything. This
>> could probably also be implemented in the online code. It would have the
>> added benefit that online adaptation would not happen during these silence
>> portions. It is important to squelch adaptation completely during long
>> stretches of silence, such as when two people are conversing and the ASR
>> is listening to only one channel.
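A rough sketch of the four steps listed above, in plain Python rather than Kaldi code. Everything named here is an assumption made for illustration: the function names `pool_silence` and `simple_vad`, the layout of the per-frame likelihoods (one list of per-phone scores per frame), the choice to shrink each silence region on both sides, and the decision to treat silence runs shorter than the minimum length as voice.

```python
from typing import List, Sequence


def pool_silence(frame_likelihoods: Sequence[Sequence[float]],
                 silence_ids: Sequence[int],
                 mode: str = "max") -> List[float]:
    """Step 1a: pool the likelihoods of all silence phones per frame.

    `silence_ids` are the (hypothetical) column indices of the silence
    phones, e.g. SIL, NSN and BRH.
    """
    if mode not in ("max", "sum"):
        raise ValueError("mode must be 'max' or 'sum'")
    pooled = []
    for frame in frame_likelihoods:
        values = [frame[i] for i in silence_ids]
        pooled.append(max(values) if mode == "max" else sum(values))
    return pooled


def simple_vad(silence_scores: Sequence[float],
               threshold: float,
               min_silence_frames: int,
               shrink_frames: int) -> List[bool]:
    """Return one flag per frame: True = voice, False = silence."""
    n = len(silence_scores)
    # Step 1b: compare the pooled silence score with the threshold.
    is_silence = [score > threshold for score in silence_scores]
    # Step 4 is folded in: start from "all voice" and carve out silence.
    voice = [True] * n
    run_start = None
    for t in range(n + 1):
        if t < n and is_silence[t]:
            if run_start is None:
                run_start = t
        elif run_start is not None:
            run_end = t  # exclusive end of the silence run
            # Step 2: ignore silence runs that are too short.
            if run_end - run_start >= min_silence_frames:
                # Step 3: shrink the region on both sides.
                for i in range(run_start + shrink_frames,
                               run_end - shrink_frames):
                    voice[i] = False
            run_start = None
    return voice


if __name__ == "__main__":
    # Made-up pooled silence scores for nine frames.
    scores = [0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.9, 0.1]
    print(simple_vad(scores, threshold=0.5,
                     min_silence_frames=3, shrink_frames=1))
```

Because the output has one flag per input frame, the frame numbering stays unchanged, so the flags could in principle be used to gate both decoding and online adaptation.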
>>
>>
>>
>>   I have also been using modified scripts for constructing G.fst that
>> allow construction of grammars for dates, numbers and the like, while
>> using a regular trigram LM for everything else. The arcs corresponding to
>> dates and numbers are replaced with these grammars. However, this workflow
>> probably needs more research, especially when dealing with multiple LM data
>> sources and interpolation of LMs. I wonder if this is a Kaldi topic, or
>> somebody’s research topic.
>>
>>
>>
>> *From:* Ondrej Platek [mailto:ond...@gm...]
>> *Sent:* Friday, October 24, 2014 7:03 AM
>> *To:* Daniel Povey
>> *Cc:* kal...@li...
>> *Subject:* Re: [Kaldi-users] Poll about Kaldi
>>
>>
>>
>> Dear Dan,
>>
>>
>>
>> My wishlist:
>>
>> 1) VAD
>>
>> 2) An easy framework for experimenting with DNNs.
>>
>> 3) Standardize Kaldi special semirings (CompactLattice, Lattice) so the
>> standard OpenFst tools would be more usable (or extend the tools)
>>
>>
>>
>> Detailed description:
>>
>>
>>
>> Regarding VAD
>>
>> ============
>>
>> I will try to describe the motivation why I want to implement VAD in
>> Kaldi.
>>
>>
>>
>> The two reasons why I need VAD in general:
>>
>> a) VAD in our dialogue systems separates turns. After the detected speech
>> finishes, we consider taking our turn
>>
>> => VAD is useful in dialogue systems
>>
>> b) VAD is a simpler component than the decoder, and its RTF (real time
>> factor) should be significantly smaller.
>>
>> In a production setup with thousands of channels, it saves a lot of
>> computation if the VAD RTF is under 0.1 (easy to achieve)
>>
>> => VAD is useful in production
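As a back-of-the-envelope illustration of the RTF argument above: RTF is processing time divided by audio duration, so the total load of many live channels grows roughly as channels times RTF. The sketch below is plain Python, the function names are hypothetical, the numbers are made up, and it assumes the load simply adds up linearly across channels.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def aggregate_load(channels: int, rtf: float) -> float:
    """Approximate number of fully busy cores needed to keep up with
    `channels` live streams, assuming load adds linearly (an assumption)."""
    return channels * rtf


if __name__ == "__main__":
    # Hypothetical measurement: 3 s of compute for 30 s of audio.
    rtf = real_time_factor(processing_seconds=3.0, audio_seconds=30.0)
    print("RTF:", rtf)
    print("Core-equivalents for 1000 channels:", aggregate_load(1000, rtf))
```

Under these assumptions, every channel on which the VAD reports silence costs only the VAD's share of compute instead of the decoder's.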
>>
>>
>>
>> The reason why I would like to implement VAD in Kaldi:
>>
>> 1) The VAD would work on top of features (e.g. mfcc, fbanks, ...) and I
>> want to reuse them, so I want to use Kaldi code for computing the mfcc
>> features
>>
>> 2) The "recognizer", which would use one of the Kaldi decoders and use VAD
>> in the preprocessing steps, should integrate them seamlessly:
>>
>> The idea: the frame numbering should be kept the same, and the silence
>> frames should be marked as silence (mainly) by the VAD.
>>
>> 3) The evaluation of the recognizer with an additional VAD component will
>> be done exactly as for the current non-VAD recognizers, e.g.
>> gmm-latgen-faster, mapped-latgen-faster.
>>
>> The "sil" words will not be counted as errors, and the goal will obviously
>> be to keep the same WER but reduce the RTF.
>>
>>
>>
>> Note: We have VAD in Python using DNN and GMM (DNN is much better) and
>>
>> I am willing to spend MY TIME on implementing this in Kaldi.
>>
>>
>>
>>
>>
>> Regarding toolkit for DNN experiments
>>
>> ================================
>>
>> I want to use DNN models, since they seem to be the best, and to try
>> different ones easily:
>>
>> This toolkit seems like a good choice for me right now:
>>
>> http://www.cs.cmu.edu/~ymiao/pdnntk.html
>>
>> If the author keeps this compatible with Kaldi, or if it is integrated
>> into Kaldi,
>>
>> it would be very helpful for doing easy experiments.
>>
>>
>>
>>
>>
>> OpenFST
>>
>> =======
>>
>> It is a rather technical wish, but I was planning to extend pyfst (a
>> Python wrapper) with the Kaldi semirings
>>
>> so that I can easily work with the ASR hypotheses. So far I am only using
>> the OpenFST tropical and log semirings
>>
>> for word lattices in https://github.com/UFAL-DSG/pyfst
>>
>> Note: I do not have time for that right now, but I think this would be
>> useful.
>>
>>
>>
>> Thank you for the poll
>>
>>
>>
>> On 24 October 2014 01:29, Daniel Povey <dp...@gm...> wrote:
>>
>> Hey everyone,
>>
>>
>>
>> I'm thinking of creating a poll to ask people what things they would like
>> improved about Kaldi.
>>
>> I'm sending out this email to get ideas about what we could include in
>> the poll.
>>
>> I'm deliberately not giving you guys examples of what kind of thing I
>> expect, because I don't want to artificially limit the scope at this
>> point.  I'm not necessarily just talking about features - it can encompass
>> wider wishes and concerns.
>>
>> Any ideas?
>>
>>
>>
>>
>>
>> Dan
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Kaldi-users mailing list
>> Kal...@li...
>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
>>
>