From: Nagendra G. <nag...@go...> - 2015-01-14 11:35:48
|
I have seen work on syllables (as opposed to phonemes), and there were
some publications from IBM in the '90s where they joined some word pairs
into a new lexicon entry and it helped (I think on a voice mail task).

On Jan 13, 2015 6:49 PM, "Nickolay Shmyrev" <nsh...@gm...> wrote:

> > On Jan 14, 2015, at 2:37, <Dan...@pa...> <Dan...@pa...> wrote:
> >
> > Hello Nickolay,
> >
> > Thanks very much for your thoughtful answer. My context was that I
> > wondered whether there might occasionally be an advantage to mapping
> > words to word phrases in G rather than assigning probabilities to
> > words. I assumed that someone had tried it and it was known not to
> > work well, since no one seemed to do it. I couldn't find a record of
> > anyone trying it, so I thought I'd ask.
>
> In that context it's probably worth describing how recognition works.
> Many newcomers are confused about this, and you might be too. People
> imagine that audio is converted to phones, then phones are converted to
> words, and then words are converted to phrases. It is not like that,
> because there are many ways to do such a conversion. Phone boundaries
> are blurred, and often you cannot easily decide which phone corresponds
> to which word. Consider the famous «wreck a nice beach» example, which
> can be confused with «recognize speech». You cannot make a local
> conversion decision; you need a global 1-best result.
>
> So instead of following that straightforward process, we consider all
> possible conversions and select the one with the globally minimal
> weight. Decoding is therefore not a straightforward transducer
> application but a scoring of all possible paths with an acceptor. This
> is where the acceptor is required and where you need to assign
> probabilities to results.
>
> The decoding result is not
>
>     G(L(audio))
>
> but, in simplified form,
>
>     min_{over all possible audio splits} G(L(audio split))
>
> This is not a good discussion for the kaldi-developers mailing list;
> maybe we can move it off-list.
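
As a rough illustration of the "global minimum over all splits" idea described in the quoted reply, here is a minimal Python sketch. It is not how Kaldi decodes: the lexicon, the simplified pronunciations (chosen so that both phrases match the same phone string), the unigram costs standing in for G, and the function name decode are all hypothetical, and acoustic scores are left out entirely.

```python
import math

# Toy lexicon L: hypothetical, deliberately simplified pronunciations so
# that both phrases cover the same phone string.
LEXICON = {
    "wreck": ("r", "eh", "k"),
    "a": ("ax",),
    "nice": ("n", "ay", "s"),
    "beach": ("b", "iy", "ch"),
    "recognize": ("r", "eh", "k", "ax", "n", "ay", "s"),
    "speech": ("b", "iy", "ch"),
}

# Toy unigram stand-in for G: made-up weights, read as negative log
# probabilities (lower is better).
WORD_COST = {
    "wreck": 6.0, "a": 2.0, "nice": 4.0, "beach": 5.0,
    "recognize": 5.0, "speech": 4.0,
}


def decode(phones):
    """Return (cost, words) for the cheapest split of phones into words."""
    n = len(phones)
    # best[i] holds the cheapest way to cover phones[:i].
    best = [(math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for word, pron in LEXICON.items():
            start = end - len(pron)
            if start < 0 or tuple(phones[start:end]) != pron:
                continue
            cost = best[start][0] + WORD_COST[word]
            if cost < best[end][0]:
                best[end] = (cost, best[start][1] + [word])
    return best[n]


phones = "r eh k ax n ay s b iy ch".split()
print(decode(phones))
```

With these made-up weights, every split of the phone string is scored and the cheapest complete path wins, even though a decision made only on the first three phones would have committed to "wreck".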
|