|
From: Nagendra G. <nag...@go...> - 2015-01-14 11:35:48
|
I have seen work on syllables (as opposed to phonemes), and there were some
publications from IBM in the '90s where they joined some word pairs into a new
lexicon entry, and it helped (I think on a voicemail task).
On Jan 13, 2015 6:49 PM, "Nickolay Shmyrev" <nsh...@gm...> wrote:
>
> > On 14 Jan 2015, at 2:37, <Dan...@pa...> <Dan...@pa...> wrote:
> >
> > Hello Nicolay,
> >
> > Thanks very much for your thoughtful answer. My context was that I
> > wondered whether there might occasionally be an advantage to mapping
> > words to word phrases in G rather than assigning probabilities to words. I
> > assumed that someone had tried it and that it was known not to work well,
> > since no one seemed to do it. I couldn't find a record of anyone trying it,
> > so I thought I'd ask.
>
> In that context it’s probably worth describing how recognition works.
> Many newcomers are confused about this, and you might be too. People
> imagine that audio is converted to phones, then phones are converted to
> words, and then words are converted to phrases. It does not work like that,
> because there are many, many ways to do such a conversion. Phone boundaries
> are blurred, and often you cannot easily decide which phone corresponds to
> which word. Consider the famous «wreck a nice beach» example, which can be
> confused with «recognize speech». You cannot make a local conversion
> decision; you need a global 1-best result.
>
> So instead of following that straightforward process, we consider all
> possible conversions and select the one with the global minimum weight.
> Decoding is therefore not a straightforward transducer application but the
> scoring of all possible paths with an acceptor. This is where the acceptor
> is required and where you need to assign probabilities to results.
>
> The decoding result is not
>
> G(L(audio))
>
> it is, in simplified form,
>
> min_{over all possible audio splits} G(L(audio split))
>
> This is not a good discussion for the kaldi-developers mailing list; maybe
> we can move it off-list.
>
>
>
> _______________________________________________
> Kaldi-developers mailing list
> Kal...@li...
> https://lists.sourceforge.net/lists/listinfo/kaldi-developers
>
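The point in the quoted message about choosing a global minimum over all splits, rather than committing to local decisions, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: letters stand in for phones, the lexicon and the unigram costs are made up, and acoustic scores are ignored entirely. It is not how Kaldi implements decoding.

```python
from functools import lru_cache

# Toy lexicon L: word -> symbol sequence (letters stand in for phones).
LEXICON = {
    "no": "no",
    "now": "now",
    "here": "here",
    "where": "where",
    "nowhere": "nowhere",
}

# Toy grammar G: one cost per word (hypothetical values, lower is better).
WORD_COST = {"no": 1.5, "now": 2.0, "here": 2.5, "where": 3.5, "nowhere": 4.0}


def decode_global(symbols):
    """Return (cost, words) for the cheapest of all possible splits, or None."""

    @lru_cache(maxsize=None)
    def best_from(start):
        if start == len(symbols):
            return (0.0, ())
        best = None
        for word, pron in LEXICON.items():
            if symbols.startswith(pron, start):
                rest = best_from(start + len(pron))
                if rest is None:
                    continue  # this word leads to a dead end
                cand = (WORD_COST[word] + rest[0], (word,) + rest[1])
                if best is None or cand[0] < best[0]:
                    best = cand
        return best

    return best_from(0)


def decode_greedy(symbols):
    """Commit to the locally cheapest matching word at each position."""
    pos, total, words = 0, 0.0, []
    while pos < len(symbols):
        matches = [w for w, p in LEXICON.items() if symbols.startswith(p, pos)]
        if not matches:
            return None  # a local decision led to a dead end
        word = min(matches, key=WORD_COST.get)
        words.append(word)
        total += WORD_COST[word]
        pos += len(LEXICON[word])
    return (total, tuple(words))


print(decode_global("nowhere"))  # (4.0, ('nowhere',))
print(decode_greedy("nowhere"))  # (5.0, ('no', 'where'))
```

With these made-up costs the greedy decoder commits to "no" first and ends up with a more expensive result, while the search over all splits finds the cheaper single-word path.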
|