Re: [Kaldi-developers] Speech synthesis with Kaldi


Hi Michal

Idlak (speech synthesis in Kaldi) is available on the sandbox/idlak branch.

Currently an alpha front end is available which transforms US English text
into full-context models and aligns Blizzard data (a script to download
the data is provided).

In our Interspeech paper from 2015 we tested this front end by using the
models and segmented data

The best place to start with the Idlak synthesis system in Kaldi is our
Interspeech paper from 2015.

The paper (and the documentation for Idlak) contains instructions for
downloading and building the HTS demo and using the front end within this
system to produce speech output.

I am currently working on generating output trees and models, which leaves
the following:

1. Mel generalised cepstrum generation (as per SPTK). We need these so we
can reverse the transformation in synthesis. We could of course use other
parameterisations and vocoders, but for now we could use the hts_engine
vocoder with this parameterisation for testing purposes.

2. Arnab worked on feature extraction for banded noise estimation for mixed
excitation vocoding, but I have not followed this up or integrated it into
the voice building system.

3. Trajectory modelling. The algorithm for taking means and variances from
models and producing a trajectory on a per-frame basis is well described,
but we need to implement an Idlak version of this using the Kaldi matrix
resources.

4. There is an issue in generating a vocoder. HTS has MLSA implemented, but
it's hard to follow how this actually works (the original paper is not very
detailed). However, someone within the Kaldi community might have better
insight into this than me.
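
To make item 3 concrete, here is a minimal sketch of the trajectory step in
plain Python. It is an illustration only, not Idlak or Kaldi code. The
assumptions are mine: a single one-dimensional static stream plus one delta
stream, a delta window of 0.5 * (c[t+1] - c[t-1]) with clamped edges,
diagonal variances, and a small dense solver instead of a banded one. All
function names are hypothetical. It solves (W' P W) c = W' P mu for the
static trajectory c.

```python
def delta_window_matrix(num_frames):
    """Build W (2T x T): row 2t is the static row, row 2t+1 the delta row."""
    rows = []
    for t in range(num_frames):
        static = [0.0] * num_frames
        static[t] = 1.0
        delta = [0.0] * num_frames
        # Assumed window: 0.5 * (c[t+1] - c[t-1]), indices clamped at edges.
        delta[min(t + 1, num_frames - 1)] += 0.5
        delta[max(t - 1, 0)] -= 0.5
        rows.append(static)
        rows.append(delta)
    return rows


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def generate_trajectory(means, variances):
    """means/variances: one (static, delta) pair per frame.

    Returns the static trajectory that maximises the likelihood of the
    stacked static and delta observations under the per-frame Gaussians.
    """
    num_frames = len(means)
    w = delta_window_matrix(num_frames)
    mu = [value for frame in means for value in frame]
    precision = [1.0 / value for frame in variances for value in frame]
    num_rows = 2 * num_frames
    # Normal equations: A = W' P W and b = W' P mu, with P diagonal.
    a = [[sum(w[r][i] * precision[r] * w[r][j] for r in range(num_rows))
          for j in range(num_frames)]
         for i in range(num_frames)]
    b = [sum(w[r][i] * precision[r] * mu[r] for r in range(num_rows))
         for i in range(num_frames)]
    return solve_dense(a, b)
```

For example, generate_trajectory([(0.0, 0.0), (0.0, 0.0), (1.0, 0.0),
(1.0, 0.0)], [(1.0, 0.1)] * 4) returns one smoothed static value per frame.
A real implementation would exploit the banded structure of W' P W rather
than a dense solve.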

Currently I am using the HTS demo as a test harness for development. At some
point this will cease to be useful and will require too many compromises in
the design. I am very happy to get more input on this project from people,
and contributions to the work as well.

Yours

Matthew

On Tue, May 5, 2015 at 7:38 PM, Daniel Povey <dp...@gm...> wrote:

> Matthew Aylett (cc'd) is working on a speech synthesis project called
> "idlak", which is part of the Kaldi repository.  I believe he recently
> got it to the point where it produces output.  Matthew, perhaps you
> can comment, and show him where to look?
> Dan
>
>
> On Tue, May 5, 2015 at 2:45 AM, Michal Klíma <mic...@gm...> wrote:
> > Hello,
> > My name is Michal Klíma and I'm a student from the Czech Republic
> > (University of West Bohemia). I'm working on my master's thesis and I
> > want to try your system Kaldi. I have one important question. Are there
> > any possibilities to do speech synthesis with Kaldi? I have tried to
> > find something about it, but I haven't been successful yet. I know that
> > Kaldi is very similar to another speech recogniser toolkit named HTK,
> > and I know it is possible to do speech synthesis with it.
> > Yours sincerely
> > Michal Klíma
> >
> >
> >
>