From: Matthew A. <mat...@gm...> - 2015-05-06 10:53:13
Hi Michal,

Idlak (speech synthesis in Kaldi) is available on the sandbox/idlak branch. Currently an alpha front end is available which transforms US English text into full-context models and aligns Blizzard data (there is a script to download it).

In our Interspeech paper from 2015 we tested this front end by using the models and segmented data. The best place to start with the Idlak synthesis system in Kaldi is to look at that paper. It (and the documentation for Idlak) has instructions for downloading and building the HTS demo and using the front end within that system to produce speech output.

I am currently working on generating output trees and models, which leaves the following:

1. Mel generalised cepstrum generation (as per SPTK). We need these so that we can reverse the transformation in synthesis. We could of course use other parameterisations and vocoders, but for now we could use the hts_engine vocoder with this parameterisation for testing purposes.

2. Arnab worked on feature extraction of banded noise estimation for mixed excitation vocoding, but I have not followed this up and integrated it into the voice building system.

3. Trajectory modelling. The algorithm for taking means and variances from models and producing a trajectory on a per-frame basis is well described, but we need to implement an Idlak version of it using the Kaldi matrix resources.

4. There is an issue in generating a vocoder. HTS has MLSA implemented, but it's hard to follow how this actually works (the original paper is not very detailed). However, someone within the Kaldi community might have better insight into this than me.

Currently I am using the HTS demo as a test harness for development. At some point this will cease to be useful and will require too many compromises in the design.

I would be very happy to get more input on this project from people, and contributions to the work as well.
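As a rough illustration of item 3 above, here is a minimal sketch of the usual maximum-likelihood parameter generation step for a single feature dimension: given per-frame means and variances for the static and delta streams, it solves the weighted least-squares system for the smoothest static trajectory. This is not Idlak or Kaldi code. The function names (`window_matrix`, `solve`, `mlpg`), the use of plain Python lists instead of the Kaldi matrix classes, the diagonal variances, and the clamping of window taps at the utterance edges are all assumptions made for the example.

```python
def window_matrix(num_frames, window):
    """Build a T x T matrix that applies an odd-length window around each frame.

    Taps that fall outside the utterance are clamped to the first or last
    frame (an assumption made here; other edge conventions exist).
    """
    half = len(window) // 2
    rows = []
    for t in range(num_frames):
        row = [0.0] * num_frames
        for k, weight in enumerate(window):
            tau = min(max(t + k - half, 0), num_frames - 1)
            row[tau] += weight
        rows.append(row)
    return rows


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mlpg(means, variances, windows):
    """Return the static trajectory c that maximises the likelihood.

    means[s][t] and variances[s][t] are the per-frame mean and variance of
    stream s (static, delta, ...); windows[s] is the window for that stream.
    Solves (sum_s W_s^T P_s W_s) c = sum_s W_s^T P_s mu_s, where P_s is the
    diagonal precision matrix of stream s.
    """
    num_frames = len(means[0])
    a = [[0.0] * num_frames for _ in range(num_frames)]
    b = [0.0] * num_frames
    for mu, var, window in zip(means, variances, windows):
        w = window_matrix(num_frames, window)
        for t in range(num_frames):
            precision = 1.0 / var[t]
            taps = [(i, v) for i, v in enumerate(w[t]) if v != 0.0]
            for i, wi in taps:
                b[i] += wi * precision * mu[t]
                for j, wj in taps:
                    a[i][j] += wi * precision * wj
    return solve(a, b)


if __name__ == "__main__":
    # A step in the static means, zero delta means, tight delta variances.
    static_means = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    delta_means = [0.0] * 6
    trajectory = mlpg(
        [static_means, delta_means],
        [[1.0] * 6, [0.01] * 6],
        [[1.0], [-0.5, 0.0, 0.5]],
    )
    print(trajectory)
```

A dense solve like this is only practical for short utterances. The system matrix is banded, so a real implementation would normally use a banded Cholesky factorisation and process each feature dimension separately.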
Yours
Matthew

On Tue, May 5, 2015 at 7:38 PM, Daniel Povey <dp...@gm...> wrote:
> Matthew Aylett (cc'd) is working on a speech synthesis project called
> "idlak", which is part of the Kaldi repository. I believe he recently
> got it to the point where it produces output. Matthew, perhaps you
> can comment, and show him where to look?
> Dan
>
> On Tue, May 5, 2015 at 2:45 AM, Michal Klíma <mic...@gm...> wrote:
> > Hello,
> > My name is Michal Klíma and I'm a student from the Czech Republic
> > (University of West Bohemia). I'm working on my master's thesis and I
> > want to try your system Kaldi. I have one important question: is there
> > any possibility of doing speech synthesis with Kaldi? I have tried to
> > find something about it, but I haven't been successful yet. I know that
> > Kaldi is very similar to another speech recogniser toolkit named HTK,
> > and I know that it is possible to do speech synthesis there.
> > Yours sincerely
> > Michal Klíma