Tutorial-iSS

Timo Baumann

Tutorial Part 2: Incremental Speech Synthesis

This section will show (and tell) you how to use speech synthesis with sub-utterance units, make you aware of the drawbacks, and indicate how to stay informed about speech delivery progress.

This part requires actual coding: you will extend some classes in Java and maybe write some of your own.

Tasks:

Setup

There's some example code that we'll be relying on in the remainder of the tutorial. Make yourself a nice directory somewhere and git clone https://bitbucket.org/timobaumann/SimplisticBabelfish/ (or svn checkout svn+ssh://gate.spectrum.uni-bielefeld.de/vol/acl/projects/InPro/INPRO_SVN/Text/IS2013tutorial/demo).

Import the project that you now find in the directory into Eclipse. Ideally (given that you have the Inpro Eclipse project set up), everything should just work. If not, you will have to add dependencies on your Inpro project and on all the Jars in Inpro/lib to the build path of the new project.

Hello World.

In your newly imported Eclipse project, take a look at synthesis/SynthesisRunner which extends IUModule.

  • the constructor sets up a SynthesisModule which is appended to our new module, resulting in a short and simple output pipeline: SynthesisRunner-->SynthesisModule-->your ears.
  • ignore the empty leftBufferUpdate method; we do not plan to consume IUs, only to produce them.
  • look at the main() method. This is where the interesting things happen: we construct a phrase, add it to our right buffer, and send it off to the speech synthesizer.
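
To make this concrete, here is a minimal sketch of such a module. It follows the description above, but the exact method names (addListener(), rightBuffer.addToBuffer(), notifyListeners()) and package paths are assumptions, so compare with the actual SynthesisRunner in the demo project:

    // a sketch only: package paths and method names are assumed,
    // adjust to what you find in the demo project
    import java.util.Collection;
    import java.util.List;
    import inpro.incremental.IUModule;
    import inpro.incremental.processor.SynthesisModule;
    import inpro.incremental.unit.EditMessage;
    import inpro.incremental.unit.IU;
    import inpro.incremental.unit.PhraseIU;

    public class MySynthesisRunner extends IUModule {

        public MySynthesisRunner() {
            // append the synthesizer, yielding the pipeline
            // MySynthesisRunner --> SynthesisModule --> your ears
            addListener(new SynthesisModule());
        }

        @Override
        protected void leftBufferUpdate(Collection<? extends IU> ius,
                List<? extends EditMessage<? extends IU>> edits) {
            // intentionally empty: we only produce IUs, we never consume any
        }

        public static void main(String[] args) {
            MySynthesisRunner runner = new MySynthesisRunner();
            // construct a phrase, put it into the right buffer, send it off
            runner.rightBuffer.addToBuffer(new PhraseIU("Hallo Welt."));
            runner.notifyListeners();
        }
    }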

Run the new module (right-click, Run As -> Java Application).
Notice how it didn't work? You need MaryTTS installed, including the tweaks that are necessary for incremental processing support, and you have to tell Java about its path. Everything should be set up in the lab (if not, have a look at the setup instructions); all you need to do is add -Dmary.base=/path/to/mary/install/directory to your Eclipse run configuration (second tab, VM arguments).

Re-run the new module and rejoice.

Use incremental speech synthesis

The previous program was hardly incremental (internally it was, but we didn't make use of this). How about adding two PhraseIUs, one for "Dies ist ein langer" and another for "und inkrementell erstellter Satz."? Try this. Is there a difference between calling notifyListeners() after each IU and adding both IUs in one go? (Is there a difference internally?)
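
A sketch of the two variants, inside main() of the sketch above (run one variant at a time; the PhraseIU constructor signature is assumed):

    PhraseIU first = new PhraseIU("Dies ist ein langer");
    PhraseIU second = new PhraseIU("und inkrementell erstellter Satz.");

    // Variant A: add both IUs, then notify once
    runner.rightBuffer.addToBuffer(first);
    runner.rightBuffer.addToBuffer(second);
    runner.notifyListeners();

    // Variant B: notify after each IU -- listen for a difference
    runner.rightBuffer.addToBuffer(first);
    runner.notifyListeners();
    runner.rightBuffer.addToBuffer(second);
    runner.notifyListeners();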

Second, add a short pause between the PhraseIUs that you send, to simulate the case that your module just doesn't know how to finish the utterance yet, maybe because it is waiting for more data, or because processing takes very long. Use Thread.sleep(). What happens prosodically as you lengthen the pause?
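
For example (Thread.sleep() requires main() to declare throws InterruptedException):

    runner.rightBuffer.addToBuffer(new PhraseIU("Dies ist ein langer"));
    runner.notifyListeners();
    Thread.sleep(1000); // vary this delay and listen to the prosody at the seam
    runner.rightBuffer.addToBuffer(new PhraseIU("und inkrementell erstellter Satz."));
    runner.notifyListeners();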

Third, you can add a HesitationIU after your first phrase, in order to cover the pause. What happens if there is no pause?
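
A sketch, assuming HesitationIU has a no-argument constructor (check the class before relying on this):

    runner.rightBuffer.addToBuffer(new PhraseIU("Dies ist ein langer"));
    runner.rightBuffer.addToBuffer(new HesitationIU()); // covers the upcoming silence
    runner.notifyListeners();
    Thread.sleep(2000); // and what if you drop this pause entirely?
    runner.rightBuffer.addToBuffer(new PhraseIU("und inkrementell erstellter Satz."));
    runner.notifyListeners();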

Next, you could add LabelWriters, both to your current module (in addition to the synthesizer) and to the synthesis module. What output do they generate? Can you also use the CurrentHypothesisViewer? (Why is the CurrentHypothesisViewer boring? You will later be able to improve it!)

Finally, try adding a bunch of phrases, calling notifyListeners(), and then revoking the last few IUs. Does this work? In what cases does it not work? Why? Do you have ideas to improve the current behaviour? Find out how to implement them...
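
One plausible way to express the revocation, assuming the right buffer diffs successive contents into add/revoke edit messages (setBuffer() is an assumption; check IUModule for the actual mechanism; this fragment additionally needs java.util.ArrayList):

    List<PhraseIU> phrases = new ArrayList<PhraseIU>();
    phrases.add(new PhraseIU("Dies ist ein langer"));
    phrases.add(new PhraseIU("und inkrementell erstellter Satz."));
    runner.rightBuffer.setBuffer(phrases);
    runner.notifyListeners();
    Thread.sleep(500);
    // take back the last phrase: shrinking the buffer should yield revoke edits
    phrases.remove(phrases.size() - 1);
    runner.rightBuffer.setBuffer(phrases);
    runner.notifyListeners();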

Catching Updates

So far, the SynthesisRunner does not get any feedback about speech synthesis progress. All we can do is guess how much time we have, or add hesitations to account for possible delays.

Look at the interface IUUpdateListener defined inside inpro.incremental.unit.IU. Now implement that interface (either in your main class or in an internal class) and register your listener with the PhraseIUs you send. Try the following (a sketch follows the list):

  • catching updates and reporting progress
  • delivering the next phrase only when it becomes necessary (notice any prosodic degradation?)
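
A sketch of such a listener, assuming IUUpdateListener declares a single update(IU) callback and that IUs offer addUpdateListener(); check inpro.incremental.unit.IU for the exact signatures:

    PhraseIU phrase = new PhraseIU("Dies ist ein langer");
    phrase.addUpdateListener(new IU.IUUpdateListener() {
        @Override
        public void update(IU updatedIU) {
            // called whenever the synthesizer updates delivery progress
            System.err.println("progress update for: " + updatedIU);
        }
    });
    runner.rightBuffer.addToBuffer(phrase);
    runner.notifyListeners();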

If you want to use the Eclipse debugger, you need to stop the whole VM, not only the current thread (right-click a breakpoint -> Breakpoint Properties -> Suspend VM).

More (some advanced, some peripheral) tasks:

  • you can also directly input WordIUs (or even any other types of IU) into the SpeechSynthesizer. Can you configure a pipeline from SimpleText that feeds into the synthesizer? You have to add a speechsynthesizer to the hypChangeListeners-part of currentASRHypothesis and set this up further down in the config file (a config sketch follows this list). How about connecting SimpleText with a tagger and then the synthesizer? (Which would, of course, be ridiculous, but might be fun.)
  • try English speech synthesis; see the setup instructions for how to do this. Can you make it work for other languages?
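
A sketch of the config change for the first item above. It follows the Sphinx-style XML used by InproTK configuration files, but the component definition (in particular the fully qualified class name) is an assumption, so compare with an existing config file:

    <component name="currentASRHypothesis" type="...">
        <propertylist name="hypChangeListeners">
            ...
            <item>speechSynthesizer</item>
        </propertylist>
    </component>

    <!-- further down: define the synthesizer component; use the synthesizer's
         fully qualified class name here -->
    <component name="speechSynthesizer" type="inpro.incremental.processor.SynthesisModule"/>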

Important Concepts

Concurrent Processing

Concurrency plays an important role in the speech synthesis component, so you need to be familiar with the concept of multiple threads simultaneously participating in a common task. Specifically, in incremental speech synthesis two tasks run in parallel:

  • specifying what to synthesize: this specification is a top-down process;
  • actually synthesizing: this starts once the specification has delivered its first bits and takes place on the lowest IU level, the speech segments.

Update Messages

InproTK's inter-module communication is limited to left-to-right processing. (Remember? There is a right-buffer object, but there is no left-buffer object that could feed information back to a previous module -- and there are good reasons for this.) To feed back information about delivery status, the synthesis module instead updates the incoming PhraseIUs' progress information and notifies any listeners registered on those PhraseIUs.

Background Information

InproTK's incremental speech synthesis is based on MaryTTS, a state-of-the-art text-to-speech synthesis toolkit written in Java.

