Algorithms

Help
White Bim
2006-03-31
2013-10-07
  • White Bim

    White Bim - 2006-03-31

    Sorry for my English, but I need your help...
    I'm looking for the algorithms used in espeak. They are very difficult to work out from the source code, and I can't find them described on the Internet. I hope you can help me.

     
    • Jonathan Duddington

      : I'm looking for the algorithms used in espeak. They are very difficult to work out from the source code, and I can't find them described on the Internet.

      What exactly do you want?

      Here's a discussion of speech generation methods:
      http://www.acoustics.hut.fi/~slemmett/dippa/chap5.html

      espeak uses the "sinusoidal" method. It makes a vowel sound by adding together the sine-waves of the various harmonics (see the wavegen() function in wavegen.cpp).  Different vowels have different mixtures of harmonics.  Consonants such as [s] and [t] are simply recorded sound samples (.WAV files).  Some consonants such as [z] are produced by a mixture of both of these methods.
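
The additive idea described above can be sketched in a few lines of Python. This is only an illustration, not eSpeak's wavegen() code: the function name, the 120 Hz fundamental and the harmonic amplitudes are all made up for the example.

```python
import math

def additive_vowel(f0, harmonic_amps, duration=0.5, sample_rate=22050):
    """Sum sine waves at integer multiples of f0.

    harmonic_amps[k-1] is the amplitude of harmonic k (frequency k * f0).
    Returns a list of samples scaled to stay within [-1, 1].
    """
    n_samples = int(duration * sample_rate)
    # Skip harmonics at or above the Nyquist frequency to avoid aliasing.
    usable = [(k, a) for k, a in enumerate(harmonic_amps, start=1)
              if k * f0 < sample_rate / 2]
    # Dividing by the sum of absolute amplitudes bounds the output by 1.
    total = sum(abs(a) for _, a in usable) or 1.0
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = sum(a * math.sin(2 * math.pi * k * f0 * t) for k, a in usable)
        samples.append(value / total)
    return samples

# A hypothetical mixture of ten harmonics; a different mixture gives a
# different vowel-like timbre at the same pitch.
vowel_a = additive_vowel(120, [0.3, 0.5, 0.8, 1.0, 0.9, 0.6, 0.3, 0.2, 0.3, 0.4])
```

Changing only the amplitude list while keeping f0 fixed changes the vowel quality but not the pitch, which is the point of the method.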

      A different method of generating speech sounds is to start with a wave-form which is rich in harmonics (e.g. something like a triangle wave) and then apply digital filters.  Changing the resonances of the filters produces different sounds.  For an example of this method, look at the "rsynth" project on sourceforge. This is based on the "Klatt synthesizer" (do a Google search on that).
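
A minimal sketch of this source-and-filter approach, assuming a naive sawtooth as the harmonic-rich source and the standard two-pole digital resonator as the filter. It is not taken from rsynth or from any Klatt implementation; the function names and the (frequency, bandwidth) pairs are illustrative only.

```python
import math

def sawtooth(f0, n_samples, sample_rate=22050):
    """Harmonic-rich source: a naive sawtooth in [-1, 1) with fundamental f0."""
    return [2.0 * ((n * f0 / sample_rate) % 1.0) - 1.0 for n in range(n_samples)]

def resonator(samples, freq, bandwidth, sample_rate=22050):
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c  # chosen so that the gain at 0 Hz is exactly 1
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def formant_filter(samples, formants, sample_rate=22050):
    """Pass the source through one resonator per (freq, bandwidth) pair, in series."""
    for freq, bw in formants:
        samples = resonator(samples, freq, bw, sample_rate)
    return samples

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

source = sawtooth(120, 22050 // 2)
# Hypothetical resonances; moving them changes the sound that comes out.
vowel = formant_filter(source, [(700, 60), (1200, 90), (2600, 150)])
```

Here the source stays the same and only the resonator frequencies change from sound to sound, whereas the additive method changes the harmonic amplitudes directly.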

      The idea of "formants" is fundamental to both these methods.  These are peaks on the audio spectrum of vowels.  The positions of formants 1, 2, and 3 determine the type of vowel.  A good tool for analysing speech sounds is "praat" from www.praat.org.  This will display the formants and you can see how they change during a diphthong such as [aI] in "high".
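
To make the link between formants and harmonics concrete, here is a rough sketch that turns three formant frequencies into a set of harmonic amplitudes. The bell-shaped peak, the 100 Hz width and the formant values are assumptions chosen for illustration (the values are only approximate textbook figures); the thread does not say what peak shape eSpeak uses.

```python
import math

# Approximate, illustrative formant frequencies in Hz: (F1, F2, F3).
FORMANTS = {
    "a": (700, 1200, 2600),
    "i": (300, 2300, 3000),
}

def harmonic_amplitudes(f0, formants, width=100.0, max_freq=4000.0):
    """Amplitude for each harmonic of f0 up to max_freq.

    Each formant contributes a bell-shaped peak centred on its frequency,
    so harmonics close to a formant come out strong and the rest weak.
    """
    amps = []
    k = 1
    while k * f0 <= max_freq:
        freq = k * f0
        amps.append(sum(math.exp(-((freq - f) / width) ** 2) for f in formants))
        k += 1
    return amps

amps_i = harmonic_amplitudes(100, FORMANTS["i"])
amps_a = harmonic_amplitudes(100, FORMANTS["a"])
```

With a 100 Hz fundamental, the third harmonic (300 Hz) is strong for "i" and the seventh (700 Hz) is strong for "a", which is the difference in "mixtures of harmonics" between the two vowels.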

       
    • White Bim

      White Bim - 2006-03-31

      Thanks a lot for your help. I'm very interested in this topic.

       
    • White Bim

      White Bim - 2006-05-15

      I am going to implement intonation for your application, but I have some problems. I don't understand the nature of the various coefficients, matrices and formulas for the pitch increment and for calculating the three components of the speed. I have tried to find information on the Internet, but I have found only an overview. Could you tell me where I can find a full description of these algorithms?
      Thank you.

       
      • Jonathan Duddington

        An alternative intonation would be interesting.

        If by "intonation" you mean just the pitch variation throughout a sentence, then you should only need to change the  intonation.cpp  file.  This determines a lower-pitch, upper-pitch, and pitch envelope (fall, rise, fall-rise, etc) for each syllable in a clause, taking note of whether the syllable has primary stress, secondary stress, or is unstressed.

        The  intonation.cpp  routines set the PHONEME_LIST pitch1, pitch2, and env fields for each vowel.  How you do this is up to you.  I came up with the current algorithm by trial and error, adjusting things until it sounded OK.  You don't really need to understand how I did it, just come up with a better way :-)

        For example, a simple method might be to have the pitch decrease throughout the clause, reducing the pitch at each primary stressed syllable.  However, that would have problems for a long clause with many syllables.
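
The simple method just described might look something like the sketch below. It is not the algorithm in intonation.cpp: the stress codes, pitch units and step sizes are hypothetical, and it returns a (lower, upper) pitch pair per syllable without claiming how those would map onto the pitch1 and pitch2 fields.

```python
PRIMARY, SECONDARY, UNSTRESSED = 2, 1, 0

def declining_pitch(stresses, start=100.0, floor=60.0, step=8.0):
    """Return one (lower, upper) pitch pair per syllable.

    The level drops by `step` at every primary stressed syllable and is
    never allowed below `floor`. Stressed syllables get a wider pitch range.
    """
    contour = []
    level = start
    for stress in stresses:
        if stress == PRIMARY:
            level = max(floor, level - step)
        span = 10.0 if stress == PRIMARY else 4.0
        lower = max(floor, level - span)
        contour.append((lower, level))
    return contour

short_clause = declining_pitch([UNSTRESSED, PRIMARY, UNSTRESSED, PRIMARY])
long_clause = declining_pitch([PRIMARY] * 10)
```

In long_clause the level reaches the floor after five stressed syllables and stays flat from then on, which shows the problem with long clauses mentioned above.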

        A good program for displaying how pitch changes throughout a sample of spoken or synthesized speech is "praat" from www.praat.org.  It shows the speech waveform of a phrase together with a graph of the pitch and the formants.  So you could speak a sentence, look at how the pitch varies, and then try and write an algorithm which does something similar.  Of course a simple speech synthesizer doesn't understand the meaning of the sentence so it doesn't know which words to emphasize.

        You also mentioned speed coefficients and matrices.  These are not concerned with intonation as such, but rather determine how the length of a vowel varies depending on the adjacent sounds, its stress level, and its position in a word.  For example, in English a vowel is shorter before an unvoiced consonant such as [s] [p] or [t] than before a voiced consonant such as [z] [b] or [d].

        speed1, speed2, speed3  determine the relative lengths of the last syllable of a word, the next to last, and earlier syllables, respectively.  They are derived from voice->speedf1, speedf2, speedf3 combined with the overall speaking speed.  These factors are set in VoiceReset() in synthdata.cpp and were determined by trial and error.  Perhaps different values might be better for a different language or accent.

        The stress_lengths array, which gives the relative lengths of vowels with different stress levels, can be set in a voice file, using the stressLength command.  So you can easily experiment to make stressed vowels longer, or to make stressed and unstressed vowels the same length.
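
As a rough sketch of how these length factors could combine, the function below multiplies a base vowel length by a position factor, a stress factor and a voicing factor. Every name and number here is hypothetical; the real values live in VoiceReset() and the voice files, and the thread does not give the actual formula.

```python
# Hypothetical set of unvoiced consonants for the example.
UNVOICED = {"s", "p", "t", "f", "k"}

# Made-up relative factors by position, counted from the end of the word:
# 0 = last syllable, 1 = next to last, anything else = earlier syllable.
POSITION_FACTOR = {0: 1.0, 1: 0.85}
EARLIER_FACTOR = 0.75

# Made-up relative lengths per stress level.
STRESS_LENGTH = {"unstressed": 0.8, "secondary": 1.0, "primary": 1.2}

def vowel_length_ms(base_ms, syllables_from_end, stress, next_consonant):
    """Scale a base vowel length by position, stress and following consonant."""
    position = POSITION_FACTOR.get(syllables_from_end, EARLIER_FACTOR)
    # A vowel is shorter before an unvoiced consonant than before a voiced one.
    voicing = 0.8 if next_consonant in UNVOICED else 1.0
    return base_ms * position * STRESS_LENGTH[stress] * voicing

before_s = vowel_length_ms(200, 0, "primary", "s")
before_z = vowel_length_ms(200, 0, "primary", "z")
```

Setting all the STRESS_LENGTH values to the same number would make stressed and unstressed vowels the same length, which is the kind of experiment suggested above.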

        If you have any specific questions, please ask.  Are you interested in improving the intonation/prosody of English or for a different language?

         
    • White Bim

      White Bim - 2006-05-24

      Please check your e-mail on sourceforge.net. We have sent you a Windows-Linux version of eSpeak.

       
  • Aryan

    Aryan - 2013-10-07

    I am also looking for the algorithms used in espeak. I am confused about the synthesis techniques used in espeak: the original eSpeak synthesizer and a Klatt synthesizer. Which one is used nowadays?
    Which is best?
    I would like the algorithms and formulas for the eSpeak synthesizer.

     
