> Unfortunately there's no programming interface for mbrola. If you have to
> load festival+mbrola every time you want some speech output, reaction time
> will become unacceptably long. So this is probably not a solution.
ah -- i have a solution to that. i access the tts engine via a script
that caches the result -- i do an md5 on the phrase i'm asking it to
speak, and store the .wav file under a name derived from that md5 signature.
then the next time i ask for the same phrase, the script notices it already
has the file, and doesn't need to recreate it. i chose a syntax that lets me
tell the script to break the phrase into sub-phrases of my choosing -- it
breaks them up on a '|' symbol -- and it generates and stores the sub-phrases.
this works very well for things like menuing systems, or simple information
output. for instance, when one of my IR-driven menus asks for the
current time, my script gets:

    the time is | eleven | twenty | four | AM

each of the '|'-separated parts is translated and cached separately, so
that one minute later, if i ask for the time again, only one word needs
to be generated.
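
as a rough illustration (a sketch only, not the actual script), the caching scheme described above could look something like this in python. the names speak_parts and cache_path, the tts-cache directory, and the synthesize callback are all made up for the example; synthesize stands in for whatever really runs the tts engine and writes a .wav file to the path it is given.

```python
import hashlib
import os

CACHE_DIR = "tts-cache"  # hypothetical cache location


def cache_path(phrase, cache_dir=CACHE_DIR):
    """Return the .wav filename for a phrase, named after its md5 digest."""
    key = hashlib.md5(phrase.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, key + ".wav")


def speak_parts(text, synthesize, cache_dir=CACHE_DIR):
    """Split text on '|' and return one cached .wav path per sub-phrase.

    synthesize(phrase, path) is an assumed callback that generates the
    audio for phrase and writes it to path. It is only called for
    sub-phrases that are not in the cache yet.
    """
    os.makedirs(cache_dir, exist_ok=True)
    paths = []
    for part in text.split("|"):
        phrase = " ".join(part.split())  # trim and collapse whitespace
        if not phrase:
            continue  # skip empty pieces such as "a || b"
        path = cache_path(phrase, cache_dir)
        if not os.path.exists(path):
            synthesize(phrase, path)  # cache miss: generate and store
        paths.append(path)
    return paths
```

the returned list of .wav paths would then be played back in order. asking for "the time is | eleven | twenty | five | AM" right after the example above would call synthesize only for "five".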
(i wrote this before i used festival -- i used to get my TTS translations
from the Bell Labs research website -- they have a demo page that will
translate phrases, but there are limits on the number of requests you
can make per day. caching solved this, as well as the latency
problem. btw, if you want _high_ quality, that's the way to go. to
try it, see http://lucent-tts.epresence.com . this is a different
interface than the one i scripted -- that one is at:
for other things, like the current weather forecast, i do the text-to-speech
translation in advance -- rather than fetching it from the web and processing
it every time, i do it several times a day, store the result, and simply
play that output when i need it. the weather service doesn't update
i can share any or all of my TTS scripts if anyone wants them.
paul fox, pgf@... (arlington, ma, where it's 62.6 degrees)