Thread: RE: [Asterisk-java-users] Text-to-speech and gathering data (fastAGI)
From: Darren H. <dha...@gh...> - 2006-02-28 19:21:34
Thanks Stefan,
My intent is actually to optimize the second part, the non-static
requests. This test could indeed use pre-generated voice files
(per the cached remark), but the intent is to identify the best way to
handle the dynamic parts.
Is this the best way to handle dynamic TTS parts?
-D
> -----Original Message-----
> From: ast...@li...
> [mailto:ast...@li...] On
> Behalf Of Stefan Reuter
> Sent: Tuesday, February 28, 2006 1:59 PM
> To: ast...@li...
> Subject: Re: [Asterisk-java-users] Text-to-speech and
> gathering data (fastAGI)
>
> Darren,
>
> it looks like you are always using TTS to generate the voice
> files for your application even if they are rather static. I
> would only run the TTS once (outside of your AGI scripts) and
> then use the generated voice files. If you have dynamic parts
> (like greeting the user with his name) I would only render
> those parts at run time. This should eliminate the delay and
> take away some load from your Asterisk server.
>
> =Stefan
>
> Darren Hartford wrote:
> > Hey all,
> > Continuing my mad plan for using asterisk-java & fastAGI... taking
> > text, converting it to voice, and then gathering data back.
> >
> > The below works, but there is a 3-sec delay for the conversion (which
> > could be cached for future use, but focusing on the one-offs for now):
> >
> > ======[code]=====
> > String texttospeech = "Please enter your favorite number "
> >         + "between 10 and 100 followed by pound sign.";
> > int soundhash = texttospeech.hashCode();
> > String sounddir = "/var/lib/asterisk/sounds/tts";
> > String textfile = sounddir + "/ttsTEST-" + soundhash + ".txt";
> > String wavefile = sounddir + "/ttsTEST-" + soundhash + ".wav";
> > String gsmfile = sounddir + "/ttsTEST-" + soundhash + ".gsm";
> > result = exec("System", "echo '" + texttospeech + "' > " + textfile);
> > System.out.println("text save result: " + result);
> > result = exec("System", "text2wave -F 8000 -o " + wavefile + " " + textfile);
> > System.out.println("text2wave result: " + result);
> > // result = exec("System", "sox " + wavefile + " -r 8000 -c1 " + textfile);
> > result = exec("System", "sox " + wavefile + " " + gsmfile);
> > System.out.println("sox result: " + result);
> >
> > String captureddata = "";
> > captureddata = getData("tts/ttsTEST-" + soundhash);
> > =====[/code]======
> >
> > Any recommendations to get this streamlined for one-off text
> > conversions, or any pitfalls?
> >
> > TIA,
> > -D
>
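The hashCode()-based file naming in the script above lends itself to a cache check: since the same prompt text always maps to the same file name, the expensive text2wave/sox run can be skipped whenever the .gsm file already exists. A minimal standalone sketch of that idea (the `ensureCached` helper and the bash stand-in command are hypothetical, not part of asterisk-java; a real script would pass the text2wave and sox invocations from the post):

```java
import java.io.File;
import java.io.IOException;

class TtsCache {
    // Hypothetical helper: run the generation command only when the target
    // file is missing, so repeated prompts reuse the cached file instead of
    // paying the ~3-second TTS conversion again.
    static boolean ensureCached(File target, String... cmd)
            throws IOException, InterruptedException {
        if (target.exists()) {
            return false; // cache hit: skip the expensive TTS run
        }
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IOException("command failed: " + String.join(" ", cmd));
        }
        return true; // cache miss: file was generated
    }

    public static void main(String[] args) throws Exception {
        String text = "Please enter your favorite number between 10 and 100.";
        // Same hashCode() naming scheme as the AGI script above.
        File target = new File("/tmp/ttsTEST-" + text.hashCode() + ".txt");
        target.delete(); // start from a cold cache for the demo
        // Stand-in for the real text2wave + sox pipeline: just write the text.
        boolean first = ensureCached(target, "bash", "-c",
                "echo '" + text + "' > " + target.getPath());
        // Second call with the same prompt: the command never runs.
        boolean second = ensureCached(target, "bash", "-c", "exit 1");
        System.out.println(first + " " + second);
    }
}
```

One caveat with this scheme: `String.hashCode()` can collide for different texts, so a production cache might prefer the text itself (sanitized) or a stronger digest as the key.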
From: Darren H. <dha...@gh...> - 2006-02-28 19:46:20
Ah, that's what I was looking for:
* Thread the dynamic TTS off.
* Play music-on-hold or another already pre-generated sound until the dynamic-TTS thread finishes.

Thanks!
-D

> -----Original Message-----
> From: ast...@li...
> [mailto:ast...@li...] On
> Behalf Of Stefan Reuter
> Sent: Tuesday, February 28, 2006 2:33 PM
> To: ast...@li...
> Subject: Re: [Asterisk-java-users] Text-to-speech and
> gathering data (fastAGI)
>
> Darren Hartford wrote:
> > Thanks Stefan,
> > My intent is actually to optimize the second part, the non-static
> > requests. This test could indeed use pre-generated voice files
> > (per the cached remark), but the intent is to identify the best
> > way to handle the dynamic parts.
> >
> > Is this the best way to handle dynamic TTS parts?
>
> You are probably thinking of something like directly
> streaming the TTS-generated voice files to the user instead
> of sequentially creating the whole voice file and then
> streaming it. As far as I know this is currently not possible.
> So the best approach that will currently work is:
> - use pregenerated voice files wherever possible
> - try to minimize the dynamically generated part
> - generate the dynamic files in the background and as soon as
> possible, i.e. as soon as you know what you have to say,
> create them in a new Thread (not even through AGI but
> directly invoking your TTS engine) and meanwhile play
> something else to the user to avoid dead air.
>
> =Stefan
>
> --
> reuter network consulting
> Neusser Str. 110
> 50760 Köln
> Germany
> Telefon: +49 221 1305699-0
> Telefax: +49 221 1305699-90
> E-Mail: sr...@re...
> Jabber: sr...@ja...
>
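The "thread the TTS off, fill the gap" pattern can be sketched in plain Java as follows. This is a hedged illustration, not asterisk-java code: `renderTts` and `playHoldMusic` are hypothetical stand-ins for invoking the TTS engine directly and for streaming a pregenerated file via AGI (e.g. `streamFile`).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BackgroundTts {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Kick off the render on a worker thread the moment the text is known.
        Future<String> rendered =
                pool.submit(() -> renderTts("Hello Darren, you have 3 messages."));
        while (!rendered.isDone()) {
            playHoldMusic(); // dead-air filler while the worker renders
        }
        // The dynamic file is ready; stream it to the caller.
        System.out.println("now streaming: " + rendered.get());
        pool.shutdown();
    }

    // Stand-in for invoking the TTS engine directly (text2wave + sox).
    static String renderTts(String text) throws InterruptedException {
        Thread.sleep(300); // simulated conversion delay
        return "/var/lib/asterisk/sounds/tts/dyn-greeting.gsm";
    }

    // Stand-in for playing a pregenerated hold prompt or music-on-hold.
    static void playHoldMusic() throws InterruptedException {
        Thread.sleep(100); // simulated playback of a short canned file
    }
}
```

In a real AGI script the hold prompt would itself be one of the pregenerated files, so the caller hears something immediately while the worker thread pays the conversion cost.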
From: Stefan R. <sr...@re...> - 2006-02-28 19:33:29
Darren Hartford wrote:
> Thanks Stefan,
> My intent is actually to optimize the second part, the non-static
> requests. This test could indeed use pre-generated voice files
> (per the cached remark), but the intent is to identify the best way to
> handle the dynamic parts.
>
> Is this the best way to handle dynamic TTS parts?

You are probably thinking of something like directly streaming the
TTS-generated voice files to the user instead of sequentially creating
the whole voice file and then streaming it. As far as I know this is
currently not possible. So the best approach that will currently work is:
- use pregenerated voice files wherever possible
- try to minimize the dynamically generated part
- generate the dynamic files in the background and as soon as possible,
i.e. as soon as you know what you have to say, create them in a new
Thread (not even through AGI but directly invoking your TTS engine)
and meanwhile play something else to the user to avoid dead air.

=Stefan

--
reuter network consulting
Neusser Str. 110
50760 Köln
Germany
Telefon: +49 221 1305699-0
Telefax: +49 221 1305699-90
E-Mail: sr...@re...
Jabber: sr...@ja...