From: Bob C. <bob...@sp...> - 2002-02-13 17:03:54
I have three points: (1) you don't need to use all of VoiceXML -- it can be
used just as an ASR/TTS engine, (2) VoiceXML cleanly separates dialog, ASR
and TTS, and (3) states + memory = Turing machine, so I'd contend that
"state-based" systems impose *no* limitations.

> >Peter Gober
> >
> >As somebody already also pointed out, I see the Communicator and VoiceXML
> >as being complementary: Though in principle it is possible to define a
> >voice dialog with the Communicator's hub scripting language, you will
> >typically want something more focused for that task. (That's why the CSLR
> >travel demo has a "dialog server".) VoiceXML seems to be predestined for
> >defining a dialog. On the other hand, one major strength of the
> >Communicator lies in efficiently and flexibly connecting different
> >components together.
> >
> >Another thing is politics: People who give you money tend to ask for
> >buzzwords that they have recently read in large letters on some magazine
> >cover. So I am convinced that it would be advantageous for the
> >Communicator to be describable as a "superset" of VoiceXML.
> >
> >So, I think it would be very nice to have a kind of VoiceXML-based dialog
> >server for the Communicator. (Actually, I found a presentation from Sam
> >from October 2000, where he suggested such a thing.)

(1) VoiceXML can also be used for something like Communicator as a simple
synthesis and recognition server. Any old HTTP server (e.g. Apache) can be
used as a VoiceXML server. In conjunction with a VoiceXML client (e.g.
IBM's or VXI), you get a speech recognition and synthesis engine. One
benefit is that you can use clients with telephony connections such as
VoiceGenie or Tellme, or you can build or buy your own. By running the
HTTP server through your favorite brand of CGI (e.g. Perl, servlets, ASP),
the service could communicate with the DARPA Communicator Hub as a server.
Then you're off and running with your favorite Hub-compliant dialog
management scheme.
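To make point (1) concrete, here is a minimal sketch (not from the original
post) of the kind of HTTP endpoint that could sit between a VoiceXML client
and the Hub: it generates a one-field VoiceXML page on the fly, and the
`submit` target is where a CGI script would forward the recognition result
to the Communicator Hub. The names (`make_vxml`, `/hub-relay`,
`cities.grxml`) are all hypothetical.

```python
# Sketch only: a toy HTTP server that plays the role of "any old HTTP
# server + CGI" serving dynamically generated VoiceXML. All names are
# hypothetical; a real deployment would forward results to the Hub.
from http.server import BaseHTTPRequestHandler, HTTPServer

def make_vxml(prompt_text, submit_url):
    """Build a one-field VoiceXML page: play a prompt, recognize an
    utterance against a grammar, and POST the result to submit_url."""
    return f"""<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="utterance">
      <prompt>{prompt_text}</prompt>
      <grammar src="cities.grxml" type="application/srgs+xml"/>
      <filled>
        <submit next="{submit_url}" namelist="utterance"/>
      </filled>
    </field>
  </form>
</vxml>"""

class VoiceXMLHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every GET serves a fresh page; a CGI script at /hub-relay
        # would translate the posted result into a Hub frame.
        body = make_vxml("Where would you like to fly?",
                         "/hub-relay").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/voicexml+xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve:
# HTTPServer(("localhost", 8080), VoiceXMLHandler).serve_forever()
```

The VoiceXML client (IBM's, VXI, VoiceGenie, Tellme, ...) fetches the page,
does the ASR/TTS work, and posts back -- the server never touches audio.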
Of course, if you already have your components in place, and you're happy
with them, and your customers (e.g. DARPA) aren't seeing "VoiceXML" in
red, then I don't see any advantage to doing things this way.

> Alex Rudnicky
>
> As to the relationship between OpenVXI and Communicator, I would tend to
> agree that they are in some sense complementary. Probably a good way to
> think about it is that Communicator provides an exploded view of a dialog
> system, allowing the developer to work on what are nominally the
> component stages (understanding, dialog management, generation, etc.) of
> dialog; a vxml-based system conflates such components into a single
> representation.

(2) VoiceXML provides a way of specifying an entire dialog system in XML
(assuming you don't need any server-side data dips or messages to be sent
externally). But it cleanly separates ASR, TTS, and dialog management. It
allows an ASR engine to be configured for grammar and language model, and
provides some useful underlying functionality like running grammars in
parallel and presenting scored N-best results. It allows whatever text you
want to be fed into TTS. And while it supplies mechanisms for client-side
dialog control, you don't even have to use these if you'd rather do it
server side. It's like the choice between using JavaScript (client side)
or CGI (server side) to implement a web site.

In fact, the point of VoiceXML was to turn the components into commodities
so that customers could avoid being locked into proprietary systems (e.g.
SpeechWorks!). So it'd be easy to hook up a VoiceXML-based system and try
it with various recognizers. Ditto for TTS or voice talent comparisons.
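As a sketch of what "doing dialog control server side" over scored N-best
results might look like (the data shapes and thresholds here are invented,
not from the post): the client posts back an N-best list, and the server
decides whether to accept, confirm, or reprompt.

```python
# Hypothetical server-side dialog decision over a scored N-best list.
# nbest is a list of (utterance, confidence) pairs, best hypothesis
# first, as a VoiceXML client might POST back after recognition.
def choose_hypothesis(nbest, accept=0.80, confirm=0.45):
    """Return (action, utterance) for the next dialog move:
    'accept' it, 'confirm' it ("Did you say ...?"), or 'reprompt'."""
    if not nbest:
        return ("reprompt", None)          # no-match / no-input
    text, score = nbest[0]
    if score >= accept:
        return ("accept", text)
    if score >= confirm:
        return ("confirm", text)
    return ("reprompt", None)
```

The client stays a commodity ASR/TTS front end; the policy (thresholds,
confirmation strategy) lives wherever the CGI and the Hub live.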
> Alex Rudnicky
>
> From another perspective, there are serious limitations to state-based
> representations of dialog, particularly for more complex tasks such as
> Communicator's Travel Planning domain (we know this from experience, as
> we first implemented script-based dialog management then had to replace
> it with a more flexible architecture, Agenda).

(3) I would contend that there are *no* limitations to state-based
representations of dialog, assuming that we allow "state" + "memory".
That gives you a Turing machine. I think that's what people using
"state-based" dialogs do. At least it's what we do at SpeechWorks. We tend
to use states for recognition contexts, which involve specialized language
models and grammars (to increase accuracy in the sense of precision).
Programming languages that allow function calls are essentially
state-based in that the memory provides a stack of variable values that
can be used to return to a previous "state".

Now one might argue that having to manage all this by oneself is ugly, and
we'd agree, which is why we're working on frameworks that would "hide" the
management of state the way programming languages do. But the point is
really like that of (1). The use of states may not be helpful, and you may
code around them with a lot of use of memory and conditional action, but
they won't restrict what you can do.

- Bob

Bob...@Sp...
17 State St, 12th Fl, NY, NY 10004 USA
Vox: +1.212.425.7600  Fax: +1.212.425.8845
Web: http://www.colloquial.com/carp
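The "state + memory" point above can be sketched in a few lines (all names
here are illustrative, not SpeechWorks code): each state is a recognition
context, and a stack of saved states lets a subdialog return to wherever it
was invoked from, exactly the way a function-call stack does.

```python
# Minimal "state + memory" dialog manager: states are recognition
# contexts; the stack is the memory that makes subdialog call/return
# work like function calls. Illustrative only.
class DialogManager:
    def __init__(self, start):
        self.state = start   # current recognition context
        self.stack = []      # the "memory": saved states
        self.slots = {}      # values collected so far

    def call(self, substate):
        """Enter a subdialog, remembering where to come back to."""
        self.stack.append(self.state)
        self.state = substate

    def ret(self):
        """Return from the subdialog to the saved state."""
        self.state = self.stack.pop()

# Usage: a low-confidence result triggers a confirmation subdialog.
dm = DialogManager("get_city")
dm.slots["city"] = "Boston"
dm.call("confirm_city")   # now in the confirmation context
dm.ret()                  # confirmed; back in "get_city"
```

Add conditional transitions driven by `slots` and this is a Turing-complete
scheme; the frameworks mentioned above would just hide the bookkeeping.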