Re: [Communicator-user] Communicator and VoiceXML?

   Hi all -

   The whole voice community talks about VoiceXML nowadays.

   As somebody already pointed out, I see the Communicator and
   VoiceXML as being complementary: Though in principle it is possible
   to define a voice dialog with the Communicator's hub scripting
   language, you will typically want something more focussed for that
   task. (That's why the CSLR travel demo has a "dialog server".)
   VoiceXML seems to be predestined for defining a dialog. On the
   other hand, one major strength of the Communicator lies in
   efficiently and flexibly connecting different components together.

   Another thing is politics: People who give you money tend to ask
   for buzzwords that they have recently read in large letters on
   some magazine cover. So I am convinced that it would be
   advantageous for the Communicator to be describable as a "superset"
   of VoiceXML.

   So, I think it would be very nice to have a kind of VoiceXML-based
   dialog server for the Communicator. (Actually, I found a
   presentation from Sam from October 2000, where he suggested such a
   thing.)

   Is anybody working on that or planning to do so?

As you say, I had proposed doing something like this, and there's
certainly a freely-available VoiceXML 1.0 implementation out there
(from SpeechWorks, distributed through CMU, I believe). VoiceXML 2.0,
on the other hand, is a separate issue - I don't know if the
SpeechWorks folks have any plans to release a 2.0-compliant version of
their engine. To the best of my knowledge, no one has attempted to
build such a module.

I do have some comments about your preface, though. You're right that
VoiceXML and the Galaxy Communicator software infrastructure are
complementary; the GCSI provides the infrastructure, and others
populate it with functionality. And yes, the Hub scripting language
was never intended as a dialogue processor (it's far too weak and
idiosyncratic). But I'm still uncomfortable with describing GCSI as a
"superset" of VoiceXML; it's actually the house that it or other
dialogue processing modules would live in.

Furthermore, I have very mixed feelings about the relationship between
VoiceXML and the GCSI. At the 2001 PI meeting, I hosted a session
entitled "W3CVB and Communicator" (available at
http://fofoca.mitre.org/doc.html) in which I argued strongly that the
goals of standards development and the goals of the Communicator
program are not the same, and that standards conformance can be a
serious impediment to research. I think that's especially true in the
case of VoiceXML 2.0; MITRE is on record publicly as having
serious questions about its design (see
http://lists.w3.org/Archives/Public/www-voice/2001OctDec/0034.html).
I'm pretty much convinced that building advanced dialogue capabilities
in VoiceXML would be incredibly onerous, and no researcher who values
his or her time would attempt it. So building a Communicator-compliant
VoiceXML module would be a proof of concept, which I'm not even sure is an
interesting one from a marketing point of view.

In (what I hope will be) a chapter of a forthcoming book on building
practical dialogue systems, I outline what I believe is the
fundamental design motivation of the GCSI: to "lower the bar to entry"
for researchers, engineers and students learning these
technologies, and to build up a development community which can easily
test and disseminate leading-edge ideas. The GCSI is not
standards-conformant, because there are few standards which apply to
it; and in those cases where we clearly aren't conformant (e.g., we
don't use a standard high-level transport layer like XML or CORBA),
what we have chosen is chosen carefully for research purposes and
presents a clear roadmap for standardization as these leading ideas
converge. In other words, the GCSI is intended to provide a path to
FEED standards efforts, not necessarily to consume them.

A long answer to a short question...

Cheers,
Sam