From: Samuel L. B. <sa...@mi...> - 2002-02-13 15:14:03
> Hi all - the whole voice community talks about VoiceXML nowadays. As somebody already also pointed out, I see the Communicator and VoiceXML as being complementary: Though in principle it is possible to define a voice dialog with the Communicators' hub scripting language, you will typically want something more focussed for that task. (That's why the CSLR travel demo has a "dialog server".) VoiceXML seems to be predestinated for defining a dialog. On the other hand, one major strength of the Communicator lies in efficiently and flexibly connecting different components together.
>
> Another thing is politics: People, who give you money, tend to ask for buzzwords, that they have recently read in large letters on some magazine cover. So I am convinced that it would be advantageous for the Communicator to be describable as a "superset" of VoiceXML.
>
> So, I think it would be very nice to have a kind of VoiceXML based dialog server for the Communicator. (Actually, I found a presentation from Sam from October 2000, where he suggested such a thing.) Does anybody work on that or plans to do so?

As you say, I had proposed doing something like this, and there's certainly a freely available VoiceXML 1.0 implementation out there (from SpeechWorks, distributed through CMU, I believe). VoiceXML 2.0, on the other hand, is a separate issue - I don't know if the SpeechWorks folks have any plans to release a 2.0-compliant version of their engine. To the best of my knowledge, no one has attempted to build such a module.

I do have some comments about your preface, though. You're right that VoiceXML and the Galaxy Communicator software infrastructure (GCSI) are complementary; the GCSI provides the infrastructure, and others populate it with functionality. And yes, the Hub scripting language was never intended as a dialogue processor (it's far too weak and idiosyncratic).
But I'm still uncomfortable with describing the GCSI as a "superset" of VoiceXML; it's actually the house that VoiceXML or other dialogue processing modules would live in.

Furthermore, I have very mixed feelings about the relationship between VoiceXML and the GCSI. At the 2001 PI meeting, I hosted a session entitled "W3CVB and Communicator" (available at http://fofoca.mitre.org/doc.html) in which I argued strongly that the goals of standards development and the goals of the Communicator program are not the same, and that standards conformance can be a serious impediment to research. I think that's especially true in the case of VoiceXML 2.0, about whose design MITRE is publicly on record as having serious questions (see http://lists.w3.org/Archives/Public/www-voice/2001OctDec/0034.html). I'm pretty much convinced that building advanced dialogue capabilities in VoiceXML would be incredibly onerous, and no researcher who values his or her time would attempt it. So building a Communicator-compliant VoiceXML module would be a proof of concept, and I'm not even sure it's an interesting one from a marketing point of view.

In (what I hope will be) a chapter of a forthcoming book on building practical dialogue systems, I outline what I believe is the fundamental design motivation of the GCSI: to "lower the bar to entry" for researchers, engineers and students learning these technologies, and to build up a development community which can easily test and disseminate leading-edge ideas. The GCSI is not standards-conformant, because there are few standards which apply to it; and in those cases where we clearly don't conform (e.g., we don't use a standard high-level transport layer like XML or CORBA), what we have chosen was chosen carefully for research purposes and presents a clear roadmap for standardization as these leading ideas converge. In other words, the GCSI is intended to provide a path to FEED standards efforts, not necessarily to consume them.

A long answer to a short question...
Cheers,
Sam