From: Alexandros P. <po...@re...> - 2002-02-14 17:35:59
Good or bad, just being a standard makes VoiceXML very valuable. You both point this out. However, I think that both of you miss some other important points about VoiceXML.

Go through the following exercise. Why do we have HTML? Most people hate it. We are all expert programmers. You could just use cgi scripts to generate everything. In practice, if you are a good programmer/developer/researcher and you want to write a web service, 80% of your code is generated from cgi scripts. So why is HTML there and still so widely used?

1. non-experts can generate web pages/services quickly without knowing much about programming
2. there are libraries and editors that hide all the ugliness of HTML and provide modular components.
(3. obviously it is the standard...)

That is also the answer for VoiceXML.

Some common misconceptions about VoiceXML that researchers have:

1. You can't do mixed initiative with VoiceXML (WRONG!)
2. You have to fully enumerate all possible states of your dialogue manager in VoiceXML (WRONG!)

Get a high-school student to play a bit with VoiceXML. You'd be surprised to see that he/she can do with VoiceXML many of the things you can do in the research lab.

VoiceXML (and any other standard for that matter) is a blessing for the ASR/dialogue systems field. I don't want to go through the network economics of standards or spend 2-3 paragraphs discussing the stories of the adoption of the FAX machine and the web.

I suggest that we researchers should spend some time to really learn and evaluate VoiceXML. Figure out how our innovations can be built into the standard without making it bulky. And then spend some more time building tools and libraries around the standard.

the other alex

ps BTW the main reason that VoiceXML is not used in high-density applications in the field is EFFICIENCY (which you don't touch on).

>I agree with Bob's points.
>
>It's indeed quite reasonable to think of VoiceXML as simply an interface
>standard that provides a uniform interface to speech decoding/encoding and
>telephony services. I believe this is the de facto use for it in industry.
>On a more local scale, this is what the students in our Dialog Systems
>Design course (taught by Alan Black and myself) end up doing: apart from
>stuff like the prologue, all vxml pages are generated through cgi scripts,
>where all the real work of running the dialog gets done.
>
>The confusion, unless I missed something important, is that VoiceXML was
>originally presented as a scheme for authoring dialog. You can certainly do
>that but it really doesn't seem to make much sense for anything but the
>most trivial systems. Practitioners have realized this. [But there may be
>someone around here who can speak more authoritatively to the history of
>the idea.]
>
>The state issue thus becomes a bit of a red herring: in reality no one
>really tries to explicitly enumerate all possible states for a system
>(which is in fact what you would have to do if you code directly in vxml);
>they write systems that (hopefully) are capable of generating all necessary
>states. To tie this back to Communicator: most sites, whether explicitly or
>implicitly, tried to develop some "theory" of state generation that would
>do this in a reasonably efficient manner. Notions like "mixed-initiative",
>I think really had to do with the problem of accommodating otherwise
>unanticipatable dialog states.
>
>Alex
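
For readers unfamiliar with the two ideas discussed in this thread (vxml pages generated from cgi scripts, and mixed initiative inside VoiceXML itself), here is a rough sketch of how they can combine. It is only an illustration, not code from either poster: the function names, the slot names, the grammar file "travel.grxml" and the text/xml content type are all assumptions. It relies on the VoiceXML convention that a form-level grammar plus an <initial> element lets one caller utterance fill several fields, with directed prompts used only for whatever is still missing.

```python
import xml.etree.ElementTree as ET


def build_mixed_initiative_form(fields, grammar_src):
    """Return a VoiceXML document (as a string) holding one mixed-initiative form.

    fields      -- list of (field_name, prompt_text) pairs; hypothetical slots.
    grammar_src -- URI of a form-level grammar; hypothetical, not supplied here.
    """
    vxml = ET.Element("vxml", version="2.0")
    form = ET.SubElement(vxml, "form", id="main")
    # A form-level grammar allows a single utterance to fill several fields.
    ET.SubElement(form, "grammar", src=grammar_src)
    # <initial> asks an open question before any directed prompt is played.
    initial = ET.SubElement(form, "initial", name="start")
    ET.SubElement(initial, "prompt").text = "How can I help you?"
    # Fields still empty after the open question are prompted for in order.
    for name, prompt in fields:
        field = ET.SubElement(form, "field", name=name)
        ET.SubElement(field, "prompt").text = prompt
    return ET.tostring(vxml, encoding="unicode")


def cgi_response(fields, grammar_src):
    """Wrap the generated page in a minimal CGI-style response.

    The text/xml content type is an assumption; platforms may expect another.
    """
    body = build_mixed_initiative_form(fields, grammar_src)
    return "Content-Type: text/xml\r\n\r\n" + body


if __name__ == "__main__":
    slots = [("origin", "Where are you leaving from?"),
             ("destination", "Where are you going?")]
    print(cgi_response(slots, "travel.grxml"))
```

The script never enumerates dialogue states: the list of slots is data, and the interpreter decides at run time which fields still need a directed prompt.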