From: Roberto P. <ro...@sp...> - 2002-02-16 22:01:37
> The state issue thus becomes a bit of a red herring: in reality no one
> really tries to explicitly enumerate all possible states for a system
> (which is in fact what you would have to do if you code directly in vxml);
> they write systems that (hopefully) are capable of generating all necessary
> states. To tie this back to Communicator: most sites, whether explicitly or
> implicitly, tried to develop some "theory" of state generation that would
> do this in a reasonably efficient manner. Notions like "mixed-initiative",
> I think really had to do with the problem of accommodating otherwise
> unanticipatable dialog states.

I do agree with this. Not even the so-called "state-based" dialog systems attempt to enumerate all the possible states, and that includes the commercial - "mostly" system-initiative - applications developed by SpeechWorks and its partners. We tend to use the expression "state-based" for those dialog systems whose strategy can be depicted as a graph with blobs and arcs, but we tend to forget that there are conditions on those arcs that can be pretty complex and that can refer to global variables - and this could be done, in principle, also in VoiceXML (although I agree that nobody in their right mind would build a Communicator mixed-initiative system in pure VoiceXML). So the states in a "state-based" system are really "clusters of states", with the clustering function arbitrarily complex. If the complexity of those clustering functions is high, it becomes rather impractical to represent the dialog with blobs and arcs.

On the other hand, we at SpeechWorks have learned the importance (for the goal of a task completion rate close to 100%) of a finely tuned UI, which often requires assigning specific prompts and grammars in specific situations in the interaction. And that is generally easier to do if you have a "state-based" dialog manager where you can directly control "every state" of the interaction.
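To make the "clusters of states" point concrete, here is a toy Python sketch (not SpeechWorks code; all names, states, and the confidence threshold are invented for illustration) of a graph whose arcs carry arbitrary conditions over global context:

```python
# Toy sketch of a "state-based" dialog manager: each blob in the graph
# is really a cluster of states, because arc conditions can consult
# arbitrary global context. Everything here is illustrative.

class DialogState:
    def __init__(self, name, prompt, grammar):
        self.name = name
        self.prompt = prompt    # prompt tuned for this specific state
        self.grammar = grammar  # grammar active in this specific state
        self.arcs = []          # list of (condition(context) -> bool, target)

    def add_arc(self, condition, target):
        self.arcs.append((condition, target))

    def next(self, context):
        # The first arc whose (possibly complex) condition holds fires;
        # conditions may read any global variable in `context`.
        for condition, target in self.arcs:
            if condition(context):
                return target
        return None


# Two states of a toy flight-booking interaction.
ask_city = DialogState("ask_city", "Which city?", "city_grammar")
confirm = DialogState("confirm", "Did you say {city}?", "yes_no_grammar")

# The arc condition mixes the recognition result with global context
# (here, low ASR confidence forces explicit confirmation) - this is the
# "clustering function" hiding inside an innocent-looking arc.
ask_city.add_arc(lambda ctx: ctx.get("confidence", 0.0) < 0.8, confirm)
ask_city.add_arc(lambda ctx: True, None)  # high confidence: move on

state = ask_city.next({"city": "Boston", "confidence": 0.5})
print(state.name)  # low confidence routes to the confirmation state
```

The point of the sketch: the graph still has only two blobs, but the conditions on the arcs can be made as complex as you like, which is why drawing the dialog as blobs and arcs stops being practical.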
I guess this is one of the many reasons it is hard to build a general theory of dialog management that would satisfy everybody - i.e. the necessity of generating (or clustering) states at a high level, and the opposite necessity of tailoring the UI to each individual state.

As far as the VoiceXML discussion is concerned, as Bob explained, we can look at VoiceXML *only* as the protocol used by the application server to invoke ASR and TTS. The dialog is managed on the server side, and the dialog manager could be state-based or anything else - as long as a VoiceXML document can be generated at each turn describing which grammar (or language model) to activate and which prompt to play. Under that limited use, VoiceXML can be used within the Galaxy architecture with, in my opinion, at least a couple of advantages:

1. buzz-word compliance
2. easy (but not completely transparent yet...) porting to different telephony platforms, as long as they are VoiceXML compliant.

Roberto
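As an illustration of the "one VoiceXML document per turn" use, here is a minimal Python sketch of what a server-side dialog manager might emit each turn. The helper function, grammar file, and URL are invented for the example; the element names (`<vxml>`, `<form>`, `<field>`, `<prompt>`, `<grammar>`, `<filled>`, `<submit>`) are standard VoiceXML.

```python
# Sketch of "VoiceXML as protocol": the server-side dialog manager
# (state-based or otherwise) emits one small VoiceXML document per
# turn, naming only the prompt to play and the grammar to activate.

def vxml_turn(prompt, grammar_src, submit_url):
    # One <field> per turn: play `prompt`, listen with `grammar_src`,
    # then post the recognition result back to the application server,
    # which computes the next turn and generates the next document.
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="1.0">\n'
        '  <form>\n'
        '    <field name="answer">\n'
        f'      <prompt>{prompt}</prompt>\n'
        f'      <grammar src="{grammar_src}"/>\n'
        '      <filled>\n'
        f'        <submit next="{submit_url}" namelist="answer"/>\n'
        '      </filled>\n'
        '    </field>\n'
        '  </form>\n'
        '</vxml>\n'
    )

doc = vxml_turn("Which city?", "city.grxml", "http://example.com/next-turn")
print(doc)
```

Under this use the VoiceXML browser never holds the dialog logic; it only executes one turn at a time, which is what makes the porting across VoiceXML-compliant telephony platforms possible.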