From: Samuel L. B. <sa...@mi...> - 2002-02-14 17:59:53
Go through the following exercise.
Why do we have HTML?
Most people hate it.
We are all expert programmers.
You could just use CGI scripts to generate everything.
In practice, if you are a good programmer/developer/researcher
and you want to write a web service, 80% of your
code is generated from CGI scripts.
So why is HTML there and still so widely used?
1. non-experts can generate web pages/services quickly without
knowing much about programming
2. there are libraries and editors that hide all the ugliness
of HTML and provide modular components.
(3. obviously it is the standard...)
That is also the answer for VoiceXML.
Some common misconceptions about VoiceXML that researchers have:
1. You can't do mixed initiative with VoiceXML (WRONG!)
2. You have to fully enumerate all possible states of your
dialogue manager in VoiceXML (WRONG!)
Get a high-school student to play a bit with VoiceXML.
You'd be surprised to see that he/she can do with VoiceXML
many of the things you can do in the research lab.
Well, I suppose you could make the analogy between HTML and
VoiceXML. Except that VoiceXML isn't a markup language, really; it's
code in disguise. And as a programming language, it's really, really
badly designed. It makes all sorts of assumptions about how people
should build dialogue systems, which seem to have accreted through a
gradual process of "whatever people need in the next three months to
sell a telephone-based voice interaction product". Its extensibility,
as far as I can tell, is poor at best. And simply from looking at the
specification, I can't imagine how people could manage to write easily
maintainable code in it.
Sure, throw high school students at it. They've got (a) time to burn,
(b) a high tolerance for pain, and (c) little appreciation for good
design. I'm sure they can work miracles. We could also work miracles
with punch cards, but who in their right mind would do that today?
If someone wants to stand up a Communicator-compliant VoiceXML
component, more power to them, and bless them for being willing to
tolerate it. As a dialogue researcher, however, I'm convinced that
it's not just lacking the basic infrastructure to do real exploration
of advanced dialogue capabilities, it's actually hostile to it. I
think that every dialogue researcher should certainly read the
VoiceXML specification and understand what's there. But researchers
shouldn't be the ones writing tools and libraries against a standard
as badly designed as this; it impedes our work, and frankly, we're not
rewarded well enough for the torture.
Don't get me wrong; I like standards. I think they're a great idea. I
think we'd be toast without them. But it's a waste of my time to try
to conform to a badly designed, limiting standard, and I think
VoiceXML is a particularly bad standard. The SALT Forum specification,
on the other hand, makes a great deal more sense, and leaves the
appropriate room where the room is needed. Alex suggested that what
he's really using VoiceXML for is a uniform interface for speech in
and speech out over telephony; from what I've seen of SALT so far, it
does that far better than VoiceXML, and doesn't get in the way of real
dialogue development. The problem with SALT is that there's no public
version of it yet, and it won't be submitted to a standards
organization until later this year. It's a shame they've chosen to
work in private.
Cheers,
Sam