From: Samuel L. B. <sa...@mi...> - 2002-04-28 15:56:02
Sam,
You are right, but in practice GCSI runs successfully on a local network
and only theoretically over the Internet. If bandwidth grows, the
network latency issue will eventually be overcome, but the firewall
issue will be there for ever and ever. Web services make it possible to
overcome the firewall problem because they are built on top of HTTP and
use XML (SOAP) to communicate.
Let's be fair: it runs more than theoretically over the Internet. The
firewall issue is an obstacle, no doubt, but I would be seriously
surprised if SOAP over HTTP is the answer even in the medium term. The
reason is simple: SOAP over HTTP works because everybody thinks HTTP
is safe. The instant Web services become prevalent, and the instant
there's a security breach because you could do remote object
manipulation over HTTP, there's going to be a clamor to do more
"intelligent filtering" of HTTP traffic, and the firewall issue will
reappear in another form. Right now the filtering is pretty crude;
it's done by ports, because people conceive of ports as being
essentially equivalent to services. As soon as that equivalence is
broken, the current firewall model will fall apart, and the apparent
advantage of object manipulation over HTTP will vanish. I'm frankly
surprised this hasn't happened already with CGI; I know that our
sysadmins are very, very careful about what they allow as CGI scripts,
and it seems to me that Web services will receive even more scrutiny.
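To make the "ports are services" assumption concrete, here's a minimal
sketch (in Python, with made-up port numbers, not any real firewall's
configuration) of what purely port-based filtering amounts to:

# Illustrative only: a toy port-based filter, not a real firewall.
ALLOWED_PORTS = {80, 443}   # "web traffic is safe" -- the current crude assumption

def admit(dst_port):
    """Decide purely on the destination port, ignoring the payload."""
    return dst_port in ALLOWED_PORTS

print(admit(80))    # True:  a SOAP call tunneled over HTTP sails through
print(admit(9000))  # False: the very same RPC on its own port is blocked

The moment what's inside port 80 starts to matter, this is exactly the
model that has to be replaced by payload inspection.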
The XML issue is another hot button for me, unfortunately. XML has
at least two serious disadvantages as an on-the-wire transport: first,
it's way too verbose, and second, it has no real datatypes besides
"string". It's not even remotely clear to me what the advantage of
using it is over, say, a well-defined XDR encoding of object
transport, or, to take another example, CORBA IIOP. I agree that XML
is a great advance for document structure, but that doesn't mean it
should be the basis of a transport layer. I know it's possible to do
some intelligent compression of XML traffic, i.e., sending some data
description in XML across during the initial handshake, and just
wrapping uninterpreted data in XML for the remainder of the transport;
but at that point, are you even using XML, really? In other words,
it's either too verbose over the wire or it's a buzzword.
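To put some rough numbers behind the verbosity and "everything is a
string" complaints, here's a hand-rolled sketch in Python; the envelope
below is schematic, not a real SOAP message, and the binary layout is
just one arbitrary fixed encoding:

import struct

# One integer field and one float field: the payload we want to ship.
request_id, confidence = 42, 0.87

xml_payload = (
    '<?xml version="1.0"?>'
    '<Envelope><Body><Score>'
    '<requestId>42</requestId>'         # the number shipped as text...
    '<confidence>0.87</confidence>'     # ...which the receiver must re-parse
    '</Score></Body></Envelope>'
)

binary_payload = struct.pack("!id", request_id, confidence)  # 4-byte int + 8-byte double

print(len(xml_payload), "bytes as XML")     # well over a hundred bytes
print(len(binary_payload), "bytes packed")  # 12 bytes

Multiply that ratio by every message in a dialogue turn and the
overhead is hard to ignore.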
> If someone wanted to start a
> cottage industry running GCSI-compliant servers 24/7, I don't think
> there's anything to prevent them from doing so right now; the "parser
> server in Kansas" model is certainly something we originally aimed to
> support.
The firewall issue will prevent them from doing this, and there is also
the standards issue. You said that originally you aimed to support the
"parser server in Kansas" model. What happened to it?
If there's no firewall, you can still do it. Stand up a parser at
Sheffield and have someone in Germany contact it (or someone in the
States). It works just fine. The reason people don't do it is that in
general, researchers appear to want to have their hands on the code
they're relying on. They don't want to worry about someone else's
machine crashing, or someone else's sysadmin being incompetent, or
someone else's bug causing them a headache they can't address. Not to
mention the fact that dialogue systems are sensitive to latency, and
my guess is that even the GCSI running locally is too much transport
overhead for some people. If the code is open source (and almost all
the code I work with is), people are going to want to have it locally,
because there's no comparative advantage to running it remotely. The
"parser server in Kansas" model is transparently supported; it's just
not used because it turns out to be the wrong model for research, at
least in the US. And it's not the firewall issue which is causing
people not to do it.
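For what it's worth, here's a back-of-the-envelope sketch of the
latency point; every number in it is an assumption for the sake of
illustration, not a measurement of the GCSI or of anyone's network:

# All figures below are assumptions, chosen only to show the shape of the problem.
messages_per_turn = 10    # hub messages triggered by one user utterance (assumed)
lan_rtt = 0.001           # ~1 ms round trip on a local network (assumed)
wan_rtt = 0.100           # ~100 ms round trip to a distant server (assumed)

print("local transport overhead per turn:  %.3f s" % (messages_per_turn * lan_rtt))
print("remote transport overhead per turn: %.3f s" % (messages_per_turn * wan_rtt))
# With these numbers the "server in Kansas" adds about a second of pure
# transport delay to every turn, before any actual processing happens.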
I am a great admirer of the GCSI and I think you guys did a wonderful job
with it, but because I like it very much I am interested in seeing how it
can be enhanced and made more flexible. In order to do this, one needs
to try to spot its weaknesses.
In my view, standards compliance is a considerable issue to take into
account. One of the advantages that web services bring is exactly this
standards compliance. Currently there are two main implementations of
the web services standards (JAX and .NET).
If the hub were rewritten in one of the newer programming languages,
Java for example, the frame data structure were rewritten in XML
(XML Schema, for example), and the server libraries were updated to
comply with web services standards, then GCSI would do the same stuff
that it does right now, but it would be more powerful (better multimodal
interaction support, multilanguage support through UNICODE, real
functionality over the Internet) and more flexible, because it would use
widely available standards.
I appreciate your viewpoint, but after giving it some substantial
consideration over the years, I've arrived at a rather different
conclusion. Here's why.
The original goal of the GCSI was to "lower the bar to entry" for
spoken dialogue research. The original model presented to us by the
Communicator program manager was a DARPA program called MOSIS, which
was designed to promulgate best practice about VLSI design throughout
the universities and research labs, which led to an explosion in
training for VLSI designers. We believe very strongly in this model,
and it's been pretty successful so far, to my mind, due to a
confluence of factors: the insistence on the part of the DARPA program
that everyone use the GCSI; MIT's original contribution of its
software infrastructure as the basis of the GCSI and MITRE's assigned
role as its maintainer and designer independent of any dialogue
research agenda; the fact that the GCSI was assigned a generous
open-source license; and the continued endorsement by organizations
like CMU who develop high-quality, open source dialogue components.
But the GCSI was never intended to be a production system. It's robust
enough to use for research and prototype development, and to gather
real, high-quality data about human-computer interaction; but even
though its license permits it, it's not going to be part of a fielded
product, for all sorts of reasons beyond the standards issues: it's
not redundant, it's not secure, it doesn't integrate with existing IVR
systems, and I sincerely doubt that it would scale to thousands of
users. These are product-level issues which neither DARPA nor MITRE
ought to be addressing; the marketplace should be addressing them.
So my first answer is: in principle, the world of GCSI is pretty
separate from the commercial world. Had the relevant standards been
available when we started, we might have consumed them; but they
weren't, and in some ways that was a good thing for us. I actually
don't see the particular advantage in conforming to emerging
standards.
Consider, for instance, the "flavor of the month" problem in remote
service models. First it was CORBA, then RMI, then JINI or JavaBeans
or whatever the hell it was at that point, now it's .NET and JAX. I
have to tell you, I'm quite skeptical about whether either of those
service models will be any more successful than previous ones. Sure,
.NET has Microsoft behind it, but it's a lot harder to own the Net
than it is to own individual desktops, and if it ever turns out that
there's no money in it for Microsoft (e.g., if people don't go for
Hailstorm and Passport), Microsoft's enthusiasm will wane. And let's
face it, when you're dealing with things which aren't on your desktop,
trust becomes much more of an issue, and Microsoft, frankly, is not a
trustworthy company, and people know it. And as for JAX, I do a lot of
reading, and I'd never even heard of it until you mentioned it (I'll
leave it for you to decide whether that speaks worse of me or of
JAX). My skepticism about the likely success of these approaches may
be unwarranted, but their track record isn't good.
Next, there's the issue of whether the notion of "Web services" makes
any sense at all. At the moment, we're already facing a crisis on the
Net about profit-making; people want information, but they aren't
willing to pay for it, at least given the current payment models. I
personally would be happy to pay five cents per article view in a lot
of places, but there's no established system of micropayments that
seems to be both workable and acceptable from a human-factors
standpoint. So how are Web services going to be successful? Why would
people stand up a service? I doubt that it would cost any less than
standing up information. In other words, I'm not sure the underlying
business model has a whole lot of validity.
So let's say, on the other hand, that a researcher wants to stand up a
Web service for free. This is certainly not out of the question. But
first, it has to address the issue I raised earlier, namely, why a
researcher would tolerate relying on someone else's code on a machine
he/she doesn't have control over. For me and people in my lab, it's a
last resort, especially given the plummeting cost of hardware. But
let's say someone wants to interact in that way. Why use JAX or .NET?
What are the advantages? You list some.
(1) Firewall access: I've already argued that this advantage will
vanish.
(2) Better multimodal support: it's not clear to me how Web services
confer an advantage in this area.
(3) Multilanguage support through UNICODE: absolutely, this is missing
from the GCSI, but again, it's not clear to me how Web services
confer an advantage in this area. There are two issues as far as
the GCSI is concerned regarding non-ASCII character sets: the
transport layer and the client API. The client API would have to
be fixed no matter what, and I can't believe the transport layer
would be complicated (see the sketch after this list). So we may get SOMEthing for free, but it
doesn't seem to me that we'd get much.
(4) More flexible because of widely available standards: I just don't
see this. There would have to be some marginal advantage in people
investing in one of these object models over and beyond their work
with the GCSI, and there just isn't enough motivation, it seems to
me. For instance, if you're trying to leverage commercial
products, why buy a Web service from IBM for ViaVoice when you can
buy the SDK and run it locally? Furthermore, the really interesting
work in SR and TTS is being done in the laboratory, and
tools like Sphinx and Festival are open source and run on multiple
platforms, so why would you use ViaVoice for research unless you
had the source code, like IBM? Finally, there aren't any other
tools (besides, perhaps, VoiceXML) that would provide any
comparative market advantage for researchers.
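On point (3), here's roughly what I mean when I say the transport layer
shouldn't be the hard part; this is generic UTF-8 handling in Python,
not the GCSI's actual wire format or client API:

# Generic sketch: encode a non-ASCII string for the wire and get it back intact.
text = "Übersetzung"                  # something a non-English client might send

wire_bytes = text.encode("utf-8")     # what would actually travel over the connection
round_trip = wire_bytes.decode("utf-8")

assert round_trip == text
print(len(text), "characters,", len(wire_bytes), "bytes on the wire")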
So here's what I see when I look at the idea of Web services:
(a) an overly verbose wire protocol;
(b) a "flavor of the month" problem with picking the right service
model;
(c) attention to an implied development model which doesn't seem to
match what researchers do.
What this adds up to, for me, is a lot of work with not a lot of
payoff. If somebody else wants to do it, I think that would be great;
for instance, I've advocated for many years that someone build a
wrapper for VoiceXML so that we can do some baselining against current
market capabilities. But as with that effort, I've concluded that it's
just not what we're being asked to do here.
Cheers,
Sam