From: Samuel L. B. <sa...@mi...> - 2002-04-29 15:30:13
> > Consider, for instance, the "flavor of the month" problem in
> > remote service models. First it was CORBA, then RMI, then JINI or
> > JavaBeans or whatever the hell it was at that point, now it's .NET
> > and JAX.

> At first view, Web Services are not very different from CORBA, DCOM,
> RMI, RPC, etc. In fact they are the same thing in terms of
> functionality. The novelty here is that they bring to light a
> neutral standard. All the previous tries were customized protocols
> and products fighting each other for supremacy. Now the specs are
> neutral and they have been embraced by the big players. All they
> have to do now is start producing implementations. To me it is just
> like when the Web (HTML/HTTP) started: Netscape and MS produced
> implementations for it, and the browser wars started. Now we are
> talking about the war between .NET and JAX and perhaps other
> implementations. This is the clue that makes me think that this time
> they got it right and Web Services will be the next BIG thing that
> everybody was expecting.

Are the specs neutral? I can't find any evidence on the Web that the
.NET CLR has been submitted to any standards body (it certainly
doesn't help that Microsoft keeps choosing names for things, like
.NET and C#, which include punctuation that Google strips off). And
the histories of Microsoft and Sun don't give me much confidence that
this will happen. Granted, SOAP is being worked on by the W3C, and
Microsoft deserves kudos for that, and Sun has certainly put a number
of things in the public domain (like XDR, which we use in the GCSI).
Submitting Web services as standards is a really risky proposition
for these companies: on the one hand, they can't really take off
unless there's true distributed access to them, which means other
platforms and other vendors; but the loss of control implied there
also calls into question what the business model might be. I do take
your point here; I'm just not sure I agree.
I think CORBA really does count as a neutral specification; there are
open-source implementations of it for a number of languages, and if I
recall correctly, the Java runtime now comes with an ORB. And it's
not clear that there's anything wrong with CORBA, technically, except
for the judgment I've frequently heard that it's too heavyweight a
protocol. That is, it's not clear to me that the battle has changed
substantially; but I can't really judge this well.

> It is true that SOAP over HTTP is not very reliable, but it's a good
> start for the moment. I remember when credit card transactions over
> the Internet began. They weren't very safe :)) (you know what I
> mean), but things improved. I heard that the plans for the future
> stipulate that HTTP will be discarded and SOAP will take over. Of
> course, the security issue will never disappear, nor will the
> firewall issue. They have always been there and we have to live with
> them, but they shouldn't stay in our way.

I actually have no doubt about its reliability; it's a matter of how
long it will remain under the radar security-wise that concerns me.

> > The XML issue is another hot button for me, unfortunately. XML has
> > at least two serious disadvantages as an on-the-wire transport:
> > first, it's way too verbose, and second, it has no real datatypes
> > besides "string".

> The first disadvantage you present here can be seen as an advantage
> if we are talking about DEBUGGING. Too verbose means that it
> facilitates human debugging, but this comes at the price of being
> slow and taking some bandwidth. Those things will improve because
> it's a hardware problem. The same has been said about Java, and it's
> still around today, making the developer's life more comfortable.

I'll certainly grant that when I'm trying to figure out what went
over the wire, it's harder than it would be if only ASCII were going
over the wire. However, the situations where anyone besides me would
need to look at that data are exceedingly rare.
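To put rough numbers on the verbosity point, here's a minimal sketch
(the element names and the sample utterance are invented for
illustration) comparing the same record serialized as XML versus a
bare delimited line:

```python
# Compare the on-the-wire size of one record serialized as XML
# versus a bare comma-separated line. (All names here are invented.)
record = {"speaker": "user", "utterance": "show me flights to Boston"}

xml = ("<turn><speaker>%s</speaker><utterance>%s</utterance></turn>"
       % (record["speaker"], record["utterance"]))
csv = "%s,%s" % (record["speaker"], record["utterance"])

# The markup more than doubles the payload for this small record;
# the relative overhead shrinks as the field values get longer.
print(len(xml), len(csv))
```

For short records like this one, the tags dominate the payload, which
is exactly why narrow pipes make the verbosity issue bite.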
I certainly agree that in the best of circumstances, a fatter pipe
will address the verbosity problem, but there are actually some
circumstances I have to worry about (military contexts, mostly) where
the bandwidth is exceedingly narrow. So I need to pay attention to
this issue. I don't actually believe that processor speed is relevant
here, since its effect is pretty much swamped by network latency in
most of the situations I'm familiar with.

> The second one has been overcome with the arrival of XSD (XML
> Schema), maintained by the W3C. Web Services couldn't happen without
> agreeing on a standard that regulates the small details that make
> life so painful, like: what is the length of an integer, or what is
> the encoding of a string? It's being done. Now you can encode any
> data type with XSD. There is a standard agreement, and the big
> players have started to implement it. It's a very important aspect
> for the future.

I'd somehow missed this standards effort. Thanks.

> > So let's say, on the other hand, that a researcher wants to stand
> > up a Web service for free. This is certainly not out of the
> > question. But first, it has to address the issue I raised earlier,
> > namely, why a researcher would tolerate relying on someone else's
> > code on a machine he/she doesn't have control over. For me and
> > people in my lab, it's a last resort, especially given the
> > plummeting cost of hardware. But let's say someone wants to
> > interact in that way. Why use JAX or .NET? What are the
> > advantages?

You list some.

> Well, as you spotted, a disadvantage of web services will be the
> fact that "no one will tolerate relying on someone else's code on a
> machine he/she doesn't have control over". But to me this sounds
> like an immense pressure for that ONE, because ONE is forced to
> learn lots of things in order to be able to cope with minor/major
> problems. It is just like learning to repair everything in a
> household. Instead of relying on specialized services, ONE would try
> to apply a DIY "strategy".

> [...]
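To illustrate the XSD point a few paragraphs up: a schema can declare
a real datatype for each element, not just strings. A minimal sketch
(all element names here are invented for the example):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Each field carries a declared type, so a receiver knows that
       turnId is an integer and timestamp is a dateTime, not just
       uninterpreted text. -->
  <xs:element name="turn">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="speaker"    type="xs:string"/>
        <xs:element name="turnId"     type="xs:int"/>
        <xs:element name="timestamp"  type="xs:dateTime"/>
        <xs:element name="confidence" type="xs:double"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A validating receiver can reject a non-numeric turnId before any
application code runs, which is the "standard agreement on the small
details" being described.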
> In my domain there are groups of researchers doing one task very
> well and doing some other tasks very badly. For example, in an NLP
> system one would need to carry out a POS (part-of-speech) tagging
> operation. To me this should be a Web Service accessible worldwide.
> GOOGLE has already taken this step with its GOOGLE API
> (http://www.google.com/apis/), offering NLP applications a nice and
> powerful IR system. :))

There are certainly some mature technologies to exploit, like POS
tagging. But if they're really mature and bulletproof, then it's just
as easy to install the code locally and run it yourself as to try to
access it over the network. Granted, if it really ISN'T mature, and
you need to engage the developer in fixing it for you, there may be a
debugging advantage in your request running on his server instead of
yours. My experience has been, however, that none of these
technologies has been used in the range of situations which would
mostly permit them to be run out of the box if a researcher is really
interested in exploring advanced dialogue, because the interactions
between these modules have gone largely unexplored. How do you make a
speech recognizer sensitive to context? Do you just swap the language
model, or is there something more sophisticated you can do? It seems
to me that a lot of people are going to want to get into this
dimension of things, and to do that, you need the code. That is,
you're right that people shouldn't have to roll up their sleeves and
get their fingers dirty in things they don't have expertise in, but
my experience has been that people can't afford NOT to, because the
problem of spoken dialogue appears to be a tightly integrated
problem. Furthermore, the apparent advantage of not needing to
support a service you don't know much about vanishes in real research
circumstances. Let's say I wrote a parser which wasn't perfect, and
you needed some help. But I'm running it as a service, and you don't
have the source code.
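To make the local-versus-remote contrast concrete, here's a toy
sketch (the tagger is a stub and every name is invented; I'm using
XML-RPC from Python's standard library as a stand-in for a SOAP
stack):

```python
# The same POS tagger exposed two ways: as a local function call and
# behind an XML-RPC "web service". Everything here is a stub for
# illustration only.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def pos_tag(sentence):
    # Stub tagger: labels every token as a noun. A real tagger,
    # which you could also patch yourself, would go here.
    return [[w, "NN"] for w in sentence.split()]

# Local use: just a function call.
local = pos_tag("show me flights to Boston")

# Remote use: the very same function behind a service endpoint.
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(pos_tag, "pos_tag")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

remote = ServerProxy("http://localhost:%d" % port).pos_tag(
    "show me flights to Boston")
server.shutdown()
# Same answer either way; the difference is who controls the code.
```

When the component is mature, the local call is trivially easy to set
up; the service wrapper mainly adds a network dependency and a
maintainer you have to wait on.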
And there are twenty other groups ALSO subscribing to my service, and
they're having problems as well. Do you really want to wait for me to
get around to paying attention to you, or would you prefer the option
of rolling up your sleeves and trying to fix the problem yourself? Or
let's say there's an enhancement you'd like to see installed, because
of some complex set of dialogues you're exploring. In other words, I
think it's far more likely that researchers are going to WANT to have
other people's code and run it locally, rather than rely on the
apparent simplicity of subscribing to a remote service.

I realize I'm stating my case awfully strongly here. If the AMITIES
folks really have firewall issues and distributed interaction
requirements that the GCSI doesn't address, we need to talk about
that (offline, of course). But as long as this is an abstract issue,
I'm going to try to stick to my guns.

Cheers, Sam