From: Paul C. <pw...@bi...> - 2001-08-31 17:40:42
I tried sending this last night, but according to the geocrawler archive it didn't make it through, so I'm sending it from a different machine.

I have a few quick thoughts, and one long one:

- When doing range request support, note that the RFC describes multi-range support, i.e. "Range: bytes=1-10,30-50". The response to such a request must be a MIME multi-part message (a rough example of such a response appears further down).

- I think the formatting should be standardized as the last checkin before an actual release, as this will make it easier to create manageable patches that have a chance of applying to both the CVS and released trees, and the actual release would be a good point for everyone to sync up their individual patches anyway.

The long idea (which I didn't want to post until a 0.14 release was out, but I'm starting a new job soon and I haven't seen the IP agreement yet; also, I probably won't have much net access from tomorrow [now today] (Friday) around noon PST until late next week):

I've also been doing some work on pulling some functionality out into a library. I'm trying the http-serving parts, and making the library give callbacks when a request is ready to be fulfilled, like how glib makes a callback when data is ready to be read or written. The idea is that an application that wants to talk HTTP but doesn't want to be a web server can just say:

  ...
  start_a_server_on_port(port_number, request_callback);

  request_callback(http_object)
  {
      if (GET or HEAD) {
          char *file_name = make_requested_URL_into_filename(http_object->URL);

          if (file_is_ok_to_serve(file_name))
              respond_by_filename(http_object, file_name);
          else
              respond_by_error(http_object, 404 or 403 or whatever's appropriate);
      }
  }

and that's it. (Of course, it could also get callbacks to indicate when a response has been completed, or even callbacks to tell it whenever bytes have been written to the socket as part of the response.)

I've got the basic code written, so that it supports (modulo bugs):

- GET and HEAD requests
- Single range requests (including, I hope, the RFC-specified behaviour on invalid requests)
- Keep-alive connections
- Connection pipelining
- Generation of the Date and Last-Modified headers (including the RFC requirement that Last-Modified <= Date)
- It speaks HTTP/1.1, /1.0, and /0.9
- The client can respond to a request with a filename, file descriptor, or buffer, and the library will take care of the range issues and the *-Modified-Since behaviour (the *-Modified-Since part isn't implemented yet)
- Bytes-written callbacks

It should be simple to add support for:

- The If-Modified-Since and If-Unmodified-Since headers
- Multiple range requests ("Range: bytes=a-b,x-y,m-n")
- sendfile(2) when available, else mmap(2) when available, else read(2) and write(2)
- POST and most other types of requests

It should be less simple, but not difficult, to add WWW-Authenticate.

Longer term, performance can be improved by eliminating a lot of the buffer-to-buffer copying, and by letting the client supply the buffer to be read into. (This would also allow things like listening for gnutellanet control connections and http connections on the same port: once a connection is identified as http, the client can pass the buffer with the already-read data ("GET moo HTTP/blah\r\nblah...") to the library.) I think this is too complicated to try to do until everything else is working.

It's currently in a usable state (or at least, it was last month ;) ), though there's a small memory leak that I haven't even started looking for (so it may be the same one as in the cvs version).
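Going back to the multi-range point above: per RFC 2616, the response to "Range: bytes=1-10,30-50" is a 206 with a multipart/byteranges body, roughly along these lines (the boundary string and the total entity length here are made up):

  HTTP/1.1 206 Partial Content
  Content-Type: multipart/byteranges; boundary=SEPARATOR

  --SEPARATOR
  Content-Type: text/plain
  Content-Range: bytes 1-10/1000

  ...bytes 1 through 10 of the entity...
  --SEPARATOR
  Content-Type: text/plain
  Content-Range: bytes 30-50/1000

  ...bytes 30 through 50 of the entity...
  --SEPARATOR--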
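For concreteness, a complete (if minimal) client of the library might look something like the following. This is only a sketch: the http_object fields and every function name here (start_a_server_on_port() and friends, carried over from the pseudocode above) are placeholders, not the actual API in the tarball.

/* Sketch of a minimal client of the http-serving library.  All of the
 * http_object fields and library entry points below are placeholder
 * names, not the real ones. */
#include <stdlib.h>
#include <string.h>

#include "g_http.h"     /* hypothetical library header */

#define DOCROOT "/var/www"

/* Map the request URL onto a filename under DOCROOT; caller frees. */
static char *
make_requested_URL_into_filename(const char *url)
{
    char *name = malloc(strlen(DOCROOT) + strlen(url) + 1);

    strcpy(name, DOCROOT);
    strcat(name, url);
    return name;
}

/* Refuse anything that could escape the document root. */
static int
file_is_ok_to_serve(const char *name)
{
    return strstr(name, "..") == NULL;
}

/* Called by the library whenever a request is ready to be fulfilled. */
static void
request_callback(http_object *obj)
{
    if (obj->method == HTTP_GET || obj->method == HTTP_HEAD) {
        char *file_name = make_requested_URL_into_filename(obj->URL);

        if (file_is_ok_to_serve(file_name))
            respond_by_filename(obj, file_name);  /* ranges, Date, etc.
                                                     handled by the library */
        else
            respond_by_error(obj, 403);
        free(file_name);
    } else {
        respond_by_error(obj, 501);               /* not implemented */
    }
}

int
main(void)
{
    start_a_server_on_port(8080, request_callback);
    /* ...then run the glib main loop as usual... */
    return 0;
}

The point is that the client never has to know about ranges, keep-alive, or pipelining; it only maps URLs to files.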
A current tarball (no patch - there's lots of other stuff in this tree, it would probably conflict with most of the things I've been reading about on the list, and it shouldn't be merged until at least 0.14 anyway) is at <URL:http://bigw.org/~pwc/gtk-gnutella-http.tar.bz2>. (It may not be in good shape for development, because I accidentally toasted the Makefile a few minutes ago - I put back the object file dependencies, but the source and header dependencies aren't present. I should have added them to the Makefile.am, but I didn't.)

The library is further split into two main layers: the http layer, and what I've called the "proto" (for "protocol", not for "first") layer, which takes care of asynchronously writing n bytes from a buffer or file, and of reading bytes into a buffer. I plan to eventually try to do the http client part and the gnutella protocol parts on top of this lower layer. The following comments apply to both layers.

The main issue keeping me from getting further is a policy/API question, and I'd like to hear any feedback people have: it has to do with cancelling a callback from within that callback.

The issue is most obvious when a connection-fatal error is detected. The client can always cancel the process generating callbacks - for example, waiting for new requests, or waiting for a response to be completed - and these cases need to result in the connection being closed and the state being freed. Also, when an error occurs during one of these times, we want to make a callback to tell the client, and we want to close the connection and free up the state. However, if the client can issue the cancel request from within this callback, then we have to be careful to free things exactly once.

Similar issues come up when the library issues a callback for successful completion, or just a regular status update (e.g. "n bytes written"), and wants to access the object afterwards. This can be handled by ensuring that objects are not actually freed until we can be sure we're not inside a callback that depends on the object still existing when it returns.

Of course, I did something uglier: if you're going to make a callback and you need to access the object afterwards (even if you're just going to free it), you set a pointer within the object to a private piece of memory (usually a variable on the stack). The object_free() routine sets a bit in the memory pointed to and then frees the object. After making the callback, the function checks that the object was not destroyed before it accesses it. (Note: before setting the pointer, the function has to check whether the pointer was already set - in that case, if the object is destroyed, the calling function is responsible for setting the bit referenced by that pointer as well.) A rough sketch of this pattern appears a little further down.

Anyway, it seems to me that a cleaner solution would be a policy like:

  "An object may not be destroyed, and a request may not be canceled, from within a callback from that object or resulting from that request."

  "When a callback indicating either an error or a request's successful completion is made, the associated objects will be destroyed by the library after the callback returns. It is an error to attempt to destroy them within such a callback."

I think this would make things much cleaner. However, I'm not sure that this wouldn't just make the library more irritating to use. For example, glib usually allows callbacks to be canceled either by returning FALSE from the callback or by canceling the event source, even from within the callback.
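For reference, here is roughly what the pointer-to-a-stack-variable trick described above looks like in code. This is an illustrative reconstruction only; none of these names or details match the actual code in the tarball.

/* Illustrative reconstruction of the "set a pointer to a stack variable,
 * have object_free() set the bit it points to" pattern described above.
 * Not the actual code. */
#include <stdlib.h>

typedef struct connection connection;

struct connection {
    int   *destroyed_flag;    /* points into a caller's stack frame,
                                 or NULL if nobody is watching */
    void (*error_callback)(connection *);
    /* ... fd, buffers, request state ... */
};

void
connection_free(connection *c)
{
    if (c->destroyed_flag != NULL)
        *c->destroyed_flag = 1;   /* tell whoever is mid-callback */
    /* ... close the fd, free the buffers ... */
    free(c);
}

void
connection_report_error(connection *c)
{
    int destroyed = 0;
    int *outer = c->destroyed_flag;  /* may already be set by a caller
                                        further up the stack */

    c->destroyed_flag = &destroyed;
    c->error_callback(c);            /* client may call connection_free() here */

    if (destroyed) {
        /* The object is gone.  If an outer caller was watching too,
         * we're responsible for passing the news along. */
        if (outer != NULL)
            *outer = 1;
        return;
    }

    c->destroyed_flag = outer;       /* object survived; safe to keep using it */
    /* ... e.g. go on to close and free the connection ourselves ... */
}

Under the policy quoted above, none of this bookkeeping would be needed, since only the library would ever decide when the object dies.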
As near as I can tell from a quick look at the code (for the iochannels, anyway), glib uses reference counting and a "destroyed" flag in the object to support cancelling from within the callback. So they chose to have more complicated code in order to give clients more choices. I don't think it's worth it.

The next biggest necessary change is much simpler: the library needs a better way than whatever I've got for responding to a request with an error. I've hardly looked at the code in a month, since I've been busy moving, but I remember that the current way sucks.

Also, I'm in the g_http namespace, because I think this should eventually end up there. But for now, please don't distribute this without changing the names of the functions and structures. I wouldn't be sending it now, but as I said, this may be my last chance to get it out.

--
Paul Cassella