From: Paul C. <pw...@bi...> - 2001-08-31 17:40:42
I tried sending this last night, but according to the geocrawler archive it didn't make it through, so I'm sending it from a different machine.

I have a few quick thoughts, and one long one:

- When doing range request support, note that the RFC describes multi-range support, i.e. "Range: bytes=1-10,30-50". The response to such a request must be a MIME multi-part message (a rough example of such a response appears further down).

- I think the formatting should be standardized as the last checkin before an actual release, as this will make it easier to create manageable patches that have a chance of applying to both the CVS and released trees, and the actual release would be a good point for everyone to sync up their individual patches anyway.

The long idea (which I didn't want to post until a 0.14 release was out, but I'm starting a new job soon and I haven't seen the IP agreement yet; also, I probably won't have much net access from tomorrow [now today] (Friday) around noon PST until late next week):

I've also been doing some work on pulling some functionality out into a library. I'm trying the http-serving parts, and making the library give callbacks when a request is ready to be fulfilled, like how glib makes a callback when data is ready to be read or written. The idea is that an application that wants to talk HTTP but doesn't want to be a web server can just say:

  ...
  start_a_server_on_port(port_number, request_callback);

  request_callback(http_object)
  {
      if (GET or HEAD) {
          char *file_name = make_requested_URL_into_filename(http_object->URL);

          if (file_is_ok_to_serve(file_name))
              respond_by_filename(http_object, file_name);
          else
              respond_by_error(http_object, 404 or 403 or whatever's appropriate);
      }
  }

and that's it. (Of course, it could also get callbacks to indicate when a response has been completed, or even callbacks to tell it whenever bytes have been written to the socket as part of the response.)

I've got the basic code written, so that it supports (modulo bugs):

- GET and HEAD requests
- Single range requests (including, I hope, the RFC-specified behaviour on invalid requests)
- Keep-alive connections
- Connection pipelining
- Generation of the Date and Last-Modified headers (including the RFC requirement that Last-Modified <= Date)
- It speaks HTTP/1.1, /1.0, and /0.9
- The client can respond to a request with a filename, file descriptor, or buffer, and the library will take care of the range issues and the *-Modified-Since behaviour (the *-Modified-Since part isn't implemented yet)
- Bytes-written callbacks

It should be simple to add support for:

- The If-Modified-Since and If-Unmodified-Since headers
- Multiple range requests ("Range: bytes=a-b,x-y,m-n")
- sendfile(2) when available, else mmap(2) when available, else read(2) and write(2)
- POST and most other types of requests

It should be less simple, but not difficult, to add WWW-Authenticate.

Longer term, performance can be improved by eliminating a lot of the buffer-to-buffer copying, and by letting the client supply the buffer to be read into. (This would also allow things like listening for gnutellanet control connections and http connections on the same port: once a connection is identified as http, the client can pass the buffer with the already-read data ("GET moo HTTP/blah\r\nblah...") to the library.) I think this is too complicated to try to do until everything else is working.

It's currently in a usable state (or at least, it was last month ;) ), though there's a small memory leak that I haven't even started looking for (so it may be the same one as in the cvs version).
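Going back to the multi-range point above: per RFC 2616, the response to "Range: bytes=1-10,30-50" is a 206 with a multipart/byteranges body, roughly along these lines (the boundary string and the total entity length here are made up):

  HTTP/1.1 206 Partial Content
  Content-Type: multipart/byteranges; boundary=SEPARATOR

  --SEPARATOR
  Content-Type: text/plain
  Content-Range: bytes 1-10/1000

  ...bytes 1 through 10 of the entity...
  --SEPARATOR
  Content-Type: text/plain
  Content-Range: bytes 30-50/1000

  ...bytes 30 through 50 of the entity...
  --SEPARATOR--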
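For concreteness, a complete (if minimal) client of the library might look something like the following. This is only a sketch: the http_object fields and every function name here (start_a_server_on_port() and friends, carried over from the pseudocode above) are placeholders, not the actual API in the tarball.

/* Sketch of a minimal client of the http-serving library.  All of the
 * http_object fields and library entry points below are placeholder
 * names, not the real ones. */
#include <stdlib.h>
#include <string.h>

#include "g_http.h"     /* hypothetical library header */

#define DOCROOT "/var/www"

/* Map the request URL onto a filename under DOCROOT; caller frees. */
static char *
make_requested_URL_into_filename(const char *url)
{
    char *name = malloc(strlen(DOCROOT) + strlen(url) + 1);

    strcpy(name, DOCROOT);
    strcat(name, url);
    return name;
}

/* Refuse anything that could escape the document root. */
static int
file_is_ok_to_serve(const char *name)
{
    return strstr(name, "..") == NULL;
}

/* Called by the library whenever a request is ready to be fulfilled. */
static void
request_callback(http_object *obj)
{
    if (obj->method == HTTP_GET || obj->method == HTTP_HEAD) {
        char *file_name = make_requested_URL_into_filename(obj->URL);

        if (file_is_ok_to_serve(file_name))
            respond_by_filename(obj, file_name);  /* ranges, Date, etc.
                                                     handled by the library */
        else
            respond_by_error(obj, 403);
        free(file_name);
    } else {
        respond_by_error(obj, 501);               /* not implemented */
    }
}

int
main(void)
{
    start_a_server_on_port(8080, request_callback);
    /* ...then run the glib main loop as usual... */
    return 0;
}

The point is that the client never has to know about ranges, keep-alive, or pipelining; it only maps URLs to files.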
A current tarball (no patch - there's lots of other stuff in this tree, it would probably conflict with most of the things I've been reading about on the list, and it shouldn't be merged until at least 0.14 anyway) is at <URL:http://bigw.org/~pwc/gtk-gnutella-http.tar.bz2>. (It may not be in good shape for development, because I accidentally toasted the Makefile a few minutes ago - I put back the object file dependencies, but the source and header dependencies aren't present. I should have added them to the Makefile.am, but I didn't.)

The library is further split into two main layers: the http layer, and what I've called the "proto" (for "protocol", not for "first") layer, which takes care of asynchronously writing n bytes from a buffer or file, and of reading bytes into a buffer. I plan to eventually try to do the http client part and the gnutella protocol parts on top of this lower layer. The following comments apply to both layers.

The main issue keeping me from getting further is a policy/API question, and I'd like to hear any feedback people have: it has to do with cancelling a callback from within that callback.

The issue is most obvious when a connection-fatal error is detected. The client can always cancel the process generating callbacks - for example, waiting for new requests, or waiting for a response to be completed - and these cases need to result in the connection being closed and the state being freed. Also, when an error occurs during one of these times, we want to make a callback to tell the client, and we want to close the connection and free up the state. However, if the client can issue the cancel request from within this callback, then we have to be careful to free things exactly once.

Similar issues come up when the library issues a callback for successful completion, or just a regular status update (e.g. "n bytes written"), and wants to access the object afterwards. This can be handled by ensuring that objects are not actually freed until we can be sure we're not inside a callback that depends on the object still existing when it returns.

Of course, I did something uglier: if you're going to make a callback and you need to access the object afterwards (even if you're just going to free it), you set a pointer within the object to a private piece of memory (usually a variable on the stack). The object_free() routine sets a bit in the memory pointed to and then frees the object. After making the callback, the function checks that the object was not destroyed before it accesses it. (Note: before setting the pointer, the function has to check whether the pointer was already set - in that case, if the object is destroyed, the calling function is responsible for setting the bit referenced by that pointer as well.) A rough sketch of this pattern appears a little further down.

Anyway, it seems to me that a cleaner solution would be a policy like:

  "An object may not be destroyed, and a request may not be canceled, from within a callback from that object or resulting from that request."

  "When a callback indicating either an error or a request's successful completion is made, the associated objects will be destroyed by the library after the callback returns. It is an error to attempt to destroy them within such a callback."

I think this would make things much cleaner. However, I'm not sure that this wouldn't just make the library more irritating to use. For example, glib usually allows callbacks to be canceled either by returning FALSE from the callback or by canceling the event source, even from within the callback.
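For reference, here is roughly what the pointer-to-a-stack-variable trick described above looks like in code. This is an illustrative reconstruction only; none of these names or details match the actual code in the tarball.

/* Illustrative reconstruction of the "set a pointer to a stack variable,
 * have object_free() set the bit it points to" pattern described above.
 * Not the actual code. */
#include <stdlib.h>

typedef struct connection connection;

struct connection {
    int   *destroyed_flag;    /* points into a caller's stack frame,
                                 or NULL if nobody is watching */
    void (*error_callback)(connection *);
    /* ... fd, buffers, request state ... */
};

void
connection_free(connection *c)
{
    if (c->destroyed_flag != NULL)
        *c->destroyed_flag = 1;   /* tell whoever is mid-callback */
    /* ... close the fd, free the buffers ... */
    free(c);
}

void
connection_report_error(connection *c)
{
    int destroyed = 0;
    int *outer = c->destroyed_flag;  /* may already be set by a caller
                                        further up the stack */

    c->destroyed_flag = &destroyed;
    c->error_callback(c);            /* client may call connection_free() here */

    if (destroyed) {
        /* The object is gone.  If an outer caller was watching too,
         * we're responsible for passing the news along. */
        if (outer != NULL)
            *outer = 1;
        return;
    }

    c->destroyed_flag = outer;       /* object survived; safe to keep using it */
    /* ... e.g. go on to close and free the connection ourselves ... */
}

Under the policy quoted above, none of this bookkeeping would be needed, since only the library would ever decide when the object dies.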
As near as I can tell from a quick look at the code (for the iochannels, anyway), glib uses reference counting and a "destroyed" flag in the object to support cancelling from within the callback. So they chose to have more complicated code in order to give clients more choices. I don't think it's worth it.

The next biggest necessary change is much simpler: the library needs a better way than whatever I've got for responding to a request with an error. I've hardly looked at the code in a month, since I've been busy moving, but I remember that the current way sucks.

Also, I'm in the g_http namespace, because I think this should eventually end up there. But for now, please don't distribute this without changing the names of the functions and structures. I wouldn't be sending it now, but as I said, this may be my last chance to get it out.

--
Paul Cassella