From: Dean M. B. <mik...@gm...> - 2011-03-21 14:10:01
Hi Rick!

Thanks for these, let me respond inline below.

On Mon, Mar 21, 2011 at 12:11 AM, Rick Richardson <ric...@gm...> wrote:
> Three things:
>
> 1. Hi, I'm Rick Richardson. I just got a hold of cpp-netlib because I am
> looking to make a highly scalable and runtime-reconfigurable reverse proxy.
> I am fairly new to boost::asio but have been doing event-driven network
> coding for 10 years. I still don't fully understand the netlib source, so if
> my approach is way off, please forgive me.

Cool, it's great to hear from you Rick!

> 2. I added keepalive capabilities to the async server. I altered the
> existing async server because I don't see how any self-respecting server
> wouldn't offer keepalive :)

Cool. :)

> My approach: I wrapped the chain of async functions back around to call
> read_more. Specifically, in server/async_connection.hpp, in write(Range) I
> register read_more_wrapper() as the callback, which does the work of
> default_error() and then calls read_more. This is to handle requests that
> may be pipelined over the same connection. Since this callback is carried
> through to be executed after all writes have completed, it should be safe to
> handle pipelined requests in a serial fashion.

Okay, but how do you deal with the parsing? Do you reset the parser state
per connection? Asynchronous server handlers can call connection.write(...)
multiple times per request. The calls to connection.read(...), in the case of
a handler that wants to read data from the body of a POST/PUT request, will
also have to deal with the boundary issues -- granted that the HTTP/1.1 spec
doesn't allow pipelining for POST/PUT requests, we'd want to be able to let
the handler recover or at least deal with invalid data appropriately. Have
you handled this? (Granted, I haven't looked at your patch just yet.)

> If there are no pending additional requests, then the async_read_some will
> block indefinitely.
>
> The second phase of the plan, then, is to register a deadline timer at the
> start of the new connection. Every new request that gets read will kick the
> deadline timer back up to its max timeout time. When the timer does go off,
> it cancels any operations that are pending on the Connection's socket.
>
> This does two things: first and foremost, it keeps a hold of the
> Connection's shared_ptr, so that it is guaranteed not to be destroyed while
> the keepalive is in effect. Secondly, it quite obviously will free the
> async_read_some from its wait.

That sounds alright, although we might have to guard against a race there
when the handler is actually reading from the same connection (in the case
of a POST/PUT handler). Basically we have to know whether the timeout
happened while the handler was waiting for a POST/PUT body, or whether it
happened at the boundary between GET requests.
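A minimal sketch of that timer pattern, using Boost.Asio directly -- the
class and member names here are illustrative, not cpp-netlib's actual
internals:

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/enable_shared_from_this.hpp>

struct keepalive_connection
    : boost::enable_shared_from_this<keepalive_connection> {

  explicit keepalive_connection(boost::asio::io_service & io_service)
      : socket_(io_service), timer_(io_service) {}

  // Call this every time a complete request has been read, to push the
  // keepalive deadline forward.
  void reset_keepalive_timer() {
    timer_.expires_from_now(boost::posix_time::seconds(30));
    // Binding shared_from_this() keeps the connection object alive for as
    // long as the timer is pending -- the "holds the shared_ptr" property.
    timer_.async_wait(
        boost::bind(&keepalive_connection::handle_timeout,
                    shared_from_this(),
                    boost::asio::placeholders::error));
  }

  void handle_timeout(boost::system::error_code const & ec) {
    // expires_from_now() above cancels the previous wait, which then
    // completes with operation_aborted; only a real expiry should cancel
    // the pending async_read_some and close the socket.
    if (ec != boost::asio::error::operation_aborted) {
      boost::system::error_code ignored;
      socket_.cancel(ignored);
      socket_.close(ignored);
    }
  }

  boost::asio::ip::tcp::socket socket_;
  boost::asio::deadline_timer timer_;
};

In cpp-netlib's async_connection the socket already lives on the connection
object, so only the timer and the two member functions would need to be
folded in; the important detail is treating operation_aborted as "the timer
was reset", not as a timeout.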
> I haven't bothered to fork cpp-netlib on github so I don't have a checkin to
> make, but I did create a 'git diff' from boost/. I have attached it here for
> your perusal.
>
> Please let me know your preferred approach for reviewing and accepting
> patches (I am assuming a pull request, which I can get around to).

You assume right that I would much rather prefer looking at diffs over at
Github, so a pull request is going to be most appreciated.

> 3. In attempting to benchmark my changes, I did note that yes, keepalive
> works, and also that it breaks httperf. My problem is that httperf doesn't
> seem to recognize a completed response unless the server closes the
> connection. It claims to be an HTTP/1.1-compliant tester, and when it sends
> requests it doesn't mention a "Connection: close" in its headers, so I'm
> assuming it should expect the connection to remain open.

I think you need to flush the data from the server end. I forget if
TCP_NODELAY is turned on by default, and whether httperf (I really don't know
what that is, you have to pardon me -- I use Apache's ab for performance
testing) actually has a short timeout setting for data transfer, or whether
the buffers are small enough to avoid buffer starvation. How much data are
you sending from the server side during the benchmarks?

> Does anyone know the correct procedure here, or why it might be thinking
> that the responses have not completed? My concern here is that there is
> some nuance of the HTTP 1.1 spec that I may have missed which allows servers
> to delineate between responses since, with pipelining, multiple responses
> may come all at once over the same connection.

As far as the spec goes, the server may send a "Connection: close" at any
time to signal to the client that it will initiate a close to complete the
transfer. Also, if you're using Transfer-Encoding: chunked, I think there's a
special sequence to indicate the end of the response -- I need to cite that
part, although an "empty" chunk usually signals the end of the message.

If you just blindly keep the connection alive though, then the answer is that
the last request from the client side *should* have a "Connection: close" to
let the server know that this is the last request in the pipeline. If it
doesn't, the client may have to wait indefinitely for the server to close the
connection, while the server is stuck in a state where it assumes that more
requests may still come in on the connection.

It's a little complicated, and this is the reason why I've been keeping this
TODO on the back-burner for a while now. You are most welcome, though, to
send up a pull request that I can review and merge into the repo. :)

> Thanks,
> Rick Richardson

Thank you too and I definitely hope to hear from you soon!

--
Dean Michael Berris
http://about.me/deanberris
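The framing rules in question are well defined in the HTTP/1.1 spec: a
response is complete once the client has read Content-Length bytes, or has
seen the zero-length chunk that terminates a chunked body, and the connection
stays open unless one side sends "Connection: close". Below is a minimal
sketch of the keep-alive decision, using a hypothetical pair of pre-extracted
header values rather than any real cpp-netlib request API:

#include <string>

// End of a chunked body: a zero-length chunk followed by a blank line
// (no trailers), i.e. "0\r\n\r\n".
char const chunked_terminator[] = "0\r\n\r\n";

// Decide whether to keep the connection open after writing a response.
// Both arguments are assumed to have been pulled out of the request by the
// caller; a real implementation would also compare the Connection tokens
// case-insensitively and handle comma-separated values.
bool keep_alive(std::string const & http_version,         // e.g. "1.1"
                std::string const & connection_header) {  // e.g. "close", ""
  if (http_version == "1.1")
    return connection_header != "close";    // persistent by default
  return connection_header == "keep-alive"; // HTTP/1.0 closes by default
}

If httperf only recognizes completion when the server closes the connection,
the first thing worth checking is whether the benchmarked responses actually
carry a Content-Length header or chunked framing at all.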