From: Rick R. <ric...@gm...> - 2011-03-21 18:59:50
On Mon, Mar 21, 2011 at 10:08 AM, Dean Michael Berris <mik...@gm...> wrote:

> Hi Rick!
>
> ...
>
> Okay, but how do you deal with the parsing? Do you reset the parser
> state per connection?
>

Yes, that is my current approach. Oddly enough, I have it partially
working. I have a test which pipelines two requests. Currently it gets the
first request and responds correctly. The second request is read and
processed by handle_read_data but is not correctly parsed, and I'm not sure
why yet. I just need to spend a bit more time with it.

> Asynchronous server handlers can call connection.write(...) multiple
> times per request. The calls to connection.read(...) in the case of a
> handler that wants to read up data from the body of a POST/PUT request
> will also have to deal with the boundary issues -- granted that the
> HTTP/1.1 spec doesn't allow pipelining for POST/PUT requests, we'd
> want to be able to let the handler recover or at least deal with
> invalid data appropriately.
>

Yeah, this is a tricky point, because we're basically relying on the top
layer of the stack to complete parsing for us. It is good that the spec
disallows pipelining POSTs and PUTs, but either way I would like to process
the entire body in the Connection object, because there is a *ton* of
complicated parsing logic I would like to put in for POSTs: there is a new
type of socket-exhaustion attack which, while correct according to the HTTP
spec, slowly dribbles data into a multipart POST. I would like to add
countermeasures against that as well as other DoS attacks (see
http://www.ecl-labs.org/2011/03/17/roboo-http-mitigator.html).

> Have you handled this? (Granted, I haven't looked at your patch just yet.)
>

I don't handle this well enough yet, apparently.

> > If there are no pending additional requests, then the async_read_some
> > will block indefinitely.
> > The second phase of the plan, then, is to register a deadline timer at
> > the start of the new connection. Every new request that gets read will
> > kick the deadline timer back up to its max timeout time. When the timer
> > does go off, it cancels any operations that are pending on the
> > Connection's socket. This does two things: first and foremost, it keeps
> > a hold of the Connection's shared_ptr, so that it is guaranteed not to
> > be destroyed while the keepalive is in effect. Secondly, it quite
> > obviously will free the async_read_some from its wait.
>
> That sounds alright, although we might have to guard for a race
> somehow there when the handler is actually reading from the same
> connection (in the case of a POST/PUT handler). Basically we have to
> know whether the timeout happened while the handler is waiting for a
> POST/PUT body or whether the timeout happened while at the boundaries
> of the GET requests.
>

If we process the multipart POST body in the Connection object, we can
reset the keepalive timer for every packet of data we receive; if the
client waits more than TIMEOUT seconds between packets, I would like to
kick them off anyway. (A rough sketch of this timer pattern is further
down in this message.)

> > I haven't bothered to fork cpp-netlib on github so I don't have a
> > checkin to make, but I did create a 'git diff' from boost/. I have
> > attached it here for your perusal.
> > Please let me know your preferred approach for reviewing and accepting
> > patches (I am assuming a pull request, which I can get around to).
>
> You assume right that I would much rather prefer looking at diffs over
> at Github, so a pull request is going to be most appreciated.
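Going back to the deadline-timer plan quoted above, here is a rough
Boost.Asio sketch of that kind of keepalive timeout. It is only an
illustration under names of my own choosing (connection, keepalive_timer_,
kick_keepalive, a hard-coded 30-second timeout), not code from the attached
patch or from cpp-netlib's AsyncConnection:

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/enable_shared_from_this.hpp>

struct connection : boost::enable_shared_from_this<connection> {
    explicit connection(boost::asio::io_service & io)
        : socket_(io), keepalive_timer_(io) {}

    void start() {
        kick_keepalive();   // arm the keepalive timer as soon as the connection starts
        start_read();       // issue the usual async_read_some for the next request
    }

    // Call this after every parsed request (or, if the body is parsed in the
    // connection, after every packet of body data): it pushes the deadline
    // back out to the full timeout.
    void kick_keepalive() {
        keepalive_timer_.expires_from_now(boost::posix_time::seconds(30));
        // Binding shared_from_this() keeps the connection alive while the
        // timer is pending.
        keepalive_timer_.async_wait(
            boost::bind(&connection::handle_keepalive_timeout,
                        shared_from_this(),
                        boost::asio::placeholders::error));
    }

    void handle_keepalive_timeout(boost::system::error_code const & ec) {
        if (ec == boost::asio::error::operation_aborted)
            return;   // the timer was kicked or cancelled, not expired
        // The client went quiet for too long: cancel the pending
        // async_read_some and close the socket.
        boost::system::error_code ignored;
        socket_.cancel(ignored);
        socket_.close(ignored);
    }

    void start_read() { /* socket_.async_read_some(...) as usual */ }

    boost::asio::ip::tcp::socket socket_;
    boost::asio::deadline_timer keepalive_timer_;
};

The object has to be owned by a boost::shared_ptr for shared_from_this() to
work; because the bound handler holds that shared_ptr, the connection can't
be destroyed while the keepalive wait is pending, and each call to
expires_from_now() cancels the previous async_wait with operation_aborted,
which is why the timeout handler ignores that error code.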
I will do that. That patch is already outdated anyway; I made a bunch of
changes.

> > 3. In attempting to benchmark my changes, I did note that yes, keepalive
> > works, and also that it breaks httperf. My problem is that httperf
> > doesn't seem to recognize a completed response unless the server closes
> > the connection. It claims to be a 1.1-compliant tester, and when it
> > sends requests it doesn't mention a Connection: close in its headers,
> > so I'm assuming it should expect the connection to remain open.
>
> I think you need to flush the data from the server end. I forget if
> TCP_NODELAY is turned on by default and whether httperf (I really
> don't know what that is, you have to pardon me, I use Apache's ab for
> performance testing) actually has a short timeout setting for data
> transfer or whether the buffers are small enough to avoid buffer
> starvation. How much data are you sending from the server side during
> the benchmarks?
>

Flush is being called. I did find one of the problems: most clients expect
the header to arrive in a single packet. The problem in AsyncConnection was
that it sent the first line as its own packet and then followed up with the
rest of the header. I altered the code (in a later patch than the one I
sent) to add the first line straight into the header buffer, and that fixed
things.

> > Does anyone know the correct procedure here, or why it might be thinking
> > that the responses have not completed? My concern here is that there is
> > some nuance of the HTTP 1.1 spec that I may have missed which allows
> > servers to delineate between responses since, with pipelining, multiple
> > responses may come all at once over the same connection.
>
> As far as the spec goes, the server may send a "Connection: close" at
> any time to signal to the client that it will initiate a close to
> complete the transfer. Also if you're using Transfer-Encoding:
> Chunked, I think there's a special sequence to indicate the end of the
> response -- I need to cite that part, although an "empty" chunk
> usually signals the end of the message.
>
> If you just blindly keep the connection alive though, then the answer
> is the last request from the client side *should* have a "Connection:
> close" to let the server know that this is the last request in the
> pipeline. If it doesn't, the client may have to wait indefinitely for
> the server to close the connection while the server is stuck in a
> state where it would assume that there might be more requests that may
> come into the connection.
>

Ah, that last bit is a really useful piece of information. If all clients
conform to that, it will make this job much easier.

> Thank you too and I definitely hope to hear from you soon!
>

Thanks. I would like to experiment with adding request-body processing;
even if it doesn't end up in AsyncConnection, there is a lot of work to be
done here. I am not looking forward to chunk processing yet, but I will get
to it this weekend. (My day job and two kids keep me rather occupied during
the week.)
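For reference, the "special sequence" Dean mentions for chunked responses is
spelled out in RFC 2616, section 3.6.1: each chunk is sent as its size in
hex, CRLF, the data, CRLF, and a zero-length chunk ("0" CRLF CRLF,
optionally with trailers) marks the end of the message, which is how a
client knows a keepalive response is complete without the server closing
the connection. A tiny, purely illustrative sketch of that framing
(make_chunk and last_chunk are my own names, not cpp-netlib code):

#include <sstream>
#include <string>

// Frame one chunk of a "Transfer-Encoding: chunked" body:
// "<size in hex>\r\n<data>\r\n".
std::string make_chunk(std::string const & data) {
    std::ostringstream chunk;
    chunk << std::hex << data.size() << "\r\n" << data << "\r\n";
    return chunk.str();
}

// The zero-length chunk (with no trailers) that terminates the message.
std::string last_chunk() {
    return "0\r\n\r\n";
}

// Example body on the wire:
//   make_chunk("Hello, ") + make_chunk("world!") + last_chunk()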