From: Rick R. <ric...@gm...> - 2011-03-20 16:11:43
Attachments:
rrichardson3-20.diff
Three things:

1. Hi, I'm Rick Richardson. I just got a hold of cpp-netlib because I am looking to make a highly scalable and runtime-reconfigurable reverse proxy. I am fairly new to boost::asio but have been doing event-driven network coding for 10 years. I still don't fully understand the netlib source, so if my approach is way off, please forgive me.

2. I added keepalive capabilities to the async server. I altered the existing async server because I don't see how any self-respecting server wouldn't offer keepalive :)

My approach: I wrapped the chain of async functions back around to call read_more. Specifically, in server/async_connection.hpp, in write(Range) I register a callback, read_more_wrapper(), which does the work of default_error() and then calls read_more. This is to handle requests that may be pipelined over the same connection. Since this callback is carried through to be executed after all writes have completed, it should be safe to handle pipelined requests in a serial fashion.

If there are no pending additional requests, then the async_read_some will block indefinitely. The second phase of the plan, then, is to register a deadline timer at the start of the new connection. Every new request that gets read kicks the deadline timer back up to its maximum timeout. When the timer does go off, it cancels any operations that are pending on the Connection's socket. This does two things: first and foremost, it keeps a hold of the Connection's shared_ptr, so that the Connection is guaranteed not to be destroyed while the keepalive is in effect; secondly, it quite obviously frees the async_read_some from its wait.

I haven't bothered to fork cpp-netlib on github, so I don't have a checkin to make, but I did create a 'git diff' from boost/. I have attached it here for your perusal. Please let me know your preferred approach for reviewing and accepting patches (I am assuming a pull request, which I can get around to).

3. In attempting to benchmark my changes, I did note that yes, keepalive works, and also that it breaks httperf. My problem is that httperf doesn't seem to recognize a completed response unless the server closes the connection. It claims to be an HTTP/1.1-compliant tester, and when it sends requests it doesn't include a Connection: close header, so I'm assuming it should expect the connection to remain open.

Does anyone know the correct procedure here, or why it might be thinking that the responses have not completed? My concern is that there is some nuance of the HTTP/1.1 spec that I may have missed which governs how servers delimit the boundaries between responses, since, with pipelining, multiple responses may come all at once over the same connection.

Thanks,
Rick Richardson
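To make the timer plumbing concrete, here is a minimal, self-contained sketch of the pattern Rick describes. The names (connection, keepalive_timer_, read_more, and so on) are illustrative and this is not the actual cpp-netlib patch: the timer is armed when the connection starts, every successful read pushes the deadline forward, and on expiry the timer handler cancels whatever read is pending on the socket, while the bound shared_ptr keeps the connection alive in the meantime.

#include <boost/array.hpp>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <boost/enable_shared_from_this.hpp>

class connection : public boost::enable_shared_from_this<connection> {
 public:
  connection(boost::asio::io_service &io, long timeout_seconds)
      : socket_(io), keepalive_timer_(io), timeout_seconds_(timeout_seconds) {}

  boost::asio::ip::tcp::socket &socket() { return socket_; }

  void start() {
    reset_keepalive_timer();  // arm the keepalive timer as soon as the connection starts
    read_more();
  }

 private:
  void read_more() {
    socket_.async_read_some(
        boost::asio::buffer(buffer_),
        boost::bind(&connection::handle_read, shared_from_this(),
                    boost::asio::placeholders::error,
                    boost::asio::placeholders::bytes_transferred));
  }

  void handle_read(boost::system::error_code const &error, std::size_t bytes_transferred) {
    if (error) return;        // cancelled by the timer, or the peer went away
    reset_keepalive_timer();  // every request read kicks the deadline forward
    // ... parse bytes_transferred bytes, dispatch the handler, write the
    // response, then loop back around for possible pipelined requests ...
    read_more();
  }

  void reset_keepalive_timer() {
    keepalive_timer_.expires_from_now(boost::posix_time::seconds(timeout_seconds_));
    // Binding shared_from_this() keeps this connection alive while it sits idle.
    keepalive_timer_.async_wait(
        boost::bind(&connection::handle_timer, shared_from_this(),
                    boost::asio::placeholders::error));
  }

  void handle_timer(boost::system::error_code const &error) {
    if (error != boost::asio::error::operation_aborted)
      socket_.cancel();  // timeout: frees the blocked async_read_some
  }

  boost::asio::ip::tcp::socket socket_;
  boost::asio::deadline_timer keepalive_timer_;
  long timeout_seconds_;
  boost::array<char, 4096> buffer_;
};

The important detail is that both async_wait and async_read_some are bound to shared_from_this(), so the connection object cannot be destroyed while a keepalive period is still in flight.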
From: Rick R. <ric...@gm...> - 2011-03-20 18:57:08
Oops, I left out an if (error != boost::asio::error::operation_aborted) ... check from the handle_timer function. I was playing with some things and forgot to put it back in; it leads to erratic behavior when this check is not present :)
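For anyone following along, the reason this check matters (shown against the hypothetical connection sketch above, not the actual patch): resetting a deadline_timer with expires_from_now() cancels its outstanding async_wait, and the cancelled wait still invokes the handler, just with the error set to operation_aborted. Without the guard, every timer reset would therefore also cancel the socket's pending read.

// Hypothetical handle_timer for the connection sketch above, with the guard
// Rick refers to: only cancel the socket when the timer genuinely expired.
void handle_timer(boost::system::error_code const &error) {
    if (error != boost::asio::error::operation_aborted) {
        socket_.cancel();  // real timeout: free the blocked async_read_some
    }
    // operation_aborted just means the timer was reset; leave the socket alone.
}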
From: Dean M. B. <mik...@gm...> - 2011-03-21 14:10:01
Hi Rick! Thanks for these, let me respond in-lined below. On Mon, Mar 21, 2011 at 12:11 AM, Rick Richardson <ric...@gm...> wrote: > Three things: > 1. Hi, I'm Rick Richardson. I just got a hold of cpp-netlib because I am > looking to make a highly scalable and runtime reconfigurable reverse proxy. > I am fairly new to boost::asio but have been doing event driven network > coding for 10 years. I still don't fully understand the netlib source so if > my approach is way off, please forgive me. > Cool, it's great to hear from you Rick! > 2. I added keepalive capabilities to the async server. I altered the > existing async server because I don't see how any self respecting server > wouldn't offer keepalive :) Cool. :) > My approach: I wrapped the chain of async functions back around to call > read_more. Specifically, in server/async_connection.hpp, in write(Range) I > register a callback to be read_more_wrapper(), which does the work of > default_error() and then calls read_more. This is to handle requests that > may be pipelined over the same connection. Since this callback is carried > through to be executed after all writes have completed. It should be safe to > handle pipelined requests in a serial fashion. > Okay, but how do you deal with the parsing? Do you reset the parser state per connection? Asynchronous server handlers can call connection.write(...) multiple times per request. The calls to connection.read(...) in the case of a handler that wants to read up data from the body of a POST/PUT request will also have to deal with the boundary issues -- granted that the HTTP/1.1 spec doesn't allow pipelining for POST/PUT requests, we'd want to be able to let the handler recover or at least deal with invalid data appropriately. Have you handled this? (Granted, I haven't looked at your patch just yet). > If there are no pending additional requests, then the async_read_some will > block indefinitely. > The second phase of the plan, then, is to register a deadline timer at start > of the new connection. Every new request that gets read will kick the > deadline timer back up to its max timeout time. When the timer does go off, > it cancels any operations that are pending on the Connection's socket. > This does two things, first and foremost, it keeps a hold of the > Connection's shared ptr, so that it is guaranteed not to be destroyed while > the keepalive is in effect. Secondly, it quite obviously will free the > async_read_some from its wait. That sounds alright, although we might have to guard for a race somehow there when the handler is actually reading from the same connection (in the case of a POST/PUT handler). Basically we have to know whether the timeout happened while the handler is waiting for a POST/PUT body or whether the timeout happened while at the boundaries of the GET requests. > I haven't bothered to fork cpp-netlib on github so I don't have a checkin to > make, but I did create a 'git diff' from boost/ I have attached it here > for your perusal. > Please let me know your preferred approach for reviewing and accepting > patches (I am assuming a pull request, which I can get around to) > You assume right that I would much rather prefer looking at diffs over at Github, so a pull request is going to be most appreciated. > 3. In attempting to benchmark my changes. I did note that yes, keepalive > works, and also that it breaks httperf. My problem is that httperf doesn't > seem to recognize a completed response unless the server closes the > connection. 
It claims to be a 1.1 compliant tester, and when it sends > requests it doesn't mention a connection: close in its headers, so I'm > assuming it should expect the connection to remain open. > I think you need to flush the data from the server end. I forget if TCP_NODELAY is turned on by default and whether httperf (I really don't know what that is, you have to pardon me I use Apache's ab for performance testing) actually has a short timeout setting for data transfer or whether the buffers are small enough to avoid buffer starvation. How much data are you sending from the server side during the benchmarks? > Does anyone know the correct procedure here, or why it might be thinking > that the responses have not completed? My concern here is that there is > some nuance of the HTTP 1.1 spec that I may have missed which allows servers > to delineate between responses since, with pipelining, multiple responses > may come all at once over the same connection. > As far as the spec goes, the server may send a "Connection: close" at any time to signal to the client that it will initiate a close to complete the transfer. Also if you're using Transfer-Encoding: Chunked, I think there's a special sequence to indicate the end of the response -- I need to cite that part, although an "empty" chunk usually signals the end of the message. If you just blindly keep the connection alive though, then the answer is the last request from the client side *should* have a "Connection: close" to let the server know that this is the last request in the pipeline. If it doesn't, the client may have to wait indefinitely for the server to close the connection while the server is stuck in a state where it would assume that there might be more requests that may come into the connection. It's a little complicated and this is the reason why I've been keeping this TODO in the back-burner for a while now. Although you are most welcome to send up a pull request that I can review and merge into the repo. :) > Thanks, > Rick Richardson Thank you too and I definitely hope to hear from you soon! -- Dean Michael Berris http://about.me/deanberris |
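For reference, the chunked terminator Dean mentions looks like this on the wire. The two helpers below are a minimal sketch under the assumption that the response already advertises Transfer-Encoding: chunked; they are not cpp-netlib API. Each chunk is prefixed by its size in hex, and a zero-length chunk followed by a blank line ("0\r\n\r\n") ends the message, which is how a client can detect completion without the server closing the connection.

#include <boost/asio.hpp>
#include <sstream>
#include <string>

// Write one chunk of a Transfer-Encoding: chunked response:
// the chunk size in hex, CRLF, the data, CRLF.
void write_chunk(boost::asio::ip::tcp::socket &socket, std::string const &data) {
    std::ostringstream chunk;
    chunk << std::hex << data.size() << "\r\n" << data << "\r\n";
    boost::asio::write(socket, boost::asio::buffer(chunk.str()));
}

// Terminate the response: a zero-size chunk plus an empty trailer.
void finish_chunked_response(boost::asio::ip::tcp::socket &socket) {
    static const char last_chunk[] = "0\r\n\r\n";
    boost::asio::write(socket, boost::asio::buffer(last_chunk, sizeof(last_chunk) - 1));
}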
From: Rick R. <ric...@gm...> - 2011-03-21 18:59:50
On Mon, Mar 21, 2011 at 10:08 AM, Dean Michael Berris < mik...@gm...> wrote: > Hi Rick! > > ... > > Okay, but how do you deal with the parsing? Do you reset the parser > state per connection? > > Yes. That is my current approach. Oddly enough, I have it partially working. I have a test which pipelines two requests. Currently it gets the first request and responds correctly. The 2nd request is read and processed by handle_read_data but is not correctly parsed, and I'm not sure why yet. I just need to spend a bit more time with it. > Asynchronous server handlers can call connection.write(...) multiple > times per request. The calls to connection.read(...) in the case of a > handler that wants to read up data from the body of a POST/PUT request > will also have to deal with the boundary issues -- granted that the > HTTP/1.1 spec doesn't allow pipelining for POST/PUT requests, we'd > want to be able to let the handler recover or at least deal with > invalid data appropriately. > Yea, this is a tricky point because we're basically relying on the top layer of the stack to complete parsing for us. It is good that the spec disallows pipelining posts and puts, but either way I would like to actually process the entire body in the Connection object because there is *tons* of complicating parsing logic I would like to put in for POSTS: There is a new type of socket exhaustion attack out which correctly (according to the HTTP spec) slowly dribbles in data in a multi-part post. I would like to put counter-measures to defend against such things as well as other DOS attacks (see http://www.ecl-labs.org/2011/03/17/roboo-http-mitigator.html ) > > Have you handled this? (Granted, I haven't looked at your patch just yet). > I don't handle this well enough yet, apparently. > > > If there are no pending additional requests, then the async_read_some > will > > block indefinitely. > > The second phase of the plan, then, is to register a deadline timer at > start > > of the new connection. Every new request that gets read will kick the > > deadline timer back up to its max timeout time. When the timer does go > off, > > it cancels any operations that are pending on the Connection's socket. > > This does two things, first and foremost, it keeps a hold of the > > Connection's shared ptr, so that it is guaranteed not to be destroyed > while > > the keepalive is in effect. Secondly, it quite obviously will free the > > async_read_some from its wait. > > That sounds alright, although we might have to guard for a race > somehow there when the handler is actually reading from the same > connection (in the case of a POST/PUT handler). Basically we have to > know whether the timeout happened while the handler is waiting for a > POST/PUT body or whether the timeout happened while at the boundaries > of the GET requests. > If we process the multipart POST body in the Connection object, we can re-set the keepalive timer for every packet of data we receive, (if the client waits more than TIMEOUT seconds between each packet, I would like to kick them off anyways) > > I haven't bothered to fork cpp-netlib on github so I don't have a checkin > to > > make, but I did create a 'git diff' from boost/ I have attached it > here > > for your perusal. > > Please let me know your preferred approach for reviewing and accepting > > patches (I am assuming a pull request, which I can get around to) > > > > You assume right that I would much rather prefer looking at diffs over > at Github, so a pull request is going to be most appreciated. 
> > I will do that.. That patch is already outdated anyways. I made a bunch of changes. > > 3. In attempting to benchmark my changes. I did note that yes, > keepalive > > works, and also that it breaks httperf. My problem is that httperf > doesn't > > seem to recognize a completed response unless the server closes the > > connection. It claims to be a 1.1 compliant tester, and when it sends > > requests it doesn't mention a connection: close in its headers, so I'm > > assuming it should expect the connection to remain open. > > > > I think you need to flush the data from the server end. I forget if > TCP_NODELAY is turned on by default and whether httperf (I really > don't know what that is, you have to pardon me I use Apache's ab for > performance testing) actually has a short timeout setting for data > transfer or whether the buffers are small enough to avoid buffer > starvation. How much data are you sending from the server side during > the benchmarks? > > Flush is being called. I did find one of the problems. Most clients expect that the header arrive in a single packet. The problem in AsyncConnection is that it was sending the first line as a single packet, then following up with the rest of the header. I altered the code (in a later patch than what I sent) to add the first line straight into the header buffer. It fixed things. > > Does anyone know the correct procedure here, or why it might be thinking > > that the responses have not completed? My concern here is that there is > > some nuance of the HTTP 1.1 spec that I may have missed which allows > servers > > to delineate between responses since, with pipelining, multiple responses > > may come all at once over the same connection. > > > > As far as the spec goes, the server may send a "Connection: close" at > any time to signal to the client that it will initiate a close to > complete the transfer. Also if you're using Transfer-Encoding: > Chunked, I think there's a special sequence to indicate the end of the > response -- I need to cite that part, although an "empty" chunk > usually signals the end of the message. > > If you just blindly keep the connection alive though, then the answer > is the last request from the client side *should* have a "Connection: > close" to let the server know that this is the last request in the > pipeline. If it doesn't, the client may have to wait indefinitely for > the server to close the connection while the server is stuck in a > state where it would assume that there might be more requests that may > come into the connection. > Ah. That last bit is a really useful piece of information. If all clients conform to that, then it will make this job much easier. > > Thank you too and I definitely hope to hear from you soon! > > Thanks. I would like to experiment with adding request body processing, even if it doesn't end up in the AsyncConnection, there is a lot of work to be done here. I am not looking forward to CHUNK processing yet, but I will get to that this weekend (My day job and 2 kids keep me rather occupied during the week) |
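One way to get the whole header out in a single write, along the lines of what Rick describes (the function name and signature below are illustrative, not the actual async_connection members): append the status line and the header fields into one buffer and hand that single buffer to async_write, instead of writing the status line and the remaining headers separately.

#include <string>

// Build the full header block -- status line first, then the header fields,
// then the blank line -- so it can be sent with one boost::asio::async_write.
std::string make_header_block(std::string const &status_line,      // e.g. "HTTP/1.1 200 OK\r\n"
                              std::string const &header_fields) {  // each field ending in "\r\n"
    std::string header_block;
    header_block.reserve(status_line.size() + header_fields.size() + 2);
    header_block += status_line;   // first line goes straight into the header buffer
    header_block += header_fields;
    header_block += "\r\n";        // blank line terminates the header
    return header_block;
}

The caller keeps the returned string alive until the write completes and issues a single boost::asio::async_write(socket, boost::asio::buffer(header_block), handler), which puts the whole header on the wire in one send rather than two.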
From: Dean M. B. <mik...@gm...> - 2011-03-22 14:57:34
On Tue, Mar 22, 2011 at 2:59 AM, Rick Richardson <ric...@gm...> wrote: > > > On Mon, Mar 21, 2011 at 10:08 AM, Dean Michael Berris > <mik...@gm...> wrote: >> >> Hi Rick! >> >> ... >> >> Okay, but how do you deal with the parsing? Do you reset the parser >> state per connection? >> > Yes. That is my current approach. Oddly enough, I have it partially working. > I have a test which pipelines two requests. Currently it gets the first > request and responds correctly. The 2nd request is read and processed by > handle_read_data but is not correctly parsed, and I'm not sure why yet. I > just need to spend a bit more time with it. > There is a parser type that does the incremental parsing of incoming data. Maybe all that needs to be done is to check to see if the parser's state is reset properly every time once a GET request is parsed appropriately. There may be other complications but I was thinking of a way to make it happen leveraging object lifetimes, which may or may not work -- I still have to explore that possibility. That said I'll wait to see your pull request(s) soon. :) > >> >> Asynchronous server handlers can call connection.write(...) multiple >> times per request. The calls to connection.read(...) in the case of a >> handler that wants to read up data from the body of a POST/PUT request >> will also have to deal with the boundary issues -- granted that the >> HTTP/1.1 spec doesn't allow pipelining for POST/PUT requests, we'd >> want to be able to let the handler recover or at least deal with >> invalid data appropriately. > > Yea, this is a tricky point because we're basically relying on the top layer > of the stack to complete parsing for us. > It is good that the spec disallows pipelining posts and puts, but either way > I would like to actually process the entire body in the Connection object > because there is *tons* of complicating parsing logic I would like to put in > for POSTS: > There is a new type of socket exhaustion attack out which correctly > (according to the HTTP spec) slowly dribbles in data in a multi-part post. > I would like to put counter-measures to defend against such things as well > as other DOS attacks > (see http://www.ecl-labs.org/2011/03/17/roboo-http-mitigator.html ) > Right. Implementing the POST/PUT body parser was something I wasn't thinking of incorporating in the connection, opting to delegate those to helper/utility types/functions. However I see that having those in the connection's implementation would do a lot of good to a lot of people so I definitely hope to see how you (and others) would take this task on. :) >> >> Have you handled this? (Granted, I haven't looked at your patch just yet). > > I don't handle this well enough yet, apparently. That's fine. :) >> >> That sounds alright, although we might have to guard for a race >> somehow there when the handler is actually reading from the same >> connection (in the case of a POST/PUT handler). Basically we have to >> know whether the timeout happened while the handler is waiting for a >> POST/PUT body or whether the timeout happened while at the boundaries >> of the GET requests. > > If we process the multipart POST body in the Connection object, we can > re-set the keepalive timer for every packet of data we receive, (if the > client waits more than TIMEOUT seconds between each packet, I would like to > kick them off anyways) Yes, I agree. Although that would mean making the Connection object's implementation a lot hairier than it already is. 
Just thinking about incremental parsing support for multi-part MIME on the *server* side is making me want to block off a good week and just hole up with a powerful machine to get that figured out. :D >> >> > I haven't bothered to fork cpp-netlib on github so I don't have a >> > checkin to >> > make, but I did create a 'git diff' from boost/ I have attached it >> > here >> > for your perusal. >> > Please let me know your preferred approach for reviewing and accepting >> > patches (I am assuming a pull request, which I can get around to) >> > >> >> You assume right that I would much rather prefer looking at diffs over >> at Github, so a pull request is going to be most appreciated. >> > I will do that.. That patch is already outdated anyways. I made a bunch of > changes. > Cool, I'll look forward to the pull requests then. :) > Flush is being called. I did find one of the problems. Most clients expect > that the header arrive in a single packet. The problem in AsyncConnection is > that it was sending the first line as a single packet, then following up > with the rest of the header. I altered the code (in a later patch than what > I sent) to add the first line straight into the header buffer. It fixed > things. > Interesting. So how does it change the interface though? And how do you intend to support sending '100 Continue' responses from the handler with this change? >> >> As far as the spec goes, the server may send a "Connection: close" at >> any time to signal to the client that it will initiate a close to >> complete the transfer. Also if you're using Transfer-Encoding: >> Chunked, I think there's a special sequence to indicate the end of the >> response -- I need to cite that part, although an "empty" chunk >> usually signals the end of the message. >> >> If you just blindly keep the connection alive though, then the answer >> is the last request from the client side *should* have a "Connection: >> close" to let the server know that this is the last request in the >> pipeline. If it doesn't, the client may have to wait indefinitely for >> the server to close the connection while the server is stuck in a >> state where it would assume that there might be more requests that may >> come into the connection. > > Ah. That last bit is a really useful piece of information. If all clients > conform to that, then it will make this job much easier. It's recommended by the spec that this be the mode of operation, but if you think about it though just having a 'Content-Length' field in the HTTP response should allow the client to not have to wait for the connection to be closed regardless of whether requests were pipelined or not. Does that make sense? >> >> Thank you too and I definitely hope to hear from you soon! >> > > Thanks. I would like to experiment with adding request body processing, even > if it doesn't end up in the AsyncConnection, there is a lot of work to be > done here. I am not looking forward to CHUNK processing yet, but I will > get to that this weekend (My day job and 2 kids keep me rather occupied > during the week) > Cool. :) I know how that goes -- with family and having a day job. In my case though lack of a powerful enough machine to do template-heavy development on is killing my productivity a lot. Hopefully I can remedy that soon now that the wife has agreed to invest on a more modern machine now. :D Have a good one Rick and I look forward to your pull requests soon! -- Dean Michael Berris http://about.me/deanberris |
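To make that last point concrete, here is a tiny hand-rolled sketch (assumed names, not the cpp-netlib response API) of a response framed by Content-Length: the client can count exactly body.size() bytes after the blank line, know the message is complete, and reuse the connection without waiting for a close, whether or not the request was pipelined.

#include <sstream>
#include <string>

// Build a keepalive-friendly response: the Content-Length header tells the
// client exactly where the body ends, so no connection close is needed.
std::string make_keepalive_response(std::string const &body) {
    std::ostringstream response;
    response << "HTTP/1.1 200 OK\r\n"
             << "Content-Type: text/plain\r\n"
             << "Content-Length: " << body.size() << "\r\n"
             << "Connection: keep-alive\r\n"
             << "\r\n"
             << body;
    return response.str();
}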