From: Dean M. B. <mik...@gm...> - 2010-09-09 17:57:02
Hi Guys,

Just a quick heads up: I've pushed an incremental parser for HTTP responses that I'm building to hit two birds with one stone:

1. Develop a spec-compliant HTTP response message parser that is stateful, space-efficient, and restartable, to be used in the asynchronous HTTP client implementation.

2. Develop an incremental parser concept which might stand as a good abstraction on its own.

#1 is the short-term goal, while #2 is the longer-term goal. The reason for #2 is similar to the reasoning behind the development of the Message concept as well as the Request and Response concepts: so that it would be easier for us and others to implement compliant, drop-in implementations that model the Message, Request, and Response concepts. The goal of having a suitable incremental parser interface is that we can abstract the client/server implementations enough to rely only on concepts like an IncrementalParser, which can be implemented underneath using as many different technologies as possible.

I think if we have an implementation of incremental parsers for HTTP, XMPP, and SMTP, then that might make a compelling component of the library that would be useful in different kinds of applications -- much like how the URI parser is really useful in many different contexts outside of just cpp-netlib.

Thanks guys and I hope this helps.

PS. In case you want to pick up where I've left things hanging, please feel free to fork and send in pull requests. I've tried to document the rationale in the test, and will be documenting the rationale for the implementation more as we go along. The test is in libs/network/test/http_incremental_parser.cpp (in commit http://bit.ly/dmkA9f, the actual file is http://bit.ly/9h61ju). The implementation is now able to at least parse the HTTP version part of a range, and say whether it was successful, whether there are still missing parts, or whether the range isn't conforming.

--
Dean Michael Berris
deanberris.com
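The three outcomes mentioned in the message above (parsed successfully, parts still missing, not conforming) can be illustrated with a small restartable parser. This is only a minimal sketch and not the code from the cpp-netlib commit: the names `parse_result`, `http_version_parser`, and `feed` are hypothetical, and the sketch assumes a single digit on each side of the dot, whereas the RFC 2616 grammar allows one or more digits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical three-way outcome, mirroring the three cases in the message.
enum class parse_result { complete, incomplete, invalid };

// Hypothetical restartable parser for the "HTTP/<digit>.<digit>" prefix of a
// response. Between calls it keeps only its position in the expected pattern,
// so the input may arrive in arbitrarily small pieces.
class http_version_parser {
public:
    // Feed the next piece of input. Characters after a complete version are
    // ignored; a mismatch puts the parser into a permanent invalid state.
    parse_result feed(const std::string& chunk) {
        static const std::string pattern = "HTTP/d.d";  // 'd' means one digit
        for (char c : chunk) {
            if (failed_ || pos_ == pattern.size()) break;
            const char expected = pattern[pos_];
            if (expected == 'd') {
                if (c < '0' || c > '9') { failed_ = true; break; }
                // Position 5 is the major digit, position 7 the minor digit.
                if (pos_ == 5) major_ = c - '0'; else minor_ = c - '0';
            } else if (c != expected) {
                failed_ = true;
                break;
            }
            ++pos_;
        }
        if (failed_) return parse_result::invalid;
        return pos_ == pattern.size() ? parse_result::complete
                                      : parse_result::incomplete;
    }

    int major_version() const { return major_; }
    int minor_version() const { return minor_; }

private:
    std::size_t pos_ = 0;   // index of the next expected pattern character
    bool failed_ = false;   // set once the input stops conforming
    int major_ = 0;
    int minor_ = 0;
};
```

Feeding "HT", then "TP/1.", then "1 200 OK" to one parser object yields incomplete, incomplete, and complete in turn, which is the property an asynchronous client needs when data arrives in arbitrary pieces.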
From: Jeroen H. <vex...@gm...> - 2010-09-09 21:30:54
Hi Dean,

Brilliant! I've personally always considered this to be a weak spot in cpp-netlib and given how important it is, I'm glad it's getting some attention. Having said that, I'm surprised you're starting from scratch given that there are some very good and tested HTTP parsers out there (mongrel comes to mind). Secondly I must ask you to read, and re-read the RFC and to make sure you follow it (I've had a quick glance at the code you've committed and already spotted a deviation, for which I've filed an issue at github).

Currently I've got little time to spend on cpp-netlib since I'm busy with a project of my own (shameless plug, http://github.com/VeXocide/construe_cast) but I'll definitely be following this closely.

Regards,
Jeroen Habraken
From: Dean M. B. <mik...@gm...> - 2010-09-13 02:56:56
Hi Jeroen!

Sorry about the late response, it was the long weekend here in the Philippines and I took some time off to have fun with the family. See some of my thoughts below. :)

On Fri, Sep 10, 2010 at 5:30 AM, Jeroen Habraken <vex...@gm...> wrote:
>
> Hi Dean,
>
> Brilliant! I've personally always considered this to be a weak spot in
> cpp-netlib and given how important it is, I'm glad it's getting some
> attention.

Thanks, it has been in the back of my mind for a while and I think there's no better way to address it than to actually do something about it.

> Having said that, I'm surprised you're starting from
> scratch given that there are some very good and tested HTTP parsers
> out there (mongrel comes to mind).

Ah, yes. The rationale is really simple:

1. I'd like it done test-driven, meaning actual requirements dictate the implementation. This is hard to do if you base the implementation on something that already exists. At least this is my personal view on it.

2. The license might be an issue if I base the implementation on something existing. Given that cpp-netlib is licensed under the Boost Software License, anything that is licensed under a non-BSL-compatible license is a non-starter.

3. I'd like to gain more experience writing parsers by hand, to get a better understanding and a better chance at optimizing the implementation at any given point in the future. ;)

So if you can suggest anything that is already Boost Software Licensed that I can just tweak -- something like what is already in the Boost.Asio examples, which BTW we already use in the HTTP Server implementation -- then it might be something I'd be willing to look into. :D

> Secondly I must ask you to read,
> and re-read the RFC and to make sure you follow it (I've had a quick
> glance at the code you've committed and already spotted a deviation,
> for which I've filed an issue at github).
>

Yes, thanks. I however would like to deal with the RFC issues later, just as soon as I can parse a valid narrow subset of the RFC. My aim is really to get something that will allow me to just parse the "known good" incoming data in an incremental manner. I will however base test inputs on the RFC, so that might allow me to go RFC-compliant in the tests, while the implementation might be narrower than the full RFC implementation.

> Currently I've got little time to spend on cpp-netlib since I'm busy
> with a project of my own (shameless plug,
> http://github.com/VeXocide/construe_cast) but I'll definitely be
> following this closely.
>

No worries -- I'm looking forward to construe_cast being proposed and included in Boost myself ;). I will try to post more about the progress and discoveries I make along the way.

> Regards,
> Jeroen Habraken
>

Thanks Jeroen and I definitely look forward to more of your insights as I continue working on this particular part of the library! :)

--
Dean Michael Berris
deanberris.com
From: Dean M. B. <mik...@gm...> - 2010-09-14 07:09:12
On Mon, Sep 13, 2010 at 10:56 AM, Dean Michael Berris <mik...@gm...> wrote:
>
> On Fri, Sep 10, 2010 at 5:30 AM, Jeroen Habraken <vex...@gm...> wrote:
>>
>> Secondly I must ask you to read,
>> and re-read the RFC and to make sure you follow it (I've had a quick
>> glance at the code you've committed and already spotted a deviation,
>> for which I've filed an issue at github).
>>
>
> Yes, thanks. I however would like to deal with the RFC issues later,
> just as soon as I can parse a valid narrow subset of the RFC. My aim
> is really to get something that will allow me to just parse the "known
> good" incoming data in an incremental manner. I will however base test
> inputs on the RFC, so that might allow me to go RFC-compliant in the
> tests, while the implementation might be narrower than the full RFC
> implementation.
>

In this light, I've fixed the issue you filed (#14) with regard to the HTTP version parsing. I've also finished the incremental parsing of the status message and headers.

What remains is a body parser that should be able to understand chunked transfer encoding. I'm not racking my brain with that yet, but I will be implementing a simplistic incremental parser. This will also be the basis of a streaming client parser for the responses.

I'm also looking at changing the regex-based implementation of the existing synchronous client to use this restartable parser, and at removing the dependency on Boost.Regex in the near future. Hopefully things don't break when I introduce the incremental parsing to the synchronous client implementation. ;)

Have a great one guys and I definitely hope this helps!

--
Dean Michael Berris
deanberris.com
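To give an idea of what an incremental body parser for chunked transfer encoding might involve, here is a rough sketch of a byte-at-a-time decoder. It is not taken from cpp-netlib: the class `chunked_decoder`, its `feed` member, and the `status` values are hypothetical names. The sketch also makes simplifying assumptions: chunk extensions and trailer headers are rejected rather than parsed, and overflow of very large chunk sizes is not checked.

```cpp
#include <cstddef>
#include <string>

// Hypothetical incremental decoder for a chunked message body. Each chunk is
// "<hex size>\r\n<data>\r\n"; a zero-sized chunk followed by "\r\n" ends the body.
class chunked_decoder {
public:
    enum class status { need_more, done, error };

    // Feed the next piece of the body; decoded bytes are appended to `out`.
    status feed(const std::string& input, std::string& out) {
        for (char c : input) {
            switch (state_) {
            case state::size: {
                const int v = hex_value(c);
                if (v >= 0) {
                    remaining_ = remaining_ * 16 + static_cast<std::size_t>(v);
                    seen_digit_ = true;
                } else if (c == '\r' && seen_digit_) {
                    state_ = state::size_lf;
                } else {
                    return fail();  // extensions are not handled in this sketch
                }
                break;
            }
            case state::size_lf:
                if (c != '\n') return fail();
                seen_digit_ = false;
                state_ = (remaining_ == 0) ? state::last_cr : state::data;
                break;
            case state::data:
                out.push_back(c);
                if (--remaining_ == 0) state_ = state::data_cr;
                break;
            case state::data_cr:
                if (c != '\r') return fail();
                state_ = state::data_lf;
                break;
            case state::data_lf:
                if (c != '\n') return fail();
                state_ = state::size;
                break;
            case state::last_cr:
                if (c != '\r') return fail();  // trailers are not handled here
                state_ = state::last_lf;
                break;
            case state::last_lf:
                if (c != '\n') return fail();
                state_ = state::done;
                break;
            case state::done:
                return status::done;   // ignore anything after the body
            case state::failed:
                return status::error;
            }
        }
        if (state_ == state::done) return status::done;
        if (state_ == state::failed) return status::error;
        return status::need_more;
    }

private:
    enum class state { size, size_lf, data, data_cr, data_lf,
                       last_cr, last_lf, done, failed };

    // Value of one hexadecimal digit, or -1 if `c` is not a hex digit.
    static int hex_value(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    status fail() { state_ = state::failed; return status::error; }

    state state_ = state::size;
    std::size_t remaining_ = 0;  // bytes left in the current chunk
    bool seen_digit_ = false;    // at least one size digit was read
};
```

Because all progress lives in `state_` and `remaining_`, a chunk may be split across any number of `feed` calls, which is what makes the same decoder usable from both a synchronous and an asynchronous client.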