From: Dean M. B. <mik...@gm...> - 2009-08-21 09:10:46
|
Hi Guys,

I see that John has done quite a lot of work already in his branch(es) and I'd like to get a release out (0.4) that will have some support for persistent connections and a limited (partially correct) HTTP URL parser, and along with it support for HTTPS.

Having said this, I have two major issues to deal with, and both concern major parts of the library moving forward.

URI/URL Parsing:

I see that there have already been three attempts at a URI/URL parsing library (one from Kim, one from John, and one from me). Kim's attempt was more of an OO approach (please correct me if I'm wrong Kim), something that I felt was too "simple" and could also be done with just static polymorphism. John's approach has been in progress for a while now, uses Spirit.Classic, and adheres to the RFCs almost to the letter from what I've seen (given the EBNF).

My approach is almost different from any approach that I've seen taken as far as URL parsing is concerned -- using template functions and template classes and a generic programming approach to specific URL parsing. However mine is not as close to the RFC as I'd like, and is not as well tested as I'd like either.

Can us three gentlemen work together towards: 1) Adding better test coverage 2) Implementing the details of the RFC and 3) Merging what we can towards something that works and is release-ready?

My criteria for a release-ready URI/URL parsing library are:

* Something that can stand on its own and is able to handle HTTP(S) URLs with encoded characters left alone
* A URL encoding function/library that will turn a string into a URL encoded string
* Well documented library (concepts used, internal implementation, and nice readable user examples).

HTTPS handling:

John has started implementing persistent connections using a policy-based design that will actually change the innards of the current HTTP client.

I can see that his approach (although a bit verbose for my taste) actually works, semantically separating the connection management from the client implementation. I do have some issues with this approach breaking the simplicity of the client interface for users.

Part of the HTTPS effort includes delegating connection management to a subsystem or component that determines what kind of connection or which persisting connection is used to service a request -- if it sees an https scheme then it's a matter of creating an SSL socket to the specified port on the destination host and then piping the already crafted HTTP request.

Now the way I imagine doing this is through some connection_manager type which based on runtime variables will be able to create the appropriate connection type -- and if HTTP 1.1 was to be supported, also maintain connections that do support the default persistent connection in HTTP 1.1. This connection_manager can be encapsulated in the http::client so that the users don't have to worry about it. However, the choice of what specific *kind* of manager to instantiate is a client initialization setting.

Something like this in client code:

    http::client c(http::follow_redirects, http::persistent, http::pipelined);
    http::response r = c.get("https://cpp-netlib.sourceforge.net/yaddayadda");

Inside the http::client class, would be something like this:

    template <...>
    struct client {
    private:
        shared_ptr<connection_manager> manager;
    public:
        client(...)
            : manager(
                connection_manager_factory::create_manager(
                    /* client constructor parameters? */
                )
            )
        {}
        // ...
    };

Then from within the connection_manager interface would be something like this:

    struct connection_manager {
        virtual shared_ptr<connection> get_connection(host, port);
        virtual void put_connection(shared_ptr<connection>);
    };

A connection type would then be the one handling the wire protocol implementation (supporting gzip for example, handling mime types, etc.).

Maybe with this design we may even be able to implement stream-like response objects which have support for chunked-reading of data. With C++0x we may be able to get away with code like this on the client side:

    http::client c(/* init options */);
    auto stream = c.get("http://some.site.com/streamed", http::streamed);
    while (true) {
        auto chunk = read(stream, 1024); // will block; reads 1KB of the body being streamed
        if (size(chunk) > 0) { /* deal with the chunk */ }
        else break;
    }

I'd even like to support Iterator/Range semantics too:

    http::client c(/* init options */);
    auto stream = c.get("http://some.site.com/streamed", http::streamed);
    copy(begin(stream), end(stream), ostream_iterator<char>(cout, ""));

At this stage of the game though it's going to need some work to be done to get to this level of expressiveness and simplicity on the client side. Internally though we can go as complex and powerful as we may want, but the premium has to be put on the client code being easy to read and write.

Sorry for the long post, but do you guys think we can get something done in this front that we can merge to trunk then have it released as version 0.4?

Hope to hear from you soon!

(BTW, please feel free to respond and change the subject line to indicate which part of this post you're responding to -- I didn't feel like sending a lot of emails with disparate topics being discussed in different emails because I felt this is all related in some manner. Thanks for understanding. :D)

--
Dean Michael Berris
blog.cplusplus-soup.com | twitter.com/mikhailberis
linkedin.com/in/mikhailberis | facebook.com/dean.berris | deanberris.com
|
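The connection_manager idea in the message above could be sketched roughly as follows. This is only an illustration under stated assumptions, not cpp-netlib code: it uses std::shared_ptr where the post writes shared_ptr, adds a scheme parameter to get_connection so the manager can tell http from https, and the names simple_manager, persistent_manager, and the bool argument to create_manager are hypothetical. No sockets are opened; the connection struct only records what would have been created.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a wire-level connection. A real one would own a
// TCP or SSL socket; this one only records what it was created for.
struct connection {
    std::string host;
    unsigned short port;
    bool secure;  // true when an SSL socket would be used
};

// Interface from the post, with the scheme added as an assumed parameter.
struct connection_manager {
    virtual ~connection_manager() {}
    virtual std::shared_ptr<connection> get_connection(
        const std::string& scheme, const std::string& host,
        unsigned short port) = 0;
    virtual void put_connection(std::shared_ptr<connection> c) = 0;
};

// Creates a fresh connection for every request and never keeps any.
struct simple_manager : connection_manager {
    std::shared_ptr<connection> get_connection(
        const std::string& scheme, const std::string& host,
        unsigned short port) override {
        return std::make_shared<connection>(
            connection{host, port, scheme == "https"});
    }
    void put_connection(std::shared_ptr<connection>) override {}
};

// Keeps returned connections and hands them out again for the same host/port.
struct persistent_manager : simple_manager {
private:
    typedef std::pair<std::string, unsigned short> key_type;
    std::map<key_type, std::shared_ptr<connection>> idle_;

public:
    std::shared_ptr<connection> get_connection(
        const std::string& scheme, const std::string& host,
        unsigned short port) override {
        auto it = idle_.find(key_type(host, port));
        if (it != idle_.end()) {
            std::shared_ptr<connection> reused = it->second;
            idle_.erase(it);
            return reused;
        }
        return simple_manager::get_connection(scheme, host, port);
    }
    void put_connection(std::shared_ptr<connection> c) override {
        assert(c);
        idle_[key_type(c->host, c->port)] = c;
    }
};

// The kind of manager is chosen once, from a client initialization setting.
struct connection_manager_factory {
    static std::shared_ptr<connection_manager> create_manager(bool persistent) {
        if (persistent) return std::make_shared<persistent_manager>();
        return std::make_shared<simple_manager>();
    }
};
```

A client built this way would call get_connection before sending a request and put_connection afterwards, so the choice between fresh and reused connections stays hidden from user code.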
From: Kim G. <kim...@gm...> - 2009-08-21 14:59:12
|
Hi Dean,

On Fri, Aug 21, 2009 at 11:10, Dean Michael Berris<mik...@gm...> wrote:
>
> Kim's attempt was more of an OO approach (please correct me if I'm
> wrong Kim), something that I felt was too "simple" and could also be
> done with just static polymorphism.

Yes, that's a fair assessment.

> My approach is almost different from any approach that I've seen taken
> as far as URL parsing is concerned -- using template functions and
> template classes and a generic programming approach to specific URL
> parsing. However mine is not as close to the RFC as I'd like, and is
> not as well tested as I'd like either.
>
> Can us three gentlemen work together towards: 1) Adding better test
> coverage 2) Implementing the details of the RFC and 3) Merging what we
> can towards something that works and is release-ready?

I'd love to help with the tests, but we've just had our second child and life with two kids doesn't leave much time for hacking :-) Let me see if I can help out, but don't count on it.

Quick question -- parse_specific seems to be a function template that you specialize on the Range and Tag. I thought specializing function templates was frowned upon, e.g. [1]...? Plus, as I understand it, you can't partially specialize it (reuse code parsers for one Range over several Tags). Can't you use simple overloading to get the same result?

I can't claim to understand it in detail, I just browsed the code quickly, but this jumped out at me.

Thanks,
- Kim

[1] http://www.gotw.ca/publications/mill17.htm
|
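The concern about specializing function templates can be seen in a small example in the spirit of the article linked as [1]. The function name which and the return codes are made up for illustration. The point is that explicit specializations do not take part in overload resolution: the compiler first picks among the base templates, and only then looks for a specialization of the one it picked.

```cpp
#include <cassert>

// (a) first base template
template <class T>
int which(T) { return 1; }

// Explicit specialization of (a) for T = int*. It is not an overload.
template <>
int which<int*>(int*) { return 2; }

// (b) second base template, an overload that is more specialized than (a)
// for pointer arguments.
template <class T>
int which(T*) { return 3; }

// For an int* argument, overload resolution compares (a) and (b) only.
// (b) wins, and (b) has no specialization for int, so the result is 3
// even though a specialization written "for int*" exists.
inline int call_with_int_pointer() {
    int value = 0;
    return which(&value);
}
```

Plain overloads, or a class template that can be partially specialized, avoid this surprise.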
From: Dean M. B. <mik...@gm...> - 2009-08-21 15:08:07
|
On Fri, Aug 21, 2009 at 10:59 PM, Kim Gräsman<kim...@gm...> wrote:
[snip]
>
>> My approach is almost different from any approach that I've seen taken
>> as far as URL parsing is concerned -- using template functions and
>> template classes and a generic programming approach to specific URL
>> parsing. However mine is not as close to the RFC as I'd like, and is
>> not as well tested as I'd like either.
>>
>> Can us three gentlemen work together towards: 1) Adding better test
>> coverage 2) Implementing the details of the RFC and 3) Merging what we
>> can towards something that works and is release-ready?
>
> I'd love to help with the tests, but we've just had our second child
> and life with two kids doesn't leave much time for hacking :-) Let me
> see if I can help out, but don't count on it.
>

I understand -- I'm actually on the way to becoming a father myself this November so I'm actually trying to get as much open source programming time now before my first baby arrives. :D It would be great to see you contributing to the testing and even the implementation again. :)

> Quick question -- parse_specific seems to be a function template that
> you specialize on the Range and Tag. I thought specializing function
> templates was frowned upon, e.g. [1]...? Plus, as I understand it, you
> can't partially specialize it (reuse code parsers for one Range over
> several Tags). Can't you use simple overloading to get the same
> result? I can't claim to understand it in detail, I just browsed the
> code quickly, but this jumped out at me.
>

Yes, I was actually struggling with this one -- if I go about it through the overloading route, it would look something like this:

    template <class Range>
    bool parse_specific(Range & range, tags::default_);

    template <class Range>
    bool parse_specific(Range & range, tags::http);

Which I think would work and would be worth a shot implementing. The call to parse_specific then would look like:

    parse_specific(range, Tag());

In parse_url.

If you can make the changes and make sure the tests pass, that would be super. :)

> Thanks,

You're welcome, and thank you for pointing out a better way at approaching parse_specific. :D

--
Dean Michael Berris
blog.cplusplus-soup.com | twitter.com/mikhailberis
linkedin.com/in/mikhailberis | facebook.com/dean.berris | deanberris.com
|
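The overloading route proposed above is ordinary tag dispatch, and a minimal compilable version might look like this. The parsing rules are placeholders invented for the sketch (the default tag accepts any non-empty input, the http tag requires a leading "//"), the ranges are taken by const reference rather than by Range &, and parse_url here only forwards to parse_specific.

```cpp
#include <cassert>
#include <string>

namespace tags {
struct default_ {};
struct http {};
}  // namespace tags

// Overload chosen for tags::default_: accept any non-empty range.
// (Placeholder rule, assumed for illustration only.)
template <class Range>
bool parse_specific(const Range& range, tags::default_) {
    return range.begin() != range.end();
}

// Overload chosen for tags::http: the scheme-specific part must start
// with "//". (Also a placeholder rule.)
template <class Range>
bool parse_specific(const Range& range, tags::http) {
    typename Range::const_iterator it = range.begin();
    if (it == range.end() || *it != '/') return false;
    ++it;
    return it != range.end() && *it == '/';
}

// The tag is a template parameter; a default-constructed tag object selects
// the overload at compile time.
template <class Tag, class Range>
bool parse_url(const Range& range) {
    return parse_specific(range, Tag());
}
```

Because each tag gets its own overload, one implementation is shared by every Range type, which is the reuse a full function template specialization cannot give.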
From: Kim G. <kim...@gm...> - 2009-08-23 08:03:36
|
Hi Dean,

On Fri, Aug 21, 2009 at 17:07, Dean Michael Berris<mik...@gm...> wrote:
>
> I understand -- I'm actually on the way to becoming a father myself
> this November so I'm actually trying to get as much open source
> programming time now before my first baby arrives. :D It would be
> great to see you contributing to the testing and even the
> implementation again. :)

Congratulations!

> Yes, I was actually struggling with this one -- if I go about it
> through the overloading route, it would look something like this:
>
> template <class Range>
> bool parse_specific(Range & range, tags::default_);
>
> template <class Range>
> bool parse_specific(Range & range, tags::http);
>
> Which I think would work and would be worth a shot implementing. The
> call to parse_specific then would look like:
>
> parse_specific(range, Tag());
>
> In parse_url.

Looking more closely, I'm not sure that can be made to work, as parse_specific's second param is an url_parts<Tag>&, derived from the template arg.

We could overload on the entire url_parts<tag> type, but then there would be no way of accessing other types derived from the tag (e.g. string_type) inside the implementation.

I think maybe the easiest -- accepting the strangeness of tag dispatching for a second ;-) -- would be to turn parse_specific into a class template, e.g. scheme_specific_parser, like so:

    template< typename Range, typename Tag >
    struct scheme_specific_parser
    {
        bool parse(Range& range, url_parts<Tag>& parts)
        {
            return true;
        }
    };

Then we'd get rid of the problems with overload resolution and partial specialization wrt function templates entirely.

- Kim
|
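A sketch of how the class template suggested above can be partially specialized on the tag while staying generic over the range. The url_parts layout, the string_of trait, and the parsing rule are assumptions made up for this example; only the name scheme_specific_parser and its parse signature come from the message.

```cpp
#include <cassert>
#include <string>

namespace tags {
struct default_ {};
struct http {};
}  // namespace tags

// Hypothetical trait mapping a tag to its string type.
template <class Tag>
struct string_of { typedef std::string type; };

// Hypothetical url_parts: just enough members for the sketch.
template <class Tag>
struct url_parts {
    typedef typename string_of<Tag>::type string_type;
    string_type scheme_specific_part;
    string_type host;
};

// Primary template: store the input unchanged for any Range and any Tag.
template <typename Range, typename Tag>
struct scheme_specific_parser {
    bool parse(const Range& range, url_parts<Tag>& parts) {
        // Types derived from the tag remain reachable here.
        typedef typename url_parts<Tag>::string_type string_type;
        parts.scheme_specific_part = string_type(range.begin(), range.end());
        return true;
    }
};

// Partial specialization: every Range type, but only the http tag.
template <typename Range>
struct scheme_specific_parser<Range, tags::http> {
    bool parse(const Range& range, url_parts<tags::http>& parts) {
        typedef url_parts<tags::http>::string_type string_type;
        string_type s(range.begin(), range.end());
        // Placeholder rule: require "//" followed by at least one character.
        if (s.size() < 3 || s.compare(0, 2, "//") != 0) return false;
        parts.scheme_specific_part = s;
        // Host runs from after "//" up to the next '/' or the end.
        parts.host = s.substr(2, s.find('/', 2) - 2);
        return true;
    }
};
```

The partial specialization is what a function template cannot express: one http implementation covers all range types, and string_type is still available inside it.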
From: Dean M. B. <mik...@gm...> - 2009-08-23 10:24:31
|
Hi Kim!

On Sun, Aug 23, 2009 at 4:03 PM, Kim Gräsman<kim...@gm...> wrote:
> Hi Dean,
>
> On Fri, Aug 21, 2009 at 17:07, Dean Michael
> Berris<mik...@gm...> wrote:
>>
>> I understand -- I'm actually on the way to becoming a father myself
>> this November so I'm actually trying to get as much open source
>> programming time now before my first baby arrives. :D It would be
>> great to see you contributing to the testing and even the
>> implementation again. :)
>
> Congratulations!
>

Thanks. :)

>> Yes, I was actually struggling with this one -- if I go about it
>> through the overloading route, it would look something like this:
>>
>> template <class Range>
>> bool parse_specific(Range & range, tags::default_);
>>
>> template <class Range>
>> bool parse_specific(Range & range, tags::http);
>>
>> Which I think would work and would be worth a shot implementing. The
>> call to parse_specific then would look like:
>>
>> parse_specific(range, Tag());
>>
>> In parse_url.
>
> Looking more closely, I'm not sure that can be made to work, as
> parse_specific's second param is an url_parts<Tag>&, derived from the
> template arg.
>

Oh right. Should it then be:

    template <class Range>
    bool parse_specific(Range & range, url_parts<tags::default_> & parts);

    template <class Range>
    bool parse_specific(Range & range, url_parts<tags::http> & parts);

or

    template <class Range>
    bool parse_specific(Range & range, url_parts<tags::default_> & parts, tags::default_);

    template <class Range>
    bool parse_specific(Range & range, url_parts<tags::http> & parts, tags::http);

?

> We could overload on the entire url_parts<tag> type, but then there
> would be no way of accessing other types derived from the tag (e.g.
> string_type) inside the implementation.
>
> I think maybe the easiest -- accepting the strangeness of tag
> dispatching for a second ;-) -- would be to turn parse_specific into a
> class template, e.g. scheme_specific_parser, like so:
>
> template< typename Range, typename Tag >
> struct scheme_specific_parser
> {
>     bool parse(Range& range, url_parts<Tag>& parts)
>     {
>         return true;
>     }
> };
>
> Then we'd get rid of the problems with overload resolution and partial
> specialization wrt function templates entirely.
>

Makes sense... However, my only concern here is that the simplicity of using the free function parse_specific and relying on ADL to pick up the correct implementation. Although I guess it should be easy to make 'parse' a public static function so we don't have to instantiate the scheme_specific_parser<>.

I like it! :D When do I expect a patch and passing tests? :D

--
Dean Michael Berris
blog.cplusplus-soup.com | twitter.com/mikhailberis
linkedin.com/in/mikhailberis | facebook.com/dean.berris | deanberris.com
|
From: Kim G. <kim...@gm...> - 2009-08-23 15:54:51
|
Hi Dean,

On Sun, Aug 23, 2009 at 12:24, Dean Michael Berris<mik...@gm...> wrote:
>
> Makes sense... However, my only concern here is that the simplicity of
> using the free function parse_specific and relying on ADL to pick up
> the correct implementation.

Ah, I see -- the ADL overload aspect didn't occur to me.

> Although I guess it should be easy to make 'parse' a public static
> function so we don't have to instantiate the scheme_specific_parser<>.

Oops, yeah, that's what I meant -- I missed the 'static'. But does that sort out the ADL-based implementation selection? We probably need a wrapper function that delegates to the scheme_specific_parser, no?

> I like it! :D When do I expect a patch and passing tests? :D

Maybe when I have more than 6 continuous minutes/day of time to myself ;-)

- Kim
|
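The wrapper mentioned at the end of the thread might be sketched like this: parse becomes a static member, and a free function template parse_specific deduces Range and Tag from its arguments and delegates. The url_parts member and the "//" rule are invented for illustration, and the ranges are taken by const reference; none of this is taken from the library source.

```cpp
#include <cassert>
#include <string>

namespace tags {
struct default_ {};
struct http {};
}  // namespace tags

// Hypothetical url_parts with a single member, enough for the sketch.
template <class Tag>
struct url_parts {
    std::string scheme_specific_part;
};

// Primary template: static parse, so no parser object has to be created.
template <typename Range, typename Tag>
struct scheme_specific_parser {
    static bool parse(const Range&, url_parts<Tag>&) { return true; }
};

// Partial specialization for the http tag (placeholder rule: leading "//").
template <typename Range>
struct scheme_specific_parser<Range, tags::http> {
    static bool parse(const Range& range, url_parts<tags::http>& parts) {
        std::string s(range.begin(), range.end());
        if (s.compare(0, 2, "//") != 0) return false;
        parts.scheme_specific_part = s;
        return true;
    }
};

// Free-function wrapper: callers keep writing parse_specific(range, parts),
// and both template arguments are deduced from the call.
template <typename Range, typename Tag>
bool parse_specific(const Range& range, url_parts<Tag>& parts) {
    return scheme_specific_parser<Range, Tag>::parse(range, parts);
}
```

With this shape the call site stays a plain free-function call, while the per-scheme logic lives in specializations of the class template.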