From: Dean M. B. <mik...@gm...> - 2009-08-21 09:10:46
Hi Guys,

I see that John has done quite a lot of work already in his branch(es), and I'd like to get a release out (0.4) that will have some support for persistent connections, a limited (partially correct) HTTP URL parser, and along with it support for HTTPS. Having said this, I have two major issues to deal with, and they touch on some major parts of the library moving forward.

URI/URL Parsing:

I see that there have already been three attempts at a URI/URL parsing library (one from Kim, one from John, and one from me). Kim's attempt was more of an OO approach (please correct me if I'm wrong, Kim), something that I felt was too "simple" and could also be done with just static polymorphism. John's approach has been in progress for a while now, uses Spirit.Classic, and adheres to the RFCs almost to the letter from what I've seen (given the EBNF). My approach differs from any other I've seen taken as far as URL parsing is concerned -- it uses template functions, template classes, and a generic programming approach to scheme-specific URL parsing. However, mine is not as close to the RFC as I'd like, and is not as well tested as I'd like either.

Can we three gentlemen work together towards:

  1) Adding better test coverage,
  2) Implementing the details of the RFC, and
  3) Merging what we can towards something that works and is release-ready?

My criteria for a release-ready URI/URL parsing library are:

  * Something that can stand on its own and is able to handle HTTP(S) URLs, with encoded characters left alone
  * A URL encoding function/library that will turn a string into a URL-encoded string
  * A well-documented library (concepts used, internal implementation, and nice readable user examples)

HTTPS handling:

John has started implementing persistent connections using a policy-based design that will actually change the innards of the current HTTP client.
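(A quick aside on the second URL-parsing criterion above: a URL-encoding function could be as small as the sketch below. The name url_encode and the choice to leave only RFC 3986 "unreserved" characters unescaped are my assumptions, not anything already in the library.)

```cpp
#include <cctype>
#include <string>

// Minimal sketch of a url_encode function (hypothetical name).
// Percent-encodes every octet outside the RFC 3986 "unreserved" set:
// ALPHA / DIGIT / "-" / "." / "_" / "~".
std::string url_encode(std::string const & input) {
    static char const hex[] = "0123456789ABCDEF";
    std::string output;
    for (std::string::size_type i = 0; i < input.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~') {
            output += static_cast<char>(c);
        } else {
            // Emit '%' followed by the two uppercase hex digits of the octet.
            output += '%';
            output += hex[c >> 4];
            output += hex[c & 0x0F];
        }
    }
    return output;
}
```

A real library version would also need the inverse (decoding) and a way to encode only specific URL components, but this is the shape of it.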
I can see that his approach (although a bit verbose for my taste) actually works, semantically separating connection management from the client implementation. I do have some issues with this approach breaking the simplicity of the client interface for users.

Part of the HTTPS effort includes delegating connection management to a subsystem or component that determines what kind of connection, or which persisting connection, is used to service a request -- if it sees an https scheme, then it's a matter of creating an SSL socket to the specified port on the destination host and piping the already-crafted HTTP request through it.

Now, the way I imagine doing this is through some connection_manager type which, based on runtime variables, will be able to create the appropriate connection type -- and, if HTTP 1.1 is to be supported, also maintain connections that support the default persistent connections of HTTP 1.1. This connection_manager can be encapsulated in the http::client so that users don't have to worry about it. However, the choice of which specific *kind* of manager to instantiate is a client initialization setting. Something like this in client code:

  http::client c(http::follow_redirects, http::persistent, http::pipelined);
  http::response r = c.get("https://cpp-netlib.sourceforge.net/yaddayadda");

Inside the http::client class would be something like this:

  template <...>
  struct client {
  private:
      shared_ptr<connection_manager> manager;
  public:
      client(...)
          : manager(
              connection_manager_factory::create_manager(
                  /* client constructor parameters? */
              )
          )
      {}
      // ...
  };

Then the connection_manager interface would be something like this:

  struct connection_manager {
      virtual shared_ptr<connection> get_connection(host, port);
      virtual void put_connection(shared_ptr<connection>);
  };

A connection type would then be the one handling the wire protocol implementation (supporting gzip for example, handling mime types, etc.).
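To make the factory dispatch concrete, here is a compilable sketch of how create_manager might pick a manager from the initialization settings. Only connection_manager and connection_manager_factory::create_manager appear in the text above; the two concrete manager types, the name() member, and the use of std::shared_ptr are my assumptions for illustration.

```cpp
#include <memory>
#include <string>

// Interface from the sketch above, trimmed down so it compiles standalone.
struct connection_manager {
    virtual ~connection_manager() {}
    // Illustrative hook so the example has observable behavior;
    // the real interface would expose get_connection/put_connection.
    virtual std::string name() const = 0;
};

// Hypothetical: one throw-away connection per request (HTTP 1.0 style).
struct simple_connection_manager : connection_manager {
    std::string name() const { return "simple"; }
};

// Hypothetical: caches connections per (host, port) for HTTP 1.1 reuse.
struct persistent_connection_manager : connection_manager {
    std::string name() const { return "persistent"; }
};

struct connection_manager_factory {
    // In the real client this would take the constructor parameters
    // (follow_redirects, persistent, pipelined, ...); a single flag
    // stands in for them here.
    static std::shared_ptr<connection_manager> create_manager(bool persistent) {
        if (persistent)
            return std::make_shared<persistent_connection_manager>();
        return std::make_shared<simple_connection_manager>();
    }
};
```

The point of the factory is exactly what the text describes: the client holds only a shared_ptr<connection_manager> and never needs to know which concrete manager it got.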
Maybe with this design we may even be able to implement stream-like response objects which support chunked reading of data. With C++0x we may be able to get away with code like this on the client side:

  http::client c(/* init options */);
  auto stream = c.get("http://some.site.com/streamed", http::streamed);
  while (true) {
      auto chunk = read(stream, 1024); // blocking read of 1KB of the streamed body
      if (size(chunk) > 0) { /* deal with the chunk */ }
      else break;
  }

I'd even like to support Iterator/Range semantics too:

  http::client c(/* init options */);
  auto stream = c.get("http://some.site.com/streamed", http::streamed);
  copy(begin(stream), end(stream), ostream_iterator<char>(cout, ""));

At this stage of the game though, it's going to take some work to get to this level of expressiveness and simplicity on the client side. Internally we can go as complex and powerful as we want, but the premium has to be put on the client code being easy to read and write.

Sorry for the long post, but do you guys think we can get something done on this front that we can merge to trunk and then have released as version 0.4? Hope to hear from you soon!

(BTW, please feel free to respond and change the subject line to indicate which part of this post you're responding to -- I didn't feel like sending a lot of emails with disparate topics being discussed in different threads, because I felt this is all related in some manner. Thanks for understanding. :D)

--
Dean Michael Berris
blog.cplusplus-soup.com | twitter.com/mikhailberis
linkedin.com/in/mikhailberis | facebook.com/dean.berris | deanberris.com
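P.S. The Iterator/Range usage above can be approximated today by giving the streamed response free begin()/end() functions that hand back std::istreambuf_iterator<char>. This is only a sketch under the assumption that the body is exposed as a std::istream; streamed_response and its members are hypothetical names, and a real implementation would pull chunks off the socket lazily instead of buffering a string.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical streamed response; an istringstream stands in for the
// network stream that a real chunked-reading implementation would wrap.
struct streamed_response {
    std::istringstream body;
    explicit streamed_response(std::string const & data) : body(data) {}
};

// Free begin()/end() found by argument-dependent lookup, so generic
// algorithms like std::copy(begin(r), end(r), out) just work.
std::istreambuf_iterator<char> begin(streamed_response & r) {
    return std::istreambuf_iterator<char>(r.body);
}

std::istreambuf_iterator<char> end(streamed_response &) {
    return std::istreambuf_iterator<char>();  // end-of-stream sentinel
}
```

With that in place, the copy-to-cout one-liner from the example reads the body straight through without the explicit while loop.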