Re: [cpp-netlib-devel] HTTP Synchronous server reply methods?

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

On Wed, Feb 2, 2011 at 3:24 AM, Nelson, Erik - 2
<eri...@ba...> wrote:
>> Dean Michael Berris wrote on Tuesday, February 01, 2011 1:20 PM
>>
>> Well, the problem with sending binary data as 7-bit clear over the
>> network has been documented extensively over the Internet. The spec
>> clearly states that you have to be transferring data safe to transfer
>> over 7-bit transfers encoding -- meaning that's technically ASCII
>> text. It is an accident that images are being sent in the clear as
>> binary data, and if you notice in history this is largely why people
>> (browser developers and server developers) had to agree that they
>> would just take whatever was sent over the wire in the body and just
>> have the MIME identifiers there.
>>
>
> Either way you're sending binary in these examples... the only issue was whether that binary should be copied into a std::string
>

Why is copying into a string a problem in the first place?

>>
>> See, the fact that I'm even using std::string is already something I
>> detest (read the thread about [string]proposal on the Boost ML ;) ) --
>
> I've been reading the (extensive) [string] proposal on boost. :)
>

Ah, so you know the reason why I want a better string is so that I can
use it in cpp-netlib, no? :D

>> but at the moment it is the most sane and simple thing to do lacking a
>> proper efficient segmented data storage mechanism around (no
>> std::deque has its own issues, and ptr_list<array<T,N> > is too
>> "esoteric"). Forcing people to deal with std::vector<char> is just
>> unnecessary when std::string is much more familiar however ugly.
>
> I'm not proposing vector<char>- the foundation of my point is the library *should not* make the container choice for the user.  There is no choice that works well for everyone.
>

Right, and it has never been a choice made by the library. It just so
happened that the default is std::string for the string. Anybody can
extend the tag dispatch mechanisms and create their own tags that
define a different string<Tag>::type to yield something other than
std::string.

Actually if I remember correctly, there is a body<Tag>::type
metafunction that I was using to determine what the type of the body
should be. Basically these extension points are still undocumented at
this point and I'm thinking about how to make configuring the types
easier maybe with macro's instead of forcing people who want to extend
the library to learn all the traits and metafunctions that depend on
the tag dispatch. ;)

>> Erik wrote:
>>>
>>> auto_ptr<MyObject> obj(new MyObject);
>>> response << body(obj);
>>>
>
>> Uh oh, this is dangerous because the user can use the obj right after
>> the data is passed to the response.
>
> Nope- the auto_ptr will be cleared when the ownership transfers to the response, right?
>

Well, after the `response << body(obj)` line, obj is still accessible
to the user. It's not guaranteed that the user will do this line at
the end of the handler body. Also, it's not safe to take an arbitrary
void* as well and write that out to the network. That's asking for
trouble not only with the protocol but with C++ in general.

Also, auto_ptr is going away for good reason -- it's an awfully
non-value-semantics playing type.

>> Erik wrote:
>>> should be sufficient for POD types, and for non-POD types, maybe
>>> you'd need something like
>>>
>>> auto_ptr<vector<char> > obj(new vector<char>());
>>> response << body(&(*obj)[0], obj->size(), obj);
>>>
>>
>> And this is just ugly. ;)
>
> A little... think of it from a user standpoint- how would you send an arbitrary object in the response body?

You can't, you need to linearize it to a container that is suitable
for network transfer. That's the whole point of the interface defining
the types -- I'm not about to go back to the days of C where I'll just
treat a thunk of memory as data that I can safely use in the library.

See, the point of the interface is to make things *easy*. There's
absolutely no point in allowing a pointer to an arbitrary type in
memory to be serialized out to a socket. There's 0 utility in that.

Remember also that we're talking about the *synchronous*
implementation. This means there's defined semantics for what a
response type is. If you need to be sending arbitrarily large data in
memory, *don't use the synchronous interface*. Look at what the
fileserver example in 0.9-devel does -- which is actually portable to
0.8.1 IIRC.

> Can you write anything cleaner or more intuitive than
>
> response << body(void*, length) ?  The way you're describing makes you write
>
> response.content.resize(length);
> memcpy((void*)response.content.data(), &object, length);
>
> That looks less ugly?  More intuitive?
>

Well, there are a few reasons why your suggestion is ugly:

1) You're passing in a naked pointer. That's C'ish of you to suggest. ;)

2) Also, it's largely bad form to try to take a void* when you can get
the type of the object directly. What I aim to do as a library
designer responsible for the interface and the implementation is to
enforce the safety of the interface and the operations which I use.

3) There's absolutely 0 reason for you not to use the string when the
semantics of the handler is to craft a string response. If you want to
send data, you need to send a string in. Even if it is binary data.

There is swap idiom you can use if you don't want copies. It's been
with the STL since the beginning:

  std::string my_long_serialized_string = "...";
  swap(response.content, my_long_serialized_string); // look ma, no copy

>
>> Erik wrote:
>>>> The response can delete it whenever it's done with doing whatever it
>>> does.
>>>
>>
>> That's bad design. :D
>>
>
> That's what it happens in the current implementation...

Nope.

> the only difference is that it is deleting a copy (that's in the body) instead of the original.

Again, nope. The response object lives as long as the connection is
alive. Nothing gets deleted until all shared references to the
connection go out of scope.

> The issue here is that the lifetime of the response object is longer than the lifetime of the handler function.  The solution is going to be to have the response destroy the body memory, either in std::string form (like now) or in original-object form.  Either way the response is taking ownership of the memory that it is sending down the wire.
>

So how do you intend to support request pipelining on an
implementation as you propose? As it is at the moment there is no
guarantee as to *when* an object will be deleted. This is a
non-deterministic code path. The last thing I want to do is introduce
memory management code in the network implementation when that stuff
should be handled automatically.

There's no *easy* way to achieve what you want to achieve with the
synchronous server implementation. There's a reason why I had to write
an asynchronous server implementation precisely for serving huge
amounts of data, supporting a streaming use case, etc.

>>
>> Unfortunately, that's not much better than saying:
>>
>>   std::ifstream f(...);
>>   response.content.reserve(file_size);
>>   f.read(response.content.data(), file_size);
>>
>> Am I missing something?
>
> I think this example illustrates my point about the trickiness of the interface.  There are two reasons this won't work.
>
> 1.  response.content.data() returns const char*, right?  You can't write into that.

You're right. D'oh.

> 2.  After this is run, response.content.size() will always be zero, right?  And nothing will be sent?
>

Yeah. *facepalm*

> You probably wrote these bugs because you're up too late. :)  Still, they show the brittleness of using a string like this.
>

Or probably because I wasn't thinking straight generally. :P

The better way is here:

  std::ifstream input(...);
  response.content.reserve(file_size);
  std::copy(istreambuf_iterator<char>(input.rdbuf()),
istreambuf_iterator<char>(),
    std::back_inserter(response.content));

/me <3 the STL sometimes. ;)

>>
>> For data that's already in memory I get the utility of being able to
>> refer to the bytes directly. Unfortunately making the library
>> deallocate memory that the user allocated is, quite bluntly, bad form
>> and error prone.
>>
>> Makes sense?
>>
>
> I'd say that passing ownership of an object to a library is neither bad form nor error prone.  Sutter has a good writeup of using auto_ptr effectively in 'Exceptional C++' that seems applicable.  Transferring object ownership with auto_ptr is safe, efficient, and idiomatic.
>

Eh? Are you seriously suggesting to me that passing a raw pointer up
to a calling function and expecting that calling function to manage
the memory for you is good form?

Remember, the response object is created before it's even called --
the implementation *should not be responsible for managing the memory
of the extension (in which case this is the handler)*. That is what I
refer to as bad form along with the ugliness that is auto_ptr.

> Anyway, you're putting the sweat equity into cpp-netlib, and are closer to the project.  My only original observation was 'The same question will come up again and again because it's a surprising interface'.  It clearly can work as it is, but it is surprising.  That's why he didn't figure it out by himself.
>

And the answer will be the same, turn your data into a string and
stuff that into the response.content (at least for 0.8/0.8.1). With
0.9-devel there should be a better way, but nothing like what's
already being suggested. I'm considering adding an optional completion
callback to the response type, to allow for "chaining writes" even in
the synchronous interface. I'm leaning towards the one I described
earlier with a variant solution:

  variant<
    string_type,
    tuple<
      boost::asio::const_buffer,
      function<
        void(boost::asio::const_buffer, boost::system::error_code)
      >
    >
  > content;

This means you can either assign a string to it, or assign a tuple of
a boost::asio::const_buffer and a completion handler that takes care
of the data in the buffer, and knows what to do with the error code
passed in. It's a lot more complicated I know, but that's the only way
I can think of to allow the synchronous server to deal with the case
where you would like to chain writes instead of sending a big thunk of
data (also a good way to enforce chunked encoding on the synchronous
interface). This is a lot more scalable too, now I just have to think
about whether there's a good way to do this without breaking too much
code that's using the 0.8-line idiom.

HTH

-- 
Dean Michael Berris
about.me/deanberris