Re: [cpp-netlib-devel] HTTP Synchronous server reply methods?


On Wed, Feb 2, 2011 at 1:04 AM, Nelson, Erik - 2
<eri...@ba...> wrote:
>> From: Dean Michael Berris wrote on Tuesday, February 01, 2011 9:00 AM
>>> This question will continue to come up.  It's true that std::string
>>> *can* hold binary data.  It's just not *natural*.  The std::string
>>> interface is strongly influenced by null-terminated strings.  Things
>>> that the interface implies (like string.c_str() or string(char*)
>>> constructor actually do what you'd think) are broken if you use
>>> std::string with binary data.
>>
>> I know. ;)
>>
>> Except a lot of times though, the HTTP protocol largely deals with
>> *text transfer*. Also it's not as simple as it sounds. See below.
>>
>
> The fact that it's a common use case (I'd suggest that actually much more binary image data is transmitted than text) means that the library should easily accommodate text transfer.  I think this goes back to an underlying assumption that you expressed some time ago: iirc, you believed that HTTP could *only* transmit text, and that binary data needed to be base64'd into text before transmission.  That assumption may still be informing your design choices here, but the very suggestion of copying binary data into a string is (in my opinion) a red flag that there's something wrong with the interface.
>

Well, the problem with sending binary data as 7-bit clear over the
network has been documented extensively on the Internet. The spec
clearly states that you have to be transferring data that is safe to
send over a 7-bit transfer encoding -- meaning it's technically ASCII
text. It is an accident that images are being sent in the clear as
binary data, and if you notice in history this is largely why people
(browser developers and server developers) had to agree that they
would just take whatever was sent over the wire in the body and just
have the MIME identifiers there.

The design of the library has actually nothing to do with whether I
think text should be the only thing transmitted over HTTP. Also, if
you notice, the type of the string is parametric on the tag type used.
It makes it *easy* to just use std::string. I could very well be
implementing a chained-block-data-structure for the underlying message
storage and manipulate those directly and expose ranges for the
accessor/wrappers (like how ACE does it) but that's too much work to
do at the moment -- patches to implement this would be most welcome.
;)

At any rate, the reason why it's technically better to send things via
HTTP using Base64 encoding is really just so that you're OK as far as
the spec goes. This avoids all the endianness issues you might
encounter on the other end (although Boost.Asio should be dealing with
that issue for us). It's also largely a matter of convenience -- it's
perfectly *fine* to put binary data in a std::string or a
std::vector.
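
To make the "binary data in a std::string" point concrete, here is a
minimal sketch. The helper names (make_binary_body, c_string_length) are
made up for illustration and are not part of cpp-netlib. It shows that
the (pointer, length) constructor keeps embedded NUL bytes, while
anything that goes through c_str() and C-string functions stops at the
first NUL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not a cpp-netlib function): copy a raw byte
// buffer into a std::string. The (pointer, length) constructor copies
// exactly `length` bytes, embedded NULs included.
inline std::string make_binary_body(const char* bytes, std::size_t length) {
    return std::string(bytes, length);
}

// Hypothetical helper: the length a C-string API would see. strlen()
// stops at the first NUL byte, so this can be shorter than body.size().
inline std::size_t c_string_length(const std::string& body) {
    return std::strlen(body.c_str());
}
```

So size(), operator[] and iterators are safe for binary payloads; only
the NUL-terminated parts of the interface are misleading.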

> Having an overload that makes std::string usage natural (like it is now) is a good thing.  Forcing someone to copy memory regions into a std::string is a bad thing.
>

The reason the copy is forced is for simplicity of the implementation.
Again if you wanted to use a no-copy or single-copy interface, use the
asynchronous server implementation *today*. ;)

>>
>> Yet another way is to put the burden of managing the memory of the
>> data to be written out by the server, by providing an optional
>> callback function when providing the content. So it would then be a
>> variant between a string and a tuple<void*, size_t,
>> function<void(void*)> >. All of these options make the synchronous
>> handler's implementation needlessly complex.
>
> My original point was only about the inelegance of using std::string.  The point you're raising here is a different one, and all you're saying is that the lifetime of the payload needs to be roughly the same as the lifetime of the result object.  That's not a terribly complicated problem, and there are lots of solutions to it.  It seems that one that's easy for the user is to just hand off ownership to the response, something like
>

See, the fact that I'm even using std::string is already something I
detest (read the thread about [string]proposal on the Boost ML ;) ) --
but at the moment it is the most sane and simple thing to do, lacking a
proper efficient segmented data storage mechanism (no, std::deque
has its own issues, and ptr_list<array<T,N> > is too "esoteric").
Forcing people to deal with std::vector<char> is just unnecessary when
std::string is much more familiar, however ugly.

Of course nothing's stopping you or anyone from creating a different
tag type that defines string<Tag>::type as std::vector<char> or
anything else for that matter. ;)
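
Roughly, the tag idea looks like the sketch below. Everything in the
sketch namespace (the tag names, basic_response, append_bytes) is a
simplified stand-in I'm assuming for illustration; it is not the
library's actual set of tags or message types. Only the
string<Tag>::type pattern is taken from the discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical tags; the library's real tag names may differ.
struct default_string_tag {};
struct byte_vector_tag {};

// Primary template: any tag maps to std::string unless specialized.
template <class Tag>
struct string { typedef std::string type; };

// Specialization: this tag selects std::vector<char> as the storage.
template <>
struct string<byte_vector_tag> { typedef std::vector<char> type; };

// Simplified stand-in for a response whose body type follows the tag.
template <class Tag>
struct basic_response {
    typename string<Tag>::type content;
};

// Works for both storage types, since both support range insert at end().
template <class Tag>
void append_bytes(basic_response<Tag>& r, const char* data, std::size_t n) {
    r.content.insert(r.content.end(), data, data + n);
}

}  // namespace sketch
```

Handlers written against string<Tag>::type then pick up whichever
storage the tag selects, without changing the handler code.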

The idea is really to make the hard thing simple to do -- imagine
having to implement your own HTTP server, and you might decide that
making one extra copy of the data is *not* that big of a deal. Of
course I'd like to improve the library; it's just that, well, if only
more people submitted pull requests that actually addressed this, we
might have a better library to use sooner rather than later. :P

>
> auto_ptr<MyObject> obj(new MyObject);
> response << body(obj);
>

Uh oh, this is dangerous because the user can use the obj right after
the data is passed to the response.

> should be sufficient for POD types, and for non-POD types, maybe you'd need something like
>
> auto_ptr<vector<char> > obj(new vector<char>());
> response << body(&(*obj)[0], obj->size(), obj);
>

And this is just ugly. ;)

> The response can delete it whenever it's done with doing whatever it does.
>

That's bad design. :D

> That seems to me to be pretty easy and intuitive for the user.
>

Unfortunately, that's not much better than saying:

  std::ifstream f(...);
  // resize (not reserve) so the string really holds file_size bytes
  response.content.resize(file_size);
  f.read(&response.content[0], file_size);

Am I missing something?
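
For completeness, a slightly fuller sketch of that file-reading
approach. The function name read_file_into and its signature are
assumptions made for illustration, not library API. It opens the file
in binary mode, sizes the string with resize() so that every byte read
lands inside the string's length, and then reads straight into the
string's storage.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: read a whole file into `content`.
// Returns false if the file cannot be opened or fully read.
inline bool read_file_into(const char* path, std::string& content) {
    std::ifstream f(path, std::ios::in | std::ios::binary);
    if (!f) return false;

    // Find the file size by seeking to the end.
    f.seekg(0, std::ios::end);
    const std::streamoff file_size = f.tellg();
    if (file_size < 0) return false;
    f.seekg(0, std::ios::beg);

    // resize() changes size(); reserve() would only change capacity.
    content.resize(static_cast<std::size_t>(file_size));
    if (file_size == 0) return true;

    // Read directly into the string's buffer: one copy, from the file.
    f.read(&content[0], static_cast<std::streamsize>(file_size));
    return f.gcount() == static_cast<std::streamsize>(file_size);
}
```

With this, the handler fills response.content in one pass and no
ownership of user-allocated memory ever moves into the library.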

For data that's already in memory I get the utility of being able to
refer to the bytes directly. Unfortunately making the library
deallocate memory that the user allocated is, quite bluntly, bad form
and error prone.

Makes sense?

BTW I should really be asleep right now, but I just couldn't wait
until I'm awake to post my side of the story as to why things are as
they are for the synchronous server. For the asynchronous server, this
is largely a non-issue. ;)

-- 
Dean Michael Berris
about.me/deanberris