From: Dean M. B. <mik...@gm...> - 2010-10-21 04:58:55
On Wed, Oct 20, 2010 at 11:26 PM, Nelson, Erik - 2 <eri...@ba...> wrote:
>> Dean Michael Berris wrote on Wednesday, October 20, 2010 8:40 AM
>
>>> Erik Nelson wrote:
>>> std::strings are actually terrible buffers if you're manipulating
>>> binary objects, at least in our project.
>>
>> Hmmm... why? Binary buffers are just chunks of memory, and you can
>> always get the .data() of std::string objects.
>>
>
> Think about what you'd have to do to store and manipulate an array
> of doubles in a string, referencing it through the .data() member.
> vector<double> seems like a much more natural fit.
>

You can do that; you just have to reinterpret the pointer from a char *
to a double *. Alignment issues aside, it's possible to do.

Besides, you can't send a vector<double> through a socket as-is -- you
need a char *, so you're going to reinterpret in the other direction
anyway.

>>> If we have a big memory
>>> region that's full of POD and want to send it across the wire,
>>> making a std::string out of it means *yet another* copy. That's why
>>> the asio buffer interface is in no way std::string-centric.
>>
>> Sure, but Asio requires that everything you want to send *is* in
>> memory. Whether it's a string or just a range, as far as Asio is
>> concerned it has to be in memory to be sent. cpp-netlib is built to
>> make simple things easy, really. ;)
>
> It's fine to have it in memory... just not copies of it.
> Simple things *should* be easy. I'm not sure that means there needs
> to be multiple copies made of the payload.
>

The easiest thing that can possibly work is to make copies of the data,
both to help with threading (as mentioned before) and to delineate
where the application's data ends and where cpp-netlib's data starts.

>>
>> If you really need zero-copy messages, that would require that you
>> pass in pointers to the data, and make sure that they're "live" when
>> cpp-netlib starts requiring the data.
>
> That's how asio works, right? The buffers are pointers, and the app
> needs to make sure they have not been invalidated;
>

Asio needs a char * buffer, last I checked. Actually, it's generic on
the buffer types, but there is a ConstBuffer concept in Asio that
imposes certain requirements.

>>
>> You have two choices here:
>>
>> 1) Implement it yourself, use the concepts and make a "better" message
>> type, and send a pull request. :)
>>
>> 2) Wait until I (or someone else) gets around to fixing that part of
>> the implementation. :)
>>
>
> Understood, just wanted to throw the issue into the design discussion.
>

Option 1 is easy to do as things stand now; it's even possible to make
the request objects plain POD types -- which I also intend to do -- to
allow for use cases where they're just data holders.

>>>
>>> That sounds great! When you say that cpp-netlib will 'conserve' the
>>> ranges/buffers, what does that mean?
>>>
>
>> That means, only when data is going to be sent will data be
>> "linearized" to a fixed-size buffer that gets re-used every time a
>> write is going to be attempted. Asio has the async_read and
>> async_write functions that take non-mutable buffers, and at this time
>> ranges don't count as buffers.
>>
>> This is going to be helpful in the case when files are going to be
>> served, and the possibility of having mmap'ed files/buffers can be
>> expressed as a range of iterators. The data can even be non-linear, it
>> can be a range of joined iterators, it can be a range of function
>> input iterators, but the data will still be linearized to a fixed-size
>> buffer for Asio's consumption.
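
To make that concrete, here's roughly the kind of loop I have in mind --
just a sketch with illustrative names, not the actual cpp-netlib
implementation:

  #include <boost/array.hpp>
  #include <boost/asio.hpp>
  #include <cstddef>

  // Sketch: linearize an arbitrary (possibly non-contiguous) range into a
  // re-used fixed-size buffer, one chunk at a time, and hand each chunk
  // to Asio as a contiguous pointer + size.
  template <class Iterator>
  void linearize_and_write(boost::asio::ip::tcp::socket & socket,
                           Iterator begin, Iterator end)
  {
      boost::array<char, 4096> buffer;  // the only per-connection overhead
      while (begin != end) {
          std::size_t count = 0;
          // copy at most one buffer's worth of the range
          while (begin != end && count < buffer.size())
              buffer[count++] = *begin++;
          boost::asio::write(socket,
                             boost::asio::buffer(buffer.data(), count));
      }
  }

The range itself is never copied wholesale -- whether it's backed by an
mmap'ed file or generated on the fly, only one buffer's worth of it is
materialized at any given time.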
>>
>
> Does asio require linearized data?
>

Yep, it needs the data to be a contiguous chunk of memory.

>> Until Asio has a way of dealing with Boost.Ranges on its own, the
>> linearization will have to be done by cpp-netlib. In the worst case,
>> there will be a fixed overhead caused by one buffer per connection.
>>
>
> I think we might just disagree on this point, but, in my view, this
> linearization is a core problem since it will cause a complete
> memory copy of anything that is sent (unless I'm missing something).

Well, not if you just keep re-using a 4kb buffer over and over -- then
the overhead is a fixed 4kb of memory. It doesn't matter how large your
data range is; the data goes through the socket 4kb at a time, so you
only ever linearize 4kb of memory per write.

> That's fine if you're serving up 1 MB of something, but a problem
> if you're serving up 50 GB of something. This design will preclude
> a typical Windows 32-bit program from serving files larger than
> 1 or 2 GB, right?
>

Nope, it won't. As I point out above, writing just 4kb of data (the
default page size on Linux) at a time only linearizes 4kb of data for
every write. You can serve as large a file as you want.

Also, the server will not have this problem in 0.8 because I will allow
handlers to do the writing asynchronously -- the handlers can decide
how much data they want to serve as a chunk at a time, and they get a
chance to write it out on their own.

On the client side, having a message type that is a POD and supports
ranges allows more demanding applications (that serve huge amounts of
data) to supply ranges instead, which the client will linearize into
fixed-size buffers, one write at a time.

> Is Boost.Range on the roadmap for asio?
>

I don't think so, no. It's trivial to linearize ranges into a
Boost.Array anyway, and hand Boost.Asio the Boost.Array's pointer and
the size of the data in it to write out to the socket.

--
Dean Michael Berris
deanberris.com
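
P.S. On the vector<double> point earlier in the thread, this is the kind
of reinterpretation I meant -- again only a sketch, with the usual
alignment/endianness caveats:

  #include <boost/asio.hpp>
  #include <vector>

  // A vector<double>'s storage is contiguous, so its bytes can be handed
  // to Asio directly -- no intermediate std::string copy required.
  void send_doubles(boost::asio::ip::tcp::socket & socket,
                    std::vector<double> const & values)
  {
      if (values.empty()) return;
      boost::asio::write(
          socket,
          boost::asio::buffer(&values[0], values.size() * sizeof(double)));
  }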