From: Dean M. B. <mik...@gm...> - 2010-10-20 12:40:17
Hi Erik,

I apologize for not responding fast enough to this message. Google
insists that email from you is spam. :(

On Tue, Oct 19, 2010 at 11:55 PM, Nelson, Erik - 2 <eri...@ba...> wrote:
>
>>Dean Michael Berris wrote on Tuesday, October 19, 2010 11:26 AM
>>
>> Actually, Asio does. It requires that you give it a set of buffers --
>> and std::string objects qualify as good buffers -- to write out to the
>> sockets.
>
> The asio buffers can't be strings, right?

They can be.

> As I understand it, asio
> buffers are simply pointers into contiguous memory regions, and
> std::string is only one of many types that hold contiguous memory
> regions (and is in no way preferred).
> http://bit.ly/dubjW5
>

Yup, my point wasn't that they're required to be std::string. Heck, we
can use C strings, but they're much more cumbersome to deal with and
are just much uglier than std::string.

>
> std::strings are actually terrible buffers if you're manipulating
> binary objects, at least in our project.

Hmmm... why? Binary buffers are just chunks of memory, and you can
always get the .data() of std::string objects. You can use a
Boost.Array and you'd have a good enough statically sized buffer with
iterator semantics, but that's not what the message objects are meant
to model.

Also, it's possible to implement a basic_message<> specialization that
uses an internal linked list of Boost.Arrays but, well, nobody's asked
for that yet. Actually, it's even possible to make the body be just a
range, as I explained in the previous email.

> If we have a big memory
> region that's full of POD and want to send it across the wire,
> making a std::string out of it means *yet another* copy. That's why
> the asio buffer interface is in no way std::string-centric.
>

Sure, but Asio requires that everything you want to send *is* in
memory. Whether it's a string or just a range, as far as Asio is
concerned it has to be in memory to be sent.

cpp-netlib is built to make simple things easy really.
;)

>> Also, if you have an std::string implementation that implements
>> copy-on-write semantics, then you don't pay for a 50GB copy because a
>> copy is pretty much just a single pointer copy anyway. Most, if not
>> all implementations of std::string do the COW optimization precisely
>> because of the overhead of copying strings around even if you don't
>> modify them.
>>
>
> I'm unconvinced that COW strings can be depended on as a central
> library feature- a quick Google seems to indicate otherwise.
>
> http://bit.ly/hiqqU
> http://bit.ly/cqpKVN
>

cpp-netlib doesn't depend on COW, but most implementations do implement
std::string with COW optimizations. That is, GNU's libstdc++ has a COW
string, and Dinkumware (now Microsoft's STL) implements it as well.
Worrying about copies only matters if it's really affecting the
performance of the application.

If you really need zero-copy messages, that would require that you pass
in pointers to the data, and make sure that they're "live" when
cpp-netlib starts requiring the data. At this point, it is entirely
possible to optimize cpp-netlib to use messages that use a single
buffer, with just ranges of iterators defining which parts of the
buffer are the source, destination, etc., but managing these buffers
and the ranges is quite a challenge in itself.

> I'm not sure I fully buy into the 'the-optimizer-will-cover-it-up'
> argument, either. The optimizer covers up very little in debug
> mode and a program that can't be well exercised in debug mode is
> hard to... debug. The speed will be slower, but having the
> memory footprint explode just due to the networking library is
> a bitter pill to swallow.
>

There's no "the optimizer will cover it up" here. It's just that most
std::string implementations (that I know) use COW so there's nothing to
worry about there.
Now if you use a different string implementation, then maybe you will
need to avoid it (not sure if Apache's STL implements COW for strings)
or you'd implement your own message type that works well with
cpp-netlib. :)

You have two choices here:

1) Implement it yourself, use the concepts and make a "better" message
   type, and send a pull request. :)
2) Wait until I (or someone else) get around to fixing that part of
   the implementation. :)

>> At any rate, the next version will have a message type that supports
>> ranges for bodies. This allows for a means of making the body of a
>> request a Boost.Range compatible range -- so an input_iterator range
>> would work fine in that version. In that case data will be pulled from
>> input iterator ranges, and cpp-netlib can conserve the buffers as they
>> come.
>>
>
> That sounds great! When you say that cpp-netlib will 'conserve' the
> ranges/buffers, what does that mean?
>

That means data will be "linearized" only when it is going to be sent,
into a fixed-size buffer that gets re-used every time a write is going
to be attempted. Asio has the async_read and async_write functions that
take buffers (mutable ones for reads, non-mutable ones for writes), and
at this time ranges don't count as buffers.

This is going to be helpful in the case when files are going to be
served, and the possibility of having mmap'ed files/buffers can be
expressed as a range of iterators. The data can even be non-linear: it
can be a range of joined iterators, it can be a range of function input
iterators, but the data will still be linearized to a fixed-size buffer
for Asio's consumption.

Until Asio has a way of dealing with Boost.Ranges on its own, the
linearization will have to be done by cpp-netlib. In the worst case,
there will be a fixed overhead caused by one buffer for each
connection.

HTH

(Also, I hope Google doesn't treat your mail as Spam again)

--
Dean Michael Berris
deanberris.com
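
To make the "linearization" idea above a bit more concrete, here is a
minimal sketch under stated assumptions. It is not cpp-netlib's actual
implementation and does not use Asio: linearize_and_write and
send_through_fixed_buffer are hypothetical names, the buffer size of 8
is arbitrary, and a callback that appends to a std::string stands in
for the real socket write. It only shows a single-pass input-iterator
range being drained through one fixed-size buffer that is re-used for
every write.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper (not a cpp-netlib or Asio API): copies bytes from an
// input-iterator range into one fixed-size buffer and hands every full
// buffer, plus the final partial one, to `write`. The same storage is
// re-used for each write, so the overhead is one buffer per call.
template <std::size_t BufferSize, class InputIt, class WriteFn>
std::size_t linearize_and_write(InputIt first, InputIt last, WriteFn write) {
    static_assert(BufferSize > 0, "buffer must hold at least one byte");
    std::array<char, BufferSize> buffer;
    std::size_t filled = 0;
    std::size_t writes = 0;
    for (; first != last; ++first) {
        buffer[filled++] = *first;
        if (filled == BufferSize) {  // buffer is full: flush and re-use it
            write(buffer.data(), filled);
            ++writes;
            filled = 0;
        }
    }
    if (filled != 0) {  // flush whatever is left over
        write(buffer.data(), filled);
        ++writes;
    }
    return writes;
}

// Usage sketch: the source is a single-pass istreambuf_iterator range and
// the "wire" is a std::string that collects what would have been written.
inline std::string send_through_fixed_buffer(const std::string& payload,
                                             std::size_t* write_count = nullptr) {
    std::istringstream source(payload);
    std::string wire;
    const std::size_t writes = linearize_and_write<8>(
        std::istreambuf_iterator<char>(source),
        std::istreambuf_iterator<char>(),
        [&wire](const char* data, std::size_t size) {
            assert(size <= 8);  // a chunk never exceeds the buffer size
            wire.append(data, size);
        });
    if (write_count != nullptr) {
        *write_count = writes;
    }
    return wire;
}
```

The source never has to be contiguous or fully in memory; only the
fixed-size buffer does, which is the fixed per-connection overhead
mentioned above. In a real asynchronous setting the buffer would have
to outlive each pending write, which this synchronous sketch does not
address.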