From: Dean M. B. <mik...@gm...> - 2011-02-01 21:19:44
|
On Wed, Feb 2, 2011 at 3:24 AM, Nelson, Erik - 2 <eri...@ba...> wrote: >> Dean Michael Berris wrote on Tuesday, February 01, 2011 1:20 PM >> >> Well, the problem with sending binary data as 7-bit clear over the >> network has been documented extensively over the Internet. The spec >> clearly states that you have to be transferring data safe to transfer >> over 7-bit transfers encoding -- meaning that's technically ASCII >> text. It is an accident that images are being sent in the clear as >> binary data, and if you notice in history this is largely why people >> (browser developers and server developers) had to agree that they >> would just take whatever was sent over the wire in the body and just >> have the MIME identifiers there. >> > > Either way you're sending binary in these examples... the only issue was whether that binary should be copied into a std::string > Why is copying into a string a problem in the first place? >> >> See, the fact that I'm even using std::string is already something I >> detest (read the thread about [string]proposal on the Boost ML ;) ) -- > > I've been reading the (extensive) [string] proposal on boost. :) > Ah, so you know the reason why I want a better string is so that I can use it in cpp-netlib, no? :D >> but at the moment it is the most sane and simple thing to do lacking a >> proper efficient segmented data storage mechanism around (no >> std::deque has its own issues, and ptr_list<array<T,N> > is too >> "esoteric"). Forcing people to deal with std::vector<char> is just >> unnecessary when std::string is much more familiar however ugly. > > I'm not proposing vector<char>- the foundation of my point is the library *should not* make the container choice for the user. There is no choice that works well for everyone. > Right, and it has never been a choice made by the library. It just so happened that the default is std::string for the string. Anybody can extend the tag dispatch mechanisms and create their own tags that define a different string<Tag>::type to yield something other than std::string. Actually if I remember correctly, there is a body<Tag>::type metafunction that I was using to determine what the type of the body should be. Basically these extension points are still undocumented at this point and I'm thinking about how to make configuring the types easier maybe with macro's instead of forcing people who want to extend the library to learn all the traits and metafunctions that depend on the tag dispatch. ;) >> Erik wrote: >>> >>> auto_ptr<MyObject> obj(new MyObject); >>> response << body(obj); >>> > >> Uh oh, this is dangerous because the user can use the obj right after >> the data is passed to the response. > > Nope- the auto_ptr will be cleared when the ownership transfers to the response, right? > Well, after the `response << body(obj)` line, obj is still accessible to the user. It's not guaranteed that the user will do this line at the end of the handler body. Also, it's not safe to take an arbitrary void* as well and write that out to the network. That's asking for trouble not only with the protocol but with C++ in general. Also, auto_ptr is going away for good reason -- it's an awfully non-value-semantics playing type. >> Erik wrote: >>> should be sufficient for POD types, and for non-POD types, maybe >>> you'd need something like >>> >>> auto_ptr<vector<char> > obj(new vector<char>()); >>> response << body(&(*obj)[0], obj->size(), obj); >>> >> >> And this is just ugly. ;) > > A little... think of it from a user standpoint- how would you send an arbitrary object in the response body? You can't, you need to linearize it to a container that is suitable for network transfer. That's the whole point of the interface defining the types -- I'm not about to go back to the days of C where I'll just treat a thunk of memory as data that I can safely use in the library. See, the point of the interface is to make things *easy*. There's absolutely no point in allowing a pointer to an arbitrary type in memory to be serialized out to a socket. There's 0 utility in that. Remember also that we're talking about the *synchronous* implementation. This means there's defined semantics for what a response type is. If you need to be sending arbitrarily large data in memory, *don't use the synchronous interface*. Look at what the fileserver example in 0.9-devel does -- which is actually portable to 0.8.1 IIRC. > Can you write anything cleaner or more intuitive than > > response << body(void*, length) ? The way you're describing makes you write > > response.content.resize(length); > memcpy((void*)response.content.data(), &object, length); > > That looks less ugly? More intuitive? > Well, there are a few reasons why your suggestion is ugly: 1) You're passing in a naked pointer. That's C'ish of you to suggest. ;) 2) Also, it's largely bad form to try to take a void* when you can get the type of the object directly. What I aim to do as a library designer responsible for the interface and the implementation is to enforce the safety of the interface and the operations which I use. 3) There's absolutely 0 reason for you not to use the string when the semantics of the handler is to craft a string response. If you want to send data, you need to send a string in. Even if it is binary data. There is swap idiom you can use if you don't want copies. It's been with the STL since the beginning: std::string my_long_serialized_string = "..."; swap(response.content, my_long_serialized_string); // look ma, no copy > >> Erik wrote: >>>> The response can delete it whenever it's done with doing whatever it >>> does. >>> >> >> That's bad design. :D >> > > That's what it happens in the current implementation... Nope. > the only difference is that it is deleting a copy (that's in the body) instead of the original. Again, nope. The response object lives as long as the connection is alive. Nothing gets deleted until all shared references to the connection go out of scope. > The issue here is that the lifetime of the response object is longer than the lifetime of the handler function. The solution is going to be to have the response destroy the body memory, either in std::string form (like now) or in original-object form. Either way the response is taking ownership of the memory that it is sending down the wire. > So how do you intend to support request pipelining on an implementation as you propose? As it is at the moment there is no guarantee as to *when* an object will be deleted. This is a non-deterministic code path. The last thing I want to do is introduce memory management code in the network implementation when that stuff should be handled automatically. There's no *easy* way to achieve what you want to achieve with the synchronous server implementation. There's a reason why I had to write an asynchronous server implementation precisely for serving huge amounts of data, supporting a streaming use case, etc. >> >> Unfortunately, that's not much better than saying: >> >> std::ifstream f(...); >> response.content.reserve(file_size); >> f.read(response.content.data(), file_size); >> >> Am I missing something? > > I think this example illustrates my point about the trickiness of the interface. There are two reasons this won't work. > > 1. response.content.data() returns const char*, right? You can't write into that. You're right. D'oh. > 2. After this is run, response.content.size() will always be zero, right? And nothing will be sent? > Yeah. *facepalm* > You probably wrote these bugs because you're up too late. :) Still, they show the brittleness of using a string like this. > Or probably because I wasn't thinking straight generally. :P The better way is here: std::ifstream input(...); response.content.reserve(file_size); std::copy(istreambuf_iterator<char>(input.rdbuf()), istreambuf_iterator<char>(), std::back_inserter(response.content)); /me <3 the STL sometimes. ;) >> >> For data that's already in memory I get the utility of being able to >> refer to the bytes directly. Unfortunately making the library >> deallocate memory that the user allocated is, quite bluntly, bad form >> and error prone. >> >> Makes sense? >> > > I'd say that passing ownership of an object to a library is neither bad form nor error prone. Sutter has a good writeup of using auto_ptr effectively in 'Exceptional C++' that seems applicable. Transferring object ownership with auto_ptr is safe, efficient, and idiomatic. > Eh? Are you seriously suggesting to me that passing a raw pointer up to a calling function and expecting that calling function to manage the memory for you is good form? Remember, the response object is created before it's even called -- the implementation *should not be responsible for managing the memory of the extension (in which case this is the handler)*. That is what I refer to as bad form along with the ugliness that is auto_ptr. > Anyway, you're putting the sweat equity into cpp-netlib, and are closer to the project. My only original observation was 'The same question will come up again and again because it's a surprising interface'. It clearly can work as it is, but it is surprising. That's why he didn't figure it out by himself. > And the answer will be the same, turn your data into a string and stuff that into the response.content (at least for 0.8/0.8.1). With 0.9-devel there should be a better way, but nothing like what's already being suggested. I'm considering adding an optional completion callback to the response type, to allow for "chaining writes" even in the synchronous interface. I'm leaning towards the one I described earlier with a variant solution: variant< string_type, tuple< boost::asio::const_buffer, function< void(boost::asio::const_buffer, boost::system::error_code) > > > content; This means you can either assign a string to it, or assign a tuple of a boost::asio::const_buffer and a completion handler that takes care of the data in the buffer, and knows what to do with the error code passed in. It's a lot more complicated I know, but that's the only way I can think of to allow the synchronous server to deal with the case where you would like to chain writes instead of sending a big thunk of data (also a good way to enforce chunked encoding on the synchronous interface). This is a lot more scalable too, now I just have to think about whether there's a good way to do this without breaking too much code that's using the 0.8-line idiom. HTH -- Dean Michael Berris about.me/deanberris |