From: Dean M. B. <mik...@gm...> - 2011-02-01 18:20:22
On Wed, Feb 2, 2011 at 1:04 AM, Nelson, Erik - 2 <eri...@ba...> wrote:
>> From: Dean Michael Berris wrote on Tuesday, February 01, 2011 9:00 AM
>>> This question will continue to come up. It's true that std::string
>>> *can* hold binary data. It's just not *natural*. The std::string
>>> interface is strongly influenced by null-terminated strings. Things
>>> that the interface implies (like string.c_str() or the string(char*)
>>> constructor actually doing what you'd think) are broken if you use
>>> std::string with binary data.
>>
>> I know. ;)
>>
>> Except a lot of times though, the HTTP protocol largely deals with
>> *text transfer*. Also it's not as simple as it sounds. See below.
>>
>
> The fact that it's a common use case (I'd suggest that actually much more binary image data is transmitted than text) means that the library should easily accommodate text transfer. I think this goes back to an underlying assumption that you expressed some time ago: iirc, you believed that HTTP could *only* transmit text, and that binary data needed to be base64'd into text before transmission. That assumption may still be informing your design choices here, but the very suggestion of copying binary data into a string is (in my opinion) a red flag that there's something wrong with the interface.
>

Well, the problem with sending binary data as 7-bit clear over the network has been documented extensively on the Internet. The spec clearly states that you have to be transferring data that is safe to send over a 7-bit transfer encoding, meaning that it's technically ASCII text. It is an accident that images are being sent in the clear as binary data, and if you look at the history, this is largely why people (browser developers and server developers) had to agree that they would just take whatever was sent over the wire in the body and rely on the MIME identifiers there.
The design of the library actually has nothing to do with whether I think text is the only thing that should be transmitted over HTTP. If you notice, the string type is also parametric on the tag type used; that makes it *easy* to just use std::string. I could very well implement a chained-block data structure for the underlying message storage, manipulate those blocks directly, and expose ranges through the accessors/wrappers (like ACE does), but that's too much work at the moment. Patches implementing this would be most welcome. ;)

At any rate, the reason it's technically better to send things over HTTP using Base64 encoding is really just so that you're OK as far as the spec goes. This avoids all the endianness issues you might encounter on the other end (although Boost.Asio should be dealing with that issue for us). It's also largely a matter of convenience: it's perfectly *fine* to put binary data in a std::string or a std::vector.

> Having an overload that makes std::string usage natural (like it is now) is a good thing. Forcing someone to copy memory regions into a std::string is a bad thing.
>

The copy is forced for simplicity of implementation. Again, if you want a no-copy or single-copy interface, use the asynchronous server implementation *today*. ;)

>>
>> Yet another way is to put the burden of managing the memory of the
>> data to be written out by the server, by providing an optional
>> callback function when providing the content. So it would then be a
>> variant between a string and a tuple<void*, size_t,
>> function<void(void*)> >. All of these options make the synchronous
>> handler's implementation needlessly complex.
>
> My original point was only about the inelegance of using std::string. The point you're raising here is a different one, and all you're saying is that the lifetime of the payload needs to be roughly the same as the lifetime of the result object.
> That's not a terribly complicated problem, and there are lots of solutions to it. It seems that one that's easy for the user is to just hand off ownership to the response, something like
>

See, the fact that I'm even using std::string is already something I detest (read the thread about the [string] proposal on the Boost ML ;) ), but at the moment it is the sanest and simplest thing to do, lacking a proper, efficient segmented data storage mechanism (no, std::deque has its own issues, and ptr_list<array<T,N> > is too "esoteric"). Forcing people to deal with std::vector<char> is just unnecessary when std::string is much more familiar, however ugly. Of course nothing's stopping you or anyone else from creating a different tag type that defines string<Tag>::type as std::vector<char>, or anything else for that matter. ;)

The idea is really to make the hard thing simple to do. Imagine having to implement your own HTTP server, and you might decide that making one extra copy of the data is *not* that big of a deal. Of course I'd like to improve the library; it's just that if more people submitted pull requests that actually addressed this, we might have a better library to use sooner rather than later. :P

>
> auto_ptr<MyObject> obj(new MyObject);
> response << body(obj);
>

Uh oh, this is dangerous because the user can still use obj right after the data is passed to the response.

> should be sufficient for POD types, and for non-POD types, maybe you'd need something like
>
> auto_ptr<vector<char> > obj(new vector<char>());
> response << body(&(*obj)[0], obj->size(), obj);
>

And this is just ugly. ;)

> The response can delete it whenever it's done with doing whatever it does.
>

That's bad design. :D

> That seems to me to be pretty easy and intuitive for the user.
>

Unfortunately, that's not much better than saying:

    std::ifstream f(...);
    response.content.resize(file_size);       // resize, not reserve: the bytes must exist
    f.read(&response.content[0], file_size);  // writable buffer (data() is const before C++17)

Am I missing something?
For data that's already in memory I get the utility of being able to refer to the bytes directly. Unfortunately, making the library deallocate memory that the user allocated is, quite bluntly, bad form and error-prone. Makes sense?

BTW I should really be asleep right now, but I just couldn't wait until I'm awake to post my side of the story as to why things are as they are for the synchronous server. For the asynchronous server, this is largely a non-issue. ;)

-- 
Dean Michael Berris
about.me/deanberris