On Fri, Sep 4, 2009 at 5:28 AM, Oren Ben-Kiki <oren@ben-kiki.org> wrote:
As for what is returned by the library, I agree that the only "safe"
choice is a byte array. However, the library _may_ choose to return
something else (such as a word array or even an "almost valid string")
depending on the platform, as long as it properly preserves the data.
 
It is also entirely possible for a library to expose options to the application about how to handle the new data type. Options that a library might expose include:

1) Always return it as a byte array (I think this should be the default and required behavior)

2) Throw an exception/raise an error if the data can't be converted to a valid string (so if an application only wants to deal with strings, it can)

3) Convert if possible, otherwise return a byte array

For both 2 and 3, a library can expose different options about how to convert.
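To make the three options concrete, here is a rough sketch in Python. The function name, the mode names, and the UTF-8 default are all made up for illustration; this is not the API of any existing YAML library, just one way a library could surface the choice.

```python
from typing import Union


def present_scalar(raw: bytes, mode: str = "bytes",
                   encoding: str = "utf-8") -> Union[bytes, str]:
    """Hand a raw scalar to the application according to `mode`.

    Hypothetical modes, mirroring options 1-3 above:
      "bytes"   - always return the byte array untouched (option 1)
      "strict"  - return a string, or raise if the bytes are invalid (option 2)
      "convert" - return a string if possible, else the byte array (option 3)
    """
    if mode == "bytes":
        return raw
    if mode == "strict":
        # bytes.decode raises UnicodeDecodeError on invalid input
        return raw.decode(encoding)
    if mode == "convert":
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            return raw
    raise ValueError(f"unknown mode: {mode!r}")
```

An application that only wants strings would ask for "strict" and catch the error; one that just passes data through would keep the default.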

Is this something that the YAML spec needs to cover, though? It seems to me that this is a contract issue between the YAML processor (which is most likely in a library) and the application.

If the default/required behavior of the library is instead to return a language-specific string when it can (option 3), then, in my opinion, it probably should not, by default, decide how invalid byte sequences get encoded into that string. In other words, the default behavior would be to convert the bytes into a language-specific string only if the byte sequence is valid for that string type. A library should then expose options to change this behavior (with a tag handler being the most flexible choice).

The reason I say this is that the method chosen for conversion is going to be highly application-specific. It will depend entirely on that application's needs (most notably, whether or not it will retransmit the data). That is why I say it is largely an issue to be settled between the processor (most likely in a library) and the rest of the application.
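
As a rough illustration of why the choice belongs to the application, here is a Python sketch in which the library converts only valid byte sequences by default and lets the application plug in its own handler for the rest. The names to_native, on_invalid, lossless and lossy are hypothetical, and UTF-8 is assumed as the string encoding; a real tag handler would hook in differently.

```python
from typing import Callable, Optional, Union

# Hypothetical hook type: the application decides what invalid bytes become.
InvalidHandler = Callable[[bytes], Union[bytes, str]]


def to_native(raw: bytes,
              on_invalid: Optional[InvalidHandler] = None) -> Union[bytes, str]:
    """Convert to a string only when the bytes are valid UTF-8.

    With no handler, invalid data comes back unchanged as bytes, so the
    library itself never picks an escaping scheme.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        if on_invalid is None:
            return raw
        return on_invalid(raw)


def lossless(raw: bytes) -> str:
    # Suits an application that will retransmit the data: the original
    # bytes can be recovered by encoding with the same error handler.
    return raw.decode("utf-8", errors="surrogateescape")


def lossy(raw: bytes) -> str:
    # Suits display-only use: invalid bytes become U+FFFD and are lost.
    return raw.decode("utf-8", errors="replace")
```

With lossless, to_native(data, lossless).encode("utf-8", errors="surrogateescape") gives back the original bytes; with lossy it cannot, which is exactly the kind of trade-off only the application can judge.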