From: Clark C . E. <cc...@cl...> - 2001-07-16 09:57:27
On Mon, Jul 16, 2001 at 10:57:29AM +0200, Oren Ben-Kiki wrote:
| Clark C . Evans [mailto:cc...@cl...] wrote:
| > I think I'd like to build in the BASE64 encoding
| > into the core as well, using the [BASE64] mechanism
| > per the proposal before coloring. Given the ability
| > to strongly support this at the API level, I'm
| > in favor.
|
| Hmmm. You've mentioned this before, and I'm not 100% certain we do actually
| have a way to resolve all the issues. Obviously there's no problem with Java
| and C. But are we certain that there's no problem with Perl, Python and
| JavaScript?

The trick requires that the users of the binary value know that their
value is binary (even though it may appear as Unicode text). But if the
user knows this, and sets/gets a given value using the binary interface,
then there isn't a round-tripping problem.

The binary interface need not always use Base64 encoding; it would only
use Base64 encoding if there were illegal UTF-8 sequences. Why? Because
our problem occurred when a binary value was mistaken for Unicode, saved
as UTF-16, and then transcoded to UTF-8, destroying the data despite what
the user had expected. This problem can be fixed by simply dictating that
any Unicode string can be treated as a binary value by using its UTF-8
encoding. Thus, by fixing the encoding for binary transfer at the API
level, things are set. Now the YAML file can be UTF-16LE, UTF-16BE,
UTF-32*, UTF-8, or any other darn encoding. No problems!

This works in the YAR case: if the file is marked UTF-16, UTF-32, or
UTF-8 via a BOM, then it is loaded entirely as Unicode. Otherwise it is
treated as UTF-8. If any illegal codes appear, then the file would have
to be Base64 encoded.

| Another issue: I think that being "binary" is just an example of a class
| (together with int, real, date, etc.). Instead of providing an ad-hoc
| solution for it, I'd rather solve the class issue in general and use
| binary/int/real/date as test cases that the solution actually works.
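
The binary-interface rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the proposed YAML API: the function names `binary_to_scalar`, `scalar_to_binary` and `decode_document`, and the literal `"[BASE64] "` prefix used to flag Base64 content, are all hypothetical. It shows both halves of the argument: bytes that are legal UTF-8 travel as ordinary text (Base64 is used only otherwise), and a document is decoded by its BOM with a UTF-8 fallback, so the file's own encoding never changes the bytes a binary reader gets back.

```python
import base64
import codecs

# Hypothetical marker flagging Base64 content. The "[BASE64]" spelling echoes
# the mechanism mentioned in the thread, but this exact syntax is assumed.
BASE64_MARKER = "[BASE64] "

# Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def binary_to_scalar(data: bytes) -> str:
    """Write side of a hypothetical binary interface.

    Legal UTF-8 bytes are emitted as plain Unicode text; Base64 is used only
    for illegal UTF-8, or when the text would be mistaken for the marker.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    if text is None or text.startswith(BASE64_MARKER):
        return BASE64_MARKER + base64.b64encode(data).decode("ascii")
    return text


def scalar_to_binary(scalar: str) -> bytes:
    """Read side: any Unicode string maps to binary via its UTF-8 encoding."""
    if scalar.startswith(BASE64_MARKER):
        return base64.b64decode(scalar[len(BASE64_MARKER):])
    return scalar.encode("utf-8")


def decode_document(raw: bytes) -> str:
    """Decode a whole file: a BOM selects UTF-8/16/32, otherwise assume UTF-8.

    Raises UnicodeDecodeError when the bytes are illegal in that encoding,
    i.e. the case where the content would have to be Base64 encoded.
    """
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    return raw.decode("utf-8")
```

Because `scalar_to_binary` always goes through UTF-8, a value stored in a UTF-16 file and later transcoded still yields the same bytes, which is exactly the failure case described above.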
It's harder to argue on this level. It certainly complicates the
sequential API (a bit). However, without this complication, one would
have to pass around Base64-encoded strings rather than the actual binary
value, and I'm not sure that is acceptable. You may want to look at the
C API for details. Your feedback would be helpful.

| I see you came around to the idea of using '!' as a shorthand and apply it
| only for maps and scalars. We've discussed this idea already. I agree with
| your point that this works pretty well for the incremental API. But there
| are still issues with the native API.

Well, given that class seems to be an exception, I'm contemplating not
using color for it. After all, the native APIs won't use color for the
class construct, right?

| The problem scenario ("Schema Evolution") is of an application using the
| native API to read a file where what used to be a scalar value has been
| replaced with an instance of a class the application isn't aware of.
|
| It turns out that when using the native API the application code would
| break. To make it robust one would have to use the v/w operations, something
| most people won't normally do.
|
| Are we OK with this?

Nothing we can do about this; it's a limitation of using the native API.

| Another issue with the ! shorthand as it stands today. There is a
| distinction between the in-memory class of the object and the format used to
| serialize it. A good example is the "date" class.
[Option 1: force a single format]
[Option 2: use derived classes]
[Option 3: add a format color]
| 4. Generalize the shorthand mechanism. Instead of making '!'
| a unique shorthand character, we could allow for a slightly
| more general mechanism:
|
|     key: [shorthand] value
|
| For example:
|
|     date1: [!date %iso] 2003-02-01
|
| is equivalent to:
|
|     date1: %
|       !: date
|       %: iso
|       =: 2003-02-01

Interesting.
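
To make option 4 concrete, here is a minimal Python sketch of how a loader might expand the bracketed shorthand into the map form given above. The function name `expand_shorthand` and the rule that each bracketed token is a one-character indicator followed by its value are assumptions for illustration; the proposal itself only shows the single `[!date %iso]` example.

```python
from typing import Dict, Union


def expand_shorthand(value: str) -> Union[str, Dict[str, str]]:
    """Expand 'key: [shorthand] value' into the equivalent map (hypothetical).

    Assumed rule: each whitespace-separated token inside the brackets is a
    one-character indicator ('!' = class, '%' = format, ...) followed by its
    value; the text after ']' becomes the '=' (value) entry.
    """
    if not value.startswith("["):
        return value  # plain scalar, no shorthand present
    close = value.find("]")
    if close == -1:
        raise ValueError(f"unterminated shorthand: {value!r}")
    expanded: Dict[str, str] = {}
    for token in value[1:close].split():
        indicator, name = token[0], token[1:]
        if not name:
            raise ValueError(f"indicator {indicator!r} has no value")
        # '=' is reserved for the scalar itself; repeats are ambiguous.
        if indicator == "=" or indicator in expanded:
            raise ValueError(f"reserved or duplicate indicator {indicator!r}")
        expanded[indicator] = name
    expanded["="] = value[close + 1:].strip()
    return expanded
```

For `"[!date %iso] 2003-02-01"` this yields the class, format and value entries of the expanded map, while a value with no leading bracket passes through unchanged, which is why the shorthand costs little for applications that never use it.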
I need to think about this a bit more, but I think now it might be
better to add class to the information model and not have it as a
color. *sigh*

| It seems to me that (1) and its variants (2)/(3) are probably enough for
| Brian's Data::Denter applications. I'm less convinced it is good enough for
| YAML as a whole. If (1) isn't enough, then (4) seems like a harmless
| syntactical trick which would solve the problem in the more general case, at
| little or no cost for Data::Denter type applications.

Hmm. Yep, [1] or [4].

Clark