Re: [Yaml-core] Re: YAML Implementations a Plenty


On Mon, Jul 16, 2001 at 12:53:19PM +0200, Oren Ben-Kiki wrote:
| > The trick requires that the users of the binary
| > value know that their value is binary (even though
| > it may appear as unicode text).  But if the user knows 
| > this, and sets/gets a given value using the binary 
| > interface, then there isn't a round-tripping problem. 
| 
| So what you seem to be saying is that it isn't the
| information model which says whether a value is binary
| or text, it is the application which decides. But in
| this case, there's no need to provide a special
| syntax/model/API for binary values. A trivial wrapper
| around the incremental read/write methods, applied at
| the discretion of the application, would do the trick
| just as well.

Not quite. I'm saying that you can consider all scalars
binary by fixing UTF8 as the encoding for Unicode scalars.
Thus, given a scalar, you could have two methods:

   asBytes()  -- The scalar's binary value (UTF8 if string)
   asString() -- The scalar's native unicode value

asBytes() always works and round-trips without a problem,
independent of the output encoding. asString(), on the
other hand, only returns a string if the scalar is a valid
Unicode value (i.e. it contains only valid code points).
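
A minimal Python sketch of the two accessors, assuming a
hypothetical `Scalar` class that stores every value as raw
bytes and treats them as UTF8 when text is requested. The
names `as_bytes`/`as_string` stand in for asBytes()/asString()
and are illustrative only, not an existing YAML API; returning
None for non-text scalars is one possible design choice.

```python
from typing import Optional


class Scalar:
    """Hypothetical scalar whose canonical form is a byte string.

    Text scalars are stored as their UTF-8 encoding, so every scalar
    has a binary value regardless of the output encoding.
    """

    def __init__(self, raw: bytes) -> None:
        self._raw = raw

    @classmethod
    def from_string(cls, text: str) -> "Scalar":
        # UTF-8 is fixed as the byte form of a Unicode scalar.
        return cls(text.encode("utf-8"))

    def as_bytes(self) -> bytes:
        # Always succeeds and round-trips exactly.
        return self._raw

    def as_string(self) -> Optional[str]:
        # Only defined when the bytes are valid UTF-8, i.e. they decode
        # to valid code points; otherwise the scalar is binary-only.
        try:
            return self._raw.decode("utf-8")
        except UnicodeDecodeError:
            return None
```

For example, `Scalar.from_string("héllo").as_bytes()` gives
`b"h\xc3\xa9llo"`, while `Scalar(b"\xff\x00").as_string()`
gives None even though its as_bytes() still works.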

As for the flag in the information model: with random
access, one could always scan the input to see whether
it contains any invalid Unicode code points, so the flag
is derived. For sequential access, however, it is
helpful for this flag to be cached.
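
With sequential access the flag can be computed in the
same single pass that reads the scalar, which is why
caching it pays off. A Python sketch, assuming a
hypothetical reader that receives a scalar as a stream of
byte chunks; `read_scalar` is illustrative, not part of
any existing parser.

```python
import codecs
from typing import Iterable, List, Tuple


def read_scalar(chunks: Iterable[bytes]) -> Tuple[bytes, bool]:
    """Consume a scalar chunk by chunk and return (raw_bytes, is_text).

    is_text is the derived flag: True only if the whole byte sequence
    is valid UTF-8. It is computed while the data is read, so a
    sequential consumer never has to rescan the input.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()  # strict errors
    parts: List[bytes] = []
    is_text = True
    for chunk in chunks:
        parts.append(chunk)
        if is_text:
            try:
                # Multi-byte sequences split across chunks are buffered.
                decoder.decode(chunk)
            except UnicodeDecodeError:
                is_text = False  # once invalid, the flag stays False
    if is_text:
        try:
            # Reject input that ends in the middle of a UTF-8 sequence.
            decoder.decode(b"", final=True)
        except UnicodeDecodeError:
            is_text = False
    return b"".join(parts), is_text
```

A character split across chunks, as in
`read_scalar([b"h\xc3", b"\xa9llo"])`, still yields
is_text == True, while a stray `\xff` byte clears the flag.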

| Again, this is the same problem as being forced to 
| pass strings instead of integer values. Which is probably 
| a much more common problem than binary values.

Good point, but this is what class is for; perhaps
the "C" API needs one more growth spurt to understand
class natively. A registered class can have a
reader/writer pair to and from the byte stream,
providing one more function:

   asNative()
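
One way to picture such a registered class, sketched in
Python rather than C for brevity: each class name maps to a
reader/writer pair over the byte stream, and as_native()
dispatches through it. The registry, the "int" class, and
the function names are hypothetical illustrations of the
idea, not an existing YAML binding.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry: class name -> (reader, writer), where the
# reader maps bytes to a native value and the writer does the reverse.
Codec = Tuple[Callable[[bytes], Any], Callable[[Any], bytes]]
REGISTRY: Dict[str, Codec] = {}


def register(name: str,
             reader: Callable[[bytes], Any],
             writer: Callable[[Any], bytes]) -> None:
    REGISTRY[name] = (reader, writer)


def as_native(cls_name: str, raw: bytes) -> Any:
    """Convert a scalar's bytes to a native value via its class."""
    reader, _ = REGISTRY[cls_name]
    return reader(raw)


def from_native(cls_name: str, value: Any) -> bytes:
    """Inverse of as_native: serialize a native value back to bytes."""
    _, writer = REGISTRY[cls_name]
    return writer(value)


# Example class: integers carried as their decimal UTF-8 text.
register("int",
         lambda b: int(b.decode("utf-8")),
         lambda n: str(n).encode("utf-8"))
```

With this in place, `as_native("int", b"42")` returns the
integer 42 rather than the string "42", which is the
integer case raised in the quoted text.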

This raises the question of whether Unicode string
is a class from the information model perspective,
with UTF8, UTF16LE, UTF16BE, etc. being just
transfer encodings like BASE64.
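
Under that view the logical value is the Unicode string,
and each encoding is merely a reversible mapping between
that value and bytes on the wire. A small Python sketch of
the idea; the table names are hypothetical, and layering
BASE64 over UTF8 is an assumption made only to show that it
plays the same role.

```python
import base64
from typing import Callable, Dict

# Each entry is a transfer encoding: logical string -> wire bytes.
TRANSFER_ENCODERS: Dict[str, Callable[[str], bytes]] = {
    "utf8": lambda s: s.encode("utf-8"),
    "utf16le": lambda s: s.encode("utf-16-le"),
    "utf16be": lambda s: s.encode("utf-16-be"),
    # BASE64 applied on top of UTF-8 bytes (an illustrative choice).
    "base64": lambda s: base64.b64encode(s.encode("utf-8")),
}

# The matching inverse for each transfer encoding.
TRANSFER_DECODERS: Dict[str, Callable[[bytes], str]] = {
    "utf8": lambda b: b.decode("utf-8"),
    "utf16le": lambda b: b.decode("utf-16-le"),
    "utf16be": lambda b: b.decode("utf-16-be"),
    "base64": lambda b: base64.b64decode(b).decode("utf-8"),
}
```

The wire bytes differ per encoding (for instance "A" is
`b"A\x00"` in UTF16LE and `b"\x00A"` in UTF16BE), yet
decoding always recovers the same string.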

| Seems good enough for me. Of course,
| 
|   value: [!gif %base64] <value>
| 
| Seems even better :-)

Hmm.  So the % color is the transfer encoding.
I think this example shows why "base64" isn't
a class: it is a transfer encoding. Why not
just build in one binary transfer encoding,
base64, and be done with it?

   value: !image/gif
       [R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAO
        fn515eXvPz7Y6OjuDg4J+fn5OTk6enp56enmlp
        aWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f
        /++f/++f/++f/++f/++f/++f/++f/++SH+Dk1h
        ZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLCAgjo
        EwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYN
        G84BwwEeECcgggoBADs=]
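
Because base64 is only the transfer encoding here, the
!image/gif class sees plain bytes once that layer is
stripped. A short Python check using only the standard
base64 and struct modules; it assumes the bracketed text is
ordinary base64 whose line folding a decoder may ignore.

```python
import base64
import struct

# Payload from the YAML example above, with the brackets dropped.
GIF_B64 = """
R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAO
fn515eXvPz7Y6OjuDg4J+fn5OTk6enp56enmlp
aWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f
/++f/++f/++f/++f/++f/++f/++f/++SH+Dk1h
ZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLCAgjo
EwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYN
G84BwwEeECcgggoBADs=
"""

# With validate=False (the default), b64decode discards characters
# outside the base64 alphabet, so the folding newlines are ignored.
data = base64.b64decode(GIF_B64)

# GIF header: 6-byte signature, then little-endian 16-bit width/height.
signature = data[:6]
width, height = struct.unpack("<HH", data[6:10])
```

The decoded bytes start with the GIF89a signature and
describe a 12 by 12 image, consistent with the value being
tagged as a GIF.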

Best,

Clark