From: Clark C . E. <cc...@cl...> - 2001-07-16 16:35:48
On Mon, Jul 16, 2001 at 12:53:19PM +0200, Oren Ben-Kiki wrote:
| > The trick requires that the users of the binary
| > value know that their value is binary (even though
| > it may appear as unicode text). But if the user knows
| > this, and sets/gets a given value using the binary
| > interface, then there isn't a round-tripping problem.
|
| So what you seem to be saying is that it isn't the
| information model which says whether a value is binary
| or text, it is the application which decides. But in
| this case, there's no need to provide a special
| syntax/model/API for binary values. A trivial wrapper
| around the incremental read/write methods, applied at
| the discretion of the application, would do the trick
| just as well.

Not quite. I'm saying that you can consider all scalars
binary by fixing on an encoding of UTF8 for unicode
scalars. Thus, given a scalar you could have two methods:

  asBytes()  -- The scalar's binary value (UTF8 if string)
  asString() -- The scalar's native unicode value

asBytes() always works and round-trips without a problem,
independent of the output encoding. asString(), on the other
hand, will only return a string if the value is a valid unicode
object (i.e. it contains only valid code points).

As for the flag in the information model: with random access
one could always scan the input to see if it contains any
invalid unicode code points. So it is a derived flag;
however, for sequential access it is helpful for this
flag to be cached.

| Again, this is the same problem as being forced to
| pass strings instead of integer values. Which is probably
| a much more common problem then binary values.

Good point, but this is what class is for; perhaps the
"C" API needs to have one more growth spurt to natively
understand class. (A registered class can have a
reader/writer pair from the byte stream to provide one
more function: asNative().)

This raises the question of whether Unicode string is a class
from the information model perspective, with UTF8, UTF16LE,
UTF16BE, etc., being just transfer encodings like BASE64.

| Seems good enough for me. Of course,
|
|   value: [!gif %base64] <value>
|
| Seems even better :-)

Hmm. So the % color is the transfer encoding. I think
you've shown with this example why "base64" isn't a
class... as it is a transfer encoding. Why not just
build in one binary transfer encoding -- base64 --
and be done with it?

  value: !image/gif
    [R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAO fn515eXvPz7Y6OjuDg 4J+fn5OTk6enp56enmlp
     aWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f /++f/++f/++f/++f/+ +f/++f/++f/++SH+Dk1h
     ZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLCAgjo EwnuNAFOhpEMTRiggc z4BNJHrv/zCFcLiwMWYN
     G84BwwEeECcgggoBADs=]

Best,

Clark
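
To make the asBytes()/asString() idea above concrete, here is a minimal
sketch in Python. It is not part of any existing API: the Scalar class
and all of its method names are hypothetical, chosen only to mirror the
two accessors and the cached "is it text?" flag described in the message.
It assumes every scalar is stored as bytes, with UTF8 as the fixed
encoding for unicode values, and treats base64 purely as a transfer
encoding that is undone on input.

```python
import base64


class Scalar:
    """Hypothetical scalar that always stores its value as raw bytes."""

    def __init__(self, data: bytes):
        self._data = data
        self._is_text = None  # derived flag, computed once and cached

    @classmethod
    def from_string(cls, text: str) -> "Scalar":
        # Unicode scalars are fixed to UTF8 internally.
        return cls(text.encode("utf-8"))

    @classmethod
    def from_base64(cls, encoded: str) -> "Scalar":
        # base64 is only a transfer encoding: drop whitespace, then decode.
        return cls(base64.b64decode("".join(encoded.split())))

    def as_bytes(self) -> bytes:
        # Always works, whatever the scalar holds.
        return self._data

    def is_text(self) -> bool:
        # Scan the bytes once; later calls reuse the cached answer.
        if self._is_text is None:
            try:
                self._data.decode("utf-8")
                self._is_text = True
            except UnicodeDecodeError:
                self._is_text = False
        return self._is_text

    def as_string(self) -> str:
        # Only succeeds when the bytes are valid UTF8.
        if not self.is_text():
            raise ValueError("scalar is not valid unicode text")
        return self._data.decode("utf-8")


text = Scalar.from_string("héllo")
image = Scalar.from_base64("R0lG ODlh")  # first bytes of a GIF header
print(text.as_string(), text.as_bytes())
print(image.as_bytes(), image.is_text())
```

In this sketch the text/binary distinction is never stored with the
value; it is derived by scanning the bytes, and caching the result is
what a sequential reader would want, since it cannot go back and rescan.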