|
From: Donal K. F. <don...@ma...> - 2008-05-03 13:25:48
|
TIP #317: EXTEND BINARY ENSEMBLE WITH BINARY ENCODINGS
========================================================
Version: $Revision: 1.1 $
Author: Pat Thoyts <patthoyts_at_users.sourceforge.net>
State: Draft
Type: Project
Tcl-Version: 8.6
Vote: Pending
Created: Saturday, 03 May 2008
URL: http://www.tcl.tk/cgi-bin/tct/tip/317.html
Post-History:
-------------------------------------------------------------------------
ABSTRACT
==========
This TIP extends the *binary* command with implementations in C of
commonly used binary encodings. In particular the /base64/ encoding is
implemented but the Tcl ensemble scheme [TIP #112] can be used to
provide simple extension of the implemented formats.
SPECIFICATION
===============
The *binary* command ensemble will be extended to include two new
subcommands, *encode* and *decode*. Each subcommand will accept two
arguments. The first is the name of an encoding format and the second
is the data to be operated upon.
*binary encode* /format data/
*binary decode* /format data/
In keeping with the nature of the *binary* command, the /data/ argument
is treated as a byte array. This means that users should ensure their
data is already in a suitable character encoding before applying a
binary encoding. This is already a requirement for other
implementations of this functionality (e.g. the tcllib and Trf
packages).
The initial set of binary encodings consists of *base64*, *uuencode*
and *hex*.
The implementation of the *encode* and *decode* subcommands will make
use of the Tcl ensemble command mechanism ([TIP #112]) and will
therefore be extensible via the ensemble mechanism.
REFERENCE IMPLEMENTATION
==========================
A patch against the Tcl HEAD (8.6) is located at
<URL:http://sf.net/tracker/?func=detail&aid=1956530&group_id=10894&atid=310894>
COPYRIGHT
===========
This document has been placed in the public domain.
-------------------------------------------------------------------------
TIP AutoGenerator - written by Donal K. Fellows
|
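A minimal sketch of how the two proposed subcommands might be used, assuming a Tcl 8.6 or later interpreter in which *binary encode* and *binary decode* are available. The sample text and variable names are illustrative only. It also shows the point made in the specification that the data should be put into a character encoding before a binary encoding is applied.

```tcl
# Sketch only: assumes Tcl 8.6+, where [binary encode]/[binary decode] exist.
set text "na\u00efve caf\u00e9"

# [binary encode] treats its argument as bytes, so choose a character
# encoding explicitly first.
set bytes [encoding convertto utf-8 $text]

set b64 [binary encode base64 $bytes]
set hex [binary encode hex $bytes]

# Decoding gives bytes back; convert them to characters explicitly.
set fromB64 [encoding convertfrom utf-8 [binary decode base64 $b64]]
set fromHex [encoding convertfrom utf-8 [binary decode hex $hex]]

puts "base64: $b64"
puts "hex:    $hex"
```

Skipping the [encoding convertto] step would leave the result dependent on how non-ASCII characters happen to be mapped to bytes, which is why the TIP asks callers to do the conversion themselves.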
|
From: Lars H. <Lar...@re...> - 2008-05-03 20:24:34
|
Donal K. Fellows wrote:
> TIP #317: EXTEND BINARY ENSEMBLE WITH BINARY ENCODINGS
[snip]
> This TIP extends the *binary* command with implementations in C of
> commonly used binary encodings. In particular the /base64/ encoding is
> implemented
The application domain for this feature is quite similar to that of
"filters" in PDF [1, Sec. 3.3], so it might be interesting to compare
the two.
PDF has two filters ASCIIHexDecode and ASCII85Decode similar to those
of this TIP; the first presumably is the same as the suggested hex
encoding, whereas the second is more comparable to base64/uuencode
(though using a set of 85 characters to encode 4 bytes using 5
characters, rather than 3 bytes using 4 characters, so completely
incompatible).
PDF also has a large number of filters related to compression:
LZWDecode, FlateDecode, RunLengthDecode, CCITTFaxDecode, ...
Finally, there is a Crypt filter (which could just as well have been a
family of filters) which does decryption of data.
One point is thus that there are other uses for [binary encode] than
just ASCIIfying binary data, but that's fine, since
> the Tcl ensemble scheme [TIP #112] can be used to
> provide simple extension of the implemented formats.
A second point is however that the dictated syntax:
> SPECIFICATION
> ===============
>
> The *binary* command ensemble will be extended to include two new
> subcommands, *encode* and *decode*. Each subcommand will accept two
> arguments. The first is the name of an encoding format and the second
> is the data to be operated upon.
>
> *binary encode* /format data/
>
> *binary decode* /format data/
may be too simplistic. Many of the compression decoders have additional
parameters for the process (although these mainly concern the use of
predictors to improve compression of sampled image data). An encryption
decoder most definitely would need a parameter for the key. In general,
I think explicitly allowing
binary encode $format {*}$extraParameters $data
would be better (parameters before data simplifies creating aliases for
conversions with certain parameter values). All block compression
commands of TIP #234 (zlib compression) take an optional parameter.
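The remark about aliases can be illustrated with a short sketch. It assumes the option syntax that Tcl 8.6 later shipped for base64 (a -maxlen option placed before the data), which is one possible reading of the {*}$extraParameters idea rather than anything the TIP text itself specifies. The alias name b64wrap60 is hypothetical.

```tcl
# Sketch only: assumes Tcl 8.6+, where base64 accepts -maxlen before the data.
# Because the parameters precede the data, an alias can fix them in advance.
interp alias {} b64wrap60 {} binary encode base64 -maxlen 60

set data [binary format @100]        ;# 100 zero bytes as sample input
set wrapped [b64wrap60 $data]        ;# same as: binary encode base64 -maxlen 60 $data

puts $wrapped
```

Had the data come first, the alias would have needed a wrapper procedure to reorder its arguments.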
A third point is that PDF filters can be composed; they can be
specified as a list (or in PDF terminology, an array) which will be
applied in sequence. The equivalent for [binary encode] would be that
instead of
binary encode $format $data
one would have the syntax
binary encode $formatList $data
However, I think this would be an unnecessary complication. We have
other means of composing operations. ;-)
Finally, there is of course the matter of whether these features like
base64 encoding really need to be in the core, but I abstain from
taking an opinion on that.
Lars Hellström
[1] PDF Reference, Fourth Edition, version 1.5;
see http://www.adobe.com/devnet/pdf/pdf_reference.html.
(This is not the newest version, but it's fairly nice,
and it's the last version available that doesn't require
an object stream capable reader.)
|
|
From: Kevin K. <ke...@ac...> - 2008-05-04 00:44:48
|
Lars Hellström wrote:
> Finally, there is of course the matter of whether these features like
> base64 encoding really need to be in the core, but I abstain from
> taking an opinion on that.
I would presume that the thing would be implemented as an extensible
ensemble. No need to have everything in the Core (although base64
probably needs to be there, it's used in too many other places that
the Core depends on). We don't need to have all the bells and whistles
as long as we've provided a framework for extensions to implement them.
--
73 de ke9tv/2, Kevin
|
|
From: Donal K. F. <don...@ma...> - 2008-05-04 07:31:49
|
Kevin Kenny wrote:
> I would presume that the thing would be implemented as an extensible
> ensemble. No need to have everything in the Core (although base64
> probably needs to be there, it's used in too many other places that
> the Core depends on). We don't need to have all the bells and whistles
> as long as we've provided a framework for extensions to implement them.
Agreed. The way I read the TIP is that the formats are actually
sub-sub-commands. Plugging in new ones will just be a matter of creating
commands in the right namespace. (Probably.)
Donal.
|
|
From: Pat T. <pat...@us...> - 2008-05-05 09:35:26
|
Donal K. Fellows wrote:
| Kevin Kenny wrote:
|> I would presume that the thing would be implemented as an extensible
|> ensemble. No need to have everything in the Core (although base64
|> probably needs to be there, it's used in too many other places that
|> the Core depends on). We don't need to have all the bells and whistles
|> as long as we've provided a framework for extensions to implement them.
|
| Agreed. The way I read the TIP is that the formats are actually
| sub-sub-commands. Plugging in new ones will just be a matter of creating
| commands in the right namespace. (Probably.)
Exactly. [binary encode] is a map ensemble with commands in
tcl::binary::encode. It's just like [info] and all the other core ensembles.
Pat Thoyts
|
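As a rough sketch of what extending each kind of ensemble involves, the following builds two small ensembles in a hypothetical ::demo namespace. It deliberately does not touch the real tcl::binary::encode, whose exact configuration is not spelled out in this thread. All command names here are illustrative.

```tcl
# Sketch only: ::demo::exported and ::demo::mapped are hypothetical ensembles.

# 1. Ensemble built from exported commands: adding and exporting a new
#    procedure in the namespace is enough to extend it.
namespace eval ::demo::exported {
    proc upper {data} { return [string toupper $data] }
    namespace export upper
    namespace ensemble create
}
proc ::demo::exported::lower {data} { return [string tolower $data] }
namespace eval ::demo::exported { namespace export lower }

# 2. Ensemble with an explicit -map: a new procedure is invisible until
#    the map itself is rewritten.
namespace eval ::demo::mapped {
    proc upper {data} { return [string toupper $data] }
    namespace ensemble create -map {upper ::demo::mapped::upper}
}
proc ::demo::mapped::lower {data} { return [string tolower $data] }
set map [namespace ensemble configure ::demo::mapped -map]
dict set map lower ::demo::mapped::lower
namespace ensemble configure ::demo::mapped -map $map

puts [::demo::exported lower ABC]
puts [::demo::mapped lower ABC]
```

The second form needs a read-modify-write of the map, so a package that extends it has to know it is dealing with a map ensemble in the first place.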
|
From: Lars H. <Lar...@re...> - 2008-05-05 12:59:52
|
Pat Thoyts wrote:
> Donal K. Fellows wrote:
> | Agreed. The way I read the TIP is that the formats are actually
> | sub-sub-commands. Plugging in new ones will just be a matter of creating
> | commands in the right namespace. (Probably.)
>
> Exactly. [binary encode] is a map ensemble with commands in
> tcl::binary::encode.
The kind of ensemble is a detail which it would be valuable to have in
the documentation. Extending a -map ensemble (as Pat hints at) is very
different from extending an ensemble built from exported namespace
commands (as Donal hints at). Coding a package that needs to extend an
ensemble with fallback code to handle every kind of ensemble (there are
at least three) is not fun.
Pat Thoyts wrote:
> | However, I think this would be an unnecessary complication. We have
> | other means of composing operations. ;-)
>
> This would be better handled using channels I think.
Strange that so many (I also got a mail off-list) should immediately
jump to channels when confronted with the idea of composing
[binary encode] formats. For the record, then: The composition method I
hinted at was of course command substitution, as in:
binary decode flate [binary decode base64 $data]
Lars Hellström
|
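Composition by command substitution can be sketched with commands that exist in Tcl 8.6. The "flate" format in the message above is hypothetical, so this sketch substitutes the [zlib deflate] and [zlib inflate] commands from TIP #234. The sample payload is illustrative.

```tcl
# Sketch only: assumes Tcl 8.6+ with the zlib command from TIP #234.
# "flate" is not an actual [binary decode] format; zlib stands in for it.
set payload [encoding convertto utf-8 [string repeat "hello world " 50]]

# Encode: compress first, then make the result ASCII-safe.
set packed [binary encode base64 [zlib deflate $payload]]

# Decode: undo the steps in reverse order by nesting the commands.
set unpacked [zlib inflate [binary decode base64 $packed]]

puts "[string length $payload] bytes -> [string length $packed] characters"
```

No list-of-formats syntax is needed, since each stage is an ordinary command whose result feeds the next.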
|
From: Pat T. <pat...@us...> - 2008-05-05 09:51:29
|
Lars Hellström wrote:
[snip]
|> The *binary* command ensemble will be extended to include two new
|> subcommands, *encode* and *decode*. Each subcommand will accept two
|> arguments. The first is the name of an encoding format and the second
|> is the data to be operated upon.
|>
|> *binary encode* /format data/
|>
|> *binary decode* /format data/
|
| may be too simplistic. Many of the compression decoders have additional
| parameters for the process (although these mainly concern the use of
| predictors to improve compression of sampled image data). An encryption
| decoder most definitely would need a parameter for the key. In general,
| I think explicitly allowing
|
| binary encode $format {*}$extraParameters $data
|
| would be better (parameters before data simplifies creating aliases for
| conversions with certain parameter values). All block compression
| commands of TIP #234 (zlib compression) take an optional parameter.
I agree. I've just modified the tcllib base64 code to make use of this.
To support the -maxlen and -wrapchar parameters I have to post-process
the base64 encoded data using [string range] and [append], which makes
encoding about six times slower:
% source b64.tcl
% set s [binary format @102400]; string length $s
102400
% time {base64::encode -maxlen 0 $s} 10
1265.9 microseconds per iteration
% time {base64::encode -maxlen 60 $s} 10
7546.0 microseconds per iteration
And as pointed out, some possible applications may require additional
options.
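The post-processing described above might look roughly like the following. This is a sketch, not the actual tcllib code: the procedure name wrapEncoded and its argument order are hypothetical, and it assumes Tcl 8.6 or later for [binary encode base64].

```tcl
# Sketch only: wrapEncoded is a hypothetical helper, not tcllib's code.
# It splits already-encoded text into lines of at most maxlen characters,
# joined by wrapchar, using [string range] and [append].
proc wrapEncoded {encoded maxlen {wrapchar "\n"}} {
    if {$maxlen <= 0} {
        return $encoded                  ;# 0 means "do not wrap"
    }
    set out ""
    set len [string length $encoded]
    for {set i 0} {$i < $len} {incr i $maxlen} {
        if {$i > 0} {
            append out $wrapchar
        }
        append out [string range $encoded $i [expr {$i + $maxlen - 1}]]
    }
    return $out
}

set s [binary format @102400]            ;# same test input as above
set wrapped [wrapEncoded [binary encode base64 $s] 60]
puts [time {wrapEncoded [binary encode base64 $s] 60} 10]
```

The loop runs once per output line in script, which is the overhead that passing the wrapping parameters down to the C implementation would avoid.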
| A third point is that PDF filters can be composed; they can be specified
| as a list (or in PDF terminology, an array) which will be applied in
| sequence. The equivalent for [binary encode] would be that instead of
|
| binary encode $format $data
|
| one would have the syntax
|
| binary encode $formatList $data
|
| However, I think this would be an unnecessary complication. We have
| other means of composing operations. ;-)
This would be better handled using channels I think. The plan here is to
provide a fast encoding primitive that can be used with [chan create] to
create an encoding channel layer. Just providing a channel would be
limiting, providing a command for use in script-based channels seems
more general. I think this requires TIP 230 but I don't think that's
likely to be rejected.
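A sketch of the kind of encoding layer meant here, under the assumption that the channel transformations of TIP 230 are available as [chan push], as they are in Tcl 8.6. It uses hex rather than base64 because hex needs no buffering of partial groups between writes; a base64 layer would have to hold back up to two bytes until it had a full group of three. The ::hexlayer namespace is hypothetical, and the layer is write-only to keep the sketch short.

```tcl
# Sketch only: assumes Tcl 8.6+ ([chan push], [binary encode], [file tempfile]).
# ::hexlayer is a hypothetical write-only transformation.
namespace eval ::hexlayer {
    # Called once by [chan push]; returns the subcommands this layer supports.
    proc initialize {handle mode} { return {initialize finalize write} }
    # Called when the layer is removed or the channel is closed.
    proc finalize {handle} {}
    # Called with each chunk of bytes written; returns the bytes to pass down.
    proc write {handle bytes} { return [binary encode hex $bytes] }
    namespace export initialize finalize write
    namespace ensemble create
}

# Obtain a temporary file name, then reopen it write-only in binary mode.
close [file tempfile path]
set out [open $path wb]
chan push $out ::hexlayer
puts -nonewline $out "AB"
close $out

# Read back what actually reached the file.
set in [open $path rb]
set written [read $in]
close $in
file delete $path
puts $written
```

The fast primitive does the real work inside the write handler, so the same command serves both direct calls on strings and script-level channel layers.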
Pat Thoyts
|
|
From: Donal K. F. <don...@ma...> - 2008-05-05 10:09:37
|
Pat Thoyts wrote:
> This would be better handled using channels I think. The plan here is to
> provide a fast encoding primitive that can be used with [chan create] to
> create an encoding channel layer. Just providing a channel would be
> limiting, providing a command for use in script-based channels seems
> more general. I think this requires TIP 230 but I don't think that's
> likely to be rejected.
I agree with this. There are a number of cases that we know about where
base64 data needs to be encoded to or decoded from strings. For example,
it's fairly common to have base64-encoded data in XML documents, and
embedding of encoded images for things like the plugin is another case;
there was reasonable consensus recently that the magic that Tk's GIF
loader does is nasty.
Donal.
|