This specification uses the Augmented Backus-Naur Form (ABNF) with the following extension: %xZZ is a short-hand notation for %x00-FF, which represents any value between %x00 and %xFF.
Transenc consists of a sequence of tokens, and a token consists of one or more octets. Each token starts with an octet containing a type, and some tokens may be followed by additional octets. There are four kinds of types, which are described below.
Value types consist of a single octet that contains both the type and the value. Value types occupy three type ranges.
The range %x00 to %x7F holds small positive integers that encode the values 0 to 127. For example, the value 1 is encoded as %x01.
The range %xE0 to %xFF holds small negative integers that encode the values -32 to -1. For example, the value -1 is encoded as %xFF.
The range %x80 to %x8F is reserved for special values. For example, the boolean value true is encoded as %x81.
Unknown value tokens can be skipped by advancing over the type octet.
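The three value ranges above can be sketched as a small classifier. This is only an illustration: the function name `decode_value_token` and the `(kind, value)` return shape are assumptions made for this example, not part of the specification.

```python
def decode_value_token(octet: int):
    """Classify a single-octet value token and return (kind, value)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an octet")
    if octet <= 0x7F:
        # %x00-7F: small positive integers 0..127
        return ("int", octet)
    if octet >= 0xE0:
        # %xE0-FF: small negative integers -32..-1 (two's complement octet)
        return ("int", octet - 0x100)
    if octet <= 0x8F:
        # %x80-8F: reserved for special values, e.g. %x81 is true
        return ("special", octet)
    raise ValueError("not a value token")
```

Because a value token is always exactly one octet, a reader that does not recognise a special value can still advance by one octet and continue.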
Fixed-length types require a fixed number of octets to carry the value. The type contains information about the primitive data type of the value (e.g. integer or floating-point) and the number of octets in the value. The value must be located immediately after the type.
The primitive data type is encoded into the type by the following bit pattern.
Bit pattern | Primitive type |
---|---|
ZZZZ Z000 | Signed integer |
ZZZZ Z001 | Character |
ZZZZ Z010 | Float |
ZZZZ Z011 | Byte |
ZZZZ Z100 | Reserved for future use |
ZZZZ Z101 | Reserved for future use |
ZZZZ Z110 | Reserved for future use |
ZZZZ Z111 | Reserved for future use |
The size type is encoded into the type by the following bit patterns.
Bit pattern | Size type | Size in octets |
---|---|---|
1010 0ZZZ | 8-bits | 1 |
1011 0ZZZ | 16-bits | 2 |
1100 0ZZZ | 32-bits | 4 |
1101 0ZZZ | 64-bits | 8 |
For example, the 16-bits signed integer value 4660 (0x1234) is encoded as %xB0.34.12, with the value octets in little-endian order.
Unknown fixed-length tokens can be skipped by advancing over the type octet plus the following 1, 2, 4, or 8 octets depending on the size type.
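As a rough sketch of how the two bit-pattern tables combine, the following splits a fixed-length type octet into its primitive type and size, and uses the size to skip the token. The names `describe_fixed` and `skip_fixed` are hypothetical helpers introduced for this example.

```python
PRIMITIVES = {
    0b000: "signed integer",
    0b001: "character",
    0b010: "float",
    0b011: "byte",
}
SIZES = {0xA0: 1, 0xB0: 2, 0xC0: 4, 0xD0: 8}  # upper nibble -> value size in octets

def describe_fixed(type_octet: int):
    """Return (primitive type, value size in octets) for a fixed-length type octet."""
    if type_octet & 0x08:
        raise ValueError("bit 3 is set: this is a variable-length type")
    size = SIZES.get(type_octet & 0xF0)
    if size is None:
        raise ValueError("not a fixed-length type")
    # The three lowest bits select the primitive type; unlisted codes are reserved.
    return PRIMITIVES.get(type_octet & 0x07, "reserved"), size

def skip_fixed(buf: bytes, pos: int) -> int:
    """Return the position just past the fixed-length token starting at pos."""
    _, size = describe_fixed(buf[pos])
    end = pos + 1 + size
    if end > len(buf):
        raise ValueError("truncated token")
    return end
```

Skipping needs only the size bits, so a reader can step over a token with a reserved primitive type without understanding its value.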
Variable-length types require a variable number of octets to carry the value. Variable-length tokens are encoded as a type-length-value sequence. The type contains information about the primitive data type of the value (e.g. string). The length is an integer that contains the number of octets in the value.
The primitive data type is encoded in the same way as described for fixed-length types. The length type is encoded by the following bit patterns.
Bit pattern | Length type | Length in octets |
---|---|---|
1010 1ZZZ | 8-bits | 1 |
1011 1ZZZ | 16-bits | 2 |
1100 1ZZZ | 32-bits | 4 |
1101 1ZZZ | 64-bits | 8 |
For example, the string "AB" is encoded as %xA9.02.41.42.
The length is encoded as an unsigned integer. Because some programming languages do not support unsigned integers and therefore have difficulty representing an unsigned 64-bit number, the range of 64-bits lengths is limited to values less than 2^63. This means that the length can be implemented using a signed 64-bits integer. Lengths equal to or greater than 2^63 must be reported as errors.
    length = length8 / length16 / length32 / length64
    length8 = 1%x00-FF
    length16 = 2%x00-FF
    length32 = 4%x00-FF
    length64 = %x00-7F.7%x00-FF
Unknown variable-length tokens can be skipped by advancing over the type octet, reading the length, advancing over the length octets, and finally advancing over the number of value octets given by the length.
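The type-length-value layout and the 2^63 limit could be handled along the following lines. This is a sketch under assumptions: the helper names `read_length` and `skip_variable` are invented for the example, and the length octets are assumed to be little endian by analogy with integer values, which the text above does not state explicitly.

```python
LENGTH_WIDTHS = {0xA8: 1, 0xB8: 2, 0xC8: 4, 0xD8: 8}  # type bits -> octets in length

def read_length(buf: bytes, pos: int):
    """Return (length, position of the first value octet) for the token at pos."""
    width = LENGTH_WIDTHS.get(buf[pos] & 0xF8)
    if width is None:
        raise ValueError("not a variable-length type")
    start = pos + 1
    field = buf[start:start + width]
    if len(field) != width:
        raise ValueError("truncated length")
    # ASSUMPTION: length octets are little endian, like integer values.
    length = int.from_bytes(field, "little")
    if length >= 2**63:
        raise ValueError("length must be less than 2^63")
    return length, start + width

def skip_variable(buf: bytes, pos: int) -> int:
    """Return the position just past the variable-length token starting at pos."""
    length, value_start = read_length(buf, pos)
    end = value_start + length
    if end > len(buf):
        raise ValueError("truncated value")
    return end
```

For the string "AB" encoded as %xA9.02.41.42, `read_length` yields a length of 2 and a value that starts at offset 2, so the token ends at offset 4.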
Group types consist of a single octet that signifies either an opening or a closing token. They occupy the range %x90 to %x9F.
Groups are used to encode composite data types. A group consists of an opening token and a corresponding closing token. An opening token must be followed by a closing token later in the stream. Groups can be nested inside other groups, so the opening and closing tokens must be balanced. Unbalanced groups must be reported as errors.
The group token uses the least significant bit, bit 0, to distinguish between opening and closing tokens, and bits 1-3 to identify the specific group.
Bit pattern | Group brackets |
---|---|
1001 ZZZ0 | Open group ZZZ |
1001 ZZZ1 | Close group ZZZ |
Unknown group tokens can be skipped by advancing over the opening type octet, then skipping the subsequent tokens (according to their own skipping rules) until the matching closing type is encountered, and finally advancing over the closing type octet.
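The four skipping rules can be combined into one recursive routine, sketched below. The function name `skip_token` is an assumption for this example, as is the little-endian order of the length octets; types in the %xA0-DF range are skipped purely from their size bits.

```python
def skip_token(buf: bytes, pos: int = 0) -> int:
    """Return the position just past the token, or whole group, starting at pos."""
    if pos >= len(buf):
        raise ValueError("unexpected end of input")
    t = buf[pos]
    if t <= 0x8F or t >= 0xE0:
        return pos + 1  # value token: a single octet
    if t <= 0x9F:
        if t & 0x01:
            raise ValueError("closing token without matching opening token")
        closing = t | 0x01  # same group bits, bit 0 set
        pos += 1
        while True:
            if pos >= len(buf):
                raise ValueError("unbalanced group")
            if buf[pos] == closing:
                return pos + 1
            pos = skip_token(buf, pos)  # nested tokens follow their own rules
    # %xA0-DF: the upper nibble A, B, C, D selects 1, 2, 4 or 8 octets.
    width = 1 << ((t >> 4) - 0x0A)
    start = pos + 1
    if t & 0x08:
        # Variable-length: the width applies to the length field.
        field = buf[start:start + width]
        if len(field) != width:
            raise ValueError("truncated length")
        length = int.from_bytes(field, "little")  # ASSUMPTION: little endian
        if length >= 2**63:
            raise ValueError("length must be less than 2^63")
        end = start + width + length
    else:
        # Fixed-length: the width applies to the value itself.
        end = start + width
    if end > len(buf):
        raise ValueError("truncated token")
    return end
```

A closing token that does not match the innermost open group is reported as an error, which is one way to satisfy the requirement that unbalanced groups must be rejected.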
The token structure described above is used to encode the actual data types.
Null indicates the absence of a value.
    null = %x82
Booleans can be either true or false.
    boolean = true / false
    true = %x81
    false = %x80
Integers are two's complement signed numbers. Multi-octet values are stored in little-endian order.
    int = small-pos-int / small-neg-int / int8 / int16 / int32 / int64
    small-pos-int = %x00-7F
    small-neg-int = %xE0-FF
    int8 = %xA0 %xZZ
    int16 = %xB0 %xZZ.ZZ
    int32 = %xC0 %xZZ.ZZ.ZZ.ZZ
    int64 = %xD0 %xZZ.ZZ.ZZ.ZZ.ZZ.ZZ.ZZ.ZZ
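An encoder for the integer rules might look like the sketch below. The function name `encode_int` is hypothetical, and always choosing the shortest token that fits is a design choice made for this example; the grammar itself does not require it.

```python
def encode_int(value: int) -> bytes:
    """Encode an integer using the shortest token that can hold it."""
    if -32 <= value <= 127:
        # Small integers: the type octet is the two's complement value itself.
        return bytes([value & 0xFF])
    for type_octet, size in ((0xA0, 1), (0xB0, 2), (0xC0, 4), (0xD0, 8)):
        limit = 1 << (8 * size - 1)
        if -limit <= value < limit:
            # Two's complement, little endian, directly after the type octet.
            payload = value.to_bytes(size, "little", signed=True)
            return bytes([type_octet]) + payload
    raise OverflowError("value does not fit in a 64-bit signed integer")
```

With this policy the values -32 to 127 always take a single octet, and larger magnitudes grow to 2, 3, 5 or 9 octets including the type.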
Floats are IEEE 754 encoded numbers with single or double precision. The value octets are stored in little-endian order.
    float = float32 / float64
    float32 = %xC2 %xZZ.ZZ.ZZ.ZZ
    float64 = %xD2 %xZZ.ZZ.ZZ.ZZ.ZZ.ZZ.ZZ.ZZ
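The float rules map directly onto Python's `struct` module, whose `<f` and `<d` formats produce little-endian IEEE 754 single and double precision values. The function names below are assumptions made for this illustration.

```python
import struct

def encode_float32(value: float) -> bytes:
    """Encode as float32: type octet %xC2 followed by 4 little-endian octets."""
    return b"\xC2" + struct.pack("<f", value)

def encode_float64(value: float) -> bytes:
    """Encode as float64: type octet %xD2 followed by 8 little-endian octets."""
    return b"\xD2" + struct.pack("<d", value)

def decode_float(buf: bytes, pos: int = 0) -> float:
    """Decode the float32 or float64 token starting at pos."""
    if buf[pos] == 0xC2:
        return struct.unpack_from("<f", buf, pos + 1)[0]
    if buf[pos] == 0xD2:
        return struct.unpack_from("<d", buf, pos + 1)[0]
    raise ValueError("not a float token")
```

Note that encoding as float32 rounds the value to single precision, so a round trip is exact only for values that single precision can represent.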
Binary data (a.k.a. byte-buffer or octet-buffer) is a variable-length type that may contain a BLOB.
    binary = binary8 / binary16 / binary32 / binary64
    binary8 = %xAB length8 data
    binary16 = %xBB length16 data
    binary32 = %xCB length32 data
    binary64 = %xDB length64 data
String is a variable-length type. The length is encoded in the same way as for binary data. The value must be UTF-8 encoded. An invalid UTF-8 encoding must be reported as an error.
    string = string8 / string16 / string32 / string64
    string8 = %xA9 length8 data
    string16 = %xB9 length16 data
    string32 = %xC9 length32 data
    string64 = %xD9 length64 data
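Binary data and strings share the same type-length-value layout and differ only in the primitive type bits. A minimal encoder sketch follows; the helper names are hypothetical, choosing the shortest length type is a design choice of this example, and the little-endian order of the length octets is an assumption not stated in the text.

```python
def _encode_variable(primitive_bits: int, data: bytes) -> bytes:
    """Prefix data with a type octet and the shortest length field that fits."""
    n = len(data)
    for length_bits, width in ((0xA8, 1), (0xB8, 2), (0xC8, 4), (0xD8, 8)):
        # 64-bits lengths are limited to values less than 2^63.
        limit = 1 << 63 if width == 8 else 1 << (8 * width)
        if n < limit:
            type_octet = length_bits | primitive_bits
            # ASSUMPTION: length octets are little endian.
            return bytes([type_octet]) + n.to_bytes(width, "little") + data
    raise OverflowError("length must be less than 2^63")

def encode_string(text: str) -> bytes:
    """Encode text as a string token (primitive type 001, UTF-8 value)."""
    return _encode_variable(0b001, text.encode("utf-8"))

def encode_binary(data: bytes) -> bytes:
    """Encode an octet buffer as a binary token (primitive type 011)."""
    return _encode_variable(0b011, bytes(data))
```

The length counts octets, not characters, so a string containing non-ASCII characters has a length larger than its character count.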
All containers are potentially nested, so opening and closing group types must be balanced.
The following rules are used to describe the specific container types.
    container = record / array / map
    element = container / null / boolean / int / float / binary / string
    count = int / null
A record (also known as tuple or struct) is a sequence of pre-determined elements. The definition of a record is assumed to be known to both the sender and the receiver, so there is no reason to embed type or size information into the encoding.
    record = open0 *element close0
    open0 = %x90
    close0 = %x91
An array is a sequence of elements.
    array = open1 count *element close1
    open1 = %x92
    close1 = %x93
For a heterogeneous array, the encoding uses an optional count to indicate how many elements are in the array.
For streaming arrays, the array encoding is used, but with count set to null.
An associative array is a collection of key-value pairs. A pair is a record with two elements.
    map = open6 count *pair close6
    pair = open0 key value close0
    key = element
    value = element
    open6 = %x9C
    close6 = %x9D
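The container rules can be illustrated with a small recursive encoder. This is a sketch under assumptions: the names `encode` and `encode_stream` are invented here, Python tuples, lists and dicts stand in for records, arrays and associative arrays, and only a subset of element types (null, booleans, integers and short strings) is handled to keep the example brief.

```python
def encode(value) -> bytes:
    """Encode a Python value as a Transenc element (subset of types only)."""
    if value is None:
        return b"\x82"
    if value is True:
        return b"\x81"
    if value is False:
        return b"\x80"
    if isinstance(value, int):
        if -32 <= value <= 127:
            return bytes([value & 0xFF])
        if -2**31 <= value < 2**31:
            return b"\xC0" + value.to_bytes(4, "little", signed=True)
        return b"\xD0" + value.to_bytes(8, "little", signed=True)
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) > 0xFF:
            raise ValueError("this sketch only handles string8")
        return b"\xA9" + bytes([len(data)]) + data
    if isinstance(value, tuple):
        # record = open0 *element close0
        return b"\x90" + b"".join(encode(v) for v in value) + b"\x91"
    if isinstance(value, list):
        # array = open1 count *element close1
        body = b"".join(encode(v) for v in value)
        return b"\x92" + encode(len(value)) + body + b"\x93"
    if isinstance(value, dict):
        # map = open6 count *pair close6, where each pair is a two-element record
        pairs = b"".join(encode((k, v)) for k, v in value.items())
        return b"\x9C" + encode(len(value)) + pairs + b"\x9D"
    raise TypeError("unsupported type: " + type(value).__name__)

def encode_stream(items) -> bytes:
    """Encode an iterable as a streaming array, with the count set to null."""
    return b"\x92\x82" + b"".join(encode(v) for v in items) + b"\x93"
```

Under these assumptions, `encode({"A": 1})` produces %x9C.01.90.A9.01.41.01.91.9D: the map opening, a count of 1, one record holding the key "A" and the value 1, and the map closing.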
This table contains a summary of all the currently defined tokens. All tokens not mentioned below are reserved for future use. Ellipsis (...) indicates a continuous range of tokens between the row above and the row below.
Hex | Binary | Type | Size | Description |
---|---|---|---|---|
%x00 | 0000 0000 | int8 | 1 | Value 0 |
... | | | | |
%x7F | 0111 1111 | int8 | 1 | Value 127 |
%x80 | 1000 0000 | bool | 1 | False |
%x81 | 1000 0001 | bool | 1 | True |
%x82 | 1000 0010 | null | 1 | Null |
%x90 | 1001 0000 | open0 | 1 | Open record |
%x91 | 1001 0001 | close0 | 1 | Close record |
%x92 | 1001 0010 | open1 | 1 | Open array |
%x93 | 1001 0011 | close1 | 1 | Close array |
%x9C | 1001 1100 | open6 | 1 | Open associative array |
%x9D | 1001 1101 | close6 | 1 | Close associative array |
%xA0 | 1010 0000 | int8 | 1 + 1 | 8-bits signed integer. |
%xA9 | 1010 1001 | string8 | 1 + 1 + N | String with int8 length. N = number of octets in string. |
%xAB | 1010 1011 | binary8 | 1 + 1 + N | Octet array with int8 length. N = number of octets in array. |
%xB0 | 1011 0000 | int16 | 1 + 2 | 16-bits signed integer. |
%xB9 | 1011 1001 | string16 | 1 + 2 + N | String with int16 length. N = number of octets in string. |
%xBB | 1011 1011 | binary16 | 1 + 2 + N | Octet array with int16 length. N = number of octets in array. |
%xC0 | 1100 0000 | int32 | 1 + 4 | 32-bits signed integer. |
%xC2 | 1100 0010 | float32 | 1 + 4 | 32-bits float. |
%xC9 | 1100 1001 | string32 | 1 + 4 + N | String with int32 length. N = number of octets in string. |
%xCB | 1100 1011 | binary32 | 1 + 4 + N | Octet array with int32 length. N = number of octets in array. |
%xD0 | 1101 0000 | int64 | 1 + 8 | 64-bits signed integer. |
%xD2 | 1101 0010 | float64 | 1 + 8 | 64-bits float. |
%xD9 | 1101 1001 | string64 | 1 + 8 + N | String with int64 length. N = number of octets in string. |
%xDB | 1101 1011 | binary64 | 1 + 8 + N | Octet array with int64 length. N = number of octets in array. |
%xE0 | 1110 0000 | int8 | 1 | Value -32 |
... | | | | |
%xFF | 1111 1111 | int8 | 1 | Value -1 |