> There's nothing wrong with whole-file/stream compression :-) But from what I
> understand, Paul is looking for something else, which is mucking with the internal
> structure, such as allowing for non-printable binary blobs.


Yes – I’m looking at a little more than just zip-like compression.


No - I’m hoping not to muck with the internal structures too much, but instead to add an alternate encoding. The main idea is that you can have both a human-readable form (the YAML text representation) and an efficient binary encoding, and that the two are isomorphic. This adds a new representation type, but it would be directly convertible to a readable YAML text format (assuming you have the encoding rules).


All data types would have both a canonical text encoding and a canonical binary encoding defined. The human-readable form as currently envisioned would be YAML – or at least a subset that conforms to stricter conventions but can still be read and parsed by existing YAML libraries.


Validation would be possible using the encoding schema for both the human readable and binary versions.
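
As a minimal sketch of the idea (not the actual encoding rules, which are not spelled out here), the Python below assumes a hypothetical three-field schema and a TLV layout with a one-byte tag and a one-byte length. The field names, tag numbers, and the flat `key: value` text form are all assumptions made for illustration; the text form was picked so that an ordinary YAML parser would also read it as a mapping.

```python
import struct

# Hypothetical schema: field names, tags, and types are illustrative only.
SCHEMA = {
    "action": {"tag": 1, "kind": "enum", "values": ["deny", "allow"]},
    "port": {"tag": 2, "kind": "uint16"},
    "host": {"tag": 3, "kind": "str"},
}
TAGS = {spec["tag"]: name for name, spec in SCHEMA.items()}


def validate(name, value):
    """Check one field against the schema; shared by both encodings."""
    spec = SCHEMA.get(name)
    if spec is None:
        raise ValueError(f"unknown field {name!r}")
    kind = spec["kind"]
    if kind == "enum" and value not in spec["values"]:
        raise ValueError(f"{name}: {value!r} is not one of {spec['values']}")
    if kind == "uint16" and not (isinstance(value, int) and 0 <= value <= 0xFFFF):
        raise ValueError(f"{name}: {value!r} does not fit in 16 bits")
    if kind == "str" and not (isinstance(value, str) and len(value.encode("utf-8")) <= 255):
        raise ValueError(f"{name}: not a string of at most 255 bytes")
    return spec


def to_binary(record):
    """Encode a record as tag/length/value triples."""
    out = bytearray()
    for name, value in record.items():
        spec = validate(name, value)
        if spec["kind"] == "enum":
            payload = bytes([spec["values"].index(value)])  # one byte per option
        elif spec["kind"] == "uint16":
            payload = struct.pack(">H", value)
        else:
            payload = value.encode("utf-8")
        out += bytes([spec["tag"], len(payload)]) + payload
    return bytes(out)


def from_binary(data):
    """Decode tag/length/value triples back into a record."""
    record, i = {}, 0
    while i < len(data):
        tag, length = data[i], data[i + 1]
        payload = data[i + 2 : i + 2 + length]
        if tag not in TAGS:
            raise ValueError(f"unknown tag {tag}")
        name = TAGS[tag]
        spec = SCHEMA[name]
        if spec["kind"] == "enum":
            if length != 1 or payload[0] >= len(spec["values"]):
                raise ValueError(f"{name}: bad enum payload")
            value = spec["values"][payload[0]]
        elif spec["kind"] == "uint16":
            if length != 2:
                raise ValueError(f"{name}: bad uint16 payload")
            (value,) = struct.unpack(">H", payload)
        else:
            value = payload.decode("utf-8")
        validate(name, value)
        record[name] = value
        i += 2 + length
    return record


def to_text(record):
    """Emit the readable form: one 'key: value' line per field."""
    return "".join(f"{name}: {value}\n" for name, value in record.items())


def from_text(text):
    """Parse the readable form and validate it against the same schema."""
    record = {}
    for line in text.splitlines():
        name, sep, raw = line.partition(": ")
        if not sep:
            raise ValueError(f"malformed line {line!r}")
        spec = SCHEMA.get(name)
        value = int(raw) if spec and spec["kind"] == "uint16" else raw
        validate(name, value)
        record[name] = value
    return record


record = {"action": "allow", "port": 443, "host": "example.com"}
text = to_text(record)
blob = to_binary(from_text(text))
print(text, end="")
print(blob.hex(), f"({len(blob)} bytes vs {len(text)} bytes of text)")
```

Because both directions go through the same `validate` step, a record that is rejected in one form is rejected in the other, which is the property the schema is meant to provide.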


The application space is small embedded devices, sensors, wireless protocols, etc. Some of the motivation comes from being tired of working with hard-to-read standards that use TLV formats and long prose to describe the protocol. I’m trying to get the most readable and efficient representation of a binary protocol.


The specific protocol I’m working on is for “privilege management” – basically defining who can do what. The expressions for these statements need to be human readable and understandable – YAML would be ideal. Parameter validity checking is required (e.g. there are a bunch of security issues with DNS names if you treat a DNS name as just a string). TLV encoding and an “efficient” representation of enumerated options are necessary for the embedded applications.
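
To illustrate the DNS point, here is a small sketch of what checking a name as a typed value rather than a plain string could look like. It assumes the conventional letter-digit-hyphen rules for hostnames (labels of 1 to 63 characters, at most 253 characters overall) and the length-prefixed label layout used on the wire by DNS; the function names are hypothetical, and a real protocol might need different rules (for example for internationalized names).

```python
import re

# One label: letters, digits, hyphens; no leading or trailing hyphen; 1..63 chars.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def validate_dns_name(name):
    """Return the lower-cased labels of a hostname, or raise ValueError."""
    if not 1 <= len(name) <= 253:
        raise ValueError("name must be 1..253 characters long")
    labels = name.split(".")
    for label in labels:
        if not _LABEL.fullmatch(label):
            raise ValueError(f"invalid label {label!r}")
    return [label.lower() for label in labels]


def encode_dns_name(name):
    """Encode a validated name as length-prefixed labels ending in a zero byte."""
    out = bytearray()
    for label in validate_dns_name(name):
        out.append(len(label))
        out += label.encode("ascii")
    out.append(0)
    return bytes(out)


print(encode_dns_name("Example.COM"))

# A name that a plain string field would accept but the typed check rejects:
try:
    validate_dns_name("example.com\x00.evil.org")
except ValueError as err:
    print("rejected:", err)
```

The embedded NUL in the last example is the kind of input that can be interpreted differently by different layers when the name is only ever handled as an opaque string.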




Paul A. Lambert |  Marvell  | +1 650 787 9141


From: Oren Ben-Kiki [mailto:oren@ben-kiki.org]
Sent: Saturday, June 25, 2011 10:58 PM
To: Trans
Cc: yaml-core@lists.sourceforge.net
Subject: Re: [Yaml-core] Reg: PyYAML grammar


There's nothing wrong with whole-file/stream compression :-) But from what I understand, Paul is looking for something else, which is mucking with the internal structure, such as allowing for non-printable binary blobs.


BTW, YAML already provides a form of internal "compression". If you have many keys with some "long value foo", you can easily add an anchor to the 1st one and specify all the rest as "*f". Think of it as a poor man's LZW. IMO minimal-size flow-style YAML with heavy use of such anchors, plus some streaming compression of the stream, should give you a pretty compact format.
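
A rough way to see the effect, sketched with the Python standard library only: the snippet builds the same flow-style mapping twice, once with the long value repeated and once with an anchor `&f` on the first occurrence and `*f` aliases afterwards, then runs both through zlib. The key names, the value, and the number of keys are made up for the demo, and actual savings depend entirely on the data.

```python
import zlib

LONG = "some long value that shows up under many keys"
N = 20  # number of keys sharing the value (arbitrary for this demo)

# Flow-style mapping with the long value spelled out every time.
plain = "{" + ", ".join(f'k{i}: "{LONG}"' for i in range(N)) + "}"

# Same mapping: anchor the first occurrence as &f, alias the rest as *f.
anchored = "{" + ", ".join(
    f'k{i}: &f "{LONG}"' if i == 0 else f"k{i}: *f" for i in range(N)
) + "}"

for label, text in (("plain", plain), ("anchored", anchored)):
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, 9)
    print(f"{label:9s} raw={len(raw):5d} bytes  zlib={len(packed):4d} bytes")
```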


Of course I don't know Paul's application - he hinted that gzip may be too heavy for his use case (and I don't know what his data and bandwidth constraints are). He's pretty much required to invent his own format, tailored to his specific environment (or so it seems).


Have fun,


    Oren Ben-Kiki

On Sun, Jun 26, 2011 at 5:20 AM, Trans <transfire@gmail.com> wrote:

On Jun 13, 8:20 pm, Paul Lambert <p...@marvell.com> wrote:
> Ok - the YAML community and binary encoding do not appear to be compatible.  Too bad your world view is so limited.  Take a look at protobuf as an example of combining human readable and binary encodings if you want to see a worked hybrid.

I don't understand. What's wrong with compression?


Yaml-core mailing list