Here’s a simple example of what I’m trying to do, using the following attribute:


    macAddress: 12-34-56-78-90-AB    #(total YAML encoded length roughly 30+ characters)


To encode this example compactly, I’m looking at a couple of extensions:

1)      You need a mechanism to bind an integer type to the tag (e.g., bind macAddress to 5).

2)      You need stronger typing of the values, with both binary and text encoding rules for each type. For example, a macAddress would be defined as 6 bytes for the binary encoding and as dash-separated hex (17 characters) for the human-readable version.
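A minimal Python sketch of point 2, assuming a hypothetical MacAddressType class (not part of any existing YAML library) that pairs the text rule (17 characters of dash-separated hex) with the binary rule (6 raw bytes):

```python
import re

# Hypothetical text rule: exactly six pairs of hex digits joined by dashes.
_MAC_TEXT = re.compile(r"[0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5}")


class MacAddressType:
    """Sketch of a strongly typed value with paired text and binary rules."""

    BINARY_LENGTH = 6  # binary rule: six raw bytes

    @staticmethod
    def from_text(text: str) -> bytes:
        # Reject anything that is not the 17-character dashed form.
        if not _MAC_TEXT.fullmatch(text):
            raise ValueError(f"not a macAddress: {text!r}")
        return bytes.fromhex(text.replace("-", ""))

    @staticmethod
    def to_text(value: bytes) -> str:
        if len(value) != MacAddressType.BINARY_LENGTH:
            raise ValueError("a macAddress must be exactly 6 bytes")
        return "-".join(f"{b:02X}" for b in value)


raw = MacAddressType.from_text("12-34-56-78-90-AB")
print(raw.hex(), len(raw))          # 1234567890ab 6
print(MacAddressType.to_text(raw))  # 12-34-56-78-90-AB
```

The point of keeping both rules on one type is that validation happens once, at the type, regardless of which encoding the value arrived in.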


Using these rules, the example’s encoding would be:


    0x05061234567890AB                      (total length 8 bytes: type 5, length 6, then the 6 value bytes)


So in this example you get roughly a factor of 4 size reduction (29 characters down to 8 bytes). The notion is that the human-readable and binary forms could be converted directly back and forth, as long as you have a few schema/typing hints for the conversion that include the tag values and the type associated with each tag. String-oriented types would benefit less from the size reduction. Enumerations would typically see larger reductions (but are a more complicated example).
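To make the TLV conversion concrete, here is a rough Python sketch. It assumes a one-byte type code and a one-byte length field (consistent with the 8-byte total above); the SCHEMA table and the encode/decode helpers are hypothetical stand-ins for the schema/typing hints:

```python
from typing import Callable, Dict, Tuple

def mac_from_text(text: str) -> bytes:
    # Text rule: six dash-separated pairs of hex digits.
    parts = text.split("-")
    if len(parts) != 6 or any(len(p) != 2 for p in parts):
        raise ValueError(f"not a macAddress: {text!r}")
    return bytes.fromhex("".join(parts))

def mac_to_text(value: bytes) -> str:
    return "-".join(f"{b:02X}" for b in value)

# Hypothetical schema hints: tag name -> (type code, text->bytes, bytes->text).
Codec = Tuple[int, Callable[[str], bytes], Callable[[bytes], str]]
SCHEMA: Dict[str, Codec] = {"macAddress": (5, mac_from_text, mac_to_text)}
# Reverse index so a decoder can go from type code back to tag name.
BY_CODE = {code: (name, to_text) for name, (code, _, to_text) in SCHEMA.items()}

def encode(tag: str, text: str) -> bytes:
    code, to_bin, _ = SCHEMA[tag]
    value = to_bin(text)
    if len(value) > 255:
        raise ValueError("value too long for a one-byte length field")
    return bytes([code, len(value)]) + value

def decode(data: bytes) -> Tuple[str, str]:
    code, length = data[0], data[1]
    name, to_text = BY_CODE[code]
    value = data[2:2 + length]
    if len(value) != length:
        raise ValueError("truncated TLV record")
    return name, to_text(value)

text_form = "macAddress: 12-34-56-78-90-AB"
tlv = encode("macAddress", "12-34-56-78-90-AB")
print(tlv.hex().upper(), len(tlv))  # 05061234567890AB 8
print(decode(tlv))                  # ('macAddress', '12-34-56-78-90-AB')
print(len(text_form) / len(tlv))    # 3.625
```

Because both directions go through the same schema entry, the text and binary forms stay directly convertible; a string-typed tag would gain little beyond the bytes saved on the tag name itself.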


The strong typing is important. I’m doing this work for a security-oriented standard and need clarity in how values are used, which differs from the usual YAML simplicity. For example, even though a DNS name is a string, the semantics and valid processing for a DNS name are much more restricted than for a basic string, and it has a very different definition of containment. For delegation, there needs to be a very strict interpretation of when one string/name is contained in another name or range. So these concepts need to be built into the system.
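A sketch of why a DNS name needs its own containment rule rather than plain string matching. dns_contained_in is a hypothetical helper; it compares whole labels case-insensitively and deliberately ignores wildcards and internationalized names:

```python
from typing import List

def dns_labels(name: str) -> List[str]:
    # DNS name comparison is case-insensitive; a single trailing dot
    # (the root) is dropped so "example.com." equals "example.com".
    if name.endswith("."):
        name = name[:-1]
    name = name.lower()
    if not name:
        return []  # the root name contains everything
    labels = name.split(".")
    if any(not label or len(label) > 63 for label in labels):
        raise ValueError(f"invalid DNS name: {name!r}")
    return labels

def dns_contained_in(child: str, parent: str) -> bool:
    """True if child equals parent or sits below it in the DNS tree."""
    c, p = dns_labels(child), dns_labels(parent)
    return len(c) >= len(p) and c[len(c) - len(p):] == p

print(dns_contained_in("www.Example.com", "example.com"))  # True
print(dns_contained_in("evilexample.com", "example.com"))  # False
print("evilexample.com".endswith("example.com"))           # True: naive test is wrong
```

The last line is the kind of mistake strong typing is meant to prevent: a generic string test would wrongly let a delegation for example.com cover evilexample.com.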


I’d like to embed the hints for schema and tagging into a YAML-compatible format. For example, a simple line of schema for the above could be:

    macAddress: <macAddressType> # [5]


There would be a fair number of standard, predefined types for commonly needed objects (date/time, dnsName, ipAddress, string, int, enum, etc.) and the ability to extend the set with new types.


I’m still futzing with the style of representing the schema information, hence my strong interest in existing YAML schema work. However, I also need to overload any schema language with the tagging information, so some invention is still required. My intent is to do the overloading in comments so that any schema is still a valid YAML document.
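One possible way to pull the comment-embedded hints back out, assuming the exact `name: <type> # [tag]` line shape shown above. The regex, parse_schema, and the hostName/dnsName line are illustrative assumptions, not an existing schema language:

```python
import re
from typing import Dict, Tuple

# Hypothetical line syntax:  name: <typeName>  # [tagNumber]
# The "# [...]" hint lives in a YAML comment, so ordinary parsers ignore it.
SCHEMA_LINE = re.compile(
    r"^\s*(?P<name>[A-Za-z_][\w-]*)\s*:\s*<(?P<type>[A-Za-z_]\w*)>"
    r"\s*#\s*\[(?P<tag>\d+)\]\s*$"
)

def parse_schema(text: str) -> Dict[str, Tuple[str, int]]:
    """Map each attribute name to (type name, integer tag)."""
    schema: Dict[str, Tuple[str, int]] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and whole-line comments
        match = SCHEMA_LINE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: unrecognized schema line {line!r}")
        tag = int(match["tag"])
        # Integer tags must be unique, or binary decoding becomes ambiguous.
        if any(tag == t for _, t in schema.values()):
            raise ValueError(f"line {lineno}: tag {tag} already assigned")
        schema[match["name"]] = (match["type"], tag)
    return schema

example = """
macAddress: <macAddressType> # [5]
hostName:   <dnsName>        # [6]
"""
print(parse_schema(example))
# {'macAddress': ('macAddressType', 5), 'hostName': ('dnsName', 6)}
```

To a plain YAML parser the same text is just a mapping from names to strings like "<macAddressType>", which is what keeps the schema a valid YAML document.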




Paul A. Lambert |  Marvell  | +1 650 787 9141


From: Oren Ben-Kiki []
Sent: Friday, June 10, 2011 10:42 PM
To: Paul Lambert
Subject: Re: [Yaml-core] Reg: PyYAML grammar


Sure, you could define a whole new binary format, but I doubt you would get significant savings compared to flow style (other than for stuff like embedded PNG images). After all, flow style requires just 1 character at the start and at the end of each collection ('{' ... '}'), one character to separate collection entries (',') and so on.


I suppose it is possible to define a minor extension to the flow style, say DLE <size of binary data> <binary data>, which would be "equivalent" to specifying a base64 string. This would require minimal tweaking of existing YAML parsers and would offer very dense encoding of the overall stream (throw in zipping if !tags are an issue).
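A rough sketch of how such a DLE extension might be handled by a preprocessor. The framing here (a 4-byte big-endian size after the DLE byte) is an assumption, since the size encoding was left open; the hypothetical expand_dle rewrites each chunk as the equivalent `!!binary` base64 scalar so an unmodified parser could read the result:

```python
import base64
import struct

DLE = 0x10  # ASCII "data link escape" control character

def dle_chunk(data: bytes) -> bytes:
    # Assumed framing: DLE byte, 4-byte big-endian size, then the raw bytes.
    return bytes([DLE]) + struct.pack(">I", len(data)) + data

def expand_dle(stream: bytes) -> str:
    """Rewrite each DLE chunk as the equivalent !!binary base64 scalar."""
    out, text = [], bytearray()
    i = 0
    while i < len(stream):
        if stream[i] != DLE:
            text.append(stream[i])  # ordinary YAML text, kept verbatim
            i += 1
            continue
        if i + 5 > len(stream):
            raise ValueError("truncated DLE header")
        (size,) = struct.unpack_from(">I", stream, i + 1)
        payload = stream[i + 5:i + 5 + size]
        if len(payload) != size:
            raise ValueError("truncated DLE payload")
        out.append(text.decode("utf-8"))
        text.clear()
        out.append('!!binary "' + base64.b64encode(payload).decode("ascii") + '"')
        i += 5 + size
    out.append(text.decode("utf-8"))
    return "".join(out)

blob = bytes(range(8))  # stand-in for real binary content such as a PNG
stream = b"{icon: " + dle_chunk(blob) + b"}"
print(len(stream))          # 21
print(expand_dle(stream))   # {icon: !!binary "AAECAwQFBgc="}
```

DLE (0x10) is a C0 control character that is outside YAML's printable character set and never appears inside a multi-byte UTF-8 sequence, which is what makes it usable as an unambiguous escape here.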


This might be an interesting idea to keep in mind for future YAML spec versions. It wouldn't be anything we will do unless someone actually used it first, though ;-)


Have fun,


    Oren Ben-Kiki

On Fri, Jun 10, 2011 at 11:13 PM, Paul Lambert <> wrote:

Thanks for the help …


> Binary encoding... sorry. YAML is first and foremost about readability.

The idea is to have both canonical human-readable and binary encodings. It’s pretty straightforward to define a TLV (type-length-value) approach that simply converts each tag to an enumerated integer. YAML as it stands could have such an efficient encoding, given a suitable way to define the tag mapping. A schema could define the tag-to-integer mapping (as ASN.1 and Protobuf do).
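For comparison, the Protocol Buffers wire format already maps field tags to integers this way. The sketch below hand-encodes the macAddress example as a length-delimited bytes field, assuming it were declared as field number 5 (a hypothetical assignment mirroring the tag used earlier in the thread):

```python
def varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def pb_bytes_field(field_number: int, value: bytes) -> bytes:
    # Key = (field number << 3) | wire type; wire type 2 is length-delimited.
    return varint(field_number << 3 | 2) + varint(len(value)) + value

mac = bytes.fromhex("1234567890AB")
encoded = pb_bytes_field(5, mac)
print(encoded.hex().upper(), len(encoded))  # 2A061234567890AB 8
print(varint(300).hex())                    # ac02
```

The key byte 0x2A is (5 << 3) | 2, so the result has the same 8-byte size as the TLV example earlier in the thread, with the type code and wire type folded into a single varint.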


I’ll send a sample schema to the list as it gets put together …






From: Oren Ben-Kiki []
Sent: Friday, June 10, 2011 12:30 PM
To: Paul Lambert
Subject: Re: [Yaml-core] Reg: PyYAML grammar


There hasn't been much work on YAML schemas. There is an issue here that people mean different things when they say "schema" and "validation".


If you are looking for validating your specific input files in your specific application, YAML is very well suited for that. A minimal amount of !tag-ing combined with some implicit tagging rules, plus adding verification code in the matching classes in your favorite implementation language, and you are all set for as strong (or as weak) a verification as you want. This is what most people do.


If you are looking for a generic "validate a data file with a schema file" ability, there's no such thing for YAML, perhaps because it isn't that useful in practice after all...


The same arguments can be made about JSON, except there you are completely at the mercy of "implicit" tagging, and the whole notion of deserialization into your application classes is an afterthought rather than being an inherent part of the spec.


Binary encoding... sorry. YAML is first and foremost about readability. Sure, you can produce pretty dense (and unreadable) YAML files using the flow styles (think JSON but with !tag-ing, such as [!foo {bar: baz}]). That's not _too_ bad size-wise, as long as all your data is textual anyway, and you can zip it for additional shrinkage. If you have true binary data (e.g., a PNG image), you'd need to base64 it, and zipping the result would only help so far... YAML really wasn't designed for this sort of thing.
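A quick Python check of the base64 point, using seeded random bytes (randbytes needs Python 3.9+) as a stand-in for already-compressed data such as a PNG:

```python
import base64
import random
import zlib

rng = random.Random(0)        # seeded so the run is repeatable
blob = rng.randbytes(30_000)  # stand-in for incompressible data (e.g. a PNG)

b64 = base64.b64encode(blob)
zipped = zlib.compress(b64, 9)

print(len(blob), len(b64), len(zipped))
# base64 always costs 4 output bytes per 3 input bytes...
assert len(b64) == 4 * ((len(blob) + 2) // 3)
# ...and each base64 character carries only 6 bits, so compression can win
# back roughly the 2 spare bits per character but no more: the result
# stays about as large as the raw binary data.
assert len(zipped) > 0.95 * len(blob)
```

Zipping recovers most of the one-third base64 overhead, but it cannot shrink the underlying binary data itself, which is the "only help so far" limit.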


Have fun,


    Oren Ben-Kiki


On Fri, Jun 10, 2011 at 9:52 PM, Paul Lambert <> wrote:

So … I’ve only recently joined and been tracking this list, so please excuse what might be off-topic.


I’m working on a security standard and trying to use YAML. However, I need a schema, not to enforce a schema on YAML, but primarily to be able to describe subsets of a possible schema in YAML.


The standard is focused on describing “who can do what”. The “who” is cryptographic; the “what” can be any object describable in YAML. Delegation requires the ability to define a set of information within a schema.


I’ve looked at Doctrine – it’s close but not quite what I need. Are there other efforts in place that could be leveraged?


I’m also looking at stronger typing and binary encoding (akin to Protobuf).  The use of schemas, strong types and alternate encodings may be contrary to YAML culture and goals (of staying simple).  Is YAML the right choice in this context or should I just start something new that is YAML-like?


Thanks in advance,






Yaml-core mailing list