From: Brian I. <in...@tt...> - 2003-05-30 08:05:32
Sent earlier but it never got to the list. Trying again.

On 29/05/03 07:42 +0000, skinhat skinhat wrote:
> I'm after a format like XML that is as small as possible. The closest format
> I've found to what I'm after is YAML yet I feel YAML could even be more
> compressed. Something that could make YAML smaller would be to allow a
> single character to define an element or attribute.

The goals of YAML are "human readability" and a "clean information model".
While this has resulted in a rather compressible syntax, the syntax is not
optimized for compression per se, but rather for completeness.

> For example say you have the following XML:

First off, the structure of XML does not map cleanly into YAML (or vice
versa).

> <FARES>
>   <DATA fareref="DOGMA" currency="GBP" departure="LAX" arrival="EDI"/>
>   <DATA fareref="MADF" currency="AUD" departure="SYD" arrival="MEL"/>
>   <DATA fareref="DFSF" currency="USD" departure="JFK" arrival="MCO"/>
> </FARES>
>
> In YAML it woud be defined something like:

This is invalid YAML for two reasons:

1) You cannot have the same key used more than once ('DATA') in the same
   mapping.
2) The key/value separator is ': ', a colon followed by a space
   ('fareref: DOGMA').

> FARES:
>   DATA:
>     fareref:DOGMA
>     currency:GBP
>     departure:LAX
>     arrival:EDI
>   DATA:
>     fareref:MADF
>     currency:AUD
>     departure:SYD
>     arrival:MEL
>   DATA:
>     fareref:DFSF
>     currency:USD
>     departure:JFK
>     arrival:MCO

This would be how I would represent your data in YAML:

    FARES:
      - fareref: DOGMA
        currency: GBP
        departure: LAX
        arrival: EDI
      - fareref: MADF
        currency: AUD
        departure: SYD
        arrival: MEL
      - fareref: DFSF
        currency: USD
        departure: JFK
        arrival: MCO

or:

    FARES:
      - {fareref: DOGMA, currency: GBP, departure: LAX, arrival: EDI}
      - {fareref: MADF, currency: AUD, departure: SYD, arrival: MEL}
      - {fareref: DFSF, currency: USD, departure: JFK, arrival: MCO}

Which is the same thing, only in fewer lines.
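
To make the duplicate-key point concrete, here is a rough sketch in Python. It is only an analogy, not a YAML parser: a Python dict stands in for a YAML mapping and a list for a YAML sequence, and the names `pairs`, `as_mapping` and `as_sequence` are made up for illustration. A real YAML loader may reject repeated keys outright rather than silently keeping the last one.

```python
# Hypothetical illustration: Python dicts standing in for YAML mappings,
# Python lists standing in for YAML sequences.

# Three records that all want to live under the same key, 'DATA'.
pairs = [
    ("DATA", {"fareref": "DOGMA", "currency": "GBP"}),
    ("DATA", {"fareref": "MADF", "currency": "AUD"}),
    ("DATA", {"fareref": "DFSF", "currency": "USD"}),
]

# A mapping holds at most one value per key, so building one from
# repeated 'DATA' keys leaves only the last record.
as_mapping = dict(pairs)

# A sequence of mappings keeps every record, in order.
as_sequence = [record for _key, record in pairs]

print(len(as_mapping))                 # 1
print(len(as_sequence))                # 3
print(as_mapping["DATA"]["fareref"])   # DFSF
```

That is why the fares go into a sequence under 'FARES' rather than under three 'DATA' keys.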

You could also model your data with sequences instead of mappings, as long
as your application was aware of the meaning of each position:

    FARES:
      - - DOGMA
        - GBP
        - LAX
        - EDI
      - - MADF
        - AUD
        - SYD
        - MEL
      - - DFSF
        - USD
        - JFK
        - MCO

or:

    FARES:
      - [DOGMA, GBP, LAX, EDI]
      - [MADF, AUD, SYD, MEL]
      - [DFSF, USD, JFK, MCO]

This is about as compact as YAML will allow.

> A suggestion I have is to allow one character to specify the name of
> attributes and elements, that is, if a colon is not found then by default
> the first character is the attribute or element name. For example the above
> YAML could become:
>
> F
>   D
>     fDOGMA
>     cGBP
>     dLAX
>     aEDI
>   D
>     fMADF
>     cAUD
>     dSYD
>     aMEL
>   D
>     fDFSF
>     cUSD
>     dJFK
>     aMCO
>
>
> This could be compressed even further by using closing elements instead of
> indentation.
> For example the above format becomes:
>
> F
> D FDOGMA CGBP DLAX AEDI D
> D FMADF CAUD DSYD AMEL D
> D FDFSF CUSD DJFK AMCO D
> F

This would not work for YAML for myriad reasons. But my question for you
is, what does this even buy you? You've lost general human comprehension,
if not human readability. In your XML and YAML examples, I could understand
your data/application at a glance. In this final example, all meaning is
lost to me.

It is easy for people to invent a domain-specific serialization syntax,
optimized for whatever scratches their itch. YAML attempts to soothe a
broad range of rashes, and it does so in a way that has been shown to be
pleasing to a broad range of people.

Cheers, Brian