[Yaml-core] !!array tag for fast read-in ?

We are considering YAML as a format for combining metadata with 256x256 or 1024x1204 images
(actually physical detector counts, mostly small integers, which makes ASCII storage
quite efficient).
Unfortunately, yaml-cpp and libyaml are both too slow for our purpose;
they are 1-2 orders of magnitude slower than JSON parsers or purpose-written ASCII parsers.

In plain YAML, our image data reside in a sequence of sequences, like this:
- image:
   - [ 0, 3, 2, 0, 8, ... ]
   - [ 1, 0, 7, 3, 2, ... ]
   - ...
This is inefficient because the parser has no knowledge of this peculiar
data structure, and therefore spends a lot of time on unnecessary work.
Is this right?

What do you think of the following concept:

Define a tag !!array, to be used as follows:
- image: !!array 2 uint8 256 256 |
     0 3 2 0 8 ...
     1 0 7 3 2 ...

The number immediately after the tag (here 2) indicates the rank D;
the next word (here uint8) is the data type;
the next D numbers (here 256 and 256) give the extent of the multidimensional array along each dimension.
This is then followed by the image data themselves.
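
To make the proposed layout concrete, here is a minimal Python sketch of how such a header and body could be read. It is only an illustration of the format, not a fast implementation and not anything libyaml or yaml-cpp provides. The function names, the TYPECODES table, and the set of supported type words are assumptions made for this example; only the header layout (rank, type, D extents) comes from the proposal above.

```python
from array import array

# Hypothetical mapping from the type word in the header to an `array` typecode.
# Only a few types are listed; the proposal itself does not fix this set.
TYPECODES = {"uint8": "B", "int16": "h", "uint16": "H"}


def parse_array_header(header):
    """Parse a header such as '2 uint8 256 256' into (dtype, shape)."""
    words = header.split()
    rank = int(words[0])          # first number: rank D
    dtype = words[1]              # next word: data type
    shape = tuple(int(w) for w in words[2:])  # next D numbers: extents
    if dtype not in TYPECODES:
        raise ValueError(f"unsupported data type: {dtype}")
    if len(shape) != rank:
        raise ValueError(f"expected {rank} extents in header: {header!r}")
    return dtype, shape


def parse_array_body(dtype, shape, body):
    """Read whitespace-separated integers into a flat, row-major typed array."""
    expected = 1
    for extent in shape:
        expected *= extent
    # array() raises OverflowError if a value does not fit the declared type.
    values = array(TYPECODES[dtype], map(int, body.split()))
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return values


# A tiny 2x3 "image" instead of 256x256, to keep the example readable.
dtype, shape = parse_array_header("2 uint8 2 3")
pixels = parse_array_body(dtype, shape, "0 3 2\n1 0 7\n")
```

Because the shape is known before the first value is read, the reader can check the element count and store everything in one flat typed buffer; the element at row r, column c is then pixels[r * shape[1] + c].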

In our read-in routine, upon encountering the tag '!!array',
we would switch from the parser of libyaml to a parser of our own
that takes full advantage of our knowledge of what entries must follow an '!!array' tag.
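
The hand-over could be sketched as follows. This Python fragment works on raw text lines and does not use the libyaml API at all; it only shows the idea of recognising the tag, collecting the more deeply indented lines that follow, and passing everything else on unchanged. The function name split_array_blocks and the sample keys (exposure, comment) are made up for this example.

```python
def split_array_blocks(lines, tag="!!array"):
    """Yield (key, header, body_lines) for each tagged block found in lines.

    Assumes the layout 'key: !!array <header> |' followed by data lines
    that are indented more deeply than the line carrying the tag.
    """
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if tag not in line:
            continue  # a real reader would hand this line to the YAML parser
        before, _, after = line.partition(tag)
        key = before.strip().lstrip("-").strip().rstrip(":")
        header = after.strip().rstrip("|").strip()
        indent = len(line) - len(line.lstrip())
        body = []
        # The block ends at the first blank line or line that is not indented deeper.
        while (
            i < len(lines)
            and lines[i].strip()
            and len(lines[i]) - len(lines[i].lstrip()) > indent
        ):
            body.append(lines[i].strip())
            i += 1
        yield key, header, body


text = """\
- exposure: 0.5
- image: !!array 2 uint8 2 3 |
     0 3 2
     1 0 7
- comment: done
"""
blocks = list(split_array_blocks(text.splitlines()))
```

Here blocks contains a single entry with the key, the header string, and the two data lines; the header and body would then go to our own array parser, and the remaining lines to the ordinary YAML parser.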

Does this make sense?
Is it consistent with the letter and intent of the YAML spec?

Or what else should we do to reconcile YAML with our need for speed?

- Joachim