On Fri, 2008-03-21 at 12:02 +0100, Alexis Guillaume wrote:
> I was asked at work to implement some network protocol using an INI format.
> I was almost resigned when I heard for the first time about yaml and almost
> fell in love with it ;-) It's been a week now and I am trying to make a good
> case to use yaml instead of INI. As INI was chosen because of its extreme
> simplicity, one of the points I now have to make is that it is not *that* hard
> to load a yaml stream in memory (we have to use C++). That goal in mind, I am
> currently working on a test program which allows simple queries on *any* yaml
> file. It loads the yaml file (using libyaml) in a graph ; basically, each node
> is either a std::string (scalar), a std::vector of nodes (yaml sequence), a
> std::map of nodes (mapping) or pointer to another node (alias). In our final
> application, that graph would of course be used by a higher-level layer to read
> and write messages according to our protocol description.
So you are writing a C++ yaml parser? You would probably do better to
start with libyaml or the like.
> 1) Am I right about aliasing ? When the parser sends me an alias event, should
> I create a fully duplicated node, should I keep a pointer to the original node
> or should that decision be taken at a higher application level ?
In principle you are supposed to share a pointer to the original node.
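A minimal sketch of what "share a pointer" could look like with the node graph described above. All names here (Node, NodePtr, AnchorTable, make_scalar) are hypothetical and not part of libyaml, and mapping keys are simplified to plain strings:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Node;
using NodePtr = std::shared_ptr<Node>;

// Hypothetical node type: one struct covering the three YAML node kinds.
struct Node {
    enum class Kind { Scalar, Sequence, Mapping };
    Kind kind = Kind::Scalar;
    std::string scalar;                      // used when kind == Scalar
    std::vector<NodePtr> sequence;           // used when kind == Sequence
    std::map<std::string, NodePtr> mapping;  // used when kind == Mapping (string keys only)
};

// Hypothetical helper: remembers anchored nodes while loading, so that an
// alias event can be resolved to the very same object, not to a copy.
class AnchorTable {
public:
    void define(const std::string& anchor, const NodePtr& node) {
        assert(node != nullptr);
        anchors_[anchor] = node;  // a later anchor with the same name replaces the earlier one
    }

    // Returns the anchored node, or a null pointer if the anchor is unknown.
    NodePtr resolve(const std::string& alias) const {
        auto it = anchors_.find(alias);
        if (it == anchors_.end()) {
            return nullptr;
        }
        return it->second;
    }

private:
    std::map<std::string, NodePtr> anchors_;
};

NodePtr make_scalar(const std::string& text) {
    auto node = std::make_shared<Node>();
    node->kind = Node::Kind::Scalar;
    node->scalar = text;
    return node;
}
```

With this layout the loader would call define() when an event carries an anchor and resolve() when it sees an alias event; the alias then costs one pointer and later changes to the anchored node are visible through it.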
> 2) Is merging really supported in yaml ? It is not mentioned in the yaml spec
> and no specific merge event is sent by the parser. Currently, when my program
> finds a merge key (<<), it replaces it by a *copy* of all the key-value pairs
> of the referenced mapping. Is it the right thing to do or should I keep the
> merge key as if it were any other key and let a higher level of application
> handle the merging part ?
Merging isn't the parser's job, it is done at a higher level (and is
therefore optional). Your code is "correct" in using a copy of the
merged nodes.
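As a rough sketch of that higher-level step, assuming a simplified hypothetical Node type with string keys and handling only the single-mapping form of "<<" (not a sequence of mappings), the expansion could look like this. Keys written explicitly in the mapping win over merged ones, which is how I read the merge key type description:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Node;
using NodePtr = std::shared_ptr<Node>;

// Hypothetical, simplified node: a scalar, or a mapping with string keys.
struct Node {
    std::string scalar;
    std::map<std::string, NodePtr> mapping;
};

// Replace a "<<" entry by copies of the referenced mapping's key-value pairs.
// The pairs are copied, but each value is still a pointer to the shared node.
void expand_merge_key(Node& target) {
    auto merge = target.mapping.find("<<");
    if (merge == target.mapping.end()) {
        return;  // nothing to merge
    }
    NodePtr source = merge->second;  // keep the source alive across the erase
    target.mapping.erase(merge);
    if (!source) {
        return;
    }
    for (const auto& entry : source->mapping) {
        // std::map::insert does nothing when the key already exists,
        // so explicitly written keys take precedence over merged ones.
        target.mapping.insert(entry);
    }
    assert(target.mapping.count("<<") == 0);
}
```

The referenced mapping itself is left untouched, so it can be merged into any number of other mappings.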
However if your data structure is *purely* read-only, then of course you
could just use pointers to the same object all over the place, not just
for merged data (e.g., the same string appearing anywhere in the YAML
document could be loaded to a single std::string object). This approach
(when coded very carefully) may speed things up significantly for large
YAML files with repetitive data. And of course for non-read-only there
is always copy-on-write, which further complicates things... for simple
configuration files this is all probably overkill.
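To illustrate the read-only sharing idea for scalars, here is a small sketch of a string pool. StringPool and intern() are hypothetical names, and this naive version stores each distinct string twice (once as the key, once as the value), which a careful implementation would avoid:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical pool: equal scalar texts map to one shared, immutable string.
class StringPool {
public:
    std::shared_ptr<const std::string> intern(const std::string& text) {
        auto it = pool_.find(text);
        if (it != pool_.end()) {
            return it->second;  // seen before: hand out the existing object
        }
        std::shared_ptr<const std::string> stored =
            std::make_shared<std::string>(text);
        pool_.emplace(text, stored);
        return stored;
    }

    // Number of distinct strings seen so far.
    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_map<std::string, std::shared_ptr<const std::string>> pool_;
};
```

The pointer-to-const is what makes the sharing safe: nobody holding a pooled string can modify it behind the back of the other holders.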
> 3) When loading a mapping, how should my program behave when it encounters a
> duplicated key ? Should it produce an error and exit, allow the identical keys
> to peacefully cohabit in the graph, override the old key with the new one or,
> again, let a higher power decide what to do ?
Duplicate keys mean the input file is invalid. You should at minimum
emit a warning message (possibly to a log file), so the users will know
something bad has happened and will (hopefully) fix the problem. Whether
you then abort, ignore or override the key depends on the application I
guess. Some apps must "never crash" so aborting isn't an option; as to
whether to use the "old" or "new" value, there really is no "right"
answer. You should _definitely_ not let the keys co-exist.
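One way to leave that decision to the application is to make it a parameter of the loader. A sketch, assuming string keys and values for brevity; DuplicatePolicy and insert_checked are hypothetical names:

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical policy chosen by the application, not by the loader.
enum class DuplicatePolicy { Abort, KeepFirst, KeepLast };

// Insert a key-value pair; always warn on a duplicate key, then apply the policy.
// Returns true if the key was a duplicate. The keys never co-exist.
bool insert_checked(std::map<std::string, std::string>& mapping,
                    const std::string& key,
                    const std::string& value,
                    DuplicatePolicy policy,
                    std::ostream& log) {
    auto it = mapping.find(key);
    if (it == mapping.end()) {
        mapping.emplace(key, value);
        return false;
    }
    log << "warning: duplicate mapping key '" << key << "'\n";
    switch (policy) {
    case DuplicatePolicy::Abort:
        throw std::runtime_error("duplicate mapping key: " + key);
    case DuplicatePolicy::KeepFirst:
        break;  // ignore the new value
    case DuplicatePolicy::KeepLast:
        it->second = value;  // override the old value
        break;
    }
    return true;
}
```

A "never crash" application would pass KeepFirst or KeepLast and send the log stream to a file; a strict tool would pass Abort and report the exception to the user.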
Have fun,
Oren Ben-Kiki