From: Chris R. <cro...@di...> - 2006-08-28 16:24:22
Hello there, folks. Okay, so I've sat down and am trying to write
a bit of code that will parse a YAML configuration file and read it
into a structure in memory that I can then query to get the
desired configuration information. However, the issues I'm having
are really all "intro to YAML and libyaml" type questions, I think. :-)

So, I'm using the parser, as suggested. I have looked at the API
synopsis on the PyYAML page for libyaml, and am getting results
similar to those I'd expect. However, I think I made an erroneous
presumption. I guess I thought that the library would return a
SCALAR event with both the key and value, when the key/value was for
a scalar. But it seems to return something more like tokens. If
this is right, and I'm seeing this correctly, how do I know when I
get a SCALAR event whether it's for a key or a value? Does this all
only work inside of a mapping? When a SCALAR exists outside of a
mapping, is it always just a "thing", and cannot have a key/value
type identity? If inside of a mapping, is the first (the third, etc.)
event always the KEY, and the next (be it a scalar, or a
start-thru-end of seq or mapping) the VALUE? I can keep track of this
in my code, but honestly I expected the library to do that for me.
There's a parser, so it should parse and comprehend to a limited
degree. I think I'm seeing something more like a tokenizer that
doesn't do much of anything else.

So, slightly confused. Just drop me a line if you would to indicate
whether I'm headed in the right direction, or if I'm missing
something [obvious].

Thanks...

- Chris
From: Kirill S. <xi...@ga...> - 2006-08-30 17:27:34
On Mon, Aug 28, 2006 at 12:24:22PM -0400, Chris Ross wrote:
> Hello there, folks. Okay, so I've sat down and am trying to write
> a bit of code that will parse a YAML configuration file, and read it
> into a structure in memory that I can then inquire of to get the
> desired configuration information. However, the issues I'm having
> are really all "intro to YAML and libyaml" type questions, I think. :-)
>
> So, I'm using the parser, as suggested. I have looked at the API
> synopsis on the PyYAML page for libyaml, and am getting results
> similar to those I'd expect. However, I think I made an erroneous
> presumption. I guess I thought that the library would return a
> SCALAR event with both the key and value, when the key/value was for
> a scalar. But, it seems to return something more like tokens. If
> this is right, and I'm seeing this correctly, how do I know when I
> get a SCALAR event whether it's for a key or a value? Does this all
> only work inside of a mapping? When a SCALAR exists outside of a
> mapping, is it always just a "thing", and cannot have a key/value
> type identity? If inside of a mapping, is the first (the third, etc)
> event always the KEY, and the next (be it a scalar, or a
> start-thru-end of seq or mapping) is the VALUE? I can keep track of
> this in my code, but honestly I expected the library to do that for
> me. There's a parser, so it should parse and comprehend to a limited
> degree. I think I'm seeing something more like a tokenizer that
> doesn't do much of anything else.

First, I'm not sure if you use the correct API. Do you use
yaml_parser_parse()? yaml_parser_parse() is a parser while
yaml_parser_scan() is a tokenizer.

The events produced by the parser satisfy the following grammar:

    stream   ::= STREAM-START document* STREAM-END
    document ::= DOCUMENT-START node DOCUMENT-END
    node     ::= ALIAS | SCALAR | sequence | mapping
    sequence ::= SEQUENCE-START node* SEQUENCE-END
    mapping  ::= MAPPING-START (node node)* MAPPING-END

The first (and 3rd, 5th, etc) nodes in the mapping production are keys
while the second (and 4th, 6th, etc) nodes are the corresponding values.

Note that in YAML, sequence items, mapping values, and even mapping keys
could be complex objects like sequences or mappings. Therefore you
shouldn't expect that, say, the second event after MAPPING-START is a
mapping value.

The code processing YAML events with libyaml should look like this (sans
error handling):

    #include <yaml.h>

    // Assumed to be set up elsewhere with yaml_parser_initialize()
    // and one of the yaml_parser_set_input_*() functions.
    static yaml_parser_t parser;

    // Each process_*() function takes ownership of the event it is
    // given and deletes it before returning.
    void process_node(yaml_event_t *event);
    void process_sequence(yaml_event_t *event);
    void process_mapping(yaml_event_t *event);

    void process_stream(void)
    {
        yaml_event_t event;

        yaml_parser_parse(&parser, &event);     // Eat STREAM-START
        yaml_event_delete(&event);
        while (1) {
            yaml_parser_parse(&parser, &event); // Eat STREAM-END or
                                                // DOCUMENT-START
            if (event.type == YAML_STREAM_END_EVENT) break;
            yaml_event_delete(&event);          // Drop DOCUMENT-START
            yaml_parser_parse(&parser, &event); // Eat the first event of
                                                // the document's root node
            process_node(&event);
            yaml_parser_parse(&parser, &event); // Eat DOCUMENT-END
            yaml_event_delete(&event);
        }
        yaml_event_delete(&event);              // Drop STREAM-END
    }

    void process_node(yaml_event_t *event)
    {
        if (event->type == YAML_SEQUENCE_START_EVENT) {
            process_sequence(event);
        }
        else if (event->type == YAML_MAPPING_START_EVENT) {
            process_mapping(event);
        }
        else if (event->type == YAML_ALIAS_EVENT) {
            // Do something with the alias or produce an error message
            yaml_event_delete(event);
        }
        else {
            // Process a scalar event
            yaml_event_delete(event);
        }
    }

    void process_sequence(yaml_event_t *event)
    {
        yaml_event_delete(event);               // Drop SEQUENCE-START
        while (1) {
            yaml_parser_parse(&parser, event);  // Eat the first event of
                                                // the next sequence item
            if (event->type == YAML_SEQUENCE_END_EVENT) break;
            process_node(event);                // Process a sequence item
        }
        yaml_event_delete(event);               // Drop SEQUENCE-END
    }

    void process_mapping(yaml_event_t *event)
    {
        yaml_event_delete(event);               // Drop MAPPING-START
        while (1) {
            yaml_parser_parse(&parser, event);  // Eat the first event of
                                                // the next mapping key
            if (event->type == YAML_MAPPING_END_EVENT) break;
            process_node(event);                // Process a mapping key
            yaml_parser_parse(&parser, event);  // Eat the first event of
                                                // the corresponding value
            process_node(event);                // Process the value
        }
        yaml_event_delete(event);               // Drop MAPPING-END
    }

Well, the code may become really complicated when you include error
handling and processing of the configuration format. I think I'll add
a node-based API (like DOM), and it will become easier.

--
xi
From: Chris R. <cro...@di...> - 2006-08-30 19:13:51
On Aug 30, 2006, at 1:27 PM, Kirill Simonov wrote:
> First, I'm not sure if you use the correct API. Do you use
> yaml_parser_parse()? yaml_parser_parse() is a parser while
> yaml_parser_scan() is a tokenizer.

Yup. Definitely using yaml_parser_parse. I noted that difference
between them myself when looking at them.

> The first (and 3rd, 5th, etc) nodes in the mapping production are keys
> while the second (and 4th, 6th, etc) nodes are the corresponding
> values.
>
> Note that in YAML, sequence items, mapping values, and even mapping
> keys could be complex objects like sequences or mappings. Therefore
> you shouldn't expect that, say, the second event after MAPPING-START
> is a mapping value.

Right. I understood that looking at the stream of events I'm
getting back. As I think I said in my original email, I think I was
just expecting a slightly higher-level interface where it would return
a complex object with a key and a value. Now that you've confirmed
I have to build those pairings myself, I can certainly do that.

> The code processing YAML events with libyaml should look like this
> (sans error handling):

Excellent. Thanks. This looks about like what I had expected to do,
and had started. I just wanted to make sure I wasn't missing something.
I see you swallow the STREAM_START and just toss it. Will
yaml_parser_parse *always* start by returning a STREAM_START event?

> Well, the code may become really complicated when you include error
> handling and processing of the configuration format. I think I'll add
> node based API (like DOM), and it will become easier.

Right. That's about the interface I'm going to add myself,
specifically for dealing with configuration files. Let me know when
you get something like that done on your end, and with any luck I can
switch to it. :-)

Thanks for the help...

- Chris
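
[Editorial sketch] Building the key/value pairings by hand, as discussed
above, can also be done without recursion by keeping a small stack of
open collections and counting the nodes seen in each. The following is a
minimal sketch under stated assumptions, not libyaml code: the ev_kind
enum is a stand-in for the corresponding YAML_*_EVENT values, and
tracker, tracker_init, tracker_feed and MAX_DEPTH are hypothetical names
invented for this illustration. Document and stream events are assumed
to be filtered out by the caller.

```c
/* Stand-in event kinds; with libyaml these would map to YAML_*_EVENT. */
typedef enum {
    EV_SCALAR, EV_ALIAS, EV_SEQ_START, EV_SEQ_END, EV_MAP_START, EV_MAP_END
} ev_kind;

/* Role of the node that an event starts (ROLE_NONE for END events). */
typedef enum { ROLE_NONE, ROLE_ROOT, ROLE_ITEM, ROLE_KEY, ROLE_VALUE } node_role;

#define MAX_DEPTH 64   /* hypothetical nesting limit for this sketch */

typedef struct {
    int is_mapping[MAX_DEPTH];  /* 1 if the open collection is a mapping */
    unsigned count[MAX_DEPTH];  /* nodes started so far in that collection */
    int depth;                  /* number of tracked open collections */
    int overflow;               /* open collections beyond MAX_DEPTH */
} tracker;

void tracker_init(tracker *t)
{
    t->depth = 0;
    t->overflow = 0;
}

/* Feed one event; returns the role of the node it starts. */
node_role tracker_feed(tracker *t, ev_kind k)
{
    int starts = (k == EV_SEQ_START || k == EV_MAP_START);
    int ends = (k == EV_SEQ_END || k == EV_MAP_END);
    node_role r = ROLE_ROOT;

    if (t->overflow > 0) {          /* inside nesting too deep to track */
        if (starts) t->overflow++;
        if (ends) t->overflow--;
        return ROLE_NONE;
    }
    if (ends) {                     /* close the innermost collection */
        if (t->depth > 0) t->depth--;
        return ROLE_NONE;
    }
    if (t->depth > 0) {             /* node inside a sequence or mapping */
        int top = t->depth - 1;
        if (!t->is_mapping[top])
            r = ROLE_ITEM;
        else                        /* 1st, 3rd, ... are keys; 2nd, 4th, ... values */
            r = (t->count[top] % 2 == 0) ? ROLE_KEY : ROLE_VALUE;
        t->count[top]++;
    }
    if (starts) {                   /* open a new collection */
        if (t->depth == MAX_DEPTH) {
            t->overflow = 1;
        } else {
            t->is_mapping[t->depth] = (k == EV_MAP_START);
            t->count[t->depth] = 0;
            t->depth++;
        }
    }
    return r;
}
```

A node is counted in its parent at the moment it starts, so a sequence
or mapping used as a key still leaves the parent expecting a value once
the nested collection closes.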
From: Chris R. <cro...@di...> - 2006-09-01 16:51:51
Okay. More work on this. So, I'm trying to write an "intermediate"
layer between the stream coming from libyaml, and the "what is config
variable X?" layer I'll need for my applications. The core reason for
this, I think, is to just suck the whole stream into a data structure
in memory so I can then query it.

So, I think I have a couple of things I'm not sure of given my limited
exposure to YAML and libyaml.

I think the object I want to store in memory is always going to
be a single document/stream. (I'm also not 100% sure what
the difference between those two is, but I think it doesn't matter
for me right now.) So, the "object" (be it called "document" or
"stream"): Does it ever have more than one node in it?

Looking at what I'm getting from libyaml for my test config files,
it always seems to contain one MAPPING "node", which then contains
all the other things. Is this always the case? Is a YAML
stream/document always just a single "node", where a node can be a
sequence or a mapping? Can you have a stream that's not? I can't see
how, since I can't think how you'd have something that wasn't just a
sequence or mapping on the outside. Unless it was a single item. Is
that a single item, or a sequence with a single scalar inside of it?

Thanks. Then, I'll get it all read into memory. If anyone has already
done this sort of work, I'd love to see what you have. I know something
somewhat like this will likely later get added to libyaml, but unless
there's already code, I'll continue trying to write one. :-)

- Chris
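
[Editorial sketch] One possible shape for the in-memory structure
described above is a small tree of nodes, where a mapping stores its
children as alternating key and value entries. This is only an
illustration under stated assumptions and not part of libyaml: node,
node_new, node_append, node_get and node_free are hypothetical names,
and lookups handle scalar keys only.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { NODE_SCALAR, NODE_SEQUENCE, NODE_MAPPING } node_kind;

typedef struct node node;
struct node {
    node_kind kind;
    char *value;      /* NODE_SCALAR: owned copy of the scalar text */
    node **children;  /* sequence: items; mapping: key, value, key, value, ... */
    size_t count;     /* number of entries in children */
};

/* Allocate a node; value is copied and may be NULL for collections. */
node *node_new(node_kind kind, const char *value)
{
    node *n = calloc(1, sizeof *n);
    if (!n) return NULL;
    n->kind = kind;
    if (value) {
        n->value = malloc(strlen(value) + 1);
        if (!n->value) { free(n); return NULL; }
        strcpy(n->value, value);
    }
    return n;
}

/* Append child to a sequence or mapping; 0 on success, -1 on failure. */
int node_append(node *parent, node *child)
{
    node **grown;
    if (!parent || !child || parent->kind == NODE_SCALAR) return -1;
    grown = realloc(parent->children,
                    (parent->count + 1) * sizeof *parent->children);
    if (!grown) return -1;
    grown[parent->count++] = child;
    parent->children = grown;
    return 0;
}

/* Return the value stored under a scalar key in a mapping, or NULL. */
const node *node_get(const node *map, const char *key)
{
    size_t i;
    if (!map || map->kind != NODE_MAPPING) return NULL;
    for (i = 0; i + 1 < map->count; i += 2) {
        const node *k = map->children[i];
        if (k->kind == NODE_SCALAR && k->value && strcmp(k->value, key) == 0)
            return map->children[i + 1];
    }
    return NULL;
}

/* Free a node together with everything below it. */
void node_free(node *n)
{
    size_t i;
    if (!n) return;
    for (i = 0; i < n->count; i++) node_free(n->children[i]);
    free(n->children);
    free(n->value);
    free(n);
}
```

Storing keys and values in one flat array mirrors the order in which
the parser delivers them, so an event loop can append each finished
node to its parent without knowing in advance whether it is a key or a
value. The linear search in node_get is a simplification that should be
fine for small configuration files.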
From: Chris R. <cro...@di...> - 2006-09-01 17:20:51
On Sep 1, 2006, at 12:51 PM, Chris Ross wrote:
> I think the object I want to store in memory is always going to
> be a single document/stream. (I'm also not 100% sure what
> the difference between those two are, but I think it doesn't matter
> for me right now.) So, the "object" (be it called "document" or
> "stream"): Does it ever have more than one node in it?

I could still use any feedback anyone has on my original
questions, but I think I figured this one out. It looks like
the first "node" in the document must be the only node in
the document. If I put in multiple nodes (multiple scalars
that can't be scanned as a single scalar), it gives me an
error about expecting to find a "document start". If I put
in the "---", it then parses the stream successfully as
two documents.

So, I now see that a document seems to always be a
single "object", though that "object" can be a sequence
or mapping.

- Chris
From: Kirill S. <xi...@ga...> - 2006-09-02 11:27:03
Hi Chris,

On Fri, Sep 01, 2006 at 01:20:52PM -0400, Chris Ross wrote:
>
> On Sep 1, 2006, at 12:51 PM, Chris Ross wrote:
> > I think the object I want to store in memory is always going to
> > be a single document/stream. (I'm also not 100% sure what
> > the difference between those two are, but I think it doesn't matter
> > for me right now.) So, the "object" (be it called "document" or
> > "stream"): Does it ever have more than one node in it?
>
> I could still use any feedback anyone has on my original
> questions, but I think I figured this one out. It looks like
> the first "node" in the document must be the only node in
> the document. If I put in multiple nodes (multiple scalars
> that can't be scanned as a single scalar), it gives me an
> error about expecting to find a "document start". If I put
> in the "---", it then parses the stream successfully as
> two documents.
>
> So, I now see that a document seems to always be a
> single "object", though that "object" can be a sequence
> or mapping.

Yes, you are right in this analysis.

--
xi
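
[Editorial sketch] The conclusion above (exactly one root node per
document, any number of documents per stream) follows from the event
grammar given earlier in the thread. As a rough illustration, the
grammar can be written as a small recursive-descent checker over an
array of event kinds. This is a sketch under stated assumptions: the
event_kind enum is a stand-in for libyaml's event types, and cursor,
take, next_is, parse_node and valid_stream are hypothetical names. It
only checks the order of events and does not read any YAML text.

```c
#include <stddef.h>

/* Stand-in event kinds mirroring the grammar's terminals. */
typedef enum {
    E_STREAM_START, E_STREAM_END, E_DOC_START, E_DOC_END,
    E_ALIAS, E_SCALAR, E_SEQ_START, E_SEQ_END, E_MAP_START, E_MAP_END
} event_kind;

typedef struct {
    const event_kind *ev;  /* event sequence being checked */
    size_t n;              /* number of events */
    size_t pos;            /* index of the next unread event */
} cursor;

/* True if the next event is k (without consuming it). */
static int next_is(const cursor *c, event_kind k)
{
    return c->pos < c->n && c->ev[c->pos] == k;
}

/* Consume the next event if it is k. */
static int take(cursor *c, event_kind k)
{
    if (!next_is(c, k)) return 0;
    c->pos++;
    return 1;
}

/* node ::= ALIAS | SCALAR | sequence | mapping */
static int parse_node(cursor *c)
{
    if (take(c, E_ALIAS) || take(c, E_SCALAR)) return 1;
    if (take(c, E_SEQ_START)) {           /* SEQUENCE-START node* SEQUENCE-END */
        while (!next_is(c, E_SEQ_END))
            if (!parse_node(c)) return 0;
        return take(c, E_SEQ_END);
    }
    if (take(c, E_MAP_START)) {           /* MAPPING-START (node node)* MAPPING-END */
        while (!next_is(c, E_MAP_END))
            if (!parse_node(c) || !parse_node(c)) return 0;  /* key, then value */
        return take(c, E_MAP_END);
    }
    return 0;
}

/* stream ::= STREAM-START document* STREAM-END
   document ::= DOCUMENT-START node DOCUMENT-END */
int valid_stream(const event_kind *ev, size_t n)
{
    cursor c;
    c.ev = ev;
    c.n = n;
    c.pos = 0;
    if (!take(&c, E_STREAM_START)) return 0;
    while (take(&c, E_DOC_START))
        if (!parse_node(&c) || !take(&c, E_DOC_END)) return 0;
    return take(&c, E_STREAM_END) && c.pos == c.n;
}
```

A document holding two scalars in a row is rejected because the
document rule allows a single node between DOCUMENT-START and
DOCUMENT-END, while the same two scalars are accepted when each sits in
its own document.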