From: Michael G S. <sc...@po...> - 2002-12-04 21:33:06
Hi. I'm using YAML mostly in the context of config files and simple
meta-data. I'm also Ingy's roommate, and he got sick of me asking him
questions and told me to post to the list. I've got a lot of issues to
dump, so I'll try to break them down by post.

I use YAML mostly for perl module meta-data. MakeMaker and
Module::Build will soon be emitting and reading YAML spec files with
information about the perl module we're building. Nothing complicated,
mostly key: value stuff. Using YAML just means we have an agreed upon
format.

There's something of a dependency problem. Since Module::Build and
MakeMaker are both modules for building other modules, they shouldn't
have dependencies, which means they shouldn't depend on YAML.pm. So I
started writing a small YAML parser which handles only the subset of
YAML most useful for human-writable applications such as config files
and meta-data. The parser is basically a state machine destructively
parsing a single stream rather than working line-by-line. It's turning
out to be fairly simple, and Ingy said folks might be interested in the
approach.

I started by tearing out everything in YAML which is either difficult
to parse or not useful for human writers. I took the YAML spec, dumped
it to text and just cut things out. A copy of it can be had here:

    http://www.pobox.com/~schwern/tmp/YAML-POY/POY.spec

Things which went:

- Collection flows (i.e. {...} and [...])
- Complex, multi-line or quoted keys. All that's left is "foo: "
- Scalar only documents
- Folded scalars (i.e. >)
- Document end (i.e. ...)
- Directives (i.e. 3.3.2)
- Node Properties (i.e. 3.3.4)
- Transfer Methods (i.e. 3.3.5)
- Anchors (i.e. 3.3.6)
- Aliases (i.e. 3.4)

Things which were kept but might go:

- Single and double quotes (I've figured out an easy way to parse them)
- Unicode (a necessary evil)
- Explicit indentation (i.e. |2) (very easy to handle)
- Chomping (i.e. |-) (ditto)

Changes:

- All special characters which were removed (i.e. [, {, etc...)
  have been added to the reserved_yaml production to preserve forward
  compatibility.

The mini parser (such as it is) can be found at:

    http://www.pobox.com/~schwern/tmp/YAML-POY/lib/YAML/POY.pm

It only validates at this point and doesn't even do all of the mini
spec, but it's enough to prove the concept. It does the hard parts.

The code is laid out as a state machine. The states and transitions
can be seen in:

    http://www.pobox.com/~schwern/tmp/YAML-POY/States

Originally, it worked line-by-line. This made things very difficult
because a lot of extra states had to be added. Things like "More Single
Quote", "More Flow", "More Double Quote", "More Literal" etc... Trying
to properly parse single quotes made me give up on that approach.

Now it works as a destructive stream parser. Essentially, the whole
YAML document is copied into a single string (possible because I do not
expect large documents) and then simple search and replaces are done on
it, biting off chunks from the front of the string.

This means parsing reduces to a set of rather simple regexes looking at
the front of the string. State transitions reduce to calling another
subroutine based on which regex matched. No look-aheads, look-behinds
or prewalks. As each piece is parsed it is removed from the front of
the string.

Finding a map key and its indentation level is just:

    /^(\s*)([^:]+):[^\S\n]*/

Determining what value state you should go to is:

    /^\|\s*\n/                          # literal
    /^'/                                # single quote
    /^"/                                # double quote
    /^[$set_of_reserved_characters]/    # error: reserved
    anything else is a flow scalar

Single quotes, including escaping, are handled with a single regex:

    /( '[^']*' )+$/mx

and then unescaping is resolved easily:

    s/''/'/g

Flow scalars are:

    /^(.*)/                                # the first line
    s/(^\n {$min_indent_lvl,}.*$)+//m      # the rest

So that's the layout of the state parser for those who are interested.
I'll likely finish it up.

While writing the parser, I hit a bunch of things in the spec which
were odd.
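
To make the chunk-biting approach described above a bit more concrete,
here is a rough, self-contained Perl sketch. It is not the code in
POY.pm. The sub names (take_key, value_state, take_single_quoted,
take_flow_scalar, parse_flat_map), the reserved-character set and the
restriction to a flat map of single-quoted and flow scalars are all
assumptions made for illustration, and the regexes are lightly adapted
from the ones quoted above so that each one is anchored at the front of
the buffer.

```perl
use strict;
use warnings;

# Assumed stand-in for the reserved characters; NOT the real
# reserved_yaml production from the POY spec.
my $RESERVED = qr/[\[\]{}>&*!]/;

# Bite "key:" off the front of the buffer (passed as a scalar ref).
# Returns (indent, key), or the empty list when no key is at the front.
sub take_key {
    my ($buf) = @_;
    $$buf =~ s/^(?:[^\S\n]*\n)+//;    # drop line ends left by the last value
    return unless $$buf =~ s/^([ ]*)([^:\n]+):[^\S\n]*//;
    return (length($1), $2);
}

# Look at the front of the buffer and name the next state.
sub value_state {
    my ($buf) = @_;
    return 'literal'      if $$buf =~ /^\|\s*\n/;
    return 'single_quote' if $$buf =~ /^'/;
    return 'double_quote' if $$buf =~ /^"/;
    return 'reserved'     if $$buf =~ /^$RESERVED/;
    return 'flow_scalar';
}

# 'it''s' is matched as two adjacent quoted runs, 'it' and 's',
# so the doubled quote never needs special handling while matching.
sub take_single_quoted {
    my ($buf) = @_;
    return unless $$buf =~ s/^((?:'[^']*')+)//;
    my $text = substr($1, 1, -1);     # strip the outer quotes
    $text =~ s/''/'/g;                # unescape
    return $text;
}

# First line, then any continuation lines indented at least $min_indent.
sub take_flow_scalar {
    my ($buf, $min_indent) = @_;
    $$buf =~ s/^(.*)//;
    my $text = $1;
    while ($$buf =~ s/^\n {$min_indent,}(\S.*)//) {
        $text .= " $1";
    }
    $text =~ s/\s+$//;
    return $text;
}

# Driver: each state is a sub call chosen by what matched at the front.
sub parse_flat_map {
    my ($yaml) = @_;
    my $buf = $yaml;                  # private copy that gets eaten
    my %map;
    while (my ($indent, $key) = take_key(\$buf)) {
        my $state = value_state(\$buf);
        if ($state eq 'single_quote') {
            my $value = take_single_quoted(\$buf);
            die "unterminated single quote for '$key'\n" unless defined $value;
            $map{$key} = $value;
        }
        elsif ($state eq 'flow_scalar') {
            $map{$key} = take_flow_scalar(\$buf, $indent + 1);
        }
        else {
            die "state '$state' is not handled in this sketch\n";
        }
    }
    die "unparsed input left over\n" if $buf =~ /\S/;
    return \%map;
}
```

Calling parse_flat_map("name: 'Don''t Panic'\nversion: 1.0\n") should
give back a hash ref with name set to Don't Panic and version set to
1.0. Nested maps, literals and double quotes would each need their own
state sub along the same lines.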
I'm going to mercifully split my posts by issue, so more on that next.

--
Michael G. Schwern <sc...@po...> http://www.pobox.com/~schwern/
Perl Quality Assurance <pe...@pe...> Kwalitee Is Job One
I know you get this a lot, but what's an unholy fairy like you doing in
a mosque like this?