Read Me
Dapar
Copyright (c) 2011-2012 Daniel van Vugt <danv@users.sf.net>
All rights reserved. See LICENSE.txt for details.
INTRODUCTION
Dapar is a universal parsing library written in C. It interprets any grammar
you give it in a BNF-like format and constructs a matching expression tree for
any given input. This makes developing a new parser for any language simple
and reliable.
Features:
* Small and portable (all C code)
* Understands ASCII, Unicode, UTF-8, and UTF-32 input
* Automatic ambiguity detection and debugging
* Implements a superset of all common BNF features found in EBNF, ABNF and
  W3-BNF, using macros to generate native code
* Includes complete example grammars for parsing:
  - ABNF: Augmented BNF for Syntax Specifications (RFC 5234)
  - EBNF: Extended BNF (ISO 14977)
  - mathematics (simple and algebraic)
  - XML 1.0 (Fifth Edition)
BUILDING
Building Dapar is the same on Linux/Unix and Windows. On Windows you will
need to use "nmake" instead of "make".
    make configure
    make
Then always run the test cases to make sure nothing is broken:
    make test
USING DAPAR IN YOUR CODE
This is a simple introduction to using dapar in your code. For more complete
examples, please look in the examples/ subdirectory.
Start by defining your grammar using the special macros from dapar.h:
    #include "dapar.h"
    DAPAR_TOKN(integer) DAPAR_CHAR("0-9"), DAPAR_REPEAT_PREV(1), DAPAR_END
    DAPAR_RULE(sum) integer, "+", integer, DAPAR_END
    DAPAR_RULE(mylanguage) sum, DAPAR_OR, integer, DAPAR_END
Then use your grammar to initialize a dapar stream:
    dapar_stream_t m;
    dapar_err_t err = dapar_stream_init(&m, mylanguage);
Feed input to match against the grammar:
    err = dapar_stream_input_utf8(&m, "123+456", -1);
Finish the input and get a pointer to the resulting expression tree:
    const dapar_tree_t *tree;
    err = dapar_stream_end(&m, &tree);
Now use the expression tree in "tree" for whatever you need. Refer to examples/
and dapar.h for more information.
Always remember to free the memory you used before exiting:
    dapar_stream_free(&m);
Linking to dapar is simple because it is a static library: libdapar.a
(Linux/Unix) or dapar.lib (Windows).
ADVANCED BUILD OPTIONS
    make test VALGRIND=1
        Runs all test cases under valgrind. Succeeds only if valgrind reports
        0 errors.
    make VERBOSE=1
        Turns off the condensed ("pretty") make output so you can see the
        commands being run.
AMBIGUITIES
If dapar returns an error then it is almost certainly your fault. But
tracking down the cause of error 7 (DAPAR_ERR_AMBIGUOUS) in particular can be
tricky if you're not practiced at BNF grammar design. So here is an example:
    make configure
    make
    cd examples/mistakes
    ./mistakestest pi
    Error at line 1 col 1: Ambiguous grammar match detected
Clearly the grammar is ambiguous for the input "pi", so we need to see what
the different interpretations are. This is done with the disambiguate option,
-dN, of the dapar test programs, where N is the number of the interpretation
to reveal. An ambiguity always means there are 2 or more interpretations, so
start by comparing the first two:
    ./mistakestest -d1 pi
    ./mistakestest -d2 pi
Then compare the resulting expression trees. They show how the grammar can
generate two (or more) different interpretations of the same string.
mistakestest uses the dapartest framework (dapartest.c). You are encouraged
to use it to test your own grammars. For examples of how to do this, look at
examples/*/*test.c.
In some cases your test input may be too long for dapar to pinpoint exactly
where the ambiguity is, so it gives up early (as soon as it is sure there is
an ambiguity). In that case, gradually reduce the size of your test input,
checking that it still reproduces the ambiguity, until it is short enough for
dapartest to parse fully.
--
Daniel van Vugt, January 2012.