Looking for the latest version? Download bnf2xml-7.0.2.tar.gz (316.5 kB)
Name                   Modified     Size       Downloads / Week
README                 2016-04-08   19.4 kB    0
bnf2xml-7.0.2.tar.gz   2016-04-07   316.5 kB   44
bnf.c.prettier.txt     2013-06-26   24.9 kB    0
bnf2xml-5.tar.gz       2013-06-05   150.9 kB   0
bnf2xml-1.tar.gz       2013-05-04   102.2 kB   0
Totals: 6 items                     613.9 kB   4

ABOUT

bnf2xml is a BNF parser that is easy to use from shell applications because
its output is XML (other optional reports are also available). bnf2xml simply
reads input and prints XML output, marked up by the matched BNF grammar.

Most BNF tools are complex: they are meant to be used from compiled code
running inside programs, and they output complex trees. The BNF suites I have
seen that mention "XML" take XML as input. They do not generate XML output
(though they could with some coding, I'm sure).

To build this C++ program:

  $ tar -xzf bnf2xml-7.0.2.tar.gz
  $ cd bnf2xml-7.02/
  $ ./configure
  $ make
  $ make install
  $ bnf2xml -h
  $ man bnf2xml

-------------------------------------------------------------

DOCUMENTATION

This README covers bnf2xml specifics and has little BNF instruction. For
education on BNF in general, search Google for "BNF" and try
http://www.wikipedia.com. See also the examples in the tarball.

This is a BNF parser that:
  takes a file as input
  reads the BNF definition file
  identifies patterns in the input file using the BNF definitions
  outputs XML markup of the definitions matched

BETA

This is a beta release, not ready for heavier use. Testing is ongoing to make
it stable for its target: "old school EBNF" input and BNF files. The changelog
is at the end.

OUTPUT

bnf2xml outputs the highest BNF line matched, which will be the topmost
definition (the BNF line to match can be picked, though).

Report: the -C option outputs a text table of all matches, found or failed
(it is a trace of this table that makes the XML output). See AN OUTPUT
EXAMPLE below.

BNF

bnf2xml has some enhancements, but it does not follow, or cannot do, all "new
BNF" syntaxes; see (1). bnf2xml is well featured for its size, but does far
less than the ANTLR parser generator.

SYMBOL  DESCRIPTION
"a"     anything inside quotes is a terminal to match against input
<a>     a non-terminal, meaning a rule to look up and apply
|       logical OR, ie.
        <letter> | <digit> matches "a" or "1"
<a> <b> logical AND is otherwise assumed, ie. <alph> <digit> matches "a1"
[]      optional expressions: absorb input only if all expressions are true;
        the result is always true (matches 0 or 1 times)
{}      same as [] except it matches 0 or more times
()      group expressions: absorb input only if all are true
*       same as postfix }
+       <a>+ == <a> { <a> }
?       same as postfix ]

notes: at runtime bnf2xml only applies unary postfix ops, ie. <a>?. If you
look at the debug output, [ <a> <b> ] is rewritten equivalently as a new
symbol <c> that lists <a> <b>, with ] applied to it. See below about
expressions collected under one symbol, which is a different kind of OR and
often easier. (Different how? | is a short-circuit OR; see -o below.)

BNF FEATURES

OPS        <> | () {} [] "" + * ?
           fully reflexive, recursive

NON-BNF (features added):
operators  . - ^ ! # @ = ~
syntax     <> == "" "" "" ...
tokens     <BNF_QUOTE> <BNF_8bit> <BNF_ZERO> <MATCH_LIST> <MATCH_SEP>
           <RECORD_SEPARATOR>

* and ? are quiet on 0 matches. {} and [] are noisy on 0 matches (they emit
an empty <a></a>). {} and [] can also be written as <a> }, which avoids an
extra rule.

<> ==   special: match against a list of "string" alternatives (as with |)
        using binary search
=       assign: <a>= saves the text found by the last token into <a>
~       congruent: <a>~ can do a <repl> in a paired context (ie. like typedef)
@       quiet, shallow: the preceding item is quiet in the XML
@@      quiet, deep: any matches it made are also quiet
^       reflex: just alternate syntax for compounded *, see below
.       skip the input 1 forward
-       (dash) skip the input 1 back
$       "n"$ sets the next skip to n forward ("5"$ . ; skips 5)
$`      "n"$` sets the next skip to n forward and emits the skipped data
#       test the previous token's truth, throw out the result
!       reverse the previous token's truth, throw out the result
`       emit the last terminal or token if the rule line is true
``      emit one deep: look up the token, then emit (` is the left quote)
.`      emit the current input char; does not skip input
-`      (dash) emit the current input position, converted to decimal
%`      emit an incrementing counter in decimal
&       quit: stop, and print if --streaming didn't already (see LIMITATIONS)
&&      forget past input blocks, print if streaming, continue

Use only one postfix per object (except @; @ must be last).
  postfixes:     ~ = ] } * + ? ` $
  independents:  | # ! & . - %
  * ? + ! # ^ ` @ ~ = & _ $ must be attached (no whitespace before them)
  example: <a> ? is <a> "?", which matches "?" as a character; <a>? is ok

The attachment behavior can be avoided by compiling with "NO_UNQUOTED". (The
original idea was that these cryptic ops, which are easier to work with once
learned, could be written as <full_name>s and converted to the character ops
with sed(1) before use as a BNF file, avoiding extra BNF parse coding.) See
"Extra OPS" and LIMITATIONS for more details.

Remaining plain old EBNF (but with pretty XML) is a goal. With ~ and = now
added, bnf2xml is considered complete: it is thought to do all it needs to for
the intended output goal. Please comment if you think otherwise.

ABNF / EBNF SUPPORT

No. However, ::= can be substituted for = by sed(1) before the BNF is parsed.
And though I have no plan for it, the BNF file could define some EBNF in terms
of BNF (and then have the EBNF appended to it).

BNF ENTRY and EXIT

BNF needs a "starting left token"; <bnf> is used below. bnf2xml handles lists
everywhere, even at the top level. The top level "runs" slightly differently
(ie. -l, see options).

  <bnf>   ::= <prog1>              ; a single token at top level
  <bnf>   ::= <prog2>              ; or a list of entries at top
  <prog1> ::= hi                   ; bnf rules
  <prog1> ::= hello " " world      ; bnf rules
  ...

Exit: the search must match all of the input (including the last EOL) or it
fails, unless you use the special symbol designed to quit early (see QUIT).

Questionable exit: it is unknown whether <a> should consume all the input
before " "? is checked.
If the input is "hi ()", the 1st <a> matches, then " "?, and then the match
fails because "()" is left over: it never tried the 2nd <a>.

  <bnf> ::= <a> " "?
  <a>   ::= hi
  <a>   ::= hi " ()"

A sure exit:

  <bnf> ::= <a>
  <a>   ::= hi " ()" " "?
  <a>   ::= hi " "?

(Allowing two <a> rules, tried for truth in turn, may be a feature specific to
bnf2xml. It is convenient and avoids complications; see OR.)

QUOTING

The default BNF quote is "; use -b "'" to set BNF_QUOTE to '.
BNF quotation indicates what is NOT BNF syntax. BNF syntax uses ::= and
<>(){}[], space, tab, EOL, and also +*? when not preceded by any space. ex:

  "<a>"          ; is not a bnf symbol
  "hi there"     ; is one string, and the quotes are removed

About matching the BNF_QUOTE char itself: there are NO BNF QUOTING RULES, and
quotes must be in pairs. (sh(1), by contrast, has rules to subvert quoting,
which is error prone.) Quoting is absolute, so BNF_QUOTE is "built in":

  <BNF_QUOTE>hi there<BNF_QUOTE>   matches "hi there" including the quotes

BNF allows line splitting (last char \), which absorbs spaces and EOL.
BNF only sees the start of a new rule when < is at the beginning of a line
(and is followed by ::= and is not inside BNF_QUOTE):

  <a> ::= <b> ...

<BNF_ZERO> is also needed if a "string" must contain a binary 0, because
strcpy(3) is used in reading the bnf, and it stops at 0 bytes. (The input file
has the binary 0 being matched, not the tag, of course.)

EOL

How it is used or not used:

  $ echo "ask" | ./bnf2xml
  $ echo "take 1the 0door" | ./bnf2xml --loop

--loop doesn't match <EOL>: it is designed to take input one line at a time,
so it matches bnf lines that don't end in <EOL>. (Note: two <EOL> match
echo -e "\n", because echo adds its own trailing newline.)

NOTES

See the tarball for further README files and examples.

""      <a> ::= ""         ; matches no input, or fully absorbed input

@       <a>@               ; quiets the last token or terminal. Shallow means
                           ; <a></a> will not appear, but what is found by
                           ; <a> does. "and"@ also works, and nothing prints.
                           ; @ must be last, ie. <a>+@ is ok.

^       reflex is similar to *     ; avoid reflexes to be more compatible
        <a> ::= <b>^               ; simple reflex
        <a> ::= <b>^ <c>^ <d>      ; fully reflexive

<repl>  simple replacing           ; is better done by an input prefilter
        <b> ::= <repl> = "a" "b"   ; if "a" is matched by <b>, "c" is the
        <b> ::= <foo> == "c" "d"   ; answer (the next line is always used)

=       = only sets (repl can be achieved with more bnf rules)
        <b> <a>=           ; <- saves the text found by <b> into <a>
        <a> ::= "foo"      ; <- make sure <a> is a defined terminal
                           ; = does not pair symbol <b> to symbol <a>
        = does not reset in a failed rule; it is sticky.

~       ~ is for replacing items (or paired lists) only if they are found in a
        certain context. Like =, <b> <a>~ saves <b> into <a>.
        ex. in C, "typedef int I" means replace every I with int, if I is
        found and is an identifier previously found in a typedef (repl);
        otherwise I is left alone by ~.

        <b>   ::= <c>+ <tfe>     ; for a full example of ~ see junk2/bnf.8
        <c>   ::= typedef " " <ident> <tie>~ " " <ident> <tfe>~ " ; "
        <tfe> ::= <repl> ==      ; see <repl> above, and
        <tie> ::= <foo> ==       ; don't forget to put these two lines in the bnf

        <b> matches the typedef statement itself in the input. If that
        happens, <tie> and <tfe> are set up by ~ while matching the statement,
        so that <tfe> becomes <tie> when <tfe> is matched in its context
        (<ident>). (Here typedef is a plain string; see junk2/bnf.8.) Why is
        the list after == blank?
        The elements of == are set, as with =, while scanning: they are
        filled in by whatever (say, a typedef) is found in the input.
        With ~, the pairs above replace <tfe> with <tie> automatically,
        because <tfe> uses <repl>.
        ~ can produce lists, which use <MATCH_LIST> in the XML.
        ~ uses <> == "" "" "" and can be used in ways not shown above.
        ~ resets within a failing rule, but see LIMITATIONS.
        See c.bnf (or bnf.c.txt).

{{}}    A run of rules with the same LHS is tried in order, as if ORed (the
        example shows recursion too). For example, with the input
        "{{inside1}}" or "{{}}":

        <a> ::= "inside1"
        <a> ::= "{" <a>* "}"       ; note <a> is not automatically true
        <a> ::= "inside2"          ; <a>* allows "{{}}" to match

        (The example is simple brace matching on the input "{"; the XML
        would show what is captured by the braces. The .bnf examples have
        more on that.)

However, the following is different, because <a> is both first and inside an
OR:

  <a> ::= <b> | <a><d> | <c>

If no other <a> is defined, this needs -o, because | uses "short-circuit
logic": <a><d> is tried only after <b> fails, then <a><d> fails, and an
infinite loop results. The -o argument converts the above into three <a>
rules (a different kind of OR), diverting bnf2xml's front recursion. Its use
is diminishing because it is unnecessary, and it may change.

bnf2xml does not assume it should tail-recurse the tail of any front
recursion. But if it did (right from the K&R C book):

  <a> ::= <a> "[]"
  <a> ::= <a> "()"
  <a> ::= <ident>

  x[][][]...        ; would match and be ok, interpretively
  x()()()...        ; is not intended to be allowed
  x()[]()...        ; also not intended to be allowed

Note that defining <c> ::= <a> does not hide that <a> is front recursive, and
<c> still needs a termination rule. When <a>'s 1st rule front-recurses (see
above), that rule is shut off until it completes (otherwise an infinite loop
would occur). bnf2xml does not retry <a>, or parts of it, automatically; use +
for that. If bnf2xml did these things, the recursion results might be
ambiguous, and bnf2xml has no syntax to handle ambiguity. This may be improved
in future if ambiguity would not result.
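The ordered trying of same-LHS rules and the front-recursion guard described above can be modeled with a small C++ sketch. It is a model under assumptions, not bnf2xml's code. The class FrontRecursionDemo and its members are hypothetical, and the K&R grammar is hard-coded as three <a> rules. <ident> is assumed to be one or more letters. A rule is shut off for as long as it is running, not per input position. Under those assumptions x, x[] and x() match the whole input, but x[][] does not, because the guard stops rule 0 from tail-recursing into itself.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical illustration, not bnf2xml source. Three same-LHS rules,
// tried in order as separate <a> lines:
//   rule 0: <a> ::= <a> "[]"
//   rule 1: <a> ::= <a> "()"
//   rule 2: <a> ::= <ident>
class FrontRecursionDemo {
public:
    explicit FrontRecursionDemo(std::string input) : in_(std::move(input)) {}

    // True only if <a> consumes the whole input, as bnf2xml's exit requires.
    bool matchesAll() {
        std::size_t pos = 0;
        return a(pos) && pos == in_.size();
    }

private:
    std::string in_;
    bool active_[3] = {false, false, false};  // is the rule still running?

    // Match a quoted terminal at pos; advance pos only on success.
    bool lit(std::size_t& pos, const std::string& s) const {
        assert(pos <= in_.size());  // pos never runs past the input
        if (in_.compare(pos, s.size(), s) != 0) return false;
        pos += s.size();
        return true;
    }

    // <ident>: one or more letters (an assumption for this sketch).
    bool ident(std::size_t& pos) const {
        const std::size_t start = pos;
        while (pos < in_.size() &&
               std::isalpha(static_cast<unsigned char>(in_[pos]))) {
            ++pos;
        }
        return pos > start;
    }

    bool tryRule(int rule, std::size_t& pos) {
        switch (rule) {
            case 0:  return a(pos) && lit(pos, "[]");
            case 1:  return a(pos) && lit(pos, "()");
            default: return ident(pos);
        }
    }

    // Try each <a> rule in order. A rule that is already on the call stack is
    // skipped until it returns, so front recursion cannot loop forever.
    bool a(std::size_t& pos) {
        for (int rule = 0; rule < 3; ++rule) {
            if (active_[rule]) continue;
            const std::size_t start = pos;
            active_[rule] = true;
            const bool ok = tryRule(rule, pos);
            active_[rule] = false;
            if (ok) return true;
            pos = start;  // undo any partial match before the next rule
        }
        return false;
    }
};
```

Usage: FrontRecursionDemo("x[]").matchesAll() returns true. To accept x[][]..., the grammar would need explicit repetition, as the README advises with +, rather than relying on front recursion.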

NEW: I believe the <a> grammar above is "flawed": it should be read as bison
syntax (a bottom-up parser), not necessarily as BNF. Bison does not
front-recurse in the top-down sense. Given bottom-up rules, it decides for
itself what to match. The above depends on such states, so the appearance of
front recursion in the syntax isn't what it seems. For example, it may include
lower matches; I'd have to try it.

Be warned that the operators -, ! and # could cause side effects when used
with front recursion: a rule that is true with no input progress may loop.

(undivert: end of the digression)

BNF AUTO RULES

Automatically added rules:

  <a> ::= ( <c> ... ) <d>

gets rewritten to:

  <a>   ::= <) 1> <d>
  <) 1> ::= <c> ...

This is because unary ops are used, and there is no scanning to "search for )"
(compiled regex does that). The contents "<c> ..." are put under one left
token, <) 1>, so the truth of "<c> and ..." is tested (known). See -d, which
shows the table after the rewrite (if any).

Shortcut: using <a> } in bnf.txt avoids the above, since <a> can be composite
(<a> can be a list of rules to try). No -o or auto rules are needed or added
if one uses repeated <a> lines instead of (multiple items), {mi} and [mi] rule
lists. As seen above, multiple <a> rules are each tried in order (if one is
recursing, it is turned off until it returns, to avoid infinite recursion).

This is NOT supported (all <a> rules must appear together in the bnf file):

  <a> ::= ...
  <b> ::= ...
  <a> ::= ...

TERMINALS is now only used by options like -g, to say where to stop
(preferential).

EXTRA OPS

Dot . and Minus - move the input's string position forward or back 1
character, or n characters if preceded by "n"$. This is not the same as
matching, and it is reset on failure of the rule.

Not ! changes the return value and undoes any matching. Note that <a>+! may
not be supported.

# Test keeps the return value but undoes any matching.

` Emit emits the previous token or terminal when the rule line is true; ``
looks up the symbol one deep first; .` (dot) emits the current input char; -`
the input position; %` a counter; "n"$` what was skipped.

@ Quiet quiets the last token or terminal, and @@ quiets deep.

= Equal assigns the text of the previous match to a token, and ~ Congruent is
like Equal with lists, as described above.

& Quit prints, then exits with the truth of <a>, as if <a>?&. && forgets past
input blocks, prints if streaming, and continues.

<MATCH_LIST> only occurs due to ~, if the bnf allowed R to be non-unique.

QUIT

<a>& prints what is currently known and exits with the truth of <a>. Failure
of <a> does not cause the rule to fail or block the & action. Think of it as
<a>?& and write rules accordingly.

~ can cause <MATCH_LIST> <MATCH_SEP> in the XML if your BNF ended up allowing
a list of R that are the same (see the notes on ~).

EXAMPLES

junk2/ contains examples and regression tests, meant to be used like:

  $ sh junk2/tst

AN OUTPUT EXAMPLE

c.bnf is an almost full K&R C bnf for use with bnf2xml. It has not been tested
on much yet except the regression tests.

  # see README.bnf.c.txt
  $ echo -e "int main(){;}" | ./bnf2xml ./junk2/c.bnf --loop | junk2/g4

  <bnf><program><external-definition><function-definition><decl-specifier-dd><type-specifier><type>int</type></type-specifier></decl-specifier-dd><space> </space><function-declarator><declarator-function><declarator-simple><identifier>main</identifier></declarator-simple><fun>(<parameter-list></parameter-list>)</fun></declarator-function></function-declarator><function-body><type-decl-list></type-decl-list><function-statement>{<declaration-list></declaration-list><statement-list><statement>;</statement></statement-list>}</function-statement></function-body></function-definition><data-definition-list></data-definition-list></external-definition></program></bnf>

Some of the output above was marked in c.bnf to be emitted even if empty. The
tags could be tree formatted or used as colors for HTML; or, ie., an XML
parser could pull all <identifier></identifier> elements as if from a
database.

bnf2xml isn't a compiler, and c.bnf is not complete K&R. c.bnf was "ok" for
testing bnf2xml for bugs (actually, c.bnf is not good at showing bugs; the
bugs were more tenacious than that).
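To show how the <identifier> values could be pulled out of the output, here is a naive C++ sketch. The function collectElements is hypothetical and does plain string searching, not real XML parsing. It assumes the element has no attributes and does not nest inside itself, which holds for <identifier> in the sample output above. Any character escaping bnf2xml may do is left as-is. For anything beyond a quick query, a real XML parser is the better choice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Naive sketch, not part of bnf2xml: collect the text inside every
// <tag>...</tag> element of bnf2xml's XML output. "<tag>" including its ">"
// is searched for, so longer names such as <identifier-list> are not matched.
std::vector<std::string> collectElements(const std::string& xml,
                                         const std::string& tag) {
    assert(!tag.empty());  // an empty tag name is not meaningful
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::vector<std::string> found;
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::size_t start = pos + open.size();
        const std::size_t end = xml.find(close, start);
        if (end == std::string::npos) break;  // unterminated element: stop
        found.push_back(xml.substr(start, end - start));
        pos = end + close.size();
    }
    return found;
}
```

Usage: run on the sample output above, collectElements(xml, "identifier") returns a vector holding the single string "main".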

c.bnf may or may not be completed in the future. junk2/ includes basic test
scripts that check many features of bnf2xml for regressions.

DEBUGGING

-d shows the terminals and symbols table, and can often show relative
real-time progress while matching (with -C it shows the full truth table,
which may be very long).

  $ bnf2xml -d -d -k -C | less
  $ bnf2xml -l 123          # may be useful
  # the & operator should be useful
  # or put a printf in the code

LIMITATIONS

It is possible to define an infinite loop that does nothing.

  <a> | <b>^       ; ^ is not designed to stay within |, goes to <a>
  <a>+^+           ; ^ is not designed for postfix ops and may be wrong,
                   ; ie. the 1st + clobbers the 2nd; + isn't a saved state
  [<a>]+           ; senseless or double logic is not checked
  <a>~ <a>=        ; always saves under the first of a list of <a>
                   ; = does not reset if the rule fails; saves only
  <a>~             ; if <repl> pairing is used it must be on R
                   ; ex. typedef <tdt> <Lt>~ <ident> <Ri>~
  "100"$` .        ; lapping the input length just shows a warning;
                   ; start > len of skip (maybe due to $$) exits

There is only ONE unary postfix op (except @), so organize your work by
nesting definitions.

  postfixes:     ~ = ] } * + ? ` $
  independents:  | # ! & . - %

  ex: <a>+@ ok
      <a>+= wrong
      <a>+` wrong
      <b>`  ok
      & is independent, so <a>+& works
      + and & must be attached, so <a> + & is <a> "+" "&"

<a>& without --streaming can't know how unfinished searches above it could
end up, so results are limited: searches below the current one that succeeded
print, and unfinished upper labels are shown.

<MATCH_LIST> is only generated by ~.
<BNF_RECORD> is only generated if options select multiple lines.
The bnf file itself cannot contain binary 0's or "'s; see <BNF_ZERO> and
<BNF_QUOTE>.
The bnf file itself cannot have strings longer than #define MAXN 8096.
Independents must be quoted if meant literally; this is not optional.

EFFICIENCY

  <a> ::= <> == "ab" "abe" "abel" ...

is more suitable than | for long string lists. A binary search into the list
is done; note that the entries must be pre-sorted. The special keyword <repl>
uses the next line for pairing.

For BNF efficiency, use common sense.
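Here is a minimal C++ sketch of looking up the input in a pre-sorted <> == list by binary search. This README does not show bnf2xml's actual lookup, so the sketch is only a model under assumptions. SortedList and longestMatch are hypothetical names. The entries are assumed to be sorted in byte order, the same order strcmp(3) gives. When several entries match at the current position (like "ab", "abe" and "abel"), the longest one is assumed to win. Each candidate length costs one O(log n) binary search, instead of trying every | alternative in turn.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, not bnf2xml source: match the input at pos against a
// pre-sorted list such as   <a> ::= <> == "ab" "abe" "abel" ...
class SortedList {
public:
    explicit SortedList(std::vector<std::string> entries)
        : entries_(std::move(entries)) {
        // The README requires pre-sorted entries; binary search relies on it.
        assert(std::is_sorted(entries_.begin(), entries_.end()));
        for (const auto& e : entries_) maxLen_ = std::max(maxLen_, e.size());
    }

    // Length of the longest entry that starts at input[pos], or 0 if none does.
    // Tries lengths from longest to shortest, one binary search per length.
    std::size_t longestMatch(const std::string& input, std::size_t pos) const {
        if (pos >= input.size()) return 0;
        for (std::size_t len = std::min(maxLen_, input.size() - pos); len > 0; --len) {
            if (std::binary_search(entries_.begin(), entries_.end(),
                                   input.substr(pos, len))) {
                return len;
            }
        }
        return 0;
    }

private:
    std::vector<std::string> entries_;
    std::size_t maxLen_ = 0;
};
```

Usage: with the entries "ab" "abe" "abel", longestMatch("abex", 0) returns 3, because "abe" is the longest entry that starts the input.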

If you match something many times, it makes sense to make it a symbol that is
already matched.

SCAN SPEED

bnf2xml can be "slow", but progress should stay even for a file of any size.

TODOS

  finish the K&R C c.bnf, maybe C version 1
  maybe a few speed improvements, though it will never be "hand written
  scanner fast" at all
  remove the VEC lib for efficiency (it can have "C++'s need-3-copies
  syndrome")

COMPARISONS

sed(1) and grep(1) use regular expression definitions but aren't set up well
to parse a language; they are better for small tasks. A regex can be compiled
into your C app to match and get return value(s). sed(1) can be used to parse,
ie., a very simple language used in a program's user-editable startup .cfg
file.

awk(1) is wonderful: its matching abilities are formidable, and its language
is plain. awk has limitations (such as memory, and lag if resetting RS). awk
isn't well suited for all things (or is it?).

cpp can be used to pre-process an input file, which adds macro ability to any
input file (usually fed to a parser; see also m4).

yacc/flex can be used to make a .c file that parses a set pattern and calls
set functions when called to read input (it may be ambiguous and require
C-coded fixups, per pattern). They are complex but speedy once done:
"compiler flexors" read source code, and the functions output what the linker
wants to know, with fixups. flex is straightforward for easy languages, so
many C programs use these C flexor functions to parse data or user input
files.

XSLT parsers are newer, and some use BNF. They tend to expect a particular XML
environment and do HTML output.
ie. CDuce.

<snip> a little has been removed from the README viewable on the "files
download page" </snip>

LICENSE

GPL2

-------------------------------------------------------------

ChangeLog (last two entries)

bnf2xml (7.0.2) beta; urgency=low

  * yes, same version; just the README is revised, a little more readable, I
    hope
  * future: I believe the c.bnf typedef issue mentioned was fixed in c.bnf,
    but it is not released yet, as I'm still working on other mentioned
    issues for a 7.0.3

 -- John Hendrickson <debguy@sourceforge.net>  Thu, 07 Apr 2016 15:48:15 -0400

bnf2xml (7.0.1) beta; urgency=low

  * added autoconf support, renamed files, etc., tidied a little

 -- John Hendrickson <debguy@sourceforge.net>  Mon, 22 Dec 2014 12:39:14 -0500

Source: README.txt, updated 2016-04-08
