For some time I have been trying to adapt re2c to my
needs. I have not finished, and I will not have time to
continue for a while. Basically, I
reorganized the sources (more modular and probably
easier to understand), added some comments (mostly for
doxygen) and implemented some changes that I hope
others might find useful, too. The changes are
described in README.in,
together with some comments on the directories. Please
take a look at it.
Note on compiling: for production #define NDEBUG to
turn off assert()'s. Otherwise just ./autogen.sh;
./configure; make; make check; sudo make install;
should work.
The resulting program is named re3c-re2c; it is
compatible with re2c (at least to the extent exercised by
make check), although the code emitted is not
identical. (See also the -c flag.)
make doxy : runs doxygen (HTML output under doxy/)
Unfortunately I cannot upload everything:
Error: Uploaded file must be >20 and <256000 bytes.
Removed doc/, test/*, examples/* to accommodate.
(Kept test/trailing-var.* and examples/cmmap_re.c)
reorganized sources
Logged In: YES
user_id=271023
If you would provide a patch against head that has all the
docs I would happily apply it. Rearranging the source
however doesn't make much sense and you would lose
history completely.
Logged In: YES
user_id=1371979
> If you would provide a patch against head that has all the
> docs I would happily apply it.
Unfortunately I do not have the time to do that, which is
why I decided to upload it 'as is'. I found it particularly
hard to understand some fields that were reused (like depth
and link in class State). These are also hard to document
(one needs to describe the context). I usually tried to
avoid such reuse, and moved some of the uses to temporaries
passed as arguments (e.g.
re3c::SCC::state_depths_t state_depths( dfa->n_states() );
and re3c::SCC::state_link_t state_link( dfa->n_states() );
in re3c/dfa_pp/dfa_find_sccs.cc). These transformations
helped me to understand the code. Some of them might also
be the reason why re3c-re2c is somewhat slower on small
inputs. Another thing I found hard to understand was the
semantics of the DFA after it has been edited by
void DFA::split(State *s); et al. (what the subclasses of
class Action represent, etc.). I have removed the Action
stuff, and use class dfa_acceptor_t: public dfa_label_t
{...} and other descendants of dfa_label_t instead.
So at least some of the documentation is not easy to apply
to re2c, and part of the documentation is the code
reorganization itself.
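To illustrate the "temporaries passed as arguments" idea, here is a
minimal sketch of Tarjan's SCC algorithm in which depth and link live
in scratch vectors sized by the number of states, not in the states
themselves. All names here (Dfa, SccScratch, visit, find_sccs) are
hypothetical and chosen for illustration only; this is not the code in
re3c/dfa_pp/dfa_find_sccs.cc, and the Dfa struct is just an adjacency
list standing in for the real class.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a DFA: state i has edges to every state in succ[i].
struct Dfa {
    std::vector<std::vector<std::size_t>> succ;
    std::size_t n_states() const { return succ.size(); }
};

// Per-traversal scratch data, indexed by state number (assumption: states
// are numbered 0..n-1). Nothing here is stored in the states themselves.
struct SccScratch {
    std::vector<std::size_t> depth;  // visit order, 0 = not visited yet
    std::vector<std::size_t> link;   // lowest depth reachable from the state
    std::vector<bool> on_stack;
    std::vector<std::size_t> stack;
    std::size_t next_depth = 1;
    explicit SccScratch(std::size_t n)
        : depth(n, 0), link(n, 0), on_stack(n, false) {}
};

// Recursive step of Tarjan's algorithm; appends finished components to sccs.
static void visit(const Dfa& dfa, std::size_t s, SccScratch& t,
                  std::vector<std::vector<std::size_t>>& sccs) {
    t.depth[s] = t.link[s] = t.next_depth++;
    t.stack.push_back(s);
    t.on_stack[s] = true;
    for (std::size_t to : dfa.succ[s]) {
        if (t.depth[to] == 0) {               // tree edge
            visit(dfa, to, t, sccs);
            t.link[s] = std::min(t.link[s], t.link[to]);
        } else if (t.on_stack[to]) {          // edge back into the current path
            t.link[s] = std::min(t.link[s], t.depth[to]);
        }
    }
    if (t.link[s] == t.depth[s]) {            // s is the root of a component
        std::vector<std::size_t> comp;
        std::size_t v;
        do {
            v = t.stack.back();
            t.stack.pop_back();
            t.on_stack[v] = false;
            comp.push_back(v);
        } while (v != s);
        sccs.push_back(comp);
    }
}

// The scratch data is a local temporary, so it is gone once the SCCs are known.
std::vector<std::vector<std::size_t>> find_sccs(const Dfa& dfa) {
    SccScratch scratch(dfa.n_states());
    std::vector<std::vector<std::size_t>> sccs;
    for (std::size_t s = 0; s < dfa.n_states(); ++s)
        if (scratch.depth[s] == 0) visit(dfa, s, scratch, sccs);
    return sccs;
}
```

The extra vectors cost an allocation per call, which is one plausible
source of overhead on small inputs, but the meaning of each field is
local to the traversal and can be documented in one place.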
> Rearranging the source however doesn't make much sense and
> you would lose history completely.
I understand your worry about losing history. When I tried
to understand the code, I felt I had to reorganize it,
but you probably already understand it as it is (which I did
not at the time).
Of the user-visible changes (in the uploaded README.in), some
might be worth repeating:
1)
a) allow re2c { ... } syntax instead of /*!re2c ... */
b) allow keyword RE_t before named regexps.
RE_t ID = letter (letter|digit|underscore)* ;
instead of
ID = letter (letter|digit|underscore)* ;
(1a) would allow tools like cscope, emacs and doxygen
to process the ... part (e.g. point out uses of variables
and functions); (1b) makes it look more like
a declaration with initialization. The underscore in
identifiers is also new; it allows long_id instead of LongId.
(Compare the uploaded examples/cmmap_re.c with its original
examples/cmmap.re in the distribution.)
2) diagnose variable-length trailing context as an error (or
handle it). (e.g. ' "abc"/"d"+ { RET(ABC); }' )
See: test/trailing-var.re
3) gcc-style error messages (include input file name in
message)
This would allow e.g. emacs to jump to the error.
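As a sketch of what (3) means in practice: tools such as emacs can
parse messages that start with file:line: and jump to that location.
The helper below is hypothetical (the name format_diagnostic and the
exact wording are assumptions, not the re3c code); it only shows the
layout, without a column number.

```cpp
#include <string>

// Hypothetical helper: builds "file:line: error: message", the layout gcc
// uses when no column is reported.
std::string format_diagnostic(const std::string& file, unsigned line,
                              const std::string& message) {
    return file + ":" + std::to_string(line) + ": error: " + message;
}
```

For example, format_diagnostic("scanner.re", 42, "unterminated string")
yields scanner.re:42: error: unterminated string.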
Docs/examples comments:
4) [^] is probably a better expression for 'any' than
[\000-\377], as it does not depend on the size of
characters. I would suggest using it in the examples.
5) I would promote the use of "" instead of [],
because I would expect [] to mean the same as [a]\[a].
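Regarding (2), a hand-written sketch of why variable-length trailing
context needs special handling. For a rule like "abc"/"d"+ the token
is "abc", but the scanner must read an unknown number of d's to
confirm the match. With a fixed-length context the cursor could be
moved back by a constant; here the end of the token has to be saved
before the context is scanned. The function name and the return
convention are assumptions for illustration; this is not code emitted
by re2c or re3c-re2c.

```cpp
#include <cstddef>
#include <cstring>

// Matches the rule "abc"/"d"+ at the start of a NUL-terminated input.
// Returns the length of the token (the part before the slash),
// or -1 if the rule does not match.
std::ptrdiff_t match_abc_before_ds(const char* input) {
    if (std::strncmp(input, "abc", 3) != 0) return -1;
    const char* marker = input + 3;      // end of the token, saved before the context
    const char* cursor = marker;
    while (*cursor == 'd') ++cursor;     // context "d"+ : one or more, length unknown
    if (cursor == marker) return -1;     // no 'd' at all: the context failed
    // The context length (cursor - marker) varies per input, so the token
    // end comes from the saved marker, not from cursor minus a constant.
    return marker - input;
}
```

Both "abcd" and "abcddd" give a token length of 3, while "abcx" does
not match.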