#15 reorganized sources

Antal K

For some time I have been trying to adapt re2c to my
needs. I did not finish, and I will not have time to
continue for a while. Basically, I reorganized the
sources (more modular and probably easier to
understand), added some comments (mostly for doxygen)
and implemented some changes that I hope others might
find useful, too. A description of the changes is in
README.in, together with some comments on the
directories. Please take a look at it.

Note on compiling: for production, #define NDEBUG to
turn off the assert()s. Otherwise ./autogen.sh;
./configure; make; make check; sudo make install
should just work.
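
To illustrate the NDEBUG note: a minimal C++ sketch, not taken from the re3c sources, of a debug-only check that disappears when NDEBUG is defined before <cassert> is included. The names checked_index and asserts_enabled are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns v[i], with a debug-only bounds check.
// Built with -DNDEBUG (or "#define NDEBUG" before <cassert>), the
// assert() line expands to nothing and costs nothing at run time.
inline int checked_index(const std::vector<int>& v, std::size_t i) {
    assert(i < v.size() && "index out of range");
    return v[i];
}

// Reports whether this translation unit was compiled with asserts on.
inline bool asserts_enabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}
```

With autoconf-generated build systems the macro is typically passed on the command line, e.g. ./configure CPPFLAGS=-DNDEBUG, rather than edited into the sources; whether this package honours that is an assumption.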

The resulting program is named re3c-re2c. It is
compatible with re2c (at least to the extent exercised
by make check), although the code emitted is not
identical. (See also the -c flag.)

make doxy : runs doxygen (HTML output under doxy/)

Unfortunately I cannot upload everything:
Error: Uploaded file must be >20 and <256000 bytes.

Removed doc/, test/*, examples/* to accommodate.
(Kept test/trailing-var.* and examples/cmmap_re.c)


  • Antal K

    reorganized sources

  • Marcus Börger

    • priority: 5 --> 1
    • status: open --> open-rejected
  • Marcus Börger

    If you would provide a patch against head that has all the
    docs, I would happily apply it. Rearranging the source,
    however, doesn't make much sense, and you would lose the
    history completely.

  • Marcus Börger

    • status: open-rejected --> closed-rejected
  • Antal K

    > If you would provide a patch against head that has all the
    > docs, I would happily apply it.
    Unfortunately I do not have the time to do that, which is
    why I decided to upload it as is. I found it particularly
    hard to understand some fields that were reused (like depth
    and link in class State). These are also hard to document
    (the context needs to be described). I usually tried to
    avoid these reuses, and moved some of the uses to
    temporaries passed as arguments (e.g.
    re3c::SCC::state_depths_t state_depths( dfa->n_states() );
    and re3c::SCC::state_link_t state_link( dfa->n_states() );
    in re3c/dfa_pp/dfa_find_sccs.cc). These transformations
    helped me to understand the code. Some of them might also
    be the reason why re3c-re2c is somewhat slower on small
    inputs.

    Another thing I found hard to understand was the semantics
    of the DFA after it has been edited by
    void DFA::split(State *s); et al. (what the subclasses of
    class Action represent, etc.). I have removed the Action
    stuff, and use class dfa_acceptor_t: public dfa_label_t
    {...} and other descendants of dfa_label_t instead. So at
    least some of the documentation is not easy to apply to
    re2c, and part of the documentation is the code
    reorganization itself.
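
The idea of moving reused per-state fields into temporaries can be sketched as follows. This is a minimal illustration, not the code in re3c/dfa_pp/dfa_find_sccs.cc: the Graph type, SccWork, find_sccs and the vector names are assumptions, and the algorithm shown is Tarjan's strongly connected components algorithm with depth and link kept in side tables sized by the number of states instead of in the state objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a DFA: g[s] lists the successors of state s.
using Graph = std::vector<std::vector<std::size_t>>;

namespace {

struct SccWork {
    const Graph& g;
    std::vector<std::size_t> depth;   // discovery index, 0 = unvisited
    std::vector<std::size_t> link;    // lowest depth reachable from the state
    std::vector<bool> on_stack;
    std::vector<std::size_t> stack;
    std::vector<std::size_t> scc_of;  // result: component id per state
    std::size_t next_depth = 1;
    std::size_t n_sccs = 0;

    explicit SccWork(const Graph& graph)
        : g(graph), depth(graph.size(), 0), link(graph.size(), 0),
          on_stack(graph.size(), false), scc_of(graph.size(), 0) {}

    void visit(std::size_t s) {
        depth[s] = link[s] = next_depth++;
        stack.push_back(s);
        on_stack[s] = true;
        for (std::size_t t : g[s]) {
            assert(t < g.size());
            if (depth[t] == 0) {            // unvisited: recurse first
                visit(t);
                link[s] = std::min(link[s], link[t]);
            } else if (on_stack[t]) {       // edge back into the open SCC
                link[s] = std::min(link[s], depth[t]);
            }
        }
        if (link[s] == depth[s]) {          // s is the root of an SCC
            std::size_t t;
            do {
                t = stack.back();
                stack.pop_back();
                on_stack[t] = false;
                scc_of[t] = n_sccs;
            } while (t != s);
            ++n_sccs;
        }
    }
};

}  // namespace

// Returns the component id of every state; the graph itself is not modified.
std::vector<std::size_t> find_sccs(const Graph& g) {
    SccWork w(g);
    for (std::size_t s = 0; s < g.size(); ++s)
        if (w.depth[s] == 0) w.visit(s);
    return w.scc_of;
}
```

Each field now has a single meaning that can be documented where it is declared. The price is a few allocations per call, which may be part of the slowdown on small inputs mentioned above.
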
    > Rearranging the source, however, doesn't make much sense,
    > and you would lose the history completely.
    I understand your worry about losing history. When I tried
    to understand the code, I felt I had to reorganize it, but
    you probably already understand it as it is (which I did
    not at the time).

    Of the user-visible changes (in the uploaded README.in), some
    might be worth repeating:

    1a) allow re2c { ... } syntax instead of /*!re2c ... */
    1b) allow keyword RE_t before named regexps:
        RE_t ID = letter (letter|digit|underscore)* ;
    instead of
        ID = letter (letter|digit|underscore)* ;

    (1a) would allow tools like cscope, emacs and doxygen
    to process the ... part (e.g. point out uses of variables
    and functions); (1b) makes it look more like
    a declaration with initialization. The underscore in
    identifiers is also new; it allows long_id instead of LongId
    (compare the uploaded examples/cmmap_re.c with its original
    examples/cmmap.re in the distribution).

    2) diagnose variable-length trailing context as an error
    (or handle it), e.g. ' "abc"/"d"+ { RET(ABC); }'.

    See: test/trailing-var.re
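
Why the variable-length case needs attention: with a fixed-length context such as "abc"/"d", a scanner can step back a constant number of characters after the match. With "abc"/"d"+ the number of characters to step back varies, so the position where the token ends has to be remembered while matching. A minimal hand-written sketch of that bookkeeping follows; it is not what re2c or re3c-re2c generates, and match_abc_before_ds is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Matches "abc"/"d"+ at the start of `in`: "abc" must be followed by one
// or more 'd', but only "abc" belongs to the token.
// Returns the token length (3) on success, or 0 if the rule does not match.
// If lookahead_end is non-null it receives how far the scanner had to read.
std::size_t match_abc_before_ds(const std::string& in,
                                std::size_t* lookahead_end = nullptr) {
    const std::string token = "abc";
    if (in.compare(0, token.size(), token) != 0) return 0;

    // Remember the end of the token before scanning the context. A scanner
    // that backs up by a constant context length cannot recover this
    // position here, because the number of 'd' characters varies.
    const std::size_t token_end = token.size();

    std::size_t cursor = token_end;
    while (cursor < in.size() && in[cursor] == 'd') ++cursor;
    if (cursor == token_end) return 0;   // the context needs at least one 'd'

    assert(cursor > token_end);
    if (lookahead_end) *lookahead_end = cursor;
    return token_end;                    // scanning resumes at the first 'd'
}
```

For the input "abcddd" the scanner reads six characters but the token is still three long, so the next rule starts at the first 'd'.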

    3) gcc-style error messages (including the input file name).
    This would allow e.g. emacs to jump to the error.
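
A sketch of the message layout meant here, assuming the common "file:line: error: message" form. The exact wording used by re3c-re2c is not given in this post, and format_diagnostic and the file name in the usage note are hypothetical.

```cpp
#include <cassert>
#include <string>

// Formats a diagnostic as "file:line: error: message", the layout that
// compile-mode tools such as emacs parse to jump to the offending line.
std::string format_diagnostic(const std::string& file, unsigned line,
                              const std::string& message) {
    assert(!file.empty() && line > 0);
    return file + ":" + std::to_string(line) + ": error: " + message;
}
```

For example, format_diagnostic("lexer.re", 12, "unterminated string") yields "lexer.re:12: error: unterminated string".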

    Docs/examples comments:
    4) [^] is probably a better expression for 'any' than
    [\000-\377], as it does not depend on the size of
    characters. I would suggest using it in the examples.
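
The size dependence can be made concrete with a small sketch. It only does the arithmetic on code-unit values and says nothing about how re2c represents character classes; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if code unit c lies in the class [\000-\377] (octal 377 is 255).
inline bool in_octal_range(std::uint32_t c) { return c <= 0377u; }

// Number of code-unit values of the given bit width that [\000-\377]
// does not cover. For 8-bit characters the class is complete; for
// anything wider it is not, which is why [^] is the safer spelling.
inline std::uint64_t values_missed(unsigned bits) {
    assert(bits >= 8 && bits <= 32);
    return (std::uint64_t{1} << bits) - 256u;
}
```

With 8-bit characters nothing is missed, while with 16-bit code units 65280 values fall outside the range.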

    5) I would promote the use of "" instead of [],
    because I would expect [] to mean the same as [a]\[a].
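
The distinction behind this point appears to be the one between the empty set of strings and the set containing only the empty string. A small sketch with languages modelled as finite sets of strings (an illustration only, with hypothetical names; it is not how re2c stores regular expressions):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

using Language = std::set<std::string>;   // a finite set of accepted strings

// Set difference of two languages, the meaning of \ on character classes.
Language difference(const Language& a, const Language& b) {
    Language out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    assert(out.size() <= a.size());       // a difference never grows
    return out;
}

// True if the language accepts the given input.
bool accepts(const Language& lang, const std::string& input) {
    return lang.count(input) != 0;
}
```

Here difference({"a"}, {"a"}), the analogue of [a]\[a], is the empty set and accepts nothing, not even the empty input, whereas the language {""}, the analogue of "", accepts exactly the empty input.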