#15 reorganized sources

Status: closed-rejected

Owner: nobody

Labels: None

Priority: 1

Updated: 2005-12-29

Created: 2005-12-20

Creator: Antal K

Private: No

For some time I have been trying to adapt re2c to my
needs. I have not finished, and I will not have time to
continue for a while. Basically, I
reorganized the sources (more modular and probably
easier to understand), added some comments (mostly for
doxygen) and implemented some changes that I hope
others might find useful, too. A description of the changes
is in README.in,
together with some comments on the directories. Please
take a look at it.

Note on compiling: for production builds, #define NDEBUG to
turn off the assert()s. Otherwise just ./autogen.sh;
./configure; make; make check; sudo make install
should work.

The resulting program is named re3c-re2c. It is
compatible with re2c (at least to the extent exercised by
make check), although the code emitted is not
identical. (See also the -c flag.)

make doxy : runs doxygen (HTML output under doxy/)

Unfortunately I cannot upload everything:
Error: Uploaded file must be >20 and <256000 bytes.

I removed doc/, test/* and examples/* to accommodate this.
(I kept test/trailing-var.* and examples/cmmap_re.c.)

Discussion

Antal K - 2005-12-20

reorganized sources

re3c-0.0.0052.tar.bz2

Marcus Börger - 2005-12-29

priority: 5 --> 1

status: open --> open-rejected
Marcus Börger - 2005-12-29

Logged In: YES
user_id=271023

If you would provide a patch against head that has all the
docs I would happily apply it. Rearranging the source,
however, doesn't make much sense, and you would lose
history completely.

Marcus Börger - 2005-12-29

status: open-rejected --> closed-rejected
Antal K - 2006-01-02

Logged In: YES
user_id=1371979

> If you would provide a patch against head that has all the
> docs I would happily apply it.
Unfortunately I do not have the time to do that, which is why
I decided to upload it as it is. I found it particularly
hard to understand some fields that were reused (like depth
and link in class State). These are also hard to document
(the description depends on context). I usually tried to avoid
such reuse, and moved some of the uses to temporaries passed as
arguments (e.g. re3c::SCC::state_depths_t state_depths(
dfa->n_states() ); and re3c::SCC::state_link_t state_link(
dfa->n_states() ); in re3c/dfa_pp/dfa_find_sccs.cc). These
transformations helped me to understand the code. Some of
them might also be the reason why re3c-re2c is somewhat
slower on small inputs. Another thing I found hard to
understand was the semantics of the DFA after it has been
edited by void DFA::split(State *s); et al. (what the
subclasses of class Action represent, etc.). I have removed
the Action stuff, and use class dfa_acceptor_t: public
dfa_label_t {...} and other descendants of dfa_label_t instead.
So at least some of the documentation is not easy to apply
to re2c, and part of the documentation is the code
reorganization itself.
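
To illustrate the kind of change described above (this is not the code from the uploaded archive): a minimal C++ sketch of a strongly-connected-components search in which depth and link are kept in vectors sized by the number of states, instead of in reused fields of the state class. Graph, SccFinder and find_sccs are hypothetical names, and the choice of Tarjan's algorithm is an assumption based only on the field names depth and link.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a DFA: state i has edges to every state in succ[i].
struct Graph {
    std::vector<std::vector<std::size_t>> succ;
    std::size_t n_states() const { return succ.size(); }
};

// Tarjan's SCC search. The per-state bookkeeping (depth, link) lives in
// vectors owned by the finder, not in fields of the state objects.
class SccFinder {
public:
    explicit SccFinder(const Graph &g)
        : g_(g),
          depth_(g.n_states(), 0),   // 0 means "not visited yet"
          link_(g.n_states(), 0),
          on_stack_(g.n_states(), false) {}

    std::vector<std::vector<std::size_t>> run() {
        for (std::size_t s = 0; s < g_.n_states(); ++s)
            if (depth_[s] == 0) visit(s);
        return sccs_;
    }

private:
    void visit(std::size_t s) {
        depth_[s] = link_[s] = ++counter_;
        stack_.push_back(s);
        on_stack_[s] = true;
        for (std::size_t t : g_.succ[s]) {
            if (depth_[t] == 0) {            // tree edge: recurse first
                visit(t);
                link_[s] = std::min(link_[s], link_[t]);
            } else if (on_stack_[t]) {       // edge back into the open component
                link_[s] = std::min(link_[s], depth_[t]);
            }
        }
        if (link_[s] == depth_[s]) {         // s is the root of a component
            std::vector<std::size_t> scc;
            std::size_t t;
            do {
                t = stack_.back();
                stack_.pop_back();
                on_stack_[t] = false;
                scc.push_back(t);
            } while (t != s);
            sccs_.push_back(scc);
        }
    }

    const Graph &g_;
    std::vector<std::size_t> depth_;
    std::vector<std::size_t> link_;
    std::vector<bool> on_stack_;
    std::vector<std::size_t> stack_;
    std::size_t counter_ = 0;
    std::vector<std::vector<std::size_t>> sccs_;
};

// Convenience wrapper: all temporaries are created and discarded here.
std::vector<std::vector<std::size_t>> find_sccs(const Graph &g) {
    return SccFinder(g).run();
}
```

Calling find_sccs on a graph with edges 0->1, 1->2, 2->0 and 2->3 yields two components, one containing only state 3 and one containing states 0, 1 and 2. The extra vectors cost an allocation per run, which is one plausible source of overhead on small inputs.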
> Rearranging the source, however, doesn't make much sense, and
> you would lose history completely.
I understand your worry about losing history. When I tried
to understand the code, I felt I had to reorganize it,
but you probably already understand it as it is (which I did
not at the time).

Of the user-visible changes (in the uploaded README.in), some
of these might be worth repeating:

1)
a) allow re2c { ... } syntax instead of /*!re2c ... */
b) allow keyword RE_t before named regexps.
RE_t ID = letter (letter|digit|underscore)* ;
instead of
ID = letter (letter|digit|underscore)* ;

(1a) would allow tools like cscope, emacs and doxygen
to process the ... part (e.g. point out uses of variables
and functions); (1b) makes it look more like
a declaration with initialization. The underscore in
identifiers is also new; it allows long_id instead of LongId.
(Compare the uploaded examples/cmmap_re.c with its original,
examples/cmmap.re, in the distribution.)

2) diagnose variable-length trailing context as an error (or
handle it). (e.g. ' "abc"/"d"+ { RET(ABC); }' )

See: test/trailing-var.re

3) gcc-style error messages (include the input file name in
the message).
This would allow e.g. emacs to jump to the error.
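
As a rough illustration of item 3, here is a small C++ sketch of a formatter for the "file:line: kind: message" layout that gcc prints and that editors such as emacs can parse. The function name format_diagnostic and its parameters are hypothetical and are not taken from re2c or from the uploaded sources.

```cpp
#include <sstream>
#include <string>

// Build a diagnostic in the "file:line: kind: message" layout.
// kind is typically "error" or "warning".
std::string format_diagnostic(const std::string &file, unsigned line,
                              const std::string &kind,
                              const std::string &message) {
    std::ostringstream out;
    out << file << ':' << line << ": " << kind << ": " << message;
    return out.str();
}
```

For example, format_diagnostic("scanner.re", 42, "error", "variable length trailing context") produces the text scanner.re:42: error: variable length trailing context, which would normally be written to stderr.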

Comments on the docs and examples:
4) [^] is probably a better expression for 'any' than
[\000-\377], as it does not depend on the size of
characters. I would suggest using it in the examples.

5) I would promote the use of "" instead of [],
because I would expect [] to mean the same as [a]\[a].
