[Cgdb-devel] flex vs regex for tokenizing

Hi,

I am deciding on how to support syntax highlighting in cgdb.
The first thing necessary is to tokenize the file. I have come up with 2
possible solutions.

1. Use flex.
   This would result in 1 flex input file per language supported.
   It would be the fastest method.

   However, changing the flex input file would mean recompiling the
   flex driver and relinking. This is a pain.

   Also, adding new syntax files would not be plug and play: code would
   have to be changed, configure scripts regenerated, and so on.
   It would also be hard to configure the tokenizer, since the token
   rules are fixed at compile time.

2. Use regex.
   This would result in 1 input file per language. This file would
   be in a format of our choice.

   This method will be less efficient due to regcomp/regexec. In order
   to get a match, a regex has to be applied to a buffer. So basically,
   cgdb would have to apply each regex against the file, remembering
   the positions of the matches. There could be up to 20 regexes!

   This would be a good approach because we could just drop new text
   files into a plugin directory and cgdb would then be able to
   highlight the language, since the regexes are compiled at runtime.
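
   To make option 2 concrete, here is a minimal sketch of a regex-driven
   tokenizer. It is only an illustration, not cgdb code: the token kinds,
   the rule table, and tokenize() are hypothetical names, and the rules
   are hard-coded here where the plugin scheme would read them from a
   text file. The only real API used is POSIX regcomp/regexec/regfree.
   At each buffer position every rule is tried and the longest match
   wins, which is where the cost of "up to 20 regexes" comes from.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; a real syntax file would define its own. */
enum tok_kind { TOK_KEYWORD, TOK_NUMBER, TOK_IDENT, TOK_COMMENT };

struct tok_rule {
    enum tok_kind kind;
    const char *pattern;    /* POSIX extended regex, anchored with ^ */
};

struct token {
    enum tok_kind kind;
    size_t start;           /* byte offset into the buffer */
    size_t len;             /* length of the match in bytes */
};

/* Illustrative rules for a C-like language (assumed, not from cgdb). */
static const struct tok_rule rules[] = {
    { TOK_COMMENT, "^//[^\n]*" },
    { TOK_KEYWORD, "^(int|return|if|else|while)" },
    { TOK_IDENT,   "^[A-Za-z_][A-Za-z0-9_]*" },
    { TOK_NUMBER,  "^[0-9]+" },
};

#define NRULES (sizeof rules / sizeof rules[0])

/*
 * Tokenize buf, writing at most max tokens to out.
 * Returns the number of tokens found, or -1 if a pattern fails to compile.
 */
static int tokenize(const char *buf, struct token *out, int max)
{
    regex_t compiled[NRULES];
    size_t i, pos = 0, len = strlen(buf);
    int count = 0;

    for (i = 0; i < NRULES; i++) {
        if (regcomp(&compiled[i], rules[i].pattern, REG_EXTENDED) != 0) {
            while (i > 0)
                regfree(&compiled[--i]);
            return -1;
        }
    }

    while (pos < len && count < max) {
        size_t best_len = 0;
        enum tok_kind best_kind = TOK_IDENT;

        /* Try every rule at this position; keep the longest match.
         * On a tie the earlier rule wins, so keywords beat identifiers. */
        for (i = 0; i < NRULES; i++) {
            regmatch_t m;
            if (regexec(&compiled[i], buf + pos, 1, &m, 0) == 0 &&
                m.rm_so == 0 && (size_t)m.rm_eo > best_len) {
                best_len = (size_t)m.rm_eo;
                best_kind = rules[i].kind;
            }
        }

        if (best_len == 0) {    /* no rule matched: skip one byte */
            pos++;
            continue;
        }

        out[count].kind = best_kind;
        out[count].start = pos;
        out[count].len = best_len;
        count++;
        pos += best_len;
    }

    for (i = 0; i < NRULES; i++)
        regfree(&compiled[i]);
    return count;
}
```

   Calling tokenize("int x = 42; // hi", toks, 8) with this rule table
   yields four tokens: a keyword, an identifier, a number, and a
   comment. The sketch compiles the patterns on every call for brevity;
   compiling them once when the syntax file is loaded would remove the
   regcomp cost from the per-file path, leaving only regexec.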

So the dilemma is: how can cgdb get both performance (flex) and plugin
support (regex)? Which is better? Can anyone think of any
improvements?

Thanks,
Bob Rossi