[Cgdb-devel] flex vs regex for tokenizing
From: Bob R. <bob...@co...> - 2003-06-10 01:36:24
Hi,

I am deciding how to support syntax highlighting in cgdb. The first thing necessary is to tokenize the file. I have come up with two possible solutions.

1. Use flex.
   This would result in one flex input file per supported language. It would be the fastest method. However, changing a flex input file would mean recompiling the flex driver and relinking. This is a pain. Also, adding new syntax files would not be plug and play: code would have to be changed, configure scripts regenerated, ... It would be hard to configure the tokenizer, since its format is fixed at compile time.

2. Use regex.
   This would result in one input file per language, in a format of our choice. This method will be less efficient because of regcomp/regexec. To get a match, a regex has to be applied to a buffer, so cgdb would have to apply each regex against the file and remember the match positions. There could be up to 20 regexes! This would be a good approach because we could just drop new text files into a plugin directory and cgdb would then be able to highlight the language, since the regex matching is done at runtime.

So the dilemma is: how can cgdb get both performance (flex) and plugin support (regex)? Which is better? Can anyone think of any improvements?

Thanks,
Bob Rossi
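
To make option 2 concrete, here is a minimal sketch of a regex-driven tokenizer built on POSIX regcomp/regexec. It is not cgdb code: the `struct rule` and `struct token` layouts, the `c_rules` table, and the `tokenize_line` function are all hypothetical names chosen for illustration, and the rule table stands in for whatever per-language text file format would be chosen. At each position every rule is tried and the longest match wins, with ties going to the earlier rule (the same policy flex uses).

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical rule format: a token class name plus a POSIX extended regex.
 * In a plugin design these rows would be read from a per-language text file. */
struct rule {
    const char *name;
    const char *pattern; /* anchored with ^ so it only matches at the cursor */
};

/* One recognized token: its class and its position within the line. */
struct token {
    const char *name;
    size_t start;
    size_t len;
};

/* Illustrative subset of rules for C; a real table would be much larger. */
const struct rule c_rules[] = {
    { "keyword",    "^(int|char|void|return|if|else|while|for)" },
    { "identifier", "^[A-Za-z_][A-Za-z0-9_]*" },
    { "number",     "^[0-9]+" },
    { "string",     "^\"[^\"]*\"" },
};
#define N_RULES (sizeof c_rules / sizeof c_rules[0])

/* Tokenize one line. Returns the number of tokens stored in out,
 * or -1 if a pattern fails to compile. */
int tokenize_line(const char *line, struct token *out, size_t max_out)
{
    regex_t compiled[N_RULES];
    size_t i, pos = 0, count = 0;

    assert(line != NULL && out != NULL);

    /* Compiled on every call to keep the sketch short; a real
     * implementation would compile once when the rule file is loaded. */
    for (i = 0; i < N_RULES; i++) {
        if (regcomp(&compiled[i], c_rules[i].pattern, REG_EXTENDED) != 0) {
            while (i > 0)
                regfree(&compiled[--i]);
            return -1;
        }
    }

    while (line[pos] != '\0' && count < max_out) {
        size_t best_len = 0;
        const char *best_name = NULL;

        /* Try every rule at the cursor; keep the longest match.
         * Strict '>' means the earlier rule wins a tie. */
        for (i = 0; i < N_RULES; i++) {
            regmatch_t m;
            if (regexec(&compiled[i], line + pos, 1, &m, 0) == 0 &&
                m.rm_so == 0 && (size_t)m.rm_eo > best_len) {
                best_len = (size_t)m.rm_eo;
                best_name = c_rules[i].name;
            }
        }

        if (best_len == 0) { /* no rule matched: skip one character */
            pos++;
            continue;
        }

        out[count].name = best_name;
        out[count].start = pos;
        out[count].len = best_len;
        count++;
        pos += best_len;
    }

    for (i = 0; i < N_RULES; i++)
        regfree(&compiled[i]);
    return (int)count;
}
```

Calling `tokenize_line("int interior = 42;", toks, 8)` with this table yields three tokens: a keyword at offset 0, an identifier at offset 4, and a number at offset 15. The inner loop is where the cost described above shows up: every position in the buffer pays for one regexec call per rule, which is the price of being able to change the rules without recompiling.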