Re: [re2c-general] re2asm?

Hello Joshua,

Saturday, January 28, 2006, 9:58:04 PM, you wrote:

> Hello re2c developers!

> I am interested in modularizing re2c's code generation backend, so  
> that it could target other languages in an extensible way.   
> Specifically, I am interested in targeting x86 assembly language, and  
> perhaps even x86 machine code.

> That may strike you as a silly idea, since C compilers target both of  
> those languages very handily.  But my ultimate goal is to make re2c  
> available from high-level languages like Ruby and Python, so that you  
> could create extremely fast lexers in either of those languages  
> without needing an extra compile step in the development cycle.  The  
> idea is that you could use re2c to generate machine code on-the-fly  
> from a description language, and then call into that machine code  
> from the interpreter.

Many of today's VHL languages have just-in-time compilers which do the
job automatically. Also, in most languages you cannot simply call into
some string that contains opcodes, especially with today's antivirus
technologies in place. Thus you would need to create .so/.dll files on
the fly, which seems like a lot of unnecessary overkill.

> Another benefit is that you could use domain-specific knowledge about  
> regular expression matching to generate more optimized code than a C  
> compiler could, and make your lexers even faster.  You already have  
> tricks in re2c to try coaxing compilers into generating optimal code  
> (I'm thinking of -b and -s) -- having an assembly language backend  
> could give you more opportunities to introduce optimizations without  
> having to trick the compiler.

You cannot mix run-time with compile-time optimization here. And the
domain-specific regular expressions are completely run-time, while re2c
uses a compile-time-only approach.

> Tell me if I'm overlooking anything, but it appears to me that  
> modularizing the back-end could be done like so: give all the DFA  
> classes (anything with an emit() method) a base-class with language- 
> agnostic state, and subclasses for every language (LangC::DFA or  
> something like that) that implement emit().  Then make all  
> construction of these objects happen through factories that chose a  
> subclass based on what language back-end is currently selected.
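
A minimal sketch of the structure proposed above, purely for
illustration. DFABase, LangAsm, Backend and make_dfa() are hypothetical
names, not existing re2c classes, and the emitted text is a placeholder
rather than real re2c output:

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Language-agnostic base class: holds the state every backend shares.
class DFABase {
public:
    explicit DFABase(int label) : label_(label) {}
    virtual ~DFABase() = default;
    // Each backend renders the state label in its own syntax.
    virtual std::string emit() const = 0;
protected:
    int label_;
};

namespace LangC {
// C backend: emits a plain C label.
class DFA : public DFABase {
public:
    using DFABase::DFABase;
    std::string emit() const override {
        return "yy" + std::to_string(label_) + ":\n";
    }
};
}  // namespace LangC

namespace LangAsm {
// Assembly backend: emits a local assembler label (placeholder syntax).
class DFA : public DFABase {
public:
    using DFABase::DFABase;
    std::string emit() const override {
        return ".Lyy" + std::to_string(label_) + ":\n";
    }
};
}  // namespace LangAsm

enum class Backend { C, Asm };

// Factory: the only place that knows which subclass belongs to which backend.
std::unique_ptr<DFABase> make_dfa(Backend backend, int label) {
    switch (backend) {
    case Backend::C:   return std::unique_ptr<DFABase>(new LangC::DFA(label));
    case Backend::Asm: return std::unique_ptr<DFABase>(new LangAsm::DFA(label));
    }
    throw std::invalid_argument("unknown backend");
}
```

With this shape, callers would write make_dfa(selected_backend, n)->emit()
and never name a concrete subclass.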

For (v)hl-languages I think it would be best to provide only the emit()
stuff by some table/plugin and convert anything that is not yet output
via emit() into something that lets you hook into output via the same
techniques. Maybe you still need to output from those emit() functions
via rule sets. For example, if PHP got the goto that has been discussed
for a long time, the only change to the current C output would be '$'
prefixing of the variables. Also, not all languages have the same escape
sequences in strings. Last but not least, it may be possible to give
re2c an include directive for its inplace configuration stuff that loads
all that is needed to change the output generation from some setup
files which can be produced by the re2c package.
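
A rough sketch of what such a rule table could look like, assuming only
the two differences mentioned above (variable prefix and string
escaping). OutputRules, prefixed(), quote() and emit_assign() are
made-up names for illustration and do not exist in re2c:

```cpp
#include <string>

// One row of output rules per target language (hypothetical layout).
struct OutputRules {
    std::string var_prefix;     // prepended to every variable name
    std::string extra_escapes;  // characters needing a backslash in string
                                // literals, besides '"' and '\\'
};

const OutputRules C_RULES   = {"", ""};
const OutputRules PHP_RULES = {"$", "$"};  // '$' interpolates in "..." strings

// Apply the language's variable prefix.
std::string prefixed(const OutputRules& rules, const std::string& name) {
    return rules.var_prefix + name;
}

// Build a double-quoted string literal using the language's escape rules.
std::string quote(const OutputRules& rules, const std::string& text) {
    std::string out = "\"";
    for (char c : text) {
        if (c == '"' || c == '\\' ||
            rules.extra_escapes.find(c) != std::string::npos) {
            out += '\\';
        }
        out += c;
    }
    out += '"';
    return out;
}

// The emitting code stays the same; only the table passed in changes.
std::string emit_assign(const OutputRules& rules, const std::string& name,
                        const std::string& text) {
    return prefixed(rules, name) + " = " + quote(rules, text) + ";";
}
```

The point of the table is that emit_assign() itself contains no
language-specific branches, so a new target would only need a new row,
which could in principle be loaded from a setup file.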

> Does that sound reasonable?  Would you be interested in integrating a  
> change like that into the main code tree?

> Let me know what you think.  Thanks!

I wouldn't invest anything into assembler generation. If you can come up
with a sound patch, we can integrate that. However, testing is not the
only thing that would get much more complex...

Best regards,
 Marcus                            mailto:ma...@ma...