From: Joshua H. <jo...@re...> - 2006-01-29 20:22:33
Hey Marcus,

I wasn't as clear as I could have been about my plan for integrating re2c into high-level languages. What I am proposing is embedding the entire re2c engine into the Ruby and Python interpreters, by writing a C extension that links re2c in as a library. In the high-level language, you would pass your regular expressions to the embedded re2c engine, which would generate assembly on the fly. It's very much like a JIT, but instead of compiling a vhl-language into machine code, it would compile regular expressions into machine code. I believe this would be a big win, but it's OK if you don't agree.

The most important question I have for you (and other re2c developers/maintainers) is whether you would be open to integrating a patch that makes the code generation modular, so that code-generation back-ends can be chosen at run time. And if you have any tips for achieving this modularization, that would be great too.

Thanks,
Josh

On Jan 29, 2006, at 3:58 AM, Marcus Boerger wrote:

> Hello Joshua,
>
> Saturday, January 28, 2006, 9:58:04 PM, you wrote:
>
>> Hello re2c developers!
>>
>> I am interested in modularizing re2c's code generation backend, so
>> that it could target other languages in an extensible way.
>> Specifically, I am interested in targeting x86 assembly language, and
>> perhaps even x86 machine code.
>>
>> That may strike you as a silly idea, since C compilers target both of
>> those languages very handily. But my ultimate goal is to make re2c
>> available from high-level languages like Ruby and Python, so that you
>> could create extremely fast lexers in either of those languages
>> without needing an extra compile step in the development cycle. The
>> idea is that you could use re2c to generate machine code on the fly
>> from a description language, and then call into that machine code
>> from the interpreter.
>
> Many of today's vhl-languages have just-in-time compilers which do the
> job automatically. Also, in most languages you cannot simply call into
> a string that contains opcodes, especially with today's antivirus
> technologies in place. Thus you would need to create .so/.dll files on
> the fly, which seems like a lot of unnecessary overkill.
>
>> Another benefit is that you could use domain-specific knowledge about
>> regular expression matching to generate more optimized code than a C
>> compiler could, and make your lexers even faster. You already have
>> tricks in re2c to try coaxing compilers into generating optimal code
>> (I'm thinking of -b and -s) -- having an assembly language backend
>> could give you more opportunities to introduce optimizations without
>> having to trick the compiler.
>
> You cannot mix run-time with compile-time optimization here. The
> domain-specific regular expressions are completely run-time, while
> re2c uses a compile-time-only approach.
>
>> Tell me if I'm overlooking anything, but it appears to me that
>> modularizing the back-end could be done like so: give all the DFA
>> classes (anything with an emit() method) a base class with
>> language-agnostic state, and subclasses for every language (LangC::DFA
>> or something like that) that implement emit(). Then make all
>> construction of these objects happen through factories that choose a
>> subclass based on what language back-end is currently selected.
>
> For (v)hl-languages I think it would be best to provide only the emit()
> stuff through some table/plugin, and to convert anything that is not
> yet output via emit() into something that lets you hook into output
> via the same techniques. Maybe you still need to output from those
> emit() functions via rule sets. For example, if PHP got goto, as has
> long been discussed, the only change to the current C output would be
> '$' prefixing of the variables. Also, not all languages have the same
> escape sequences in strings. Last but not least, it may be possible to
> give re2c an include directive for its in-place configuration that
> loads everything needed to change the output generation from setup
> files, which could be produced by the re2c package.
>
>> Does that sound reasonable? Would you be interested in integrating a
>> change like that into the main code tree?
>>
>> Let me know what you think. Thanks!
>
> I wouldn't invest anything in assembler generation. If you can come up
> with a sound patch, we can integrate it. However, not only testing
> would get much more complex...
>
> Best regards,
>  Marcus                            mailto:ma...@ma...
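The factory-based modularization Joshua proposes, combined with Marcus's PHP example ('$'-prefixed variables and different string escaping), can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not re2c's real design: Emitter, CEmitter, PhpEmitter, emitCharTest and makeEmitter are hypothetical names that do not correspond to re2c's actual DFA classes, and the PHP output assumes PHP has goto, as Marcus anticipates.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical sketch, not re2c's real class layout: Emitter, CEmitter,
// PhpEmitter and makeEmitter are illustrative names only.

// Language-agnostic base: shared code-generation logic lives here and
// calls per-language hooks for the parts whose syntax differs.
class Emitter {
public:
    virtual ~Emitter() = default;

    // Emit one DFA transition: "if <var> equals <c>, jump to <label>".
    std::string emitCharTest(const std::string& name, char c,
                             const std::string& label) const {
        return "if (" + var(name) + " == " + quote(c) + ") goto " + label + ";";
    }

protected:
    virtual std::string var(const std::string& name) const = 0;  // variable reference
    virtual std::string quote(char c) const = 0;                 // character literal
};

// C back-end: plain identifiers and single-quoted char literals.
class CEmitter : public Emitter {
protected:
    std::string var(const std::string& name) const override { return name; }
    std::string quote(char c) const override {
        switch (c) {
            case '\n': return "'\\n'";
            case '\'': return "'\\''";
            case '\\': return "'\\\\'";
            default:   return std::string("'") + c + "'";
        }
    }
};

// PHP back-end (assumes PHP supports goto): variables get a '$' prefix,
// and double-quoted strings are used so that "\n" means a newline, which
// in turn requires escaping a literal '$' to prevent interpolation.
class PhpEmitter : public Emitter {
protected:
    std::string var(const std::string& name) const override { return "$" + name; }
    std::string quote(char c) const override {
        switch (c) {
            case '\n': return "\"\\n\"";
            case '"':  return "\"\\\"\"";
            case '\\': return "\"\\\\\"";
            case '$':  return "\"\\$\"";
            default:   return std::string("\"") + c + "\"";
        }
    }
};

// Factory: the back-end is selected by name at run time, so adding a
// language means registering one more entry in this table.
std::unique_ptr<Emitter> makeEmitter(const std::string& lang) {
    static const std::map<std::string, std::function<std::unique_ptr<Emitter>()>> registry = {
        {"c",   [] { return std::unique_ptr<Emitter>(new CEmitter); }},
        {"php", [] { return std::unique_ptr<Emitter>(new PhpEmitter); }},
    };
    auto it = registry.find(lang);
    if (it == registry.end())
        throw std::invalid_argument("unknown back-end: " + lang);
    return it->second();
}
```

For example, makeEmitter("c")->emitCharTest("yych", 'a', "yy3") yields `if (yych == 'a') goto yy3;`, while the "php" back-end yields `if ($yych == "a") goto yy3;`. The transition logic is written once in the base class and only the syntax hooks are overridden per language, which sits somewhere between Joshua's per-language subclasses and Marcus's rule-set idea.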