From: Mukhitdinov M. <man...@gm...> - 2009-03-27 15:23:17
Hi!

A month ago I had a conversation with Jeff and Andreas about 2 projects, namely "RegExp engine" and "Graph query language". Unfortunately, "Graph query language" was given to Alejandro (a student from the previous year), so I focused my attention on the "RegExp engine" project. I have the following proposals:

- Coding style
Make the RegExp engine follow the coding style described in the "Tcl Style Guide" for Tcl and the "Tcl/Tk Engineering Manual" for C.

- Reversal
Reversing the automaton is done simply by adding a new state from which all previous accept states are reachable in zero steps (that is, without consuming any input). Then the previous accept states are changed to non-accept states and the previous start state is changed to an accept state. In practice this can be done by adding a "reverse" flag to regexp/regsub (the "options" array) and setting the appropriate flag for Tcl_RegExpFromObj. All the subsequent automaton manipulation is done when the regexp is compiled.

- Lookbehind constraints
I have 2 solutions to this problem:
1) A lookbehind constraint is somewhat similar to a lookahead constraint, so it could be implemented by changing the lookbehind to a lookahead and placing an empty character before it. For example, if we were searching for (?<=a)b we could change it to e(?=a)b, where e is the empty character. But this solution is more of a patch: to report the precise position of the match we would have to adjust the current position somehow, and there may be other hidden problems.
2) This solution involves adding caching, structures, constants and functions similar to those needed for lookahead constraints.

- Stream interface
It seems to me that adding this feature to the core would slow the engine down, because each element is compiled every time it is extracted. Of course, caching may help, but performance will vary from case to case. Besides, the user can implement this without much effort.

- Fixed character width
I can't see what the problem is here. If the intent is to remove the Tcl_UtfToUniCharDString call and do a byte-oriented search, we may run into problems with counting.
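To make the reversal step concrete, here is a minimal Python sketch on a toy NFA. It is not the Tcl engine's C code, and the names NFA, reverse and EPS are made up for illustration. It follows the description above (new start state with empty moves to the old accept states, old start state becomes the accept state) and additionally flips the direction of every arc, which the description leaves implicit and is an assumption of this sketch.

```python
EPS = None  # label of an arc that consumes no input (hypothetical convention)


class NFA:
    """Toy NFA: arcs are (source, label, destination) triples."""

    def __init__(self, start, accepts, arcs):
        self.start = start
        self.accepts = set(accepts)
        self.arcs = list(arcs)

    def _closure(self, states):
        # Follow EPS arcs until no new state is reachable.
        seen = set(states)
        stack = list(states)
        while stack:
            state = stack.pop()
            for src, label, dst in self.arcs:
                if src == state and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    def matches(self, text):
        # True if the whole of `text` is accepted.
        current = self._closure({self.start})
        for ch in text:
            moved = {dst for src, label, dst in self.arcs
                     if src in current and label == ch}
            current = self._closure(moved)
        return bool(current & self.accepts)


def reverse(nfa):
    """Build an NFA accepting the reversed language of `nfa`."""
    new_start = object()  # fresh state, cannot clash with existing ones
    # Assumption of this sketch: every arc is flipped.
    arcs = [(dst, label, src) for src, label, dst in nfa.arcs]
    # New start reaches every old accept state without consuming input.
    arcs += [(new_start, EPS, acc) for acc in nfa.accepts]
    # Old accept states stop accepting; the old start state accepts.
    return NFA(new_start, {nfa.start}, arcs)


# Automaton for ab*c: 0 -a-> 1, 1 -b-> 1, 1 -c-> 2, accept state 2.
forward = NFA(0, {2}, [(0, "a", 1), (1, "b", 1), (1, "c", 2)])
backward = reverse(forward)
```

With this toy automaton, forward accepts "abbc" and backward accepts "cbba", while neither accepts the other's input. A real implementation would of course work on the compiled automaton inside the engine rather than on a list of triples.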
After each step, the documentation will be updated and test suites will be developed!

Comments are appreciated.

Thanks for your attention,
Manzur