From: Volker v. N. <vol...@gm...> - 2015-10-19 17:46:52
On 19.10.2015 at 16:56, Leo Butler wrote:
>
> One further observation: sregex is written in Scheme and uses recursion
> extensively. When I looked at sregex a few years ago as a possible
> replacement for nregex, I recall that the Lisp version ran very slowly
> and had numerous stack overflows when run against Maxima's info files.

The regex parser share/stringproc/pregexp.lisp by Dorai Sitaram has been
fully revised and is now completely written in Lisp.

https://github.com/ds26gte/pregexp/

Next weekend I plan to replace the current pregexp.lisp and to write
documentation for the interface functions in sregex.lisp.

Hopefully the new version will show better results.

Volker

>
> Leo
>
> Volker van Nek <vol...@gm...> writes:
>
>> Thank you, Michel.
>>
>> This is very helpful information.
>>
>> Volker van Nek
>>
>> 2015-10-19 10:31 GMT+02:00 Michel Talon <ta...@lp...>:
>>
>>> On 18/10/2015 23:10, Robert Dodier wrote:
>>>>>> In share/stringproc there is an alternative regex parser (portable
>>>>>> regex parser by Dorai Sitaram) with an interface at Maxima level.
>>>>>>
>>>>>> It works nicely but it appears to be quite slow.
>>>> Thanks for the reminder, I had forgotten about that. I suspect that
>>>> sregex is strictly more powerful than nregex; I'd be surprised if
>>>> nregex were any faster, but that is not a key point. Also it's
>>>> helpful that there is a Maxima interface for sregex. Given all that,
>>>> I'm in favor of moving sregex into src and nuking nregex. There are a
>>>> few calls to nregex in src, but those would be easily replaced by
>>>> sregex, I believe.
>>>>
>>>
>>> Well, I have looked a little at the programs nregex and pregexp, and
>>> at the way nregex is used in Maxima. Obviously pregexp is a complete
>>> regex parser, covering more or less the Perl regexp syntax, while
>>> nregex is a very basic regex parser covering the standard new regex
>>> syntax as in grep -E, except that modern additions such as [:alnum:]
>>> are not supported. Of course only 256-character alphabets are
>>> supported. There are some extensions as in Emacs regexps, like \w
>>> matching words, \b matching boundaries, etc. (and their opposites \W,
>>> \B, etc.). The repetition patterns {n,m} are not supported, and I am
>>> not even sure the alternation patterns patt1|patt2 are. But thanks to
>>> all these shortcomings nregex is very small and can use a clever
>>> trick to speed up character-class matching. Basically, when it
>>> encounters [...] it builds a bitstring of length 256 with ones at
>>> each matching position (with special cases for \w etc.), inverted
>>> for [^...], and this can be matched against very quickly. By
>>> contrast, pregexp does straightforward comparisons. Hence one may
>>> expect a considerable speed difference between them. Given the way
>>> regexps are used in Maxima (in cl-info.lisp for finding stuff and in
>>> commac.lisp for stripping trailing zeroes from a string), the speed
>>> difference could cause problems.
>>>
>>>
>>> --
>>> Michel Talon
>>>
>>> ------------------------------------------------------------------------------
>>> _______________________________________________
>>> Maxima-discuss mailing list
>>> Max...@li...
>>> https://lists.sourceforge.net/lists/listinfo/maxima-discuss
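
For illustration, the character-class trick described in the quoted message (a bitstring of length 256 with ones at the matching positions, inverted for [^...]) might look roughly like the sketch below. It is written in Python for brevity and is not nregex's actual Lisp code. The names build_class_bitmap and class_matches are hypothetical, and only literal characters, ranges such as a-z, and a leading ^ are handled (no \w or other special cases).

```python
def build_class_bitmap(spec: str) -> int:
    """Compile the inside of a [...] class into a 256-bit bitmap.

    Bit i of the returned integer is set when the character with
    code i belongs to the class. Hypothetical helper, for illustration.
    """
    negate = spec.startswith("^")
    if negate:
        spec = spec[1:]

    bitmap = 0
    i = 0
    while i < len(spec):
        lo = ord(spec[i])
        # A range looks like "a-z": a '-' with a character on both sides.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            hi = ord(spec[i + 2])
            i += 3
        else:
            hi = lo
            i += 1
        if lo > 255 or hi > 255:
            raise ValueError("only 8-bit character codes are supported")
        for code in range(lo, hi + 1):
            bitmap |= 1 << code

    if negate:
        bitmap ^= (1 << 256) - 1  # flip all 256 bits for [^...]
    return bitmap


def class_matches(bitmap: int, ch: str) -> bool:
    """Test one character against a compiled class with a shift and a mask."""
    code = ord(ch)
    return code < 256 and (bitmap >> code) & 1 == 1
```

The cost of parsing the class is paid once, when the pattern is compiled. Each later test is a constant-time bit lookup, e.g. class_matches(build_class_bitmap("0-9"), "5"), rather than a comparison against every member of the class.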