Running a representative sphinx-build (10 x docutils/docs/ref/rst/restructuredtext.txt, dummy builder) and analysing it with py-spy shows, in the attached flamegraph, that the smartquote transform accounts for over 22% of the build time!
This PR attempts to improve that situation (at least down to 18%) by caching regex compilation.
Thank you for the patch.
I wonder why there is a considerable performance hit despite the documentation saying that
and Docutils uses far fewer than
re._MAXCACHE == 512 regular expressions.
Maybe "re.sub" is not a module-level matching function? OTOH, the doc says:
The 22% refers to the parse (or, more precisely, parse+transform) time rather than the build time in a real-world use case (as the dummy builder is more efficient than, say, an HTML builder). Still, 22% of the time to create a document tree is impressive.
Could you test the attached simplified version of your patch?
(If the unconditional pre-compilation is considered too wasteful in case "smartquotes" are switched off, I'd rather consider a conditional import of the "smartquotes" module.)
Another improvement may be achieved by simplifying the regexps themselves:
The current version is taken from the "SmartyPants" module that also cares for HTML input and checks for character entities like
– or .
It turned out that "smartquotes" mostly used pre-compiled regexes already but redid the compilation with every call to
educateQuotes(). Replacing these pre-compilations with direct calls allowed Python's caching to kick in and improved performance a bit. Pre-compiling at module import (as proposed in the patch) turned out to improve it further, as did simplifying the regular expressions and introducing a preliminary test for quotes to "educate".
After these optimizations, time spent on "smartquotes" went down from 20% of the time "buildhtml.py" requires to build the Docutils documentation to 10%. (Tested with py-spy before the changes, after the changes and with option
--smart-quotes=no.)
Applied in [r9479]. More optimizations in [r9480] and [r9481].
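For readers who want to see the two ideas discussed above in isolation (compiling once at module import, and a cheap preliminary test before any regex work), here is a minimal sketch. The patterns and the function name `educate_quotes` are hypothetical illustrations, not the actual code or regexes from the "smartquotes" module or the commits above.

```python
import re

# Hypothetical patterns, compiled once at module import instead of
# inside the function on every call.
_OPENING_DOUBLE = re.compile(r'(?<!\S)"(?=\w)')  # quote at start or after whitespace
_REMAINING_DOUBLE = re.compile(r'"')             # whatever is left is treated as closing


def educate_quotes(text):
    """Replace straight double quotes with curly ones (simplified)."""
    # Preliminary test: most text nodes contain no quote at all,
    # so skip the regex substitutions entirely for them.
    if '"' not in text:
        return text
    text = _OPENING_DOUBLE.sub('\u201c', text)
    text = _REMAINING_DOUBLE.sub('\u201d', text)
    return text


print(educate_quotes('say "hello" now'))
```

The early return is a plain substring check, which is much cheaper than running even a pre-compiled pattern over text that cannot match.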
Related
Commit: [r9479]
Commit: [r9480]
Commit: [r9481]