From: Günter M. <mi...@us...> - 2023-08-18 12:31:30
Thank you for the patch.

I wonder why there is a considerable performance hit despite the documentation saying that

> The compiled versions of the most recent patterns passed to re.compile() and the module-level matching functions are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions.

and Docutils uses far fewer than `re._MAXCACHE == 512` regular expressions. Maybe "re.sub" is not a *module-level matching function*? OTOH, the doc says:

> Pattern.sub(repl, string, count=0)
> Identical to the sub() function, using the compiled pattern.

The 22% refers to the parse (or, more precisely, parse+transform) time rather than the build time in a real-world use case (as the dummy builder is more efficient than, say, an HTML builder). Still, 22% of the time to create a document tree is impressive.

Could you test the attached simplified version of your patch? (If the unconditional pre-compilation is considered too wasteful in case "smartquotes" are switched off, I'd rather consider a conditional import of the "smartquotes" module.)

Another improvement may be achieved by simplifying the regexps themselves: the current version is taken from the "SmartyPants" module, which also handles HTML input and checks for character entities like `–` or ` `.
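One possible explanation for the question above: even when a pattern is already in the cache, the module-level `re.sub()` still has to look the pattern up on every call, whereas a pre-compiled `Pattern.sub()` skips that step. The following is only a minimal sketch for measuring the difference. `PATTERN`, `TEXT`, and the two helper functions are made-up stand-ins, not the regexes or code from the "smartquotes" module, and the timings will vary with the Python version and machine.

```python
import re
import timeit

# Hypothetical stand-ins for illustration; NOT taken from the smartquotes module.
PATTERN = r'"(\w)'
TEXT = 'He said "hello" and "goodbye" to the "world".'
LEFT_DQUOTE = "\u201c"  # left double quotation mark
REPLACEMENT = LEFT_DQUOTE + r"\1"

# Compiled once at import time, as a pre-compilation patch would do.
COMPILED = re.compile(PATTERN)


def sub_module_level(text):
    # Module-level function: the pattern string is looked up in re's
    # internal cache on every call before the substitution runs.
    return re.sub(PATTERN, REPLACEMENT, text)


def sub_precompiled(text):
    # Pre-compiled pattern: no cache lookup, the substitution runs directly.
    return COMPILED.sub(REPLACEMENT, text)


if __name__ == "__main__":
    n = 50_000
    t_module = timeit.timeit(lambda: sub_module_level(TEXT), number=n)
    t_compiled = timeit.timeit(lambda: sub_precompiled(TEXT), number=n)
    print(f"module-level re.sub: {t_module:.3f} s for {n} calls")
    print(f"pre-compiled .sub:   {t_compiled:.3f} s for {n} calls")
```

Both functions return the same string, so any gap between the two timings comes from the per-call lookup rather than from the matching itself. With short input strings and many calls, as in inline text processing, that fixed overhead may account for a noticeable share of the total.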
Attachments:

- [pre-compile-regexes-simplified.patch](https://sourceforge.net/p/docutils/patches/_discuss/thread/2d7c8b7ca3/9c1b/attachment/pre-compile-regexes-simplified.patch) (7.4 kB; text/x-patch)

---

**[patches:#206] Improve SmartQuote performance**

**Status:** open
**Group:** None
**Created:** Wed Aug 16, 2023 04:37 PM UTC by Chris Sewell
**Last Updated:** Wed Aug 16, 2023 04:37 PM UTC
**Owner:** nobody
**Attachments:**

- [0001-Pre-compile-smartquote-regexes.patch](https://sourceforge.net/p/docutils/patches/206/attachment/0001-Pre-compile-smartquote-regexes.patch) (8.3 kB; application/octet-stream)
- [sphinx-build-after.svg](https://sourceforge.net/p/docutils/patches/206/attachment/sphinx-build-after.svg) (199.1 kB; image/svg+xml)
- [sphinx-build-before.svg](https://sourceforge.net/p/docutils/patches/206/attachment/sphinx-build-before.svg) (282.1 kB; image/svg+xml)

Performing a representative sphinx-build (10 x docutils/docs/ref/rst/restructuredtext.txt, dummy builder) and analysing it with py-spy, you can see from the attached flamegraph that the smartquote transform accounts for over 22% of the build time!

This PR attempts to improve that situation (at least down to 18%) by caching regex compilation.