From: Günter M. <mi...@us...> - 2023-11-19 21:32:43
|
It turned out that "smartquotes" already used pre-compiled regexes for the most part, but repeated the compilation on every call to `educateQuotes()`. Replacing these pre-compilations with direct calls allowed Python's caching of compiled patterns to kick in and improved performance a bit. Pre-compiling at module import (as proposed in the patch) improved it further, as did simplifying the regular expressions and introducing a preliminary test for quotes to "educate".

After these optimizations, the time spent on "smartquotes" went down from 20% to 10% of the time "buildhtml.py" requires to build the Docutils documentation. (Tested with py-spy before the changes, after the changes, and with option `--smart-quotes=no`.)

---

**[patches:#206] Improve SmartQuote performance**

**Status:** open
**Group:** None
**Created:** Wed Aug 16, 2023 04:37 PM UTC by Chris Sewell
**Last Updated:** Fri Aug 18, 2023 12:31 PM UTC
**Owner:** nobody
**Attachments:**

- [0001-Pre-compile-smartquote-regexes.patch](https://sourceforge.net/p/docutils/patches/206/attachment/0001-Pre-compile-smartquote-regexes.patch) (8.3 kB; application/octet-stream)
- [sphinx-build-after.svg](https://sourceforge.net/p/docutils/patches/206/attachment/sphinx-build-after.svg) (199.1 kB; image/svg+xml)
- [sphinx-build-before.svg](https://sourceforge.net/p/docutils/patches/206/attachment/sphinx-build-before.svg) (282.1 kB; image/svg+xml)

Performing a representative sphinx-build (10 x docutils/docs/ref/rst/restructuredtext.txt, dummy builder) and analysing it with py-spy, you can see from the attached flamegraph that the smartquote transform accounts for over 22% of the build time!

This PR attempts to improve that situation (at least down to 18%) by caching regex compilation.

---
|
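For readers who want to see the shape of the two optimizations mentioned in the reply (compiling at module import and a cheap preliminary test for quotes), here is a minimal sketch. It is not the Docutils code: the patterns are deliberately simplified, and the names `OPENING_DOUBLE`, `OPENING_SINGLE` and `educate_quotes_sketch` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical, simplified patterns (NOT the actual Docutils regexes).
# They are compiled once, at module import, instead of on every call.
# A quote counts as "opening" at the start of the text or after
# whitespace or an opening bracket.
OPENING_DOUBLE = re.compile(r'(?<![^\s(\[{])"')
OPENING_SINGLE = re.compile(r"(?<![^\s(\[{])'")


def educate_quotes_sketch(text):
    """Replace straight quotes with curly ones (illustrative only)."""
    # Preliminary test: skip all regex work if there is nothing to educate.
    if '"' not in text and "'" not in text:
        return text
    text = OPENING_DOUBLE.sub("\u201c", text)
    text = OPENING_SINGLE.sub("\u2018", text)
    # Any straight quote left over is treated as closing (or an apostrophe).
    return text.replace('"', "\u201d").replace("'", "\u2019")


if __name__ == "__main__":
    print(educate_quotes_sketch('He said "hi" and it\'s fine.'))
```

The early return matters because most text nodes in a typical document contain no quote characters at all, so a plain substring check can avoid the regex machinery entirely for them.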