[Sqlalchemy-tickets] Issue #4270: cache key calc for loader options w/ reuse has performance issue
Brought to you by:
zzzeek
From: Michael B. <iss...@bi...> - 2018-06-06 13:25:24
|
New issue 4270: cache key calc for loader options w/ reuse has performance issue https://bitbucket.org/zzzeek/sqlalchemy/issues/4270/cache-key-calc-for-loader-options-w-reuse Michael Bayer: copied from the mailing list We have just upgraded to sqlalchemy 1.2.7 (from 1.1.14), and had a performance issue with a query that uses a lot of joinedloads that was caused by the automatic baking of all relationship queries that was introduced in 1.2. Say we have a set of tables with relationships Book.pages, Page.font and Font.layout. We have a query of this form: ``` #!python pages = joinedload(Book.pages) option1 = pages.joinedload(Page.font) option2 = pages.joinedload(Page.layout) query = session().query(Book).options(option1, option2) ``` The important point here is that the pages object defined on line 1 is reused in both option1 and option2. Now suppose we fetch another relationship that wasn't joined-loaded on the returned instances. This will case another query as it is not loaded already, and this query will be baked due to the change in 1.2 to bake all relationship loads. We found that the construction of the cache key for baking this query becomes very slow as the number of options of this form increases, and is in fact quadratic in the number of such options (we have ~25 such options in our problematic query). This is due to each option containing all of the joinedloads inside its _to_bind attribute, and _UnboundLoad._generate_cache_key has to process everything in the _to_bind list. E.g. in this example: ``` #!python print([[str(i) for i in load.path] for load in option1._to_bind]) print([[str(i) for i in load.path] for load in option2._to_bind]) [['Book.pages'], ['Book.pages', 'Page.font'], ['Book.pages', 'Page.layout']] [['Book.pages'], ['Book.pages', 'Page.font'], ['Book.pages', 'Page.layout']] ``` Therefore, when generating the key for each option we are processing the joinedloads from all of the options, leading to the quadratic performance degradation. We fixed it by avoiding reusing the joinedload for Book.pages by doing this: ``` #!python option1 = joinedload(Book.pages).joinedload(Page.font) option2 = joinedload(Book.pages).joinedload(Page.layout) ``` The resulting query is unchanged, but the cache key function is now just linear in the number of joinedloads as each option has only its relationships in its _to_bind attribute. In our case, this completely solved the performance issue. |