Menu

#513 deepcopy performance bug

open
None
major
bug
2026-01-01
2024-04-19
Harmen
No

In comments.py please change

setattr(t, a, copy.deepcopy(getattr(self, a, memo)))

to

setattr(t, a, copy.deepcopy(getattr(self, a), memo))

otherwise it runs in quadratic complexity when deepcopying something as trivial as a list

Thanks!

Discussion

  • Harmen

    Harmen - 2025-11-19

    Any updates on this? It's a serious performance problem with a trivial fix.

     
  • Anthon van der Neut

    I would not mind changing a performance issue, but the proposed change breaks a test that has been in existence since 2019:

    _test/test_issues.py:879: AssertionError
    ================================================================= short test summary info =================================================================
    FAILED _test/test_issues.py::TestIssues::test_issue_295 - assert "A:\n  b:\n  ... - l33: '5'\n" == "A:\n  b:\n  ... - l33: '5'\n"
    ================================================== 1 failed, 772 passed, 77 skipped, 22 xfailed in 4.31s ==================================================
    roo ruamel/yaml »  hg diff
    diff --git a/comments.py b/comments.py
    --- a/comments.py
    +++ b/comments.py
    @@ -448,7 +448,7 @@
                       Tag.attrib, merge_attrib]:
                 if hasattr(self, a):
                     if memo is not None:
    
    -                    setattr(t, a, copy.deepcopy(getattr(self, a, memo)))
    +                    setattr(t, a, copy.deepcopy(getattr(self, a), memo))
                     else:
                         setattr(t, a, getattr(self, a))
             return t
    

    So it is not that trivial.

     
  • Harmen

    Harmen - 2026-01-01

    Hi Anthon, happy new year! I've spent another couple minutes on it, and it seems that this is the full fix.

    Would be happy to see this finally addressed :)

    # HG changeset patch
    # User Harmen Stoppels <me@harmenstoppels.nl>
    # Date 1767266026 -3600
    #      Thu Jan 01 12:13:46 2026 +0100
    # Node ID 670ee0e7ad049a1d7d97f0cabb8b182eb029c57b
    # Parent  7938d2acd452c3b5c8f4a1df2dcbc2c7661149ac
    Fix quadratic complexity and broken referential integrity in deepcopy
    
    This commit addresses two critical issues in the deepcopy implementation for
    `CommentedSeq` and `CommentedBase`:
    
    
    1. Referential integrity: in `CommentedBase.copy_attributes`, the `memo`
    dictionary was incorrectly passed as the default value to `getattr()` instead
    of being passed as the second argument to `copy.deepcopy()`. The call
    `copy.deepcopy(getattr(self, a, memo))` should have been
    `copy.deepcopy(getattr(self, a), memo)`. This caused `deepcopy` to start a
    fresh memoization context for every attribute, breaking referential integrity
    for objects shared between the sequence and its attributes (e.g., comments).
    
    
    2. Quadratic complexity: in `CommentedSeq.__deepcopy__`, the call to
    `self.copy_attributes` was likely accidentally indented inside the loop that
    copies sequence items. Attributes (including comments) were deep-copied `N`
    times for a sequence of length `N`. Combined with issue #1, this resulted in
    severe `N^2` performance degradation. The line is unindented so it executes
    once after the loop completes.
    
    diff -r 7938d2acd452 -r 670ee0e7ad04 comments.py
    --- a/comments.py   Wed Dec 31 17:46:54 2025 +0100
    +++ b/comments.py   Thu Jan 01 12:13:46 2026 +0100
    @@ -448,7 +448,7 @@
                       Tag.attrib, merge_attrib]:
                 if hasattr(self, a):
                     if memo is not None:
    
    -                    setattr(t, a, copy.deepcopy(getattr(self, a, memo)))
    +                    setattr(t, a, copy.deepcopy(getattr(self, a), memo))
                     else:
                         setattr(t, a, getattr(self, a))
             return t
    @@ -562,7 +562,7 @@
             memo[id(self)] = res
             for k in self:
                 res.append(copy.deepcopy(k, memo))
    -            self.copy_attributes(res, memo=memo)
    +        self.copy_attributes(res, memo=memo)
             return res
    
         def __add__(self, other: Any) -> Any:
    
     

Log in to post a comment.

MongoDB Logo MongoDB