#569 copy.deepcopy loses comments between map key and seq value

open
nobody
None
minor
bug
2026-03-24
2026-03-24
No

copy.deepcopy on a CommentedSeq containing CommentedMap items loses comments that appear between a map key and its sequence value. There are two bugs:

1) CommentedBase.copy_attributes passes memo as the default argument to getattr instead of passing it to copy.deepcopy, so YAML attributes are deepcopied without memo tracking, breaking shared references.

2) CommentedSeq.__deepcopy__ calls copy_attributes inside the element loop. Each append calls insert, which shifts comment indices forward. Without memo tracking (bug 1), each copy_attributes call produces a fresh copy of the attributes, which masks the shifts. If bug 1 is fixed on its own, the shifts accumulate and corrupt comment placement.

Reproducer

import copy
from io import BytesIO

import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=4, offset=0)

# The comment sits between the "my_list" key and its sequence value.
yaml_input = """\

-   key_a: value_a
    my_list:
    # comment before first item
    -   item_one
    -   item_two


-   key_b: value_b
"""

data = yaml.load(yaml_input)

def dump_to_str(d):
    # Dump through the same YAML instance so the indent settings apply.
    buf = BytesIO()
    yaml.dump(d, buf)
    return buf.getvalue().decode()

original_output = dump_to_str(data)
print("=== original ===")
print(original_output)

data_copy = copy.deepcopy(data)
copy_output = dump_to_str(data_copy)
print("=== after deepcopy ===")
print(copy_output)

assert copy_output == original_output, "deepcopy changed the output!"

The output I got:

=== original ===

-   key_a: value_a
    my_list:
    # comment before first item
    -   item_one
    -   item_two


-   key_b: value_b

=== after deepcopy ===

-   key_a: value_a
    my_list: -
        item_one
    -   item_two
-   key_b: value_b

Traceback (most recent call last):
  File "repro.py", line 30, in <module>
    assert copy_output == original_output, "deepcopy changed the output!"
AssertionError: deepcopy changed the output!

The expected output:

=== original ===

-   key_a: value_a
    my_list:
    # comment before first item
    -   item_one
    -   item_two


-   key_b: value_b

=== after deepcopy ===

-   key_a: value_a
    my_list:
    # comment before first item
    -   item_one
    -   item_two


-   key_b: value_b

Suggested fix

Patch against comments.py:

--- a/comments.py
+++ b/comments.py
@@ -448,7 +448,7 @@ class CommentedBase:
                   Tag.attrib, merge_attrib]:
             if hasattr(self, a):
                 if memo is not None:
-                    setattr(t, a, copy.deepcopy(getattr(self, a, memo)))
+                    setattr(t, a, copy.deepcopy(getattr(self, a), memo))
                 else:
                     setattr(t, a, getattr(self, a))
         return t
@@ -560,8 +560,8 @@ class CommentedSeq(MutableSliceableSequence, list, CommentedBase):  # type: igno
         res = self.__class__()
         memo[id(self)] = res
         for k in self:
             res.append(copy.deepcopy(k, memo))
-            self.copy_attributes(res, memo=memo)
+        self.copy_attributes(res, memo=memo)
         return res

The copy_attributes fix corrects the memo argument: it was being passed as the default value to getattr instead of as the second argument to copy.deepcopy. As a result, YAML attributes were deepcopied without the memo dict, so shared object references within a single deepcopy call were not preserved.
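
The misplaced parenthesis can be illustrated without ruamel.yaml. The following is a minimal sketch using only the standard library; Node, copy_attr_buggy and copy_attr_fixed are hypothetical names made up for this illustration and are not part of comments.py.

```python
import copy


class Node:
    """Hypothetical stand-in for an object carrying a shared attribute."""

    def __init__(self, shared):
        self.shared = shared


def copy_attr_buggy(src, dst, name, memo):
    # memo becomes getattr's *default* value, so deepcopy gets no memo at all.
    setattr(dst, name, copy.deepcopy(getattr(src, name, memo)))


def copy_attr_fixed(src, dst, name, memo):
    # memo is passed to deepcopy, so already-copied objects are reused.
    setattr(dst, name, copy.deepcopy(getattr(src, name), memo))


shared = ["comment"]
a, b = Node(shared), Node(shared)  # both nodes reference the same list

memo = {}
a1, b1 = Node(None), Node(None)
copy_attr_buggy(a, a1, "shared", memo)
copy_attr_buggy(b, b1, "shared", memo)
buggy_shared = a1.shared is b1.shared  # False: two independent copies

memo = {}
a2, b2 = Node(None), Node(None)
copy_attr_fixed(a, a2, "shared", memo)
copy_attr_fixed(b, b2, "shared", memo)
fixed_shared = a2.shared is b2.shared  # True: the shared reference survives

print(buggy_shared, fixed_shared)
```

No exception is raised in the buggy variant because the attribute exists, so the getattr default is never used; the memo dict is simply ignored.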

The __deepcopy__ fix moves copy_attributes outside the loop. Inside the loop, each append calls insert, which shifts comment indices forward in ca.items. With correct memo tracking, copy_attributes returns the same ca object each iteration, so the shifts accumulate. Moving it outside the loop means copy_attributes runs once after all elements are in place, with no further shifts.
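
The accumulation described above can be sketched with a toy model. ToySeq, copy_comments and the two deepcopy_* helpers are hypothetical and heavily simplified; they only assume, as stated in this report, that appending shifts comment indices at or after the insertion point and that a memo-aware copy returns the same object on every call.

```python
import copy


class ToySeq(list):
    """Hypothetical, simplified stand-in for a comment-carrying sequence."""

    def __init__(self):
        super().__init__()
        self.comments = {}  # element index -> comment text

    def append(self, value):
        idx = len(self)
        # Treat append as insert at idx: comments at or after idx move forward.
        for i in sorted(self.comments, reverse=True):
            if i < idx:
                break
            self.comments[i + 1] = self.comments.pop(i)
        super().append(value)


def copy_comments(src, dst, memo):
    # With a shared memo, every call returns the same copied dict object.
    dst.comments = copy.deepcopy(src.comments, memo)


def deepcopy_in_loop(src):
    memo, res = {}, ToySeq()
    for item in src:
        res.append(copy.deepcopy(item, memo))
        copy_comments(src, res, memo)  # attached early, shifted by later appends
    return res


def deepcopy_after_loop(src):
    memo, res = {}, ToySeq()
    for item in src:
        res.append(copy.deepcopy(item, memo))
    copy_comments(src, res, memo)  # attached once, after all elements exist
    return res


src = ToySeq()
for item in ("a", "b", "c"):
    src.append(item)
src.comments = {1: "# about b"}

in_loop = deepcopy_in_loop(src).comments
after_loop = deepcopy_after_loop(src).comments
print(in_loop)     # {3: '# about b'}
print(after_loop)  # {1: '# about b'}
```

In the in-loop variant the comment for index 1 ends up at index 3, past the end of the three-element copy, which is the kind of misplacement the second hunk is meant to avoid.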

Platform

  • Python 3.14.3
  • ruamel.yaml 0.19.1
  • macOS (arm64)
