
#539 Multiline strings don't always roundtrip (writing then reading results in a different string)

Status: resolved
Owner: nobody
Labels: None
Priority: minor
Type: bug
Updated: 2025-06-06
Created: 2025-03-06
Creator: Jan Möller
Private: No

Ruamel strings don't always round trip - sometimes, the read string has spaces inserted that weren't there on writing. As far as I can tell, this is a bug on reading the yaml, since the generated yaml file looks okay to my eye.

For example, with width=10, the string "something\n\n# other" becomes
"something\n
\n# other"
which is read back as "something\n \n# other" (note the additional space between the newlines).

Minimal working example to trigger this:

from io import StringIO

from ruamel.yaml import YAML

yaml = YAML()
yaml.width = 10


def yaml_to_string(y):
    with StringIO() as io:
        yaml.dump(y, io)
        return io.getvalue()


original = "something\n\n# other"
y = yaml_to_string(original)
print(y)
read = yaml.load(y)

print(f"original: {original}")
print(original.encode().hex(" "))

print(f"read: {read}")
print(read.encode().hex(" "))
print()
print(f"Are they the same? {original == read}")

Output is as follows:

"something\n
\n# other"

original: something

# other

73 6f 6d 65 74 68 69 6e 67 0a 0a 23 20 6f 74 68 65 72
read: something

# other

73 6f 6d 65 74 68 69 6e 67 0a 20 0a 23 20 6f 74 68 65 72

Are they the same? False
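The two hex dumps above differ in a single byte. A minimal sketch for locating that position programmatically, using only the two string values reported in this ticket (the helper name `first_difference` is hypothetical and not part of ruamel.yaml):

```python
def first_difference(a: str, b: str):
    """Return (index, char_in_a, char_in_b) for the first mismatch, or None."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i, ca, cb
    if len(a) != len(b):
        # One string is a prefix of the other; report where the shorter one ends.
        i = min(len(a), len(b))
        return i, a[i:i + 1], b[i:i + 1]
    return None


original = "something\n\n# other"
read = "something\n \n# other"  # value read back, as reported in this ticket

print(first_difference(original, read))  # (10, '\n', ' ')
```

Index 10 is the position right after the first newline, which matches the `0a 20 0a` sequence in the second hex dump.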

Some observations I made:

  • Here I roundtrip via string, but the same thing happens when going through the file system.
  • This seems to happen when the string is wrapped in the yaml such that the original string has newlines before and/or after the point of wrapping, and also there is a space nearby.
  • The "#" in the original string doesn't matter, but the space after it does. Removing the "#" still reproduces the problem, but removing the space no longer reproduces the issue.
  • Possibly this is related to ticket 508

To me, this is a major issue since I can't change the line width of my yaml and neither can I change my string. However, I believe the priority should be set by you, so I'm leaving it at default.

Discussion

  • Jan Möller

    Jan Möller - 2025-03-06

    I forgot to mention, this is on ruamel version 0.18.10, Python 3.11.9, Windows.

     

    Last edit: Jan Möller 2025-03-06
  • David Shay

    David Shay - 2025-04-29

    This is quite a problem for me. We are using ruamel to pre-process CloudFormation files containing inlined Python code that will be uploaded in an AWS Lambda function. The added spaces make the Python code wrong and the Lambdas cannot execute anymore.

     
  • Saugat Pachhai

    Saugat Pachhai - 2025-05-02

    We are facing this issue in dvc as well.

    It seems this was introduced in 0.17.23.

    After that, strings are dumped as double-quoted multiline strings, but the line splits are no longer escaped with a backslash (\). It seems that was the motivation behind the change ( https://stackoverflow.com/a/75634614 ).

    Reading that Stack Overflow answer, it seems the backslash is necessary, but the author is more familiar with the YAML spec. :)
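For context on why the escape matters: in a YAML double-quoted scalar, an unescaped line break is folded into a single space when read, whereas a line break preceded by a backslash is dropped. The sketch below is a deliberately simplified model of that rule, not ruamel.yaml's implementation. The function names are hypothetical, and empty lines and most escape sequences are not handled.

```python
_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def _unescape(text: str) -> str:
    """Resolve a small subset of double-quoted escapes (assumption: enough for this example)."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(_ESCAPES.get(text[i + 1], text[i + 1]))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def fold_double_quoted(body: str) -> str:
    """Model how the text between the quotes is read when it spans several lines."""
    lines = body.split("\n")
    result = lines[0]
    for line in lines[1:]:
        trailing_backslashes = len(result) - len(result.rstrip("\\"))
        if trailing_backslashes % 2 == 1:
            # Escaped line break: drop the backslash and join without a space.
            result = result[:-1] + line.lstrip(" \t")
        else:
            # Unescaped line break: folded into exactly one space.
            result = result.rstrip(" \t") + " " + line.lstrip(" \t")
    return _unescape(result)


# Body of the scalar from the original report, split without an escape.
unescaped_break = "something\\n\n\\n# other"
# The same split, but with a backslash escaping the line break.
escaped_break = "something\\n\\\n\\n# other"

print(repr(fold_double_quoted(unescaped_break)))  # 'something\n \n# other'
print(repr(fold_double_quoted(escaped_break)))    # 'something\n\n# other'
```

Under this model, the unescaped split yields the extra space seen in the original report, while the escaped split gives back the original string.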

    I have the following reproducer, which comes from a contributor to the issue at https://github.com/iterative/dvc/issues/10668 .

    import io
    
    from ruamel.yaml import YAML
    
    buf = io.StringIO()
    yaml = YAML()
    data = {
        "a_long_parameter_name": "This is a prompt.\nThis is a prompt.\nThis is a prompt.\nThis is a prompt.\nThis is a prompt.\nThis is a prompt.\n"
    }
    yaml.dump(data, buf)
    
    buf.seek(0)
    
    actual = yaml.load(buf)
    assert actual["a_long_parameter_name"] == data["a_long_parameter_name"], (
        f"{actual['a_long_parameter_name']!r} != {data['a_long_parameter_name']!r}"
    )
    
     

    Last edit: Saugat Pachhai 2025-05-02
  • Anthon van der Neut

    This has been fixed in 0.18.11, which can no longer be uploaded to PyPI because of backwards-incompatible changes in PyPI/twine. You can install the package using

    pip install https://yaml.dev/packages/python/ruamel.yaml/ruamel.yaml-0.18.11-py3-none-any.whl
    

    but be aware that this does not install ruamel.yaml.clib as a dependency on Python 3.13 yet.

    Two tests covering both examples have been added to ruamel.yaml.data ( https://sourceforge.net/p/ruamel-yaml-data/code/ci/default/tree/string/ ).
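A rough way to check whether an installed version falls in the range discussed in this thread, assuming the regression really was introduced in 0.17.23 (reported above only as "it seems") and is fixed from 0.18.11 on. The helper names are hypothetical, and the version parsing is intentionally naive. A version number alone is only a first check; running one of the reproducers is the more reliable test.

```python
import re
from importlib import metadata

# Assumed bounds, taken from the comments in this thread.
FIRST_AFFECTED = (0, 17, 23)
FIRST_FIXED = (0, 18, 11)


def parse_version(text: str) -> tuple:
    """Return the leading numeric components, ignoring any suffix such as 'rc1'."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def in_affected_range(version_text: str) -> bool:
    """True if the version lies in [FIRST_AFFECTED, FIRST_FIXED)."""
    return FIRST_AFFECTED <= parse_version(version_text) < FIRST_FIXED


try:
    installed = metadata.version("ruamel.yaml")
except metadata.PackageNotFoundError:
    installed = None  # ruamel.yaml is not installed in this environment

if installed is not None:
    print(f"ruamel.yaml {installed}: in affected range = {in_affected_range(installed)}")
```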

     

    Last edit: Anthon van der Neut 2025-05-19
  • Saugat Pachhai

    Saugat Pachhai - 2025-05-19

    Hey Anthon. Thanks for fixing this promptly. The above reproducer now works, but a new one that I tried (also from the above-mentioned DVC issue) still fails.

    import io
    import sys
    
    from ruamel.yaml import YAML
    
    yaml = YAML()
    buf = io.StringIO()
    
    s = "This is a prompt.\nThis is a prompt.\nThis is a prompt.\nThis is a prompt.\nThis is a prompt.\n"
    yaml.dump(
        {
            "stages": {
                "faulty_stage": {"params": {"params.yaml": {"fault_parameter_name": s}}}
            }
        },
        buf,
    )
    buf.seek(0)
    sys.stdout.write(buf.getvalue())
    
    actual = yaml.load(buf)
    actual_s = actual["stages"]["faulty_stage"]["params"]["params.yaml"][
        "fault_parameter_name"
    ]
    assert actual_s == s, f"Expected {s!r}, but got {actual_s!r}"
    
     
  • Saugat Pachhai

    Saugat Pachhai - 2025-06-03

    Not sure what happened the last time I tried, but the above reproducer is now passing for me with 0.18.11 and 0.18.12.

     
    • Anthon van der Neut

      I think I messed up and included a fix in 0.18.11 while trying (again) to get it uploaded to PyPI. So the 0.18.11 on yaml.dev and the one on PyPI are not the same: you used the first one initially, when you hit the issue, and the second one when re-testing.

       
  • Anthon van der Neut

    • status: open --> resolved
     
