#566 RoundTripScanner incorrectly scans comments with CRLF line endings

open
nobody
None
major
bug
2026-03-23
2026-03-17
No

When a file is loaded with RoundTripParser on Windows, RoundTripScanner.scan_next_token() mistakenly scans the first CRLF of each comment as "\r\n" instead of "\n". When the parsed data is then dumped to a file, Python replaces each "\n" with a full CRLF, so the retained "\r\n" is written as "\r\r\n": one additional CR character per comment.
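The write-side effect can be reproduced without ruamel.yaml. The following is a minimal sketch, not ruamel's code: `dump_as_windows_text` is a hypothetical helper that imitates a Windows text stream by passing `newline="\r\n"` to `io.TextIOWrapper`, and the `buggy` string is an assumed stand-in for scanner output in which the comment's CR was kept.

```python
import io


def dump_as_windows_text(text):
    # Hypothetical helper: newline="\r\n" makes the wrapper translate every
    # "\n" written into "\r\n", as a text stream does by default on Windows.
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# Comment line ending normalized to "\n" (the expected scanner result).
clean = "# comment\nkey: value\n"
# Assumed stand-in for the faulty result: the comment keeps its "\r\n".
buggy = "# comment\r\nkey: value\n"

print(dump_as_windows_text(clean))
print(dump_as_windows_text(buggy))
```

Because only "\n" is translated on output, the CR that was left in the comment is passed through unchanged and ends up in front of the newly written CRLF.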

1 Attachments

Discussion

  • Sven Stegemann

    Sven Stegemann - 2026-03-23

    To reproduce the bug:

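    from pathlib import Path

    import ruamel.yaml


    def test_example(tmp_path: Path):
        # example.yaml (attached) is assumed to use CRLF line endings
        # and to contain at least one comment.
        input_file = Path("example.yaml")
        yaml = ruamel.yaml.YAML()

        # Text stream: Python translates "\r\n" to "\n" before ruamel sees the data.
        with open(input_file, "r") as stream:
            correctly_parsed = yaml.load(stream)
        # Path object: ruamel opens the file itself in binary mode.
        incorrectly_parsed = yaml.load(input_file)

        output_path_correct = tmp_path / "correct.yaml"
        output_path_incorrect = tmp_path / "incorrect.yaml"

        yaml.dump(correctly_parsed, output_path_correct)
        yaml.dump(incorrectly_parsed, output_path_incorrect)

        # This assertion fails, because incorrect.yaml contains additional CR characters.
        assert output_path_correct.read_text() == output_path_incorrect.read_text()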

    When a text stream (mode="r") is passed to yaml.load, every "\r\n" is translated to "\n" before ruamel sees it, and ruamel parses the result correctly. When a Path object is passed instead, ruamel opens a binary stream (mode="rb") and encounters the bug described above. It then writes the file using a text stream, which translates "\n" to "\r\n", resulting in an additional "\r" in the first line of every comment.
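The difference between the two stream modes on the read side can be illustrated with the standard library alone. This is a rough sketch under assumptions: `read_text_mode` and `read_binary_mode` are hypothetical helpers, and `newline=None` is used because it is the default for `open(..., "r")`, which enables universal-newline translation.

```python
import io


def read_text_mode(data):
    # Hypothetical helper mirroring open(path, "r"): with newline=None,
    # "\r\n" in the input is translated to "\n" while reading.
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline=None)
    return stream.read()


def read_binary_mode(data):
    # Hypothetical helper mirroring open(path, "rb"): the bytes are returned
    # untouched, so any consumer sees the original "\r\n" sequences.
    return io.BytesIO(data).read().decode("utf-8")


crlf_yaml = b"# comment\r\nkey: value\r\n"

print(repr(read_text_mode(crlf_yaml)))
print(repr(read_binary_mode(crlf_yaml)))
```

In the text-mode case no CR ever reaches the scanner, which is consistent with the first load in the test above producing clean output.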

     
  • Sven Stegemann

    Sven Stegemann - 2026-03-23

    example.yaml contained a typo. Here is the corrected version.

     
