#351 AssertionError on dump with commented, aliased map key and child

open
None
critical
bug
2021-04-07
2020-06-02
No

When an alias is used as a map key, ruamel.yaml fails with an AssertionError on dumping the document if the parent and the child each have a comment (even just a blank line).

Sample Document:

---
aliases:

  - &key_alias hash

*key_alias :
  # A comment
  key: value

With ruamel.yaml versions 0.15.96 and 0.16.10 on Python 3.6, 3.7, and 3.8, I can parse this YAML document and even change the value. But when I attempt to dump() the document, I get a stack trace pointing at:

  File "env3.6.8/lib/python3.6/site-packages/ruamel/yaml/representer.py", line 1021, in represent_mapping
    assert getattr(node_key, 'comment', None) is None
AssertionError

Note that it takes two comments (even if only blank lines) to trigger this assertion error; one before the parent map and one before the child key.

I have requested critical priority because this is happening in the wild and the result is a dump with no document (total loss of data).

Discussion

  • William Kimball

    William Kimball - 2021-04-07

    I'm guessing this critical bug has been ignored since 2020-06-02 because I didn't provide code to reproduce the error. So, here's the missing code example:

    import warnings
    from sys import maxsize
    
    import ruamel.yaml # type: ignore
    from ruamel.yaml import YAML
    from ruamel.yaml.parser import ParserError
    from ruamel.yaml.composer import ComposerError, ReusedAnchorWarning
    from ruamel.yaml.constructor import ConstructorError, DuplicateKeyError
    from ruamel.yaml.scanner import ScannerError
    from ruamel.yaml.scalarstring import ScalarString
    
    
    source = """---
    aliases:
    
      - &key_alias hash
    
    *key_alias :
      # A comment
      key: value
    """
    
    yaml = YAML()
    yaml.indent(mapping=2, sequence=4, offset=2)
    yaml.explicit_start = True                 # type: ignore
    yaml.preserve_quotes = True                # type: ignore
    yaml.width = maxsize                       # type: ignore
    
    yaml_data = None
    data_available = True
    
    try:
        with warnings.catch_warnings():
            warnings.filterwarnings("error")
            yaml_data = yaml.load(source)
    except KeyboardInterrupt:
        print("Aborting data load due to keyboard interrupt!")
        data_available = False
    except FileNotFoundError:
        print("File not found:  {}".format(source))
        data_available = False
    except ParserError as ex:
        print("YAML parsing error {}:  {}"
                    .format(str(ex.problem_mark).lstrip(), ex.problem))
        data_available = False
    except ComposerError as ex:
        print("YAML composition error {}:  {}"
                    .format(str(ex.problem_mark).lstrip(), ex.problem))
        data_available = False
    except ConstructorError as ex:
        print("YAML construction error {}:  {}"
                    .format(str(ex.problem_mark).lstrip(), ex.problem))
        data_available = False
    except ScannerError as ex:
        print("YAML syntax error {}:  {}"
                    .format(str(ex.problem_mark).lstrip(), ex.problem))
        data_available = False
    except DuplicateKeyError as dke:
        omits = [
            "while constructing", "To suppress this", "readthedocs",
            "future releases", "the new API",
        ]
        message = str(dke).split("\n")
        newmsg = ""
        for line in message:
            line = line.strip()
            if not line:
                continue
            write_line = True
            for omit in omits:
                if omit in line:
                    write_line = False
                    break
            if write_line:
                newmsg += "\n   " + line
        print("Duplicate Hash key detected:  {}"
                    .format(newmsg))
        data_available = False
    except ReusedAnchorWarning as raw:
        print("Duplicate YAML Anchor detected:  {}"
                    .format(
                        str(raw)
                        .replace("occurrence   ", "occurrence ")
                        .replace("\n", "\n   ")))
        data_available = False
    
    # `data_available = True` when yaml_data is populated; False, otherwise
    if data_available:
        print("Dumping YAML data...")
        with open('yaml_dump.yaml', 'w') as yaml_dump:
            yaml.dump(yaml_data, yaml_dump)
    
    print("Done!")
    

    You'll note that the output file is always empty and that, in this case, the "Done!" message never prints. Please note that my users are typically editing YAML files in-place, so this bug is destructive to user data.

    I've already written an error handler which detects this case and restores the user's original, unmodified data file, informing them of the extremely unfortunate reason their edit was rejected. However, this issue is still happening in the wild nearly a year later and with newer versions of ruamel.yaml, so when these operations fail, users are frustrated that they cannot effect the desired changes to their data files without first deleting or moving their comments.
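
    One way to keep a failed dump from truncating the user's file is to write to a temporary file first and replace the target only on success. The following is a minimal sketch of that idea, not the error handler described above; the helper name `dump_atomically` and its callback-style signature are assumptions made for illustration, and it relies only on the Python standard library.

```python
import os
import tempfile


def dump_atomically(dump, path):
    """Call dump(stream) on a temporary file; replace `path` only on success.

    `dump` is any callable that writes the document to the given text stream.
    This helper is hypothetical and is not part of ruamel.yaml.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so os.replace stays on
    # the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as stream:
            dump(stream)
        # Reached only if dump() returned without raising.
        os.replace(tmp_path, path)
    except BaseException:
        # Covers the AssertionError from this ticket: discard the partial
        # output, leave the original file untouched, and re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

    With the script above, the final write could then be expressed as `dump_atomically(lambda stream: yaml.dump(yaml_data, stream), 'yaml_dump.yaml')`. Note that `tempfile.mkstemp` creates the file with restrictive permissions, so a real tool may want to copy the original file's mode before replacing it.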

    For ruamel.yaml version 0.17.2, the stack dump shows different line numbers:

    Traceback (most recent call last):
      File "351.py", line 90, in <module>
        yaml.dump(yaml_data, yaml_dump)
      File "/mnt/c/Users/william/Projects/KimballStuff/yamlpath/venv3.8.5-nix/lib/python3.8/site-packages/ruamel/yaml/main.py", line 559, in dump
        return self.dump_all([data], stream, transform=transform)
      File "/mnt/c/Users/william/Projects/KimballStuff/yamlpath/venv3.8.5-nix/lib/python3.8/site-packages/ruamel/yaml/main.py", line 568, in dump_all
        self._context_manager.dump(data)
      File "/mnt/c/Users/william/Projects/KimballStuff/yamlpath/venv3.8.5-nix/lib/python3.8/site-packages/ruamel/yaml/main.py", line 904, in dump
        self._yaml.representer.represent(data)
      File "/mnt/c/Users/william/Projects/KimballStuff/yamlpath/venv3.8.5-nix/lib/python3.8/site-packages/ruamel/yaml/representer.py", line 79, in represent
        node = self.represent_data(data)
      File "/mnt/c/Users/william/Projects/KimballStuff/yamlpath/venv3.8.5-nix/lib/python3.8/site-packages/ruamel/yaml/representer.py", line 102, in represent_data
        node = self.yaml_representers[data_types[0]](self, data)
      File "/mnt/c/Users/william/Projects/KimballStuff/yamlpath/venv3.8.5-nix/lib/python3.8/site-packages/ruamel/yaml/representer.py", line 1026, in represent_dict
        return self.represent_mapping(tag, data)
      File "/mnt/c/Users/william/Projects/KimballStuff/yamlpath/venv3.8.5-nix/lib/python3.8/site-packages/ruamel/yaml/representer.py", line 866, in represent_mapping
        assert getattr(node_key, 'comment', None) is None
    AssertionError
    
     
  • Anthon van der Neut

    This is not trivial to solve correctly. Not throwing an assertion is easy, but the comments after hash (in this case \n\n) get dropped (which is clearer if there are real EOL or full-line comments).

    I am working on redoing the comment handling in 0.17, so I am not going to try to fix this dropping of comments, but I will push out a 0.17.4 that at least does not throw the assertion error.

    BTW, a working example helps most when all of the non-relevant parts are excluded. For this, a five-line Python program (excl. YAML source) suffices.

     