Menu

#384 Anchored Entries in Unordered Sets Cannot Round-Trip

closed
None
major
bug
2021-05-31
2021-05-29
No

In a YAML Unordered Set, if any of the entries are Anchored, they are dumped back out lacking the required leading ? mark. This produces unusable YAML.

I am presently adding robust support for unordered sets to one of my projects and ran into this issue. As it is, this is now blocking project completion.

Code Sample

import sys
from ruamel.yaml import YAML

inp = """\
---
aliases:
  - &bl_anchor Ty Cobb

baseball_legends: !!set
  ? Mark McGwire
  ? Sammy Sosa
  ? *bl_anchor
  ? Ken Griff
"""

yaml = YAML()
code = yaml.load(inp)
yaml.dump(code, sys.stdout)

As you can see, the code just loads the sample data and immediately dumps it back out.

Output

The output is corrupt and cannot be loaded back in as YAML:

aliases:
- &bl_anchor Ty Cobb

baseball_legends: !!set
  ? Mark McGwire
  ? Sammy Sosa
  *bl_anchor 
  ? Ken Griff

Thank you so much for any and all help!

Discussion

  • Anthon van der Neut

    This is because the node's style is not passed to the alias event on emitting (and because that event doesn't take a style argument). The actual solution in code affects only two lines, but I am not sure how soon I can push out a new version.

    To incorporate these changes in your code requires a lot more copying, but the following should get you going (I might have misses a few ruamel.yaml.XXXX. prefixes, that are only necessary because of this code is extracted, so more extensive YAML might trigger an exception on an unknown Type, you can hopefully fix that yourself)

    import sys
    import ruamel.yaml
    
    inp = """\
    ---
    aliases:
      - &bl_anchor Ty Cobb
    
    baseball_legends: !!set
      ? Mark McGwire
      ? Sammy Sosa
      ? *bl_anchor
      ? Ken Griff
    """
    
    if ruamel.yaml.version_info < (0, 17, 5):
        class MyAliasEvent(ruamel.yaml.events.NodeEvent):
            __slots__ = ('style')
    
        def __init__(self, anchor, start_mark=None, end_mark=None, style=None, comment=None, ):
             ruamel.yaml.events.NodeEvent.__init__(self, anchor, start_mark, end_mark, comment)
             self.style = style
    
        ruamel.yaml.events.AliasEvent = MyAliasEvent
    
        def new_init(self, anchor, start_mark=None, end_mark=None, style=None, comment=None, ):
             ruamel.yaml.events.NodeEvent.__init__(self, anchor, start_mark, end_mark, comment)
             self.style = style
    
        ruamel.yaml.events.AliasEvent.__slots__ = ('style', )
        ruamel.yaml.events.AliasEvent.__init__ = new_init
    
        class MyEmitter(ruamel.yaml.emitter.Emitter):
            def expect_node(self, root=False, sequence=False, mapping=False, simple_key=False):
                self.root_context = root
                self.sequence_context = sequence  # not used in PyYAML
                self.mapping_context = mapping
                self.simple_key_context = simple_key
                if isinstance(self.event, (MyAliasEvent, ruamel.yaml.events.AliasEvent)):
                    self.expect_alias()
                elif isinstance(self.event, (ruamel.yaml.events.ScalarEvent, ruamel.yaml.events.CollectionStartEvent)):
                    if (
                        self.process_anchor('&')
                        and isinstance(self.event, ruamel.yaml.events.ScalarEvent)
                        and self.sequence_context
                    ):
                        self.sequence_context = False
                    if (
                        root
                        and isinstance(self.event, ruamel.yaml.events.ScalarEvent)
                        and not self.scalar_after_indicator
                    ):
                        self.write_indent()
                    self.process_tag()
                    if isinstance(self.event, ruamel.yaml.events.ScalarEvent):
                        # nprint('@', self.indention, self.no_newline, self.column)
                        self.expect_scalar()
                    elif isinstance(self.event, ruamel.yaml.events.SequenceStartEvent):
                        # nprint('@', self.indention, self.no_newline, self.column)
                        i2, n2 = self.indention, self.no_newline  # NOQA
                        if self.event.comment:
                            if self.event.flow_style is False and self.event.comment:
                                if self.write_post_comment(self.event):
                                    self.indention = False
                                    self.no_newline = True
                            if self.write_pre_comment(self.event):
                                self.indention = i2
                                self.no_newline = not self.indention
                        if (
                            self.flow_level
                            or self.canonical
                            or self.event.flow_style
                            or self.check_empty_sequence()
                        ):
                            self.expect_flow_sequence()
                        else:
                            self.expect_block_sequence()
                    elif isinstance(self.event, ruamel.yaml.events.MappingStartEvent):
                        if self.event.flow_style is False and self.event.comment:
                            self.write_post_comment(self.event)
                        if self.event.comment and self.event.comment[1]:
                            self.write_pre_comment(self.event)
                        if (
                            self.flow_level
                            or self.canonical
                            or self.event.flow_style
                            or self.check_empty_mapping()
                        ):
                            self.expect_flow_mapping(single=self.event.nr_items == 1)
                        else:
                            self.expect_block_mapping()
                else:
                    raise EmitterError(
                        _F('expected NodeEvent, but got {self_event!s}', self_event=self.event)
                    )
    
            def expect_block_mapping_key(self, first=False):
                if not first and isinstance(self.event, ruamel.yaml.events.MappingEndEvent):
                    if self.event.comment and self.event.comment[1]:
                        # final comments from a doc
                        self.write_pre_comment(self.event)
                    self.indent = self.indents.pop()
                    self.state = self.states.pop()
                else:
                    if self.event.comment and self.event.comment[1]:
                        # final comments from a doc
                        self.write_pre_comment(self.event)
                    self.write_indent()
                    if self.check_simple_key():
                        if not isinstance(
                            self.event, (ruamel.yaml.events.SequenceStartEvent, ruamel.yaml.events.MappingStartEvent)
                        ):  # sequence keys
                            try:
                                if self.event.style == '?':
                                    self.write_indicator('?', True, indention=True)
                            except AttributeError:  # aliases have no style
                                pass
                        self.states.append(self.expect_block_mapping_simple_value)
                        self.expect_node(mapping=True, simple_key=True)
                        if isinstance(self.event, (MyAliasEvent, ruamel.yaml.events.AliasEvent)):
                            self.stream.write(' ')
                    else:
                        self.write_indicator('?', True, indention=True)
                        self.states.append(self.expect_block_mapping_value)
                        self.expect_node(mapping=True)
    
            def check_simple_key(self):
                length = 0
                if isinstance(self.event, ruamel.yaml.events.NodeEvent) and self.event.anchor is not None:
                    if self.prepared_anchor is None:
                        self.prepared_anchor = self.prepare_anchor(self.event.anchor)
                    length += len(self.prepared_anchor)
                if (
                    isinstance(self.event, (ruamel.yaml.events.ScalarEvent, ruamel.yaml.events.CollectionStartEvent))
                    and self.event.tag is not None
                ):
                    if self.prepared_tag is None:
                        self.prepared_tag = self.prepare_tag(self.event.tag)
                    length += len(self.prepared_tag)
                if isinstance(self.event, ruamel.yaml.events.ScalarEvent):
                    if self.analysis is None:
                        self.analysis = self.analyze_scalar(self.event.value)
                    length += len(self.analysis.scalar)
                return length < self.MAX_SIMPLE_KEY_LENGTH and (
                    isinstance(self.event, (MyAliasEvent, ruamel.yaml.events.AliasEvent))
                    or (isinstance(self.event, ruamel.yaml.events.SequenceStartEvent) and self.event.flow_style is True)
                    or (isinstance(self.event, ruamel.yaml.events.MappingStartEvent) and self.event.flow_style is True)
                    or (
                        isinstance(self.event, ruamel.yaml.events.ScalarEvent)
                        # if there is an explicit style for an empty string, it is a simple key
                        and not (self.analysis.empty and self.style and self.style not in '\'"')
                        and not self.analysis.multiline
                    )
                    or self.check_empty_sequence()
                    or self.check_empty_mapping()
                )
    
    
        class MySerializer(ruamel.yaml.serializer.Serializer):
            def serialize_node(self, node, parent, index):
                # type: (Any, Any, Any) -> None
                alias = self.anchors[node]
                if node in self.serialized_nodes:
                    # self.emitter.emit(ruamel.yaml.events.AliasEvent(alias, style=node.style if node.style == '?' else None))
                    self.emitter.emit(MyAliasEvent(alias, style=node.style if node.style == '?' else None))
                else:
                    self.serialized_nodes[node] = True
                    self.resolver.descend_resolver(parent, index)
                    if isinstance(node, ruamel.yaml.nodes.ScalarNode):
                        # here check if the node.tag equals the one that would result from parsing
                        # if not equal quoting is necessary for strings
                        detected_tag = self.resolver.resolve(ruamel.yaml.nodes.ScalarNode, node.value, (True, False))
                        default_tag = self.resolver.resolve(ruamel.yaml.nodes.ScalarNode, node.value, (False, True))
                        implicit = (
                            (node.tag == detected_tag),
                            (node.tag == default_tag),
                            node.tag.startswith('tag:yaml.org,2002:'),
                        )
                        self.emitter.emit(
                            ruamel.yaml.events.ScalarEvent(
                                alias,
                                node.tag,
                                implicit,
                                node.value,
                                style=node.style,
                                comment=node.comment,
                            )
                        )
                    elif isinstance(node, ruamel.yaml.nodes.SequenceNode):
                        implicit = node.tag == self.resolver.resolve(ruamel.yaml.nodes.SequenceNode, node.value, True)
                        comment = node.comment
                        end_comment = None
                        seq_comment = None
                        if node.flow_style is True:
                            if comment:  # eol comment on flow style sequence
                                seq_comment = comment[0]
                                # comment[0] = None
                        if comment and len(comment) > 2:
                            end_comment = comment[2]
                        else:
                            end_comment = None
                        self.emitter.emit(
                            ruamel.yaml.events.SequenceStartEvent(
                                alias,
                                node.tag,
                                implicit,
                                flow_style=node.flow_style,
                                comment=node.comment,
                            )
                        )
                        index = 0
                        for item in node.value:
                            self.serialize_node(item, node, index)
                            index += 1
                        self.emitter.emit(ruamel.yaml.events.SequenceEndEvent(comment=[seq_comment, end_comment]))
                    elif isinstance(node, ruamel.yaml.nodes.MappingNode):
                        implicit = node.tag == self.resolver.resolve(ruamel.yaml.nodes.MappingNode, node.value, True)
                        comment = node.comment
                        end_comment = None
                        map_comment = None
                        if node.flow_style is True:
                            if comment:  # eol comment on flow style sequence
                                map_comment = comment[0]
                                # comment[0] = None
                        if comment and len(comment) > 2:
                            end_comment = comment[2]
                        self.emitter.emit(
                            ruamel.yaml.events.MappingStartEvent(
                                alias,
                                node.tag,
                                implicit,
                                flow_style=node.flow_style,
                                comment=node.comment,
                                nr_items=len(node.value),
                            )
                        )
                        for key, value in node.value:
                            self.serialize_node(key, node, None)
                            self.serialize_node(value, node, key)
                        self.emitter.emit(ruamel.yaml.events.MappingEndEvent(comment=[map_comment, end_comment]))
                    self.resolver.ascend_resolver()
    
    
    
    yaml = ruamel.yaml.YAML()
    if ruamel.yaml.version_info < (0, 17, 5):
        yaml.Serializer = MySerializer
        yaml.Emitter = MyEmitter
    data = yaml.load(inp)
    yaml.dump(data, sys.stdout)
    
     
  • William Kimball

    William Kimball - 2021-05-29

    Indeed, with the code above added to my project, several unit tests are now failing that had previously been passing. It looks like most are related to the new style attribute not present in other classes. I'll poke at it a bit to see if I can get everything passing again.

    Thanks for the head start!

     
  • William Kimball

    William Kimball - 2021-05-30

    I'm lost "peeling the onion" with this; fixing one issue reveals more issues. The surface error from adding the above code is: AttributeError: 'MappingNode' object has no attribute 'style'.

    So, I added the following to the patch code:

        class MyMappingNode(ruamel.yaml.nodes.CollectionNode):
            __slots__ = 'merge', 'style'
            id = 'mapping'
    
            def __init__(
                self,
                tag,
                value,
                start_mark=None,
                end_mark=None,
                flow_style=None,
                comment=None,
                anchor=None,
                style=None,
            ):
                # type: (Any, Any, Any, Any, Any, Any, Any, Any) -> None
                ruamel.yaml.nodes.CollectionNode.__init__(
                    self, tag, value, start_mark, end_mark, flow_style, comment, anchor
                )
                self.merge = None
                self.style = style
    
        ruamel.yaml.nodes.MappingNode = MyMappingNode
    

    This caused an issue which I fixed by changing the raise EmitterError code to:

                    raise ruamel.yaml.emitter.EmitterError(
                        "expected NodeEvent, but got {}".format(self.event)
                    )
    

    However, this causes many more tests to fail with: ruamel.yaml.emitter.EmitterError: expected NodeEvent, but got DocumentEndEvent()

    When fixing one issue causes dozens more, I'm doing something wrong.

     
  • William Kimball

    William Kimball - 2021-05-30

    I've narrowed the code causing the above EmitterError down to the following example:

    import sys
    import ruamel.yaml
    
    inp = """\
    ---
    device_defaults: &device_defaults
      os: unknown
      type: unknown
      platform: unknown
      notes: N/A
    
    devices:
      R1:
        <<: *device_defaults
        os: ios
        type: router
        platform: asr1k
    """
    
    yaml = ruamel.yaml.YAML()
    
    if ruamel.yaml.version_info < (0, 17, 5):
        # pylint: disable=locally-disabled,import-outside-toplevel
        from yamlpath.patches.aliasstyle import MySerializer # type: ignore
        from yamlpath.patches.aliasstyle import MyEmitter    # type: ignore
        yaml.Serializer = MySerializer         # type: ignore
        yaml.Emitter = MyEmitter               # type: ignore
    
    code = yaml.load(inp)
    yaml.dump(code, sys.stdout)
    

    Note that yamlpath.patches.aliasstyle is a copy-paste of your code above plus the adjustments I listed along with "just enough" documentation to make PEP257 happy. I've attached the complete file as aliasstyle.py.

     
  • William Kimball

    William Kimball - 2021-05-30

    I forgot to mention: In the attached aliasstyle.py above, I commented out the following line so you can see the original error which led me down the path described above.

        # ruamel.yaml.nodes.MappingNode = MyMappingNode
    

    Just uncomment that line to get the followup error.

     
  • Anthon van der Neut

    I'll try to look at this today. I couldn't run all the normal tests yesterday evening as I have not fully finished setting up my forced distro switch (including missing tox)

     
  • Anthon van der Neut

    I update the one line:

                  self.emitter.emit(MyAliasEvent(alias, style=node.style if node.style == '?' else None))
    

    to

                    node_style = getattr(node, 'style', None)
                    if node_style != '?':
                        node_style = None
                    self.emitter.emit(MyAliasEvent(alias, style=node_style))
    

    That way it doesn't barf anymore on MappingNode not having a style, and passes the testsuite for Python 3.9 and 3.8 (although 3.5-3.7 are installed, tox for some reason doesn't find them, so I still cannot fully test and don't want to push a new version until I can).

     

    Last edit: Anthon van der Neut 2021-05-30
  • William Kimball

    William Kimball - 2021-05-30

    That was it! All of my unit tests are passing and the issue with Aliased entries in Unordered Sets is resolved with this workaround. Until your next release -- and I do hope the tox issue turns out to be something simple -- I am no longer blocked, thanks to your help. Please feel free to de-prioritize this issue to major. Thank you, so much!!

     
  • Anthon van der Neut

    • priority: blocker --> major
     
  • Anthon van der Neut

    I pushed 0.17.5. What you might not have noticed is that there was spurious (non-significant) space after the ? *bl_anchor in your output. That is not fixed in the code above, but it is fixed in 0.17.5.

     
  • Anthon van der Neut

    • status: open --> closed
     
  • William Kimball

    William Kimball - 2021-05-31

    Quick follow-up: I'm unable to use 0.17.5 because my tests won't complete with it. It looks like a couple of Type signatures are misaligned with their signatures:

    mypy errors:

    ./lib/python3.6/site-packages/ruamel/yaml/comments.py:92: error: Type signature has too few arguments
    ./lib/python3.6/site-packages/ruamel/yaml/tokens.py:348: error: Type signature has too few arguments
    

    I know you were having some issues getting your test suite to run. I hope the above can help in some way.

     
    • Anthon van der Neut

      Can you try with 0.17.6 ?

       
  • William Kimball

    William Kimball - 2021-05-31

    Most surprisingly, 0.17.6 and 0.17.7 both cause MYPY to become more strict with regard to my uses of CommentedMap and CommentedSeq. I verified by downgrading ruamel.yaml and running the same test suite; the strictness relaxed. Upgrading to 0.17.6+ has forced me to more strictly type my own variables wherever I'm interacting with CommentedMap and CommentedSeq. I'm not saying this is bad; it's just a surprising side-effect. I don't mind the stricter typing.

    That said, all 1,188 of my unit tests, plus MYPY and PyLint, pass with both 0.17.6 and 0.17.7. The runs can be seen at: https://github.com/wwkimball/yamlpath/actions/workflows/python-publish-to-test-pypi.yml

    [I had individually linked the two runs but "SpamBot protection activated!" for this SourceForge thread banned me from posting more than one link.]

    Thank you for all of your help!!

     
  • Anthon van der Neut

    I had to install mypy on my new distro and got the latest version and ~3 hours work of getting rid of messages, in the process I might have changed things so you had to be more strict. I didn't do that on purpose, just a side effect of trying to get rid of messages.
    One thing that happened is that flake8 complained about using a constant string in something like if hasattr(self, 'value'): x= getattr(self, 'value') and after changing that to if hasattr(self, 'value'): x self.value, mypy complained about self not having 'value' as attribute... One year when I have nothing else to do I'll go over all the type definitions and get rid of Any. It is probably easier if you add proper typing to your code from day one.

     
  • William Kimball

    William Kimball - 2021-05-31

    Absolutely true! I fight the same battles from time to time, so I know your pain. I learned very early the same lesson, so you'll find I use types as religiously as I can bear to. You can quiet a lot of the mypy noise regarding returning Any with a mypy.ini entry like:

    [mypy]
    warn_return_any = True
    

    I do the same for my project because I haven't spent enough time to zero-in on just what ruamel.yaml returns for many of its methods. :)

     

Log in to post a comment.