Menu

#341 Custom representers shared accross YAML instances

open
nobody
None
minor
bug
2022-07-02
2020-03-06
No

Python version: 3.8
ruamel.yaml version: 0.16.10

Hi Anton,

Custom representers are managed as a mutable process-wide singleton, which prevents from setting up independent parsers.

from ruamel.yaml.main import YAML

yaml = YAML(typ='unsafe')
class Foo: pass

def foo_representer(dumper, data):
        pass

yaml.representer.add_representer(Foo, foo_representer)
assert Foo in yaml.representer.yaml_representers

# Creating a new instance is expected to be independent from the first one
yaml2 = YAML(typ='unsafe')
# That will fail, since both yaml and yaml2 share the same mapping of representers
assert Foo not in yaml.representer.yaml_representers

# also fails since the same dict is shared
assert yaml2.representer.yaml_representers is not yaml.representer.yaml_representers

This seems to apply to other similar features like add_constructor.

Having this managed as a singleton means it's not possible for a single process to have multiple setup for different sources, and references to the callbacks will not be released when the YAML object is garbage collected. This can be a problem if the callback uses functools.lru_cache() to deduplicate immutable values with anchors for example, as it would be expected to not outlive the YAML object it was attached to.

Having extra unexpected representers installed on a new instance can also be a security issue to some extent, as extra features could creep in a YAML instance that is supposed to be stripped down for a specific usage.

Also, creating and initializing a YAML instance is unexpectedly not thread safe as it modifies global mutable state.

A possible fix for that would be to make yaml_representers a property that would merge both a class-level _yaml_representers, and an instance-level _instance_yaml_representers. A new instance method add_instance_representer would allow populating the one related to the instance. Note that this will require using instances for yaml.representer as well, as it's currently the class directly.

Regards,
Douglas

Discussion

  • Douglas Raillard

    Another real use case I forgot to mention is having multiple dependencies using customized YAML configuration. If the representer and constructor are global to the whole process, these packages are mutually incompatible, which is a big issue.

    The only workaround I can think of is to use low-level import functions so that ruamel.yaml is imported twice, but that's a terrible hack that cannot be applied without changing the libs themselves. It would be much much better to avoid mutable singletons in the first place.

     
  • Stephen Rosen

    Stephen Rosen - 2022-07-02

    I just ran into this as well. My use-case was slightly different:

    I have a very simple custom constructor for a tag, but I observed unexpected behavior when testing it. My unit tests were not isolated even though each used a separate YAML instance.

    In my case, I was able to handle this with a "quick hack":

    def _new_yaml_impl():
        class GeneratedSafeConstructor(ruamel.yaml.SafeConstructor):
            pass
    
        impl = ruamel.yaml.YAML(typ="safe")
        impl.Constructor = GeneratedSafeConstructor
        return impl
    

    But I don't think this is a great fix for all potential use-cases.

    What if the registration dicts were made from class attributes to instance attributes? Is there major perf impact?

     
  • Anthon van der Neut

    This has already been implemented but not released yet. From the not yet committed changelog entry:

    • internals of registering tags have changed, YAML().register_class() no
      longer "reappear" in other YAML instances, because registering is no longer
      done on the Constructor/Representer classes but on their instances.
     
    ❤️
    1

Log in to post a comment.

MongoDB Logo MongoDB