Python version: 3.8
ruamel.yaml version: 0.16.10
Hi Anton,
Custom representers are managed as a mutable process-wide singleton, which prevents from setting up independent parsers.
from ruamel.yaml.main import YAML
yaml = YAML(typ='unsafe')
class Foo: pass
def foo_representer(dumper, data):
pass
yaml.representer.add_representer(Foo, foo_representer)
assert Foo in yaml.representer.yaml_representers
# Creating a new instance is expected to be independent from the first one
yaml2 = YAML(typ='unsafe')
# That will fail, since both yaml and yaml2 share the same mapping of representers
assert Foo not in yaml.representer.yaml_representers
# also fails since the same dict is shared
assert yaml2.representer.yaml_representers is not yaml.representer.yaml_representers
This seems to apply to other similar features like add_constructor.
Having this managed as a singleton means it's not possible for a single process to have multiple setup for different sources, and references to the callbacks will not be released when the YAML object is garbage collected. This can be a problem if the callback uses functools.lru_cache() to deduplicate immutable values with anchors for example, as it would be expected to not outlive the YAML object it was attached to.
Having extra unexpected representers installed on a new instance can also be a security issue to some extent, as extra features could creep in a YAML instance that is supposed to be stripped down for a specific usage.
Also, creating and initializing a YAML instance is unexpectedly not thread safe as it modifies global mutable state.
A possible fix for that would be to make yaml_representers a property that would merge both a class-level _yaml_representers, and an instance-level _instance_yaml_representers. A new instance method add_instance_representer would allow populating the one related to the instance. Note that this will require using instances for yaml.representer as well, as it's currently the class directly.
Regards,
Douglas
Another real use case I forgot to mention is having multiple dependencies using customized YAML configuration. If the representer and constructor are global to the whole process, these packages are mutually incompatible, which is a big issue.
The only workaround I can think of is to use low-level import functions so that ruamel.yaml is imported twice, but that's a terrible hack that cannot be applied without changing the libs themselves. It would be much much better to avoid mutable singletons in the first place.
I just ran into this as well. My use-case was slightly different:
I have a very simple custom constructor for a tag, but I observed unexpected behavior when testing it. My unit tests were not isolated even though each used a separate YAML instance.
In my case, I was able to handle this with a "quick hack":
But I don't think this is a great fix for all potential use-cases.
What if the registration dicts were made from class attributes to instance attributes? Is there major perf impact?
This has already been implemented but not released yet. From the not yet committed changelog entry:
YAML().register_class()nolonger "reappear" in other YAML instances, because registering is no longer
done on the Constructor/Representer classes but on their instances.