doc_infos list leaks across multiple invocations of load*
A recent change (https://sourceforge.net/p/ruamel-yaml/code/ci/08d87cada1f6e5fedde079b55536061e4fe246a0/tree/main.py) added document tracking to the load functions. An unintended side effect is that every document loaded retains a small amount of memory, which adds up in a long-running process.
I'm not 100% sure what the purpose of that particular variable is, but I do note that the 'tag' object is reset on every load. Should the same thing be done with the doc_infos list?
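To illustrate the kind of per-load reset I mean, here is a standalone sketch (the class and its bookkeeping are placeholders of my own, not the actual library code; only the doc_infos name comes from the commit):

class Loader:
    def __init__(self):
        # grows forever if it is never cleared between loads
        self.doc_infos = []

    def load(self, stream):
        # proposed: reset per-document tracking on every load,
        # the same way the 'tag' object is reset
        self.doc_infos = []
        self.doc_infos.append({"length": len(stream)})  # placeholder bookkeeping
        return stream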
Here's a simple reproduction script:
import os
import time

import ruamel.yaml

my_doc = """
foo: bar
qux: baz
"""

yaml = ruamel.yaml.YAML()

def eat_mem():
    for i in range(100000):
        yaml.load(my_doc)
    print(f"chewing up memory. Run ps -F -p {os.getpid()}")

while True:
    eat_mem()
    print("ate some memory. Check your memory stats!")
    time.sleep(10)
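To check where the retained memory is allocated (rather than just watching RSS), a tracemalloc variant of the loop can be used; this is a sketch, and my assumption is that the top entries will point into ruamel's main.py if doc_infos is the culprit:

import tracemalloc

import ruamel.yaml

my_doc = """
foo: bar
qux: baz
"""

yaml = ruamel.yaml.YAML()

tracemalloc.start()
for i in range(100000):
    yaml.load(my_doc)
snapshot = tracemalloc.take_snapshot()
# the biggest allocation sites still holding memory after the loop
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)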
Running it for about 12 minutes, resident memory grew like this (note the RSS column, in KB):
ps -F -p 4114718
UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
kyle 4114718 3887706 99 10870 31360 11 09:00 pts/8 00:00:34 python3 leak.py
ps -F -p 4114718
UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
kyle 4114718 3887706 80 112206 429312 3 09:00 pts/8 00:12:15 python3 leak.py
That's ~400MB leaked.
I would expect that when loading a large number of documents, particularly across separate invocations of load, the memory used by ruamel itself stays constant. I should not have to discard the YAML object periodically to free this memory; repeated loads should not accumulate per-document state.
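In the meantime, a workaround that seems possible from user code is to clear the list between loads (a sketch, assuming doc_infos is a plain list attribute on the YAML instance, as the linked commit suggests; I have not verified this across versions):

import ruamel.yaml

my_doc = """
foo: bar
qux: baz
"""

yaml = ruamel.yaml.YAML()
for i in range(100000):
    data = yaml.load(my_doc)
    # assumption: doc_infos is a plain list on the instance; guard with
    # hasattr in case the attribute changes between versions
    if hasattr(yaml, "doc_infos"):
        yaml.doc_infos.clear()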