
#525 Memory leak across multiple calls to load

Status: open
Owner: nobody
Milestone: None
Priority: major
Labels: bug
Created: 2024-08-28
Updated: 2024-08-28
Creator: Kyle Larose
Private: No

The doc_infos list leaks across multiple invocations of load*

A recent change (https://sourceforge.net/p/ruamel-yaml/code/ci/08d87cada1f6e5fedde079b55536061e4fe246a0/tree/main.py) added document tracking to the load functions. This has an unintended side effect: every document loaded leaks a small amount of memory, which adds up in a long-running process.

I'm not 100% sure what that particular variable is for, but I do note that the 'tag' object is reset on every load. Should the same be done with the doc_infos list?
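
For illustration, here's the pattern I suspect, as a toy class. This is my own sketch; 'doc_infos' mirrors the attribute added in the linked commit, but none of this is ruamel.yaml's actual code.

# Hypothetical illustration of the suspected leak pattern; this class
# is NOT ruamel.yaml's real implementation.
class Loader:
    def __init__(self):
        self.doc_infos = []

    def load_leaky(self, doc):
        # Append without ever clearing: retains one entry per call
        # for the lifetime of the instance.
        self.doc_infos.append({'len': len(doc)})

    def load_reset(self, doc):
        # Reset per invocation, as apparently happens for 'tag':
        # memory stays bounded across calls.
        self.doc_infos = [{'len': len(doc)}]

loader = Loader()
for _ in range(3):
    loader.load_leaky('foo: bar')
print(len(loader.doc_infos))  # 3 entries retained, one per load

If load reset the list the way load_reset does, memory across repeated calls would stay bounded.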

Here's a simple reproduction script:

import os
import time

import ruamel.yaml

my_doc = """
foo: bar
qux: baz
"""

yaml = ruamel.yaml.YAML()

def eat_mem():
    # Repeated loads on the same YAML instance each retain a little memory.
    for i in range(100000):
        yaml.load(my_doc)

print(f"chewing up memory. Run ps -F -p {os.getpid()}")

while True:
    eat_mem()
    print("ate some memory. Check your memory stats!")
    time.sleep(10)

Running it for 12 minutes made the consumed memory grow like this (note the RSS column):

ps -F -p 4114718
UID          PID    PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
kyle     4114718 3887706 99 10870 31360  11 09:00 pts/8    00:00:34 python3 leak.py
ps -F -p 4114718
UID          PID    PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
kyle     4114718 3887706 80 112206 429312 3 09:00 pts/8    00:12:15 python3 leak.py

That's ~400MB leaked.
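
To confirm the growth comes from Python-level allocations rather than allocator fragmentation, a tracemalloc variant of the repro could be used (my addition, not part of the original script):

import tracemalloc

import ruamel.yaml

yaml = ruamel.yaml.YAML()
tracemalloc.start()

before = tracemalloc.take_snapshot()
for _ in range(100000):
    yaml.load('foo: bar\nqux: baz\n')
after = tracemalloc.take_snapshot()

# Show the top allocation sites for the retained memory.
for stat in after.compare_to(before, 'lineno')[:5]:
    print(stat)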

I would expect that when loading a large number of documents, particularly across separate invocations of load, the memory used by ruamel.yaml itself would stay constant. I should not have to discard the YAML object periodically to free this memory; loading should be effectively idempotent with respect to memory.
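
As a stopgap, the only option I see is exactly that discard-and-recreate dance, sketched below (my own workaround, not an endorsement):

import ruamel.yaml

def load_many(docs, reset_every=10000):
    # Recreate the YAML instance every reset_every loads so the
    # accumulated per-document state becomes garbage-collectable.
    yaml = ruamel.yaml.YAML()
    for i, doc in enumerate(docs):
        if i and i % reset_every == 0:
            yaml = ruamel.yaml.YAML()
        yield yaml.load(doc)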
