
#228 ParsedParameterFile memory leak

Status: new
Owner: nobody
Labels: None
Priority: normal
Severity: major
Reproducibility: HaveNotTried
Fixed in: none
Version: 2021.06
Component: library
Updated: 2022-10-25
Created: 2022-10-13
Creator: shou chin
Private: No

From our tests, there is a 1-2 MB memory leak per call to the ParsedParameterFile function when parsing parameter files. Is this an already known problem, or one that has already been fixed?

best regards

Discussion

  • Bernhard Gschaider

    No, I wasn't aware of that. But I can't rule out that there is still a reference to some data that prevents the memory from being freed.
    Is there a simple way to reproduce the problem (a script etc.)?

     
  • shou chin

    shou chin - 2022-10-18

    Hello,

    We found a memory leak when parsing the attached T and U files.

    The script is quite simple:
    u_value = ParsedParameterFile('U')
    t_value = ParsedParameterFile('T')

    best regards

     
  • Bernhard Gschaider

    How did you measure the memory usage? (I used resource.getrusage(resource.RUSAGE_SELF) for a quick test)

    First of all: as long as u_value and t_value still hold references to the data structures, the data is not lost. In theory the memory can be reclaimed by resetting the reference

    u_value = None

    or deleting the struct

    del u_value

    In practice the garbage collector doesn't free the memory immediately.

    I'd consider it a proper memory leak if every call used 1 MB and that memory was never given back. So, for instance:

    for i in range(1000):
        t_value = ParsedParameterFile('T')

    should increase the memory usage by 1 GB, but according to getrusage/maxrss this loop only increases the memory by approx. 9 MB (and I see no reason why this wouldn't be given back).

    If I make sure that the references are kept

    lst = []
    for i in range(1000):
        lst.append(ParsedParameterFile('T'))

    then approx. 400 MB are used (approx. 400 kB for the scanned contents of a 73 kB file; I think that is fair. Not great, but OK).
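
    For reference, the quick test described above, written out as a complete script (a minimal sketch: it assumes a file named 'T' in the current directory, and note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS):

    import resource
    from PyFoam.RunDictionary.ParsedParameterFile import ParsedParameterFile

    def max_rss_kb():
        # peak resident set size of this process so far
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    before = max_rss_kb()
    for i in range(1000):
        t_value = ParsedParameterFile('T')
    after = max_rss_kb()
    print("peak RSS grew by", after - before, "kB")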

    I guess you reached out because the actual memory problem occurred in a more complicated script than your example. Can you describe that?

     
  • shou chin

    shou chin - 2022-10-18

    Hello,

    Thank you very much for your investigation. In our app, we have to call the file-parsing process many times, far more than 1000. The total is about MAX_TIME_STEP * SAVE_INTERVAL_STEP times:
    MAX_TIME_STEP=200000
    SAVE_INTERVAL_STEP=5000

    So from our test, it will eventually use over 64 GB of memory. From my understanding of Python's GC, it is a reference-counting GC rather than a tracing GC like Java's. That means it frees memory as soon as the reference count drops to zero. So I cannot understand why the memory usage keeps growing without dropping back to some reasonable level after a long run. I will further check how long the Python GC takes to free the memory.

    best regards

     
  • shou chin

    shou chin - 2022-10-18

    https://stackoverflow.com/questions/6115066/how-much-time-is-the-garbage-collector-using

    From the above link, it seems Python uses reference counting for simple memory management, but a generational GC for circular references. Is there any circular object allocation in the parsing process?
    If there are lots of circular object references, that may slow down the GC greatly.

    Sorry, I made a wrong statement: the total number of iterations of our process is about 200000, not 200000 * 5000. In each step we also parse another OpenFOAM file. So the memory usage would be 200000 * 1 MB = 200 GB, which easily exceeds our machine's 64 GB.

    best regards

     

    Last edit: shou chin 2022-10-18
  • Bernhard Gschaider

    If a discarded ParsedParameterFile object had circular references, then the first for-loop test above would need more memory.
    Are you sure you don't have any references to them in your data structures? Even to parts: a reference to t_value["boundaryField"] would keep parts of t_value "alive".
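
    To illustrate the mechanism with a generic sketch (plain Python, not PyFoam internals; the Node class with a parent back-reference is hypothetical):

    import weakref

    class Node:
        def __init__(self, parent=None):
            self.parent = parent      # back-reference to the parent

    root = Node()
    child = Node(parent=root)
    probe = weakref.ref(root)         # reports whether root is still alive
    del root                          # drop the direct reference ...
    print(probe() is None)            # False: child.parent keeps root alive
    del child
    print(probe() is None)            # True: the whole structure is freed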

     
  • Henrik Rusche

    Henrik Rusche - 2022-10-24

    Hi Shou,
    it's important that you report how you measure the leakage. This is not trivial - there are lots of measures and some can be quite misleading - and, of course, anybody who wants to reproduce the error needs to know what number he should look at.
    Does your code actually fail because a memory request cannot be satisfied? Does it slow down drastically due to swapping? Or do you project your numbers and assume that it will fail eventually?
    The post you reference is 11 years old. It primarily talks about how much time is spent in the GC. Furthermore, it's unclear whether it is about Python 2 or 3 ... in short, this is not a good reference.

    Try to get some debug info out of the gc itself and/or force the gc by hand.
    https://stackabuse.com/basics-of-memory-management-in-python/
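
    A minimal sketch of that, using only the standard gc module:

    import gc

    gc.set_debug(gc.DEBUG_STATS)      # print statistics on each collection
    print(gc.get_count())             # tracked objects per generation
    unreachable = gc.collect()        # force a full collection by hand
    print("unreachable objects found:", unreachable)
    print("uncollectable objects:", len(gc.garbage))
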
    My experience with Python is that it grabs a lot of (virtual) memory, but there isn't a real problem.

    Best Regards,
    Henrik

     
  • shou chin

    shou chin - 2022-10-25

    Hello,

    The memory leak occurred in a complex project using deep learning. We first investigated the problem with tracemalloc, whose memory-difference report pointed to PyFoam. But we tried to reproduce the leak by extracting the parsing process into a simple project, without success. So now we are commenting out code to narrow down the location. The problem is quite complex; we are still monitoring the memory on the server, and I will report again when there is any progress. Sorry for the delayed reply.
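
    For reference, a minimal sketch of such a tracemalloc comparison around a single parsing step (again using the T file from above as a stand-in for the real workload):

    import tracemalloc
    from PyFoam.RunDictionary.ParsedParameterFile import ParsedParameterFile

    tracemalloc.start()
    snap1 = tracemalloc.take_snapshot()
    t_value = ParsedParameterFile('T')    # one parsing step
    del t_value
    snap2 = tracemalloc.take_snapshot()
    # the ten largest memory differences, grouped by source line
    for stat in snap2.compare_to(snap1, 'lineno')[:10]:
        print(stat)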

    best regards

     

    Last edit: shou chin 2022-10-25
  • Bernhard Gschaider

    As I said: it is possible that your code innocently holds a reference to a small part of the parsed data structures that prevents the whole thing from being garbage collected because of a back-reference. Of course tracemalloc blames it on the piece of code that allocated the memory, not the one holding on to it. Copying the data instead of referencing it sometimes helps in such cases.
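
    A sketch of that workaround (assuming copy.deepcopy can handle the values in question):

    import copy
    from PyFoam.RunDictionary.ParsedParameterFile import ParsedParameterFile

    t_value = ParsedParameterFile('T')
    # take an independent copy instead of keeping a reference into the tree
    boundary = copy.deepcopy(t_value["boundaryField"])
    del t_value                       # the parsed tree can now be freed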

    I haven't worked with tracemalloc yet (I don't see whether it shows you the exact location), but I had good experiences with "Memory Profiler" described in https://stackify.com/top-5-python-memory-profilers/ (I think it is the 3rd package described there).

     
