
#74 Problem with large parallel run: increasing memory usage with increasing number of nodes

1.0
accepted
None
2022-08-08
2022-05-09
No

Hi all,

There is an unexpected behavior when running a (relatively small) simulation on a large number of nodes in a cluster. With an increasing number of compute nodes, the part of the grid solved by each node decreases, so my expectation was that the memory usage of each node should decrease as well. This is the case up to a limited number of nodes, but beyond a certain node count the memory usage increases substantially, which limits the usability of the code. The same behavior occurs with both OpenMPI and MPT (the HPE MPI implementation); I am reporting here the results with OpenMPI. As a comparison, I have run the same test case with OpenFOAM v2106, and there the memory usage follows the expected behavior.

The details of the test are:

Test case: lid-driven cavity 3D, 8 million grid elements, fixedIter variant https://develop.openfoam.com/committees/hpc/-/tree/cavity-updates/microbenchmarks/cavity-3d/8M/fixedIter
Solver: icoFoam
Compiler: GCC 9.2.0
MPI: OpenMPI 4.0.5
Compute cluster: Hawk supercomputer at HLRS https://www.hlrs.de/systems/hpe-apollo-hawk/
Nodes: 2x AMD EPYC 7742 processors (2 x 64 cores), 256 GB DDR4 RAM
Interconnect: InfiniBand HDR200

The figure below shows the mean memory used by each node, recorded with the shell command "free" at each time step. With OpenFOAM, the memory usage stabilizes around 16 GB from 16 nodes (2048 cores) up to 128 nodes (16384 cores), while with foam-extend the memory usage starts to increase rapidly at 16 and 32 nodes (2048 and 4096 cores). A run with foam-extend on 128 nodes crashed due to running out of memory.

I have also used valgrind to detect any memory leaks, and I have seen no difference between foam-extend and OpenFOAM.

Please contact me if I can help by running any other test case on a large system.

Best regards,

Flavio Galeazzo

1 Attachment

Discussion

  • Sergey Lesnik

    Sergey Lesnik - 2022-07-29

    Hi Flavio,

    The problem comes from the Pstream class while allocating the linear and tree communication lists (discovered using valgrind's massif tool). The lists have N entries, and each entry is of type commsStruct, which is itself roughly of size N, where N is the number of MPI ranks. Thus, these lists are by design of size N^2.

    The difference to the OpenFOAM version is that there the lists are not allocated at start-up: they are only sized to N, and each entry (commsStruct) is constructed (and therefore allocated) only when the overloaded operator[] is called on the list. With this lazy evaluation introduced in Pstream, the large lists no longer show up in massif's output. I also cleaned up some private members, which were needed only for the allocation of the communication lists.

    Please try out the attached patch on your large setups to be sure the bug is fixed. The patch is to be applied from $WM_PROJECT_DIR (tested on the ubuntu2004 branch, commit b42fb8a34696e21).
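
    For illustration, here is a minimal sketch of the lazy-evaluation idea described above. It is not the actual Pstream/patch code; the class lazyCommsList and the simplified commsStruct are assumptions made for this example only:

        // Sketch: the list is sized to N ranks up front, but each commsStruct
        // (itself ~N large) is only built on first access via operator[],
        // so memory stays O(N) until an entry is actually used.
        #include <memory>
        #include <vector>

        struct commsStruct
        {
            std::vector<int> below_;    // placeholder for the ~N-sized schedule data

            explicit commsStruct(int nProcs)
            :
                below_(nProcs, -1)
            {}
        };

        class lazyCommsList
        {
            int nProcs_;
            mutable std::vector<std::unique_ptr<commsStruct>> entries_;

        public:
            explicit lazyCommsList(int nProcs)
            :
                nProcs_(nProcs),
                entries_(nProcs)        // N null pointers, no commsStruct allocated yet
            {}

            // Construct the entry on demand instead of at start-up
            const commsStruct& operator[](int proci) const
            {
                if (!entries_[proci])
                {
                    entries_[proci] = std::make_unique<commsStruct>(nProcs_);
                }
                return *entries_[proci];
            }
        };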

     
  • Hrvoje Jasak

    Hrvoje Jasak - 2022-07-31

    Hi Guys,

    Sergey, thank you - excellent work. I have applied the patch on my machine: how can I test that everything is correct? Is it safe to push this into nextRelease and run with it for a while?

    Hrv

     
  • Hrvoje Jasak

    Hrvoje Jasak - 2022-07-31
    • status: open --> accepted
    • assigned_to: Hrvoje Jasak
     
  • Sergey Lesnik

    Sergey Lesnik - 2022-08-02

    Hi Hrvoje,

    To be 100% sure that the bug is fixed, we should wait for the results from Flavio's run with 128 nodes.

    For testing locally, you can run valgrind's massif with and without the patch and compare the allocated memory. In order to spot the difference, you'll need a decent number of ranks; I used 1024, which produces lists of 4 MB. I took the standard 2D cavity case with 1000x1000 cells. The command to run:
    mpirun -np 1024 --oversubscribe valgrind --tool=massif icoFoam -parallel

    If you get an error regarding opened pipe/descriptor limit, here is a solution:
    https://superuser.com/questions/1200539/cannot-increase-open-file-limit-past-4096-ubuntu/1200818#1200818

    After the run, visualize one of the massif.out files written by valgrind with the massif-visualizer tool. Without the patch, you'll find the two bottom entries from the attached screenshot. With the patch applied, these are absent and the total peak memory per rank is lower by 8 MB.

    It should be safe to push it to nextRelease. I only deleted private members, and access to the communication lists should always go via operator[], which is now overloaded.

    Sergey

     
  • Flavio Galeazzo

    Flavio Galeazzo - 2022-08-02

    Hi guys,

    I have applied the patch and prepared the large runs on the Hawk supercomputer. These large runs always take a while, as they really push the inode limit of the storage system. I should have the results in a couple of days.

    Flavio

     
  • Flavio Galeazzo

    Flavio Galeazzo - 2022-08-08

    Hi guys,

    Currently I can run foam-extend with up to 32 nodes (4096 cores) on the Hawk supercomputer due to an inode limit in the file system. My tests using 32 nodes show that the patch significantly decreased the memory usage, from 38.5 GB to 16.1 GB. The results are summarized in the attached figure, comparing foam-extend-4.1 with and without the patch and OpenFOAM v2106, all using the MPT (HPE) MPI library. It seems that we will be able to perform larger runs with foam-extend with the patch. Thank you Sergey for this!

    Flavio

     
