From: John P. <jwp...@gm...> - 2013-10-29 18:31:36
On Tue, Oct 29, 2013 at 11:19 AM, John Peterson <jwp...@gm...> wrote:

> On Tue, Oct 29, 2013 at 9:32 AM, Cody Permann <cod...@gm...> wrote:
>
>> On Tue, Oct 29, 2013 at 5:54 AM, ernestol <ern...@ln...> wrote:
>>
>> > I am using a cluster with 23 nodes for a total of 184 cores, and each
>> > node additionally has 16GB of RAM. I was thinking that the problem may
>> > be in the code, because if I run on up to 3 processors I don't have any
>> > problems, but when I try with 4 or more I get this problem.
>
> So you have 8 cores per node, and 2 GB of RAM per core, which is pretty
> standard.
>
> I ran your 200^3 code on my Mac workstation and watched the memory usage
> in Activity Monitor.
>
> The results were somewhat surprising as I added cores:
>
> 1 core:  2.22 GB/core
> 2 cores: 4.0 GB/core
> 3 cores: slightly more than 4.0 GB/core
> 4 cores: machine went into swap (I think) after approaching about
>          3.5 GB/core, but code eventually finished
> 5 cores: machine again went into swap at around 3.3 GB/core, but
>          finished eventually
>
> My workstation has 20 GB of RAM, so including the OS I guess I could see
> how approaching 16 GB might cause it to go into swap.
>
> But what is happening when we go from 1 to 2 cores that causes the memory
> usage per core to double?!
>
> Note that in all cases the memory quickly jumps to about 2.22 GB/core. In
> the 1 processor case it stays there, but in the 2-5 processor cases, after
> reaching 2 GB/core, it slowly ramps up to the approximately 4 GB/core
> listed above.
>
> This, combined with the error message you received (which comes from
> Metis), leads me to believe that the partitioner is taking up a ton of
> memory (the partitioner doesn't run on 1 proc). So the questions become:
>
> 1.) Is the partitioner taking up a lot more memory than it conceivably
>     should? (Seems like yes.)
> 2.) Is it taking up more than it used to? I.e., has a bug been introduced
>     recently? (Metis and Parmetis were last updated in April 2013, so
>     pretty recently actually.)
>
> I don't know whether reverting to a prior version of Metis/Parmetis is
> easily done at this point, but the relevant hashes where the refresh
> happened seem to be:
>
> e80824e86a
> 1c4b6a0d12
>
> I may take a stab at this after lunch... Cody has been seeing similar
> issues recently as well.

I confirmed that changing the partitioner does seem to reduce the overall
memory usage appreciably.

LinearPartitioner
1 core:  2.22 GB/core
2 cores: about 2.7 GB/core peak
3 cores: same as 2 cores
4 cores: about 2.6 GB/core

CentroidPartitioner
1 core:  2.22 GB/core
2 cores: about 3 GB/core peak
4 cores: about 2.8 GB/core peak

SFCPartitioner
1 core:  2.22 GB/core
2 cores: slightly more than 3 GB/core peak
4 cores: almost exactly the same GB/core as the 2 cores case

Activity Monitor is not a very accurate tool, but I think the trends are
about the same for the Linear, Centroid, and SFC partitioners, and they make
a lot more sense than the Metis results. In particular, I was able to run on
4 cores without going into swap.

--
John
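
A rough cross-check of the Activity Monitor readings above: a minimal Python sketch that multiplies each per-core figure by the core count to estimate total memory on the 20 GB workstation. The variable and function names are made up for illustration, the figures are the approximate ones from this message, and it assumes every process reaches the same per-core value, which Activity Monitor cannot really confirm.

```python
# Approximate per-core memory (GB) read off Activity Monitor, keyed by core count.
# "slightly more than 4.0" is recorded as 4.0; for Metis on 4 and 5 cores the
# value is where the machine started swapping, not a true peak.
METIS_GB_PER_CORE = {1: 2.22, 2: 4.0, 3: 4.0, 4: 3.5, 5: 3.3}
LINEAR_GB_PER_CORE = {1: 2.22, 2: 2.7, 3: 2.7, 4: 2.6}

WORKSTATION_RAM_GB = 20.0  # stated in the message


def total_gb(per_core):
    """Return {cores: estimated total GB}, assuming equal usage on every process."""
    return {n: round(n * gb, 2) for n, gb in per_core.items()}


metis_total = total_gb(METIS_GB_PER_CORE)
linear_total = total_gb(LINEAR_GB_PER_CORE)

for n in sorted(metis_total):
    line = f"{n} core(s): Metis ~{metis_total[n]} GB"
    if n in linear_total:
        line += f", Linear ~{linear_total[n]} GB"
    print(line + f" (of {WORKSTATION_RAM_GB} GB RAM)")
```

With these inputs the Metis estimates for 4 and 5 cores come to 14 and 16.5 GB, against about 10.4 GB for LinearPartitioner on 4 cores, which is consistent with swap being hit only in the Metis runs.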