Re: [Denovoassembler-devel] [CCS #177295] MPICH on titan uses a lot of memory (?)
Ray -- Parallel genome assemblies for parallel DNA sequencing
From: Sébastien B. <seb...@ul...> - 2013-10-21 15:21:02
Hello,

I worked on this issue, but I have not found a fix yet. I am running test jobs with 313 nodes and 5008 MPI ranks. Carlos P. Sosa (from Cray) suggested that I add -N 16 to my command, which is now:

titan> cat HiSeq-2500-NA12878-demo-2x150-6.sh
#PBS -N HiSeq-2500-NA12878-demo-2x150-6
#PBS -l walltime=12:00:00
#PBS -l nodes=313
#PBS -A LSC005
#PBS -l gres=widow1

cd $PBS_O_WORKDIR

# 313 * 8 * 2 = 5008

#-debug \
aprun -n 5008 -N 16 \
 ./software/lsc005/Ray/616d2a26cc1e39f59325a0e632af46262edaa12c-1/Ray \
 -k 31 \
 -detect-sequence-files HiSeq-2500-NA12878-demo-2x150 \
 -o HiSeq-2500-NA12878-demo-2x150-6 \

The job still crashes. The virtual memory usage of a single rank looks like this before crashing:

Rank 3755: assembler memory usage: 106072 KiB
Rank 3755: assembler memory usage: 171804 KiB
Rank 3755: Rank= 3755 Size= 5008 ProcessIdentifier= 17973
Rank 3755 is testing the network [0/1000]
Rank 3755 is testing the network [1000/1000]
Rank 3755: mode round trip latency when requesting a reply for a message of 4000 bytes is 27 microseconds (10^-6 seconds)
Rank 3755: average round trip latency when requesting a reply for a message of 4000 bytes is 263 microseconds (10^-6 seconds)
Rank 3755 is loading sequence reads
Rank 3755: partition is [878283235;878517131], 233897 sequence reads
Rank 3755 is fetching file HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R1_002.fastq.gz with lazy loading (please wait...)
Rank 3755 has 0 sequence reads
Rank 3755: assembler memory usage: 3410716 KiB
Rank 3755 has 100000 sequence reads
Rank 3755: assembler memory usage: 3410716 KiB
Rank 3755 has 200000 sequence reads
Rank 3755: assembler memory usage: 3410716 KiB
Rank 3755 has 233897 sequence reads (completed)
Rank 3755 created its Bloom filter
Rank 3755 is counting k-mers in sequence reads [1/233897]
Rank 3755 has 100000 vertices
Rank 3755: assembler memory usage: 3423004 KiB
Rank 3755 has 200000 vertices
Rank 3755: assembler memory usage: 3423004 KiB
Rank 3755 has 300000 vertices
Rank 3755: assembler memory usage: 3423004 KiB
Rank 3755 has 400000 vertices
Rank 3755: assembler memory usage: 3423004 KiB

So the first report of virtual memory usage is ~100 MiB. But once file I/O starts, it jumps to 3.4 GiB (this must include Cached and other things shared between processes, I suppose). In the error log, MPICH2 complains that it cannot allocate 12 MiB:

MPICH2 ERROR [Rank 3755] [job id 3742358] [Wed Oct 16 08:11:23 2013] [c1-5c1s5n1] [nid18933] - MPIU_nem_gni_get_hugepages(): Unable to mmap 12582912 bytes for file /var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.2.17973.kvs_3742358, err Cannot allocate memory

Error log: /ccs/home/sebhtml/lsc005/projects/human-1-hour/HiSeq-2500-NA12878-demo-2x150-6.e1754627

Each node has 33 GiB of memory. When the crash strikes, 22 GiB are Cached (due to file I/O) and 12 GiB are Inactive. There is probably some overlap between Cached and Inactive.
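For reference, the "assembler memory usage" figures above are the process's own view of its virtual memory (the quoted message further down mentions the VmData entry of /proc/self/status). A minimal sketch of that kind of probe, not Ray's actual code, could look like this:

#include <fstream>
#include <iostream>
#include <string>

// Return the VmData entry of /proc/self/status in KiB, or 0 if it is
// not found. The kernel prints the value followed by the unit "kB".
static long getVirtualMemoryUsageKiB() {
    std::ifstream status("/proc/self/status");
    std::string token;
    while (status >> token) {
        if (token == "VmData:") {
            long kiloBytes = 0;
            status >> kiloBytes;
            return kiloBytes;
        }
    }
    return 0;
}

int main() {
    std::cout << "assembler memory usage: " << getVirtualMemoryUsageKiB()
              << " KiB" << std::endl;
    return 0;
}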
See below the /proc/meminfo:

[Rank 3758] Cat of /proc/meminfo
[Rank 3755]: MemTotal: 33084652 kB
[Rank 3755]: MemFree: 3984520 kB
[Rank 3755]: Buffers: 0 kB
[Rank 3755]: Cached: 22332700 kB
[Rank 3755]: SwapCached: 0 kB
[Rank 3755]: Active: 12556068 kB
[Rank 3755]: Inactive: 12527116 kB
[Rank 3758]: MemTotal: 33084652 kB
[Rank 3755]: Active(anon): 2637848 kB
[Rank 3758]: MemFree: 3984892 kB
[Rank 3755]: Inactive(anon): 168920 kB
[Rank 3758]: Buffers: 0 kB
[Rank 3755]: Active(file): 9918220 kB
[Rank 3758]: Cached: 22332700 kB
[Rank 3755]: Inactive(file): 12358196 kB

Also, DirectMap1G is 17825792 kB, which may mean that calls to brk() add 1 GiB segments to the virtual memory of my processes.

My last piece of evidence is what slabinfo reports (the slab allocator is *supposed* to allocate memory from slabs, and hence should reuse memory as much as possible). Here is what slabinfo had to say:

[Rank 3758]: slabinfo - version: 2.1
[Rank 3758]: # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
[Rank 3758]: ll_async_page 5569508 6526674 336 11 1 : tunables 54 27 8 : slabdata 593334 593334 432

So there are 5569508 active_objs in the ll_async_page slab cache. From the evidence I gathered, here is my hypothesis: this is a problem with the Lustre client on every compute node. I am using Spider's widow 1. I will rerun with the -debug option. (A sketch of one possible page-cache mitigation follows the quoted thread below.)

Sébastien

On 26/09/13 10:49 AM, Fernanda Foertter via RT wrote:
>
> Hi Sébastien,
>
> I can't find any reason why MPICH would use that much memory. Although, you
> appear to list virtual memory, not actual memory.
> One way to avoid running out of memory on compute nodes is to reduce the number
> of cores (MPI tasks) per node. That way, there is more memory per core.
>
> Also, have you tried profiling your application? You can find information on
> available profiling and debugging tools here:
> https://www.olcf.ornl.gov/support/software/?softwaretype=kb_software_debugging_and_profiling
>
> Happy Computing!
> F²
>
>
> --------------------------------------------------------
> Fernanda Foertter
> National Center for Computational Sciences
> Oak Ridge National Laboratory
> O: 865-576-9391 F: 865-241-2850
> foe...@or...
>
>
>
> On Mon Sep 23 16:54:33 2013, seb...@ul... wrote:
>> Dear OLCF support:
>>
>> I launched a job (Job # 1732882) with 313 nodes (5008 MPI ranks)
>> on titan.
>>
>> The code is Ray ( http://denovoassembler.sourceforge.net/ ) and
>> does not use the GPU. I am the author of Ray.
>>
>> The job failed with a MPICH2 error.
>>
>> Standard error:
>> /tmp/proj/lsc005/projects/human-1-hour/HiSeq-2500-NA12878-demo-2x150-3.e1732882
>>
>> Standard output:
>> /tmp/proj/lsc005/projects/human-1-hour/HiSeq-2500-NA12878-demo-2x150-3.o1732882
>>
>> Launch script:
>> /tmp/proj/lsc005/projects/human-1-hour/HiSeq-2500-NA12878-demo-2x150-3.sh
>>
>> Output directory:
>> /tmp/proj/lsc005/projects/human-1-hour/HiSeq-2500-NA12878-demo-2x150-3
>>
>> The errors I got:
>>
>> MPICH2 ERROR [Rank 1227] [job id 3577704] [Mon Sep 16 20:34:24 2013] [c19-4c0s2n1] [nid12091] - MPIU_nem_gni_get_hugepages(): Unable to mmap 12582912 bytes for file /var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.2.27853.kvs_3577704, err Cannot allocate memory
>> MPICH2 ERROR [Rank 1227] [job id 3577704] [Mon Sep 16 20:34:24 2013] [c19-4c0s2n1] [nid12091] - MPIU_nem_gni_get_hugepages(): large page stats: free 0 nr 158 nr_overcommit 16154 resv 0 surplus 158
>> MPICH2 ERROR [Rank 1230] [job id 3577704] [Mon Sep 16 20:34:24 2013] [c19-4c0s2n1] [nid12091] - MPIU_nem_gni_get_hugepages(): Unable to mmap 12582912 bytes for file /var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.2.27856.kvs_3577704, err Cannot allocate memory
>> MPICH2 ERROR [Rank 1230] [job id 3577704] [Mon Sep 16 20:34:24 2013] [c19-4c0s2n1] [nid12091] - MPIU_nem_gni_get_hugepages(): large page stats: free 0 nr 165 nr_overcommit 16154 resv 0 surplus 165
>> MPICH2 ERROR [Rank 4378] [job id 3577704] [Mon Sep 16 20:34:24 2013] [c0-2c1s6n0] [nid00114] - MPIU_nem_gni_get_hugepages(): Unable to mmap 12582912 bytes for file /var/lib/hugetlbfs/global/pagesize-2097152/hugepagefile.MPICH.2.24160.kvs_3577704, err Cannot allocate memory
>> MPICH2 ERROR [Rank 4378] [job id 3577704] [Mon Sep 16 20:34:24 2013] [c0-2c1s6n0] [nid00114] - MPIU_nem_gni_get_hugepages(): large page stats: free 0 nr 173 nr_overcommit 16154 resv 0 surplus 173
>>
>> The reported memory usage (I use the VmData entry in /proc/self/status)
>> was > 3 GiB per MPI rank at the beginning, before my application does
>> anything.
>
> --
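If the hypothesis above is right and the Lustre client's page cache (the ll_async_page objects) is what squeezes out the huge pages that MPICH needs, one mitigation worth testing would be to hint the kernel to drop the cached pages of each input file once a rank has finished reading it. This is only a sketch of that idea, not something Ray does today, and whether the Lustre client on titan honors posix_fadvise is an assumption to verify:

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// Ask the kernel to drop the page-cache pages of one file.
// Returns true if the advice was accepted.
static bool dropPageCache(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;
    // POSIX_FADV_DONTNEED over the whole file (offset 0, length 0 = to the
    // end): the cached pages are no longer needed and may be freed.
    int result = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return result == 0;
}

int main(int argc, char **argv) {
    // Usage sketch: pass the sequence files a rank has already consumed,
    // e.g. HiSeq-2500-NA12878-demo-2x150/sorted_S1_L002_R1_002.fastq.gz
    for (int i = 1; i < argc; ++i) {
        if (!dropPageCache(argv[i]))
            std::fprintf(stderr, "could not drop cache for %s\n", argv[i]);
    }
    return 0;
}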