Hello there!
I have been trying to run a large wheat input on a large cluster system. The run completes with ~1000 cores, but with ~3000 cores it appears to get stuck in merAligner-51. I suspect there is contention for some shared resource, but I was wondering whether you have any insights or suggestions as to what might be going on and what I might try next.
I built UPC/GASNet with --disable-shmem --with-sptr-packed-bits=20,12,32, using the Intel compilers and SGI MPT for MPI. I have run HipMer (built using UPCC_FLAGS="-network mpi") with UPC_SHARED_HEAP_MB ranging from 800 to 3000 MB. My configuration file looks like this:
$ cat meraculous-cs42.config
lib_seq R1.fastq,R2.fastq CS42 1000 150 251 1 1 1 1 1 0 0 2 1
is_diploid 0
genome_size 13
mer_size 51
min_depth_cutoff 0
gap_close_rpt_depth_ratio 1.75
gap_close_aggressive 1
Thank you so much!!
-- Haruna :)