Re: [Bio-bwa-help] BWA-MEM on HG38
From: Joseph F. <jos...@gm...> - 2017-08-30 16:34:25
I don't mean to be "that guy," but BWA is an *aligner*, not an assembler. I see misuse of these two terms more and more frequently in the workshops I teach, so perhaps I'm oversensitized to it.

~Joe

On Wed, Aug 30, 2017 at 6:09 AM, Adrian Pelin <ape...@gm...> wrote:

> I am just shooting in the air here, but have you tried monitoring memory
> consumption during each of the assemblies? For example, for your 32 threads
> hg 37, how close does it get to using up 100% of your memory, and how does
> that compare with 16 threads for hg38? My guess, albeit obvious, is that the
> memory per thread is different for both genomes. May be very hard to
> pinpoint as to why.
>
> On Wed, Aug 30, 2017 at 8:52 AM Vladimir Kovacevic
> <vladimir.kovacevic@sbgenomics.com> wrote:
>
>> Hi all,
>>
>> I was trying to process some WGS samples (50x - 60x) with Broad's
>> recommended HG38 reference genome and experienced a number of failures due
>> to lack of memory. The machine I used was AWS C4.8xlarge (36 CPUs, 60 GB,
>> 700 GB). When I lower the number of threads from 36 to 15, all of the
>> samples complete successfully. Let me just highlight that with HG37 all
>> samples complete successfully with the maximum number of threads. Also, I
>> tried processing with a modified HG38 reference genome, having excluded all
>> regions (alt, hla, GL, ...) except chromosomal, and it still fails on some
>> samples when 36, 30, 25 and 20 are set as the number of threads.
>>
>> Does someone know why BWA-MEM requires much more memory with HG38 than
>> with the HG37 reference genome? I suspected the number of N bases
>> (HG37: 3950662, HG38: 2351523), but the difference of ~1.6 million seems
>> small compared to the size of the entire genome (3 billion).
>>
>> Here is one of the command lines:
>> /opt/bwa-0.7.13/bwa mem -R '@RG\tID:1\tLB:22108087\tPL:Illumina HiSeq\tPU:FCA\tSM:HG005' -t 36 hg38.chr.sbg.fasta HG005.150424_S1.pe_1.fastq.gz HG005.150424_S1.pe_2.fastq.gz |
>> /opt/samblaster/samblaster -r -i /dev/stdin -o /dev/stdout 2>HG005.150424_S1.samblaster.log |
>> /opt/sambamba_v0.6.0 view -t 36 -f bam -l 0 -S /dev/stdin |
>> /opt/sambamba_v0.6.0 sort -t 36 -m 8GiB --tmpdir ./ -o HG005.150424_S1.bam -l 5 /dev/stdin
>>
>> At the end, when piping status is checked, BWA-MEM has failed with the
>> error code 137.
>>
>> Thanks,
>> Vladimir Kovacevic
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Bio-bwa-help mailing list
>> Bio...@li...
>> https://lists.sourceforge.net/lists/listinfo/bio-bwa-help

--
Joseph Fass
Bioinformatics Data Analyst
UC Davis Genome Center - Bioinformatics Core
http://bioinformatics.ucdavis.edu/
jn...@uc...
phone ~ 530.752.2698
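
On the exit status 137 reported in the original post: by shell convention, a status above 128 means the process was terminated by a signal, so 137 is 128 + 9, i.e. SIGKILL. On Linux that is the status a process would report after being killed by the kernel's out-of-memory killer, although anything else that sends SIGKILL produces the same status, so it is a strong hint rather than proof. A minimal sketch for decoding such a status follows; `describe_status` is a hypothetical helper name used only for illustration and is not part of BWA or any other tool in the pipeline.

```shell
# Hypothetical helper (not part of any tool in the thread):
# turn a process exit status into a short human-readable description.
describe_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        # Shell convention: 128 + N means "terminated by signal N".
        signum=$(( status - 128 ))
        # kill -l <number> prints the signal name without the SIG prefix.
        echo "killed by signal $signum (SIG$(kill -l "$signum"))"
    else
        echo "exited with status $status"
    fi
}

describe_status 137
```

In bash, the status of each stage of a pipeline is available in the `PIPESTATUS` array immediately after the pipeline finishes, so `describe_status "${PIPESTATUS[0]}"` would decode the status of the first stage (here, `bwa mem`).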