From: Shaun J. <sja...@bc...> - 2011-12-21 20:35:56
Hi Colin,

Good point. I often work with SAM files or compressed SAM files rather
than BAM files. For me, a tool that takes SAM input and produces SAM
output is often more useful than a tool that produces BAM output.

I ran some more timing tests taking a SAM input and producing a
compressed output, in either .bam or .sam.gz format. The compression, it
seems, is as much work as the sorting (for a 3.5 GB SAM file).

Jared Simpson pointed out that I should set the memory buffer to the
same amount for the two tools. I've set the memory buffer to 8 GB for a
3.5 GB file so that both tools will sort entirely in main memory.

The fastest way to sort and compress a SAM file was a UNIX sort piped
into gzip --fast, which took 30% less time than samtools sort. The
gzip --fast compressed SAM file was 18% larger than the BAM file. The
default gzip compressed SAM file was 7% smaller than the BAM file, but
took 28% longer than samtools sort.

Wall-clock time:
2m57s   samtools view -Su |samtools sort
3m28s   sort |samtools view -Sb
3m47s   sort |gzip
2m3s    sort |gzip --fast

Output size:
627 MB  samtools view -Su |samtools sort
627 MB  sort |samtools view -Sb
586 MB  sort |gzip
737 MB  sort |gzip --fast

Cheers,
Shaun

$ time samtools view -Su test.sam |samtools sort -m 8589934592 -o - - >/dev/null
real 2m57.482s
user 2m55.054s
sys 0m6.648s

$ time sort -S8G -snk3 -k4 test.sam |samtools view -Sbt GRCh37.fa - >/dev/null
real 3m28.060s
user 3m26.836s
sys 0m4.762s

$ time sort -S8G -snk3 -k4 test.sam |gzip >/dev/null
real 3m47.821s
user 3m47.739s
sys 0m3.286s

$ time sort -S8G -snk3 -k4 test.sam |gzip --fast >/dev/null
real 2m3.292s
user 2m3.019s
sys 0m4.336s

On Tue, 2011-12-20 at 18:13 -0800, Colin Hercus wrote:
> Hi Shaun,
>
> That's interesting but you are ending up with two different results.
> With samtools sort you end up with a compressed bam file and with
> Linux sort you still have a sam file (and with no headers). Add the
> sam to compressed bam cost to Linux sort and I think samtools is the
> winner.
>
> Kind Regards, Colin
>
> On Wed, Dec 21, 2011 at 4:04 AM, Shaun Jackman <sja...@bc...> wrote:
> > Hi,
> >
> > To sort a SAM file, UNIX sort takes less than half the time of
> > samtools. Here's a test with a 3.5 GB SAM file:
> >
> > $ time samtools view -Su test.sam |samtools sort -o - - >/dev/null
> > [samopen] SAM header is present: 25 sequences.
> > [bam_sort_core] merging from 7 files...
> >
> > real 3m55.149s
> > user 3m48.554s
> > sys 0m5.623s
> >
> > $ time sort -snk3 -k4 test.sam >/dev/null
> >
> > real 1m38.004s
> > user 1m26.216s
> > sys 0m7.494s
> >
> > This trick works if your sequence IDs are in an order that can be
> > sorted by UNIX sort. That is, the @SQ headers must be sorted either
> > alphabetically or numerically. The above sort command uses the -n
> > option to sort numerically.
> >
> > Cheers,
> > Shaun
> >
> > $ sort --version
> > sort (GNU coreutils) 7.6
> > $ samtools
> > Program: samtools (Tools for alignments in the SAM format)
> > Version: 0.1.18 (r982:295)
> >
> > $ time samtools view -Su 30NE8AAXX_3.sam >test.bam
> > [samopen] SAM header is present: 25 sequences.
> >
> > real 1m8.586s
> > user 0m40.915s
> > sys 0m4.096s
> >
> > $ du -h test.bam
> > 2.9G test.bam
> >
> > $ time samtools sort -o test.bam - >/dev/null
> > [bam_sort_core] merging from 7 files...
> >
> > real 3m47.551s
> > user 3m3.334s
> > sys 0m3.145s
> >
> > $ time samtools view -Sb 30NE8AAXX_3.sam >test.bam
> > [samopen] SAM header is present: 25 sequences.
> >
> > real 2m37.593s
> > user 2m33.125s
> > sys 0m3.267s
> >
> > $ du -h test.bam
> > 835M test.bam
> >
> > $ time samtools sort -o test.bam - >/dev/null
> > [bam_sort_core] merging from 7 files...
> >
> > real 3m28.348s
> > user 3m16.909s
> > sys 0m2.065s
> >
> > _______________________________________________
> > Samtools-devel mailing list
> > Sam...@li...
> > https://lists.sourceforge.net/lists/listinfo/samtools-devel
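
The header concern raised in the thread can be handled on the UNIX sort side by passing the '@' header lines through untouched and sorting only the alignment records. What follows is a minimal sketch, not the commands used for the timings above: sort_sam is a hypothetical helper name, the three sample records are made up, and it uses bounded keys (-k3,3n -k4,4n) with an explicit tab separator instead of the -snk3 -k4 form in the thread. It assumes numeric reference names. With -n, non-numeric names such as X or MT compare as zero and would sort ahead of the numbered ones.

```shell
#!/bin/sh
# Sort a SAM file by reference name (field 3) and position (field 4),
# keeping the header lines (those starting with '@') at the top.
# Assumption: reference names are numeric, as in the thread above.
sort_sam() {
    sam=$1
    tab=$(printf '\t')
    # Header lines pass through unchanged, in their original order.
    grep '^@' "$sam"
    # Alignment records: stable numeric sort on RNAME, then on POS.
    grep -v '^@' "$sam" | sort -s -t "$tab" -k3,3n -k4,4n
}

# Tiny made-up input to show the behaviour.
tmp=$(mktemp)
printf '@HD\tVN:1.0\n@SQ\tSN:1\tLN:1000\n@SQ\tSN:2\tLN:1000\n' >"$tmp"
printf 'r1\t0\t2\t50\nr2\t0\t1\t300\nr3\t0\t1\t20\n' >>"$tmp"
sort_sam "$tmp"   # headers first, then r3, r2, r1
rm -f "$tmp"
```

To get a compressed SAM file as in the timing tests, the output can be piped on, for example: sort_sam test.sam | gzip --fast >sorted.sam.gz (sorted.sam.gz is a placeholder name). Reading the input twice with grep adds I/O that the timings above do not account for.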