From: Nils H. <nil...@gm...> - 2012-03-01 16:27:15
Hey SAMtools developers,

I have been working on speeding up the compression and decompression of SAM/BAM files by implementing a multi-threaded BGZF reader/writer. The main idea is to parallelize the inflate and deflate functions on individual blocks, similar to pbzip2.

I added a "pbgzip" directory that contains a multi-threaded block gzip compressor and decompressor utility called pbgzip. It auto-detects the number of cores available; the "-n" option overrides that default. It is analogous to the "bgzip" utility within SAMtools. See the timing stats below.

I have also started work on having SAMtools use this API to read and write BAM files. This is where I need your help. I think it is a worthwhile cause to remove some of the computational bottleneck by using this API, which can potentially be ported over to Picard. I am asking fellow SAMtools developers for help developing and testing this version. Some of the functions are partially working (samtools view or samtools index), but others are untested and need debugging.

I have posted my code here:
https://github.com/nh13/samtools/tree/pbgzip

Please send pull requests there if you wish to contribute.

Thanks again for all your hard work!

Sincerely,
Nils Homer

A 4GB SAM file was used on a dual-hex-core (12 cores) computer. I benchmarked compression, then decompression, making sure the resulting files were the same. Decompression seems to be limited by I/O.

Name            Compression Time    Decompression Time
bgzip                     485.64                 39.93
pbgzip -n 1               481.57                 40.02
pbgzip -n 2               240.85                 41.03
pbgzip -n 4               122.05                 41.79
pbgzip -n 8                63.17                 41.17
pbgzip -n 12               43.12                 41.65
pbgzip -n 16               39.59                 41.48
pbgzip -n 20               37.03                 42.41
pbgzip -n 24               34.90                 47.24
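
For readers unfamiliar with the per-block approach described in this message, here is a minimal Python sketch of the general idea. It is not the pbgzip code. It assumes a 64 KiB uncompressed block size and writes plain gzip members without the BGZF extra field, so the output is readable by gzip but is not valid BGZF. The names BLOCK_SIZE, compress_block and parallel_compress are made up for this illustration.

```python
import gzip
import zlib
from concurrent.futures import ThreadPoolExecutor

# Assumed uncompressed block size for this sketch (BGZF blocks are at most 64 KiB).
BLOCK_SIZE = 64 * 1024


def compress_block(block: bytes) -> bytes:
    """Deflate one block as a complete, self-contained gzip member."""
    # wbits=31 asks zlib for gzip framing (header + CRC32/size trailer).
    compressor = zlib.compressobj(6, zlib.DEFLATED, 31)
    return compressor.compress(block) + compressor.flush()


def parallel_compress(data: bytes, threads: int = 4) -> bytes:
    """Split data into fixed-size blocks and compress them on a thread pool."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, so the members can simply be
        # concatenated; no block depends on the state of any other block.
        return b"".join(pool.map(compress_block, blocks))


if __name__ == "__main__":
    payload = b"ACGT" * 100_000
    packed = parallel_compress(payload, threads=4)
    # Concatenated gzip members form a valid multi-member gzip stream.
    assert gzip.decompress(packed) == payload
```

Because each block is an independent gzip member, the same split works in the other direction: a reader that knows the block boundaries can inflate blocks on separate threads and emit them in order. In real BGZF the compressed block size is stored in each block header, which is what makes finding those boundaries cheap.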