A Hierarchical Approach to Multi-Reference Genome Compression


Storing and transferring large genome datasets have become major concerns for biomedical researchers. We present a novel multi-reference genome compression method with a hierarchical structure. Our approach targets compression of the de facto standard alignment format, BAM, which is the most pressing need at present. We align new sequences to a reference sequence using SOAP3, a GPU-based alignment tool, and summarize the mapping properties and information for exactly mapped reads. To increase the exact-alignment rate, we realign the approximately mapped and unmapped reads by changing the reference sequence or shortening the read length. We also study "lossy" quality values obtained through a k-means clustering scheme and find that they have only a minute effect on downstream applications. On experimental datasets, the proposed method achieves compression ratios (compressed size over original size) from 0.5 to 0.65, which correspond to space savings of 35% to 50%.
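
The "lossy" quality-value step can be illustrated with a minimal sketch. This is not the project's implementation: the function names `kmeans_1d` and `quantize_qualities`, the Phred+33 encoding, the evenly spaced initial centroids, and the choice of k are all assumptions made for illustration. The sketch clusters the quality scores of one read with a one-dimensional k-means and replaces each score with its cluster's rounded centroid.

```python
from typing import List


def kmeans_1d(values: List[int], k: int, iterations: int = 50) -> List[float]:
    """Lloyd's algorithm on scalar data; returns the sorted cluster centroids."""
    if k < 1 or not values:
        raise ValueError("need k >= 1 and at least one value")
    lo, hi = min(values), max(values)
    # Assumed deterministic start: spread k centroids evenly over the range.
    if k == 1:
        centroids = [(lo + hi) / 2]
    else:
        centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        buckets: List[List[int]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            buckets[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [
            sum(b) / len(b) if b else centroids[j]
            for j, b in enumerate(buckets)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def quantize_qualities(quality: str, k: int = 4, offset: int = 33) -> str:
    """Replace each quality character with its cluster's rounded centroid."""
    scores = [ord(c) - offset for c in quality]
    centroids = kmeans_1d(scores, k)
    out = []
    for s in scores:
        representative = min(centroids, key=lambda c: abs(s - c))
        out.append(chr(int(round(representative)) + offset))
    return "".join(out)


if __name__ == "__main__":
    print(quantize_qualities("IIIH###$", k=2))
```

After quantization the string contains at most k distinct symbols, which generally makes it easier for a downstream entropy coder to compress, at the cost of discarding the original per-base scores.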


  • Efficient
  • Fast
  • Promising

Additional Project Details



Intended Audience

Engineering, Information Technology, Science/Research

Programming Language


