One limitation of PBJelly we identified is that mapping results were holding us back during our remap/resupport stage of assembly, especially when defining the trim boundaries of negative gaps.
Consider the following fake contig that PBJelly assembled around a negative gap, and its alignment to the reference.
PBJelly Contig - ...ATCGATCGA------TCGA-----------------GATCGATCG...
Reference      - ...ATCGATCGATCGAGATCGATCGGNNNNNNNNNNATCGATCGATCG...
Previously, PBJelly would trim 4 bases off the 3' end of the left contig (TCGG) and 3 bases off the 5' end of the right contig (ATC) before joining the contigs. However, if you look closely, the first and third gaps in the realignment are unnecessary. That is to say, one can realign the bases TCGA further upstream, so that they are continuous with the rest of the PBJelly contig. After realignment, 9 bases are identified for trimming from the 3' end of the left contig.
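The idea of sliding an aligned block upstream can be sketched in a few lines of Python. This is only an illustration of the concept, not PBJelly's actual code: the function name `shift_islands_left` and the rule "move a gap-flanked block only when it matches the reference exactly at the start of the preceding gap" are assumptions made for this example.

```python
def shift_islands_left(contig_row: str, ref_row: str) -> str:
    """Slide gap-flanked islands of aligned bases left across the gap before them.

    An island is moved only when its bases match the reference exactly at
    the new position, so the alignment never gets worse.
    (Hypothetical helper for illustration; not part of PBJelly.)
    """
    if len(contig_row) != len(ref_row):
        raise ValueError("alignment rows must have equal length")

    row = list(contig_row)
    n = len(row)
    i = 0
    while i < n:
        if row[i] != "-":
            i += 1
            continue
        # Walk over the gap, then over the block of bases that follows it.
        gap_start = i
        while i < n and row[i] == "-":
            i += 1
        island_start = i
        while i < n and row[i] != "-":
            i += 1
        island = row[island_start:i]
        # Only blocks with a gap on both sides qualify. If i == n the block
        # runs to the end of the row (the right flank), so it stays put.
        if not island or i == n:
            continue
        target = ref_row[gap_start:gap_start + len(island)]
        if "".join(island) == target:
            padding = ["-"] * (i - gap_start - len(island))
            row[gap_start:i] = island + padding
            i = gap_start + len(island)  # rescan the now-merged gap
    return "".join(row)


# The fake alignment from the example above, without the "..." ellipses.
contig = "ATCGATCGA" + "-" * 6 + "TCGA" + "-" * 17 + "GATCGATCG"
reference = "ATCGATCGATCGAGATCGATCGG" + "N" * 10 + "ATCGATCGATCG"

realigned = shift_islands_left(contig, reference)
print(realigned)
print(reference)
```

In this sketch the TCGA block ends up directly after ATCGATCGA, leaving a single uninterrupted gap between the left and right flanks, which is the situation in which a trim boundary is easy to define.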
In addition to the realignment improvement, we have also sped up the remap/resupport stage of assembly. PBJelly used to map the contigs it created to the entire original reference to find support. This is unnecessary because we've already identified all reads within a contig as supporting a particular gap. We only need to remap to the gap-flanking region of interest to identify which part of the new contig fills the gap, so PBJelly now remaps only to the region of the gap that is being improved.
This is the same improvement that allowed us to run PBJelly on the very large Sooty mangabey reference in a reasonable amount of time. It took approximately 2 days from start to finish to improve the Sooty mangabey by submitting up to 300 jobs to the cluster per step in the PBJelly workflow (i.e. splitting the assembly effort into 300 independent partitions, something easily done using the nJobs element in your PBJelly protocol settings).
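As a rough sketch of the two ideas above, the Python below finds each gap (run of N) in a reference sequence, builds a window of the gap plus its flanks to remap against, and deals those windows into independent partitions. Everything here is an assumption for illustration: the function names, the `flank` size, and the round-robin split are not taken from PBJelly, whose actual partitioning is controlled by the nJobs setting.

```python
import re
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def gap_windows(seq: str, flank: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) windows: each run of N plus `flank`
    bases on either side, clipped to the sequence bounds.
    (Hypothetical helper; the flank size is an arbitrary choice.)
    """
    windows = []
    for match in re.finditer(r"N+", seq.upper()):
        start = max(0, match.start() - flank)
        end = min(len(seq), match.end() + flank)
        windows.append((start, end))
    return windows


def partition(items: Sequence[T], n_jobs: int) -> List[List[T]]:
    """Deal items round-robin into at most n_jobs non-empty chunks."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    chunks = [list(items[i::n_jobs]) for i in range(n_jobs)]
    return [chunk for chunk in chunks if chunk]


# Toy reference with one gap, matching the example earlier in this post.
reference = "ATCGATCGATCGAGATCGATCGG" + "N" * 10 + "ATCGATCGATCG"

for start, end in gap_windows(reference, flank=5):
    # Only this slice, not the whole reference, would be the remap target.
    print(start, end, reference[start:end])

# Seven gap IDs spread over three jobs.
print(partition(list(range(7)), n_jobs=3))
```

Because each window depends only on its own gap, the partitions share no state and can be submitted to a cluster as separate jobs.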