[MUMmer-help] To Mask or not to Mask, that is (one of) the question(s)


Hello Mummers,

I have three main clusters of questions relating to finishing a 500 Mbp eukaryotic diploid heterozygous genome composed of about 40% repetitive elements.

1) My analysis of the assembly suggests that as much as 100 Mbp of it consists of contigs that were assembled as separate haplotypes and should be merged if I want a haploid assembly. I would like to use mummer/nucmer to align the assembly against itself and look for overlapping regions that can be merged. Since I'm looking for anchor points that are not unique, I assume I have to use the --maxmatch and --nosimplify options. Would you suggest using a masked genome to prevent the highly repetitive elements from aligning? If I use a masked genome, should I edit the scoring matrix so that alignments extend through N's for free? Is there a better approach to identifying mergeable contigs?

2) I would like to compare my assemblies with a related species whose genome is published. It is divergent enough that PROMER is definitely the way to go. Ideally I would like an easy-to-visualize synteny map composed of long lines for clusters of matching genes. Since this is a eukaryote with large repetitive and intergenic regions, what parameters would you suggest for minmatch, mincluster, breaklen, diagdiff/diagfactor, and maxgap? I have been tweaking them but am not completely clear about which part of the pipeline each one affects, especially diagdiff/diagfactor.

3) I have MUMmerGPU installed and a Tesla/Kepler card to power it. What modifications should I make to the nucmer and promer scripts to use it?

Thanks for taking the time to read this!

Best regards,
Jason