From: Adam P. <aph...@gm...> - 2013-12-17 15:49:18
Hi Jacqueline,

Sorry, dnadiff hasn't made it into the online docs yet, but there is a description of it in the source distribution under README and docs/dnadiff.README.

I have some experience aligning bird genomes with MUMmer, and it can be very computationally intensive. MUMmer was originally designed for bacterial genome alignment, and thus doesn't scale all that well to large genomes -- but it can be done.

The first question is: how similar are your genomes? Nucmer doesn't perform very well if the similarity drops below ~90% identity.

Second, you will want to run it in the "-mumreference" mode with the zebra finch genome as the reference. This will exclude repetitive matches that would otherwise make the runtime far too long. Also, depending on your available memory, you may need to align the chromosomes one at a time or in batches (e.g. one finch chromosome as the reference per run of nucmer). If you have enough RAM to do it all in one go:

> nucmer -mumreference zebrafinch.fasta yourgenome.fasta
> show-coords -THrcl out.delta > out.coords

That will produce a tab-delimited set of alignments for you to analyze further.

Usually, I would generate summary statistics like the ones you mentioned using dnadiff, but I think that script would take too long on your dataset (because it also reports all SNPs, etc.). If you want to try it, no guarantees, you can run it like so:

> dnadiff -d out.delta

It will analyze the alignment (delta) file you generated previously.

The big caveat here is that by running nucmer in the -mumreference mode, many repeats will not be aligned. You'll have to keep this in mind when compiling your statistics.

If this all seems to take too long using these tools, you can try another aligner that scales better for large genomes, like BLAT.
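If dnadiff turns out to be too slow, the statistics mentioned above (aligned/unaligned contigs, aligned length, percent of aligned bases) could be approximated directly from out.coords. The following is only a rough sketch, not part of MUMmer. It assumes the "show-coords -THrcl" output has 13 tab-delimited columns in the order S1, E1, S2, E2, LEN1, LEN2, %IDY, LENR, LENQ, COVR, COVQ, REF, QRY, which is worth checking against your own file first. The function name summarize_coords and the optional query_lengths argument (contig name -> length, e.g. read from your assembly FASTA so that contigs with no alignment at all are counted) are made up for illustration.

```python
from collections import defaultdict


def summarize_coords(lines, query_lengths=None):
    """Summarize alignments per query contig from show-coords -THrcl output.

    Assumed columns (tab-delimited, no header):
    S1 E1 S2 E2 LEN1 LEN2 %IDY LENR LENQ COVR COVQ REF QRY
    """
    intervals = defaultdict(list)        # query contig -> [(start, end), ...]
    lengths = dict(query_lengths or {})  # query contig -> contig length
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 13:
            continue  # skip blank or malformed lines
        s2, e2 = int(fields[2]), int(fields[3])
        qry = fields[12]
        # Fall back to the LENQ column when no length was supplied.
        lengths.setdefault(qry, int(fields[8]))
        # Reverse-strand hits have S2 > E2, so normalize the interval.
        intervals[qry].append((min(s2, e2), max(s2, e2)))

    # Merge overlapping alignments so each query base is counted once.
    aligned_bases = 0
    for spans in intervals.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:  # overlapping or adjacent
                cur_end = max(cur_end, end)
            else:
                aligned_bases += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        aligned_bases += cur_end - cur_start + 1

    total_bases = sum(lengths.values())
    percent = 100.0 * aligned_bases / total_bases if total_bases else 0.0
    return {
        "aligned_contigs": len(intervals),
        "unaligned_contigs": len(lengths) - len(intervals),
        "aligned_bases": aligned_bases,
        "total_bases": total_bases,
        "percent_aligned": percent,
    }
```

To use it, open out.coords and pass the file handle as the first argument. Without query_lengths, only contigs that appear in out.coords are known, so the unaligned-contig count will be zero. The same caveat applies here as for dnadiff: repeats skipped by -mumreference will show up as unaligned bases.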
Best,
-Adam

On Mon, Dec 16, 2013 at 3:10 PM, Jacqueline R M Doyle <jm...@pu...> wrote:

> Hi,
>
> I have recently done a couple different assemblies of an avian genome and
> a reviewer has suggested aligning the two assemblies to the zebra finch
> genome and seeing which assembly aligns best. The idea here is that the
> assembly that overlaps most closely with the zebra finch genome is probably
> the best one to use for downstream analyses. I'd like to align each
> assembly to the zebra finch genome using nucmer and then generate some
> summary statistics like number of aligned/unaligned contigs, total
> aligned/unaligned length, percent of aligned bases, etc. What is the best
> way to go about generating this type of data? I found references to a
> script called "dnadiff" in the email help archives, but couldn't find the
> script referenced in the MUMmer 3 manual.
>
> Best wishes...