|
From: Adam P. <aph...@gm...> - 2015-11-02 22:56:39
|
Hi Afif,

This is the intended behavior of the program. I believe you are misinterpreting the meaning of a maximal unique match. A maximal match is defined as a match that cannot be extended on either end without encountering a mismatch. Just because a MUM contains a repetitive sequence does not make the whole MUM non-unique. For the two sequences with unique seqs 'U' and tandem repeats 'T':

A: UUUTTTUUU
B: UUUTTUUU

There would be MUMs found on either side: UUUTT and TTUUU. These two MUMs overlap on both T's in B and overlap on the middle T in A. Both MUMs are unique, i.e. they don't appear anywhere else as a whole ... though they do contain substrings that are repetitive. I hope this little example makes it a little more clear.

Nucmer or dnadiff are the best tools in MUMmer for comparing contigs to a reference. I prefer to run:

> nucmer -maxmatch -banded ref.fna contigs.fna
> delta-filter -q out.delta > out.qdelta
> show-coords -THrcl out.qdelta > out.coords

This will report the best alignments found for each contig to the reference, which you can parse to identify large differences, or you can run show-snps to identify smaller polymorphisms.

Best,
-Adam

On Mon, Nov 2, 2015 at 2:54 PM, Afif Elghraoui <ael...@sd...> wrote:
> Hello,
> I am trying to get a very basic comparison between a de novo assembled
> microbial genome and a reference sequence. I used nucmer at first, but
> there was a tandem duplication that was causing overlapping alignments
> and I just wanted to know precisely what was different. It appears that
> the plain mummer program would be appropriate for finding this out, so I
> called mummer as follows:
>
> mummer -mum <reference> <query>
>
> By using the -mum flag, I thought that I'm not supposed to be seeing any
> overlap in the matches in both the query and reference, but I found an
> instance of this in those positions where I was previously getting
> overlaps in my nucmer alignments:
>
> 3569553 3570622 132349
> 3701734 3702919 39749
>
> If I understood the format of the output correctly, the first match here
> overlaps the second in the reference (3569553+132349 = 3701902 >
> 3701734) and the query. By manual inspection of blast results of these
> positions, I know that there is a tandem repeat in this region where the
> query has five copies and the reference has three copies and the overlap
> in the matches here matches all three copies in the reference twice and
> also matches one of the copies in the query twice.
>
> This doesn't look to me like intended behavior of the program,
> especially since I'm not using -maxmatch. Am I doing something wrong or
> misunderstanding anything here? I'm using MUMmer 3.23 as packaged in
> Debian 8. I can provide my query genome privately if necessary.
>
> Many thanks and regards
> Afif
>
> --
> Afif Elghraoui
> Laboratory for Pathogenesis of Clinical Drug Resistance and Persistence
> San Diego State University
> Alvarado Medical Center
> 6367 Alvarado Court, Suite 206
> San Diego, CA 92120
> p. 858-222-0454
> http://tuberculosis.sdsu.edu
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> MUMmer-help mailing list
> MUM...@li...
> https://lists.sourceforge.net/lists/listinfo/mummer-help
>
|
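For anyone who wants to check the UUUTTTUUU / UUUTTUUU example from the reply by hand, here is a minimal Python sketch. It is a brute-force illustration only and is not how MUMmer works internally (MUMmer uses a suffix-based index). The function names `occurrences`, `find_mums` and `overlap_len`, the `min_len` parameter and the 0-based positions are assumptions made for this sketch, and the letters U and T are treated as literal characters.

```python
def occurrences(text, pattern):
    """Return every start position of pattern in text, overlaps allowed."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def find_mums(ref, qry, min_len=1):
    """Brute-force maximal unique matches as (ref_pos, qry_pos, length).

    A substring qualifies if it occurs exactly once in ref, exactly once
    in qry, and cannot be extended to the left or right without a mismatch.
    Hypothetical helper for toy inputs; far too slow for real genomes.
    """
    mums = set()
    for i in range(len(ref)):
        for end in range(i + min_len, len(ref) + 1):
            sub = ref[i:end]
            if len(occurrences(ref, sub)) != 1:
                continue
            qry_hits = occurrences(qry, sub)
            if len(qry_hits) != 1:
                continue
            j = qry_hits[0]
            n = end - i
            # Maximal on the left: at a sequence start, or previous chars differ.
            left_max = i == 0 or j == 0 or ref[i - 1] != qry[j - 1]
            # Maximal on the right: at a sequence end, or next chars differ.
            right_max = (end == len(ref) or j + n == len(qry)
                         or ref[end] != qry[j + n])
            if left_max and right_max:
                mums.add((i, j, n))
    return sorted(mums)


def overlap_len(start_a, len_a, start_b, len_b):
    """Number of positions shared by two intervals on the same sequence."""
    shared = min(start_a + len_a, start_b + len_b) - max(start_a, start_b)
    return max(0, shared)


if __name__ == "__main__":
    A = "UUUTTTUUU"
    B = "UUUTTUUU"
    mums = find_mums(A, B)
    for ref_pos, qry_pos, length in mums:
        print(ref_pos, qry_pos, length, A[ref_pos:ref_pos + length])
    (r1, q1, n1), (r2, q2, n2) = mums
    print("overlap in A:", overlap_len(r1, n1, r2, n2))
    print("overlap in B:", overlap_len(q1, n1, q2, n2))
```

The same `overlap_len` helper can be applied to the two match lines quoted in the original question (reference start, query start, length) to measure how far the matches overlap in each sequence.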