From: Adam P. <aph...@gm...> - 2014-03-05 00:06:40
Hi Nengbing,

I don't think I ever responded to your question. I'm sorry.

The parameters -l 7 -c 15 set a minimum seed length of 7 and a minimum cluster length of 15. The cluster length is computed as the sum of the seed lengths. Thus if you have a 15bp "match" with a SNP at position 5, you'll get a 10bp seed from positions 6-15. However, on the other side of the SNP, from positions 1-4, there isn't enough room for another seed. Thus, you'll be left with a single seed of length 10. This doesn't meet the minimum cluster length of 15 and will be discarded. Bottom line, nucmer isn't designed for short, inexact matching.

If you wanted to force it to find all 15bp 1-mismatch alignments, you would need to set both the min match and min cluster lengths to 7bp. This would cover the worst-case scenario of a single SNP directly in the middle of the 15bp sequence, which leaves exactly 7bp on either side. This would also generate many false-positive alignments that you could filter with delta-filter. However, there are better aligners for what you are trying to do.

Best,
-Adam

On Thu, Feb 6, 2014 at 11:44 AM, TAO, NENGBING [AG/1005] <nen...@mo...> wrote:

> Hi, Adam,
>
> I sent an email two or three times to mummer-help and kept receiving messages that I need to register even after I did so, but did not want to send more, fearing it might get through and flood someone's inbox.
>
> I am trying to use nucmer to compare two fasta files and identify matches that are 15 bp long allowing 1 mismatch/indel, and I don't seem to get all matches using nucmer.
>
> Here are the basic stats of query and db files:
>
> file     avg     stDev   min   max   median   numOfSeq   numOfBase
> db.fa    21.73   1.71    15    28    21       7072       153724
> Seq.fa   256     0       256   256   256      1          256         # just one sequence
>
> Here are the commands that I used:
>
> nucmer
> NUCmer (NUCleotide MUMmer) version 3.1
>
> nucmer --maxmatch -l 7 -c 15 -p oNucmer_1 db.fa Seq.fa; show-coords -T -r -c -l oNucmer_1.delta > oNucmer_1.coords
> delta-filter -i 93 -l 15 oNucmer_1.delta > filtered_oNucmer_1.delta
> show-coords -T -r -L 15 -I 0.93 -c -l filtered_oNucmer_1.delta > filtered_oNucmer_1.coords
> wc -l oNucmer_1.coords
> 85
> wc -l filtered_oNucmer_1.coords
> 4
>
> I am reasonably confident that there are ~90 matches that are 15 bp long allowing 1 mismatch/indel, whereas nucmer only gave 4 matches. I must have missed some parameters. I noticed that the parameters -b, -g, --nooptimize change the output quite a lot, but I don't fully understand what they mean or how they should be used.
>
> Could you kindly give me a pointer on what the best approach is or what the appropriate parameters should be?
>
> Best regards,
>
> Nengbing
>
> *From:* Adam Phillippy [mailto:aph...@gm...]
> *Sent:* Thursday, February 06, 2014 10:12 AM
> *To:* Govinda Kamath
> *Cc:* mummer-help
> *Subject:* Re: [MUMmer-help] A clarification about NUCmer
>
> Hi Govinda,
>
> There is no minimum percent identity threshold for Nucmer and no guarantees on what it will find. Instead, the sensitivity and quality of the alignments depend on the minimum match size and cluster parameters chosen. The closest thing to an identity threshold is the dynamic programming extension scores, which are set to +3/-7. This equates to a min avg identity of 70% for the DP algorithm to continue extending. However, Nucmer will rarely find these low identity alignments, because they will likely not be seeded.
> With default parameters, Nucmer is generally sensitive to alignments with >90% identity.
>
> If you want to cap the alignments to a certain identity, after the fact, you can run delta-filter with the -i option to filter alignments below your desired threshold.
>
> Hope this helps,
> -Adam
>
> On Tue, Feb 4, 2014 at 3:32 PM, Govinda Kamath <gk...@be...> wrote:
>
> Hi,
>
> In NUCmer, what is the default percent identity threshold, above which results are reported in the out.delta file? Also, in what metric (like Hamming distance or edit distance) is this calculated?
>
> Thanks,
> Govinda.
>
> _______________________________________________
> MUMmer-help mailing list
> MUM...@li...
> https://lists.sourceforge.net/lists/listinfo/mummer-help
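For readers following the seed arithmetic in Adam's reply at the top of this thread, here is a minimal Python sketch of it. It is a toy model only, not nucmer's actual code: it assumes a single mismatch in an otherwise exact match, treats each exact run of at least the minimum seed length as one seed, and sums seed lengths to get the cluster length, as described in the reply. The function names (seed_lengths, passes_cluster, breakeven_identity) are made up for illustration. The last function works out the break-even identity implied by the +3/-7 extension scores mentioned in the quoted reply to Govinda.

```python
def seed_lengths(match_len, mismatch_pos, min_seed):
    """Exact-match runs on either side of one mismatch (1-based position)
    that are long enough to count as seeds. Toy model, not nucmer itself."""
    left = mismatch_pos - 1           # bases before the mismatch
    right = match_len - mismatch_pos  # bases after the mismatch
    return [n for n in (left, right) if n >= min_seed]


def passes_cluster(match_len, mismatch_pos, min_seed, min_cluster):
    """True if at least one seed exists and the summed seed length
    reaches the minimum cluster length."""
    seeds = seed_lengths(match_len, mismatch_pos, min_seed)
    return bool(seeds) and sum(seeds) >= min_cluster


def breakeven_identity(match_score, mismatch_penalty):
    """Fraction of matching bases at which the extension score nets to zero:
    match_score * matches == mismatch_penalty * mismatches."""
    return mismatch_penalty / (match_score + mismatch_penalty)


if __name__ == "__main__":
    # SNP at position 5 of a 15bp match, with -l 7 -c 15:
    # positions 1-4 are too short to seed, positions 6-15 give one seed.
    print(seed_lengths(15, 5, 7))
    print(passes_cluster(15, 5, 7, 15))

    # With -l 7 -c 7, every mismatch position leaves a usable seed,
    # including the worst case of a SNP at position 8.
    print(all(passes_cluster(15, p, 7, 7) for p in range(1, 16)))

    # +3 per match, -7 per mismatch.
    print(breakeven_identity(3, 7))
```

Under this model no mismatch position in a 15bp match can reach a cluster length of 15, since the two sides of the mismatch add up to only 14bp. That is consistent with the advice to lower both -l and -c to 7 and then filter the extra hits with delta-filter.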