|
From: Wollenberg, K. (NIH/N. [C] <wol...@ni...> - 2014-08-19 17:53:31
|
Hello:

I am trying to run a map view analysis based on the tutorial and am getting a frustrating error when I try to include the .gff3 file for my reference sequence. I'm getting the error "The reference seq ID can't be found in the GFF files !". I have gone into my .gff3 file (downloaded the GenBank(full) file in gff3 format from GenBank) and my promer.coords file and made sure that the first ID under [TAGS] in .coords is identical to the entry in the first column of my .gff3 file. I'm still getting this error no matter how I try to make these match.

Has anyone else out there run into this and is there a known solution? Am I trying to match the wrong fields?

Cheers,
Kurt Wollenberg, Ph.D.
Contractor - MSC, Inc.
Phylogenetics Specialist
Computational Biology Section
Bioinformatics and Computational Biosciences Branch (BCBB)
OCICB/OSMO/OD/NIAID/NIH
31 Center Drive, Room 3B62
Bethesda, MD 20892-0485
Office 301-402-8628
http://bioinformatics.niaid.nih.gov (Within NIH)
http://exon.niaid.nih.gov (Public)
|
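One way to double-check the comparison described above is to list exactly which sequence IDs appear in column 1 of the GFF3 file and compare them with the ID copied from the .coords file. The sketch below is only a diagnostic aid, not a known fix for the error: the function names are made up for illustration, the sample ID is a placeholder, and which field the map view tool actually compares is an assumption.

```python
def gff3_seqids(gff3_text):
    """Collect the distinct values of column 1 (seqid) from GFF3 text."""
    seqids = set()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        seqids.add(line.split("\t", 1)[0])
    return seqids


def explain_mismatch(reference_id, seqids):
    """Describe how an ID from the .coords file relates to the GFF3 seqids."""
    if reference_id in seqids:
        return "exact match"
    if reference_id.strip() in {s.strip() for s in seqids}:
        return "differs only by surrounding whitespace"
    base = reference_id.split(".")[0]
    if any(s.split(".")[0] == base for s in seqids):
        return "differs only by version suffix"
    return "no match"


# Placeholder data: one feature line plus an embedded FASTA section.
sample = (
    "##gff-version 3\n"
    "NC_000913.3\tRefSeq\tgene\t190\t255\t.\t+\t.\tID=gene0\n"
    "##FASTA\n"
    ">NC_000913.3\n"
    "ACGT\n"
)
ids = gff3_seqids(sample)
print(ids, explain_mismatch("NC_000913", ids))
```

If the report is anything other than "exact match", hidden whitespace or a version suffix is a likely suspect before looking at other fields.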
|
From: Adam P. <aph...@gm...> - 2014-07-31 18:58:07
|
Hi Savita,

If any of the mummer programs run for more than a few hours, I usually kill them and relaunch with different parameters. If any of the matching programs run for that long, it means they are being overwhelmed by repetitive matches. You can remedy this by setting the "mumreference" option to consider only unique seeds, increasing the minimum match length, and increasing the minimum cluster length. Adjusting these options will reduce sensitivity somewhat, but will greatly accelerate the runtime.

Best,
-Adam

On Wed, Jul 30, 2014 at 2:17 AM, Savita Karthikeyan <ks...@ib...> wrote:
> Dear MUMmer users,
>
> I have a question about the running time of MUMmer.
>
> I'm using MUMmer's PROmer tool to compare selected sequences from a draft genome to another genome, to analyse high level synteny.
>
> The program has been running for around 27 days now, on a server with 94GB memory. The input file sizes are 26566 sequences-512MB and 9 sequences-360MB. I wanted to know approximately how long would you expect PROmer to run for files of this size? Also, I was wondering if there was an option to run PROmer in a multithreaded way?
>
> FYI, the .mgap output file is being updated regularly and the command I used was:
> promer --prefix=refBv_qryAh data/Bv.fa data/Ah_scaff.fa --maxmatch --matrix=2
>
> Any help greatly appreciated.
>
> Many thanks,
> Savita Karthikeyan
> Research Assistant,
> Institute of Bioinformatics and Applied Biotechnology,
> Biotech Park
> Electronics City Phase I
> Bangalore 560 100
> India
|
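As a rough illustration of the advice in this reply, the sketch below assembles a promer command line that swaps `--maxmatch` for `--mumreference` and raises the minimum match (`-l`) and minimum cluster (`-c`) lengths. The helper name and the values 10 and 40 are assumptions chosen only for the example; suitable thresholds depend on the data, and the command is printed rather than executed.

```python
import shlex


def promer_command(prefix, reference, query,
                   min_match=None, min_cluster=None, extra=()):
    """Build a promer argument list using --mumreference instead of --maxmatch."""
    argv = ["promer", "--mumreference", f"--prefix={prefix}"]
    if min_match is not None:
        argv += ["-l", str(min_match)]    # minimum match length
    if min_cluster is not None:
        argv += ["-c", str(min_cluster)]  # minimum cluster length
    argv += list(extra)                   # any other options kept from the original run
    argv += [reference, query]
    return argv


# File names and --matrix=2 come from the original post; 10 and 40 are illustrative.
cmd = promer_command("refBv_qryAh", "data/Bv.fa", "data/Ah_scaff.fa",
                     min_match=10, min_cluster=40, extra=["--matrix=2"])
print(shlex.join(cmd))
```

Returning a list keeps the arguments safe to hand to `subprocess.run` later without shell quoting problems.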
|
From: Savita K. <ks...@ib...> - 2014-07-30 06:33:15
|
Dear MUMmer users,

I have a question about the running time of MUMmer.

I'm using MUMmer's PROmer tool to compare selected sequences from a draft genome to another genome, to analyse high level synteny.

The program has been running for around 27 days now, on a server with 94GB memory. The input file sizes are 26566 sequences-512MB and 9 sequences-360MB. I wanted to know approximately how long would you expect PROmer to run for files of this size? Also, I was wondering if there was an option to run PROmer in a multithreaded way?

FYI, the .mgap output file is being updated regularly and the command I used was:
promer --prefix=refBv_qryAh data/Bv.fa data/Ah_scaff.fa --maxmatch --matrix=2

Any help greatly appreciated.

Many thanks,
Savita Karthikeyan
Research Assistant,
Institute of Bioinformatics and Applied Biotechnology,
Biotech Park
Electronics City Phase I
Bangalore 560 100
India
|
|
From: Adam P. <aph...@gm...> - 2014-07-26 01:45:15
|
Hi Theo,

I have not seen that error before. The "geometry" option is an X toolkit parameter that sets the size of the displayed window. Can you email some details of the OS/environment you are running? Meanwhile, you might be able to get it working by commenting out a line in the mummerplot script. Search the script for the line containing "geometry" (there's only one, I believe), comment it out with a '#', and rerun.

Best,
-Adam

On Fri, Jul 25, 2014 at 1:43 AM, <The...@cs...> wrote:
> Hi,
>
> Any idea what's causing this error when trying to draw a nucmer plot??
>
> Thanks.
>
> Theo
>
> Line=
>
> $mummerplot out.delta
>
> Setting up libexiv2-12 (0.23-1) ...
> Setting up libnetpbm10 (2:10.0-15+b1) ...
> Setting up librsvg2-common:amd64 (2.36.1-1) ...
> Setting up netpbm (2:10.0-15+b1) ...
> Setting up psutils (1.17.dfsg-1) ...
> Setting up liblensfun-data (0.2.5-2) ...
> Setting up liblensfun0 (0.2.5-2) ...
> Setting up ufraw-batch (0.18-2) ...
> Processing triggers for libgdk-pixbuf2.0-0:amd64 ...
> all29c@aahl-02-mel:~/data/015-campy_ass/miseq/2898$ mummerplot out.delta
> defined(%hash) is deprecated at /usr/bin/mummerplot line 884.
> (Maybe you should just omit the defined()?)
> defined(%hash) is deprecated at /usr/bin/mummerplot line 894.
> (Maybe you should just omit the defined()?)
> defined(%hash) is deprecated at /usr/bin/mummerplot line 981.
> (Maybe you should just omit the defined()?)
> defined(%hash) is deprecated at /usr/bin/mummerplot line 991.
> (Maybe you should just omit the defined()?)
> defined(%hash) is deprecated at /usr/bin/mummerplot line 1034.
> (Maybe you should just omit the defined()?)
> defined(%hash) is deprecated at /usr/bin/mummerplot line 1044.
> (Maybe you should just omit the defined()?)
> gnuplot 4.6 patchlevel 0
> Reading delta file out.delta
> Writing plot files out.fplot, out.rplot
> Writing gnuplot script out.gp
> Forking mouse listener
> Rendering plot to screen
> Cannot open load file '-geometry'
> "-geometry", line 0: util.c: No such file or directory
>
> WARNING: Unable to run 'gnuplot -geometry 500x500+0+0 -title mummerplot out.gp', Inappropriate ioctl for device
>
> Dr Theo R. Allnutt
> Project Scientist: Bioinformatics | Food Microbiology and Safety Group
> Animal, Food and Health Sciences
> CSIRO
> Phone: +61 397 313 204 | Fax: +61 3 9731 3201
> the...@cs... | www.csiro.au
> Address: CAFHS, 671 Sneydes Road, Werribee, VIC 3030; and AAHL, PO Bag 24
> Geelong VIC 3220.
|
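The workaround suggested in this reply (commenting out the line that mentions "geometry") can be tried on a copy of the script so the installed file stays untouched. The sketch below is a generic text filter, not a tested patch for mummerplot: the function name is made up, and the three-line sample is an invented stand-in rather than the script's real contents. If the matching line is part of a multi-line statement, commenting it out alone may break the script, so check the result by eye.

```python
def comment_out_matching_lines(script_text, needle="geometry"):
    """Prefix '#' to every uncommented line containing `needle`.

    Returns the patched text and the number of lines changed.
    """
    patched = []
    changed = 0
    for line in script_text.splitlines(keepends=True):
        if needle in line and not line.lstrip().startswith("#"):
            patched.append("#" + line)
            changed += 1
        else:
            patched.append(line)
    return "".join(patched), changed


# Invented stand-in for a script fragment, used only to exercise the function.
sample = 'my $x = 1;\n$cmd .= " -geometry 500x500+0+0";\nprint $cmd;\n'
patched_text, n_changed = comment_out_matching_lines(sample)
print(n_changed)
print(patched_text)
```

In practice one would read a copy of the mummerplot script, apply the function, write the result to a new file, and run that copy.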
|
From: <The...@cs...> - 2014-07-25 05:43:18
|
Hi,
Any idea what's causing this error when trying to draw a nucmer plot??
Thanks.
Theo
Line=
$mummerplot out.delta
Setting up libexiv2-12 (0.23-1) ...
Setting up libnetpbm10 (2:10.0-15+b1) ...
Setting up librsvg2-common:amd64 (2.36.1-1) ...
Setting up netpbm (2:10.0-15+b1) ...
Setting up psutils (1.17.dfsg-1) ...
Setting up liblensfun-data (0.2.5-2) ...
Setting up liblensfun0 (0.2.5-2) ...
Setting up ufraw-batch (0.18-2) ...
Processing triggers for libgdk-pixbuf2.0-0:amd64 ...
all29c@aahl-02-mel:~/data/015-campy_ass/miseq/2898$ mummerplot out.delta
defined(%hash) is deprecated at /usr/bin/mummerplot line 884.
(Maybe you should just omit the defined()?)
defined(%hash) is deprecated at /usr/bin/mummerplot line 894.
(Maybe you should just omit the defined()?)
defined(%hash) is deprecated at /usr/bin/mummerplot line 981.
(Maybe you should just omit the defined()?)
defined(%hash) is deprecated at /usr/bin/mummerplot line 991.
(Maybe you should just omit the defined()?)
defined(%hash) is deprecated at /usr/bin/mummerplot line 1034.
(Maybe you should just omit the defined()?)
defined(%hash) is deprecated at /usr/bin/mummerplot line 1044.
(Maybe you should just omit the defined()?)
gnuplot 4.6 patchlevel 0
Reading delta file out.delta
Writing plot files out.fplot, out.rplot
Writing gnuplot script out.gp
Forking mouse listener
Rendering plot to screen
Cannot open load file '-geometry'
"-geometry", line 0: util.c: No such file or directory
WARNING: Unable to run 'gnuplot -geometry 500x500+0+0 -title mummerplot out.gp', Inappropriate ioctl for device
Dr Theo R. Allnutt
Project Scientist: Bioinformatics | Food Microbiology and Safety Group
Animal, Food and Health Sciences
CSIRO
Phone: +61 397 313 204 | Fax: +61 3 9731 3201
the...@cs... | www.csiro.au
Address: CAFHS, 671 Sneydes Road, Werribee, VIC 3030; and AAHL, PO Bag 24
Geelong VIC 3220.
|
|
From: Adam P. <aph...@gm...> - 2014-07-23 19:51:50
|
Hi Astrid,

It is most likely a memory issue. How much RAM does your machine have? As a rule of thumb, MUMmer requires approximately 17 bytes per base-pair of reference sequence. For a reference of your size, that would be over 8GB. If it is possible to partition your reference into multiple files, you can run mummer separately on each, which will save memory.

No, MUMmer does not currently support multithreading.

Best,
-Adam

On Tue, Jul 22, 2014 at 9:00 AM, astrid boehne <ast...@un...> wrote:
> Hi
> I am trying to run mummer to compare very similar genome assemblies (two individuals of the same species) and get back the following error.
> I reduced the query and reference sequence size I used to match the character limitations.
> 1: PREPARING DATA
> 2,3: RUNNING mummer AND CREATING CLUSTERS
> # reading input file "ref_qry.ntref" of length 497187885
> # construct suffix tree for sequence of length 497187885
> # (maximum reference length is 536870908)
> # (maximum query length is 4294967295)
> # process 4971878 characters per dot
> #............................................................................ERROR: mummer and/or mgaps returned non-zero
>
> Any suggestions on what is happening?
> My guess is that this is a memory issue.
> Is it possible to run Mummer multi-threaded?
> Thank you very much in advance
> All the best
> Astrid Böhne
>
> --
> Astrid Böhne
> Universität Basel
> Zoologisches Institut
> Evolutionsbiologie
> Vesalgasse 1
> CH-4051 Basel
> Switzerland
> Phone +41 (0)61 267 03 05
> Fax +41 (0) 61 267 03 01
|
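The rule of thumb in this reply can be turned into a quick back-of-the-envelope calculation. The sketch below applies 17 bytes per reference base to the 497,187,885 bp reference from the log and estimates how many roughly equal parts the reference would need for each part to fit in a given amount of RAM. The function names and the 4 GB figure are assumptions for illustration, and real usage will not scale perfectly linearly.

```python
BYTES_PER_BASE = 17  # rule of thumb quoted in the reply above


def estimated_memory_bytes(reference_length_bp, bytes_per_base=BYTES_PER_BASE):
    """Approximate memory needed for a reference of the given length."""
    return reference_length_bp * bytes_per_base


def parts_needed(reference_length_bp, available_bytes, bytes_per_base=BYTES_PER_BASE):
    """Number of roughly equal reference parts so each fits in available memory."""
    need = estimated_memory_bytes(reference_length_bp, bytes_per_base)
    return -(-need // available_bytes)  # ceiling division on integers


ref_len = 497_187_885                    # length reported in the log above
need = estimated_memory_bytes(ref_len)
print(f"{need / 1e9:.2f} GB")            # prints 8.45 GB
print(parts_needed(ref_len, 4 * 10**9))  # parts for a hypothetical 4 GB budget
```

The estimate agrees with the "over 8GB" figure in the reply; under the assumed 4 GB budget the reference would need about three parts.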
|
From: astrid b. <ast...@un...> - 2014-07-22 13:00:18
|
Hi

I am trying to run mummer to compare very similar genome assemblies (two individuals of the same species) and get back the following error. I reduced the query and reference sequence size I used to match the character limitations.

1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "ref_qry.ntref" of length 497187885
# construct suffix tree for sequence of length 497187885
# (maximum reference length is 536870908)
# (maximum query length is 4294967295)
# process 4971878 characters per dot
#............................................................................ERROR: mummer and/or mgaps returned non-zero

Any suggestions on what is happening? My guess is that this is a memory issue. Is it possible to run Mummer multi-threaded?

Thank you very much in advance
All the best
Astrid Böhne

--
Astrid Böhne
Universität Basel
Zoologisches Institut
Evolutionsbiologie
Vesalgasse 1
CH-4051 Basel
Switzerland
Phone +41 (0)61 267 03 05
Fax +41 (0) 61 267 03 01
|
|
From: Sean M. <Sea...@uc...> - 2014-07-01 17:04:18
|
Hello, I ran an alignment of a file of contigs (FASTA from Velvet) against a reference E. coli genome (called MG1655.fa). I ran show-snps on the .delta file and got an error stating that MG1655 was not in the reference. How do I correct this? Sean |
|
From: Adam P. <aph...@gm...> - 2014-06-18 20:10:34
|
Hi Davide,

In this context, "unique" means that there is only one alignment covering a region. Since multiple alignments can overlap one another, this option looks at a particular alignment and computes the fraction of its length where it is the *only* alignment that exists at that reference/query position.

Best,
-Adam

On Wed, Jun 18, 2014 at 3:55 AM, Davide VERZOTTO (GIS) <ver...@gi...> wrote:
> Hi Adam,
>
> May I kindly ask you how the -u option is actually working in delta-filter, more in detail than what is written in the manual (for example, what do you mean exactly with 'unique reference' and 'unique query')?
>
> Thanks and regards,
> Davide
|
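To make the definition in this reply concrete, the sketch below computes, for one alignment on one axis (reference or query), the fraction of its length that no other alignment covers. It illustrates the concept only and is not delta-filter's actual implementation; the function name and the inclusive, 1-based coordinate convention are assumptions.

```python
def unique_fraction(target, others):
    """Fraction of `target` positions not covered by any interval in `others`.

    Intervals are inclusive (start, end) pairs; reversed pairs are normalised.
    """
    start, end = min(target), max(target)
    length = end - start + 1

    # Clip every other interval to the target and keep those that overlap it.
    clipped = []
    for a, b in others:
        lo, hi = min(a, b), max(a, b)
        if hi >= start and lo <= end:
            clipped.append((max(lo, start), min(hi, end)))
    clipped.sort()

    # Sweep the sorted intervals, merging overlaps, and total the covered length.
    covered = 0
    cur_lo = cur_hi = None
    for lo, hi in clipped:
        if cur_hi is None or lo > cur_hi + 1:
            if cur_hi is not None:
                covered += cur_hi - cur_lo + 1
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        covered += cur_hi - cur_lo + 1

    return (length - covered) / length


# An alignment spanning 1..100 with another alignment overlapping positions 51..100.
print(unique_fraction((1, 100), [(51, 150)]))
```

In the example half of the alignment is overlapped, so its unique fraction is 0.5; a filter based on this idea would keep or drop the alignment by comparing that fraction with a chosen threshold.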
|
From: Davide V. (GIS) <ver...@gi...> - 2014-06-18 07:55:26
|
Hi Adam,

May I kindly ask you how the -u option is actually working in delta-filter, more in detail than what is written in the manual (for example, what do you mean exactly with 'unique reference' and 'unique query')?

Thanks and regards,
Davide

From: Adam Phillippy [mailto:aph...@gm...]
Sent: Wednesday, May 28, 2014 4:27 AM
To: Davide VERZOTTO (GIS)
Cc: mum...@li...
Subject: Re: [MUMmer-help] dnadiff very slow for Human

Hi Davide,

I'm not aware of a converter between Mummer formats and the UCSC format you referenced. However, all of the information required by that format is contained within the Nucmer delta format, so it would be relatively straightforward to write such a converter.

Best,
-Adam
|
|
From: Adam P. <aph...@gm...> - 2014-05-27 20:27:32
|
Hi Davide,

I'm not aware of a converter between Mummer formats and the UCSC format you referenced. However, all of the information required by that format is contained within the Nucmer delta format, so it would be relatively straightforward to write such a converter.

Best,
-Adam |
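As a starting point for such a converter, here is a minimal Python sketch of a delta-file reader. It is not part of MUMmer, and the names `Alignment`, `parse_delta`, and `ungapped_blocks` are hypothetical. It assumes the delta layout described in the MUMmer manual: two header lines, `>ref qry reflen qrylen` record headers, seven-field alignment headers, and one signed integer per line terminated by `0`. The `(size, ref_gap, qry_gap)` triples it produces resemble the block lines of the UCSC chain format, but strand handling, coordinate conversion, and chain scoring are left out.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, TextIO, Tuple


@dataclass
class Alignment:
    ref_id: str
    qry_id: str
    ref_len: int
    qry_len: int
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    errors: int
    deltas: List[int] = field(default_factory=list)


def parse_delta(handle: TextIO) -> Iterator[Alignment]:
    """Yield one Alignment per alignment block in a delta file."""
    handle.readline()  # line 1: paths of the two input FASTA files
    handle.readline()  # line 2: data type (NUCMER or PROMER)
    ref_id = qry_id = ""
    ref_len = qry_len = 0
    current = None
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            # Record header: >ref_id qry_id ref_len qry_len
            ref_id, qry_id = fields[0][1:], fields[1]
            ref_len, qry_len = int(fields[2]), int(fields[3])
        elif len(fields) == 7:
            # Alignment header: S1 E1 S2 E2 errors sim_errors stops
            current = Alignment(ref_id, qry_id, ref_len, qry_len,
                                *map(int, fields[:5]))
        elif len(fields) == 1 and current is not None:
            value = int(fields[0])
            if value == 0:  # a single 0 closes the alignment
                yield current
                current = None
            else:
                current.deltas.append(value)


def ungapped_blocks(aln: Alignment) -> List[Tuple[int, int, int]]:
    """Turn delta values into (size, ref_gap, qry_gap) triples.

    A positive delta d means d-1 aligned columns followed by one reference
    base opposite a gap in the query; a negative delta means |d|-1 aligned
    columns followed by one query base opposite a gap in the reference.
    A leading indel yields a zero-size first block.
    """
    ref_span = abs(aln.ref_end - aln.ref_start) + 1
    blocks: List[List[int]] = []
    used_ref = 0
    for d in aln.deltas:
        run = abs(d) - 1
        ref_gap, qry_gap = (1, 0) if d > 0 else (0, 1)
        if run == 0 and blocks:
            # Adjacent indel: extend the gap after the previous block.
            blocks[-1][1] += ref_gap
            blocks[-1][2] += qry_gap
        else:
            blocks.append([run, ref_gap, qry_gap])
        used_ref += run + ref_gap
    blocks.append([ref_span - used_ref, 0, 0])  # final block has no gap
    return [tuple(b) for b in blocks]
```

A converter would loop over `parse_delta(open("out.delta"))` (the file name is a placeholder) and write one chain per alignment; checking the result against the chain format page linked in this thread is advisable before feeding it to liftOver.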
|
From: Davide V. (GIS) <ver...@gi...> - 2014-05-23 03:20:57
|
Hi Adam,

Thank you for your kind reply and your hint about mum-reference mode. I am testing it now, and it does indeed seem to reduce delta file sizes dramatically. I was already increasing the minimum match length, but not the minimum cluster length (what does this field actually mean?). I also used the -l and -i options by running delta-filter first, and then the -d option in dnadiff. The slowness seems to come from the large number of small contigs that we have, since it does not really affect the big scaffolds.

One more question: is it possible to use the output of dnadiff (or another tool in the MUMmer suite) to lift annotations over from the Reference genome to a de novo Human genome assembly with the UCSC liftOver tool, which first requires chaining the alignments found (see the chain format: https://genome.ucsc.edu/goldenPath/help/chain.html), or with other tools that you may know of?

Thanks and regards,
Davide |
|
From: Adam P. <aph...@gm...> - 2014-05-22 20:24:34
|
Hi Anaïs,

No, there is not an easy way to go from a comma or tab separated file back to a delta file. The delta format contains alignment information that is not included in the output of show-coords. However, it would be possible to edit the delta file itself using a custom script. You could look at this file to identify the self matches, and remove only those records from the delta file. The delta file format is described here: http://mummer.sourceforge.net/manual/#nucmer

Sorry that I cannot think of an easier way to do what you want. Perhaps I can implement this as a filter for a future version of delta-filter.

Best,
-Adam |
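A custom script of the kind suggested above could look like the following Python sketch. It is not a MUMmer tool, and the function name `drop_self_matches` is made up for illustration. It assumes the delta layout from the manual (two header lines, then records that each start with `>ref_id qry_id ref_len qry_len`) and drops every record whose reference and query IDs are identical, which also discards repeats within a single sequence.

```python
from typing import Iterable, Iterator


def drop_self_matches(lines: Iterable[str]) -> Iterator[str]:
    """Yield delta-file lines, skipping records where ref ID == query ID."""
    keep = True
    for index, line in enumerate(lines):
        if index < 2:
            # Header: input file paths, then NUCMER or PROMER.
            yield line
            continue
        if line.startswith(">"):
            # New record: decide once whether to keep all of its alignments.
            ref_id, qry_id = line[1:].split()[:2]
            keep = ref_id != qry_id
        if keep:
            yield line


def filter_file(in_path: str, out_path: str) -> None:
    """Write a copy of in_path without self-match records (paths are placeholders)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_self_matches(src))
```

The filtered file can then be passed to delta-filter -m as usual, for example after calling `filter_file("self.delta", "self.noself.delta")`.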
|
From: Adam P. <aph...@gm...> - 2014-05-22 20:12:40
|
Yes, for assembly mapping, mummerplot is the way to go. Also see the -layout option, which will orient and arrange your contigs along the reference for you.

Best,
-Adam |
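For readers who want to try the layout option, this is a hedged Python sketch that only assembles a mummerplot command line. The helper name `mummerplot_layout_command` and all file names are hypothetical, and the flag spellings (`--layout`, `-R`, `-Q`, `--png`, `-p`) are assumed from the MUMmer 3.x help text, so check `mummerplot -h` on your installation. Note that mummerplot reads a delta file rather than a show-coords table.

```python
import subprocess
from typing import List


def mummerplot_layout_command(delta_path: str, ref_fasta: str,
                              qry_fasta: str, prefix: str) -> List[str]:
    """Build an argv list for a laid-out dot plot of contigs vs. reference."""
    return [
        "mummerplot",
        "--layout",       # order and orient the sequences in the plot
        "--png",          # write a PNG via gnuplot
        "-R", ref_fasta,  # reference FASTA, used for the layout
        "-Q", qry_fasta,  # query (contig) FASTA, used for the layout
        "-p", prefix,     # prefix for the output files
        delta_path,
    ]


def run_mummerplot(delta_path: str, ref_fasta: str,
                   qry_fasta: str, prefix: str) -> None:
    """Run the command; requires mummerplot and gnuplot on the PATH."""
    subprocess.run(
        mummerplot_layout_command(delta_path, ref_fasta, qry_fasta, prefix),
        check=True,
    )
```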
|
From: John F W. <jwo...@bi...> - 2014-05-22 19:56:01
|
Thank you for the prompt reply.

Since I am aligning contigs from draft assemblies, it looks like mummerplot may do the job nicely. Thank you for the help. |
|
From: Adam P. <aph...@gm...> - 2014-05-22 18:49:55
|
Hi Davide,

dnadiff was primarily designed for microbial genome comparison and currently does not scale well to large genomes. The delta-filter step is certainly one of the major bottlenecks. Its running time scales with the number of matches it has to analyze, so you can speed things up by reducing the total number of matches. A few ways to do this:

1. Run nucmer in mum-reference mode to ignore repetitive alignment seeds.
2. Increase the minimum match length and minimum cluster length (this will reduce sensitivity to low-identity alignments).
3. Run delta-filter with the -l and -i options to filter alignments by length and identity (these filters are quick compared to -1/-m/-r/-q, which all require a dynamic programming step).

Once you have a filtered delta file using the above recommendations, you can pass it directly to dnadiff using the -d option, and it will skip the alignment phase and process your delta file directly, hopefully faster than before.

Best,
-Adam |
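The three recommendations can be strung together as in the Python sketch below. It only builds and runs command lines; the function names, file names, and all threshold values (100, 500, 1000, 95.0) are placeholders chosen for illustration, not tuned recommendations. The flag spellings (`--mumreference`, `-l`, `-c`, `-p` for nucmer; `-l`, `-i` for delta-filter; `-d`, `-p` for dnadiff) are assumed from MUMmer 3.x and should be checked against each tool's `-h` output.

```python
import subprocess
from typing import List


def build_commands(ref_fasta: str, qry_fasta: str, prefix: str,
                   min_match: int = 100, min_cluster: int = 500,
                   min_length: int = 1000,
                   min_identity: float = 95.0) -> List[List[str]]:
    """Return argv lists for nucmer, delta-filter, and dnadiff, in order."""
    nucmer = [
        "nucmer", "--mumreference",  # seeds unique in the reference
        "-l", str(min_match),        # minimum match length
        "-c", str(min_cluster),      # minimum cluster length
        "-p", prefix,                # writes <prefix>.delta
        ref_fasta, qry_fasta,
    ]
    delta_filter = [
        "delta-filter",
        "-l", str(min_length),       # minimum alignment length
        "-i", str(min_identity),     # minimum percent identity
        f"{prefix}.delta",
    ]
    dnadiff = [
        "dnadiff",
        "-d", f"{prefix}.filtered.delta",  # skip the alignment phase
        "-p", f"{prefix}.dnadiff",
    ]
    return [nucmer, delta_filter, dnadiff]


def run_pipeline(ref_fasta: str, qry_fasta: str, prefix: str) -> None:
    """Run the three steps; requires the MUMmer tools on the PATH."""
    nucmer, delta_filter, dnadiff = build_commands(ref_fasta, qry_fasta, prefix)
    subprocess.run(nucmer, check=True)
    # delta-filter prints the filtered delta to stdout, so redirect it.
    with open(f"{prefix}.filtered.delta", "w") as out:
        subprocess.run(delta_filter, stdout=out, check=True)
    subprocess.run(dnadiff, check=True)
```

Keeping the cheap -l and -i filters in a separate step means the expensive filters only ever see the reduced set of alignments.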
|
From: Adam P. <aph...@gm...> - 2014-05-22 18:34:01
|
Hi John,

Sorry, the developer of MapView moved on long ago and I can no longer offer support for it. I plan on removing it from the next release. As for your current version, it's provided as is, but I'd suggest looking for alternative visualization options such as mummerplot; or if you're doing read mapping, maybe an assembly viewer like Tablet or IGV.

Best,
-Adam |
|
From: John F W. <jwo...@bi...> - 2014-05-22 16:17:11
|
Hello,
I've been using mapview to generate files to visualize the alignments in
the following manner:
mapview -n 1 -f pdf filename.coords
However, in several recent instances, mapview produces no messages, and
produces no output files.
Nothing appears in response to running the command at all.
Any help would be greatly appreciated.
Here is an example of a coords file used that causes this problem:
/import/linux/home/jwolter1/Strains/C2/Assembly/Mira_PcBio_CCS/Quiver/Joining_unitigs/C2_joined.fasta
/import/linux/home/jwolter1/Strains/C2/Assembly/Mira_PcBio_CCS/Quiver/Joining_unitigs/C2.polished_assembly.fasta
NUCMER
    [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  |  [LEN R]  [LEN Q]  |  [COV R]  [COV Q]  | [TAGS]
===============================================================================================================================
       1    17884  |    17910        1  |    17884    17910  |    99.83  |    76481    70521  |    23.38    25.40  | S.cerevisiae  unitig_1|quiver
    8075    10672  |        1     2600  |     2598     2600  |    99.69  |    76481     5150  |     3.40    50.49  | S.cerevisiae  unitig_3|quiver
    8131    10669  |     5150     2609  |     2539     2542  |    99.45  |    76481     5150  |     3.32    49.36  | S.cerevisiae  unitig_3|quiver
    8935     9164  |     6863     7101  |      230      239  |    84.43  |    76481    15910  |     0.30     1.50  | S.cerevisiae  unitig_0|quiver
    9012     9509  |    42308    42800  |      498      493  |    82.58  |    76481    70521  |     0.65     0.70  | S.cerevisiae  unitig_1|quiver
    9016     9436  |     8918     9335  |      421      418  |    85.25  |    76481    15910  |     0.55     2.63  | S.cerevisiae  unitig_0|quiver
   14381    25322  |    13325     2386  |    10942    10940  |    99.60  |    76481    15910  |    14.31    68.76  | S.cerevisiae  unitig_0|quiver
   14442    16994  |    13365    15910  |     2553     2546  |    98.87  |    76481    15910  |     3.34    16.00  | S.cerevisiae  unitig_0|quiver
   18185    18722  |    48779    49345  |      538      567  |    83.42  |    76481    70521  |     0.70     0.80  | S.cerevisiae  unitig_1|quiver
   18371    18787  |     8469     8886  |      417      418  |    85.19  |    76481    70521  |     0.55     0.59  | S.cerevisiae  unitig_1|quiver
   18371    18787  |     1361      944  |      417      418  |    85.19  |    76481     5150  |     0.55     8.12  | S.cerevisiae  unitig_3|quiver
   18371    18787  |     3847     4264  |      417      418  |    85.19  |    76481     5150  |     0.55     8.12  | S.cerevisiae  unitig_3|quiver
   20602    20836  |     8738     8960  |      235      223  |    84.45  |    76481    70521  |     0.31     0.32  | S.cerevisiae  unitig_1|quiver
   20602    20836  |     1092      870  |      235      223  |    84.45  |    76481     5150  |     0.31     4.33  | S.cerevisiae  unitig_3|quiver
   20602    20836  |     4116     4338  |      235      223  |    84.45  |    76481     5150  |     0.31     4.33  | S.cerevisiae  unitig_3|quiver
   20603    20836  |    49255    49485  |      234      231  |    84.03  |    76481    70521  |     0.31     0.33  | S.cerevisiae  unitig_1|quiver
   22865    25314  |        1     2451  |     2450     2451  |    99.96  |    76481    15910  |     3.20    15.41  | S.cerevisiae  unitig_0|quiver
   23744    27089  |        1     3351  |     3346     3351  |    99.85  |    76481     3351  |     4.37   100.00  | S.cerevisiae  unitig_2|quiver
   25448    71629  |    68349    22100  |    46182    46250  |    99.77  |    76481    70521  |    60.38    65.58  | S.cerevisiae  unitig_1|quiver
   25524    27721  |    68323    70521  |     2198     2199  |    98.55  |    76481    70521  |     2.87     3.12  | S.cerevisiae  unitig_1|quiver
   44253    44984  |     8814     9521  |      732      708  |    82.75  |    76481    15910  |     0.96     4.45  | S.cerevisiae  unitig_0|quiver
   50957    51450  |     8392     8890  |      494      499  |    82.78  |    76481    70521  |     0.65     0.71  | S.cerevisiae  unitig_1|quiver
   50957    51450  |     1438      940  |      494      499  |    82.78  |    76481     5150  |     0.65     9.69  | S.cerevisiae  unitig_3|quiver
   50957    51450  |     3770     4268  |      494      499  |    82.78  |    76481     5150  |     0.65     9.69  | S.cerevisiae  unitig_3|quiver
   70145    72606  |        1     2466  |     2462     2466  |    99.68  |    76481     4897  |     3.22    50.36  | S.cerevisiae  unitig_6|quiver
   70155    72611  |     4897     2440  |     2457     2458  |    99.55  |    76481     4897  |     3.21    50.19  | S.cerevisiae  unitig_6|quiver
   73234    76481  |    21163    17910  |     3248     3254  |    98.22  |    76481    70521  |     4.25     4.61  | S.cerevisiae  unitig_1|quiver
   73293    75543  |     3908     1658  |     2251     2251  |    99.60  |    76481     3908  |     2.94    57.60  | S.cerevisiae  unitig_4|quiver
   73355    75773  |     4813     2399  |     2419     2415  |    99.50  |    76481     4813  |     3.16    50.18  | S.cerevisiae  unitig_5|quiver
   73358    75773  |        1     2414  |     2416     2414  |    99.55  |    76481     4813  |     3.16    50.16  | S.cerevisiae  unitig_5|quiver
   73850    75526  |        1     1683  |     1677     1683  |    98.40  |    76481     3908  |     2.19    43.07  | S.cerevisiae  unitig_4|quiver
   74500    74961  |    20981    21367  |      462      387  |    81.90  |    76481    70521  |     0.60     0.55  | S.cerevisiae  unitig_1|quiver
|
|
From: Davide V. (GIS) <ver...@gi...> - 2014-05-20 06:10:42
|
Dear MUMmer users,

We are trying to use dnadiff to analyze breakpoints between our de novo Human genome assembly and the Reference genome, the latter divided into multiple chromosomes in separate files.

We have already computed a NUCmer comparison between the two assemblies and have the resulting delta file. After this, we tried to compare all our scaffolds against hg19 chromosome 1 using dnadiff. The tool ran for more than 12 days (a single core, peak of 24 GB RAM) before crashing (for internal server reasons) without writing any temporary files. The only output was the log line "Filtering alignments", so it was presumably still running "delta-filter -1". Have you encountered this problem before? Is there a way or a script to speed up dnadiff for Human genome comparison?

Thanks and regards,
Davide

-------------------------------
This e-mail and any attachments are only for the use of the intended recipient and may be confidential and/or privileged. If you are not the recipient, please delete it or notify the sender immediately. Please do not copy or use it for any purpose or disclose the contents to any other person as it may be an offence under the Official Secrets Act.
------------------------------- |
|
From: Anais G. <ana...@ir...> - 2014-05-13 16:12:24
|
Hello,

Is it possible to generate a delta file from a TSV file?

I ran a nucmer analysis matching a FASTA file against itself. I would then like to filter the results using delta-filter -m, but with the -m option only the matches of each sequence against itself are reported. So I was thinking of producing the TSV file from the delta file using show-coords, deleting all lines corresponding to the undesirable matches (a sequence against itself), and then converting back to a delta file to run delta-filter.

Is there a solution?

Thanks,
Anaïs

--
Anaïs GOUIN
IR INRA, Equipe GenScale
Bureau D156, IRISA-INRIA, Campus de Beaulieu
35042 Rennes cedex, France
Tel : 0299847321 |
|
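A delta file can also be filtered directly, without a round trip through show-coords output. The sketch below assumes the usual delta layout: two header lines (the input file paths, then the alignment type such as NUCMER), followed by records that each begin with a ">refID qryID refLen qryLen" line. It drops every record whose reference and query IDs are identical. The function name and the toy records are hypothetical, and this is only one possible approach, not a MUMmer feature.

```python
def drop_self_records(delta_lines):
    """Yield delta lines, skipping records whose reference and query IDs match.

    Assumption: lines 1-2 are the file header, and every record starts with
    a line of the form '>refID qryID refLen qryLen'.
    """
    keep = True
    for index, line in enumerate(delta_lines):
        if index < 2:
            yield line  # file paths and alignment type are always kept
            continue
        if line.startswith(">"):
            ref_id, qry_id = line[1:].split()[:2]
            keep = ref_id != qry_id  # decide once per record
        if keep:
            yield line


# Toy delta content (hypothetical sequences, for illustration only).
demo_delta = """/data/seqs.fa /data/seqs.fa
NUCMER
>seqA seqA 500 500
1 500 1 500 0 0 0
0
>seqA seqB 500 400
10 200 5 195 0 0 0
0
>seqB seqB 400 400
1 400 1 400 0 0 0
0
"""

filtered = list(drop_self_records(demo_delta.splitlines()))
print("\n".join(filtered))
```

Writing the surviving lines to a new file should give a delta file that delta-filter can read. Note that this also discards alignments between different regions of the same sequence, since they share one record header.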
From: Adam P. <aph...@gm...> - 2014-03-25 16:03:28
|
Hi Rajiv,

Please refer to the methods we used in the GAGE paper: http://genome.cshlp.org/content/22/3/557.short

The supplemental materials for that paper include some basic scripts for using Nucmer to assess assembly quality/completeness vs. a reference.

Best,
-Adam |
|
From: Rajiv M. <rm...@st...> - 2014-03-19 18:17:06
|
Hi,

I am wondering about the most appropriate way to perform alignment between a draft genome assembly and a Drosophila reference genome sequence that could be used to estimate the proportion of the genome covered by the draft and compare different assemblies.

I took a look at the instructions here: http://mummer.sourceforge.net/manual/#mappingdraft, as well as some of the posts to this list, and it seemed like the "delta-filter -q" approach was preferred. Does this ensure that each contig may only align to one position in the reference so that coverage is not artificially inflated?

Also, in some cases, my show-coords results show broken alignments to the same general region, presumably due to indels or assembly error, but sometimes the [COV Q] field adds up to more than 100%:

2608802 2613586 | 1 4785 | 4785 4785 | 100.00 | 23011544 170323 | 0.02 2.81 | 2L ctg100001147002
2613579 2661256 | 8504 56174 | 47678 47671 | 99.99 | 23011544 170323 | 0.21 27.99 | 2L ctg100001147002
2663031 2752834 | 56164 145937 | 89804 89774 | 99.97 | 23011544 170323 | 0.39 52.71 | 2L ctg100001147002
2732783 2777220 | 125915 170323 | 44438 44409 | 99.93 | 23011544 170323 | 0.19 26.07 | 2L ctg100001147002

Would this mean that some portion of the contig is aligning to multiple locations?

Thanks for your help!

Rajiv |
|
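One way to examine the coverage question in the message above is to compare the summed alignment lengths with the length of their union on the query. The sketch below uses the four query intervals and the contig length from the quoted show-coords rows; the function name is illustrative and the interval merging is a generic technique, not something MUMmer provides.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    # Normalize orientation first, since reverse-strand hits list start > end.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


# Query coordinates ([S2], [E2]) and query length ([LEN Q]) from the rows above.
query_hits = [(1, 4785), (8504, 56174), (56164, 145937), (125915, 170323)]
query_length = 170323

summed = sum(end - start + 1 for start, end in query_hits)
merged = merge_intervals(query_hits)
covered = sum(end - start + 1 for start, end in merged)

print(f"sum of alignment lengths: {summed} ({100 * summed / query_length:.2f}%)")
print(f"union of aligned bases:   {covered} ({100 * covered / query_length:.2f}%)")
```

When the summed figure exceeds 100% while the union stays below it, some query bases are being counted in more than one alignment, so the union is the safer number to report as coverage.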
From: Adam P. <aph...@gm...> - 2014-03-05 00:06:40
|
Hi Nengbing,

I don't think I ever responded to your question. I'm sorry.

The parameters -l 7 -c 15 set a minimum seed length of 7 and a minimum cluster length of 15. The cluster length is computed as the sum of the seed lengths. Thus if you have a 15bp "match" with a SNP at position 5, you'll get a 10bp seed from positions 6-15. However, on the other side of the SNP, from positions 1-4, there isn't enough room for another seed. Thus, you'll be left with a single seed of length 10. This doesn't meet the minimum cluster length of 15 and will be discarded.

Bottom line, nucmer isn't designed for short, inexact matching. If you wanted to force it to find all 15bp 1-mismatch alignments, you would need to set both the min match and min cluster lengths to 7bp. This would cover the worst-case scenario of a single SNP directly in the middle of the 15bp sequence. This would also generate many false-positive alignments that you could filter with delta-filter. However, there are better aligners for what you are trying to do.

Best,
-Adam

On Thu, Feb 6, 2014 at 11:44 AM, TAO, NENGBING [AG/1005] <nen...@mo...> wrote:
> Hi, Adam,
>
> I sent an email two or three times to mummer-help and kept receiving messages that I need to register even after I did so, but did not want to send more, fearing it might get through and flood someone's inbox.
>
> I am trying to use nucmer to compare two fasta files and identify matches that are 15 bp long allowing 1 mismatch/indel, and I don't seem to get all matches using nucmer.
>
> Here are the basic stats of query and db files:
>
> file    avg    stDev  min  max  median  numOfSeq  numOfBase
> db.fa   21.73  1.71   15   28   21      7072      153724
> Seq.fa  256    0      256  256  256     1         256        # just one sequence
>
> Here are the commands that I used:
>
> nucmer
> NUCmer (NUCleotide MUMmer) version 3.1
>
> nucmer --maxmatch -l 7 -c 15 -p oNucmer_1 db.fa Seq.fa; show-coords -T -r -c -l oNucmer_1.delta > oNucmer_1.coords
> delta-filter -i 93 -l 15 oNucmer_1.delta > filtered_oNucmer_1.delta
> show-coords -T -r -L 15 -I 0.93 -c -l filtered_oNucmer_1.delta > filtered_oNucmer_1.coords
> wc -l oNucmer_1.coords
> 85
> wc -l filtered_oNucmer_1.coords
> 4
>
> I am reasonably confident that there are ~90 matches that are 15 bp long allowing 1 mismatch/indel, whereas nucmer only gave 4 matches. I must have missed some parameters. I noticed that the parameters -b, -g, --nooptimize change the output quite a lot, but I don't fully understand what they mean or how they should be used.
>
> Could you kindly give me a pointer on what the best approach is or what the appropriate parameters should be?
>
> Best regards,
>
> Nengbing
>
> From: Adam Phillippy [mailto:aph...@gm...]
> Sent: Thursday, February 06, 2014 10:12 AM
> To: Govinda Kamath
> Cc: mummer-help
> Subject: Re: [MUMmer-help] A clarification about NUCmer
>
> Hi Govinda,
>
> There is no minimum percent identity threshold for Nucmer and no guarantees on what it will find. Instead, the sensitivity and quality of the alignments depend on the minimum match size and cluster parameters chosen. The closest thing to an identity threshold is the dynamic programming extension scores, which are set to +3/-7. This equates to a min avg identity of 70% for the DP algorithm to continue extending. However, Nucmer will rarely find these low identity alignments, because they will likely not be seeded. With default parameters, Nucmer is generally sensitive to alignments >90% idy.
>
> If you want to cap the alignments to a certain identity, after the fact, you can run delta-filter with the -i option to filter alignments below your desired threshold.
>
> Hope this helps,
> -Adam
>
> On Tue, Feb 4, 2014 at 3:32 PM, Govinda Kamath <gk...@be...> wrote:
>
> Hi,
>
> In NUCmer, what is the default percent identity threshold, above which results are reported in the out.delta file? Also, in what metric (like Hamming distance or edit distance) is this calculated?
>
> Thanks,
> Govinda. |
|
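The seed and cluster arithmetic in the reply above can be checked with a small model. The sketch below is not nucmer's implementation: it only applies the rule as stated in the message (a cluster's length is the sum of its seed lengths, and seeds shorter than the minimum are not found) to a 15 bp match with one mismatch at each possible position. It also computes the break-even identity implied by the quoted +3/-7 extension scores. All function names are illustrative.

```python
def exact_runs(length, snp_pos):
    """Exact-match run lengths left and right of one mismatch (1-based position)."""
    return [run for run in (snp_pos - 1, length - snp_pos) if run > 0]


def cluster_survives(length, snp_pos, min_seed, min_cluster):
    """Apply the rule from the message: sum of usable seed lengths >= min cluster."""
    seeds = [run for run in exact_runs(length, snp_pos) if run >= min_seed]
    return bool(seeds) and sum(seeds) >= min_cluster


def break_even_identity(match_score, mismatch_score):
    """Identity at which the average per-base extension score is zero."""
    return -mismatch_score / (match_score - mismatch_score)


MATCH_LEN = 15
for min_seed, min_cluster in ((7, 15), (7, 7)):
    found = [
        pos
        for pos in range(1, MATCH_LEN + 1)
        if cluster_survives(MATCH_LEN, pos, min_seed, min_cluster)
    ]
    print(f"-l {min_seed} -c {min_cluster}: "
          f"{len(found)} of {MATCH_LEN} mismatch positions recovered")

print(f"break-even identity for +3/-7: {break_even_identity(3, -7):.0%}")
```

Under this model, -l 7 -c 15 can never recover a 15 bp match containing a mismatch, because the two exact runs total at most 14 bp, while -l 7 -c 7 always leaves at least one run of 7 bp or more.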
From: Adam P. <aph...@gm...> - 2014-02-17 15:54:04
|
Hi Richard,

Are there multiple records in the fasta file? This kind of failure is often caused by non-unique fasta headers. The header is defined by the characters up to the first space, so in your two examples ">test" and ">test_something" would be considered the headers. If those names are repeated in another record in the same file, it will cause an error.

-Adam |
|
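The reply above points to non-unique FASTA headers as a frequent cause of this failure, with the header taken as the characters up to the first space. A quick way to test a file for that condition is sketched below. The function names and the sample records are hypothetical, and splitting on any whitespace (not only spaces) is an assumption.

```python
import io
from collections import Counter


def fasta_ids(handle):
    """Yield the ID of each record: the header text up to the first whitespace."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            yield fields[0] if fields else ""


def duplicate_ids(handle):
    """Return the sorted IDs that occur in more than one record."""
    counts = Counter(fasta_ids(handle))
    return sorted(name for name, seen in counts.items() if seen > 1)


# Two records share the ID "test" once the description is cut off.
example = io.StringIO(">test something\nACGT\n>test other\nGGCC\n>unique\nTTAA\n")
dups = duplicate_ids(example)
print(dups)  # ['test']
```

For a real file, the same function can be called on `open("test.fasta")`; an empty list means every record ID is unique.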
From: Richard S. <ric...@ls...> - 2014-02-17 15:20:24
|
Hi,

I have been using nucmer through RATT, but since a recent upgrade of Ubuntu from 13.04 to 13.10 I have started to encounter problems. I have traced the problem to the header of the test sample.

To run

$nucmer ref.fasta test.fasta

if the header in test.fasta is

>test
-- doesn't work

1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "out.ntref" of length 4290253
# construct suffix tree for sequence of length 4290253
# (maximum reference length is 536870908)
# (maximum query length is 4294967295)
# process 42902 characters per dot
#....................................................................................................
# CONSTRUCTIONTIME /usr/bin/mummer out.ntref 1.25
# reading input file "/home/richard/work/gnb/temp/630.tst" of length 4290252
# matching query-file "/home/richard/work/gnb/temp/630.tst"
# against subject-file "out.ntref"
# COMPLETETIME /usr/bin/mummer out.ntref 3.64
# SPACE /usr/bin/mummer out.ntref 8.39
4: FINISHING DATA
ERROR: Could not parse input from 'Query File'.
Please check the filename and format, or file a bug report
ERROR: postnuc returned non-zero

However, if the header has a second word separated by a space, it works!

>test something
-- this works
>test_something
-- this doesn't work

Can this be fixed, as I don't want to have to edit all my fasta files?

Many thanks,
Richard |