From: Adam P. <aph...@gm...> - 2017-11-29 20:34:34
Hi Qihua,

Try adding the --layout or --fat options and mummerplot will arrange and orient the contigs to make things look better.

Best,
-Adam

On Mon, Nov 27, 2017 at 4:28 AM, Qihua Liang <qli...@uc...> wrote:
> Dear MUMmer develop team,
>
> I am trying to use NUCmer together with mummer plot to compare different
> assemblies of the same plant genome (~640Mb) and my command is:
>
>     nucmer --maxmatch -l 100 -c 100 -p compare1_2 assembly1.fasta assembly2.fasta
>     mummerplot --png -p compare1_2 compare1_2.delta -R assembly1.fasta -Q assembly2.fasta
>
> Both assemblies have hundreds of contigs, and thus the figure looks messy.
> Could you provide some instructions on tuning the parameters to better
> generate dot plot for two draft assemblies on contig level?
>
> Thank you so much
> Qihua

From: Qihua L. <qli...@uc...> - 2017-11-27 09:28:35
Dear MUMmer develop team,

I am trying to use NUCmer together with mummer plot to compare different assemblies of the same plant genome (~640Mb) and my command is:

    nucmer --maxmatch -l 100 -c 100 -p compare1_2 assembly1.fasta assembly2.fasta
    mummerplot --png -p compare1_2 compare1_2.delta -R assembly1.fasta -Q assembly2.fasta

Both assemblies have hundreds of contigs, and thus the figure looks messy. Could you provide some instructions on tuning the parameters to better generate dot plot for two draft assemblies on contig level?

Thank you so much
Qihua

From: Adam P. <aph...@gm...> - 2017-11-15 00:15:29
Hello,

That usually happens when the program runs out of memory. You can try to find a machine with more available memory, or you could break your reference sequence into multiple files and align them all separately to reduce the memory requirement.

-Adam

On Tue, Nov 14, 2017 at 6:38 PM, Qihua Liang <qli...@uc...> wrote:
> Dear MUMmer development team,
>
> I am trying to use MUMmer to compare two assemblies, using the command of:
>
>     mummer -mum -b -c Cowpea_Genome_1.0.fasta cowpea_pseudohap.fasta > v1.0_mums
>
> But it returns such error:
>
>     # reading input file "Cowpea_Genome_1.0.fasta" of length 519436549
>     # construct suffix tree for sequence of length 519436549
>     # (maximum reference length is 536870908)
>     # (maximum query length is 4294967295)
>     # process 5194365 characters per dot
>     #.....................................................................Segmentation fault (core dumped)
>
> Any suggestions on fixing this?
>
> Thank you
> Qihua

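The suggestion above to break the reference into multiple files could be sketched roughly as follows. This is only an illustration, not part of MUMmer: the function name `split_fasta`, the `max_bases` threshold, and the `.partN.fasta` naming are assumptions made for the example, and each resulting file would still need to be aligned against the query in its own run.

```python
from pathlib import Path


def split_fasta(fasta_path, max_bases, out_prefix):
    """Split a multi-FASTA file into several smaller FASTA files.

    Records are kept whole. A new output file is started whenever adding
    the next record would push the current file past max_bases, so a single
    record longer than max_bases simply ends up alone in its own file.
    Returns the list of paths written (hypothetical naming scheme).
    """
    out_paths = []
    out = None
    bases_in_part = 0
    record = []        # lines of the record currently being read
    record_len = 0     # number of bases in that record

    def flush():
        nonlocal out, bases_in_part, record, record_len
        if not record:
            return
        need_new_file = out is None or (
            bases_in_part > 0 and bases_in_part + record_len > max_bases
        )
        if need_new_file:
            if out is not None:
                out.close()
            path = Path(f"{out_prefix}.part{len(out_paths) + 1}.fasta")
            out = path.open("w")
            out_paths.append(path)
            bases_in_part = 0
        out.writelines(record)
        bases_in_part += record_len
        record, record_len = [], 0

    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                flush()                      # previous record is complete
            else:
                record_len += len(line.strip())
            record.append(line)
    flush()                                  # write the final record
    if out is not None:
        out.close()
    return out_paths
```

A call such as `split_fasta("Cowpea_Genome_1.0.fasta", 250_000_000, "Cowpea_part")` would write a handful of files; the 250 Mb figure is an arbitrary choice for illustration, and whether it is small enough depends on the memory available.
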
From: Qihua L. <qli...@uc...> - 2017-11-15 00:12:43
Dear MUMmer development team,

I am trying to use MUMmer to compare two assemblies, using the command of:

    mummer -mum -b -c Cowpea_Genome_1.0.fasta cowpea_pseudohap.fasta > v1.0_mums

But it returns such error:

    # reading input file "Cowpea_Genome_1.0.fasta" of length 519436549
    # construct suffix tree for sequence of length 519436549
    # (maximum reference length is 536870908)
    # (maximum query length is 4294967295)
    # process 5194365 characters per dot
    #.....................................................................Segmentation fault (core dumped)

Any suggestions on fixing this?

Thank you
Qihua

From: Juliana E. <jul...@ho...> - 2017-11-08 17:10:02
I'm using the mix-mater program, and when it runs the nucmer command it shows the following alert:

    "no such option: --maxmatch"

I do not know how to proceed.

-----
Juliana Eschholz de Araujo
USP/ESALQ - Piracicaba/SP

From: Adam P. <aph...@gm...> - 2017-08-21 12:12:55
Hi Manish,

Your parameter set `nucmer --maxmatch -c 100 -b 500 -l 50` will seed an alignment for every pair of ~100 bp repeats and larger. There's quite a few of them in the human genome :)

The problem with repeats is that the program considers all pairs of them, so if you have 10 repeats, that leads to 10*10=100 alignments and so on. This quadratic relationship is what makes the runtime so bad. If you really need to find all those repeats, you could try breaking up the genome into multiple pieces and parallelizing the search. Otherwise, you can lower the sensitivity of the search by further increasing the -l and -c options, or using the -mumreference option, which will use only 'unique' seeds and therefore avoid aligning many of the repeats.

I usually always run nucmer with -mumreference when dealing with large, repetitive genomes.

Best,
-Adam

On Fri, Aug 18, 2017 at 10:24 AM, Manish Goel <go...@mp...> wrote:
> Hi All,
>
> I am trying to run nucmer to align two human genomes using:
>
>     nucmer --maxmatch -c 100 -b 500 -l 50 refGenome queryGenome
>
> The program starts and runs fine but get stuck at the last step (finishing
> data).
>
>     delta =
>     running NUCMER
>     1: PREPARING DATA
>     2,3: RUNNING mummer AND CREATING CLUSTERS
>     # reading input file "out.ntref" of length 3088286426
>     # construct suffix tree for sequence of length 3088286426
>     # (maximum reference length is 2305843009213693948)
>     # (maximum query length is 18446744073709551615)
>     # process 30882864 characters per dot
>     #....................................................................................................
>     # CONSTRUCTIONTIME ****/software/lib/MUMmer3.23/mummer out.ntref 4578.29
>     # reading input file ********** of length 3088496978
>     # matching query-file ************
>     # against subject-file "out.ntref"
>     # COMPLETETIME ****/software/lib/MUMmer3.23/mummer out.ntref 37778.09
>     # SPACE *****/software/lib/MUMmer3.23/mummer out.ntref 6013.53
>     4: FINISHING DATA
>
> It is writing the out.delta file but it seems that it is doing so very
> slowly. I let the program run for more than 3 months (no kidding) before I
> killed the job. I started a new job more than 10days back, it is still
> running with the out.delta file more than 1.2gb and growing. Last edit to
> out.delta happened 18hrs prior to the time I write this email. I know that
> my nucmer installation is working as I have successfully aligned multiple
> plant genomes, albeit they also took around 90% of their running time at
> the "Finishing data" step only.
>
> Any suggestion why nucmer is showing this behavior and how to resolve it?
> I would hypothesize that it is because human genome is quite large, but I
> don't want to believe that nucmer would take more than 3months to align it.
>
> Thanks for your time and efforts.
>
> Regards
>
> Manish Goel

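The quadratic growth described in the reply above can be made concrete with a little arithmetic. The sketch below is a simplification and not how nucmer counts anything internally: the function name `candidate_pairs` and the idea of describing each repeat family by a (reference copies, query copies) pair are assumptions made for illustration.

```python
def candidate_pairs(families):
    """Sum ref_copies * qry_copies over repeat families.

    Each family is a (ref_copies, qry_copies) tuple. With --maxmatch every
    copy in one genome can pair with every copy in the other, so one family
    contributes the product of its two copy numbers.
    """
    return sum(ref_copies * qry_copies for ref_copies, qry_copies in families)


# One repeat family with the same copy number in both genomes.
for copies in (10, 100, 1000, 10000):
    pairs = candidate_pairs([(copies, copies)])
    print(f"{copies:>6} copies in each genome -> {pairs:>12,} pairs")
```

Multiplying the copy number by ten multiplies the number of pairs by a hundred, which is why restricting the seeds to unique matches or raising -l and -c has such a large effect on runtime.
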
From: Manish G. <go...@mp...> - 2017-08-18 14:40:02
Hi All,

I am trying to run nucmer to align two human genomes using:

    nucmer --maxmatch -c 100 -b 500 -l 50 refGenome queryGenome

The program starts and runs fine but get stuck at the last step (finishing data).

    delta =
    running NUCMER
    1: PREPARING DATA
    2,3: RUNNING mummer AND CREATING CLUSTERS
    # reading input file "out.ntref" of length 3088286426
    # construct suffix tree for sequence of length 3088286426
    # (maximum reference length is 2305843009213693948)
    # (maximum query length is 18446744073709551615)
    # process 30882864 characters per dot
    #....................................................................................................
    # CONSTRUCTIONTIME ****/software/lib/MUMmer3.23/mummer out.ntref 4578.29
    # reading input file ********** of length 3088496978
    # matching query-file ************
    # against subject-file "out.ntref"
    # COMPLETETIME ****/software/lib/MUMmer3.23/mummer out.ntref 37778.09
    # SPACE *****/software/lib/MUMmer3.23/mummer out.ntref 6013.53
    4: FINISHING DATA

It is writing the out.delta file but it seems that it is doing so very slowly. I let the program run for more than 3 months (no kidding) before I killed the job. I started a new job more than 10days back, it is still running with the out.delta file more than 1.2gb and growing. Last edit to out.delta happened 18hrs prior to the time I write this email. I know that my nucmer installation is working as I have successfully aligned multiple plant genomes, albeit they also took around 90% of their running time at the "Finishing data" step only.

Any suggestion why nucmer is showing this behavior and how to resolve it? I would hypothesize that it is because human genome is quite large, but I don't want to believe that nucmer would take more than 3months to align it.

Thanks for your time and efforts.

Regards

Manish Goel

From: Adam P. <aph...@gm...> - 2017-08-08 19:15:42
Hello,

The mummer output does not include the length of the input sequences, and so they cannot be organized properly in the dotplot. Use the -R and -Q options to provide the original fasta files (-R rgenome -Q qgenome), and things should work. Alternatively, you could use nucmer for the alignment step and input the resulting .delta file to mummerplot.

Best,
-Adam

On Fri, Jul 28, 2017 at 5:02 PM, Daniel Harris <dap...@um...> wrote:
> Hello,
>
> I am running mummer 3.23 on 2 multi fasta genomes using the command
> (rgenome=R_219.fasta qgenome=S_312.fasta):
>
>     mummer -mum -b -c rgenome qgenome > rgenome-qgenome.mums
>
> Then when I run
>
>     mummerplot --postscript --prefix="rgenome-qgenome" rgenome-qgenome.mums
>
> I get the following error:
>
>     gnuplot 4.6 patchlevel 2
>     Reading mummer file R_219.fasta-S_312.fasta.mums (use mummer -c)
>     Writing plot files R_219.fasta-S_312.fasta.fplot, R_219.fasta-S_312.fasta.rplot
>     WARNING: Multiple ref sequences overlaid, try -R or -r
>     WARNING: Multiple qry sequences overlaid, try -Q, -q or -c
>     Writing gnuplot script R_219.fasta-S_312.fasta.gp
>     Rendering plot R_219.fasta-S_312.fasta.ps
>
> And the output .gp plot is not empty but plots nothing and does nothing
> when called at gnuplots rgenome-qgenome.gp
>
> What am I doing wrong? Please tell me if you need any other information.
>
> Thanks,
> Daniel

From: Daniel H. <dap...@um...> - 2017-07-28 21:02:49
Hello,

I am running mummer 3.23 on 2 multi fasta genomes using the command (rgenome=R_219.fasta qgenome=S_312.fasta):

    mummer -mum -b -c rgenome qgenome > rgenome-qgenome.mums

Then when I run

    mummerplot --postscript --prefix="rgenome-qgenome" rgenome-qgenome.mums

I get the following error:

    gnuplot 4.6 patchlevel 2
    Reading mummer file R_219.fasta-S_312.fasta.mums (use mummer -c)
    Writing plot files R_219.fasta-S_312.fasta.fplot, R_219.fasta-S_312.fasta.rplot
    WARNING: Multiple ref sequences overlaid, try -R or -r
    WARNING: Multiple qry sequences overlaid, try -Q, -q or -c
    Writing gnuplot script R_219.fasta-S_312.fasta.gp
    Rendering plot R_219.fasta-S_312.fasta.ps

And the output .gp plot is not empty but plots nothing and does nothing when called at gnuplots rgenome-qgenome.gp

What am I doing wrong? Please tell me if you need any other information.

Thanks,
Daniel

From: Rameez Mj <ram...@gm...> - 2017-07-11 03:36:08
---------- Forwarded message ----------
From: Rameez Mj <ram...@gm...>
Date: 5 July 2017 at 15:45
Subject: Error on mummerplot
To: mum...@li...

I am trying nucmer for aligning two draft genomes and generate the dotplot using mummerplot. Mummerplot command is giving me the error saying

    "Can't use 'defined(%hash)' (Maybe you should just omit the defined()?) at /usr/bin/mummerplot line 884".

How to solve this problem?

From: Adam P. <aph...@gm...> - 2017-07-10 14:52:58
Hello,

dnadiff also produces .qdiff, .rdiff, and .coords files. The format of these files is documented in the README under the show-diff and show-coords programs. It is possible to recover the locations of the events from those files.

Best,
-Adam

On Mon, Jul 3, 2017 at 7:06 AM, se...@we... <se...@we...> wrote:
> Dear colleagues,
>
> I am trying to identify genomic difference in two genomes using MUMmer. I
> use the command 'dnadiff -d test.delta -p testdnadiff'.
>
> And then I get the follow report file in testdnadiff.report :
>
>     [Feature Estimates]
>     Breakpoints        124397      124404
>     Relocations          2091        1865
>     Translocations       3575        3606
>     Inversions            518         505
>
>     Insertions          32663       37843
>     InsertionSum     58149586    59674780
>     InsertionAvg      1780.29     1576.90
>
>     TandemIns             596         612
>     TandemInsSum       221566      212787
>     TandemInsAvg       371.76      347.69
>     ...
>
> I think it is a very useful summary, but how can I know where the
> difference like which position the translocations are located on ?
>
> Regards,
> JiaMing
>
> se...@we...

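As a rough illustration of recovering event locations from the .rdiff or .qdiff files mentioned above, the sketch below reads a tab-delimited file and keeps rows of selected feature types. The column layout (sequence ID, feature type, start, end, then type-specific columns) and any type codes passed to the filter are assumptions here and should be checked against the show-diff section of the README; `read_diff` and `Feature` are hypothetical names, not part of MUMmer.

```python
import csv
from collections import namedtuple

# ASSUMED layout: sequence ID, feature type, start, end, then extra columns.
Feature = namedtuple("Feature", "seq_id kind start end extra")


def read_diff(path, kinds=None):
    """Yield Feature rows from a tab-delimited .rdiff/.qdiff file.

    kinds is an optional collection of feature-type codes to keep; the
    valid codes must be taken from the README, they are not defined here.
    Rows that are too short or whose start/end are not integers (for
    example a header line) are skipped.
    """
    wanted = set(kinds) if kinds is not None else None
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 4:
                continue
            try:
                start, end = int(row[2]), int(row[3])
            except ValueError:
                continue
            if wanted is None or row[1] in wanted:
                yield Feature(row[0], row[1], start, end, row[4:])


# Possible usage, with "INV" standing in for whatever code the README
# gives for the feature type of interest:
#
#   for feat in read_diff("testdnadiff.rdiff", kinds={"INV"}):
#       print(feat.seq_id, feat.start, feat.end)
```

The file name `testdnadiff.rdiff` follows from the `-p testdnadiff` prefix in the question; running the same loop over the .qdiff file would give the positions in query coordinates.
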
From: Rameez Mj <ram...@gm...> - 2017-07-05 10:15:48
I am trying nucmer for aligning two draft genomes and generate the dotplot using mummerplot. Mummerplot command is giving me the error saying

    "Can't use 'defined(%hash)' (Maybe you should just omit the defined()?) at /usr/bin/mummerplot line 884".

How to solve this problem?

From: <se...@we...> - 2017-07-03 11:29:07
Dear colleagues,

I am trying to identify genomic difference in two genomes using MUMmer. I use the command 'dnadiff -d test.delta -p testdnadiff'.

And then I get the follow report file in testdnadiff.report :

    [Feature Estimates]
    Breakpoints        124397      124404
    Relocations          2091        1865
    Translocations       3575        3606
    Inversions            518         505

    Insertions          32663       37843
    InsertionSum     58149586    59674780
    InsertionAvg      1780.29     1576.90

    TandemIns             596         612
    TandemInsSum       221566      212787
    TandemInsAvg       371.76      347.69
    ...

I think it is a very useful summary, but how can I know where the difference like which position the translocations are located on ?

Regards,
JiaMing

se...@we...

From: Md M. R. <shi...@gm...> - 2017-06-11 04:03:57
Dear Mummer authority,

Please let me know about the following software:

OPTIONAL UTILITIES -- To use the visualization tools included with MUMmer, it may be necessary to download and install the following utilities:

- fig2dev (fig2dev 3.2.3)
- gnuplot (gnuplot 4.0)
- xfig (xfig 3.2)

My computer configuration:

- 10.11.6
- processor 3.06 GHz
- Memory 8 GB

From: Pedro B. <psb...@gm...> - 2017-05-16 16:03:02
Hello,

I'm trying to align a set of pacbio reads to assess the performance of the error correction stage via the show-coords utility. I ran the following command:

    nucmer --maxmatch -c 100 -l 50 -p correctedReads draft-0.2.fasta ec_outCov90_minCov4.correctedReads.fasta

nucmer was running for a long time, generating a big mgaps file (96Gb). I noticed the process crashed in the phase of the delta file creation. I can not really see what happened, as the nucmer.error file just reports this and stdout was written to a screen (my bad):

    20170514|005552| 52329| ERROR: postnuc returned non-zero

What may be happening here? Do you think it's possible to restart the process while skipping the cluster generation step? It took longer than a week to generate the mgaps file.

Thanks in advance,
Pedro Barbosa

From: Stephane P. | V. | <ste...@vi...> - 2017-03-28 15:02:39
Dear,

I get the same error, whichever command I try from the manual.

Thanks for any help
Stephane

(running on ubuntu 4.4.0-66-generic but same error under OSX)

    mummerplot --large --layout --png out.filter.m

    defined(%hash) is deprecated at /opt/biotools/bin/mummerplot line 884.
        (Maybe you should just omit the defined()?)
    defined(%hash) is deprecated at /opt/biotools/bin/mummerplot line 894.
        (Maybe you should just omit the defined()?)
    defined(%hash) is deprecated at /opt/biotools/bin/mummerplot line 981.
        (Maybe you should just omit the defined()?)
    defined(%hash) is deprecated at /opt/biotools/bin/mummerplot line 991.
        (Maybe you should just omit the defined()?)
    defined(%hash) is deprecated at /opt/biotools/bin/mummerplot line 1034.
        (Maybe you should just omit the defined()?)
    defined(%hash) is deprecated at /opt/biotools/bin/mummerplot line 1044.
        (Maybe you should just omit the defined()?)
    gnuplot 5.0 patchlevel 3
    Writing filtered delta file out.filter
    Reading delta file out.filter.m
    Writing plot files out.fplot, out.rplot
    Writing gnuplot script out.gp
    Rendering plot out.png

    set mouse clipboardformat "[%.0f, %.0f]"
        ^
    "out.gp", line 64: wrong option

    WARNING: Unable to run 'gnuplot out.gp', Inappropriate ioctl for device

    total 2.6M
    drwxr-xr-x 2 u0002316 domain users 4.0K Mar 28 16:42 .
    drwxr-xr-x 7 u0002316 domain users 4.0K Mar 28 16:34 ..
    -rw-r--r-- 1 u0002316 domain users 4.1K Mar 28 16:41 out.1coords
    -rw-r--r-- 1 u0002316 domain users  33K Mar 28 16:41 out.1delta
    -rw-r--r-- 1 u0002316 domain users  19K Mar 28 16:40 out.coords
    -rw-r--r-- 1 u0002316 domain users 1.8M Mar 28 16:37 out.delta
    -rw-r--r-- 1 u0002316 domain users  34K Mar 28 16:42 out.filter
    -rw-r--r-- 1 u0002316 domain users  51K Mar 28 16:40 out.filter.m
    -rw-r--r-- 1 u0002316 domain users 9.0K Mar 28 16:42 out.fplot
    -rw-r--r-- 1 u0002316 domain users 1.7K Mar 28 16:42 out.gp
    -rw-r--r-- 1 u0002316 domain users  17K Mar 28 16:41 out.mcoords
    -rw-r--r-- 1 u0002316 domain users  51K Mar 28 16:41 out.mdelta
    -rw-r--r-- 1 u0002316 domain users    0 Mar 28 16:42 out.png
    -rw-r--r-- 1 u0002316 domain users 3.5K Mar 28 16:41 out.qdiff
    -rw-r--r-- 1 u0002316 domain users 4.7K Mar 28 16:41 out.rdiff
    -rw-r--r-- 1 u0002316 domain users 4.1K Mar 28 16:41 out.report
    -rw-r--r-- 1 u0002316 domain users 4.3K Mar 28 16:42 out.rplot
    -rw-r--r-- 1 u0002316 domain users 613K Mar 28 16:41 out.snps

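The `defined(%hash)` messages reported above come from Perl itself, which suggests omitting the `defined()`. One way to follow that hint, sketched here under assumptions, is to rewrite the affected expressions in a copy of the script. The helper name `drop_defined_on_hashes` and the sample line are hypothetical; the exact text of the lines in mummerplot is not shown in the thread, so the pattern is a guess at their shape, and the gnuplot `set mouse clipboardformat` error is a separate problem this does not touch.

```python
import re

# Matches defined(%name) or defined (%$name), i.e. defined() applied
# directly to a hash. ASSUMPTION: the lines flagged by Perl look like this.
_DEFINED_HASH = re.compile(r"\bdefined\s*\(\s*(%[^()\s]+)\s*\)")


def drop_defined_on_hashes(perl_source):
    """Replace defined(%h) with (%h), leaving every other defined() alone."""
    return _DEFINED_HASH.sub(r"(\1)", perl_source)


# Hypothetical example line, not copied from mummerplot:
sample = "if ( defined (%$refs) ) {"
print(drop_defined_on_hashes(sample))
```

Working on a backup copy and comparing the changed lines with the line numbers in the warnings (884, 894, 981, 991, 1034, 1044) would show whether the pattern actually caught the intended expressions.
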
From: Bruna M. <men...@gm...> - 2017-03-27 13:57:39
Hi,

I am working with MUMmer to compare phage genomes. I previously used progressiveMauve for the comparison and obtained results with the default options. However, when I started using MUMmer, only one of the four comparisons that succeeded in progressiveMauve gave a result. I have already tried changing the default options, but I did not get any results.

Is there any way to find the correct options for an optimal comparison? Do you have any suggestions?

Thank you for your attention.

Best Regards,
Bruna Mendes

From: Manish G. <go...@mp...> - 2017-03-21 12:55:37
Hi Adam,

Thanks for recommending the paper. I read it, but sadly it was not of much help. So, as of now I am considering it as a bug. Though, it will be great if it could be resolved.

I have also encountered a similar but different problem. This time, the -1 output file contains "fewer" regions (as per my understanding). My workflow is as follows:

1) I run nucmer with --maxmatch to get my out.delta file.

2) I use delta-filter to obtain a -1 filter file using

    delta-filter -1 -i 90 -l 100 out.delta > out_1.delta

and then convert it into coords file with

    show-coords -THrd out_1.delta > out_1_r.coords

(I also create a query sorted coords file, namely out_1_q.coords, too)

3) I use the nucmer --maxmatch output to get all alignments as well, using

    delta-filter -i 90 -l 100 out.delta > out_m.delta
    show-coords -THrd out_m.delta > out_m_r.coords

Now, I observe that there are alignments in the maxmatch output which are non-overlapping with the alignments in the -1 output. Example:

maxmatch output:

    1430264   1431028   9238508   9239272   765   765   97.91   1    1   Chr1   Chr2
    1430266   1431028    432524    431761   763   764   95.81   1   -1   Chr1   Chr1
    1430266   1430859  13853754  13853160   594   595   96.30   1   -1   Chr1   Chr2
    1430266   1430515  15952154  15952408   250   255   96.08   1    1   Chr1   Chr3
   *1430269   1430859  24194817  24195410   591   594   95.13   1    1   Chr1   Chr1*
    1430296   1431028  19826022  19826753   733   732   97.27   1    1   Chr1   Chr5

-1 r sorted output:

    1197986   1258207   1200368   1260677    60222    60310   98.98   1   1   Chr1   Chr1
   *1258378   1430265   1260674   1432525   171888   171852   99.85   1   1   Chr1   Chr1*
   *1431029   1433210   1432518   1434692     2182     2175   99.08   1   1   Chr1   Chr1*
    1433208   1441523   1435452   1443770     8316     8319   98.57   1   1   Chr1   Chr1

-1 q sorted output:

    25261955   25267505   24186484   24192037    5551    5554   98.87   1   1   Chr1   Chr1
   *25267802   25270577   24192035   24194814    2776    2780   99.39   1   1   Chr1   Chr1*
   *25270727   25273658   24195853   24198836    2932    2984   94.32   1   1   Chr1   Chr1*
    25273658   25343360   24199200   24269012   69703   69813   98.79   1   1   Chr1   Chr1

From here, I would expect that, since the maxmatch alignment example is aligning two regions which are not present in the current -1 output alignments, it is a unique block and hence should be part of the -1 output file. Am I right in making this assumption? If yes, then what could be the possible reasons for this behavior of the program? And how can I tackle this problem?

Thanks for your time and efforts.

Best regards
Manish Goel

On 03/01/2017 09:15 PM, Adam Phillippy wrote:
> Hi Manish,
> -q is attempting to provide the highest scoring set of alignments that
> cover the query sequence, regardless of the reference coordinates. The
> two bold alignment lines you highlight both cover the exact same bases
> on the query, so I would expect -q to only output one of them (not
> both). Perhaps this is a bug? Sorry, you've stumped me.
>
> If it helps, this paper has a more complete description of the
> algorithm delta-filter is using to compute the filtering:
> http://genome.cshlp.org/content/19/4/682.full
>
> They called it the SuperMap algorithm. Although it came out some years
> after I implemented delta-filter, the ideas are the same.
>
> Best,
> -Adam
>
> On Fri, Feb 17, 2017 at 8:39 AM, Manish Goel <go...@mp...> wrote:
>> Hi Adam,
>>
>> Thanks for the explanation. I think now I have some understanding
>> about how the algorithm works, and this leads me to a related
>> question. You said that as long as there is even a single
>> base-pair difference, the alignment would be considered. Also the
>> -1 option output is just the intersection of -q and -r.
>>
>> Below is the filtered output using the -r parameter of
>> delta-filter. Of the 10 regions selected here, only two are in the
>> -q output (in bold). I can justify rejection of seven of these
>> alignments (reason written in the last column) but unable to
>> figure out why the penultimate alignment is not selected with -q.
>> Given that, -q identifies m-to-1 alignments and that the reference
>> range for this alignment [11260337,11266679] is not overlapping
>> with any other pre-selected alignment, I find it a suitable member
>> for the longest consistent set for query.
>>
>> Ref.start Ref.end Query.start Query.end Ref.length Query.length
>> %identity Ref.dir Query.dir Ref.chr Query.chr
>> Reason for not in -q output
>>
>> *11229246 11235589 11167255 11173587 6344 6333 98.16 1 1 Chr5 chr5 -*
>>
>> 11231838 11238180 11167255 11173587 6343 6333 98.03 1 1 Chr5 chr5
>>     reference overlap with previous and length*identity score is low
>>
>> 11237670 11242473 11167255 11172048 4804 4794 97.73 1 1 Chr5 chr5
>>     query sequence is substring of already selected region
>>
>> 11242472 11244088 11167850 11169466 1617 1617 97.96 1 1 Chr5 chr5
>>     query sequence is substring of already selected region
>>
>> 11244087 11249832 11167850 11173587 5746 5738 97.76 1 1 Chr5 chr5
>>     query sequence is substring of already selected region
>>
>> 11247378 11253719 11167255 11173587 6342 6333 98.13 1 1 Chr5 chr5
>>     reference is overlapping with better next alignment
>>
>> *11252561 11258903 11167255 11173587 6343 6333 98.13 1 1 Chr5 chr5 -*
>>
>> 11255153 11261495 11167255 11173587 6343 6333 97.91 1 1 Chr5 chr5
>>     reference overlap with previous and length*identity score is low
>>
>> /11260337 11266679 11167255 11173587 6343 6333 97.84 1 1 Chr5 chr5 *??*/
>>
>> 11262136 11267975 11167757 11173587 5840 5831 97.61 1 1 Chr5 chr5
>>     If we select previous then this overlaps with poorer score and
>>     hence rejected
>>
>> Please let me know what is wrong with my understanding of the method.
>>
>> Regards
>>
>> Manish Goel
>>
>> On 02/16/2017 05:54 PM, Adam Phillippy wrote:
>>> Hi Manish,
>>> These options don't strictly enforce a 1-to-1 mapping of bases,
>>> but rather of the alignment segments. I think most of your
>>> confusion comes from the cases of overlapping alignments. As long
>>> as an alignment contains at least one position that's not in any
>>> others, it will be included to maximize the number of aligned bases.
>>>
>>> Regions are scored by using alignment length * identity, and -1
>>> option aims to maximize the sum of scores of all alignments chosen.
>>>
>>> For your purposes, looking for duplications, you want to use -m
>>>
>>> Best,
>>> Adam
>>>
>>> Sent from my mobile.
>>>
>>> On Feb 13, 2017, at 12:25 PM, Manish Goel <go...@mp...> wrote:
>>>> Hi Members of Mummer mailing list,
>>>>
>>>> I am trying to identify genomic duplicates in two genomes using
>>>> MUMmer. For this purpose, I am first using NUCmer to find all
>>>> possible alignments using --maxmatch and then I want to use
>>>> delta-filter to find the unique matches (1-to-1 matches) and
>>>> then will try to use this information to find the duplicates.
>>>> But I am quite confused about the difference between -1 and -g
>>>> parameters of delta-filter.
>>>>
>>>> This is what I have done till now:
>>>>
>>>> I filter my out.delta file using -m, -1, and -g parameters
>>>> (along with -i 90 and -l 50) followed by show-coords -THrd to
>>>> get 3 coords file, namely out_1_filter.coords,
>>>> out_g_filter.coords, and out_m_filter.coords. Columns of coords
>>>> file are:
>>>>
>>>> Ref.start Ref.end Query.start Query.end Ref.length
>>>> Query.length %identity Ref.dir Query.dir Ref.chr Query.chr
>>>>
>>>> grep "11167255" out_m_filter.coords
>>>> 11229246 11235589 11167255 11173587 6344 6333 98.16 1 1 Chr5 chr5
>>>> 11231838 11238180 11167255 11173587 6343 6333 98.03 1 1 Chr5 chr5
>>>> 11237670 11242473 11167255 11172048 4804 4794 97.73 1 1 Chr5 chr5
>>>> 11247378 11253719 11167255 11173587 6342 6333 98.13 1 1 Chr5 chr5
>>>> 11252561 11258903 11167255 11173587 6343 6333 98.13 1 1 Chr5 chr5
>>>> 11255153 11261495 11167255 11173587 6343 6333 97.91 1 1 Chr5 chr5
>>>> 11260337 11266679 11167255 11173587 6343 6333 97.84 1 1 Chr5 chr5
>>>>
>>>> From the m-to-m alignment (above), we observe that ref. genome
>>>> contains a repeated region which maps to a region on query
>>>> genome. But, for the -1 and -g filtered coords file (below),
>>>> more than one ref. genome region aligns to query genome, which I
>>>> find counter-intuitive as program should output 1-to-1 alignment
>>>> with these parameters.
>>>>
>>>> grep "11167255" out_1_filter.coords
>>>> 11229246 11235589 11167255 11173587 6344 6333 98.16 1 1 Chr5 Chr5
>>>> 11252561 11258903 11167255 11173587 6343 6333 98.13 1 1 Chr5 Chr5
>>>>
>>>> grep "11167255" out_g_filter.coords
>>>> 11252561 11258903 11167255 11173587 6343 6333 98.13 1 1 Chr5 Chr5
>>>> 11260337 11266679 11167255 11173587 6343 6333 97.84 1 1 Chr5 Chr5
>>>>
>>>> Also, from the seven repeats on ref. genome what criteria is
>>>> used to select the two regions identified here and reject the
>>>> other? Why is ref. region [11252561,11258903] common in both
>>>> output and why are other two different? Algorithmic as well as
>>>> biological reasons would be highly appreciated.
>>>>
>>>> Thanks for your time and efforts.
>>>>
>>>> Best regards
>>>>
>>>> Manish Goel
>>>>
>>>> PS: I have manually checked these regions and they are indeed
>>>> repeated in reference genome, meaning that there are no bugs or
>>>> mistakes.

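The scoring rule quoted in this thread (alignment length * identity) and the overlap reasoning applied to the table can be checked by hand with a few lines of code. This is a sketch for inspecting coords rows only; it does not reproduce delta-filter's selection, and the helper names `score` and `overlaps` as well as the choice to express identity as a fraction are assumptions made for the example. The four rows are reference intervals copied from the table in the thread.

```python
def score(length, identity_pct):
    """Alignment length times identity, with identity given in percent."""
    return length * identity_pct / 100.0


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


# (ref_start, ref_end, ref_length, identity_pct) for four rows of the thread.
alignments = [
    (11229246, 11235589, 6344, 98.16),
    (11231838, 11238180, 6343, 98.03),
    (11252561, 11258903, 6343, 98.13),
    (11260337, 11266679, 6343, 97.84),
]

for i, (start, end, length, ident) in enumerate(alignments):
    clashes = [
        j
        for j, (other_start, other_end, _, _) in enumerate(alignments)
        if j != i and overlaps(start, end, other_start, other_end)
    ]
    print(start, end, round(score(length, ident), 1), "overlaps rows", clashes)
```

In these four rows the first two intervals overlap on the reference while the last two do not, which matches the observation in the thread that [11260337,11266679] does not overlap the selected [11252561,11258903] alignment.
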
From: DeAnna B. <dea...@gm...> - 2017-03-17 19:09:38
I've been trying desperately to fix this problem, but I'm not a programmer and so I'm stuck. I have been trying to install Mummer on my MacBook pro (OS Sierra) and have run into some sort of library linking issue. The "make check" runs fine but the install throws this error:

    ld: warning: ignoring file ../streesrc/libstree.a, file was built for archive which is not the architecture being linked (x86_64): ../streesrc/libstree.a
    ld: warning: ignoring file ../libbasedir/libbase.a, file was built for archive which is not the architecture being linked (x86_64): ../libbasedir/libbase.a

Followed by:

    ld: symbol(s) not found for architecture x86_64
    clang: error: linker command failed with exit code 1

Any help is much appreciated!

Thank you,
DeAnna

From: Adam P. <aph...@gm...> - 2017-03-01 20:16:00
|
Hi Manish,

-q attempts to provide the highest-scoring set of alignments that cover the query sequence, regardless of the reference coordinates. The two bold alignment lines you highlight both cover the exact same bases on the query, so I would expect -q to output only one of them (not both). Perhaps this is a bug? Sorry, you've stumped me.

If it helps, this paper has a more complete description of the algorithm delta-filter uses to compute the filtering: http://genome.cshlp.org/content/19/4/682.full

They called it the SuperMap algorithm. Although it came out some years after I implemented delta-filter, the ideas are the same.

Best,
-Adam |
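The expectation stated above (two alignments covering identical query bases should not both survive -q) can be illustrated with a weighted interval selection on the query axis. This is only a simplified sketch, not delta-filter's implementation and not the SuperMap algorithm: it forbids any overlap between chosen query intervals, whereas the thread notes that delta-filter tolerates partially overlapping alignments. The tuple layout and function names are hypothetical; the score is alignment length * identity, as described elsewhere in this thread.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical record layout: (query_start, query_end, aln_length, percent_identity)
Aln = Tuple[int, int, int, float]


def score(aln: Aln) -> float:
    """Score an alignment as length * identity (identity given in percent)."""
    _, _, length, identity = aln
    return length * identity / 100.0


def best_non_overlapping(alns: List[Aln]) -> List[Aln]:
    """Return the max-total-score subset whose query intervals do not overlap.

    Classic weighted interval scheduling; coordinates are inclusive.
    """
    alns = sorted(alns, key=lambda a: a[1])      # sort by query end
    ends = [a[1] for a in alns]
    best = [0.0] * (len(alns) + 1)               # best[i]: optimum over the first i alignments
    take = [False] * len(alns)                   # whether alignment i is used in best[i + 1]
    prev = [0] * len(alns)                       # how many earlier alignments end before i starts
    for i, a in enumerate(alns):
        prev[i] = bisect_right(ends, a[0] - 1, 0, i)
        with_a = best[prev[i]] + score(a)
        take[i] = with_a > best[i]
        best[i + 1] = max(best[i], with_a)
    chosen, i = [], len(alns)
    while i > 0:                                 # backtrack through the decisions
        if take[i - 1]:
            chosen.append(alns[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    return chosen[::-1]


# The two alignments reported in out_1_filter.coords share one query interval.
reported = [
    (11167255, 11173587, 6344, 98.16),
    (11167255, 11173587, 6343, 98.13),
]
print(best_non_overlapping(reported))
```

Under this strict no-overlap assumption only one of the two reported alignments can be kept, which matches the expectation in the reply above.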
From: Manish G. <go...@mp...> - 2017-02-17 13:39:14
|
Hi Adam,

Thanks for the explanation. I think I now have some understanding of how the algorithm works, and this leads me to a related question. You said that as long as there is even a single base-pair difference, the alignment would be considered. Also, the -1 option output is just the intersection of -q and -r.

Below is the filtered output using the -r parameter of delta-filter. Of the 10 regions selected here, only two are in the -q output (in bold). I can justify the rejection of seven of these alignments (reason written in the last column), but I am unable to figure out why the penultimate alignment is not selected with -q. Given that -q identifies m-to-1 alignments and that the reference range for this alignment [11260337,11266679] does not overlap with any other pre-selected alignment, I find it a suitable member of the longest consistent set for the query.

Ref.start Ref.end Query.start Query.end Ref.length Query.length %identity Ref.dir Query.dir Ref.chr Query.chr Reason for not in -q output

*11229246 11235589 11167255 11173587 6344 6333 98.16 1 1 Chr5 chr5 -*
11231838 11238180 11167255 11173587 6343 6333 98.03 1 1 Chr5 chr5 reference overlap with previous and length*identity score is low
11237670 11242473 11167255 11172048 4804 4794 97.73 1 1 Chr5 chr5 query sequence is substring of already selected region
11242472 11244088 11167850 11169466 1617 1617 97.96 1 1 Chr5 chr5 query sequence is substring of already selected region
11244087 11249832 11167850 11173587 5746 5738 97.76 1 1 Chr5 chr5 query sequence is substring of already selected region
11247378 11253719 11167255 11173587 6342 6333 98.13 1 1 Chr5 chr5 reference is overlapping with better next alignment
*11252561 11258903 11167255 11173587 6343 6333 98.13 1 1 Chr5 chr5 -*
11255153 11261495 11167255 11173587 6343 6333 97.91 1 1 Chr5 chr5 reference overlap with previous and length*identity score is low
/11260337 11266679 11167255 11173587 6343 6333 97.84 1 1 Chr5 chr5 *??*/
11262136 11267975 11167757 11173587 5840 5831 97.61 1 1 Chr5 chr5 If we select previous then this overlaps with poorer score and hence rejected

Please let me know what is wrong with my understanding of the method.

Regards

Manish Goel |
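The overlap argument in the message above can be checked directly from the coordinates. The following is a minimal sketch, assuming inclusive coordinates as printed by show-coords; the helper name `overlap_len` and the variable names are hypothetical, and the intervals are copied from the -r table in the message.

```python
from typing import Tuple

Interval = Tuple[int, int]  # inclusive (start, end) on the reference


def overlap_len(a: Interval, b: Interval) -> int:
    """Number of positions shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Reference ranges of the two alignments that appear in the -q output.
selected = [(11229246, 11235589), (11252561, 11258903)]
# The penultimate alignment, marked "??" in the table.
candidate = (11260337, 11266679)
# Its two neighbours in the -r table.
neighbours = [(11255153, 11261495), (11262136, 11267975)]

for iv in selected + neighbours:
    print(iv, "overlap with candidate:", overlap_len(candidate, iv))
```

The candidate shares no reference positions with either selected alignment, as the message states; it overlaps only its two rejected neighbours.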
From: Adam P. <aph...@gm...> - 2017-02-16 16:54:58
|
Hi Manish,

These options don't strictly enforce a 1-to-1 mapping of bases, but rather of the alignment segments. I think most of your confusion comes from the cases of overlapping alignments. As long as an alignment contains at least one position that's not in any others, it will be included to maximize the number of aligned bases.

Regions are scored using alignment length * identity, and the -1 option aims to maximize the sum of scores of all alignments chosen.

For your purposes, looking for duplications, you want to use -m.

Best,
Adam

Sent from my mobile. |
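The length * identity score mentioned in the reply above is easy to compute for the seven rows of out_m_filter.coords. This is a sketch under assumptions: it uses the reference length column as the alignment length and treats %identity as a percentage; which length delta-filter uses internally is not stated in the thread. The function and variable names are hypothetical.

```python
# (ref_start, ref_end, ref_length, percent_identity), copied from out_m_filter.coords
rows = [
    (11229246, 11235589, 6344, 98.16),
    (11231838, 11238180, 6343, 98.03),
    (11237670, 11242473, 4804, 97.73),
    (11247378, 11253719, 6342, 98.13),
    (11252561, 11258903, 6343, 98.13),
    (11255153, 11261495, 6343, 97.91),
    (11260337, 11266679, 6343, 97.84),
]


def score(length: int, identity: float) -> float:
    """Alignment length * identity, with identity converted from percent."""
    return length * identity / 100.0


# Rank the alignments from highest to lowest score.
ranked = sorted(rows, key=lambda r: score(r[2], r[3]), reverse=True)
for ref_start, ref_end, length, identity in ranked:
    print(ref_start, ref_end, round(score(length, identity), 2))
```

With this scoring, the two highest-ranked rows are the reference regions starting at 11229246 and 11252561, which happen to be the two regions reported in out_1_filter.coords.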
From: Adam P. <aph...@gm...> - 2017-02-16 16:26:15
|
Yes, in this case the insert (expansion) appears to be in your query sequence.

Best,
Adam

Sent from my mobile. |
From: Amit I. <in...@gm...> - 2017-02-15 23:36:46
|
Dear MUMmer team,

I ran an Assemblytics analysis with my delta file from MUMmer. It came back with a Tandem_expansion difference:

reference ref_start ref_stop ID size strand type ref_gap_size query_gap_size query_coordinates method
1 70755 70848 Assemblytics_b_1 1136 + Tandem_expansion -93 1043 vnti:80960-82003:+ between_alignments

I then ran show-diffs on the same delta file:

[SEQ] [TYPE] [S1] [E1]
1 GAP 70849 70754 -94 1042 -1136

I was looking at the show-diffs documentation in the MUMmer manual. Based on this, do I have sequence inserted into my reference? Is the inserted sequence the one in my query, vnti:80960-82003, as the Assemblytics output suggests? I'm having trouble visualizing and interpreting this output, so I really appreciate your help. The documentation and manual for MUMmer are very helpful.

*A gap between two mutually consistent ordered and oriented alignments. gap-length-R is the length of the alignment gap in the reference, gap-length-Q is the length of the alignment gap in the query, and gap-diff is the difference between the two gap lengths. If gap-diff is positive, sequence has been inserted in the reference. If gap-diff is negative, sequence has been deleted from the reference. If both gap-length-R and gap-length-Q are negative, the indel is tandem duplication copy difference*.

Best,

Amit |
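The documentation passage quoted above can be turned into a small decision rule. The sketch below assumes gap-diff = gap-length-R - gap-length-Q, which is consistent with the GAP row in the message (-94 - 1042 = -1136); the function name `describe_gap` and the returned dictionary keys are hypothetical.

```python
def describe_gap(gap_len_ref: int, gap_len_qry: int) -> dict:
    """Interpret a show-diffs GAP row following the quoted documentation text."""
    gap_diff = gap_len_ref - gap_len_qry  # assumed definition, matches the posted row
    if gap_diff > 0:
        indel = "sequence inserted in the reference"
    elif gap_diff < 0:
        indel = "sequence deleted from the reference"
    else:
        indel = "no length difference"
    return {
        "gap_diff": gap_diff,
        "indel": indel,
        # Per the quoted text, a tandem copy difference needs both gaps negative.
        "tandem_copy_difference": gap_len_ref < 0 and gap_len_qry < 0,
    }


# Values from the GAP row: gap-length-R = -94, gap-length-Q = 1042.
print(describe_gap(-94, 1042))
```

For the posted row the gap-diff is negative, so by the quoted rule the reference is missing sequence relative to the query, which agrees with the answer that the expansion is in the query.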
From: Manish G. <go...@mp...> - 2017-02-13 17:40:24
|
Hi Members of the MUMmer mailing list,

I am trying to identify genomic duplicates in two genomes using MUMmer. For this purpose, I first use NUCmer to find all possible alignments using --maxmatch, then I want to use delta-filter to find the unique matches (1-to-1 matches), and then I will try to use this information to find the duplicates. But I am quite confused about the difference between the -1 and -g parameters of delta-filter.

This is what I have done so far:

I filter my out.delta file using the -m, -1, and -g parameters (along with -i 90 and -l 50), followed by show-coords -THrd, to get three coords files, namely out_1_filter.coords, out_g_filter.coords, and out_m_filter.coords. Columns of the coords files are:

Ref.start Ref.end Query.start Query.end Ref.length Query.length %identity Ref.dir Query.dir Ref.chr Query.chr

grep "11167255" out_m_filter.coords
11229246 11235589 11167255 11173587 6344 6333 98.16 1 1 Chr5 chr5
11231838 11238180 11167255 11173587 6343 6333 98.03 1 1 Chr5 chr5
11237670 11242473 11167255 11172048 4804 4794 97.73 1 1 Chr5 chr5
11247378 11253719 11167255 11173587 6342 6333 98.13 1 1 Chr5 chr5
11252561 11258903 11167255 11173587 6343 6333 98.13 1 1 Chr5 chr5
11255153 11261495 11167255 11173587 6343 6333 97.91 1 1 Chr5 chr5
11260337 11266679 11167255 11173587 6343 6333 97.84 1 1 Chr5 chr5

From the m-to-m alignment (above), we observe that the ref. genome contains a repeated region which maps to a single region on the query genome. But in the -1 and -g filtered coords files (below), more than one ref. genome region aligns to the query genome, which I find counter-intuitive, as the program should output 1-to-1 alignments with these parameters.

grep "11167255" out_1_filter.coords
11229246 11235589 11167255 11173587 6344 6333 98.16 1 1 Chr5 Chr5
11252561 11258903 11167255 11173587 6343 6333 98.13 1 1 Chr5 Chr5

grep "11167255" out_g_filter.coords
11252561 11258903 11167255 11173587 6343 6333 98.13 1 1 Chr5 Chr5
11260337 11266679 11167255 11173587 6343 6333 97.84 1 1 Chr5 Chr5

Also, of the seven repeats on the ref. genome, what criteria are used to select the two regions identified here and reject the others? Why is ref. region [11252561,11258903] common to both outputs, and why are the other two different? Algorithmic as well as biological reasons would be highly appreciated.

Thanks for your time and efforts.

Best regards

Manish Goel

PS: I have manually checked these regions and they are indeed repeated in the reference genome, meaning that there are no bugs or mistakes. |
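The grep step in the message above can be generalized: group the rows of a coords file by query interval and count how many reference regions map to each. This is a minimal sketch, assuming the 11-column layout listed in the message (show-coords -THrd); the function names, dictionary keys, and the embedded sample text are hypothetical, with the rows copied from out_m_filter.coords.

```python
from collections import defaultdict

# Rows copied from the out_m_filter.coords excerpt in the message.
COORDS_TEXT = """\
11229246 11235589 11167255 11173587 6344 6333 98.16 1 1 Chr5 chr5
11231838 11238180 11167255 11173587 6343 6333 98.03 1 1 Chr5 chr5
11237670 11242473 11167255 11172048 4804 4794 97.73 1 1 Chr5 chr5
11247378 11253719 11167255 11173587 6342 6333 98.13 1 1 Chr5 chr5
11252561 11258903 11167255 11173587 6343 6333 98.13 1 1 Chr5 chr5
11255153 11261495 11167255 11173587 6343 6333 97.91 1 1 Chr5 chr5
11260337 11266679 11167255 11173587 6343 6333 97.84 1 1 Chr5 chr5
"""


def parse_coords(text: str) -> list:
    """Parse whitespace-delimited coords rows into dictionaries."""
    records = []
    for line in text.splitlines():
        f = line.split()
        if len(f) != 11:          # skip blank or malformed lines
            continue
        records.append({
            "ref_start": int(f[0]), "ref_end": int(f[1]),
            "qry_start": int(f[2]), "qry_end": int(f[3]),
            "identity": float(f[6]),
            "ref_chr": f[9], "qry_chr": f[10],
        })
    return records


def ref_copies_per_query_interval(records: list) -> dict:
    """Map each (query chr, start, end) to the reference intervals aligned to it."""
    groups = defaultdict(list)
    for r in records:
        key = (r["qry_chr"], r["qry_start"], r["qry_end"])
        groups[key].append((r["ref_start"], r["ref_end"]))
    return dict(groups)


groups = ref_copies_per_query_interval(parse_coords(COORDS_TEXT))
for key, refs in groups.items():
    print(key, "->", len(refs), "reference region(s)")
```

Query intervals with more than one reference region are candidate duplications in the reference; grouping by exact coordinates is a simplification, since partially overlapping query intervals end up in separate groups.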