From: Adam P. <aph...@gm...> - 2012-02-06 19:21:04
Nucmer is designed for long reads, so for large genomes sequenced with short reads, it may be necessary to replace the read mapper with something specifically designed for short reads. The largest genomes I have successfully used it for in its current configuration were a few hundred Mbp.

Yes, amosvalidate and FRCurve analyses could run on the resulting banks. However, I would caution that AMOScmp will introduce its own errors. If it cannot place reads in the correct repeat copy, these reads will appear incorrectly placed to the validation scripts, so I expect you would get an artificially high count of trouble spots. If you turn off repeat placement to avoid this problem, then you will fragment the contigs, which will reduce the power of the analysis (i.e., the regions that are hardest to assemble will be excluded from the mapping result).

Best,
-Adam

On Sat, Feb 4, 2012 at 4:39 AM, Ole Kristian Tørresen <o.k...@bi...> wrote:
> Hi Adam.
> This is an intriguing solution. Would it work for larger genomes (up to 1 Gbp or more) and with several different libraries? Would it be possible to run amosvalidate/FRCurve on the resulting bank? If so, one could use this to visualize the quality of any assembly from any assembler.
>
> Cheers,
> Ole
>
> On 3 February 2012 22:45, Adam Phillippy <aph...@gm...> wrote:
>> Hi Henrik,
>> One quick way to map reads to an assembly to visualize in Hawkeye is to run AMOScmp with your assembly as the reference and your reads as the sequences to be mapped/reassembled. AMOScmp will generate a bank (by mapping the reads using MUMmer/Nucmer) that can be visualized directly in Hawkeye. Might be a quick solution to your problem.
>>
>> -Adam
>>
>>
>> On Thu, Feb 2, 2012 at 9:45 AM, Henrik Lantz <Hen...@sl...> wrote:
>>> Hi Nathan, thanks for the reply.
>>>
>>> The insert size distribution is bimodal, very far off from a normal distribution (I am sending you a screenshot in a separate mail; the list didn't like the file size).
>>> I get the same results when mapping the reads to other assemblies (from Newbler or Mira), so it seems something went wrong in the lab rather than with the Celera assembly (otherwise something would have to be wrong with all the assemblies, which sounds unlikely).
>>>
>>> I have received some help from Michael Schatz, who said that Celera is good at handling strange distributions like this and that the Celera assembly need not be negatively affected by it, which was a worry for me. Also, we have 15x 454 single-end reads and 10x 454 3-5 kb mate pairs as well, so we are not dependent on the Illumina data alone. It was mostly intended to correct homopolymer errors in our 454 data.
>>>
>>> Nevertheless, I still want to find regions that might be misassembled. Celera gives me a contig size that is four times larger than either Mira or Newbler, and I can't help but wonder whether Celera is assembling over repeats that should not be assembled. I must admit I have trouble understanding how to include the regular expressions you mention, and I can't find the necessary "mates" file. CaValidate just used the .asm file from Celera; I have not supplied any estimated insert sizes at any stage. The same goes for my attempt to use amosvalidate on the Newbler assembly: I just used the .ace file without any extra information. If you could point me to some information about this mates file, I would be very grateful.
>>>
>>> I just realized that I can color reads from different libraries differently in Hawkeye, and this helps me spot the 454 reads among the Illumina ones when I look at the Celera assembly, but it is still messy.
>>>
>>> I still haven't solved this, so all comments are very welcome.
>>> /Henrik
>>>
>>> On 1 Feb 2012, at 22:20, Nathan Watson-Haigh wrote:
>>>
>>>> I'm new to AMOS, but I hope I can be of some help.
>>>>
>>>> AMOS uses a "mates" file to match up the pairs/mates and assign reads to libraries.
>>>> You also supply some information about the expected insert-size distribution for each of these libraries. One of the first things amosvalidate does is take those pairs and re-estimate the insert-size distributions. Without this step, Hawkeye uses the estimates you provided to determine mate "happiness", stretch and compression. So if your estimates were off and you don't run amosvalidate, you could get a lot of unhappy mates.
>>>>
>>>> I've found it easiest to supply regular expressions in my mates file in order to match up the pairs.
>>>>
>>>> A quick question about the odd distribution: if it isn't reasonably close to normal, could the assembler have made some mistakes, since it too would assume normality? How far from normal is it?
>>>>
>>>> Cheers,
>>>> Nathan
>>>>
>>>> Sent from my Android phone.
>>>>
>>>> Henrik Lantz <Hen...@sl...> wrote:
>>>>
>>>> Hi
>>>>
>>>> We have the genome of a fungus that was assembled de novo with Celera Assembler using Illumina PE, 454 mate-pair, and 454 single-end reads. I ran the assembly through CaValidate, and it turns out that our Illumina paired ends have an unusual insert-size distribution, so CaValidate marks a huge number of them as stretched or compressed. This makes it very hard for me to identify regions that truly are misassembled repeats: everything is marked, which hides any truly problematic regions. If I could restrict the view to just the 454 reads, my problem would be solved, but this seems impossible to do.
>>>>
>>>> I therefore thought to just map/assemble the 454 mate pairs, which have a normally distributed insert size, onto the Celera assembly, in the hope that this would make it easier to spot the truly misassembled regions.
>>>> My idea was to use Newbler to map the 454 mate pairs using the Celera contig file as a reference => convert the .ace file to AMOS format using toAmos => create a .bnk from this file using bank-transact => populate the .bnk with features using amosvalidate => view the result in Hawkeye.
>>>>
>>>> However, when I do this, the mate-pair information seems to have been lost at some stage, as all reads are reported as single reads in Hawkeye.
>>>>
>>>> Am I doing something wrong, or is this not possible to do? Is there any other way for me to get what I want, which is a visualization of my 454 mate pairs mapped/assembled, with compressed/stretched regions marked?
>>>>
>>>> Any suggestions are very welcome.
>>>>
>>>> /Henrik
>>>>
>>>> Nathan Watson-Haigh
>>>> Senior Bioinformatician | The Australian Wine Research Institute
>>>> Waite Precinct, Hartley Grove cnr Paratoo Road, Urrbrae (Adelaide) SA 5064
>>>> PO Box 197, Glen Osmond SA 5064, Australia
>>>> T: +61 8 83136836 (direct) | F: +61 8 83136601
>>>> www: www.awri.com.au
>>>>
>>>> _______________________________________________
>>>> AMOS-help mailing list
>>>> AMO...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/amos-help
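Nathan's explanation covers two ideas: pairing reads by a regular expression on their names, and re-estimating a library's insert-size distribution so that each pair can be labelled happy, stretched or compressed. Below is a minimal Python sketch of both, under loudly stated assumptions: the `name/1` / `name/2` naming convention, the 3-standard-deviation cutoff and the example insert sizes are all hypothetical. This is not the AMOS mates-file format, and not necessarily the exact rule amosvalidate or Hawkeye applies.

```python
import re
import statistics

# Hypothetical naming convention for paired reads: "<template>/1" and "<template>/2".
# This is NOT the AMOS mates-file syntax; it only illustrates pairing reads by regex.
MATE_RE = re.compile(r"^(?P<template>.+)/(?P<end>[12])$")


def pair_reads(read_names):
    """Group read names into mate pairs that share a template name."""
    ends = {}
    for name in read_names:
        match = MATE_RE.match(name)
        if match is None:
            continue  # no mate suffix, e.g. an unpaired 454 single-end read
        ends.setdefault(match.group("template"), {})[match.group("end")] = name
    # Keep only templates for which both ends were seen.
    return {t: (e["1"], e["2"]) for t, e in ends.items() if "1" in e and "2" in e}


def estimate_insert(inserts):
    """Re-estimate a library's insert size from observed pairs as (mean, sd)."""
    return statistics.mean(inserts), statistics.stdev(inserts)


def classify(insert, mean, sd, k=3.0):
    """Label one mate pair; the k-standard-deviation cutoff is an assumption."""
    if insert < mean - k * sd:
        return "compressed"
    if insert > mean + k * sd:
        return "stretched"
    return "happy"


if __name__ == "__main__":
    print(pair_reads(["frag1/1", "frag1/2", "frag2/1", "single454"]))
    # Hypothetical observed inserts (bp) for a 3 kb mate-pair library.
    mean, sd = estimate_insert([3000, 3100, 2900, 3050, 2950])
    for insert in (1000, 3100, 6000):
        print(insert, classify(insert, mean, sd))
```

Because each library is summarised by a single mean and standard deviation, a bimodal library like the Illumina data described above is hard to fit with one window: a window centred on one peak flags the other peak's pairs, while a window wide enough to cover both says little about either.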