From: Waldbieser, G. <Geoff.Waldbieser@ARS.USDA.GOV> - 2012-06-13 20:46:29
When producing a mer database from a set of paired end Illumina reads, you recommend a lower limit of 15X genome coverage. Is there an upper limit to the amount of genome coverage so that one does not oversample sequencing error?

Geoff Waldbieser

-----Original Message-----
From: wgs...@li... [mailto:wgs...@li...]
Sent: Tuesday, June 12, 2012 12:39 AM
To: wgs...@li...
Subject: wgs-assembler-users Digest, Vol 3, Issue 1

Send wgs-assembler-users mailing list submissions to
    wgs...@li...

To subscribe or unsubscribe via the World Wide Web, visit
    https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
or, via email, send a message with subject or body 'help' to
    wgs...@li...

You can reach the person managing the list at
    wgs...@li...

When replying, please edit your Subject line so it is more specific than "Re: Contents of wgs-assembler-users digest..."

Today's Topics:

   1. FastqToCA for paired-end reads (Mundy, Michael)
   2. Re: FastqToCA for paired-end reads (Ole Kristian Tørresen)
   3. Re: small gaps of fixed length (Sajeet Haridas)
   4. Ion torrent data (Powers, Jason)
   5. ContigContainment failed (Ole Kristian Tørresen)
   6. Re: ContigContainment failed (Walenz, Brian)

----------------------------------------------------------------------

Message: 1
Date: Mon, 14 May 2012 13:32:19 -0500
From: "Mundy, Michael" <Mun...@ma...>
Subject: [wgs-assembler-users] FastqToCA for paired-end reads
To: <wgs...@li...>
Message-ID: <CBD6B9E3.978%Mun...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

I'm using WGS 7.0 and I have two synchronized fastq files with paired-end reads. Based on the documentation at http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=FastqToCA, I tried this command:

wgs-7.0/Linux-amd64/bin/fastqToCA -libraryname SRR067601.000 -mates SRR067601.000_1_pair.fq,SRR067601.000_2_pair.fq

But it returns this error:

ERROR: Mated reads (-mates) must have am insert size (-insertsize).
The documentation page says that the -insertsize option is optional so I thought that was the flag to distinguish between paired-end reads and mate-pair reads. How do I generate a FRG file for paired-end reads?

Mike Mundy

------------------------------

Message: 2
Date: Mon, 14 May 2012 20:46:31 +0200
From: Ole Kristian Tørresen <o.k...@bi...>
Subject: Re: [wgs-assembler-users] FastqToCA for paired-end reads
To: "Mundy, Michael" <Mun...@ma...>
Cc: wgs...@li...
Message-ID: <CAH...@ma...>
Content-Type: text/plain; charset=windows-1252

I guess the documentation is not up to date, so it's not optional to supply the -insertsize option. Just add -insertsize 300 30, if your reads are from a 300 bp DNA fragment and are paired end, or do something like -insertsize 5000 500 -outtie if they are mate pairs from a 5k library.

Ole

------------------------------

Message: 3
Date: Fri, 18 May 2012 11:26:28 -0700
From: "Sajeet Haridas" <sa...@gm...>
Subject: Re: [wgs-assembler-users] small gaps of fixed length
To: <wgs...@li...>
Message-ID: <000601cd3523$bed2dcb0$3c789610$@com>
Content-Type: text/plain; charset="us-ascii"

Hello Brian,

Since every CTP mea is greater than -20, can I assume that no overlaps were skipped where contigs overlapped by more than 20 bases (i.e., all overlaps > 20 were successfully merged into scaffolds with no N's)?

How does the scaffolder behave when the overlap between contigs is 5bp or less?

For a small fungal genome with relatively few repeats and no allelic variation (it is haploid), are there any parameters that can reduce the number of false gaps? Will increasing cgwErrorRate help?

Thank you,
Sajeet

-----Original Message-----
From: Walenz, Brian [mailto:bw...@jc...]
Sent: May-16-12 7:35 PM
To: Sajeet Haridas
Subject: RE: small gaps of fixed length

IIRC, that's the marker for "contigs should overlap, but no overlap found". Possibilities here: the overlap is shorter than we can detect, or there is crud on the end of one contig, or the error rate is too high.

Are you on the mailing list? This would have been a nice discussion there.

________________________________________
From: Sajeet Haridas
Sent: Wednesday, May 16, 2012 8:22 PM
To: Walenz, Brian
Subject: RE: small gaps of fixed length

Thank you Brian. I also notice that the minimum CTP mea is -20. Is this value also capped?

Sajeet

From: Walenz, Brian
Sent: May-16-12 1:12 PM
To: Sajeet Haridas
Subject: Re: small gaps of fixed length

Yes - that's the lower limit on a gap between contigs.
Either mate pairs indicate the contigs should overlap but no overlap could be found, or there really is a small positive gap. Only the asm file will distinguish the two. Look under 'mea' here:

http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=ASM_Files#SCF_CTP

b

On 5/16/12 3:29 AM, "Sajeet Haridas" wrote:

Hello Brian,

My fungal genome assemblies (30-35 Mbp) always seem to have ~2500 small gaps always represented by 20 N's - using bog, bogart and various other parameters. Is the assembler trying to tell me something?

Thank you,
Sajeet

------------------------------

Message: 4
Date: Sun, 20 May 2012 13:09:47 +0000
From: "Powers, Jason" <jp...@ex...>
Subject: [wgs-assembler-users] Ion torrent data
To: "wgs...@li..." <wgs...@li...>
Message-ID: <1EF140AF181DEF48AAAB3C8A832AE432010AC1AD@EA-EXCHANGE3.ExpressionAnalysis.local>
Content-Type: text/plain; charset="us-ascii"

Any thoughts about the best settings in fastqToCA for Ion Torrent data? My guess is that it is closest to illumina, but thought I would see what you guys thought.

Thanks
Jason

------------------------------

Message: 5
Date: Mon, 11 Jun 2012 19:51:27 +0200
From: Ole Kristian Tørresen <o.k...@bi...>
Subject: [wgs-assembler-users] ContigContainment failed
To: wgs...@li...
Message-ID: <CAH...@ma...>
Content-Type: text/plain; charset=ISO-8859-1

Hi,
I got an assertion fail while scaffolding today. Error message:

CreateNewGraphNode()-- Contig 16085537
* Create a contig 16085537 in scaffold 131913
>>> Fixing up suspicious overlap (15370134,16085537,I) (ahg:-340 bhg:-143) to (15370134,16085537,O) (ahg:143 bhg:340) len: 137
* FOEXS: SUSPICIOUS Overlap found! Looked for (16085537,15370134,I)[20,1044] found (15370134,16085537,O) 137
WARNING: InsertChunkOverlap()-- Chunk overlap already exists.
NEW 15370134,16085537,O - min/max 131/142 0/0 erate 0.100000 flags 10000 overlap 137 hang 0,0 qual 0.000000 offset 0,0
OLD 15370134,16085537,O - min/max 20/1044 20/1044 erate 0.100000 flags 10001 overlap 137 hang 143,340 qual 0.000000 offset 0,0
WARNING: CreateChunkOverlapFromEdge()-- Chunk overlap already exists. Keeping old overlap.
NEW 15370134,16085537,O - min/max 131/142 0/0 erate 0.100000 flags 10000 overlap 137 hang 0,0 qual 0.000000 offset 0,0
OLD 15370134,16085537,O - min/max 20/1044 20/1044 erate 0.100000 flags 10001 overlap 137 hang 143,340 qual 0.000000 offset 0,0
* Switched right-left, orientation went from I to O
* CreateAContigInScaffold() failed.
ContigContainment failed.
cgw: LeastSquaresGaps_CGW.C:1410: RecomputeOffsetsStatus RecomputeOffsetsInScaffold(ScaffoldGraphT*, CDS_CID_t, int, int, int): Assertion `0' failed.

This is about 30x coverage with Illumina reads; 10x combined reads (180 bp insert, 100 nt reads and combined with FLASH), 10x PE 100 nt reads 300 insert and 5k mate pair 100 nt reads, all error corrected with Quake. It's probably not an optimal combination, but I'm testing a bit and interested in the result.

Using bogart and dnc, the 5k library compared against the other two. Other options are default.

Is that assertion something that can be fixed?

Thank you.

Ole

------------------------------

Message: 6
Date: Tue, 12 Jun 2012 01:38:29 -0400
From: "Walenz, Brian" <bw...@jc...>
Subject: Re: [wgs-assembler-users] ContigContainment failed
To: Ole Kristian Tørresen <o.k...@bi...>, "wgs...@li..." <wgs...@li...>
Message-ID: <CBFC4E15.4865%bw...@jc...>
Content-Type: text/plain; charset="iso-8859-1"

Hi, Ole-

After merging two scaffolds, we do a least squares estimate of the gap sizes in the new scaffold. If those gap sizes imply two contigs should be merged (via a negative gap) we try to merge them. This merge failed for reasons I haven't looked into. Probably bad sequence alignment.

I think you can safely disable this assert.
Gap size estimation might run through a few more iterations (with the same result) and eventually give up. The scaffold will have (slightly?) bogus gap sizes. There are, unfortunately, plenty of other scaffolds that fail to get gap size estimates, so this isn't a disaster.

b

------------------------------

End of wgs-assembler-users Digest, Vol 3, Issue 1
*************************************************

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
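For reference, the -insertsize advice in Message 2 of the quoted digest can be written down as a small helper. This is only a sketch: the function name fastq_to_ca_command and its parameters are made up for illustration, it uses only the options that appear in the thread (-libraryname, -insertsize, -outtie, -mates), and it only assembles the argument list without running anything. Where fastqToCA writes its FRG output is not covered in the thread, so that part is left out; check the fastqToCA documentation for your release.

```python
import shlex


def fastq_to_ca_command(library, fastq_1, fastq_2, mean, stddev,
                        outtie=False, binary="fastqToCA"):
    """Build (but do not run) a fastqToCA argument list for one mated library.

    Hypothetical helper: only flags quoted in the thread are used.
    """
    if mean <= 0 or stddev <= 0:
        raise ValueError("insert size mean and standard deviation must be positive")
    args = [binary, "-libraryname", library,
            "-insertsize", str(mean), str(stddev)]
    if outtie:
        # Per Message 2, mate-pair libraries additionally take -outtie.
        args.append("-outtie")
    args += ["-mates", f"{fastq_1},{fastq_2}"]
    return args


# Paired-end reads from a 300 bp fragment, as suggested in Message 2.
paired_end = fastq_to_ca_command(
    "SRR067601.000",
    "SRR067601.000_1_pair.fq",
    "SRR067601.000_2_pair.fq",
    300, 30,
)

# Mate pairs from a 5k library; the library and file names are placeholders.
mate_pair = fastq_to_ca_command(
    "example_5k", "example_5k_1.fq", "example_5k_2.fq",
    5000, 500, outtie=True,
)

print(shlex.join(paired_end))
print(shlex.join(mate_pair))
```

The list form can be handed to subprocess.run once the output handling for your installation is known; printing it first makes it easy to check the insert size and orientation before a long run.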