From: Paul C. <pca...@gm...> - 2012-12-07 14:21:22
Hi all,

Is there a way in the CVS version of runCA (or even version 7.0) to get a report about what happened to each and every sequence that went into the assembly?

Thank you,

Paul

Paul Cantalupo
University of Pittsburgh
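No direct reply is archived, but the gatekeeper dumps discussed later in this thread give a per-read accounting. A minimal sketch, assuming a store named asm.gkpStore and a gatekeeper build with the tabular dump options (flag names vary between CA versions):

  # Per-library summary: reads loaded, mated, deleted (mentioned later in this archive).
  gatekeeper -dumpinfo asm.gkpStore

  # One row per fragment: clear ranges, mate, and deleted status.
  gatekeeper -dumpfragments -tabular asm.gkpStore > per-read-report.txt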
From: Walenz, B. <bw...@jc...> - 2012-12-05 19:13:12
It also seems that sourceforge anonymous CVS is still down, with no estimate for when it will return. http://sourceforge.net/blog/category/sitestatus/

b

On 12/5/12 2:04 PM, "Wright, Cory (CDC/OID/NCIRD) (CTR)" <wm...@cd...> wrote:

Joanna – from one of Brian’s earlier messages:

The latest bits are at http://wgs-assembler.sf.net/wgs-20121130.tar.bz2 Includes both kmer and the assembler proper. I compiled with no issues (on FreeBSD, gcc 4.4). Hopefully we won’t have to debug gmake and dependencies.

Note that in our case we were unable to pass CVS traffic across our network and tar downloads from ViewVC were corrupt. I highly recommend attempting a CVS checkout.

From: Joanna Kelley [mailto:jok...@st...]
Sent: Wednesday, December 05, 2012 2:00 PM
To: Wright, Cory (CDC/OID/NCIRD) (CTR)
Cc: Walenz, Brian; wgs...@li...
Subject: Re: [wgs-assembler-users] sweatShop.h not found

Hello, I am having the same problem with the cvs checkout, is there another place to obtain the up-to-date source? Thanks! Joanna

On Fri, Nov 30, 2012 at 9:37 AM, Wright, Cory (CDC/OID/NCIRD) (CTR) <wm...@cd...> wrote:

I used all the steps described here and in the wiki article to replicate the problem, with one exception in Step 2: we are forced to download the source tarball via ViewVC instead of checking out via the cvs command. Is there an alternative way to obtain the source? Thanks for any and all help.

From: Walenz, Brian [mailto:bw...@jc...]
Sent: Friday, November 30, 2012 12:06 PM
To: Wright, Cory (CDC/OID/NCIRD) (CTR); 'wgs...@li...'
Subject: Re: [wgs-assembler-users] sweatShop.h not found

Hi-

Do you have kmer/ installed? (It’s in SVN if you need it). https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile. In kmer/ it needs a ‘gmake install’ to put the libraries and headers in the correct spot.

You don’t need a fresh checkout, just a fresh compile. Delete the Linux-amd64 directory and try again.

We keep avoiding the switch over to subversion (and keep forgetting to delete the one that is there). We really should bite the bullet and do it, if only so we can rename *.c to *.C and get rid of all those symlinks.

b

On 11/30/12 11:50 AM, "Wright, Cory (CDC/OID/NCIRD) (CTR)" <wm...@cd...> wrote:

Hi All

In trying to compile the latest source code I receive the error seen here: http://sourceforge.net/tracker/?func=detail&aid=3420389&group_id=106905&atid=645639

Unfortunately “starting over with a fresh checkout” is not an option as our organization blocks CVS traffic. SVN is allowed, and I noticed that there is a SVN repo, but it does not appear to have been updated in some time. Is it possible to update the SVN repo so we can be on our merry way? :) Thanks all and have a great day.
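For anyone following along, a minimal sketch of the tarball route, assuming a unix host with gmake and gcc; the unpacked directory name is an assumption:

  # Fetch and unpack the combined kmer + assembler snapshot.
  wget http://wgs-assembler.sf.net/wgs-20121130.tar.bz2
  tar -xjf wgs-20121130.tar.bz2
  cd wgs-20121130

  # Build kmer first; 'gmake install' puts its libraries and headers where
  # the assembler's build expects them.
  cd kmer && gmake install && cd ..

  # Then build the assembler proper.
  cd src && gmake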
From: Wright, C. (CDC/OID/N. (CTR) <wm...@cd...> - 2012-12-05 19:04:41
Joanna - from one of Brian's earlier messages:

The latest bits are at http://wgs-assembler.sf.net/wgs-20121130.tar.bz2 Includes both kmer and the assembler proper. I compiled with no issues (on FreeBSD, gcc 4.4). Hopefully we won't have to debug gmake and dependencies.

Note that in our case we were unable to pass CVS traffic across our network and tar downloads from ViewVC were corrupt. I highly recommend attempting a CVS checkout.

From: Joanna Kelley [mailto:jok...@st...]
Sent: Wednesday, December 05, 2012 2:00 PM
To: Wright, Cory (CDC/OID/NCIRD) (CTR)
Cc: Walenz, Brian; wgs...@li...
Subject: Re: [wgs-assembler-users] sweatShop.h not found

Hello, I am having the same problem with the cvs checkout, is there another place to obtain the up-to-date source? Thanks! Joanna

On Fri, Nov 30, 2012 at 9:37 AM, Wright, Cory (CDC/OID/NCIRD) (CTR) <wm...@cd...> wrote:

I used all the steps described here and in the wiki article to replicate the problem, with one exception in Step 2: we are forced to download the source tarball via ViewVC instead of checking out via the cvs command. Is there an alternative way to obtain the source? Thanks for any and all help.

From: Walenz, Brian [mailto:bw...@jc...]
Sent: Friday, November 30, 2012 12:06 PM
To: Wright, Cory (CDC/OID/NCIRD) (CTR); 'wgs...@li...'
Subject: Re: [wgs-assembler-users] sweatShop.h not found

Hi-

Do you have kmer/ installed? (It's in SVN if you need it). https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile. In kmer/ it needs a 'gmake install' to put the libraries and headers in the correct spot.

You don't need a fresh checkout, just a fresh compile. Delete the Linux-amd64 directory and try again.

We keep avoiding the switch over to subversion (and keep forgetting to delete the one that is there). We really should bite the bullet and do it, if only so we can rename *.c to *.C and get rid of all those symlinks.

b

On 11/30/12 11:50 AM, "Wright, Cory (CDC/OID/NCIRD) (CTR)" <wm...@cd...> wrote:

Hi All

In trying to compile the latest source code I receive the error seen here: http://sourceforge.net/tracker/?func=detail&aid=3420389&group_id=106905&atid=645639

Unfortunately "starting over with a fresh checkout" is not an option as our organization blocks CVS traffic. SVN is allowed, and I noticed that there is a SVN repo, but it does not appear to have been updated in some time. Is it possible to update the SVN repo so we can be on our merry way? :) Thanks all and have a great day.

--
Joanna L. Kelley, PhD
Department of Genetics
Stanford School of Medicine
300 Pasteur Drive
Lane Building, Room L-333
Stanford, CA 94305-5120
jok...@st...
From: Joanna K. <jok...@st...> - 2012-12-05 19:00:18
Hello, I am having the same problem with the cvs checkout, is there another place to obtain the up-to-date source?

Thanks!
Joanna

On Fri, Nov 30, 2012 at 9:37 AM, Wright, Cory (CDC/OID/NCIRD) (CTR) <wm...@cd...> wrote:

> I used all the steps described here and in the wiki article to replicate the problem, with one exception in Step 2: we are forced to download the source tarball via ViewVC instead of checking out via the cvs command. Is there an alternative way to obtain the source? Thanks for any and all help.
>
> From: Walenz, Brian [mailto:bw...@jc...]
> Sent: Friday, November 30, 2012 12:06 PM
> To: Wright, Cory (CDC/OID/NCIRD) (CTR); 'wgs...@li...'
> Subject: Re: [wgs-assembler-users] sweatShop.h not found
>
> Hi-
>
> Do you have kmer/ installed? (It’s in SVN if you need it). https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile. In kmer/ it needs a ‘gmake install’ to put the libraries and headers in the correct spot.
>
> You don’t need a fresh checkout, just a fresh compile. Delete the Linux-amd64 directory and try again.
>
> We keep avoiding the switch over to subversion (and keep forgetting to delete the one that is there). We really should bite the bullet and do it, if only so we can rename *.c to *.C and get rid of all those symlinks.
>
> b
>
> On 11/30/12 11:50 AM, "Wright, Cory (CDC/OID/NCIRD) (CTR)" <wm...@cd...> wrote:
>
> Hi All
>
> In trying to compile the latest source code I receive the error seen here: http://sourceforge.net/tracker/?func=detail&aid=3420389&group_id=106905&atid=645639
>
> Unfortunately “starting over with a fresh checkout” is not an option as our organization blocks CVS traffic. SVN is allowed, and I noticed that there is a SVN repo, but it does not appear to have been updated in some time. Is it possible to update the SVN repo so we can be on our merry way? :) Thanks all and have a great day.

--
Joanna L. Kelley, PhD
Department of Genetics
Stanford School of Medicine
300 Pasteur Drive
Lane Building, Room L-333
Stanford, CA 94305-5120
jok...@st...
From: Walenz, B. <bw...@jc...> - 2012-11-30 23:59:14
Hi, Arjun-

Thanks for attaching the spec file.

overlapper=mer is the problem. It, and the trimming we do on Illumina reads, are incompatible. Use overlapper=ovl instead. There isn't much gain from overlapper=mer anymore (it was designed for early 454 reads with lots of homopolymer errors) and it doesn't scale much past a microbe.

We (still) haven't written a pipeline for this, but our current strategy for correcting and trimming Illumina reads is at: http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Preprocessing

If you don't have outtie matepair libraries it amounts to a run of meryl, and a run of merTrim.

b

________________________________________
From: Arjun Prasad [ap...@ma...]
Sent: Friday, November 30, 2012 5:12 PM
To: wgs...@li...
Subject: [wgs-assembler-users] Error: .mcidx is not a merylStream index file!

Hi,

I've been trying to assemble some 2x250bp Illumina reads and I'm getting the following error in 0-mertrim/*.err:

% cat miseq5.0001.err
opening gkStore '/cluster/ifs/projects/pcoat/celass/miseq5/miseq5.gkpStore'
loading mer database.
merylStreamReader()-- ERROR: /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm0.mcidx is not a merylStream index file!
merylStreamReader()-- ERROR: /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm0.mcdat is not a merylStream data file!

% ls -l /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm?.mcdat
-rw-r--r-- 1 aprasad zoo        32 Nov 30 14:50 /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm0.mcdat
-rw-r--r-- 1 aprasad zoo 486424472 Nov 30 14:50 /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm1.mcdat

I've been able to run CA on several other datasets including 454 and older illumina sequence, so I'm guessing there's a path or format issue. Any ideas as to what might be wrong?

Thanks,
Arjun

--
Genome Technology Branch
National Human Genome Research Institute
National Institutes of Health
5625 Fishers Lane, Room 5N-01L
Rockville, MD 20892-9400
Phone: 301-594-9199   Fax: 301-435-6170
E-Mail: ap...@nh...
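In spec-file terms the fix is one line; a minimal sketch (the obt/ovl variants mirror other spec files in this archive and are optional):

  # overlapper=mer is incompatible with the trimming applied to Illumina
  # reads; use the standard overlapper instead.
  overlapper    = ovl
  obtOverlapper = ovl
  ovlOverlapper = ovl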
From: Arjun P. <ap...@ma...> - 2012-11-30 22:12:32
Hi,

I've been trying to assemble some 2x250bp Illumina reads and I'm getting the following error in 0-mertrim/*.err:

% cat miseq5.0001.err
opening gkStore '/cluster/ifs/projects/pcoat/celass/miseq5/miseq5.gkpStore'
loading mer database.
merylStreamReader()-- ERROR: /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm0.mcidx is not a merylStream index file!
merylStreamReader()-- ERROR: /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm0.mcdat is not a merylStream data file!

% ls -l /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm?.mcdat
-rw-r--r-- 1 aprasad zoo        32 Nov 30 14:50 /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm0.mcdat
-rw-r--r-- 1 aprasad zoo 486424472 Nov 30 14:50 /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm1.mcdat

I've been able to run CA on several other datasets including 454 and older illumina sequence, so I'm guessing there's a path or format issue. Any ideas as to what might be wrong?

Thanks,
Arjun

--
Genome Technology Branch
National Human Genome Research Institute
National Institutes of Health
5625 Fishers Lane, Room 5N-01L
Rockville, MD 20892-9400
Phone: 301-594-9199   Fax: 301-435-6170
E-Mail: ap...@nh...
From: Wright, C. (CDC/OID/N. (CTR) <wm...@cd...> - 2012-11-30 17:38:57
I used all the steps described here and in the wiki article to replicate the problem, with one exception in Step 2: we are forced to download the source tarball via ViewVC instead of checking out via the cvs command. Is there an alternative way to obtain the source? Thanks for any and all help.

From: Walenz, Brian [mailto:bw...@jc...]
Sent: Friday, November 30, 2012 12:06 PM
To: Wright, Cory (CDC/OID/NCIRD) (CTR); 'wgs...@li...'
Subject: Re: [wgs-assembler-users] sweatShop.h not found

Hi-

Do you have kmer/ installed? (It's in SVN if you need it). https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile. In kmer/ it needs a 'gmake install' to put the libraries and headers in the correct spot.

You don't need a fresh checkout, just a fresh compile. Delete the Linux-amd64 directory and try again.

We keep avoiding the switch over to subversion (and keep forgetting to delete the one that is there). We really should bite the bullet and do it, if only so we can rename *.c to *.C and get rid of all those symlinks.

b

On 11/30/12 11:50 AM, "Wright, Cory (CDC/OID/NCIRD) (CTR)" <wm...@cd...> wrote:

Hi All

In trying to compile the latest source code I receive the error seen here: http://sourceforge.net/tracker/?func=detail&aid=3420389&group_id=106905&atid=645639

Unfortunately "starting over with a fresh checkout" is not an option as our organization blocks CVS traffic. SVN is allowed, and I noticed that there is a SVN repo, but it does not appear to have been updated in some time. Is it possible to update the SVN repo so we can be on our merry way? :) Thanks all and have a great day.
From: Walenz, B. <bw...@jc...> - 2012-11-30 17:05:25
Hi-

Do you have kmer/ installed? (It’s in SVN if you need it). https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile. In kmer/ it needs a ‘gmake install’ to put the libraries and headers in the correct spot.

You don’t need a fresh checkout, just a fresh compile. Delete the Linux-amd64 directory and try again.

We keep avoiding the switch over to subversion (and keep forgetting to delete the one that is there). We really should bite the bullet and do it, if only so we can rename *.c to *.C and get rid of all those symlinks.

b

On 11/30/12 11:50 AM, "Wright, Cory (CDC/OID/NCIRD) (CTR)" <wm...@cd...> wrote:

Hi All

In trying to compile the latest source code I receive the error seen here: http://sourceforge.net/tracker/?func=detail&aid=3420389&group_id=106905&atid=645639

Unfortunately “starting over with a fresh checkout” is not an option as our organization blocks CVS traffic. SVN is allowed, and I noticed that there is a SVN repo, but it does not appear to have been updated in some time. Is it possible to update the SVN repo so we can be on our merry way? :) Thanks all and have a great day.
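A sketch of the fresh-compile sequence, assuming a checkout with kmer/ and src/ side by side and the Linux-amd64 build directory at the top level (the layout is an assumption):

  # Install kmer's libraries and headers where the assembler build looks.
  cd kmer && gmake install && cd ..

  # No fresh checkout needed: delete the per-platform build directory and
  # recompile the assembler.
  rm -rf Linux-amd64
  cd src && gmake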
From: Wright, C. (CDC/OID/N. (CTR) <wm...@cd...> - 2012-11-30 16:50:41
Hi All

In trying to compile the latest source code I receive the error seen here: http://sourceforge.net/tracker/?func=detail&aid=3420389&group_id=106905&atid=645639

Unfortunately "starting over with a fresh checkout" is not an option as our organization blocks CVS traffic. SVN is allowed, and I noticed that there is a SVN repo, but it does not appear to have been updated in some time. Is it possible to update the SVN repo so we can be on our merry way? :) Thanks all and have a great day.
From: Walenz, B. <bw...@jc...> - 2012-11-26 15:14:22
Hi-

There are two different methods to load sequences into the assembler. The older method rewrites sequence/quality data into the frg output as you saw with fastaToCA. The newer method leaves the sequence/quality data in the original fastq file, and gives a wrapper to the assembler. In the wrapper are pointers to the original fastq (along with the format of the QV and orientation of mate pairs):

fastqQualityValues=sanger
fastqOrientation=innie
fastqMates=/tmp2/bcs03/melonFastqCorrected/paired/F7HI6DR01-corrected.fastq

is telling the assembler you have interleaved mated reads with the Sanger (offset=33) encoding that are 5'3' -- 3'5' orientation.

'gatekeeper -dumpinfo *gkpStore' will give a summary of the number of reads loaded for each library.

Getting the QV format wrong, I think, will generate a ton of warnings in gkpStore.err or gkpStore.errorLog. I'm not sure if the reads are discarded or 'fixed'. In the CVS version of the assembler, 'fastqAnalyze some.fastq' will make a decent guess at what QV encoding you have.

b

On 11/22/12 6:29 AM, "Jens Hooge" <jen...@go...> wrote:

> Hi,
>
> I have converted a number of 454 reads in FASTQ format and Sanger reads in FASTA format. For the Sanger reads I have generated my own quality value file (also in FASTA format).
>
> I called the conversion routines as follows:
>
> Sanger library:
> ./fastaToCA -l BES_random_shear_library_reverse -s <pathtofasta>/BES_random_shear_library_reverse -q <pathtoqual>/BES_random_shear_library_reverse.qlt > <outpath>/BES_random_shear_library_reverse.frg
>
> 454 library:
> ./fastqToCA -insertsize 2834 172 -libraryname F7HI6DR01-corrected -technology 454 -mates <pathtofastq>/F7HI6DR01-corrected.fastq > <outpath>/F7HI6DR01-corrected.frg
>
> What strikes me here is that the conversion of my Sanger library results in an FRG file where the fields seq: and qlt: are filled, while the conversion of my 454 library doesn't. This is especially confusing to me because when I dump a FRG file "after" running the assembly using the command
>
> gatekeeper -dumpfrg -allreads assembly.gkpStore > asm.frgs
>
> the FRG file is filled with the presumably correct sequences and quality values, even though I only used converted FASTQ files for the assembly. Is this expected behaviour?
>
> Thanks in advance for any help on that matter.
>
> fastaToCA Conversion Result: please find BES_random_shear_library_reverse.frg
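A small sketch tying these pieces together; the file names are hypothetical and the -type flag for the QV encoding is an assumption about your fastqToCA version:

  # Build a wrapper .frg for interleaved 454 mates with Sanger-encoded QVs;
  # the sequence data stays in the fastq, only pointers go into the .frg.
  fastqToCA -libraryname lib454 -technology 454 -type sanger \
            -insertsize 3000 300 -mates mates-interleaved.fastq > lib454.frg

  # After loading, summarize what gatekeeper actually stored, per library.
  gatekeeper -dumpinfo asm.gkpStore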
From: Jens H. <jen...@go...> - 2012-11-22 11:30:48
Hi,

I have converted a number of 454 reads in FASTQ format and Sanger reads in FASTA format. For the Sanger reads I have generated my own quality value file (also in FASTA format).

I called the conversion routines as follows:

Sanger library:
./fastaToCA -l BES_random_shear_library_reverse -s <pathtofasta>/BES_random_shear_library_reverse -q <pathtoqual>/BES_random_shear_library_reverse.qlt > <outpath>/BES_random_shear_library_reverse.frg

454 library:
./fastqToCA -insertsize 2834 172 -libraryname F7HI6DR01-corrected -technology 454 -mates <pathtofastq>/F7HI6DR01-corrected.fastq > <outpath>/F7HI6DR01-corrected.frg

What strikes me here is that the conversion of my Sanger library results in an FRG file where the fields seq: and qlt: are filled, while the conversion of my 454 library doesn't. This is especially confusing to me because when I dump a FRG file "after" running the assembly using the command

gatekeeper -dumpfrg -allreads assembly.gkpStore > asm.frgs

the FRG file is filled with the presumably correct sequences and quality values, even though I only used converted FASTQ files for the assembly. Is this expected behaviour?

Thanks in advance for any help on that matter.

fastaToCA Conversion Result: please find BES_random_shear_library_reverse.frg
From: Walenz, B. <bw...@jc...> - 2012-11-20 12:41:02
Hi, Xueping-

That’s frustrating! Can you send along the qc report?

We’re just finishing up a repetitive fish. We had some success changing the ‘astat’ cutoffs for labeling unitigs unique/not-unique. We used astatHighBound=0 and astatLowBound=-20 based on a plot of unitig length vs astat (numbers came from 5-consensus-coverage-stat, but I didn’t do the analysis and would have to pester someone to get any scripts to pass along). If there are large degenerate contigs, this will help by labeling them as unique and letting them be used for scaffolds.

Or, it’s possible that unitig construction was poor. I’ll have to think about how to measure this — are they small because of bad trimming, low coverage, biased coverage or repeat boundaries? The signal for all of these looks basically the same, but the resolution is quite different.

Sorry I’m not much help yet.

b

On 11/20/12 6:05 AM, "Quan, Xueping" <x....@im...> wrote:

Dear All

I have a large plant genome (3.5Gb in size) with high repeat content (more than 60%). The sequencing data I got are about 45x Illumina paired-end and mate pair data (after data cleaning), and 0.5x 454 mate pair data. I have finished the assembly using celera. However, the coverage of the contig (600Mb) and scaffold (660Mb) sequences for the genome is very low. Most of the unitig sequences (about 5Gb) failed to be combined into any scaffold. Below is my spec file; could anyone give suggestions about how to improve the assembly?

"
utgGraphErrorRate=0.03    # bogart uses utgGraphErrorRate, utgGraphErrorLimit, utgMergeErrorRate, utgMergeErrorLimit
utgGraphErrorLimit=3.25
utgMergeErrorRate=0.045
utgMergeErrorLimit=5.25
ovlErrorRate=0.04         # Larger than utg to allow for correction.
cnsErrorRate=0.08         # Larger than utg to avoid occasional consensus failures
cgwErrorRate=0.10         # Larger than utg to allow contig merges across high-error ends
gkpAllowInefficientStorage=1

frgMinLen=64              # fragments shorter than this length are not loaded into the assembler
ovlMinLen=40              # overlaps shorter than this length are not computed

merSize=22                # default=22; use lower to combine across heterozygosity, higher to separate near-identical repeat copies
overlapper=ovl            # the mer overlapper for 454-like data is insensitive to homopolymer problems but requires more RAM and disk

# UNITIGGER configuration
unitigger=bogart
batMemory=650
utgBubblePopping=1
batThreads=64
utgGenomeSize=3.5gb

# MERYL calculates K-mer seeds
merylMemory=512000
merylThreads=32

# OVERLAPPER calculates overlaps
ovlHashBits=24
ovlHashBlockLength=700000000
ovlThreads=2
ovlConcurrency=32
ovlRefBlockSize=320000000

# OVERLAP STORE builds the database
ovlStoreMemory=109210     # Mbp

# ERROR CORRECTION not applied to overlaps
doFragmentCorrection=0

# Scaffolder

# CONSENSUS configuration
cnsConcurrency=64

L1_GAIIx.frg
L2_GAIIx.frg
L3_GAIIx.frg
L4_GAIIx.frg
L5_GAIIx.frg
L6_GAIIx.frg
L7_GAIIx.frg
L8_GAIIx.frg
L3_HiSeq.frg
L4_HiSeq.frg
L5_HiSeq.frg
L6_HiSeq.frg
L1_454.frg
L2_454.frg
L3_454.frg
L4_454.frg
L5_454.frg
L6_454.frg
L7_454.frg
L8_454.frg
L9_454.frg
L10_454.frg
"

Thanks very much!

Xueping Quan
Imperial College London
Tel: +44(0)207 594 17 80
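In spec-file form, the change Brian describes is two lines; that runCA accepts these exact option names (they match his wording) is an assumption worth checking against your runCA version:

  # Relabel long, low-astat unitigs as unique so they can seed scaffolds
  # (values from Brian's repetitive-fish run).
  astatLowBound  = -20
  astatHighBound = 0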
From: Walenz, B. <bw...@jc...> - 2012-11-20 12:05:59
Hi, Jens-

First, I’d suggest upgrading to the cvs version (http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile). It's a bit more work to install, but you’ll get a ton of fixes and optimizations. Plus, the answers below are for this version.

Answering your questions out of order:

The only requirement for a ‘library’ of reads is that they form a normal insert size distribution. It’s probably better to treat each 454 run as its own library unless you know for sure they came from the same library construction. You can mix paired- and single-ended reads in the same library.

Note that the fastqToCA usage changed slightly from 7.0 to the cvs version. Use ‘-reads X.fastq’ and ‘-mates A1.fastq,B1.fastq’ to load SE and PE reads. Note that ‘-mates Y.fastq’ will expect mate pair reads interleaved.

The maximum read length is set at compile time, in file AS_global.h, variable AS_READ_MAX_NORMAL_LEN_BITS. The default is 11 (=2047 bases). For PacBio, 13-15 (=8-32kbp) has been used. 16 has been reported to not work.

For reads longer than the maximum, it depends on the fastqToCA technology. For tech ‘illumina’, the reads are truncated to the maximum ‘packed’ size (160bp by default). The ‘packed’ format is slightly more efficient storage designed for lots of short reads. The other technologies will also truncate reads, but to the NORMAL_LEN_BITS size.

‘gatekeeper -dumpinfo X.gkpStore’ will generate a table of the reads loaded, number mated, number deleted, and total bases.

The ‘not a sequence start line’ errors, I think, were caused by the fastq reader only partially reading a sequence line. On the next input, it was expecting to find "@name" but found bases/qvs instead. In any case, it’s fixed in the cvs version.

Just curious - are your reads longer than 2kbp real? I’ve seen these in the past, and they were mostly garbage.

b

On 11/20/12 5:33 AM, "Jens Hooge" <jen...@go...> wrote:

Hi,

I'm relatively new to NGS and its tools, but at the moment I'm trying to run an assembly of about 70 single- and paired-end 454 reads in FASTQ format, using the wgs-7.0-assembler. The version I've been using is the one from http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page

I have converted my FASTQ files to FRG files using CABOG's fastqToCA routine, using a different library name for each FASTQ file. When I run the actual assembly with runCA, I get an error message in melonAssembly.gkpStore.err:

GKP finished with 11339450 alerts or errors:
11338139 # ILL Error: not a sequence start line.
    1292 # ILL Error: not a quality start line.
      19 # LIB Alert: suspicious mean and standard deviation; reset stddev to 0.10 * mean.

To me this looked as if it was a problem with the format of my FASTQ files, so I ran a script to validate the format consistency of the files, which resulted in no errors. Some of my reads are longer than 2047 bp and I have the feeling that the bug fix stated at http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Version_7.0_Release_Notes under "Bug fixes" is not yet fixed in the version I'm using. Quote: "Gatekeeper: Numerous problems with reads longer than the maximum allowed (2047bp) and reads of very specific lengths were discovered and fixed. All of these resulted in gatekeeper crashing."

Even though gatekeeper doesn't crash, I would expect about 25 million reads to be processed by CABOG; however, while running the assembly I get a stdout message saying "numFrags = 14499910". To me this looks like not all reads are being used for the assembly. If I add the number of ILL Errors, it comes suspiciously close to my expected number of reads, which makes me think that CABOG just gets rid of the reads which are longer than the maximally allowed length of 2047 bp.

My questions would be:

What happens with reads that are longer than the maximally allowed length? Are those reads ignored or clipped to the maximum read length?
Is there a way to adjust the maximum read length, to make CABOG use those reads in the assembly as well?
Does every FASTQ file have to be added to a different gatekeeper library, or is it enough to put single-ended and paired-ended reads into their respective libraries?

I would be very grateful if anyone could help me out.

Ciao, Jens
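A sketch of the compile-time change for longer reads; the header path and the rebuild-by-deletion step follow Brian's other messages in this archive, but the exact define formatting may differ between checkouts:

  # In src/AS_global.h, raise AS_READ_MAX_NORMAL_LEN_BITS from 11 (2047 bp)
  # to e.g. 13 (~8 kbp); 16 is reported not to work. Then force a fresh
  # compile:
  rm -rf Linux-amd64
  cd src && gmake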
From: Quan, X. <x....@im...> - 2012-11-20 11:06:10
Dear All

I have a large plant genome (3.5Gb in size) with high repeat content (more than 60%). The sequencing data I got are about 45x Illumina paired-end and mate pair data (after data cleaning), and 0.5x 454 mate pair data. I have finished the assembly using celera. However, the coverage of the contig (600Mb) and scaffold (660Mb) sequences for the genome is very low. Most of the unitig sequences (about 5Gb) failed to be combined into any scaffold. Below is my spec file; could anyone give suggestions about how to improve the assembly?

"
utgGraphErrorRate=0.03    # bogart uses utgGraphErrorRate, utgGraphErrorLimit, utgMergeErrorRate, utgMergeErrorLimit
utgGraphErrorLimit=3.25
utgMergeErrorRate=0.045
utgMergeErrorLimit=5.25
ovlErrorRate=0.04         # Larger than utg to allow for correction.
cnsErrorRate=0.08         # Larger than utg to avoid occasional consensus failures
cgwErrorRate=0.10         # Larger than utg to allow contig merges across high-error ends
gkpAllowInefficientStorage=1

frgMinLen=64              # fragments shorter than this length are not loaded into the assembler
ovlMinLen=40              # overlaps shorter than this length are not computed

merSize=22                # default=22; use lower to combine across heterozygosity, higher to separate near-identical repeat copies
overlapper=ovl            # the mer overlapper for 454-like data is insensitive to homopolymer problems but requires more RAM and disk

# UNITIGGER configuration
unitigger=bogart
batMemory=650
utgBubblePopping=1
batThreads=64
utgGenomeSize=3.5gb

# MERYL calculates K-mer seeds
merylMemory=512000
merylThreads=32

# OVERLAPPER calculates overlaps
ovlHashBits=24
ovlHashBlockLength=700000000
ovlThreads=2
ovlConcurrency=32
ovlRefBlockSize=320000000

# OVERLAP STORE builds the database
ovlStoreMemory=109210     # Mbp

# ERROR CORRECTION not applied to overlaps
doFragmentCorrection=0

# Scaffolder

# CONSENSUS configuration
cnsConcurrency=64

L1_GAIIx.frg
L2_GAIIx.frg
L3_GAIIx.frg
L4_GAIIx.frg
L5_GAIIx.frg
L6_GAIIx.frg
L7_GAIIx.frg
L8_GAIIx.frg
L3_HiSeq.frg
L4_HiSeq.frg
L5_HiSeq.frg
L6_HiSeq.frg
L1_454.frg
L2_454.frg
L3_454.frg
L4_454.frg
L5_454.frg
L6_454.frg
L7_454.frg
L8_454.frg
L9_454.frg
L10_454.frg
"

Thanks very much!

Xueping Quan
Imperial College London
Tel: +44(0)207 594 17 80
From: Jens H. <jen...@go...> - 2012-11-20 10:35:26
Hi,

I'm relatively new to NGS and its tools, but at the moment I'm trying to run an assembly of about 70 single- and paired-end 454 reads in FASTQ format, using the wgs-7.0-assembler. The version I've been using is the one from http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page

I have converted my FASTQ files to FRG files using CABOG's fastqToCA routine, using a different library name for each FASTQ file. When I run the actual assembly with runCA, I get an error message in melonAssembly.gkpStore.err:

GKP finished with 11339450 alerts or errors:
11338139 # ILL Error: not a sequence start line.
    1292 # ILL Error: not a quality start line.
      19 # LIB Alert: suspicious mean and standard deviation; reset stddev to 0.10 * mean.

To me this looked as if it was a problem with the format of my FASTQ files, so I ran a script to validate the format consistency of the files, which resulted in no errors. Some of my reads are longer than 2047 bp and I have the feeling that the bug fix stated at http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Version_7.0_Release_Notes under "Bug fixes" is not yet fixed in the version I'm using. Quote: "Gatekeeper: Numerous problems with reads longer than the maximum allowed (2047bp) and reads of very specific lengths were discovered and fixed. All of these resulted in gatekeeper crashing."

Even though gatekeeper doesn't crash, I would expect about 25 million reads to be processed by CABOG; however, while running the assembly I get a stdout message saying "numFrags = 14499910". To me this looks like not all reads are being used for the assembly. If I add the number of ILL Errors, it comes suspiciously close to my expected number of reads, which makes me think that CABOG just gets rid of the reads which are longer than the maximally allowed length of 2047 bp.

My questions would be:

What happens with reads that are longer than the maximally allowed length? Are those reads ignored or clipped to the maximum read length?
Is there a way to adjust the maximum read length, to make CABOG use those reads in the assembly as well?
Does every FASTQ file have to be added to a different gatekeeper library, or is it enough to put single-ended and paired-ended reads into their respective libraries?

I would be very grateful if anyone could help me out.

Ciao, Jens
From: Quan, X. <x....@im...> - 2012-11-10 19:57:50
Dear All

My celera run failed in the extendClearRanges stage. The job extendClearRanges-scaffold.0017585.sh failed with the following error:

"examining gap 2089 from lcontig 252777756 (orient: F, pos: 1843.074083, 6837.074083) to rcontig 819682 (orient: R, pos: 8597.736468, 6893.736468), size: 56.662385
ReplaceEndUnitigInContig()-- contig 252777756 unitig 210835 isLeft(0)
contig 252777756 len 4994 (::::sequences)
data.unitig_coverage_stat 0.000000
data.unitig_microhet_prob 1.000000
data.unitig_status X
data.unitig_unique_rept X
data.contig_status U
data.num_frags 1271
data.num_unitigs 2
insertMultiAlign()-- ERROR: multialign 252777756 C has invalid fragment/unitig layout (length 4992) -- exceeds bounds of consensus sequence (length 4994).
extendClearRanges: MultiAlignStore.C:327: void MultiAlignStore::insertMultiAlign(MultiAlignT*, bool, bool): Assertion `GetMultiAlignLength(ma) <= GetMultiAlignLength(ma, true)' failed.
/ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/7-1-ECR/extendClearRanges-scaffold.0017585.sh: line 18: 116736 Aborted  /apps/wgs/2012-11-02//Linux-amd64/bin/extendClearRanges -g /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.gkpStore -t /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.tigStore -n 16 -c Howea -b 0017585 -e 0051439 -i 1 -S 3936
----------------------------------------END Thu Nov 8 15:06:42 2012 (4402 seconds)
----------------------------------------START Thu Nov 8 15:06:42 2012
/ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/7-1-ECR/extendClearRanges-scaffold.0051443.sh
----------------------------------------END Thu Nov 8 15:06:42 2012 (0 seconds)
ERROR: Failed with signal HUP (1)
================================================================================
runCA failed.
----------------------------------------
Stack trace:
 at /apps/wgs/2012-11-02//bin/runCA line 1391
 main::caFailure('extendClearRanges failed', '/ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/7...') called at /apps/wgs/2012-11-02//bin/runCA line 5111
 main::eCR('7-1-ECR', undef, 1) called at /apps/wgs/2012-11-02//bin/runCA line 5206
 main::scaffolder() called at /apps/wgs/2012-11-02//bin/runCA line 6146
----------------------------------------
Last few lines of the relevant log file (/ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/7-1-ECR/extendClearRanges-scaffold.0051443.err):
----------------------------------------
Failure message: extendClearRanges failed
"

So the problem is that it did not succeed and did not produce the *.ckp.17 file needed by the following extendClearRanges-scaffold.0051443.sh. But it stored a success record in some log file (I don't know which log file), so I can not restart extendClearRanges-scaffold.0017585.sh or skip it and go on to the next job.

I would really appreciate any help! Thanks very much!

Dr. Xueping Quan
Imperial College London
From: Walenz, B. <bw...@jc...> - 2012-11-07 17:57:26
Hi, Xueping-

The consensus ‘errors’ aren’t a problem. They’re detected and resolved by utgcnsfix. The logs are in 5-consensus/consensus-fix.out if you want to see them. We need to do a bit of work on the pipeline to clean this up, but it's low priority.

The ECR failure has also been reported in CGW. We’ll need an example (and some time) to figure out what’s going wrong.

The fix on the wikipage is a little out of date. A simpler method is to skip the entire scaffold, or the gap, that is causing the problem. In the shell script, add either:

  -s <scaffold-id>
  -S <gap-number>

to the extendClearRanges command. If you can figure out which gap number it is, use that, otherwise skip the whole scaffold. Gap numbers will change based on the -b/-e scaffold ranges — meaning you can’t run just this scaffold to test which gap is failing.

Can you send the full log (to me, no need to send to the list)?

b

FYI, usage:

extendClearRanges [opts] -c ckpName -n ckpNumber -g gkpStore
  -c ckpName    Use ckpName as the checkpoint name
  -n ckpNumber  The checkpoint to use
  -g gkpStore   The gatekeeper store
  -C gap#       Start at a specific gap number
  -b scafBeg    Begin at a specific scaffold
  -e scafEnd    End after a specific scaffold (INCLUSIVE)
  -o scafIID    Process only this scaffold
  -s scafIID    Skip this scaffold
  -O gap#       Process only this gap
  -S gap#       Skip this gap
  -i iterNum    The iteration of ECR; either 1 or 2
  -load         Load gkpStore into memory
  -V            Enable VERBOSE_MULTIALIGN for debugging

On 11/7/12 4:50 AM, "Quan, Xueping" <x....@im...> wrote:

Dear All

My hybrid assembly (with about 131GB Illumina reads and 2GB 454 reads) failed at the extendClearRanges stage. But when I checked all the stages, I found that in the 5-consensus/ folder there are 110 jobs, and only 60 of them succeeded (with both err and success output files). Below are the last few lines of job61.err:

"NumColumnsInUnitigs = 261474722
NumGapsInUnitigs = 280587
NumRunsOfGapsInUnitigReads = 11234803
NumColumnsInContigs = 0
NumGapsInContigs = 0
NumRunsOfGapsInContigReads = 0
NumAAMismatches = 0
NumVARRecords = 0
NumVARStringsWithFlankingGaps = 0
NumUnitigRetrySuccess = 0
WARNING: Total number of unitig failures = 1
Consensus did NOT finish successfully."

Should I re-run the consensus, and how can I make the failed consensus jobs succeed?

The failure information in file extendClearRanges-scaffold.0017585.err is

"FRG type R ident 757459763 container 94540021 parent 94540021 hang 57 0 position 2106 2185
FRG type R ident 757459764 container 94540021 parent 94540021 hang 57 0 position 2106 2185
FRG type R ident 930433904 container 94540021 parent 94540021 hang 61 0 position 2185 2110
UTG type X ident 636535 position 0 2214 num_instances 0
insertMultiAlign()-- ERROR: multialign 636535 C has invalid fragment/unitig layout (length 2214) -- exceeds bounds of consensus sequence (length 2215).
extendClearRanges: MultiAlignStore.C:327: void MultiAlignStore::insertMultiAlign(MultiAlignT*, bool, bool): Assertion `GetMultiAlignLength(ma) <= GetMultiAlignLength(ma, true)' failed."

Can I follow this instruction to restart it: http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Extend_clear_ranges_failure?

Thanks very much!

Dr. Xueping Quan
Imperial College London
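Applied to the failing job above, the edit looks roughly like this; whether -S can be given twice (one gap is already skipped by runCA's -S 3936) is an assumption, so skipping the scaffold may be the safer route:

  # In 7-1-ECR/extendClearRanges-scaffold.0017585.sh, append to the
  # extendClearRanges command line either
  #     -s <scaffold-iid>     # skip the whole failing scaffold, or
  #     -S <gap-number>       # skip just the failing gap (gap 2089 in the log)
  # then rerun the job by hand:
  sh extendClearRanges-scaffold.0017585.sh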
From: Quan, X. <x....@im...> - 2012-11-07 09:51:18
Dear All

My hybrid assembly (with about 131GB Illumina reads and 2GB 454 reads) failed at the extendClearRanges stage. But when I checked all the stages, I found that in the 5-consensus/ folder there are 110 jobs, and only 60 of them succeeded (with both err and success output files). Below are the last few lines of job61.err:

"NumColumnsInUnitigs = 261474722
NumGapsInUnitigs = 280587
NumRunsOfGapsInUnitigReads = 11234803
NumColumnsInContigs = 0
NumGapsInContigs = 0
NumRunsOfGapsInContigReads = 0
NumAAMismatches = 0
NumVARRecords = 0
NumVARStringsWithFlankingGaps = 0
NumUnitigRetrySuccess = 0
WARNING: Total number of unitig failures = 1
Consensus did NOT finish successfully."

Should I re-run the consensus, and how can I make the failed consensus jobs succeed?

The failure information in file extendClearRanges-scaffold.0017585.err is

"FRG type R ident 757459763 container 94540021 parent 94540021 hang 57 0 position 2106 2185
FRG type R ident 757459764 container 94540021 parent 94540021 hang 57 0 position 2106 2185
FRG type R ident 930433904 container 94540021 parent 94540021 hang 61 0 position 2185 2110
UTG type X ident 636535 position 0 2214 num_instances 0
insertMultiAlign()-- ERROR: multialign 636535 C has invalid fragment/unitig layout (length 2214) -- exceeds bounds of consensus sequence (length 2215).
extendClearRanges: MultiAlignStore.C:327: void MultiAlignStore::insertMultiAlign(MultiAlignT*, bool, bool): Assertion `GetMultiAlignLength(ma) <= GetMultiAlignLength(ma, true)' failed."

Can I follow this instruction to restart it: http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Extend_clear_ranges_failure?

Thanks very much!

Dr. Xueping Quan
Imperial College London
From: Walenz, B. <bw...@jc...> - 2012-11-02 19:23:48
On 11/1/12 2:07 PM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:

> On 1 November 2012 17:42, Quan, Xueping <x....@im...> wrote:
>> My assembly is now in the unitigger stage. In 4-unitigger/, I see three files:
>> 3.5M 2012-10-30 03:34 HX.001.bestoverlapgraph.log
>>  12G 2012-10-29 11:41 HX.fragmentInfo
>>  75K 2012-10-29 22:32 unitigger.err
>>
>> The bestoverlapgraph.log stopped updating nearly two days ago, while the unitigger (bogart) keeps running, occupying 100% cpu and 624gb memory, but producing no result. Below are the last few lines of HX.001.bestoverlapgraph.log:
>> "BestOverlapGraph()-- frag 1005759619 is suspicious (174 overlaps).
>> BestOverlapGraph()-- frag 1005759692 is suspicious (217 overlaps).
>> BestOverlapGraph()-- frag 1005759967 is suspicious (219 overlaps).
>> BestOverlapGraph()-- frag 1005760214 is suspicious (22 overlaps).
>> BestOverlapGraph()-- fra"
>> Is something going wrong, with bogart not really running (though it appears to be running in top), or is it working but with no output yet? If it is the second case, how long does the unitigger usually take to finish or produce further output?
>
> I'm not completely certain, but I think it can run for a while sometimes. Bogart can take quite a long time, a couple of weeks depending on your amount of data and genome. It seems that you are using CA 7.0; if you use the CVS version you can take advantage of all those 64 CPUs you have, and complete the bogart stage much, much quicker than with CA 7.0 (which uses only one CPU).
>
> Ole

The step after this is to examine those 625gb of overlaps and pick out the best for each end of each read. There isn't any logging until it finishes this, at which time it will dump the overlaps to 'best.contains', 'best.edges', and 'best.singletons' files. I don't remember this taking days to finish though.

As for speed, on a large 3gb fish with ~1.5 billion reads, bogart took around 10 days. With the parallelization it was down to less than 2.

If you do choose to restart with the CVS version, you can build the overlap graph on disk, then load it for the unitig computation. Building the overlap graph is LOTS of I/O and is the slower part. Add option '-create' to the bogart command. It will stop after the graph is built. Then you can restart bogart (without -create) to do the computation.

b
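A sketch of that two-pass restart; whether your runCA version writes a rerunnable 4-unitigger/unitigger.sh is an assumption (if not, the bogart command line can be copied from unitigger.err):

  cd 4-unitigger

  # Pass 1: add -create to the bogart command; bogart builds the overlap
  # graph on disk and stops.
  sh unitigger.sh

  # Pass 2: remove -create and rerun; bogart loads the saved graph and goes
  # straight to the unitig computation.
  sh unitigger.sh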
From: Ole K. T. <o.k...@bi...> - 2012-11-01 18:31:44
On 1 November 2012 17:42, Quan, Xueping <x....@im...> wrote:
> My assembly is now in the unitigger stage. In 4-unitigger/, I see three files:
> 3.5M 2012-10-30 03:34 HX.001.bestoverlapgraph.log
>  12G 2012-10-29 11:41 HX.fragmentInfo
>  75K 2012-10-29 22:32 unitigger.err
>
> The bestoverlapgraph.log stopped updating nearly two days ago, while the unitigger (bogart) keeps running, occupying 100% cpu and 624gb memory, but producing no result. Below are the last few lines of HX.001.bestoverlapgraph.log:
> "BestOverlapGraph()-- frag 1005759619 is suspicious (174 overlaps).
> BestOverlapGraph()-- frag 1005759692 is suspicious (217 overlaps).
> BestOverlapGraph()-- frag 1005759967 is suspicious (219 overlaps).
> BestOverlapGraph()-- frag 1005760214 is suspicious (22 overlaps).
> BestOverlapGraph()-- fra"
> Is something going wrong, with bogart not really running (though it appears to be running in top), or is it working but with no output yet? If it is the second case, how long does the unitigger usually take to finish or produce further output?

I'm not completely certain, but I think it can run for a while sometimes. Bogart can take quite a long time, a couple of weeks depending on your amount of data and genome. It seems that you are using CA 7.0; if you use the CVS version you can take advantage of all those 64 CPUs you have, and complete the bogart stage much, much quicker than with CA 7.0 (which uses only one CPU).

Ole
From: Quan, X. <x....@im...> - 2012-11-01 16:43:54
My assembly is now in the unitigger stage. In 4-unitigger/, I see three files:

3.5M 2012-10-30 03:34 HX.001.bestoverlapgraph.log
 12G 2012-10-29 11:41 HX.fragmentInfo
 75K 2012-10-29 22:32 unitigger.err

The bestoverlapgraph.log stopped updating nearly two days ago, while the unitigger (bogart) keeps running, occupying 100% cpu and 624gb memory, but producing no result. Below are the last few lines of HX.001.bestoverlapgraph.log:

"BestOverlapGraph()-- frag 1005759619 is suspicious (174 overlaps).
BestOverlapGraph()-- frag 1005759692 is suspicious (217 overlaps).
BestOverlapGraph()-- frag 1005759967 is suspicious (219 overlaps).
BestOverlapGraph()-- frag 1005760214 is suspicious (22 overlaps).
BestOverlapGraph()-- fra"

Is something going wrong, with bogart not really running (though it appears to be running in top), or is it working but with no output yet? If it is the second case, how long does the unitigger usually take to finish or produce further output?

Thanks very much!

Xueping Quan
Imperial College London
From: Walenz, B. <bw...@jc...> - 2012-10-23 17:50:24
Hi-

The quick answer is to use a much much lower memory limit, like 8192 (8gb). There is no gain to using more memory here.

It seems that the algorithm that partitions the data — already known to be not well balanced with mixes of short and long reads — gets more imbalanced with larger memory sizes. If you still have the *BUILDING store, you can take the size of it, divide by 512 for an excellent memory limit.

The store build, unfortunately, must restart from scratch; remove Howea.ovlStore.BUILDING then restart runCA.

b

On 10/23/12 6:38 AM, "Quan, Xueping" <x....@im...> wrote:

Hi

I am running my job to assemble Illumina and 454 reads. After running runCA 7.0 my job was killed in the ovlstore stage due to out of memory:

"/apps/wgs/7.0/wgs-7.0/Linux-amd64/bin/overlapStore -c /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.ovlStore.BUILDING -g /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.gkpStore -i 0 -M 655360 -L /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.ovlStore.list > /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.ovlStore.err 2>&1
=>> PBS: job killed: mem 1014599844kb exceeded limit 1006632960kb"

Is it possible to set a memory limit which ovlstore will not exceed? "ovlStoreMemory" is currently set as 655360 and seems not to work. And am I able to resume the job from the ovlstore stage rather than run from the beginning? It really took a very long time to finish the overlapper stage.

Thanks very much!

Dr. Xueping Quan
Imperial College London
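A sketch of that recovery, using the paths from this run; note that the finished overlap outputs are kept, only the store build repeats:

  cd /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1

  # Lower the limit in the spec file; there is no gain from more memory here:
  #   ovlStoreMemory = 8192        (8 GB, the value Brian suggests)

  # The store build must restart from scratch: drop the partial store, then
  # restart runCA with the same -d/-p/-s arguments as before.
  rm -rf Howea.ovlStore.BUILDING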
From: Quan, X. <x....@im...> - 2012-10-23 10:39:18
Hi

I am running my job to assemble Illumina and 454 reads. After running runCA 7.0 my job was killed in the ovlstore stage due to out of memory:

"/apps/wgs/7.0/wgs-7.0/Linux-amd64/bin/overlapStore -c /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.ovlStore.BUILDING -g /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.gkpStore -i 0 -M 655360 -L /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.ovlStore.list > /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.ovlStore.err 2>&1
=>> PBS: job killed: mem 1014599844kb exceeded limit 1006632960kb"

Is it possible to set a memory limit which ovlstore will not exceed? "ovlStoreMemory" is currently set as 655360 and seems not to work. And am I able to resume the job from the ovlstore stage rather than run from the beginning? It really took a very long time to finish the overlapper stage.

Thanks very much!

Dr. Xueping Quan
Imperial College London
From: Audrey N. <aud...@ho...> - 2012-10-22 18:28:59
Hi wgs-users,

I’m trying to generate a hybrid assembly with Illumina and 454 reads for a 520Mb estimated genome. Illumina are 108bp paired-end sequences and 454 are shotgun and 6Kb paired-end sequences. We already used runCA 7.0 on a subset of these same sequences (including the whole Illumina dataset) and we obtained a complete assembly with a N50 consistent with the partial dataset used.

The current “big” assembly is working with 454 shotgun short and long reads, 5 000 000 000 bases; 454 PE, 500 000 000 bases; and Illumina PE, 10 000 000 000 bases, giving a total of 16 603 061 677 bases for 81 460 599 reads.

I understood that ovlHashBits, ovlHashBlockLength and ovlRefBlockSize were critical settings. As set in the specfile below, this runCA started on 4th October on a Silicon Graphics UV 100 with 1TB RAM and 64 CPUs, and is still running (on 22nd October). A total of 14 overlap jobs were created; 8 jobs were done in almost 2 days, 2 others were then completed on the 14th and 17th October, and the 4 remaining are still running. Do you think this is an acceptable run time?! This is just the 0-overlaptrim process! And I know there is still much to do…

What parameters would you suggest with such inputs? Are there any other important settings to consider to optimize the process? I already tried to change some parameter settings, but today I hesitate to abort this run without knowing exactly what to do to improve the run time.

You will find my specfile at the end with the output I got to date.

Thanks,
Audrey Nisole.

#____________________________________________________________
# 20121004
# Spec file
# Sequences from 454 Titanium technology and Illumina
#_____________________________________________________________

# ERROR rates
utgErrorRate = 0.03
utgErrorLimit = 2.5
ovlErrorRate = 0.06
cnsErrorRate = 0.06
cgwErrorRate = 0.10

# Minimum fragment length and minimum overlap length
frgMinLen = 64
ovlMinLen = 40

# OVERLAPPER
overlapper = ovl
obtOverlapper = ovl
ovlOverlapper = ovl
ovlStoreMemory = 100000
saveOverlaps = 1
merSize = 22

# OVL overlapper
ovlThreads = 6
ovlConcurrency = 9
ovlHashBits = 28
ovlHashBlockLength = 1200000000
ovlRefBlockSize = 600000000

# MERYL calculates K-mer seeds
merylMemory = 200000
merylThreads = 24

# ERROR CORRECTION applied to overlaps
frgCorrBatchSize = 200000
frgCorrThreads = 6
frgCorrConcurrency = 9

# UNITIGGER
unitigger = bog
#utgGenomeSize = 520

# SCAFFOLDER
computeInsertSize = 0

# CONSENSUS
cnsConcurrency = 2

# Terminator
closureOverlaps = 0
closurePlacement = 2
createACE = 0

#----------------------------------------------------
# FRG files
#----------------------------------------------------
# 454 shotgun, 22 sff files
/project/…/frg/454shotgun.frg
# Paired ends, 4 sff files + 2 fastq files
/project/…/frg/454Pairend.frg
/project/…/sequences/frg/S1.frg

I got this output:

runCA -d Budworm_wgs_20121004 -p budworm_20121004 -s specfile_Budworm_20121004

----------------------------------------START Thu Oct 4 11:13:39 2012
/prg/wgs/7.0/Linux-amd64/bin/gatekeeper -o /…/wgs-assembly/Budworm_wgs_20121004/budworm_20121004.gkpStore.BUILDING -T -F /project/…/frg/Budworm_shotgun.frg /project/…/frg/Budworm_Pairend.frg /project/…/frg/S1.frg > /…/wgs-assembly/Budworm_wgs_20121004/budworm_20121004.gkpStore.err 2>&1
----------------------------------------END Thu Oct 4 11:35:38 2012 (1319 seconds)
numFrags = 81460599
----------------------------------------START Thu Oct 4 11:35:40 2012
/prg/wgs/7.0/Linux-amd64/bin/meryl -B -C -v -m 22 -memory 200000 -threads 24 -c 0 -L 2 -s /…/wgs-assembly/Budworm_wgs_20121004/budworm_20121004.gkpStore:chain -o /…/wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004-C-ms22-cm0 > /…/wgs-assembly/Budworm_wgs_20121004/0-mercounts/meryl.err 2>&1
----------------------------------------END Thu Oct 4 13:10:59 2012 (5719 seconds)
----------------------------------------START Thu Oct 4 13:10:59 2012
/prg/wgs/7.0/Linux-amd64/bin/estimate-mer-threshold -g /…/wgs-assembly/Budworm_wgs_20121004/budworm_20121004.gkpStore:chain -m /…/wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004-C-ms22-cm0 > /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004-C-ms22-cm0.estMerThresh.out 2> /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004-C-ms22-cm0.estMerThresh.err
----------------------------------------END Thu Oct 4 13:10:59 2012 (0 seconds)
----------------------------------------START Thu Oct 4 13:10:59 2012
/prg/wgs/7.0/Linux-amd64/bin/meryl -Dt -n 91 -s /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004-C-ms22-cm0 > /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004.nmers.ovl.fasta 2> /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004.nmers.ovl.fasta.err
----------------------------------------END Thu Oct 4 13:12:04 2012 (65 seconds)
----------------------------------------START Thu Oct 4 13:12:04 2012
/prg/wgs/7.0/Linux-amd64/bin/meryl -Dt -n 91 -s /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004-C-ms22-cm0 > /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004.nmers.obt.fasta 2> /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004.nmers.obt.fasta.err
----------------------------------------END Thu Oct 4 13:13:10 2012 (66 seconds)
Reset OBT mer threshold from auto to 91.
Reset OVL mer threshold from auto to 91.
----------------------------------------START CONCURRENT Thu Oct 4 13:13:10 2012
/.../wgs-assembly/Budworm_wgs_20121004/0-mertrim/mertrim.sh 1 > /.../wgs-assembly/Budworm_wgs_20121004/0-mertrim/budworm_20121004.0001.err 2>&1
(…)
HASH 13788239- 16725048 REFR 1- 81460599 STRINGS  2936810 BASES 1200000016
HASH 16725049- 27637181 REFR 1- 81460599 STRINGS 10912133 BASES 1200000017
HASH 27637182- 38807450 REFR 1- 81460599 STRINGS 11170269 BASES 1200000061
HASH 38807451- 50000808 REFR 1- 81460599 STRINGS 11193358 BASES 1200000028
HASH 50000809- 61105478 REFR 1- 81460599 STRINGS 11104670 BASES 1200000077
HASH 61105479- 72196104 REFR 1- 81460599 STRINGS 11090626 BASES 1200000008
HASH 72196105- 81460599 REFR 1- 81460599 STRINGS  9264495 BASES 1003058249
----------------------------------------END Fri Oct 5 17:59:40 2012 (49 seconds)
Created 14 overlap jobs. Last batch '001', last job '000014'.
----------------------------------------START CONCURRENT Fri Oct 5 17:59:40 2012
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 1 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000001.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 2 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000002.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 3 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000003.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 4 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000004.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 5 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000005.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 6 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000006.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 7 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000007.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 8 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000008.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 9 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000009.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 10 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000010.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 11 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000011.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 12 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000012.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 13 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000013.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 14 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000014.out 2>&1
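No reply to this question survives in the archive. One observation, offered as an assumption rather than list advice: with only 14 jobs for 64 CPUs, a handful of slow jobs dominate the runtime, and smaller hash/ref blocks create more, shorter jobs with better balance:

  # Halving the block sizes roughly doubles the number of overlap jobs
  # (values illustrative, not tuned for this dataset):
  ovlHashBlockLength = 600000000
  ovlRefBlockSize    = 300000000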
From: Walenz, B. <bw...@jc...> - 2012-10-20 02:31:36
Hi, Paul-

The default minimum allowed read length is 64 bases, and so gatekeeper threw out all your reads. Our error reporting on fastq inputs isn't terribly great, I'm afraid.

You can change the minimum with frgMinLen=40 on the command line. The minimum overlap length is 40, and unless you've got great coverage, you might want to drop it as well (ovlMinLen).

The CA7 release is rather old, and we're suggesting people start using the 'unstable' version from CVS.

b

--
Brian Walenz
Senior Software Engineer
J. Craig Venter Institute

________________________________________
From: Paul Cantalupo [pca...@gm...]
Sent: Friday, October 19, 2012 4:01 PM
To: wgs-assembler-users
Subject: [wgs-assembler-users] gatekeeper failed to add fragments

Hi,

I'm trying for the first time to assemble Illumina fastq reads. After running runCA 7.0 with:

runCA -d cabogout -p SRR073769.uniq.bowtie.unmap SRR073769.uniq.bowtie.unmap.frg

I got this output:

----------------------------------------START Fri Oct 19 15:54:42 2012
/Users/pgc92/Public/usr/local/wgs-7.0/Darwin-amd64/bin/gatekeeper -o /Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/cabogout/SRR073769.uniq.bowtie.unmap.gkpStore.BUILDING -T -F /Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/SRR073769.uniq.bowtie.unmap.frg > /Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/cabogout/SRR073769.uniq.bowtie.unmap.gkpStore.err 2>&1
----------------------------------------END Fri Oct 19 15:54:46 2012 (4 seconds)
numFrags = 0
================================================================================
runCA failed.
----------------------------------------
Stack trace:
 at /Users/pgc92/Public/usr/local/wgs/Darwin-i386/bin/runCA line 1237
 main::caFailure('gatekeeper failed to add fragments', '/Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/cabog...') called at /Users/pgc92/Public/usr/local/wgs/Darwin-i386/bin/runCA line 1698
 main::preoverlap('/Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/SRR07...') called at /Users/pgc92/Public/usr/local/wgs/Darwin-i386/bin/runCA line 5874
----------------------------------------
Last few lines of the relevant log file (/Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/cabogout/SRR073769.uniq.bowtie.unmap.gkpStore.err):

Starting file '/Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/SRR073769.uniq.bowtie.unmap.frg'.
Processing SINGLE-ENDED SANGER QV encoding reads from: '/Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/SRR073769.uniq.bowtie.unmap.fq'
GKP finished with no alerts or errors.
----------------------------------------
Failure message: gatekeeper failed to add fragments

What am I doing wrong? My fastq file contains ~1 million 45 bp reads with sanger quality values. Here is head output of the fastq file:

(57989) $ head SRR073769.uniq.bowtie.unmap.fq
@SRR073769.109 PATHBIO-SOLEXA2:2:1:3:1029 length=45
CTGCCCAGGCATAGTTCACCATCTTTCGGGTCCTAACACGTGCGC
+SRR073769.109 PATHBIO-SOLEXA2:2:1:3:1029 length=45
@@?@>@>@7@?9==@B@;@@@29>@6>3950:467>#########
@SRR073769.111 PATHBIO-SOLEXA2:2:1:3:1362 length=45
TGGTTAGTTTCTTCTCCTCCGCTGACTAATATGCTTAAATTCAGA
+SRR073769.111 PATHBIO-SOLEXA2:2:1:3:1362 length=45
CCCCCCC@CCCCCBCCBCCA@ABBCBBBCCBB8AB?6@ACB;?97
@SRR073769.113 PATHBIO-SOLEXA2:2:1:3:1458 length=45
GATCCACGGGGGCCGACCCGGTGACCCGGTTACCCGCCAGGTCCT

Here is the output of the FRG file:

(57990) $ cat *frg
{VER
ver:2
}
{LIB
act:A
acc:SRR073769.uniq.bowtie.unmap
ori:U
mea:0.000
std:0.000
src:
.
nft:16
fea:
forceBOGunitigger=1
isNotRandom=0
doNotTrustHomopolymerRuns=0
doTrim_initialNone=0
doTrim_initialMerBased=1
doTrim_initialFlowBased=0
doTrim_initialQualityBased=0
doRemoveDuplicateReads=1
doTrim_finalLargestCovered=1
doTrim_finalEvidenceBased=0
doRemoveSpurReads=1
doRemoveChimericReads=1
doConsensusCorrection=0
fastqQualityValues=sanger
fastqOrientation=innie
fastqReads=/Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/SRR073769.uniq.bowtie.unmap.fq
.
}
{VER
ver:1
}

Thank you,

Paul

Paul Cantalupo
University of Pittsburgh
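A sketch of the rerun Brian suggests; frgMinLen=40 is his value, while ovlMinLen=30 is an illustrative assumption (he only says you might want to drop it):

  # Let the 45 bp reads load (default frgMinLen is 64) and lower the minimum
  # overlap so the short reads can still be overlapped.
  runCA -d cabogout -p SRR073769.uniq.bowtie.unmap \
        frgMinLen=40 ovlMinLen=30 \
        SRR073769.uniq.bowtie.unmap.frg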