From: Yeo Z. X. <zhe...@ya...> - 2014-01-06 05:50:32
|
Dear wgs-assembler admins, I am new to runCA. The IT support staff at my institute installed CA 8.0 a few months ago. I managed to generate an assembly from my Illumina MiSeq data with both the BOG and BOGART unitiggers. However, BOGART alone failed to complete when I added a larger HiSeq dataset to the MiSeq dataset (see attached unittiger.err: "terminate called after throwing an instance of 'std::bad_alloc'"). Since BOGART is the recommended unitigger for Illumina data, we hope to use it to generate our assembly from all the data we have. For your information, BOGART works when I assemble a subset of the MiSeq + HiSeq dataset by setting frgMinLen=100 and ovlMinLen=50, suggesting the error is likely due to the size of the input data. Our HiSeq data is ~80M read pairs (~160M read fragments in total), most of which are 100 bp after merTrim trimming (~50GB of data). The MiSeq data is ~10 times smaller than the HiSeq data. Thank you for your support. Best regards, Zhen Xuan YEO ------------------------------------------------------------ Senior Bioinformatics Specialist @ Yale-NUS College Centre for BioImaging Sciences Department of Biological Sciences Blk S1A, 14 Science Drive 4 Lee Wee Kheng Building 117557 Singapore (O) +65 65162723 (F) +65 67767882 Email: zhe...@ya... |
From: Walenz, B. <bw...@jc...> - 2013-12-17 21:25:58
|
With a great sigh of relief, I have uploaded the CA 8.1 release to sourceforge. Two significant recent enhancements are an increase in the maximum read length from 2Kb to 65Kb, and support for assemblies from uncorrected PacBio reads. Our documentation remains weak — notably missing is a list of changes made since 7.0 — but the main page has four brand new example assemblies. http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page Cheers! b -- Brian Walenz Senior Software Engineer J. Craig Venter Institute |
From: Walenz, B. <bw...@jc...> - 2013-12-06 18:51:35
|
Nope, it requires two mates to scaffold, and possibly sequence overlap if one is implied by the mates. Until repeats are resolved very late in the scaffolding stage, all the pairs are unique. When multiple pairs span a gap, the size of the gap is much more accurately estimated. This will influence later scaffold merges, by making the bounds on the gap size tighter. Looser bounds imply that we can make an incorrect scaffold join because we can stretch the gap to fit in a contig that otherwise has no sequence alignments to the neighbor contigs. E.g., mates imply the inserted contig could overlap its neighbors by 5k, but the loose bound also lets us stretch the gap to fit the contig with no overlaps. You can try reducing the number of pairs needed by decreasing MIN_EDGES from 2 to 1 in src/AS_CGW/GraphCGW_T.H. BUT, the 2-edge assumption has been in the assembler since the start, and this might not be the only place it is set. AND, suffice to say that I haven't tried this. b On 12/6/13 10:28 AM, "Waldbieser, Geoff" <Geo...@AR...> wrote: > Hi, > In an assembly in which unitigs are produced from a relatively low (4-7X) > coverage of longer reads (Sanger, PacBio, Moleculo, hopefully others to come), > is a single pair of reads (PE or MP) sufficient to scaffold two contigs, > assuming the pair is unique in the genome? Would there be any difference > between scaffolding with a unique single pair vs unique multiple pairs that > span a particular gap? > > Geoff > ________________________________ > Geoffrey C. Waldbieser > Research Molecular Biologist > USDA, ARS, Warmwater Aquaculture Research Unit > 141 Experiment Station Road > Stoneville, MS 38776 > |
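The point that several pairs spanning a gap give a tighter bound than a single pair can be illustrated numerically. The sketch below is not the cgw implementation. It assumes each mate pair yields an independent gap estimate (library mean insert size minus the bases of the insert that fall inside the two flanking contigs), that each estimate carries the library standard deviation, and that estimates are combined by a plain mean. The function name, the insert sizes, and the 3-sigma bound are all hypothetical.

```python
import math

def estimate_gap(insert_mean, insert_sd, covered_in_contigs):
    """Combine per-pair gap estimates into a mean and a standard error.

    covered_in_contigs: for each mate pair, the number of bases of the
    insert that lie inside the two flanking contigs (hypothetical input).
    """
    estimates = [insert_mean - covered for covered in covered_in_contigs]
    n = len(estimates)
    gap = sum(estimates) / n
    # Assumption: pairs are independent, so the error shrinks with sqrt(n).
    std_err = insert_sd / math.sqrt(n)
    return gap, std_err

one_pair = estimate_gap(10_000, 1_000, [8_000])
four_pairs = estimate_gap(10_000, 1_000, [8_000, 7_900, 8_150, 7_950])

for label, (gap, err) in (("1 pair", one_pair), ("4 pairs", four_pairs)):
    print(f"{label}: gap {gap:.0f} bp, 3-sigma bound +- {3 * err:.0f} bp")
```

Under these assumptions, four pairs halve the width of the bound relative to one pair, which is the kind of tightening that makes it harder to stretch a gap around a contig that does not belong there.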
From: Waldbieser, G. <Geo...@AR...> - 2013-12-06 15:29:16
|
Hi, In an assembly in which unitigs are produced from a relatively low (4-7X) coverage of longer reads (Sanger, PacBio, Moleculo, hopefully others to come), is a single pair of reads (PE or MP) sufficient to scaffold two contigs, assuming the pair is unique in the genome? Would there be any difference between scaffolding with a unique single pair vs unique multiple pairs that span a particular gap? Geoff ________________________________ Geoffrey C. Waldbieser Research Molecular Biologist USDA, ARS, Warmwater Aquaculture Research Unit 141 Experiment Station Road Stoneville, MS 38776 This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. |
From: Walenz, B. <bw...@jc...> - 2013-11-25 22:16:30
|
Hi- I think this is the 'out of memory' message. How current is your code? Just before the CA8 release, I increased the max read length to 65k, but this also resulted in the dynamic programming matrix now being 16gb. I have a fix being tested at the moment. How large is the genome? You've got a 160Mb scaffold below. Running on a larger machine would resolve the problem. I hope to get an 8.1 release made this week. b On 11/20/13 5:56 AM, "James Abbott" <j.a...@im...> wrote: > Hello, > > I'm trying to re-assemble a fungal genome which I've previously > assembled successfully using wgs 7.0, but with the addition of some > PacBio CCS sequence, using wgs 8, however cgw is crashing with the > following: > > * Considering edges with weight >= 42.00 (maxWeightEdge 56 weightScale > 0.7500) > isQualityScaffoldMergingEdge()-- Merge scaffolds 19916 (1032943.1bp) and > 20439 (166551122.0bp): gap -802155.4bp +- 3059.9bp weight 56 AB_BA edge > terminate called after throwing an instance of 'std::bad_alloc' > what(): St9bad_alloc > > Failed with 'Aborted' > > Backtrace (mangled): > > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_Z17AS_UTL_catchCrashiP7siginfoPv+ > 0x31)[0x4391e9] > /lib64/libpthread.so.0[0x370420eb10] > /lib64/libc.so.6(gsignal+0x35)[0x3703630265] > /lib64/libc.so.6(abort+0x110)[0x3703631d10] > /usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x114)[ > 0x32936bed14] > /usr/lib64/libstdc++.so.6[0x32936bce16] > /usr/lib64/libstdc++.so.6[0x32936bce43] > /usr/lib64/libstdc++.so.6[0x32936bcf2a] > /usr/lib64/libstdc++.so.6(_Znwm+0x79)[0x32936bd239] > /usr/lib64/libstdc++.so.6(_Znam+0x9)[0x32936bd2f9] > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_ZN13instrumentSCF7analyzeERSt6vec > torI13instrumentLIBSaIS1_EE+0x441)[0x50c223] > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw[0x46ec1c] > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_Z28isQualityScaffoldMergingEdgeP9 > EdgeCGW_TP9NodeCGW_TS2_P20ScaffoldInstrumenterP12VarArrayTypedd+0x1e4)[0x46fbb > c] > 
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_Z36ExamineSEdgeForUsability_Inter > leavedP9EdgeCGW_TP16InterleavingSpecP9NodeCGW_TS4_+0x12a)[0x475fb0] > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw[0x46e317] > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw[0x46e524] > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_Z24MergeScaffoldsAggressiveP14Sca > ffoldGraphTPci+0x2a6)[0x46e820] > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(main+0x242d)[0x4376ff] > /lib64/libc.so.6(__libc_start_main+0xf4)[0x370361d994] > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(__gxx_personality_v0+0x149)[0x4350 > 59] > > Backtrace (demangled): > > [0] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::AS_UTL_catchCrash(int, > siginfo*, void*) + 0x31 [0x4391e9] > [1] /lib64/libpthread.so.0 [0x370420eb10] > [2] /lib64/libc.so.6::(null) + 0x35 [0x3703630265] > [3] /lib64/libc.so.6::(null) + 0x110 [0x3703631d10] > [4] /usr/lib64/libstdc++.so.6::__gnu_cxx::__verbose_terminate_handler() > + 0x114 [0x32936bed14] > [5] /usr/lib64/libstdc++.so.6 [0x32936bce16] > [6] /usr/lib64/libstdc++.so.6 [0x32936bce43] > [7] /usr/lib64/libstdc++.so.6 [0x32936bcf2a] > [8] /usr/lib64/libstdc++.so.6::operator new(unsigned long) + 0x79 > [0x32936bd239] > [9] /usr/lib64/libstdc++.so.6::operator new[](unsigned long) + 0x9 > [0x32936bd2f9] > [10] > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::instrumentSCF::analyze(std::vecto > r<instrumentLIB, > std::allocator<instrumentLIB> >&) + 0x441 [0x50c223] > [11] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw [0x46ec1c] > [12] > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::isQualityScaffoldMergingEdge(Edge > CGW_T*, > NodeCGW_T*, NodeCGW_T*, ScaffoldInstrumenter*, VarArrayType*, double, > double) + 0x1e4 [0x46fbbc] > [13] > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::ExamineSEdgeForUsability_Interlea > ved(EdgeCGW_T*, > InterleavingSpec*, NodeCGW_T*, NodeCGW_T*) + 0x12a [0x475fb0] > [14] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw [0x46e317] > [15] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw 
[0x46e524] > [16] > /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::MergeScaffoldsAggressive(Scaffold > GraphT*, > char*, int) + 0x2a6 [0x46e820] > [17] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::(null) + 0x242d [0x4376ff] > [18] /lib64/libc.so.6::(null) + 0xf4 [0x370361d994] > [19] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::(null) + 0x149 [0x435059] > > I've recompiled with debugging enabled and rerun the job but the cgw.out > is >700Mb, which I can make available via http if this would be useful. > The assembler was compiled and run under CentOS 5.6 (gcc 4.1.2, glibc > 2.5.58), with 32Gb RAM available (we have higher memory machines > available if this may be a factor...). > > The gpkstore.info for the assembly is as follows: > > libIID bgnIID endIID active deleted mated totLen clrLen libName > 0 1 2789548 2789548 0 640132 1631468328 > 1380548976 GLOBAL > 0 0 0 0 0 0 0 0 LegacyUnmatedReads > 1 1 183478 183478 0 176410 179564258 > 179564258 plasmids > 2 183479 658786 475308 0 463722 391222096 > 391222096 fosmids > 3 658787 1208799 550013 0 0 146236541 > 146236541 FLX > 4 1208800 2760713 1551914 0 0 850798225 > 599878873 XLR > 5 2760714 2789548 28835 0 0 63647208 63647208 > PacBio > > So there is a rather low proportion of mate-pairs compared to reads from > fragment libraries (an area the PacBio data was intended to resolve, > however due to DNA quality it has only been possible to generate CCS > reads so far...). The genome in question is highly repetitive (~70%), > with many nested transposons, and I believe our existing assembly is > 'over-scaffolded' due to misplaced mate-pairs in repeat regions, so am > intending to increase cgwMinMergeWeight, and probably > cgwMergeFilterLevel to increase the stringency of the scaffolding, > however as a first pass was running using the default settings. > > Any suggestions as to whether this crash is due to a bug or the nature > of the genome/combination of data types? > > Best Regards, > James |
From: <mic...@ip...> - 2013-11-25 10:10:34
|
Hello, After several restarts of an assembly with error-corrected PacBio reads using CA 8.0, I got stuck at the consensusAfterScaffolder stage. Removing consensus.sh and rerunning the assembly did not get past the following error (at end of post). I found an earlier post about consensusAfterScaffolder problems (in CA 6.0, I think) in the tracker but wasn't sure whether its solution is still valid for CA 8.0. Attached is the error file from one of the 3 jobs that didn't succeed (from 8-consensus/ ). When looking at the error files in 8-consensus/ I found two of them terminating with: #IncBaseCount i out of range (possibly non ACGTN letter?)ctgcns: MultiAlignment_CNS.C:193: int IncBaseCount(BaseCount*, char): Assertion `0' failed. # #Failed with 'Aborted' and one with: # Failed with 'Segmentation fault' $less runCA.sge.out.10 ENV: SGE_TASK_ID needs to be unset, done. /data/users/mmoser/Celera/main/assembly20.11.13/8-consensus/peaxi_026 failed -- no .success. /data/users/mmoser/Celera/main/assembly20.11.13/8-consensus/peaxi_056 failed -- no .success. /data/users/mmoser/Celera/main/assembly20.11.13/8-consensus/peaxi_057 failed -- no .success. ================================================================================ runCA failed. ---------------------------------------- Stack trace: at /home/mmoser/CELERA/wgs-8.0/Linux-amd64/bin/runCA line 1501 main::caFailure('3 consensusAfterScaffolder jobs failed; remove /data/users/mm...', undef) called at /home/mmoser/CELERA/wgs-8.0/Linux-amd64/bin/runCA line 5542 main::postScaffolderConsensus() called at /home/mmoser/CELERA/wgs-8.0/Linux-amd64/bin/runCA line 6317 ---------------------------------------- Failure message: 3 consensusAfterScaffolder jobs failed; remove /data/users/mmoser/Celera/main/assembly20.11.13/8-consensus/consensus.sh to try again |
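When several consensus jobs fail for different reasons, as in the report above, it can help to sort the error logs by failure signature before deciding what to retry. The following is a rough sketch only: the signatures are the strings quoted in this thread, while the `*.err` file pattern, the function names, and the cause labels are assumptions rather than anything runCA defines.

```python
from pathlib import Path

# Failure strings quoted in this thread; the cause labels are a guess.
SIGNATURES = {
    "Assertion": "assertion failure",
    "Segmentation fault": "segmentation fault",
    "std::bad_alloc": "out of memory",
}

def classify_log(text):
    """Return the labels of all known signatures found in a log's text."""
    return [label for needle, label in SIGNATURES.items() if needle in text]

def scan_logs(directory, pattern="*.err"):
    """Map each matching log file name under `directory` to its labels."""
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        labels = classify_log(path.read_text(errors="replace"))
        if labels:
            results[path.name] = labels
    return results

sample = ("ctgcns: MultiAlignment_CNS.C:193: int IncBaseCount(BaseCount*, char): "
          "Assertion `0' failed.\nFailed with 'Aborted'")
print(classify_log(sample))
```

Calling `scan_logs("8-consensus")` would then group the failed partitions, assuming the logs in that directory end in `.err`.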
From: Walenz, B. <bw...@jc...> - 2013-11-21 23:00:45
|
Hi- The surrogates are included in contigs. Searching against scaffolds + degenerates (or contigs + degenerates) will use every assembled sequence. A bit more: Degenerates are either short, low-coverage unitigs or repeats that were not placed. Repeats that are placed are called surrogates, because they can appear in more than one place. We'll place the reads in the surrogates if they can be placed uniquely (by a mate pair); otherwise, the contig where the surrogate is placed will have zero read coverage in the posmap outputs. b On 11/21/13 4:22 PM, "Cristell Navarro" <cri...@om...> wrote: > Hi > I was analizing the data that did not match in the scaffolds, first I > looked for the degenerates, and now I would like to analize the Surrogate. > > Where can I find a fasta file of the Surrogates contigs? > > Cristell. > > > ------------------------------------------------------------------------------ > Shape the Mobile Experience: Free Subscription > Software experts and developers: Be at the forefront of tech innovation. > Intel(R) Software Adrenaline delivers strategic insight and game-changing > conversations that shape the rapidly evolving mobile landscape. Sign up now. > http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users |
From: Cristell N. <cri...@om...> - 2013-11-21 21:28:36
|
Hi, I was analyzing the data that did not match the scaffolds. First I looked at the degenerates, and now I would like to analyze the surrogates. Where can I find a FASTA file of the surrogate contigs? Cristell. |
From: James A. <j.a...@im...> - 2013-11-20 10:56:40
|
Hello, I'm trying to re-assemble a fungal genome which I've previously assembled successfully using wgs 7.0, but with the addition of some PacBio CCS sequence, using wgs 8, however cgw is crashing with the following: * Considering edges with weight >= 42.00 (maxWeightEdge 56 weightScale 0.7500) isQualityScaffoldMergingEdge()-- Merge scaffolds 19916 (1032943.1bp) and 20439 (166551122.0bp): gap -802155.4bp +- 3059.9bp weight 56 AB_BA edge terminate called after throwing an instance of 'std::bad_alloc' what(): St9bad_alloc Failed with 'Aborted' Backtrace (mangled): /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_Z17AS_UTL_catchCrashiP7siginfoPv+0x31)[0x4391e9] /lib64/libpthread.so.0[0x370420eb10] /lib64/libc.so.6(gsignal+0x35)[0x3703630265] /lib64/libc.so.6(abort+0x110)[0x3703631d10] /usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x114)[0x32936bed14] /usr/lib64/libstdc++.so.6[0x32936bce16] /usr/lib64/libstdc++.so.6[0x32936bce43] /usr/lib64/libstdc++.so.6[0x32936bcf2a] /usr/lib64/libstdc++.so.6(_Znwm+0x79)[0x32936bd239] /usr/lib64/libstdc++.so.6(_Znam+0x9)[0x32936bd2f9] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_ZN13instrumentSCF7analyzeERSt6vectorI13instrumentLIBSaIS1_EE+0x441)[0x50c223] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw[0x46ec1c] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_Z28isQualityScaffoldMergingEdgeP9EdgeCGW_TP9NodeCGW_TS2_P20ScaffoldInstrumenterP12VarArrayTypedd+0x1e4)[0x46fbbc] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_Z36ExamineSEdgeForUsability_InterleavedP9EdgeCGW_TP16InterleavingSpecP9NodeCGW_TS4_+0x12a)[0x475fb0] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw[0x46e317] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw[0x46e524] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_Z24MergeScaffoldsAggressiveP14ScaffoldGraphTPci+0x2a6)[0x46e820] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(main+0x242d)[0x4376ff] /lib64/libc.so.6(__libc_start_main+0xf4)[0x370361d994] 
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(__gxx_personality_v0+0x149)[0x435059] Backtrace (demangled): [0] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::AS_UTL_catchCrash(int, siginfo*, void*) + 0x31 [0x4391e9] [1] /lib64/libpthread.so.0 [0x370420eb10] [2] /lib64/libc.so.6::(null) + 0x35 [0x3703630265] [3] /lib64/libc.so.6::(null) + 0x110 [0x3703631d10] [4] /usr/lib64/libstdc++.so.6::__gnu_cxx::__verbose_terminate_handler() + 0x114 [0x32936bed14] [5] /usr/lib64/libstdc++.so.6 [0x32936bce16] [6] /usr/lib64/libstdc++.so.6 [0x32936bce43] [7] /usr/lib64/libstdc++.so.6 [0x32936bcf2a] [8] /usr/lib64/libstdc++.so.6::operator new(unsigned long) + 0x79 [0x32936bd239] [9] /usr/lib64/libstdc++.so.6::operator new[](unsigned long) + 0x9 [0x32936bd2f9] [10] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::instrumentSCF::analyze(std::vector<instrumentLIB, std::allocator<instrumentLIB> >&) + 0x441 [0x50c223] [11] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw [0x46ec1c] [12] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::isQualityScaffoldMergingEdge(EdgeCGW_T*, NodeCGW_T*, NodeCGW_T*, ScaffoldInstrumenter*, VarArrayType*, double, double) + 0x1e4 [0x46fbbc] [13] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::ExamineSEdgeForUsability_Interleaved(EdgeCGW_T*, InterleavingSpec*, NodeCGW_T*, NodeCGW_T*) + 0x12a [0x475fb0] [14] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw [0x46e317] [15] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw [0x46e524] [16] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::MergeScaffoldsAggressive(ScaffoldGraphT*, char*, int) + 0x2a6 [0x46e820] [17] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::(null) + 0x242d [0x4376ff] [18] /lib64/libc.so.6::(null) + 0xf4 [0x370361d994] [19] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::(null) + 0x149 [0x435059] I've recompiled with debugging enabled and rerun the job but the cgw.out is >700Mb, which I can make available via http if this would be useful. 
The assembler was compiled and run under CentOS 5.6 (gcc 4.1.2, glibc 2.5.58), with 32Gb RAM available (we have higher memory machines available if this may be a factor...).

The gpkstore.info for the assembly is as follows:

libIID  bgnIID   endIID   active   deleted  mated   totLen      clrLen      libName
0       1        2789548  2789548  0        640132  1631468328  1380548976  GLOBAL
0       0        0        0        0        0       0           0           LegacyUnmatedReads
1       1        183478   183478   0        176410  179564258   179564258   plasmids
2       183479   658786   475308   0        463722  391222096   391222096   fosmids
3       658787   1208799  550013   0        0       146236541   146236541   FLX
4       1208800  2760713  1551914  0        0       850798225   599878873   XLR
5       2760714  2789548  28835    0        0       63647208    63647208    PacBio

So there is a rather low proportion of mate-pairs compared to reads from fragment libraries (an area the PacBio data was intended to resolve, however due to DNA quality it has only been possible to generate CCS reads so far...). The genome in question is highly repetitive (~70%), with many nested transposons, and I believe our existing assembly is 'over-scaffolded' due to misplaced mate-pairs in repeat regions, so am intending to increase cgwMinMergeWeight, and probably cgwMergeFilterLevel to increase the stringency of the scaffolding, however as a first pass was running using the default settings.

Any suggestions as to whether this crash is due to a bug or the nature of the genome/combination of data types?

Best Regards, James -- Dr. James Abbott Lead Bioinformatician Bioinformatics Support Service Imperial College, London |
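The "rather low proportion of mate-pairs" can be put into numbers from the listing in the message above. This is only a worked arithmetic sketch: the `active` and `mated` counts are copied from the posted listing, and the helper name is hypothetical.

```python
# (libName, active reads, mated reads), copied from the listing above.
LIBRARIES = [
    ("plasmids", 183478, 176410),
    ("fosmids", 475308, 463722),
    ("FLX", 550013, 0),
    ("XLR", 1551914, 0),
    ("PacBio", 28835, 0),
]

def mated_fraction(active, mated):
    """Share of active reads that belong to a mate pair."""
    return mated / active if active else 0.0

total_active = sum(active for _, active, _ in LIBRARIES)
total_mated = sum(mated for _, _, mated in LIBRARIES)
overall = mated_fraction(total_active, total_mated)

for name, active, mated in LIBRARIES:
    print(f"{name:10s} {mated_fraction(active, mated):6.1%}")
print(f"{'all':10s} {overall:6.1%}")
```

The per-library totals add up to the GLOBAL row (2789548 active, 640132 mated), so a little under a quarter of all reads are mated, and all of those come from the plasmid and fosmid libraries.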
From: Waldbieser, G. <Geo...@AR...> - 2013-11-18 20:44:50
|
My error - I didn't overwrite this assembly. There were 122 partitions and all but one was successful: Mon Nov 18 14:44:17 geoff@RAMona:~/CocoAssembly/CA_23_wgs8/5-consensus $ cat PBsuper_089.cns.err MultiAlignStore::loadMASRfile()-- Failed to open '/home/geoff/CocoAssembly/CA_23/PBsuper.tigStore/seqDB.v002.p089.utg': magic number mismatch; file=0x00000000 code=0x5253414d Mon Nov 18 14:45:45 geoff@RAMona:~/CocoAssembly/CA_23_wgs8/PBsuper.tigStore $ ll seqDB.v002.p089* -rw-r--r-- 1 geoff users 65482752 Nov 14 08:35 seqDB.v002.p089.dat -rw-r--r-- 1 root root 0 Nov 14 08:36 seqDB.v002.p089.utg From: Walenz, Brian [mailto:bw...@jc...] Sent: Monday, November 18, 2013 2:31 PM To: Ole Kristian Tørresen Cc: Waldbieser, Geoff; wgs...@li... Subject: Re: [wgs-assembler-users] consensus failures in wgs-8.0 I've occasionally had trouble with moderately deep (75x) unitigs that are long (megabases) that can come from PacBio assemblies. Yes, the occasional very deep unitigs (> 100x) caused by repeats/contaminants are now handled well. The utgcns *.err logging now includes the depth of the unitig and the amount of sequence in contained reads: Working on unitig 0 (0 unitigs and 88003 fragments) unitig 0 detected 85431 contains (70.68x, 90.86%) 2572 dovetail (7.11x, 9.14%) unitig 0 removing 85431 contains; processing only 2572 reads In this case, I changed parameters (cnsReduceUnitigs=75 5 IIRC) to use only the dovetail reads for consensus. http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=RunCA#Consensus On 11/18/13 2:47 PM, "Ole Kristian Tørresen" <ol...@st...> wrote: Hi Geoff and Brian. Couldn't this be a very deep unitig? I often have trouble with this. But when I'm looking at the code now, you seem to have fixed this, Brian. What is the content of the last partition's .err file, Geoff? 
Ole On 18 November 2013 20:28, Walenz, Brian <bw...@jc...> wrote: Hi, Geoff- I just realized that my future-proofing increase of AS_READ_MAX_NORMAL_LEN_BITS from the usual default of 11 to 16 means that utgcns now needs 16gb memory to run. With BITS=15, it needed 'only' 4gb. Hard to tell if this is your problem. b On 11/15/13 7:34 PM, "Waldbieser, Geoff" <Geo...@AR... <http://Geo...@AR...> > wrote: I have set up an assembly of PacBio long reads and Illumina single reads (84bp to 4kb length) with wgs-8.0. Consensus was partitioned into 122 files. After utgcnsfix errors, I have restarted wgs 3 times (after removing the 5-consensus/consensus.sh file). Each iteration fixes more jobs, but I have finally come to the last job that will not run. This data assembled in a few hours using subversion wgs_r4437. ___________________________________ Geoffrey C. Waldbieser Research Molecular Biologist Warmwater Aquaculture Research Unit Agricultural Research Service United States Department of Agriculture Stoneville, MS 38776 (662) 686-3593 This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. ------------------------------------------------------------------------------ Shape the Mobile Experience: Free Subscription Software experts and developers: Be at the forefront of tech innovation. Intel(R) Software Adrenaline delivers strategic insight and game-changing conversations that shape the rapidly evolving mobile landscape. Sign up now. http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk _______________________________________________ wgs-assembler-users mailing list wgs...@li... 
https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users |
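The "magic number mismatch" reported in Geoff's message above becomes easier to read once the expected value is decoded: stored as a little-endian 32-bit integer, 0x5253414d is the ASCII text "MASR", which matches the loadMASRfile() name in the error. The reported file=0x00000000 is consistent with the zero-byte seqDB.v002.p089.utg in the listing. The sketch below assumes the magic number sits in the first four bytes in little-endian order, which is an assumption and not something confirmed from the tigStore source; both helper names are hypothetical.

```python
import struct

EXPECTED_MAGIC = 0x5253414D  # value reported as "code=" in the error message

def magic_as_text(value):
    """Show a 32-bit magic number as the bytes a little-endian file would hold."""
    return struct.pack("<I", value).decode("ascii")

def read_magic(path):
    """Read the first four bytes of a file as a little-endian integer.

    Returns None for files shorter than four bytes, such as a zero-byte file.
    """
    with open(path, "rb") as handle:
        head = handle.read(4)
    if len(head) < 4:
        return None
    return struct.unpack("<I", head)[0]

print(magic_as_text(EXPECTED_MAGIC))
```

A check like `read_magic(path) == EXPECTED_MAGIC` over the files in the store could flag truncated or empty partitions before consensus is restarted.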
From: Walenz, B. <bw...@jc...> - 2013-11-18 20:31:13
|
I’ve occasionally had trouble with moderately deep (75x) unitigs that are long (megabases) that can come from PacBio assemblies. Yes, the occasional very deep unitigs (> 100x) caused by repeats/contaminants are now handled well. The utgcns *.err logging now includes the depth of the unitig and the amount of sequence in contained reads: Working on unitig 0 (0 unitigs and 88003 fragments) unitig 0 detected 85431 contains (70.68x, 90.86%) 2572 dovetail (7.11x, 9.14%) unitig 0 removing 85431 contains; processing only 2572 reads In this case, I changed parameters (cnsReduceUnitigs=75 5 IIRC) to use only the dovetail reads for consensus. http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=RunCA#Consensus On 11/18/13 2:47 PM, "Ole Kristian Tørresen" <ol...@st...> wrote: Hi Geoff and Brian. Couldn't this be a very deep unitig? I often have trouble with this. But when I'm looking at the code now, you seem to have fixed this, Brian. What is the content of the last partition's .err file, Geoff? Ole On 18 November 2013 20:28, Walenz, Brian <bw...@jc...> wrote: Hi, Geoff- I just realized that my future-proofing increase of AS_READ_MAX_NORMAL_LEN_BITS from the usual default of 11 to 16 means that utgcns now needs 16gb memory to run. With BITS=15, it needed ‘only’ 4gb. Hard to tell if this is your problem. b On 11/15/13 7:34 PM, "Waldbieser, Geoff" <Geo...@AR... <http://Geo...@AR...> > wrote: I have set up an assembly of PacBio long reads and Illumina single reads (84bp to 4kb length) with wgs-8.0. Consensus was partitioned into 122 files. After utgcnsfix errors, I have restarted wgs 3 times (after removing the 5-consensus/consensus.sh file). Each iteration fixes more jobs, but I have finally come to the last job that will not run. This data assembled in a few hours using subversion wgs_r4437. ___________________________________ Geoffrey C. 
Waldbieser Research Molecular Biologist Warmwater Aquaculture Research Unit Agricultural Research Service United States Department of Agriculture Stoneville, MS 38776 (662) 686-3593 This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. ------------------------------------------------------------------------------ Shape the Mobile Experience: Free Subscription Software experts and developers: Be at the forefront of tech innovation. Intel(R) Software Adrenaline delivers strategic insight and game-changing conversations that shape the rapidly evolving mobile landscape. Sign up now. http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk _______________________________________________ wgs-assembler-users mailing list wgs...@li... https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users |
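The depth figures in the utgcns log line quoted above can be pulled out with a small parser, for example to find unitigs deep enough to be worth trying cnsReduceUnitigs on. This is a sketch under the assumption that the log line keeps exactly the layout shown in the message; the regular expression and function name are hypothetical. In the quoted example the percentages agree with each depth's share of the total (70.68x + 7.11x).

```python
import re

LINE = ("unitig 0 detected 85431 contains (70.68x, 90.86%) "
        "2572 dovetail (7.11x, 9.14%)")

PATTERN = re.compile(
    r"unitig (\d+) detected (\d+) contains \(([\d.]+)x, [\d.]+%\) "
    r"(\d+) dovetail \(([\d.]+)x, [\d.]+%\)"
)

def parse_depth(line):
    """Extract read counts and depths from one utgcns 'detected' line."""
    match = PATTERN.search(line)
    if match is None:
        return None
    unitig, n_contains, contains_x, n_dovetail, dovetail_x = match.groups()
    contains_x, dovetail_x = float(contains_x), float(dovetail_x)
    total = contains_x + dovetail_x
    return {
        "unitig": int(unitig),
        "contains": int(n_contains),
        "dovetail": int(n_dovetail),
        "total_depth": total,
        "contained_share": contains_x / total,
    }

info = parse_depth(LINE)
print(f"total depth {info['total_depth']:.2f}x, "
      f"contained share {info['contained_share']:.2%}")
```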
From: Waldbieser, G. <Geo...@AR...> - 2013-11-18 20:21:18
|
I apologize but I accidentally overwrote that data when restarting a new assembly with wgs 7. I will recompile 8.0 with AS_READ_NORMAL_LEN_BITS = 15 and retry. This is the setting I have used successfully with wgs7 versions, and is likely the source of my problem. Thanks! Geoff From: ti...@gm... [mailto:ti...@gm...] On Behalf Of Ole Kristian Tørresen Sent: Monday, November 18, 2013 1:47 PM To: Walenz, Brian Cc: Waldbieser, Geoff; wgs...@li... Subject: Re: [wgs-assembler-users] consensus failures in wgs-8.0 Hi Geoff and Brian. Couldn't this be a very deep unitig? I often have trouble with this. But when I'm looking at the code now, you seem to have fixed this, Brian. What is the content of the last partition's .err file, Geoff? Ole On 18 November 2013 20:28, Walenz, Brian <bw...@jc...<mailto:bw...@jc...>> wrote: Hi, Geoff- I just realized that my future-proofing increase of AS_READ_MAX_NORMAL_LEN_BITS from the usual default of 11 to 16 means that utgcns now needs 16gb memory to run. With BITS=15, it needed 'only' 4gb. Hard to tell if this is your problem. b On 11/15/13 7:34 PM, "Waldbieser, Geoff" <Geo...@AR...<http://Geo...@AR...>> wrote: I have set up an assembly of PacBio long reads and Illumina single reads (84bp to 4kb length) with wgs-8.0. Consensus was partitioned into 122 files. After utgcnsfix errors, I have restarted wgs 3 times (after removing the 5-consensus/consensus.sh file). Each iteration fixes more jobs, but I have finally come to the last job that will not run. This data assembled in a few hours using subversion wgs_r4437. ___________________________________ Geoffrey C. Waldbieser Research Molecular Biologist Warmwater Aquaculture Research Unit Agricultural Research Service United States Department of Agriculture Stoneville, MS 38776 (662) 686-3593 This electronic message contains information generated by the USDA solely for the intended recipients. 
Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. ------------------------------------------------------------------------------ Shape the Mobile Experience: Free Subscription Software experts and developers: Be at the forefront of tech innovation. Intel(R) Software Adrenaline delivers strategic insight and game-changing conversations that shape the rapidly evolving mobile landscape. Sign up now. http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk _______________________________________________ wgs-assembler-users mailing list wgs...@li...<mailto:wgs...@li...> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users |
From: Ole K. T. <ol...@st...> - 2013-11-18 19:47:29
|
Hi Geoff and Brian.

Couldn't this be a very deep unitig? I often have trouble with this. But when I'm looking at the code now, you seem to have fixed this, Brian.

What is the content of the last partition's .err file, Geoff?

Ole

On 18 November 2013 20:28, Walenz, Brian <bw...@jc...> wrote:
> Hi, Geoff-
>
> I just realized that my future-proofing increase of AS_READ_MAX_NORMAL_LEN_BITS from the usual default of 11 to 16 means that utgcns now needs 16gb memory to run. With BITS=15, it needed ‘only’ 4gb. Hard to tell if this is your problem.
>
> b
>
> On 11/15/13 7:34 PM, "Waldbieser, Geoff" <Geo...@AR...> wrote:
>
> I have set up an assembly of PacBio long reads and Illumina single reads (84bp to 4kb length) with wgs-8.0. Consensus was partitioned into 122 files. After utgcnsfix errors, I have restarted wgs 3 times (after removing the 5-consensus/consensus.sh file). Each iteration fixes more jobs, but I have finally come to the last job that will not run. This data assembled in a few hours using subversion wgs_r4437.
|
From: Walenz, B. <bw...@jc...> - 2013-11-18 19:30:57
|
Hi, Geoff-

I just realized that my future-proofing increase of AS_READ_MAX_NORMAL_LEN_BITS from the usual default of 11 to 16 means that utgcns now needs 16gb memory to run. With BITS=15, it needed ‘only’ 4gb. Hard to tell if this is your problem.

b

On 11/15/13 7:34 PM, "Waldbieser, Geoff" <Geo...@AR...> wrote:

I have set up an assembly of PacBio long reads and Illumina single reads (84bp to 4kb length) with wgs-8.0. Consensus was partitioned into 122 files. After utgcnsfix errors, I have restarted wgs 3 times (after removing the 5-consensus/consensus.sh file). Each iteration fixes more jobs, but I have finally come to the last job that will not run. This data assembled in a few hours using subversion wgs_r4437.
|
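The two memory figures in this message (4 GB at BITS=15, 16 GB at BITS=16) grow by a factor of four per extra bit, which is what a square table with side 2^BITS would do. The sketch below only illustrates that arithmetic. The square-table model and the 4 bytes per cell are assumptions chosen to match the quoted numbers; nothing in the thread says this is how utgcns actually allocates memory.

```python
# Hypothetical model: memory = (2**bits)**2 cells * bytes_per_cell.
# Neither the square table nor the 4-byte cell size comes from the thread;
# they are assumptions that happen to reproduce the quoted 4 GB and 16 GB.

def max_read_len(bits: int) -> int:
    """Number of distinct lengths a field of `bits` bits can hold."""
    return 1 << bits

def table_gib(bits: int, bytes_per_cell: int = 4) -> float:
    """Size in GiB of a max_len x max_len table under the assumed model."""
    side = max_read_len(bits)
    return side * side * bytes_per_cell / 2**30

if __name__ == "__main__":
    for bits in (11, 15, 16):
        print(f"BITS={bits}: max length {max_read_len(bits)}, "
              f"table {table_gib(bits)} GiB")
```

Under this model the old default of 11 bits costs almost nothing, which would explain why the memory requirement only became noticeable after the increase.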
From: Waldbieser, G. <Geo...@AR...> - 2013-11-16 00:49:23
|
I have set up an assembly of PacBio long reads and Illumina single reads (84bp to 4kb length) with wgs-8.0. Consensus was partitioned into 122 files. After utgcnsfix errors, I have restarted wgs 3 times (after removing the 5-consensus/consensus.sh file). Each iteration fixes more jobs, but I have finally come to the last job that will not run. This data assembled in a few hours using subversion wgs_r4437.

___________________________________
Geoffrey C. Waldbieser
Research Molecular Biologist
Warmwater Aquaculture Research Unit
Agricultural Research Service
United States Department of Agriculture
Stoneville, MS 38776
(662) 686-3593

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
|
From: Walenz, B. <bw...@jc...> - 2013-11-15 19:32:15
|
Hi-

The obtStore.err is claiming that the inputs (*.ovb.gz) are not in gzip format. At this stage, all it is doing is 'gzip -l 0-overlaptrim-overlap/001/000462.ovb.gz' to see the true length of the compressed file. Gzip itself is complaining that the file is not in gzip format.

Are these files valid (non-empty)? Were there any errors reported in the overlap job output (0-overlaptrim-overlap/*.err)? You can also clean up this directory (remove *.err and the 001/ directory) and run a few jobs by hand (sh overlap.sh 1, sh overlap.sh 2, etc) to see that they run OK.

I don't see anything wrong in your spec, but would suggest some changes.

unitigger=bogart
utgGraphErrorRate=0.05
utgMergeErrorRate=0.05
batMemory=X (X in gigabytes)
batThreads=Y (default is to use all CPUs on the machine)

The bogart unitigger seems to work much better than 'bog', but it is more expensive to run. It needs to load non-best overlaps into memory. If you leave batMemory unset, it will default to using all physical memory on the machine.

I increased the allowed error rate from 3% to 5%. This should result in better unitig construction, but can, rarely, end up declaring that similar unique sequence is a repeat and breaking the unitig.

doFragmentCorrection=0

This is an expensive step that can improve results. For now, leave it off.

b

On 11/13/13 11:07 AM, "mic...@ip..." <mic...@ip...> wrote:
> Dear WGS developers
>
> I am trying to assemble a subset of 2 Gbp error corrected PacBio reads with CA8.0 using SGE to get a feeling about speed and resources needed for the assembly. The full dataset will be about 13.8 Gbp of error-corrected PacBio reads.
> Read lengths range from 790 up to 19'000 bp.
> I assume it's crucial to have optimal parameter settings in the spec-file. Unfortunately, I can't find appropriate examples for my dataset.
>
> Currently I am stuck with an error in the obt store generation:
> Failure message:
>
> failed to build the obt store
>
> Would it be possible to provide a PacBio spec template and give me some hint about how to overcome the obt store failure?
> Attached are error-files from the runCA.sge.out, the std.err and the obtStore.err along with the used spec file.
>
> Thank you very much,
> Michel
|
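The check suggested in this reply (are the *.ovb.gz files non-empty and really gzip?) can be run over a whole directory at once. The following is a minimal Python sketch, not part of runCA; the function name `check_gzip` and the script name in the usage note are made up for illustration. It looks at the file size, the two-byte gzip magic number, and then tries to decompress the file to the end.

```python
import gzip
import os
import sys
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member

def check_gzip(path: str) -> str:
    """Return 'ok', 'empty', 'not-gzip', or 'corrupt' for one file."""
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as fh:
        if fh.read(2) != GZIP_MAGIC:
            return "not-gzip"
    try:
        # Decompress in 1 MiB chunks and discard the data; we only care
        # whether the stream can be read through to its end.
        with gzip.open(path, "rb") as gz:
            while gz.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error):
        return "corrupt"
    return "ok"

if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(check_gzip(p), p)
```

Saved as, say, check_ovb.py (a hypothetical name), it could be run as `python check_ovb.py 0-overlaptrim-overlap/001/*.ovb.gz`; any file not reported as ok points to an overlap job worth rerunning by hand.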
From: Walenz, B. <bw...@jc...> - 2013-11-13 18:07:05
|
You're not supposed to read the documentation! :-)

Yes, pacbio-corrected should say the reads are output from pacBioToCA. Cut-n-paste wins again.

Thanks.

________________________________________
From: Waldbieser, Geoff [Geo...@AR...]
Sent: Wednesday, November 13, 2013 12:55 PM
To: wgs...@li...
Subject: [wgs-assembler-users] Typo in fastqToCA?

Hi,

Under the technology flag in fastqToCA, it lists both pacbio-corrected and pacbio-raw as "uncorrected reads". Is this just a typo or a listing for a future fix?

-technology p
  What instrument were these reads generated on ('illumina' is the default):
    'none'             -- don't set any features; use -feature to set them manually
    'sanger'           -- reads from dideoxy sequencers
    '454'              -- reads from 454 Life Sciences; FLX, Titanium, FLX+
    'illumina'         -- reads from Illumina; GAIIx, MiSeq, HiSeq; shorter than 160bp
    'illumina-long'    -- reads from Illumina; GAIIx, MiSeq, HiSeq; any length
    'pacbio-ccs'       -- reads from PacBio; Circular Consensus Sequence (CSS)
    'pacbio-corrected' -- reads from PacBio; uncorrected reads
    'pacbio-raw'       -- reads from PacBio; uncorrected reads

Geoff
|
From: Waldbieser, G. <Geo...@AR...> - 2013-11-13 17:55:11
|
Hi,

Under the technology flag in fastqToCA, it lists both pacbio-corrected and pacbio-raw as "uncorrected reads". Is this just a typo or a listing for a future fix?

-technology p
  What instrument were these reads generated on ('illumina' is the default):
    'none'             -- don't set any features; use -feature to set them manually
    'sanger'           -- reads from dideoxy sequencers
    '454'              -- reads from 454 Life Sciences; FLX, Titanium, FLX+
    'illumina'         -- reads from Illumina; GAIIx, MiSeq, HiSeq; shorter than 160bp
    'illumina-long'    -- reads from Illumina; GAIIx, MiSeq, HiSeq; any length
    'pacbio-ccs'       -- reads from PacBio; Circular Consensus Sequence (CSS)
    'pacbio-corrected' -- reads from PacBio; uncorrected reads
    'pacbio-raw'       -- reads from PacBio; uncorrected reads

Geoff

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.
|
From: <mic...@ip...> - 2013-11-13 16:24:33
|
Dear WGS developers

I am trying to assemble a subset of 2 Gbp error corrected PacBio reads with CA8.0 using SGE to get a feeling about speed and resources needed for the assembly. The full dataset will be about 13.8 Gbp of error-corrected PacBio reads. Read lengths range from 790 up to 19'000 bp.

I assume it's crucial to have optimal parameter settings in the spec-file. Unfortunately, I can't find appropriate examples for my dataset.

Currently I am stuck with an error in the obt store generation:

Failure message:

failed to build the obt store

Would it be possible to provide a PacBio spec template and give me some hint about how to overcome the obt store failure? Attached are error-files from the runCA.sge.out, the std.err and the obtStore.err along with the used spec file.

Thank you very much,
Michel
|
From: Walenz, B. <bw...@jc...> - 2013-11-11 18:54:14
|
The length=20 gaps mean that the mate pairs claim the adjacent contigs overlap, but no sequence alignment could be found. The .asm file contains the true (negative) gap length in the CTP message.

http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=ASM_Files#SCF_CTP

mea: The mean distance gives the predicted number of bases in the gap between the contigs. It is measured from contig end to contig end. A negative distance indicates that the contigs overlap (according to their aggregate mate pairs) though their consensus sequences do not align. In the FASTA representation of a scaffold, negative gap lengths are represented arbitrarily by 20 N's.

b

On 11/11/13 1:07 PM, "Cristell Navarro" <cri...@om...> wrote:
> Hi!
>
> I'm looking for an explanation because most of the gaps in my scaffolds have length = 20. Does this mean something special? Is this a minimal gap length that the assembler uses for gaps of unknown length?
>
> I hope you could help me with this issue, because I would like to submit my data to NCBI...
>
> Thanks in advance!
>
> Cristell
|
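To see how many gaps in a scaffold are candidates for this 20-N placeholder, the runs of N in the scaffold sequence can be listed and filtered by length. The sketch below is a small Python illustration with made-up function names; it works on a sequence string, not on the .asm file. A run of exactly 20 N's is only a hint: the thread does not say whether a genuine positive gap can also come out at 20, so the CTP `mea` value in the .asm file remains the authority.

```python
import re
from typing import List, Tuple

# Per the reply above: negative gap lengths are written as 20 N's in FASTA.
PLACEHOLDER_LEN = 20

def n_runs(seq: str) -> List[Tuple[int, int]]:
    """Return (start, length) for every run of N/n in a scaffold sequence."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"[Nn]+", seq)]

def placeholder_gaps(seq: str) -> List[Tuple[int, int]]:
    """Runs of exactly 20 N's, i.e. candidates for a negative (overlap) gap."""
    return [(start, length) for start, length in n_runs(seq)
            if length == PLACEHOLDER_LEN]

if __name__ == "__main__":
    scaffold = "ACGT" + "N" * 20 + "ACGT" + "N" * 35 + "AC"
    print(n_runs(scaffold))
    print(placeholder_gaps(scaffold))
```

Start positions are 0-based offsets into the sequence string.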
From: Cristell N. <cri...@om...> - 2013-11-11 18:07:34
|
Hi!

I'm looking for an explanation because most of the gaps in my scaffolds have length = 20. Does this mean something special? Is this a minimal gap length that the assembler uses for gaps of unknown length?

I hope you could help me with this issue, because I would like to submit my data to NCBI...

Thanks in advance!

Cristell
|
From: Walenz, B. <bw...@jc...> - 2013-10-28 19:46:08
|
Hi-

We hoped to release on Oct 1, but kept delaying because other projects are eating our time. We want to generate a set of example assemblies as documentation before releasing. For many reasons, we’re probably going to make a release this week without finishing all the examples.

You can grab all the current bits from svn until then.

https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile

b

On 10/27/13 3:40 PM, "Dave Messina" <on...@da...> wrote:

Hi,

On the main page here: http://sourceforge.net/apps/mediawiki/wgs-assembler/ CA 8.0 is listed as having been released on October 1, but [IN PROGRESS].

The download link (http://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.0/) is invalid and reverts to the main download directory, where CA 7.0 is the most recent version available.

Is CA 8.0 available for use, and if so could you please point me to it?

Thanks!
|
From: Dave M. <on...@da...> - 2013-10-27 20:11:45
|
Hi,

On the main page here: http://sourceforge.net/apps/mediawiki/wgs-assembler/ CA 8.0 is listed as having been released on October 1, but [IN PROGRESS].

The download link (http://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.0/) is invalid and reverts to the main download directory, where CA 7.0 is the most recent version available.

Is CA 8.0 available for use, and if so could you please point me to it?

Thanks!
|
From: Zhou Qi <zh...@be...> - 2013-10-26 05:14:10
|
Hi,

I’m using the MaSuRCA assembler, which takes CA6.0 as its core, and I got the following error message. Any input would be very much appreciated.

My command is:

runCA gkpFixInsertSizes=0 jellyfishHashSize=10000000000 ovlRefBlockSize=1225516 ovlHashBlockSize=122551 ovlCorrBatchSize=40000000 utgErrorRate=0.03 merylMemory=8192 ovlMemory=8GB stopAfter=unitigger ovlMerThreshold=75 bogBreakAtIntersections=0 unitigger=bog bogBadMateDepth=1000000 -p genome -d CA merylThreads=32 frgCorrThreads=1 frgCorrConcurrency=48 cnsConcurrency=48 ovlCorrConcurrency=8 ovlConcurrency=48 ovlThreads=1 doFragmentCorrection=1 doOverlapBasedTrimming=1 doExtendClearRanges=2 ovlMerSize=22 superReadSequences_shr.frg /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/Genomic_reads/Dmir.female.454.frg f5.cor.clean.frg m5.cor.clean.frg f1.cor.clean.frg f2.cor.clean.frg m2.cor.clean.frg 1> runCA0.out 2>&1

and the error message is:

----------------------------------------END CONCURRENT Fri Oct 25 21:36:34 2013 (64 seconds)
Overlap correction job 1 (/jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/MaSuRCA-2.1.0/CA/3-overlapcorrection/0001) failed.
================================================================================

runCA failed.

----------------------------------------
Stack trace:

 at /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/runCA line 1121
 main::caFailure('1 overlap correction jobs failed; remove /jbods/data01/DATA/d...', undef) called at /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/runCA line 3019
 main::overlapCorrection() called at /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/runCA line 5343

----------------------------------------
Failure message:

1 overlap correction jobs failed; remove /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/MaSuRCA-2.1.0/CA/3-overlapcorrection/ovlcorr.sh (or run by hand) to try again

*** buffer overflow detected ***: /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps terminated
======= Backtrace: =========
/lib/libc.so.6(__fortify_fail+0x37)[0x7f474f7fcb47]
/lib/libc.so.6(+0xfea00)[0x7f474f7fba00]
/jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps[0x4030ee]
/jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps[0x40520c]
/lib/libc.so.6(__libc_start_main+0xfe)[0x7f474f71bd8e]
/jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps[0x402279]
======= Memory map: ========
00400000-00427000 r-xp 00000000 08:21 40083 /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps
00626000-00627000 r--p 00026000 08:21 40083 /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps
00627000-00628000 rw-p 00027000 08:21 40083 /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps
00628000-00636000 rw-p 00000000 00:00 0
01234000-68f4b000 rw-p 00000000 00:00 0 [heap]
7f46690d9000-7f46690da000 ---p 00000000 00:00 0
7f46690da000-7f4669ada000 rw-p 00000000 00:00 0
7f4669ada000-7f4669adb000 ---p 00000000 00:00 0
7f4669adb000-7f466a4db000 rw-p 00000000 00:00 0
7f466a4db000-7f466a4dc000 ---p 00000000 00:00 0
7f466a4dc000-7f466aedc000 rw-p 00000000 00:00 0
7f466b8dd000-7f474f6fd000 rw-p 00000000 00:00 0
7f474f6fd000-7f474f877000 r-xp 00000000 08:31 918102 /lib/libc-2.12.1.so
7f474f877000-7f474fa77000 ---p 0017a000 08:31 918102 /lib/libc-2.12.1.so
7f474fa77000-7f474fa7b000 r--p 0017a000 08:31 918102 /lib/libc-2.12.1.so
7f474fa7b000-7f474fa7c000 rw-p 0017e000 08:31 918102 /lib/libc-2.12.1.so
7f474fa7c000-7f474fa81000 rw-p 00000000 00:00 0
7f474fa81000-7f474fa96000 r-xp 00000000 08:31 917564 /lib/libgcc_s.so.1
7f474fa96000-7f474fc95000 ---p 00015000 08:31 917564 /lib/libgcc_s.so.1
7f474fc95000-7f474fc96000 r--p 00014000 08:31 917564 /lib/libgcc_s.so.1
7f474fc96000-7f474fc97000 rw-p 00015000 08:31 917564 /lib/libgcc_s.so.1
7f474fc97000-7f474fd19000 r-xp 00000000 08:31 918106 /lib/libm-2.12.1.so
7f474fd19000-7f474ff18000 ---p 00082000 08:31 918106 /lib/libm-2.12.1.so
7f474ff18000-7f474ff19000 r--p 00081000 08:31 918106 /lib/libm-2.12.1.so
7f474ff19000-7f474ff1a000 rw-p 00082000 08:31 918106 /lib/libm-2.12.1.so
7f474ff1a000-7f4750002000 r-xp 00000000 08:31 14684593 /usr/lib/libstdc++.so.6.0.14
7f4750002000-7f4750201000 ---p 000e8000 08:31 14684593 /usr/lib/libstdc++.so.6.0.14
7f4750201000-7f4750209000 r--p 000e7000 08:31 14684593 /usr/lib/libstdc++.so.6.0.14
7f4750209000-7f475020b000 rw-p 000ef000 08:31 14684593 /usr/lib/libstdc++.so.6.0.14
7f475020b000-7f4750220000 rw-p 00000000 00:00 0
7f4750220000-7f4750248000 r-xp 00000000 08:21 34381 /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/MaSuRCA-2.1.0/lib/libjellyfish-2.0.so.2.0.0
7f4750248000-7f4750448000 ---p 00028000 08:21 34381 /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/MaSuRCA-2.1.0/lib/libjellyfish-2.0.so.2.0.0
7f4750448000-7f4750449000 r--p 00028000 08:21 34381 /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/MaSuRCA-2.1.0/lib/libjellyfish-2.0.so.2.0.0
7f4750449000-7f475044a000 rw-p 00029000 08:21 34381 /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/MaSuRCA-2.1.0/lib/libjellyfish-2.0.so.2.0.0
7f475044a000-7f4750462000 r-xp 00000000 08:31 918123 /lib/libpthread-2.12.1.so
7f4750462000-7f4750661000 ---p 00018000 08:31 918123 /lib/libpthread-2.12.1.so
7f4750661000-7f4750662000 r--p 00017000 08:31 918123 /lib/libpthread-2.12.1.so
7f4750662000-7f4750663000 rw-p 00018000 08:31 918123 /lib/libpthread-2.12.1.so
7f4750663000-7f4750667000 rw-p 00000000 00:00 0
7f4750667000-7f4750687000 r-xp 00000000 08:31 918051 /lib/ld-2.12.1.so
7f475085b000-7f4750861000 rw-p 00000000 00:00 0
7f4750878000-7f4750887000 rw-p 00000000 00:00 0
7f4750887000-7f4750888000 r--p 00020000 08:31 918051 /lib/ld-2.12.1.so
7f4750888000-7f4750889000 rw-p 00021000 08:31 918051 /lib/ld-2.12.1.so
7f4750889000-7f475088a000 rw-p 00000000 00:00 0
7fff824ab000-7fff824cd000 rw-p 00000000 00:00 0 [stack]
7fff82583000-7fff82584000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
|
From: Walenz, B. <bw...@jc...> - 2013-10-25 22:02:27
|
Q2: No. Long a desired feature, but never enough resources to implement it.

Q1: If you load these preassemblies as reads, they will never be broken up. What should then happen is the 40x of Illumina reads will have containment overlaps to the preassemblies, and, hopefully, each preassembly will form a single unitig with gobs of Illumina reads (with mates) inside it. I tried something similar for one of our Salmon assemblies. It kind of worked, but the repeats were so complicated that it could have easily assembled a chimeric repeat.

Careful! If you have any unique sequence at the end of the preassembled repeat, this 'read' will look like a normal non-repetitive read and you'll lose all the other copies of the repeat.

An alternative approach would be to use the preassemblies to remove repeat reads from the assembly, then use a gap filler to walk through the resulting scaffold gaps.

b

On 10/25/13 1:20 PM, "Mayank Mahajan" <may...@ic...> wrote:
> Hej Brian,
> I have preassembled Sanger data from the repeated regions in my assembly. The repeats are too many and pretty messy to handle manually. I know that I can provide them to the assembler as normal fragments using fastaToCA.
>
> Q1. Is there some way to give really high priority to these contig/unitig size fragments, as each one of these reads is of much higher quality than the Illumina reads, which I use with 40X coverage?
>
> Q2. Is it possible to do backbone assembly?
>
> Regards,
> Mayank
|