wgs-assembler-users Mailing List for Whole-Genome Shotgun Assembler (Page 9)

Brought to you by: brianwalenz, jasonmiller9704, mcschatz, skoren

wgs-assembler-users — Discussion about Celera Assembler


[wgs-assembler-users] BOGART error due to large data

From: Yeo Z. X. <zhe...@ya...> - 2014-01-06 05:50:32

Attachments: unitigger.err

Dear wgs-assembler admins,

I am new to runCA. Our institute's IT support installed CA 8.0 a few months ago.

I managed to generate an assembly from my Illumina MiSeq data with both the BOG and BOGART unitiggers. However, only BOGART failed to complete when I added a larger HiSeq dataset to the MiSeq data (see the attached unitigger.err: "terminate called after throwing an instance of 'std::bad_alloc'"). Since BOGART is the recommended unitigger for Illumina data, we would like to use it to assemble all the data we have.

For your information, BOGART does work on a subset of the MiSeq + HiSeq data, selected by setting frgMinLen=100 and ovlMinLen=50, which suggests that the error is due to the size of the input. Our HiSeq data is ~80M read pairs (~160M reads in total), most of them 100 bp long after merTrim trimming (~50 GB of data). The MiSeq dataset is about 10 times smaller than the HiSeq dataset.
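To make the sizes above concrete, here is a minimal sketch of the arithmetic. It assumes that frgMinLen behaves as a plain minimum-length cutoff on reads; the toy read lengths are invented for illustration, and only the 80M pairs and 100 bp figures come from the message.

```python
def reads_passing_min_length(read_lengths, min_len):
    """Return (kept, total) for reads at least min_len bases long."""
    kept = sum(1 for n in read_lengths if n >= min_len)
    return kept, len(read_lengths)

# Toy read lengths after trimming (hypothetical values, not from the post).
lengths = [100, 100, 87, 100, 64, 100, 99, 100]
kept, total = reads_passing_min_length(lengths, min_len=100)
print(f"{kept}/{total} reads kept at min_len=100")

# Upper bound on the HiSeq bases, treating every read as a full 100 bp.
pairs = 80_000_000
read_len = 100
total_bases = 2 * pairs * read_len
print(f"at most ~{total_bases / 1e9:.0f} Gbp of HiSeq sequence")
```

The real cutoff is applied inside the assembler, so this only illustrates why a length filter shrinks the input enough for the unitigger to finish.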

Thank you for your support.

Best regards,
Zhen Xuan YEO
------------------------------------------------------------
Senior Bioinformatics Specialist @ Yale-NUS College
Centre for BioImaging Sciences
Department of Biological Sciences
Blk S1A, 14 Science Drive 4
Lee Wee Kheng Building
117557 Singapore
(O) +65 65162723
(F) +65 67767882
Email: zhe...@ya...

[wgs-assembler-users] CA 8.1 released

From: Walenz, B. <bw...@jc...> - 2013-12-17 21:25:58

With a great sigh of relief, I have uploaded the CA 8.1 release to sourceforge.

Two significant recent enhancements are an increase in the maximum read length from 2Kb to 65Kb, and support for assemblies from uncorrected PacBio reads.

Our documentation remains weak — notably missing is a list of changes made since 7.0 — but the main page has four brand new example assemblies.

http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page

Cheers!

b
--
Brian Walenz
Senior Software Engineer
J. Craig Venter Institute

Re: [wgs-assembler-users] Number of paired reads per scaffold

From: Walenz, B. <bw...@jc...> - 2013-12-06 18:51:35

Nope, it requires two mates to scaffold, and possibly sequence overlap if
one is implied by the mates.

Until repeats are resolved late in the scaffolding stage, all the pairs
are unique.

When multiple pairs span a gap, the size of the gap is much more accurately
estimated.  This will influence later scaffold merges, by making the bounds
on the gap size tighter.  Looser bounds imply that we can make an incorrect
scaffold join because we can stretch the gap to fit in a contig that
otherwise has no sequence alignments to the neighbor contigs.  E.g., mates
imply the inserted contig could overlap its neighbors by 5k, but the loose
bound also lets us stretch the gap to fit the contig with no overlaps.
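The effect of extra pairs on the gap bound can be sketched with textbook statistics. This is not the computation cgw performs; it assumes each mate pair gives an independent gap estimate whose spread equals the library insert-size standard deviation, and the numbers are invented.

```python
import math
import statistics

def estimate_gap(observed_gaps, library_sd):
    """Combine per-pair gap estimates into a mean and a standard error.

    observed_gaps: gap size implied by each spanning mate pair (bp).
    library_sd: standard deviation of the library insert size (bp).
    """
    n = len(observed_gaps)
    mean = statistics.fmean(observed_gaps)
    # The standard error shrinks with the square root of the pair count.
    std_err = library_sd / math.sqrt(n)
    return mean, std_err

# Hypothetical library with a 300 bp standard deviation.
single = estimate_gap([1000], library_sd=300)
multiple = estimate_gap([1000, 1200, 900, 1100], library_sd=300)
print("one pair  :", single)
print("four pairs:", multiple)
```

Under this model four pairs halve the uncertainty of a single pair, which is the tighter bound that makes later scaffold merges harder to get wrong.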

You can try reducing the number of pairs needed by decreasing MIN_EDGES from
2 to 1 in src/AS_CGW/GraphCGW_T.H.  BUT, the 2-edge assumption has been in
the assembler since the start, and this might not be the only place it is
set.  AND, suffice to say that I haven't tried this.

b

On 12/6/13 10:28 AM, "Waldbieser, Geoff" <Geo...@AR...>
wrote:

> Hi,
> In an assembly in which unitigs are produced from a relatively low (4-7X)
> coverage of longer reads (Sanger, PacBio, Moleculo, hopefully others to come),
> is a single pair of reads (PE or MP) sufficient to scaffold two contigs,
> assuming the pair is unique in the genome? Would there be any difference
> between scaffolding with a unique single pair vs unique multiple pairs that
> span a particular gap?
> 
> Geoff
> ________________________________
> Geoffrey C. Waldbieser
> Research Molecular Biologist
> USDA, ARS, Warmwater Aquaculture Research Unit
> 141 Experiment Station Road
> Stoneville, MS 38776
>

[wgs-assembler-users] Number of paired reads per scaffold

From: Waldbieser, G. <Geo...@AR...> - 2013-12-06 15:29:16

Hi,
In an assembly in which unitigs are produced from a relatively low (4-7X) coverage of longer reads (Sanger, PacBio, Moleculo, hopefully others to come), is a single pair of reads (PE or MP) sufficient to scaffold two contigs, assuming the pair is unique in the genome? Would there be any difference between scaffolding with a unique single pair vs unique multiple pairs that span a particular gap?

Geoff
________________________________
Geoffrey C. Waldbieser
Research Molecular Biologist
USDA, ARS, Warmwater Aquaculture Research Unit
141 Experiment Station Road
Stoneville, MS 38776




This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.

Re: [wgs-assembler-users] cgw crashing in wgs 8.0

From: Walenz, B. <bw...@jc...> - 2013-11-25 22:16:30

Hi-

I think this is the 'out of memory' message.

How current is your code?  Just before the CA8 release, I increased the max
read length to 65k, but this also resulted in the dynamic programming matrix
now being 16gb.  I have a fix being tested at the moment.
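The 16 GB figure is what a square table over the maximum read length would need at 4 bytes per cell. The sketch below assumes exactly that layout; the cell size is a guess, chosen because it reproduces the numbers quoted on this list (16 GB at 16 length bits, 4 GB at 15).

```python
def dp_matrix_bytes(max_read_len, bytes_per_cell=4):
    """Bytes needed for a square max_read_len x max_read_len table.

    bytes_per_cell=4 is an assumption, not taken from the CA source.
    """
    return max_read_len * max_read_len * bytes_per_cell

for bits in (11, 15, 16):
    max_len = 2 ** bits  # maximum read length representable in `bits` bits
    gib = dp_matrix_bytes(max_len) / 2 ** 30
    print(f"bits={bits} max_len={max_len} matrix={gib:g} GiB")
```

Each extra length bit doubles the maximum read length and quadruples the table, which is why the jump to 65k reads was so expensive.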

How large is the genome?  You've got a 160Mb scaffold below.

Running on a larger machine would resolve the problem.  I hope to get an 8.1
release made this week.

b


On 11/20/13 5:56 AM, "James Abbott" <j.a...@im...> wrote:

> Hello,
> 
> I'm trying to re-assemble a fungal genome which I've previously
> assembled successfully using wgs 7.0, but with the addition of some
> PacBio CCS sequence, using wgs 8, however cgw is crashing with the
> following:
> 
> * Considering edges with weight >= 42.00 (maxWeightEdge 56 weightScale
> 0.7500)
> isQualityScaffoldMergingEdge()-- Merge scaffolds 19916 (1032943.1bp) and
> 20439 (166551122.0bp): gap -802155.4bp +- 3059.9bp weight 56 AB_BA edge
> terminate called after throwing an instance of 'std::bad_alloc'
>    what():  St9bad_alloc
> 
> Failed with 'Aborted'
> 
> I've recompiled with debugging enabled and rerun the job but the cgw.out
> is >700Mb, which I can make available via http if this would be useful.
> The assembler was compiled and run under CentOS 5.6 (gcc 4.1.2, glibc
> 2.5.58), with 32Gb RAM available (we have higher memory machines
> available if this may be a factor...).
> 
> The gpkstore.info for the assembly is as follows:
> 
> libIID  bgnIID   endIID   active   deleted  mated   totLen      clrLen      libName
> 0       1        2789548  2789548  0        640132  1631468328  1380548976  GLOBAL
> 0       0        0        0        0        0       0           0           LegacyUnmatedReads
> 1       1        183478   183478   0        176410  179564258   179564258   plasmids
> 2       183479   658786   475308   0        463722  391222096   391222096   fosmids
> 3       658787   1208799  550013   0        0       146236541   146236541   FLX
> 4       1208800  2760713  1551914  0        0       850798225   599878873   XLR
> 5       2760714  2789548  28835    0        0       63647208    63647208    PacBio
> 
> So there is a rather low proportion of mate-pairs compared to reads from
> fragment libraries (an area the PacBio data was intended to resolve,
> however due to DNA quality it has only been possible to generate CCS
> reads so far...). The genome in question is highly repetitive (~70%),
> with many nested transposons, and I believe our existing assembly is
> 'over-scaffolded' due to misplaced mate-pairs in repeat regions, so am
> intending to increase cgwMinMergeWeight, and probably
> cgwMergeFilterLevel to increase the stringency of the scaffolding,
> however as a first pass was running using the default settings.
> 
> Any suggestions as to whether this crash is due to a bug or the nature
> of the genome/combination of data types?
> 
> Best Regards,
> James

[wgs-assembler-users] Stuck at consensusAfterScaffolder

From: <mic...@ip...> - 2013-11-25 10:10:34

Attachments: peaxi_026.err

Hello, 

After several restarts of an assembly of error-corrected PacBio reads using CA 8.0, I got stuck at the consensusAfterScaffolder stage. Removing consensus.sh and rerunning the assembly could not overcome the error shown at the end of this post.
I found an earlier post in the tracker about consensusAfterScaffolder problems (in CA 6.0, I think), but I wasn't sure whether its solution is still valid for CA 8.0.
Attached is one of the 3 error files from the jobs that did not succeed (from 8-consensus/).

When looking at the error files in 8-consensus/, I found two of them terminating with:

#IncBaseCount i out of range (possibly non ACGTN letter?)ctgcns: MultiAlignment_CNS.C:193: int IncBaseCount(BaseCount*, char): Assertion `0' failed.
#
#Failed with 'Aborted'

and one with: 

# Failed with 'Segmentation fault'




$less runCA.sge.out.10

ENV: SGE_TASK_ID needs to be unset, done.
/data/users/mmoser/Celera/main/assembly20.11.13/8-consensus/peaxi_026 failed -- no .success.
/data/users/mmoser/Celera/main/assembly20.11.13/8-consensus/peaxi_056 failed -- no .success.
/data/users/mmoser/Celera/main/assembly20.11.13/8-consensus/peaxi_057 failed -- no .success.
================================================================================

runCA failed.

----------------------------------------
Stack trace:

 at /home/mmoser/CELERA/wgs-8.0/Linux-amd64/bin/runCA line 1501
        main::caFailure('3 consensusAfterScaffolder jobs failed; remove /data/users/mm...', undef) called at /home/mmoser/CELERA/wgs-8.0/Linux-amd64/bin/runCA line 5542
        main::postScaffolderConsensus() called at /home/mmoser/CELERA/wgs-8.0/Linux-amd64/bin/runCA line 6317

----------------------------------------
Failure message:

3 consensusAfterScaffolder jobs failed; remove /data/users/mmoser/Celera/main/assembly20.11.13/8-consensus/consensus.sh to try again
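When many partitions are involved, the failed jobs can be pulled out of the runCA output mechanically. The sketch below is a hypothetical helper, not part of runCA; it only assumes the "failed -- no .success." line format shown in the log above.

```python
import os
import re

# Matches lines such as ".../8-consensus/peaxi_026 failed -- no .success."
FAILED = re.compile(r"^(?P<path>\S+) failed -- no \.success\.$")

def failed_jobs(log_text):
    """Return the job paths that the log reports as failed."""
    jobs = []
    for line in log_text.splitlines():
        match = FAILED.match(line.strip())
        if match:
            jobs.append(match.group("path"))
    return jobs

log = """\
ENV: SGE_TASK_ID needs to be unset, done.
/data/users/mmoser/Celera/main/assembly20.11.13/8-consensus/peaxi_026 failed -- no .success.
/data/users/mmoser/Celera/main/assembly20.11.13/8-consensus/peaxi_056 failed -- no .success.
/data/users/mmoser/Celera/main/assembly20.11.13/8-consensus/peaxi_057 failed -- no .success.
"""

names = [os.path.basename(path) for path in failed_jobs(log)]
print(names)
```

The resulting names point at the matching error files in 8-consensus/, which is where the assertion and segmentation fault messages were found.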

Re: [wgs-assembler-users] surrogate

From: Walenz, B. <bw...@jc...> - 2013-11-21 23:00:45

Hi-

The surrogates are included in contigs.  Searching against scaffolds +
degenerates (or contigs + degenerates) will use every assembled sequence.

A bit more:

Degenerates are either short low coverage unitigs or repeats that were not
placed.  Repeats that are placed are called surrogates, because they can
appear in more than one place.  We'll place the reads in the surrogates if
they can be placed uniquely (by a mate pair); otherwise, the contig where the
surrogate is placed will have zero read coverage in the posmap outputs.
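Building the combined search set is a plain concatenation of the two FASTA files. A minimal sketch follows; the file names are hypothetical stand-ins for the scaffold and degenerate outputs of your own assembly, and the sequences are toy data.

```python
import os
import tempfile

def concatenate_fasta(inputs, output):
    """Write all records from the input FASTA files into one output file."""
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as src:
                for line in src:
                    # Guard against a final line without a newline, which
                    # would otherwise glue two records together.
                    out.write(line if line.endswith("\n") else line + "\n")

# Self-contained demo with toy sequences and hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    scf = os.path.join(tmp, "asm.scf.fasta")
    deg = os.path.join(tmp, "asm.deg.fasta")
    with open(scf, "w") as fh:
        fh.write(">scf1\nACGTACGT")  # deliberately no trailing newline
    with open(deg, "w") as fh:
        fh.write(">deg1\nTTGGCCAA\n")
    combined_path = os.path.join(tmp, "search_db.fasta")
    concatenate_fasta([scf, deg], combined_path)
    with open(combined_path) as fh:
        combined = fh.read()

print(combined)
```

Because surrogates are already inside the contigs, no separate surrogate file is needed for the search.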

b

On 11/21/13 4:22 PM, "Cristell Navarro"
<cri...@om...> wrote:

> Hi
> I was analyzing the data that did not match the scaffolds. First I
> looked at the degenerates, and now I would like to analyze the surrogates.
> 
> Where can I find a FASTA file of the surrogate contigs?
> 
> Cristell.
> 
> 
> ------------------------------------------------------------------------------
> Shape the Mobile Experience: Free Subscription
> Software experts and developers: Be at the forefront of tech innovation.
> Intel(R) Software Adrenaline delivers strategic insight and game-changing
> conversations that shape the rapidly evolving mobile landscape. Sign up now.
> http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk
> _______________________________________________
> wgs-assembler-users mailing list
> wgs...@li...
> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users

[wgs-assembler-users] surrogate

From: Cristell N. <cri...@om...> - 2013-11-21 21:28:36

Hi
I was analyzing the data that did not match the scaffolds. First I
looked at the degenerates, and now I would like to analyze the surrogates.

Where can I find a FASTA file of the surrogate contigs?

Cristell.

[wgs-assembler-users] cgw crashing in wgs 8.0

From: James A. <j.a...@im...> - 2013-11-20 10:56:40

Hello,

I'm trying to re-assemble a fungal genome which I've previously 
assembled successfully using wgs 7.0, but with the addition of some 
PacBio CCS sequence, using wgs 8, however cgw is crashing with the 
following:

* Considering edges with weight >= 42.00 (maxWeightEdge 56 weightScale 
0.7500)
isQualityScaffoldMergingEdge()-- Merge scaffolds 19916 (1032943.1bp) and 
20439 (166551122.0bp): gap -802155.4bp +- 3059.9bp weight 56 AB_BA edge
terminate called after throwing an instance of 'std::bad_alloc'
   what():  St9bad_alloc

Failed with 'Aborted'
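The threshold in the "Considering edges" line appears to be the product of the two values printed next to it. This is an inference from the numbers in this log, not something taken from the cgw source, and the function name is made up.

```python
def min_edge_weight(max_weight_edge, weight_scale):
    """Weight cutoff implied by the log line: maxWeightEdge * weightScale."""
    return max_weight_edge * weight_scale

# Values copied from the log: maxWeightEdge 56, weightScale 0.7500.
threshold = min_edge_weight(56, 0.75)
print(f"edges considered at weight >= {threshold:.2f}")
```

The edge that triggers the crash has weight 56, so it clears this cutoff comfortably.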

Backtrace (mangled):

/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_Z17AS_UTL_catchCrashiP7siginfoPv+0x31)[0x4391e9]
/lib64/libpthread.so.0[0x370420eb10]
/lib64/libc.so.6(gsignal+0x35)[0x3703630265]
/lib64/libc.so.6(abort+0x110)[0x3703631d10]
/usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x114)[0x32936bed14]
/usr/lib64/libstdc++.so.6[0x32936bce16]
/usr/lib64/libstdc++.so.6[0x32936bce43]
/usr/lib64/libstdc++.so.6[0x32936bcf2a]
/usr/lib64/libstdc++.so.6(_Znwm+0x79)[0x32936bd239]
/usr/lib64/libstdc++.so.6(_Znam+0x9)[0x32936bd2f9]
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_ZN13instrumentSCF7analyzeERSt6vectorI13instrumentLIBSaIS1_EE+0x441)[0x50c223]
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw[0x46ec1c]
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_Z28isQualityScaffoldMergingEdgeP9EdgeCGW_TP9NodeCGW_TS2_P20ScaffoldInstrumenterP12VarArrayTypedd+0x1e4)[0x46fbbc]
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_Z36ExamineSEdgeForUsability_InterleavedP9EdgeCGW_TP16InterleavingSpecP9NodeCGW_TS4_+0x12a)[0x475fb0]
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw[0x46e317]
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw[0x46e524]
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(_Z24MergeScaffoldsAggressiveP14ScaffoldGraphTPci+0x2a6)[0x46e820]
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(main+0x242d)[0x4376ff]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x370361d994]
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw(__gxx_personality_v0+0x149)[0x435059]

Backtrace (demangled):

[0] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::AS_UTL_catchCrash(int, 
siginfo*, void*) + 0x31  [0x4391e9]
[1] /lib64/libpthread.so.0 [0x370420eb10]
[2] /lib64/libc.so.6::(null) + 0x35  [0x3703630265]
[3] /lib64/libc.so.6::(null) + 0x110  [0x3703631d10]
[4] /usr/lib64/libstdc++.so.6::__gnu_cxx::__verbose_terminate_handler() 
+ 0x114  [0x32936bed14]
[5] /usr/lib64/libstdc++.so.6 [0x32936bce16]
[6] /usr/lib64/libstdc++.so.6 [0x32936bce43]
[7] /usr/lib64/libstdc++.so.6 [0x32936bcf2a]
[8] /usr/lib64/libstdc++.so.6::operator new(unsigned long) + 0x79 
[0x32936bd239]
[9] /usr/lib64/libstdc++.so.6::operator new[](unsigned long) + 0x9 
[0x32936bd2f9]
[10] 
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::instrumentSCF::analyze(std::vector<instrumentLIB, 
std::allocator<instrumentLIB> >&) + 0x441  [0x50c223]
[11] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw [0x46ec1c]
[12] 
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::isQualityScaffoldMergingEdge(EdgeCGW_T*, 
NodeCGW_T*, NodeCGW_T*, ScaffoldInstrumenter*, VarArrayType*, double, 
double) + 0x1e4  [0x46fbbc]
[13] 
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::ExamineSEdgeForUsability_Interleaved(EdgeCGW_T*, 
InterleavingSpec*, NodeCGW_T*, NodeCGW_T*) + 0x12a  [0x475fb0]
[14] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw [0x46e317]
[15] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw [0x46e524]
[16] 
/scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::MergeScaffoldsAggressive(ScaffoldGraphT*, 
char*, int) + 0x2a6  [0x46e820]
[17] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::(null) + 0x242d [0x4376ff]
[18] /lib64/libc.so.6::(null) + 0xf4  [0x370361d994]
[19] /scratch/BluGen/wgs-8.0/Linux-amd64/bin/cgw::(null) + 0x149 [0x435059]

I've recompiled with debugging enabled and rerun the job but the cgw.out 
is >700Mb, which I can make available via http if this would be useful. 
The assembler was compiled and run under CentOS 5.6 (gcc 4.1.2, glibc 
2.5.58), with 32Gb RAM available (we have higher memory machines 
available if this may be a factor...).

The gpkstore.info for the assembly is as follows:

libIID  bgnIID   endIID   active   deleted  mated   totLen      clrLen      libName
0       1        2789548  2789548  0        640132  1631468328  1380548976  GLOBAL
0       0        0        0        0        0       0           0           LegacyUnmatedReads
1       1        183478   183478   0        176410  179564258   179564258   plasmids
2       183479   658786   475308   0        463722  391222096   391222096   fosmids
3       658787   1208799  550013   0        0       146236541   146236541   FLX
4       1208800  2760713  1551914  0        0       850798225   599878873   XLR
5       2760714  2789548  28835    0        0       63647208    63647208    PacBio

So there is a rather low proportion of mate-pairs compared to reads from 
fragment libraries (an area the PacBio data was intended to resolve, 
however due to DNA quality it has only been possible to generate CCS 
reads so far...). The genome in question is highly repetitive (~70%), 
with many nested transposons, and I believe our existing assembly is 
'over-scaffolded' due to misplaced mate-pairs in repeat regions, so am 
intending to increase cgwMinMergeWeight, and probably 
cgwMergeFilterLevel to increase the stringency of the scaffolding, 
however as a first pass was running using the default settings.
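The low proportion of mated reads can be read straight off the table above. The sketch below assumes that the "mated" column counts reads that have a mate (not pairs) and that "active" counts usable reads in the library; both are interpretations of the column names, not documented facts.

```python
TABLE = """\
libIID bgnIID endIID active deleted mated totLen clrLen libName
0 1 2789548 2789548 0 640132 1631468328 1380548976 GLOBAL
1 1 183478 183478 0 176410 179564258 179564258 plasmids
2 183479 658786 475308 0 463722 391222096 391222096 fosmids
3 658787 1208799 550013 0 0 146236541 146236541 FLX
4 1208800 2760713 1551914 0 0 850798225 599878873 XLR
5 2760714 2789548 28835 0 0 63647208 63647208 PacBio
"""

def mated_fractions(table_text):
    """Map library name to mated/active, skipping libraries with no reads."""
    rows = [line.split() for line in table_text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}
    fractions = {}
    for row in body:
        active = int(row[col["active"]])
        if active:
            fractions[row[col["libName"]]] = int(row[col["mated"]]) / active
    return fractions

fractions = mated_fractions(TABLE)
for name, value in fractions.items():
    print(f"{name:10s} {value:6.1%}")
```

Under those assumptions only the plasmid and fosmid libraries carry mates, so fewer than a quarter of all reads contribute to scaffolding.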

Any suggestions as to whether this crash is due to a bug or the nature 
of the genome/combination of data types?

Best Regards,
James

-- 
Dr. James Abbott
Lead Bioinformatician
Bioinformatics Support Service
Imperial College, London

Re: [wgs-assembler-users] consensus failures in wgs-8.0

From: Waldbieser, G. <Geo...@AR...> - 2013-11-18 20:44:50

My error - I didn't overwrite this assembly.
There were 122 partitions and all but one was successful:

Mon Nov 18 14:44:17 geoff@RAMona:~/CocoAssembly/CA_23_wgs8/5-consensus $ cat PBsuper_089.cns.err
MultiAlignStore::loadMASRfile()-- Failed to open '/home/geoff/CocoAssembly/CA_23/PBsuper.tigStore/seqDB.v002.p089.utg': magic number mismatch; file=0x00000000 code=0x5253414d

Mon Nov 18 14:45:45 geoff@RAMona:~/CocoAssembly/CA_23_wgs8/PBsuper.tigStore $ ll seqDB.v002.p089*
-rw-r--r-- 1 geoff users 65482752 Nov 14 08:35 seqDB.v002.p089.dat
-rw-r--r-- 1 root  root         0 Nov 14 08:36 seqDB.v002.p089.utg
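The error is consistent with the listing: the .utg file is empty, so no magic number can be read from it. As a hedged illustration, the expected "code" value decodes to the ASCII tag "MASR" when laid out as a little-endian 32-bit integer. The helper names are made up, and the assumption that the tag sits in the first four bytes of the file is mine.

```python
import os
import struct
import tempfile

EXPECTED_MAGIC = 0x5253414D  # "code" value from the error message

def magic_as_text(value):
    """Render a 32-bit magic number as its 4 bytes in little-endian order."""
    return struct.pack("<I", value).decode("ascii")

def read_magic(path):
    """Return the first 4 bytes as a little-endian uint32, or None if too short."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    return struct.unpack("<I", head)[0] if len(head) == 4 else None

print(magic_as_text(EXPECTED_MAGIC))

# Demo: a zero-byte file, like the seqDB.v002.p089.utg in the listing.
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "seqDB.v002.p089.utg")
    open(empty, "wb").close()
    empty_magic = read_magic(empty)

print(empty_magic)
```

The listing also shows the empty file owned by root rather than geoff, which may be worth checking before rerunning the partition.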



From: Walenz, Brian [mailto:bw...@jc...]
Sent: Monday, November 18, 2013 2:31 PM
To: Ole Kristian Tørresen
Cc: Waldbieser, Geoff; wgs...@li...
Subject: Re: [wgs-assembler-users] consensus failures in wgs-8.0

I've occasionally had trouble with the long (megabase-scale), moderately deep (75x) unitigs that can come from PacBio assemblies.  Yes, the occasional very deep unitigs (> 100x) caused by repeats/contaminants are now handled well.

The utgcns *.err logging now includes the depth of the unitig and the amount of sequence in contained reads:

Working on unitig 0 (0 unitigs and 88003 fragments)
  unitig 0 detected 85431 contains (70.68x, 90.86%) 2572 dovetail (7.11x, 9.14%)
    unitig 0 removing 85431 contains; processing only 2572 reads

In this case, I changed parameters (cnsReduceUnitigs=75 5 IIRC) to use only the dovetail reads for consensus.

http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=RunCA#Consensus


On 11/18/13 2:47 PM, "Ole Kristian Tørresen" <ol...@st...> wrote:
Hi Geoff and Brian.
Couldn't this be a very deep unitig? I often have trouble with this.

But when I'm looking at the code now, you seem to have fixed this, Brian.

What is the content of the last partition's .err file, Geoff?

Ole


On 18 November 2013 20:28, Walenz, Brian <bw...@jc...> wrote:
Hi, Geoff-

I just realized that my future-proofing increase of AS_READ_MAX_NORMAL_LEN_BITS from the usual default of 11 to 16 means that utgcns now needs 16gb memory to run.  With BITS=15, it needed 'only' 4gb.  Hard to tell if this is your problem.

b




On 11/15/13 7:34 PM, "Waldbieser, Geoff" <Geo...@AR...> wrote:
I have set up an assembly of PacBio long reads and Illumina single reads (84bp to 4kb length) with wgs-8.0. Consensus was partitioned into 122 files. After utgcnsfix errors, I have restarted wgs 3 times (after removing the 5-consensus/consensus.sh file). Each iteration fixes more jobs, but I have finally come to the last job that will not run. This data assembled in a few hours using subversion wgs_r4437.
___________________________________
Geoffrey C. Waldbieser
Research Molecular Biologist
Warmwater Aquaculture Research Unit
Agricultural Research Service
United States Department of Agriculture
Stoneville, MS 38776
(662) 686-3593






Re: [wgs-assembler-users] consensus failures in wgs-8.0

From: Walenz, B. <bw...@jc...> - 2013-11-18 20:31:13

I’ve occasionally had trouble with the long (megabase-scale), moderately deep (75x) unitigs that can come from PacBio assemblies. Yes, the occasional very deep unitigs (> 100x) caused by repeats/contaminants are now handled well.

The utgcns *.err logging now includes the depth of the unitig and the amount of sequence in contained reads:

Working on unitig 0 (0 unitigs and 88003 fragments)
unitig 0 detected 85431 contains (70.68x, 90.86%) 2572 dovetail (7.11x, 9.14%)
unitig 0 removing 85431 contains; processing only 2572 reads

In this case, I changed parameters (cnsReduceUnitigs=75 5 IIRC) to use only the dovetail reads for consensus.

http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=RunCA#Consensus
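The depth figures in that log line can be extracted and cross-checked with a short script. The regular expression is a sketch fitted to the single example line above, so other utgcns versions may format the line differently.

```python
import re

PATTERN = re.compile(
    r"unitig (?P<id>\d+) detected (?P<n_contains>\d+) contains "
    r"\((?P<cov_contains>[\d.]+)x, (?P<pct_contains>[\d.]+)%\) "
    r"(?P<n_dovetail>\d+) dovetail "
    r"\((?P<cov_dovetail>[\d.]+)x, (?P<pct_dovetail>[\d.]+)%\)"
)

line = ("unitig 0 detected 85431 contains (70.68x, 90.86%) "
        "2572 dovetail (7.11x, 9.14%)")

match = PATTERN.search(line)
total_reads = int(match["n_contains"]) + int(match["n_dovetail"])
total_cov = float(match["cov_contains"]) + float(match["cov_dovetail"])
contained_share = 100 * float(match["cov_contains"]) / total_cov

print(f"reads={total_reads} depth={total_cov:.2f}x "
      f"contained={contained_share:.2f}%")
```

The read total matches the 88003 fragments reported for the unitig, and the contained share matches the percentage in the log, so the line is internally consistent.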

On 11/18/13 2:47 PM, "Ole Kristian Tørresen" <ol...@st...> wrote:

Hi Geoff and Brian.
Couldn't this be a very deep unitig? I often have trouble with this.

But when I'm looking at the code now, you seem to have fixed this, Brian.

What is the content of the last partition's .err file, Geoff?

Ole

On 18 November 2013 20:28, Walenz, Brian <bw...@jc...> wrote:
Hi, Geoff-

I just realized that my future-proofing increase of AS_READ_MAX_NORMAL_LEN_BITS from the usual default of 11 to 16 means that utgcns now needs 16gb memory to run. With BITS=15, it needed ‘only’ 4gb. Hard to tell if this is your problem.

On 11/15/13 7:34 PM, "Waldbieser, Geoff" <Geo...@AR...> wrote:

I have set up an assembly of PacBio long reads and Illumina single reads (84bp to 4kb length) with wgs-8.0. Consensus was partitioned into 122 files. After utgcnsfix errors, I have restarted wgs 3 times (after removing the 5-consensus/consensus.sh file). Each iteration fixes more jobs, but I have finally come to the last job that will not run. This data assembled in a few hours using subversion wgs_r4437.
___________________________________
Geoffrey C. Waldbieser
Research Molecular Biologist
Warmwater Aquaculture Research Unit
Agricultural Research Service
United States Department of Agriculture
Stoneville, MS 38776
(662) 686-3593


Re: [wgs-assembler-users] consensus failures in wgs-8.0

From: Waldbieser, G. <Geo...@AR...> - 2013-11-18 20:21:18

I apologize, but I accidentally overwrote that data when starting a new assembly with wgs 7. I will recompile 8.0 with AS_READ_MAX_NORMAL_LEN_BITS = 15 and retry. That is the setting I have used successfully with wgs 7 versions, so the new default of 16 is likely the source of my problem. Thanks!

Geoff

From: ti...@gm... [mailto:ti...@gm...] On Behalf Of Ole Kristian Tørresen
Sent: Monday, November 18, 2013 1:47 PM
To: Walenz, Brian
Cc: Waldbieser, Geoff; wgs...@li...
Subject: Re: [wgs-assembler-users] consensus failures in wgs-8.0

Hi Geoff and Brian.
Couldn't this be a very deep unitig? I often have trouble with this.

But when I'm looking at the code now, you seem to have fixed this, Brian.

What is the content of the last partition's .err file, Geoff?

Ole

On 18 November 2013 20:28, Walenz, Brian <bw...@jc...> wrote:
Hi, Geoff-

I just realized that my future-proofing increase of AS_READ_MAX_NORMAL_LEN_BITS from the usual default of 11 to 16 means that utgcns now needs 16gb memory to run.  With BITS=15, it needed 'only' 4gb.  Hard to tell if this is your problem.

b

On 11/15/13 7:34 PM, "Waldbieser, Geoff" <Geo...@AR...> wrote:
I have set up an assembly of PacBio long reads and Illumina single reads (84bp to 4kb length) with wgs-8.0. Consensus was partitioned into 122 files. After utgcnsfix errors, I have restarted wgs 3 times (after removing the 5-consensus/consensus.sh file). Each iteration fixes more jobs, but I have finally come to the last job that will not run. This data assembled in a few hours using subversion wgs_r4437.
___________________________________
Geoffrey C. Waldbieser
Research Molecular Biologist
Warmwater Aquaculture Research Unit
Agricultural Research Service
United States Department of Agriculture
Stoneville, MS 38776
(662) 686-3593


Re: [wgs-assembler-users] consensus failures in wgs-8.0

From: Ole K. T. <ol...@st...> - 2013-11-18 19:47:29

Hi Geoff and Brian.
Couldn't this be a very deep unitig? I often have trouble with this.

But when I'm looking at the code now, you seem to have fixed this, Brian.

What is the content of the last partition's .err file, Geoff?

Ole


On 18 November 2013 20:28, Walenz, Brian <bw...@jc...> wrote:

>  Hi, Geoff-
>
> I just realized that my future-proofing increase of
> AS_READ_MAX_NORMAL_LEN_BITS from the usual default of 11 to 16 means that
> utgcns now needs 16gb memory to run.  With BITS=15, it needed ‘only’ 4gb.
>  Hard to tell if this is your problem.
>
> b
>
>
>
>
> On 11/15/13 7:34 PM, "Waldbieser, Geoff" <Geo...@AR...>
> wrote:
>
> I have set up an assembly of PacBio long reads and Illumina single reads
> (84bp to 4kb length) with wgs-8.0. Consensus was partitioned into 122
> files. After utgcnsfix errors, I have restarted wgs 3 times (after removing
> the 5-consensus/consensus.sh file). Each iteration fixes more jobs, but I
> have finally come to the last job that will not run. This data assembled in
> a few hours using subversion wgs_r4437.
> ___________________________________
> Geoffrey C. Waldbieser
> Research Molecular Biologist
> Warmwater Aquaculture Research Unit
> Agricultural Research Service
> United States Department of Agriculture
> Stoneville, MS 38776
> (662) 686-3593

Re: [wgs-assembler-users] consensus failures in wgs-8.0

From: Walenz, B. <bw...@jc...> - 2013-11-18 19:30:57

Hi, Geoff-

I just realized that my future-proofing increase of AS_READ_MAX_NORMAL_LEN_BITS from the usual default of 11 to 16 means that utgcns now needs 16gb memory to run.  With BITS=15, it needed ‘only’ 4gb.  Hard to tell if this is your problem.
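
The two figures quoted above are consistent with memory that grows with the square of the maximum read length. As a rough check, the arithmetic can be sketched as follows. The quadratic model and the 4-byte cell size are assumptions chosen because they reproduce the numbers in this message; they are not taken from the utgcns source, and the function name is hypothetical.

```python
def estimated_matrix_gib(len_bits: int, bytes_per_cell: int = 4) -> float:
    """Size in GiB of a square matrix with 2**len_bits rows and columns.

    Assumption: one cell of bytes_per_cell bytes per pair of read positions.
    The 4-byte cell is a guess that happens to match the quoted figures.
    """
    max_read_len = 1 << len_bits  # longest read representable in len_bits bits
    return max_read_len ** 2 * bytes_per_cell / 2 ** 30


for bits in (11, 15, 16):
    print(f"AS_READ_MAX_NORMAL_LEN_BITS={bits}: {estimated_matrix_gib(bits):g} GiB")
```

Under this model each extra bit quadruples the requirement, which would explain the jump from 4 GB at 15 bits to 16 GB at 16 bits.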

b



On 11/15/13 7:34 PM, "Waldbieser, Geoff" <Geo...@AR...> wrote:

I have set up an assembly of PacBio long reads and Illumina single reads (84bp to 4kb length) with wgs-8.0. Consensus was partitioned into 122 files. After utgcnsfix errors, I have restarted wgs 3 times (after removing the 5-consensus/consensus.sh file). Each iteration fixes more jobs, but I have finally come to the last job that will not run. This data assembled in a few hours using subversion wgs_r4437.
___________________________________
Geoffrey C. Waldbieser
Research Molecular Biologist
Warmwater Aquaculture Research Unit
Agricultural Research Service
United States Department of Agriculture
Stoneville, MS 38776
(662) 686-3593






[wgs-assembler-users] consensus failures in wgs-8.0

From: Waldbieser, G. <Geo...@AR...> - 2013-11-16 00:49:23

I have set up an assembly of PacBio long reads and Illumina single reads (84bp to 4kb length) with wgs-8.0. Consensus was partitioned into 122 files. After utgcnsfix errors, I have restarted wgs 3 times (after removing the 5-consensus/consensus.sh file). Each iteration fixes more jobs, but I have finally come to the last job that will not run. This data assembled in a few hours using subversion wgs_r4437.
___________________________________
Geoffrey C. Waldbieser
Research Molecular Biologist
Warmwater Aquaculture  Research Unit
Agricultural Research Service
United States Department of Agriculture
Stoneville, MS 38776
(662) 686-3593






Re: [wgs-assembler-users] CA 8.0 on large genome using PacBio

From: Walenz, B. <bw...@jc...> - 2013-11-15 19:32:15

Hi-

The obtStore.err is claiming that the inputs (*.ovb.gz) are not in gzip
format.  At this stage, all it is doing is 'gzip -l
0-overlaptrim-overlap/001/000462.ovb.gz' to see the true length of the
compressed file.  Gzip itself is complaining that the file is not gzip
format.  Are these files valid (non-empty)?  Were there any errors reported
in the overlap job output (0-overlaptrim-overlap/*.err)?  You can also clean
up this directory (remove *.err and the 001/ directory) and run a few jobs
by hand (sh overlap.sh 1, sh overlap.sh 2, etc) to see that they run OK.
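
Before rerunning jobs by hand, it may help to find out which overlap outputs are empty or not valid gzip. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the directory name and the *.ovb.gz pattern are taken from the message above.

```python
import gzip
import zlib
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def gzip_status(path) -> str:
    """Classify a file as 'empty', 'not gzip', 'corrupt' or 'ok'."""
    path = Path(path)
    if path.stat().st_size == 0:
        return "empty"
    with path.open("rb") as handle:
        if handle.read(2) != GZIP_MAGIC:
            return "not gzip"
    try:
        # Decompress the whole stream to catch truncated or damaged files.
        with gzip.open(path, "rb") as handle:
            while handle.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error):
        return "corrupt"
    return "ok"


def scan_overlap_outputs(directory, pattern="*.ovb.gz") -> dict:
    """Return {file path: status} for every matching file under directory."""
    return {str(p): gzip_status(p) for p in sorted(Path(directory).rglob(pattern))}


if __name__ == "__main__":
    root = Path("0-overlaptrim-overlap")
    if root.is_dir():
        for name, status in scan_overlap_outputs(root).items():
            if status != "ok":
                print(f"{status}\t{name}")
```

Any file reported as empty, not gzip or corrupt points at the overlap job that needs to be rerun.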

I don't see anything wrong in your spec, but would suggest some changes.

unitigger=bogart
utgGraphErrorRate=0.05
utgMergeErrorRate=0.05
batMemory=X (X in gigabytes)
batThreads=Y (default is to use all CPUs on the machine)

The bogart unitigger seems to work much better than 'bog', but it is more
expensive to run.  It needs to load non-best overlaps into memory.  If you
leave batMemory unset, it will default to using all physical memory on the
machine.

I increased the allowed error rate from 3% to 5%.  This should result in
better unitig construction, but can occasionally end up declaring that
similar unique sequence is a repeat and breaking the unitig.

doFragmentCorrection=0

This is an expensive step that can improve results.  For now, leave it off.
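
The suggested options can be collected into a spec fragment once values for X and Y are chosen. This is only a sketch: the helper name is hypothetical, and the 64 GB / 16 threads in the usage line are placeholder values, not recommendations from this thread.

```python
def bogart_spec_lines(memory_gb: int, threads: int) -> list:
    """Render the options suggested in this message as spec-file lines."""
    if memory_gb <= 0 or threads <= 0:
        raise ValueError("memory_gb and threads must be positive")
    return [
        "unitigger=bogart",
        "utgGraphErrorRate=0.05",
        "utgMergeErrorRate=0.05",
        f"batMemory={memory_gb}",   # X, in gigabytes
        f"batThreads={threads}",    # Y, number of CPUs bogart may use
        "doFragmentCorrection=0",
    ]


# Placeholder values; substitute what the machine actually has.
print("\n".join(bogart_spec_lines(memory_gb=64, threads=16)))
```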

b

On 11/13/13 11:07 AM, "mic...@ip..."
<mic...@ip...> wrote:

> Dear WGS developers
> 
> I am trying to assemble a subset of 2 Gbp of error-corrected PacBio reads with
> CA8.0 using SGE to get a feeling for the speed and resources needed for the
> assembly. The full dataset will be about 13.8 Gbp of error-corrected PacBio
> reads.
> Read lengths range from 790 up to 19,000 bp.
> I assume it's crucial to have optimal parameter settings in the spec file.
> Unfortunately, I can't find appropriate examples for my dataset.
>
> Currently I am stuck with an error in the obt store generation:
> Failure message:
>
> failed to build the obt store
>
> Would it be possible to provide a PacBio spec template and give me some hints
> about how to overcome the obt store failure?
> Attached are the error files (runCA.sge.out, the std.err and the
> obtStore.err) along with the spec file used.
> 
> Thank you very much,
> Michel

Re: [wgs-assembler-users] Typo in fastqToCA?

From: Walenz, B. <bw...@jc...> - 2013-11-13 18:07:05

You're not supposed to read the documentation!  :-)

Yes, pacbio-corrected should say the reads are output from pacBioToCA.  Cut-n-paste wins again.  Thanks.

________________________________________
From: Waldbieser, Geoff [Geo...@AR...]
Sent: Wednesday, November 13, 2013 12:55 PM
To: wgs...@li...
Subject: [wgs-assembler-users] Typo in fastqToCA?

Hi,
Under the technology flag in fastqToCA, it lists both pacbio-corrected and pacbio-raw as "uncorrected reads". Is this just a typo or a listing for a future fix?

-technology p      What instrument were these reads generated on ('illumina' is the default):
                       'none'               -- don't set any features; use -feature to set them manually
                       'sanger'             -- reads from dideoxy sequencers
                       '454'                -- reads from 454 Life Sciences; FLX, Titanium, FLX+
                       'illumina'           -- reads from Illumina; GAIIx, MiSeq, HiSeq; shorter than 160bp
                       'illumina-long'      -- reads from Illumina; GAIIx, MiSeq, HiSeq; any length
                       'pacbio-ccs'         -- reads from PacBio; Circular Consensus Sequence (CSS)
                       'pacbio-corrected'   -- reads from PacBio; uncorrected reads
                       'pacbio-raw'         -- reads from PacBio; uncorrected reads

Geoff


[wgs-assembler-users] Typo in fastqToCA?

From: Waldbieser, G. <Geo...@AR...> - 2013-11-13 17:55:11

Hi,
Under the technology flag in fastqToCA, it lists both pacbio-corrected and pacbio-raw as "uncorrected reads". Is this just a typo or a listing for a future fix?



-technology p      What instrument were these reads generated on ('illumina' is the default):
                       'none'               -- don't set any features; use -feature to set them manually
                       'sanger'             -- reads from dideoxy sequencers
                       '454'                -- reads from 454 Life Sciences; FLX, Titanium, FLX+
                       'illumina'           -- reads from Illumina; GAIIx, MiSeq, HiSeq; shorter than 160bp
                       'illumina-long'      -- reads from Illumina; GAIIx, MiSeq, HiSeq; any length
                       'pacbio-ccs'         -- reads from PacBio; Circular Consensus Sequence (CSS)
                       'pacbio-corrected'   -- reads from PacBio; uncorrected reads
                       'pacbio-raw'         -- reads from PacBio; uncorrected reads

Geoff






[wgs-assembler-users] CA 8.0 on large genome using PacBio

From: <mic...@ip...> - 2013-11-13 16:24:33

Attachments: peaxi-subset.obtStore.err peaxi-subset.spec2 runCA.sge.out.02 runCA.sh.e23349

Dear WGS developers

I am trying to assemble a subset of 2 Gbp of error-corrected PacBio reads with CA8.0 using SGE to get a feeling for the speed and resources needed for the assembly. The full dataset will be about 13.8 Gbp of error-corrected PacBio reads.
Read lengths range from 790 up to 19,000 bp.
I assume it's crucial to have optimal parameter settings in the spec file. Unfortunately, I can't find appropriate examples for my dataset.

Currently I am stuck with an error in the obt store generation:
Failure message:

failed to build the obt store

Would it be possible to provide a PacBio spec template and give me some hints about how to overcome the obt store failure?
Attached are the error files (runCA.sge.out, the std.err and the obtStore.err) along with the spec file used.

Thank you very much,
Michel

________________________________________
From: wgs...@li... [wgs...@li...]
Sent: Wednesday, 13 November 2013 16:31
To: Moser, Michel (IPS)
Subject: Mailman privacy alert

An attempt was made to subscribe your address to the mailing list
wgs...@li.... You are already subscribed to this mailing list.

Note that the list membership is not public, so it is possible that a bad
person was trying to probe the list for its membership. This would be a
privacy violation if we let them do this, but we didn't.

If you submitted the subscription request and forgot that you were already
subscribed to the list, then you can ignore this message. If you suspect that
an attempt is being made to covertly discover whether you are a member of this
list, and you are worried about your privacy, then feel free to send a message
to the list administrator at wgs...@li....

Re: [wgs-assembler-users] gap length

From: Walenz, B. <bw...@jc...> - 2013-11-11 18:54:14

The length=20 gaps mean that the mate pairs claim the adjacent contigs
overlap, but no sequence alignment could be found.  The .asm file contains
the true (negative) gap length in the CTP message.

http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=ASM_Files#SCF_CTP

mea: The mean distance gives the predicted number of bases in the gap
between the contigs. It is measured from contig end to contig end. A
negative distance indicates that the contigs overlap (according to their
aggregate mate pairs) though their consensus sequences do not align. In the
FASTA representation of a scaffold, negative gap lengths are represented
arbitrarily by 20 N's.
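
To see how many gaps in a scaffold FASTA sequence are exactly 20 N's long, the runs of N can be located with a regular expression. This is a minimal sketch with hypothetical function names and a made-up example sequence. A 20-N run is only a candidate: the CTP message in the .asm file remains the place to confirm that the gap length is really negative.

```python
import re


def n_gaps(sequence: str) -> list:
    """Return (start, length) for every run of N in the sequence."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"[Nn]+", sequence)]


def placeholder_gaps(sequence: str, placeholder_len: int = 20) -> list:
    """Keep only the gaps whose length equals the 20-N placeholder."""
    return [(start, length) for start, length in n_gaps(sequence)
            if length == placeholder_len]


# Made-up scaffold: one 20-N placeholder gap and one 135-N estimated gap.
scaffold = "ACGT" + "N" * 20 + "GGCC" + "N" * 135 + "TTAA"
print(n_gaps(scaffold))            # [(4, 20), (28, 135)]
print(placeholder_gaps(scaffold))  # [(4, 20)]
```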

b

On 11/11/13 1:07 PM, "Cristell Navarro"
<cri...@om...> wrote:

> Hi!
> 
> I'm looking for an explanation of why most of the gaps in my scaffolds
> have length = 20. Does this mean something special? Is this a minimum gap
> length that the assembler uses for gaps of unknown length?
>
> I hope you can help me with this issue, because I would like to submit
> my data to NCBI...
> 
> thanks in advance!
> 
> Cristell

[wgs-assembler-users] gap length

From: Cristell N. <cri...@om...> - 2013-11-11 18:07:34

Hi!

I'm looking for an explanation of why most of the gaps in my scaffolds
have length = 20. Does this mean something special? Is this a minimum gap
length that the assembler uses for gaps of unknown length?

I hope you can help me with this issue, because I would like to submit
my data to NCBI...

thanks in advance!

Cristell

Re: [wgs-assembler-users] CA8.0 download link

From: Walenz, B. <bw...@jc...> - 2013-10-28 19:46:08

Hi-

We hoped to release on Oct 1, but kept delaying because other projects are eating our time.  We want to generate a set of example assemblies as documentation before releasing.  For many reasons, we’re probably going to make a release this week without finishing all the examples.

You can grab all the current bits from svn until then.

https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile

b



On 10/27/13 3:40 PM, "Dave Messina" <on...@da...> wrote:

Hi,

On the main page here: http://sourceforge.net/apps/mediawiki/wgs-assembler/

CA 8.0 is listed as having been released on October 1, but [IN PROGRESS]. The download link (http://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.0/) is invalid and reverts to the main download directory where the CA 7.0 is the most recent version available.

Is CA 8.0 available for use, and if so could you please point me to it?


Thanks!

[wgs-assembler-users] CA8.0 download link

From: Dave M. <on...@da...> - 2013-10-27 20:11:45

Hi,

On the main page here: http://sourceforge.net/apps/mediawiki/wgs-assembler/

CA 8.0 is listed as having been released on October 1, but [IN PROGRESS].
The download link (
http://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.0/)
is invalid and reverts to the main download directory where the CA 7.0 is
the most recent version available.

Is CA 8.0 available for use, and if so could you please point me to it?


Thanks!

[wgs-assembler-users] correct-olaps error

From: Zhou Qi <zh...@be...> - 2013-10-26 05:14:10

Hi, 

I’m using the MaSuRCA assembler, which uses CA6.0 as its core, and I got the following error message. Any input would be very much appreciated. My command is:

runCA  gkpFixInsertSizes=0 jellyfishHashSize=10000000000 ovlRefBlockSize=1225516 ovlHashBlockSize=122551 ovlCorrBatchSize=40000000 utgErrorRate=0.03 merylMemory=8192 ovlMemory=8GB stopAfter=unitigger ovlMerThreshold=75 bogBreakAtIntersections=0 unitigger=bog bogBadMateDepth=1000000 -p genome -d CA merylThreads=32 frgCorrThreads=1 frgCorrConcurrency=48 cnsConcurrency=48 ovlCorrConcurrency=8 ovlConcurrency=48 ovlThreads=1 doFragmentCorrection=1 doOverlapBasedTrimming=1 doExtendClearRanges=2 ovlMerSize=22 superReadSequences_shr.frg /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/Genomic_reads/Dmir.female.454.frg f5.cor.clean.frg m5.cor.clean.frg f1.cor.clean.frg f2.cor.clean.frg m2.cor.clean.frg   1> runCA0.out 2>&1

and the error message is:

----------------------------------------END CONCURRENT Fri Oct 25 21:36:34 2013 (64 seconds)
Overlap correction job 1 (/jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/MaSuRCA-2.1.0/CA/3-overlapcorrection/0001) failed.
================================================================================

runCA failed.

----------------------------------------
Stack trace:

 at /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/runCA line 1121
	main::caFailure('1 overlap correction jobs failed; remove /jbods/data01/DATA/d...', undef) called at /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/runCA line 3019
	main::overlapCorrection() called at /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/runCA line 5343

----------------------------------------
Failure message:

1 overlap correction jobs failed; remove /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/MaSuRCA-2.1.0/CA/3-overlapcorrection/ovlcorr.sh (or run by hand) to try again

*** buffer overflow detected ***: /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps terminated
======= Backtrace: =========
/lib/libc.so.6(__fortify_fail+0x37)[0x7f474f7fcb47]
/lib/libc.so.6(+0xfea00)[0x7f474f7fba00]
/jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps[0x4030ee]
/jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps[0x40520c]
/lib/libc.so.6(__libc_start_main+0xfe)[0x7f474f71bd8e]
/jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps[0x402279]
======= Memory map: ========
00400000-00427000 r-xp 00000000 08:21 40083                              /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps
00626000-00627000 r--p 00026000 08:21 40083                              /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps
00627000-00628000 rw-p 00027000 08:21 40083                              /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/new/MaSuRCA-2.1.0/CA/Linux-amd64/bin/correct-olaps
00628000-00636000 rw-p 00000000 00:00 0 
01234000-68f4b000 rw-p 00000000 00:00 0                                  [heap]
7f46690d9000-7f46690da000 ---p 00000000 00:00 0 
7f46690da000-7f4669ada000 rw-p 00000000 00:00 0 
7f4669ada000-7f4669adb000 ---p 00000000 00:00 0 
7f4669adb000-7f466a4db000 rw-p 00000000 00:00 0 
7f466a4db000-7f466a4dc000 ---p 00000000 00:00 0 
7f466a4dc000-7f466aedc000 rw-p 00000000 00:00 0 
7f466b8dd000-7f474f6fd000 rw-p 00000000 00:00 0 
7f474f6fd000-7f474f877000 r-xp 00000000 08:31 918102                     /lib/libc-2.12.1.so
7f474f877000-7f474fa77000 ---p 0017a000 08:31 918102                     /lib/libc-2.12.1.so
7f474fa77000-7f474fa7b000 r--p 0017a000 08:31 918102                     /lib/libc-2.12.1.so
7f474fa7b000-7f474fa7c000 rw-p 0017e000 08:31 918102                     /lib/libc-2.12.1.so
7f474fa7c000-7f474fa81000 rw-p 00000000 00:00 0 
7f474fa81000-7f474fa96000 r-xp 00000000 08:31 917564                     /lib/libgcc_s.so.1
7f474fa96000-7f474fc95000 ---p 00015000 08:31 917564                     /lib/libgcc_s.so.1
7f474fc95000-7f474fc96000 r--p 00014000 08:31 917564                     /lib/libgcc_s.so.1
7f474fc96000-7f474fc97000 rw-p 00015000 08:31 917564                     /lib/libgcc_s.so.1
7f474fc97000-7f474fd19000 r-xp 00000000 08:31 918106                     /lib/libm-2.12.1.so
7f474fd19000-7f474ff18000 ---p 00082000 08:31 918106                     /lib/libm-2.12.1.so
7f474ff18000-7f474ff19000 r--p 00081000 08:31 918106                     /lib/libm-2.12.1.so
7f474ff19000-7f474ff1a000 rw-p 00082000 08:31 918106                     /lib/libm-2.12.1.so
7f474ff1a000-7f4750002000 r-xp 00000000 08:31 14684593                   /usr/lib/libstdc++.so.6.0.14
7f4750002000-7f4750201000 ---p 000e8000 08:31 14684593                   /usr/lib/libstdc++.so.6.0.14
7f4750201000-7f4750209000 r--p 000e7000 08:31 14684593                   /usr/lib/libstdc++.so.6.0.14
7f4750209000-7f475020b000 rw-p 000ef000 08:31 14684593                   /usr/lib/libstdc++.so.6.0.14
7f475020b000-7f4750220000 rw-p 00000000 00:00 0 
7f4750220000-7f4750248000 r-xp 00000000 08:21 34381                      /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/MaSuRCA-2.1.0/lib/libjellyfish-2.0.so.2.0.0
7f4750248000-7f4750448000 ---p 00028000 08:21 34381                      /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/MaSuRCA-2.1.0/lib/libjellyfish-2.0.so.2.0.0
7f4750448000-7f4750449000 r--p 00028000 08:21 34381                      /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/MaSuRCA-2.1.0/lib/libjellyfish-2.0.so.2.0.0
7f4750449000-7f475044a000 rw-p 00029000 08:21 34381                      /jbods/data01/DATA/dmiranda/assemblyV3/D.miranda/MaSuRCA/MaSuRCA-2.1.0/lib/libjellyfish-2.0.so.2.0.0
7f475044a000-7f4750462000 r-xp 00000000 08:31 918123                     /lib/libpthread-2.12.1.so
7f4750462000-7f4750661000 ---p 00018000 08:31 918123                     /lib/libpthread-2.12.1.so
7f4750661000-7f4750662000 r--p 00017000 08:31 918123                     /lib/libpthread-2.12.1.so
7f4750662000-7f4750663000 rw-p 00018000 08:31 918123                     /lib/libpthread-2.12.1.so
7f4750663000-7f4750667000 rw-p 00000000 00:00 0 
7f4750667000-7f4750687000 r-xp 00000000 08:31 918051                     /lib/ld-2.12.1.so
7f475085b000-7f4750861000 rw-p 00000000 00:00 0 
7f4750878000-7f4750887000 rw-p 00000000 00:00 0 
7f4750887000-7f4750888000 r--p 00020000 08:31 918051                     /lib/ld-2.12.1.so
7f4750888000-7f4750889000 rw-p 00021000 08:31 918051                     /lib/ld-2.12.1.so
7f4750889000-7f475088a000 rw-p 00000000 00:00 0 
7fff824ab000-7fff824cd000 rw-p 00000000 00:00 0                          [stack]
7fff82583000-7fff82584000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Re: [wgs-assembler-users] using contigs from sanger shotgun in the assembly.

From: Walenz, B. <bw...@jc...> - 2013-10-25 22:02:27

Q2: No.  Long a desired feature, but never enough resources to implement it.

Q1: If you load these preassemblies as reads, they will never be broken up.
What should then happen is the 40x of Illumina reads will have containment
overlaps to the preassemblies, and, hopefully, each preassembly will form a
single unitig with gobs of Illumina reads (with mates) inside it.
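
As a toy illustration of what a containment overlap means here, the sketch below treats a short read as contained when it, or its reverse complement, occurs exactly inside a longer preassembled sequence. This is a deliberate simplification: the assembler's overlapper tolerates sequencing errors, and the names and sequences below are made up for the example.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_contained(read: str, preassembly: str) -> bool:
    """True if the read (either strand) is an exact substring of preassembly."""
    read, preassembly = read.upper(), preassembly.upper()
    return read in preassembly or reverse_complement(read) in preassembly


# Made-up sequences: a long 'preassembly' and three short reads.
preassembly = "ACGTTGCATGCCGATAGGCTA"
for read in ("TGCATGCC", "GGCATGCA", "TTTTTTTT"):
    print(read, is_contained(read, preassembly))
```

The first read matches the forward strand, the second matches only as a reverse complement, and the third is not contained at all.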

I tried something similar for one of our Salmon assemblies.  It kind of
worked, but the repeats were so complicated that it could have easily
assembled a chimeric repeat.

Careful!  If you have any unique sequence at the end of the preassembled
repeat, this 'read' will look like a normal non-repetitive read and you'll
lose all the other copies of the repeat.

An alternative approach would be to use the preassemblies to remove repeat
reads from the assembly, then use a gap filler to walk through the resulting
scaffold gaps.

b

On 10/25/13 1:20 PM, "Mayank Mahajan" <may...@ic...> wrote:

> Hi Brian,
> I have preassembled Sanger data from the repeated regions in my
> assembly. The repeats are too numerous and too messy to handle
> manually. I know that I can provide them to the assembler as normal
> fragments using fastaToCA.
>
> Q1. Is there some way to give really high priority to these
> contig/unitig-sized fragments, since each of them is of much higher
> quality than the Illumina reads, which I use at 40X coverage?
>
> Q2. Is it possible to do backbone assembly?
> 
> Regards,
> Mayank

9 messages have been excluded from this view by a project administrator.
