Whole-Genome Shotgun Assembler / Feature Requests / #129 problem solexa assembly

#129 problem solexa assembly

Milestone: Assembly_pipeline

Status: pending

Owner: Jason Miller

Labels: Feature (48)

Priority: 5

Updated: 2011-09-30

Created: 2011-09-28

Creator: Z-G LI

Private: No

Dear all,
My Solexa reads are 100 bp paired-end, which Celera Assembler (CA) supports. I assembled the data with CA, but the result is puzzling: the contigs and scaffolds total less than 1 Mbp, and many sequences ended up in the .deg.fasta file. The genome should be about 40 Mbp.

Here is my spec file:
#---------------------------------
overlapper=ovl
unitigger = bog
merylThreads = 30
ovlThreads = 30
cgwUseUnitigOverlaps=1
#--------------------------------
Other parameters were default.

Could anybody tell me what went wrong?
I would appreciate any help.

Discussion

Jason Miller - 2011-09-30

labels: --> Feature

assigned_to: nobody --> jasonmiller9704

status: open --> pending
Jason Miller - 2011-09-30

For genomes larger than a bacterial genome, increase the k-mer size (K) with the merSize parameter. The default, 14, is intended for bacteria; we use 22 for mammals. You might try a value of 18.

For Illumina data, it is wise to set utgErrorLimit to 2.5 so that short overlaps with up to 2 errors can still be used in low-coverage regions.

Instead of unitigger=bog, you are welcome to try unitigger=bogart, but you will have to check out and build the latest version of the source code from CVS. Instructions are on our SourceForge pages.
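
Put together, the suggestions above could be added to the original spec file roughly as follows. This is only a sketch: merSize=18 is the starting value suggested in this thread, not a tested optimum for this data set, and the thread counts and other settings are simply carried over from the original post.

```
#---------------------------------
overlapper = ovl
unitigger = bog
# k-mer size: default is 14 (bacteria), 22 is used for mammals; 18 is a value to try
merSize = 18
# let the unitigger use short overlaps with up to 2 errors in low-coverage regions
utgErrorLimit = 2.5
merylThreads = 30
ovlThreads = 30
cgwUseUnitigOverlaps = 1
#---------------------------------
```

Changing unitigger to bogart in this file is only an option after building the latest source from CVS, as noted above.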
