#129 problem solexa assembly

Assembly_pipeline
pending
Feature (48)
5
2011-09-30
2011-09-28
Z-G LI
No

Dear all,
My Solexa reads are 100 bp paired-end (PE), which Celera Assembler supports. I assembled the data with CA, but the result is puzzling: the contigs and scaffolds total less than 1 Mb, and many sequences ended up in the .deg.fasta file. The genome should be about 40 Mb.

Here is my spec file:
#---------------------------------
overlapper=ovl
unitigger = bog
merylThreads = 30
ovlThreads = 30
cgwUseUnitigOverlaps=1
#--------------------------------
All other parameters were left at their defaults.

Could anybody tell me the reason?
I would appreciate any help.

Discussion

  • Jason Miller

    Jason Miller - 2011-09-30
    • labels: --> Feature
    • assigned_to: nobody --> jasonmiller9704
    • status: open --> pending
     
  • Jason Miller

    Jason Miller - 2011-09-30

    For genomes larger than a bacterial genome, increase the k-mer size (K) with the merSize parameter. The default is 14 for bacteria, and we use 22 for mammals. You might try a value of 18.

    For Illumina data, it is wise to set utgErrorLimit to 2.5 so that short overlaps with up to 2 errors can be used in low-coverage regions.

    Instead of unitigger=bog, you are welcome to try unitigger=bogart, but you would have to check out and build the latest version of the source code from CVS. Search for instructions on our SourceForge pages.
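
    As a rough sketch, the suggestions above could be folded into the original spec file as shown here. The merSize and utgErrorLimit values are only the starting points mentioned in this reply, not tuned settings, and unitigger is left at bog on the assumption that the CVS build needed for bogart is not yet available.

    ```
    #---------------------------------
    overlapper = ovl
    unitigger = bog
    # Suggested starting value for a genome larger than a bacterium
    merSize = 18
    # Allows short overlaps with up to 2 errors (Illumina data)
    utgErrorLimit = 2.5
    merylThreads = 30
    ovlThreads = 30
    cgwUseUnitigOverlaps = 1
    #--------------------------------
    ```

    If the contig total is still far below the expected genome size, merSize is probably the first value to vary, somewhere between 14 and 22.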

     
