Dear all,
My Solexa reads are 100 bp paired-end, which Celera Assembler supports. I assembled the data with CA, but the result makes no sense to me: the contigs and scaffolds total less than 1 Mb, and many sequences ended up in the .deg.fasta file. The genome should be ~40 Mb.
Here is my spec file:
#---------------------------------
overlapper=ovl
unitigger = bog
merylThreads = 30
ovlThreads = 30
cgwUseUnitigOverlaps=1
#--------------------------------
All other parameters were left at their defaults.
Could anybody tell me the reason?
I would appreciate any help.
For genomes larger than a bacterium, increase K (the k-mer size) with the merSize parameter. The default is 14, which suits bacteria, and we use 22 for mammals. You might try a value of 18.
For Illumina data, it is wise to set utgErrorLimit to 2.5 so that short overlaps with up to 2 errors can be used in low-coverage regions.
Instead of unitigger=bog, you are welcome to try unitigger=bogart, but you would have to check out and build the latest version of the source code from CVS. Search for instructions on our SourceForge pages.
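Putting the first two suggestions together with your existing settings, the spec file might look like the sketch below. The merSize of 18 is only a starting point to experiment with, not a value tested on your data, and unitigger is left at bog so it works without rebuilding from CVS.
#---------------------------------
overlapper=ovl
unitigger = bog
# larger k-mer size for a ~40 Mb genome (suggested starting value)
merSize = 18
# allow short overlaps with up to 2 errors in low-coverage regions
utgErrorLimit = 2.5
merylThreads = 30
ovlThreads = 30
cgwUseUnitigOverlaps=1
#--------------------------------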