Version 5.0 Release Notes
This is Celera Assembler v5.0.
Full documentation, including instructions for downloading the latest version, compiling, and running, is in runCA50.pdf.
Please see the wiki or the project page for help.
Note that this distribution already includes the kmer package.
When referencing this work, please cite the following papers.
Arthur L. Delcher, Sergey Koren, Jason R. Miller, Eli Venter, Brian P. Walenz, Anushka Brownley, Justin Johnson, Kelvin Li, Clark Mobarry, and Granger Sutton.
Aggressive Assembly of Pyrosequencing Reads with Mates.
Istrail et al. (2004)
Whole-Genome Shotgun Assembly and Comparison of Human Genome Assemblies. PNAS 101 1916-1921.
Myers et al. (2000)
A Whole-Genome Assembly of Drosophila.
Science 287 2196-2204.
Venter et al. (2001)
The Sequence of the Human Genome.
Science 291 1304-1351.
For the impatient, CA can be compiled with:
% cd kmer
% sh configure.sh
% gmake install
% cd ../src
% gmake
% cd ..
Download fragments from NCBI TraceDB, convert to Celera Assembler .frg format, assemble.
% ftp ftp.ncbi.nih.gov
ftp> cd pub/TraceDB/porphyromonas_gingivalis_w83
ftp> ls -l
227 Entering Passive Mode (130,14,29,30,196,62).
150 Opening ASCII mode data connection for file list
-r--r--r--   1 ftp      anonymous   651216 Apr 13  2003 anc.porphyromonas_gingivalis_w83.001.gz
-r--r--r--   1 ftp      anonymous   211115 Apr 13  2003 clip.porphyromonas_gingivalis_w83.001.gz
-r--r--r--   1 ftp      anonymous  9345640 Apr 13  2003 fasta.porphyromonas_gingivalis_w83.001.gz
-r--r--r--   1 ftp      anonymous 16063422 Apr 13  2003 qual.porphyromonas_gingivalis_w83.001.gz
-r--r--r--   1 ftp      anonymous   923704 Apr 13  2003 xml.porphyromonas_gingivalis_w83.001.gz
226 Transfer complete.
ftp> bin
ftp> prompt
ftp> mget fasta* qual* xml*
ftp> bye
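For scripted downloads, the interactive session above could be replaced by something like the following sketch. It assumes that curl is installed and that the three files keep the names shown in the listing. tracedb_urls is a hypothetical helper, not part of Celera Assembler.

```shell
# Sketch only: non-interactive equivalent of "mget fasta* qual* xml*".
# Assumes the file names shown in the listing above; tracedb_urls is a
# hypothetical helper, not part of the Celera Assembler distribution.
organism=porphyromonas_gingivalis_w83
base=ftp://ftp.ncbi.nih.gov/pub/TraceDB/$organism

tracedb_urls() {
    # Print one URL for each file type fetched by mget above.
    for kind in fasta qual xml; do
        printf '%s/%s.%s.001.gz\n' "$base" "$kind" "$organism"
    done
}

# To fetch the files (assumes curl is available), uncomment:
# tracedb_urls | while read -r url; do curl -O "$url"; done
```

Printing the URLs first makes it easy to check the list before any download is started.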
Convert the NCBI files to Celera Assembler input. The last step takes about a minute.
% perl wgs/Linux-amd64/bin/tracedb-to-frg.pl -xml xml.porphyromonas_gingivalis_w83.001.gz
% perl wgs/Linux-amd64/bin/tracedb-to-frg.pl -lib xml.porphyromonas_gingivalis_w83.001.gz
% perl wgs/Linux-amd64/bin/tracedb-to-frg.pl -frg xml.porphyromonas_gingivalis_w83.001.gz
% ls -l
-rw-rw-r--  1 bwalenz tigr     1183 Jun 20 01:17 porphyromonas_gingivalis_w83.1.lib.frg
-rw-rw-r--  1 bwalenz tigr 17922184 Jun 20 01:18 porphyromonas_gingivalis_w83.2.001.frg.bz2
-rw-rw-r--  1 bwalenz tigr   685205 Jun 20 01:17 porphyromonas_gingivalis_w83.3.lkg.frg
% perl wgs/Linux-amd64/bin/runCA -p pging -d testassembly fakeUIDs=1 porphyromonas_gingivalis_w83.*
% ls -l testassembly/9-terminator/*fasta
-rw-rw-r--  1 bwalenz tigr 2345968 Jun 20 01:31 testassembly/9-terminator/pging.ctgcns.fasta
-rw-rw-r--  1 bwalenz tigr   94320 Jun 20 01:31 testassembly/9-terminator/pging.degcns.fasta
-rw-rw-r--  1 bwalenz tigr 2349011 Jun 20 01:31 testassembly/9-terminator/pging.scfcns.fasta
-rw-rw-r--  1 bwalenz tigr  563326 Jun 20 01:31 testassembly/9-terminator/pging.singleton.fasta
-rw-rw-r--  1 bwalenz tigr 2624848 Jun 20 01:31 testassembly/9-terminator/pging.utgcns.fasta
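A quick sanity check on the assembly output is to count the sequences and bases in each FASTA file. The sketch below uses only awk. fasta_stats is a hypothetical helper, not a Celera Assembler tool, and it assumes ordinary FASTA files with '>' header lines.

```shell
# Sketch only: print "<number of sequences> <total bases>" for one FASTA file.
# fasta_stats is a hypothetical helper, not part of Celera Assembler.
fasta_stats() {
    awk '
        /^>/ { n++; next }                  # header line: count one sequence
        {
            gsub(/[ \t\r]/, "")             # ignore stray whitespace
            bases += length($0)             # sequence line: count its letters
        }
        END { printf "%d %d\n", n, bases }
    ' "$1"
}
```

For example, `fasta_stats testassembly/9-terminator/pging.scfcns.fasta` would report the number of scaffolds and their combined length.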
CHANGES since version 4.4
Major bug fixes and new features
- Clean up 454 reads. Reads with N's are now discarded. Some changes to the trimming heuristics seem to work better. The options 'discardReadsWithNs', 'doNotQVTrim' and 'goodBadQVThreshold' apply.
- extendClearRanges can now skip or only process specific scaffolds or gaps. This is helpful in progressing past rare failures in the algorithm.
- Preliminary support for 454 mated reads has been added.
- Significant improvements to the Best Overlap Graph (BOG) unitigger.
Minor bug fixes
- Input files may now be listed in the spec file.
- Update many exit codes from -1 to EXIT_FAILURE (== 1). SGE interprets an exit code of -1 as the job wanting to restart itself, which results in infinite loops.
- The SGE restart switch was removed from qsub commands.
- When loading fragments, replace invalid sequence/quality values with valid 'n'/0. The fragment is loaded, but marked as deleted. Previously, gatekeeper would fail.
- One letter too many was being included in the library name when loading 454 reads. This resulted in a single half-plate having 5 libraries instead of 1.
- Add support for dumping Newbler-ready *.fna and *.fna.qual files.
- Seeds were not being output symmetrically; only A-vs-B was reported, not both A-vs-B and B-vs-A. This caused the seed extender (olap-from-seeds) to see only half the overlaps when correcting sequencing errors.
- A couple of non-critical memory leaks were fixed.
- An out-of-bounds array access was fixed.
- Contained reads were incorrectly given a low corrected error rate, because the bad alignment at the end of the read was removed (making the read no longer contained).
- An off-by-one error was sometimes extending alignments one base past the clear range.
- The mer limit threshold was tuned; more overlaps are now discovered.
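To illustrate the seed symmetry fix listed above: if every seed between reads A and B is reported in both directions, a consumer that looks up overlaps by the first read sees all of them. A minimal sketch of that idea follows. It works on a plain two-column list of read pairs, which is an assumed toy format and not the overlapper's actual output, and symmetrize_pairs is a hypothetical name.

```shell
# Sketch only: emit both A-vs-B and B-vs-A for every input pair.
# Input: one "A B" pair per line on stdin (toy format, an assumption).
symmetrize_pairs() {
    awk 'NF >= 2 {
        print $1, $2                 # the pair as given
        if ($1 != $2) print $2, $1   # the mirrored pair
    }' | LC_ALL=C sort -u            # drop duplicates already present in both directions
}
```

With this in place, a lookup keyed on either read of a pair finds the overlap, which is the property the seed extender relies on.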
Overlap-based trimming
- Fragments with no high-quality non-vector sequence were incorrectly trimmed to the high-quality vector sequence.
- Improve logging of actions taken.
- Improve performance of several components (initialTrim, prefixDelete, merge-trimming).
- The chimera detection was sometimes generating fragments smaller than the minimum allowed size.
- The chimera detection routines were tuned and are more sensitive now.
- Stability improvements in ContigContainment().
- Improve stability in extendClearRanges by using an alternate alignment method.
- Sometimes, consensus would move the placement of fragments to produce a better alignment. This movement could break containment relationships, which would later cause a failure. We now detect and correct this.
- Place temporary files in a temporary directory.
- Use the library name (SEQ_LIB_ID) as the UID, instead of creating a hopefully unique integer.
- Allow extracting just one library.
Copyright 1999-2004, Applera Corporation. All rights reserved.
Copyright 2005-2008, J. Craig Venter Institute.
The Celera Assembler Software (the "Software") is covered by one or more U.S. patents and is being made available free of charge by Applera Corporation subject to the terms and conditions of the GNU General Public License, version 2, as published by the Free Software Foundation (the "GNU General Public License"). The Software is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License. The Software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTIES, EXPRESS OR IMPLIED (INCLUDING, WITHOUT LIMITATION, ANY WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE). You should have received (LICENSE.txt) a copy of the GNU General Public License along with the Software; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA