Main Page
Celera Assembler: scientific software for biological research. Celera Assembler is a de novo whole-genome shotgun (WGS) DNA sequence assembler. It reconstructs long sequences of genomic DNA from fragmentary data produced by whole-genome shotgun sequencing. Celera Assembler has enabled many advances in genomics, including the first whole-genome shotgun sequence of a multicellular organism (Myers 2000) and the first diploid sequence of an individual human (Levy 2007). Celera Assembler was developed at Celera Genomics starting in 1999. It was released to SourceForge in 2004 as the wgs-assembler under the GNU General Public License. The pipeline revised for 454 data was named CABOG (Miller 2008).
Celera Assembler can use any combination of reads from:
- dideoxy (Sanger) sequencing platforms such as the Applied Biosystems 3730 DNA Analyzer and 3730xl DNA Analyzer
- pyrosequencing platforms such as the 454 Life Sciences Genome Sequencer FLX Titanium and GS Junior. (Reads from the discontinued Genome Sequencer FLX before Titanium reagents and Genome Sequencer 20 are supported as well.)
- sequencing by synthesis platforms such as the Illumina HiSeq 2000, Genome Analyzer IIx and Genome Analyzer IIe. (Reads shorter than 75bp are not supported.)
- single-molecule sequencing platforms such as the Pacific Biosciences PacBio RS (after correction with a complementary technology using the pacBioToCA pipeline).
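The 75bp minimum noted above for sequencing-by-synthesis reads can be checked before any conversion is attempted. The following is a minimal sketch, not part of Celera Assembler: the function name `count_short_reads` and the constant `MIN_READ_LENGTH` are made up for illustration, and the code assumes a plain FASTQ stream with exactly four lines per record.

```python
import io

# Threshold taken from the note above; the name is an illustrative assumption.
MIN_READ_LENGTH = 75


def count_short_reads(fastq_handle, min_length=MIN_READ_LENGTH):
    """Return (total_reads, short_reads) for a four-line-per-record FASTQ stream."""
    total = 0
    short = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of input
        sequence = fastq_handle.readline().strip()
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored here
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        total += 1
        if len(sequence) < min_length:
            short += 1
    return total, short


# Two synthetic reads: one of 80 bases and one of 50 bases.
example = io.StringIO(
    "@read1\n" + "A" * 80 + "\n+\n" + "I" * 80 + "\n"
    "@read2\n" + "C" * 50 + "\n+\n" + "I" * 50 + "\n"
)
total, short = count_short_reads(example)
print(f"{short} of {total} reads are shorter than {MIN_READ_LENGTH} bp")
```

To check a real file, pass an open file handle instead of the in-memory example.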
Resources
User guides
- Celera Assembler Terminology and Theory.
- runCA, RunCA Dissection, RunCA Examples, SpecFiles, Utilities.
- Get Help from the Developers.
- Report bugs. Please use Bug Tracker instead of Email.
- Request Features.
Input formats
The Celera Assembler expects input fragment data to be in the FRG format. We provide several utilities for converting a variety of data types into this format:
- fastaToCA - converts sequence and quality values in fasta format.
- tracearchiveToCA - converts xml, qual and fasta from the NCBI TraceDB into FRG format.
- sffToCA - converts 454 SFF files into FRG format, optionally searching each read for 'linker' sequence indicating the read is a pair of mated reads.
- fastqToCA - generates a FRG file that allows direct loading of Illumina FastQ files.
- pacBioToCA - A correction pipeline for PacBio RS sequencing data. Uses short-read technologies to generate high-accuracy consensus for PacBio RS sequences. The output is a FRG file (along with fasta and qual).
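Several of the converters above take sequence and quality values as separate inputs, so the two must agree read by read. The sketch below only illustrates that pairing check. It is not how fastaToCA works and it does not write FRG. It assumes the quality file uses FASTA-style `>` headers followed by space-separated integer scores, and the function names `read_records` and `pair_reads` are hypothetical.

```python
import io


def read_records(handle):
    """Yield (name, body_lines) for each '>'-headed record in a FASTA-style stream."""
    name, body = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, body
            fields = line[1:].split()
            # The read name is the first whitespace-delimited token of the header.
            name, body = (fields[0] if fields else ""), []
        elif name is None:
            raise ValueError("data found before the first '>' header")
        else:
            body.append(line)
    if name is not None:
        yield name, body


def pair_reads(fasta_handle, qual_handle):
    """Return {read_name: (sequence, [quality, ...])}, checking that lengths agree."""
    sequences = {name: "".join(body) for name, body in read_records(fasta_handle)}
    paired = {}
    for name, body in read_records(qual_handle):
        if name not in sequences:
            raise KeyError(f"quality record {name!r} has no matching sequence")
        qualities = [int(q) for q in " ".join(body).split()]
        if len(qualities) != len(sequences[name]):
            raise ValueError(
                f"{name}: {len(sequences[name])} bases but {len(qualities)} quality values"
            )
        paired[name] = (sequences[name], qualities)
    missing = sorted(set(sequences) - set(paired))
    if missing:
        raise KeyError(f"sequences without quality values: {missing}")
    return paired


# Small synthetic inputs; real files would be opened with open(...).
fasta = io.StringIO(">read1 example\nACGT\nAC\n>read2\nGGT\n")
qual = io.StringIO(">read1\n30 30 28 25\n20 20\n>read2\n40 40 10\n")
reads = pair_reads(fasta, qual)
print(reads["read1"])
```

Running a check like this before conversion surfaces mismatched or missing quality records early.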
Output formats
- ASM Files = The Celera Assembler native output format.
- QC Metrics = The statistical summary.
- POSMAP = Positional maps in perl-friendly text files.
- FASTA Files = With consensus sequence and quality values.
Downloads
Start by downloading a tested release package. Releases include pre-compiled binaries for Linux. Adventurous users are welcome to check out any version of the source code (including what is currently in development), compile it, and hope for the best.
- Download the latest! Packages Database.
- List of release packages with release notes and errata.
- Latest version: 6.1, released 30 March 2010 (release notes, errata).
- Check out and compile the source code from the CVS repository.
- Requirements for running Celera Assembler.
Events
CA 7.0 Release
Celera Assembler 7.0 was released on January 12, 2012. Download. Read the release notes. See the change log. Find any known problems.
Mailing List
Users of Celera Assembler are encouraged to sign up to the wgs-assembler-users mailing list. The list is intended for discussion of using Celera Assembler. We'll announce new releases, new features and bug fixes there too. Bugs should still be reported to the bug tracker.
User Group Meeting: Jan 2012
The J. Craig Venter Institute will host the CAUG 2012 Celera Assembler User Group Meeting Thursday & Friday, 12-13 January 2012. Contact us about registration (ATGatJCVIdotORG). The format will be similar to the CAUG 2010 of 26-27 August 2010. Thanks to all 30 participants from around the world, and to the U.S. National Institute of General Medical Sciences (NIGMS) for funding.
CA 6.1 Release
Celera Assembler 6.1 was released on April 30th, 2010. This is the first version with support for Illumina sequence data. See Releases, fastq support, release notes, the change log, known problems, and test results.
Internship Opportunity
The J. Craig Venter Institute will hire summer interns to work on a variety of scientific endeavors including the Celera Assembler software. Students at the graduate, undergraduate, and high school levels should apply through the JCVI Internship Program. Funding for Celera Assembler internships is provided by a grant from the National Institute of General Medical Sciences (NIGMS). It is too late to apply for a summer 2011 position, so please apply for a future term.
Publications
- Myers et al. (2000) A Whole-Genome Assembly of Drosophila. Science 287 2196-2204.
- Venter et al. (2001) The Sequence of the Human Genome. Science 291 1304-1351.
- Mural et al. (2002) A Comparison of Whole-Genome Shotgun-Derived Mouse Chromosome 16 and the Human Genome. Science 296 1661-1671.
- Holt et al. (2002) The Genome Sequence of the Malaria Mosquito Anopheles gambiae. Science 298 129-149.
- Zdobnov et al. (2002) Comparative Genome and Proteome Analysis of Anopheles gambiae and Drosophila melanogaster. Science 298 149-159.
- Fasulo et al. (2002) Efficiently detecting polymorphisms during the fragment assembly process. Bioinformatics 18 Supp(1):S294-302
- Istrail et al. (2004) Whole-Genome Shotgun Assembly and Comparison of Human Genome Assemblies. PNAS 101 1916-1921.
- Venter et al. (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304 66-74.
- Goldberg et al. (2006) A Sanger/pyrosequencing hybrid approach for the generation of high-quality draft assemblies of marine microbial genomes. PNAS 103(43):16057
- Rhesus Macaque Consortium (2007) Evolutionary and Biomedical Insights from the Rhesus Macaque Genome. Science 316 222-234.
- Ghedin et al. (2007) Draft Genome of the Filarial Nematode Parasite Brugia malayi. Science, September 21.
- Carlton et al. (2007) Draft Genome Sequence of the Sexually Transmitted Pathogen Trichomonas vaginalis. Science 315 207-212.
- Rusch et al. (2007) The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biology 1821060.
- Levy et al. (2007) The Diploid Genome Sequence of an Individual Human. PLoS Biology 0050254.
- Denisov et al. (2008) Consensus Generation and Variant Detection by Celera Assembler. Bioinformatics 24(8):1035-40
- Miller et al. (2008) Aggressive Assembly of Pyrosequencing Reads with Mates. Bioinformatics 24(24):2818-2824
- Zimin et al. (2009) A whole-genome assembly of the domestic cow, Bos taurus. Genome Biology 10:R42
- Miller et al. (2009) Shotgun Assembly of a Repetitive Plant Genome. Cucumber Poster
- Rausch et al. (2009) A consistency-based consensus algorithm for de novo... Bioinformatics 25(9):1118-1124
- Chapman et al. (2010) The dynamic genome of Hydra. Nature, March 14.
- Shulaev et al. (2010) The genome of woodland strawberry. Nature Genetics, December 26.
- Spanu et al. (2010) Genome expansion in powdery mildew fungi. Science, December 10.
- Lorenzi et al. (2010) New assembly of Entamoeba histolytica PLoS Neglected Tropical Diseases, June 15
- Miller, Koren, Sutton (2010) Assembly algorithms for next-generation sequencing data. Genomics, March 6.
- Miller et al. (2010) Bonobo genome de novo assembly generated by CABOG. Bonobo Poster ISMB, Boston
- Dalloul et al. (2010) Multi-platform next-generation sequencing of domestic turkey (Meleagris gallopavo), PLoS Biology
- Kirkness et al. (2010) Genome sequence of the human body louse Science, July
- Koren, Miller, Walenz, Sutton (2010) Automated Closure Algorithm BMC Bioinformatics, September
- Lawniczak et al. (2010) Widespread Divergence Between Incipient Anopheles gambiae Species Revealed by Whole Genome Sequences. Science, October.
- Nelson, Weinstock et al. (2010) A Catalog of Reference Genomes from the Human Microbiome. Science, May 21.
- Inskeep, Rusch et al. (2010) Metagenomes from High-Temperature Chemotrophic Systems. PLoS One, March.
- O'Neal, Dzurisin et al. (2010) Population-level transcriptome sequencing of nonmodel organisms Erynnis propertius and Papilio zelicaon. BMC Genomics.
- Jones et al. (2011) Genomic insights into the physiology and ecology of the marine filamentous cyanobacterium Lyngbya majuscula. PNAS, 10.1073.
- Miller, Hayes et al. (2011) Genetic diversity and population structure of the endangered marsupial Sarcophilus harrisii (Tasmanian devil). PNAS, June.
- Wóycicki, Witkowicz et al. (2011) The Genome Sequence of the North-European Cucumber (Cucumis sativus L.) PLoS one, July.
- Star, Nederbragt et al. (2011) The genome sequence of Atlantic cod reveals a unique immune system. Nature, August.
- Wang, Chen et al. (2011) The draft genome of the carcinogenic human liver fluke Clonorchis sinensis. Genome Biology, October.
- Walenz, Sutton, Miller (2011) Pair classification within Illumina mate pair data, Cold Spring Harbor Genome Informatics, November 2-5, 2011.
- Gillespie et al. (2012) A Rickettsia Genome Overrun by Mobile Genetic Elements Provides Insight into the Acquisition of Genes Characteristic of an Obligate Intracellular Lifestyle. Journal of Bacteriology, January.
- Prüfer, et al. (2012) The bonobo genome compared with the chimpanzee and human genomes, Nature, June 2012.
- Koren, Schatz, Walenz et al. (2012) Hybrid error correction and de novo assembly of single-molecule sequencing reads, Nature Biotechnology, July 2012.
Sponsors
- The National Institute of General Medical Sciences (NIGMS), NIH; Grant 2R01-GM077117-04A1.
- The J. Craig Venter Institute (JCVI)
- The University of Maryland Center for Bioinformatics and Computational Biology (CBCB)
- Historical
- The J. Craig Venter Science Foundation (JCVSF)
- The Institute for Genomic Research (TIGR)
- Celera Genomics and Applied Biosystems
- The National Institute of Allergy and Infectious Disease (NIAID), NIH; Contract N01-AI-30071, "Microbial Genome Centers" (MSC).
- The National Institute of Allergy and Infectious Disease (NIAID), NIH; Contract HHSN266200400038C, "Bioinformatics Resource Centers for Biodefense and Emerging/Re-emerging Infectious Diseases" (BRC).
