Celera Assembler: scientific software for biological research. Celera Assembler is a de novo whole-genome shotgun (WGS) DNA sequence assembler. It reconstructs long sequences of genomic DNA from the fragmentary data produced by whole-genome shotgun sequencing. Celera Assembler has enabled many advances in genomics, including the first whole-genome shotgun sequence of a multi-cellular organism (Myers 2000) and the first diploid sequence of an individual human (Levy 2007). Celera Assembler was developed at Celera Genomics starting in 1999. It was released on SourceForge in 2004 as the wgs-assembler under the GNU General Public License. The pipeline revised for 454 data was named CABOG (Miller 2008).
Celera Assembler can use any combination of reads from:
- Yersinia pestis KIM D27, using 454 8 Kbp mated reads, with CA8.1 (with CA8.0)
- Yersinia pestis KIM D27, using Illumina paired-end reads, with CA8.1 (with CA8.0)
- Porphyromonas gingivalis W83, using 454 3 Kbp mated reads, with CA8.1 (with CA8.0)
- Escherichia coli K12 MG1655, using corrected PacBio reads with CA8.1
- Escherichia coli K12 MG1655, using uncorrected PacBio reads, with CA8.1 (with CA8.0)
- Older examples
The Celera Assembler expects input fragment data to be in the FRG format. We provide several utilities for converting a variety of data types into this format:
- [FastaToCA] - converts sequence and quality values in fasta format into FRG format.
- [TracearchiveToCA] - converts xml, qual and fasta from the NCBI TraceDB into FRG format.
- [SffToCA] - converts 454 SFF files into FRG format, optionally searching each read for 'linker' sequence indicating the read is a pair of mated reads.
- [FastqToCA] - generates a FRG file that allows direct loading of Illumina FastQ files.
- [PacBioToCA] - a correction pipeline for PacBio RS sequencing data. It uses either PacBio RS sequences alone or reads from short-read technologies to generate high-accuracy consensus sequences. The output is a FRG file (along with fasta and qual files).
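Before handing a FastQ file to a converter, it can help to confirm that the file is well formed. The following is a minimal Python sketch, not part of Celera Assembler: `read_fastq` is a hypothetical helper name, and it assumes the common four-lines-per-record FastQ layout (no wrapped sequence or quality lines). It checks only that each record has its `@` and `+` marker lines and that the sequence and quality strings have equal length.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) tuples from a four-line FastQ stream.

    Hypothetical helper for illustration; assumes unwrapped records.
    Raises ValueError on a malformed record.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FastQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


if __name__ == "__main__":
    # Tiny in-memory example; a real file would be opened with open(path).
    sample = io.StringIO("@read1\nACGT\n+\nIIII\n")
    for name, seq, qual in read_fastq(sample):
        print(name, len(seq))
```

Reading records by position rather than by searching for `@` avoids misreading a quality line that happens to begin with `@`, which is a legal quality character.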
CA 8.1 Release
Celera Assembler 8.1 was released on 16 December, 2013. Download. Release notes. Change log. Errata.
CA 8.0 Release
Celera Assembler 8.0 was released on 5 November, 2013. Download. Release notes. Change log. Errata.
CA 7.0 Release
Celera Assembler 7.0 was released on January 12, 2012. Download. Release notes. Change log. Errata. See also [Best_Practices].
Users of Celera Assembler are encouraged to sign up for the wgs-assembler-users mailing list. The list is intended for discussion of using Celera Assembler. We will also announce new releases, new features, and bug fixes there. Bug reports should still be submitted to the bug tracker.
User Group Meeting: Jan 2012
The J. Craig Venter Institute will host the [CAUG_2012] Celera Assembler User Group Meeting on Thursday and Friday, 12-13 January 2012. Contact us about registration (ATGatJCVIdotORG). The format will be similar to that of [CAUG_2010], held 26-27 August 2010. Thanks to all 30 participants from around the world, and to the U.S. National Institute of General Medical Sciences (NIGMS) for funding.
CA 6.1 Release
Celera Assembler 6.1 was released on April 30th, 2010. This is the first version with support for Illumina sequence data. See Releases, fastq support, release notes, the change log, errata, and test results.
The J. Craig Venter Institute will hire summer interns to work on a variety of scientific endeavors, including the Celera Assembler software. Students at the graduate, undergraduate, and high school levels should apply through the JCVI Internship Program. Funding for Celera Assembler internships is provided by a grant from the National Institute of General Medical Sciences (NIGMS). It is too late to apply for a summer 2011 position, so please apply for a future term.
- Myers et al. (2000) A Whole-Genome Assembly of Drosophila. Science 287 2196-2204.
- Venter et al. (2001) The Sequence of the Human Genome. Science 291 1304-1351.
- Mural et al. (2002) A Comparison of Whole-Genome Shotgun-Derived Mouse Chromosome 16 and the Human Genome. Science 296 1661-1671.
- Holt et al. (2002) The Genome Sequence of the Malaria Mosquito Anopheles gambiae. Science 298 129-149.
- Zdobnov et al. (2002) Comparative Genome and Proteome Analysis of Anopheles gambiae and Drosophila melanogaster. Science 298 149-159.
- Fasulo et al. (2002) Efficiently detecting polymorphisms during the fragment assembly process. Bioinformatics 18 Supp(1):S294-302
- Istrail et al. (2004) Whole-Genome Shotgun Assembly and Comparison of Human Genome Assemblies. PNAS 101 1916-1921.
- Venter et al. (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304 66-74.
- Goldberg et al. (2006) A Sanger/pyrosequencing hybrid approach for the generation of high-quality draft assemblies of marine microbial genomes. PNAS 103(43):16057
- Rhesus Macaque Consortium (2007) Evolutionary and Biomedical Insights from the Rhesus Macaque Genome. Science 316 222-234.
- Ghedin et al. (2007) Draft Genome of the Filarial Nematode Parasite Brugia malayi. Science, September 21.
- Carlton et al. (2007) Draft Genome Sequence of the Sexually Transmitted Pathogen Trichomonas vaginalis. Science 315 207-212.
- Rusch et al. (2007) The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biology 1821060.
- Levy et al. (2007) The Diploid Genome Sequence of an Individual Human. PLoS Biology 0050254.
- Denisov et al. (2008) Consensus Generation and Variant Detection by Celera Assembler. Bioinformatics 24(8):1035-40
- Miller et al. (2008) Aggressive Assembly of Pyrosequencing Reads with Mates. Bioinformatics 24(24):2818-2824
- Salzberg et al. (2008) Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae BMC Genomics.
- Zimin et al. (2009) A whole-genome assembly of the domestic cow, Bos taurus. Genome Biology 10:R42
- Miller et al. (2009) Shotgun Assembly of a Repetitive Plant Genome. [Cucumber_Poster]
- Rausch et al. (2009) A consistency-based consensus algorithm for de novo... Bioinformatics 25(9):1118-1124
- Chapman et al. (2010) The dynamic genome of Hydra Nature March 14.
- Shulaev et al. (2010) The genome of woodland strawberry Nature Genetics December 26.
- Spanu et al. (2010) Genome expansion in powdery mildew fungi Science, December 10.
- Lorenzi et al. (2010) New assembly of Entamoeba histolytica. PLoS Neglected Tropical Diseases, June 15.
- Miller, Koren, Sutton (2010) Assembly algorithms for next-generation sequencing data. Genomics, March 6.
- Miller et al. (2010) Bonobo genome de novo assembly generated by CABOG. [Bonobo_Poster] ISMB, Boston
- Dalloul et al. (2010) Multi-platform next-generation sequencing of domestic turkey (Meleagris gallopavo). PLoS Biology.
- Kirkness et al. (2010) Genome sequence of the human body louse Science, July
- Koren, Miller, Walenz, Sutton (2010) Automated Closure Algorithm BMC Bioinformatics, September
- Lawniczak et al. (2010) Widespread Divergence Between Incipient Anopheles gambiae Species Revealed by Whole Genome Sequences. Science, October.
- Nelson, Weinstock et al. (2010) A Catalog of Reference Genomes from the Human Microbiome. Science, May 21.
- Inskeep, Rusch et al. (2010) Metagenomes from High-Temperature Chemotrophic Systems. PLoS One, March.
- O'Neal, Dzurisin et al. (2010) Population-level transcriptome sequencing of nonmodel organisms Erynnis propertius and Papilio zelicaon. BMC Genomics.
- Jones et al. (2011) Genomic insights into the physiology and ecology of the marine filamentous cyanobacterium Lyngbya majuscula. PNAS, 10.1073.
- Miller, Hayes et al. (2011) Genetic diversity and population structure of the endangered marsupial Sarcophilus harrisii (Tasmanian devil). PNAS, June.
- Wóycicki, Witkowicz et al. (2011) The Genome Sequence of the North-European Cucumber (Cucumis sativus L.). PLoS One, July.
- Star, Nederbragt et al. (2011) The genome sequence of Atlantic cod reveals a unique immune system. Nature, August.
- Wang, Chen et al. (2011) The draft genome of the carcinogenic human liver fluke Clonorchis sinensis. Genome Biology, October.
- Walenz, Sutton, Miller (2011) [Pair_classification_within_Illumina_mate_pair_data], Cold Spring Harbor Genome Informatics, November 2-5, 2011.
- Gillespie et al. (2012) A Rickettsia Genome Overrun by Mobile Genetic Elements Provides Insight into the Acquisition of Genes Characteristic of an Obligate Intracellular Lifestyle. Journal of Bacteriology, January.
- Prüfer et al. (2012) The bonobo genome compared with the chimpanzee and human genomes. Nature, June 2012.
- Koren, Schatz, Walenz et al. (2012) Hybrid error correction and de novo assembly of single-molecule sequencing reads, Nature Biotechnology, July 2012.
- Tatti et al. (2013) Draft Genome Sequences of Bordetella holmesii Strains. ASM Genome Announcements.