
Main Page

From wgs-assembler

(Difference between revisions)
(14 intermediate revisions not shown)
Line 1: Line 1:
__NOTOC__
__NOTOC__
 +
<div style="float: left; width: 100%">
<div style="float: left; width: 100%">
'''Celera Assembler''' : scientific software for biological research. Celera Assembler is a ''de novo'' whole-genome shotgun (WGS) DNA sequence assembler. It reconstructs long sequences of genomic DNA from fragmentary data produced by [http://en.wikipedia.org/wiki/Shotgun_sequencing whole-genome shotgun sequencing]. Celera Assembler has enabled many advances in genomics, including the first whole genome shotgun sequence of a multi-cellular organism [http://www.sciencemag.org/cgi/content/abstract/287/5461/2196 (Myers 2000)] and the first diploid sequence of an individual human [http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pbio.0050254 (Levy 2007)]. Celera Assembler was developed at [http://www.celera.com Celera Genomics] starting in 1999. It was released to SourceForge in 2004 as the '''wgs-assembler''' under the GNU [http://www.gnu.org/licenses/gpl.html General Public License]. The pipeline revised for 454 data was named '''CABOG''' [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btn548 (Miller 2008)].
'''Celera Assembler''' : scientific software for biological research. Celera Assembler is a ''de novo'' whole-genome shotgun (WGS) DNA sequence assembler. It reconstructs long sequences of genomic DNA from fragmentary data produced by [http://en.wikipedia.org/wiki/Shotgun_sequencing whole-genome shotgun sequencing]. Celera Assembler has enabled many advances in genomics, including the first whole genome shotgun sequence of a multi-cellular organism [http://www.sciencemag.org/cgi/content/abstract/287/5461/2196 (Myers 2000)] and the first diploid sequence of an individual human [http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pbio.0050254 (Levy 2007)]. Celera Assembler was developed at [http://www.celera.com Celera Genomics] starting in 1999. It was released to SourceForge in 2004 as the '''wgs-assembler''' under the GNU [http://www.gnu.org/licenses/gpl.html General Public License]. The pipeline revised for 454 data was named '''CABOG''' [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btn548 (Miller 2008)].
Line 7: Line 8:
*'''pyrosequencing''' platforms such as the [http://454.com/index.asp 454 Life Sciences] [http://454.com/products-solutions/system-benefits.asp Genome Sequencer FLX Titanium] and [http://www.gsjunior.com/ GS Junior].<br>(Reads from the discontinued Genome Sequencer FLX before Titanium reagents and Genome Sequencer 20 are supported as well.)
*'''pyrosequencing''' platforms such as the [http://454.com/index.asp 454 Life Sciences] [http://454.com/products-solutions/system-benefits.asp Genome Sequencer FLX Titanium] and [http://www.gsjunior.com/ GS Junior].<br>(Reads from the discontinued Genome Sequencer FLX before Titanium reagents and Genome Sequencer 20 are supported as well.)
*'''sequencing by synthesis''' platforms such as the [http://www.illumina.com/ Illumina] [http://www.illumina.com/systems/hiseq_2000.ilmn HiSeq 2000], [http://www.illumina.com/systems/genome_analyzer_iix.ilmn Genome Analyzer IIx] and [http://www.illumina.com/systems/genome_analyzer.ilmn Genome Analyzer IIe].<br>(Reads shorter than 75bp are not supported.)
*'''sequencing by synthesis''' platforms such as the [http://www.illumina.com/ Illumina] [http://www.illumina.com/systems/hiseq_2000.ilmn HiSeq 2000], [http://www.illumina.com/systems/genome_analyzer_iix.ilmn Genome Analyzer IIx] and [http://www.illumina.com/systems/genome_analyzer.ilmn Genome Analyzer IIe].<br>(Reads shorter than 75bp are not supported.)
-
*'''single-molecule sequencing''' platforms such as the [http://www.pacificbiosciences.com/ Pacific Biosciences] [http://www.pacificbiosciences.com/products PacBio RS] (after correction using the [[pacBioToCA]] pipeline.)
+
*'''single-molecule sequencing''' platforms such as the [http://www.pacificbiosciences.com/ Pacific Biosciences] [http://www.pacificbiosciences.com/products PacBio RS] (after correction using the [[PBcR]] pipeline.)
</div>
</div>
<div style="float: left; width: 50%">
<div style="float: left; width: 50%">
== Resources ==
== Resources ==
 +
 +
=== Downloads ===
 +
 +
*  [[Requirements]] for running Celera Assembler.
 +
 +
*  Celera Assembler 8.1 was released on December 16th, 2013.
 +
** [http://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.1/ download page]
 +
** [[Version 8.1 Release Notes |release notes]]
 +
** [[Version 8.1 Release Errata | errata]].
 +
 +
*  [[Check out and Compile | Check out and compile]] the latest unreleased version from the [http://sourceforge.net/p/wgs-assembler/svn/HEAD/tree/trunk/src/ subversion repository].
 +
 +
*  List of all [[Release History | released versions]] with release notes and errata.
=== User guides ===
=== User guides ===
-
*Celera Assembler [[Celera Assembler Terminology | Terminology]] and [[Celera Assembler Theory | Theory]].
+
* Celera Assembler [[Celera Assembler Terminology | Terminology]] and [[Celera Assembler Theory | Theory]].
-
*[[runCA]], [[RunCA Dissection]], [[RunCA Examples]], [[SpecFiles]], [[Utilities]].
+
* [[runCA]], the main program for running Celera Assembler.
-
*[[Help | Get Help]] from the [[Developers]].
+
[[SpecFiles | Spec files]], how to configure a Celera Assembler run.
-
*Report bugs. Please use [http://sourceforge.net/tracker/?limit=50&func=&group_id=106905&atid=645639 Bug Tracker] instead of Email.
+
[[Best Practices]]
-
*[http://sourceforge.net/tracker/?limit=50&func=&group_id=106905&atid=645642 Request Features].
+
[[RunCA Dissection]], a step-by-step explanation of what is going on (out of date, but still generally applicable).
 +
[[Utilities]].
 +
 
 +
* [[Help | Get Help]] from the [[Developers]].
 +
 
 +
* [https://sourceforge.net/p/wgs-assembler/bugs/ Report bugs].
 +
 
 +
=== Examples ===
 +
 
 +
*[[Yersinia pestis KIM D27, using 454 8 Kbp mated reads, with CA8.1]] (with [[Yersinia pestis KIM D27, using 454 8 Kbp mated reads, with CA8 | CA8.0]])
 +
*[[Yersinia pestis KIM D27, using Illumina paired-end reads, with CA8.1]] (with [[Yersinia pestis KIM D27, using Illumina paired-end reads, with CA8 | CA8.0]])
 +
*[[Porphyromonas gingivalis W83, using 454 3 Kbp mated reads, with CA8.1]] (with [[Porphyromonas gingivalis W83, using 454 3 Kbp mated reads, with CA8 | CA8.0]])
 +
*[[PacBioToCA#Self-Correction_With_C2_Sequences_.28or_newer.29 | Escherichia coli K12 MG1655, using corrected PacBio reads with CA8.1]]
 +
*[[Escherichia coli K12 MG1655, using uncorrected PacBio reads, with CA8.1]] (with [[Escherichia coli K12 MG1655, using uncorrected PacBio reads, with CA8 | CA8.0]])
 +
*[[Homo sapiens, J. Craig Venter, using Sanger reads, with CA8]]
 +
*Older [[RunCA Examples | examples]]
=== Input formats ===
=== Input formats ===
-
The Celera Assembler expects input fragment data to be in the [[FRG Files |FRG format]].  We provide several utilities for converting
+
The Celera Assembler expects input fragment data to be in the [[FRG Files |FRG format]].  We provide several utilities for converting a variety of data types into this format:
-
a variety of data types into this format:
+
*[[fastaToCA]] - converts sequence and quality values in fasta format.
*[[fastaToCA]] - converts sequence and quality values in fasta format.
Line 30: Line 58:
*[[sffToCA]] - converts 454 SFF files into FRG format, optionally searching each read for 'linker' sequence indicating the read is a pair of mated reads.
*[[sffToCA]] - converts 454 SFF files into FRG format, optionally searching each read for 'linker' sequence indicating the read is a pair of mated reads.
*[[fastqToCA]] - generates a FRG file that allows direct loading of Illumina FastQ files.
*[[fastqToCA]] - generates a FRG file that allows direct loading of Illumina FastQ files.
-
*[[pacBioToCA]] - A correction pipeline for PacBio RS sequencing data. Uses short-read technologies to generate high-accuracy consensus for PacBio RS sequences. The output is a FRG file (along with fasta and qual).
+
*[[pacBioToCA]] - A correction pipeline for PacBio RS sequencing data. Uses only PacBio RS sequences or short-read technologies to generate high-accuracy consensus. The output is a FRG file (along with fasta and qual).
=== Output formats ===
=== Output formats ===
Line 38: Line 66:
*[[POSMAP]] = Positional maps in perl-friendly text files.
*[[POSMAP]] = Positional maps in perl-friendly text files.
*[[FASTA Files]] = With consensus sequence and quality values.
*[[FASTA Files]] = With consensus sequence and quality values.
-
 
-
=== Downloads ===
 
-
 
-
Start by downloading a tested release package. Releases include pre-compiled binaries for Linux. Adventurous users are welcome to check out any version of the source code (including what is currently in development), compile it, and hope for the best.
 
-
* Download the latest!  [http://sourceforge.net/project/showfiles.php?group_id=106905 Packages Database].
 
-
* List of [[Release History | release packages]] with release notes and errata.
 
-
* Latest version: 6.1, released 30 March 2010 ([[Version 6.1 Release Notes |release notes]], [[Version 6.1 Release Errata | errata]]).
 
-
* [[Check out and Compile | check out and compile]] the source code from the [http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/ CVS repository]
 
-
* [[Requirements]] for running Celera Assembler.
 
</div>
</div>
<div style="float: left; width: 50%">
<div style="float: left; width: 50%">
 +
== Events ==
== Events ==
 +
 +
=== CA 8.1 Release ===
 +
 +
Celera Assembler 8.1 was released on 16 December, 2013.  [http://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.1/ Download]. [[Version 8.1 Release Notes | Release notes]]. [[Version 8.1 Changes | Change log]]. [[Version 8.1 Release Errata | Errata]].
 +
 +
=== CA 8.0 Release ===
 +
 +
Celera Assembler 8.0 was released on 5 November, 2013.  [http://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.0/ Download]. [[Version 8.0 Release Notes | Release notes]]. [[Version 8.0 Changes | Change log]]. [[Version 8.0 Release Errata | Errata]].
=== CA 7.0 Release ===
=== CA 7.0 Release ===
-
Celera Assembler 7.0 was released on January 12, 2012.  [http://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-7.0/ Download]. [[Version 7.0 Release Notes | Read the release notes]]. [[Version 7.0 Changes | See the change log]]. [[Version 7.0 Known Problems | Find any known problems]].
+
Celera Assembler 7.0 was released on January 12, 2012.  [http://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-7.0/ Download]. [[Version 7.0 Release Notes | Release notes]]. [[Version 7.0 Changes | Change log]]. [[Version 7.0 Release Errata | Errata]]. See also [[Best Practices]].
=== Mailing List ===
=== Mailing List ===
Line 66: Line 94:
=== CA 6.1 Release ===
=== CA 6.1 Release ===
-
Celera Assembler 6.1 was released on April 30th, 2010.  This is the first version with support for Illumina sequence data. See [http://sourceforge.net/project/showfiles.php?group_id=106905 Releases], [[FastqToCA | fastq]] support, [[Version 6.1 Release Notes | release notes]], the [[Version 6.1 Changes | change log]], [[Version 6.1 Known Problems | known problems]], and [[Version 6.1 Test Results | test results]].
+
Celera Assembler 6.1 was released on April 30th, 2010.  This is the first version with support for Illumina sequence data. See [http://sourceforge.net/project/showfiles.php?group_id=106905 Releases], [[FastqToCA | fastq]] support, [[Version 6.1 Release Notes | release notes]], the [[Version 6.1 Changes | change log]], [[Version 6.1 Release Errata | errata]], and [[Version 6.1 Test Results | test results]].
=== Internship Opportunity ===
=== Internship Opportunity ===
Line 103: Line 131:
*Miller, Koren, Sutton (2010) [http://www.ncbi.nlm.nih.gov/pubmed/20211242 Assembly algorithms for next-generation sequencing data.] Genomics, March 6.
*Miller, Koren, Sutton (2010) [http://www.ncbi.nlm.nih.gov/pubmed/20211242 Assembly algorithms for next-generation sequencing data.] Genomics, March 6.
*Miller et al. (2010) Bonobo genome <i>de novo</i> assembly generated by CABOG.  [[Bonobo Poster]] ISMB, Boston
*Miller et al. (2010) Bonobo genome <i>de novo</i> assembly generated by CABOG.  [[Bonobo Poster]] ISMB, Boston
-
*Dalloul et al. (2010) [https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page&action=edit&section=10 Multi-platform next-generation sequencing of domestic turkey (''Meleagris gallopavo'')], PLoS Biology
+
*Dalloul et al. (2010) [http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000475 Multi-platform next-generation sequencing of domestic turkey (''Meleagris gallopavo'')], PLoS Biology
*Kirkness et al. (2010) Genome sequence of the [http://www.ncbi.nlm.nih.gov/pubmed/20566863 human body louse] Science, July
*Kirkness et al. (2010) Genome sequence of the [http://www.ncbi.nlm.nih.gov/pubmed/20566863 human body louse] Science, July
*Koren, Miller, Walenz, Sutton (2010) [http://www.ncbi.nlm.nih.gov/pubmed/20831800 Automated Closure Algorithm] BMC Bioinformatics, September
*Koren, Miller, Walenz, Sutton (2010) [http://www.ncbi.nlm.nih.gov/pubmed/20831800 Automated Closure Algorithm] BMC Bioinformatics, September

Current revision as of 22:50, 10 January 2014


Celera Assembler: scientific software for biological research. Celera Assembler is a de novo whole-genome shotgun (WGS) DNA sequence assembler. It reconstructs long sequences of genomic DNA from fragmentary data produced by whole-genome shotgun sequencing. Celera Assembler has enabled many advances in genomics, including the first whole-genome shotgun sequence of a multi-cellular organism (Myers 2000) and the first diploid sequence of an individual human (Levy 2007). Celera Assembler was developed at Celera Genomics starting in 1999. It was released to SourceForge in 2004 as the wgs-assembler under the GNU General Public License. The pipeline revised for 454 data was named CABOG (Miller 2008).

Celera Assembler can use any combination of reads from:

Resources

Downloads

User guides

Examples

Input formats

The Celera Assembler expects input fragment data to be in the FRG format. We provide several utilities for converting a variety of data types into this format:

  • fastaToCA - converts sequence and quality values in fasta format.
  • tracearchiveToCA - converts xml, qual and fasta from the NCBI TraceDB into FRG format.
  • sffToCA - converts 454 SFF files into FRG format, optionally searching each read for 'linker' sequence indicating the read is a pair of mated reads.
  • fastqToCA - generates a FRG file that allows direct loading of Illumina FastQ files.
  • pacBioToCA - a correction pipeline for PacBio RS sequencing data. It generates high-accuracy consensus for PacBio RS sequences, using either the PacBio RS sequences alone or reads from short-read technologies. The output is a FRG file (along with fasta and qual).
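
Because fastqToCA loads Illumina FastQ files directly, and this page states that sequencing-by-synthesis reads shorter than 75bp are not supported, it may help to check read lengths before building a FRG file. The following is a minimal Python sketch, not part of Celera Assembler: the function names are hypothetical, and it assumes plain four-line FastQ records (no wrapped sequence lines).

```python
import io
from typing import Iterator, TextIO, Tuple

# Taken from this page: reads shorter than 75bp are not supported.
MIN_READ_LENGTH = 75


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FastQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FastQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality


def count_short_reads(handle: TextIO, min_length: int = MIN_READ_LENGTH) -> Tuple[int, int]:
    """Return (total_reads, reads_shorter_than_min_length)."""
    total = 0
    short = 0
    for _name, sequence, _quality in read_fastq(handle):
        total += 1
        if len(sequence) < min_length:
            short += 1
    return total, short


if __name__ == "__main__":
    # Tiny in-memory example: one 80bp read and one 50bp read.
    example = (
        "@read1\n" + "A" * 80 + "\n+\n" + "I" * 80 + "\n"
        "@read2\n" + "C" * 50 + "\n+\n" + "I" * 50 + "\n"
    )
    total, short = count_short_reads(io.StringIO(example))
    print(f"{short} of {total} reads are shorter than {MIN_READ_LENGTH}bp")
```

To check a real file, open it in text mode and pass the handle to count_short_reads. How short reads should then be handled (trimmed, dropped, or left to the assembler) is outside the scope of this sketch.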

Output formats

  • ASM Files = The Celera Assembler native output format.
  • QC Metrics = The statistical summary.
  • POSMAP = Positional maps in perl-friendly text files.
  • FASTA Files = With consensus sequence and quality values.

Events

CA 8.1 Release

Celera Assembler 8.1 was released on 16 December, 2013. Download. Release notes. Change log. Errata.

CA 8.0 Release

Celera Assembler 8.0 was released on 5 November, 2013. Download. Release notes. Change log. Errata.

CA 7.0 Release

Celera Assembler 7.0 was released on January 12, 2012. Download. Release notes. Change log. Errata. See also Best Practices.

Mailing List

Users of Celera Assembler are encouraged to sign up for the wgs-assembler-users mailing list. The list is intended for discussion of how to use Celera Assembler. We'll announce new releases, new features and bug fixes there too. Bug reports should still be submitted to the bug tracker.

User Group Meeting: Jan 2012

The J. Craig Venter Institute will host the CAUG 2012 Celera Assembler User Group Meeting Thursday & Friday, 12-13 January 2012. Contact us about registration (ATGatJCVIdotORG). The format will be similar to the CAUG 2010 of 26-27 August 2010. Thanks to all 30 participants from around the world, and to the U.S. National Institute of General Medical Sciences (NIGMS) for funding.

CA 6.1 Release

Celera Assembler 6.1 was released on April 30th, 2010. This is the first version with support for Illumina sequence data. See Releases, fastq support, release notes, the change log, errata, and test results.

Internship Opportunity

The J. Craig Venter Institute will hire summer interns to work on a variety of scientific endeavors, including the Celera Assembler software. Students at the graduate, undergraduate, and high school levels should apply through the JCVI Internship Program. Funding for Celera Assembler internships is provided by a grant from the National Institute of General Medical Sciences (NIGMS). It is too late to apply for a summer 2011 position, so please apply for future terms.

Publications

Sponsors
