
Main Page

MIGRATION

We are in the process of migrating to GitHub. Check out our new site at http://seqware.github.com. The SourceForge site is still available but should be considered deprecated. GitHub is our current source repository; please clone or fork from there. Source code checked in to SourceForge will not be considered part of the canonical SeqWare codebase. This wiki is still mostly accurate but should be considered partially deprecated. SeqWare developers should add docs to the seqware-distribution/docs folder on GitHub and not here.

About

SeqWare currently provides four tools specifically designed to support massively parallel sequencing technologies (Illumina, ABI SOLiD, 454). The first is a LIMS-like web application (SeqWare Portal) to manage samples, record computational events, and present results back to end users. The second is a pipeline (SeqWare Pipeline), which consists of many different programs useful for processing and annotating sequence data. These can be combined with other tools (BFAST, BWA, SAMtools, etc.) and strung together to form more complex workflows that support many experiment types. Third, a query tool (SeqWare Query Engine) is available to store and query variants and other events inferred from sequence data. Finally, SeqWare MetaDB provides a common database to store metadata used by all components. All four tools can be used together or separately. SeqWare is currently used by a variety of NGS users, including the Lineberger Comprehensive Cancer Center at UNC (for TCGA), OICR (for ICGC and other sequencing projects), and Nimbus Informatics.

The SeqWare project was created by Brian O'Connor, who is the current project lead. You can contact him at briandoconnor at gmail dot com. See the "About SeqWare" section for information on citing the project.

Users interested in support contracts, local or cloud installations, or custom implementations of workflows should take a look at our exclusive commercial partner, Nimbus Informatics. They offer SeqWare-based services on Amazon's cloud, including whole human exome/genome analysis services.

[Figure: SeqWare project overview (seqware_project_overview.png)]

For more information please see:

SeqWare is released under the GNU General Public License v3.

The SeqWare SourceForge developer site is located at http://sourceforge.net/projects/seqware.

Follow @SeqWare on Twitter


News

Features

  • A centralized metadata database that tracks samples, annotations, and analyses (SeqWare MetaDB) and a web application (SeqWare Portal) to visualize it
  • A module specification and execution engine that lets you package computational tools and use them to build and run complex analytical workflows (SeqWare Pipeline)
  • Support for running workflows irrespective of the underlying cluster environment thanks to the use of Pegasus, Condor, and the Globus Toolkit (SeqWare Pipeline)
  • An advanced query engine (SeqWare Query Engine) that allows you to store and search variants, coverage, and annotations produced in your workflows using either a simple (BerkeleyDB) or distributed (HBase) database backend
  • Three ways to run SeqWare tools: as a standalone virtual machine (VirtualBox), as an on-demand cluster on Amazon's EC2 (StarCluster), or installed on your own cluster and web/database servers
  • More Features...

Installing SeqWare

For a walkthrough of setting up SeqWare at a genome sequencing center, please see Deploying SeqWare at UNC. It is a good read for a big-picture view of how SeqWare can be used as infrastructure at a large institution.

There are three ways to install and use SeqWare:

  1. Download and run a standalone virtual machine using VirtualBox, which is free for all platforms. See Using the SeqWare VM. This is the recommended route since it is the quickest and easiest way to get started, or
  2. Use StarCluster plus our configuration and plugins to configure a SeqWare cluster on Amazon's EC2 cloud. See Using SeqWare on EC2, or
  3. Install the SeqWare components on your own infrastructure. This is more work and more complex than the previous two options and requires Linux admin expertise, but it gives you more control:
    1. First, get the code from Subversion here
    2. Set up SeqWare MetaDB
    3. Set up SeqWare Portal
    4. Set up SeqWare WebService
    5. Set up SeqWare Pipeline: also see Creating a SeqWare VM for valuable information on setting up SeqWare Pipeline dependencies (Pegasus/Globus/GRAM/SGE) on CentOS 6.
    6. Set up SeqWare Query Engine with BerkeleyDB: BerkeleyDB is a good choice for small databases, prototyping, and testing
    7. or Set up SeqWare Query Engine with PostgreSQL: information on using PostgreSQL as a backend for the SeqWare Query Engine. This is easier to set up than HBase and has better performance than BerkeleyDB but is still a work in progress.
    8. or Set up SeqWare Query Engine with HBase: information on using HBase as a backend for the SeqWare Query Engine. This is much more difficult to set up but is capable of providing substantial scalability (HBase, Hadoop, HDFS) and enhanced analytical options (Map/Reduce).

For more information see the SeqWare Installation Guides.

Using SeqWare

Once you have followed an installation path, you can follow these guides to get started using the SeqWare tools:

Reference Manuals

For more information see the SeqWare User Guides.

Reporting Tools

  • Study Reporter: Create a nested tree structure of all of the output files from a particular sample, or from all of the samples in a study.
  • Sequencer Run Reporter: Gives you a view of all the sequencer runs, lanes, and barcodes and the associated analysis processing events.
  • Workflow Run Reporter: Find the identity and library samples, and the input and output files, for one or more workflow runs.
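
To make the idea of a "nested tree structure" of output files concrete, here is a minimal Python sketch. It is not the Study Reporter's implementation or output format: the record layout (study, sample, workflow run, file path) and the function names `build_file_tree` and `render` are assumptions made purely for illustration.

```python
def build_file_tree(records):
    """Group (study, sample, workflow_run, file_path) rows into nested dicts.

    The four-field record layout is a hypothetical stand-in for whatever
    the MetaDB actually returns.
    """
    tree = {}
    for study, sample, run, path in records:
        tree.setdefault(study, {}).setdefault(sample, {}).setdefault(run, []).append(path)
    return tree


def render(tree, indent=0):
    """Return the tree as a list of indented text lines, sorted at each level."""
    lines = []
    for key in sorted(tree):
        lines.append("  " * indent + str(key))
        child = tree[key]
        if isinstance(child, dict):
            lines.extend(render(child, indent + 1))
        else:
            # Leaf level: a list of file paths for one workflow run.
            lines.extend("  " * (indent + 1) + path for path in sorted(child))
    return lines


if __name__ == "__main__":
    # Made-up example rows; real studies, samples, and paths will differ.
    example = [
        ("StudyA", "S1", "run1", "/a/1.bam"),
        ("StudyA", "S1", "run1", "/a/0.bam"),
        ("StudyA", "S2", "run2", "/a/2.vcf"),
    ]
    print("\n".join(render(build_file_tree(example))))
```

Restricting the input rows to a single sample gives the per-sample view; passing every row for a study gives the study-wide view.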

Import Tools

  • FileLinker: Import files into the MetaDB and link them with IUSes or lanes.

Metadata Tools

  • AttributeAnnotator: Annotate items in the MetaDB with 'skip' or key-value pairs (as of 0.12.0, lanes, sequencer runs, and IUSes can be annotated).
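
The annotation model described above (a 'skip' marker or arbitrary key-value pairs, limited to certain item types) can be pictured with the following Python sketch. It does not reflect the AttributeAnnotator's real interface or the MetaDB schema: the in-memory `store` dictionary, the `annotate` function, the type names in `ANNOTATABLE_TYPES`, and the treatment of 'skip' as a boolean flag are all illustrative assumptions.

```python
# Hypothetical names for the item types the text says can be annotated.
ANNOTATABLE_TYPES = {"lane", "sequencer_run", "ius"}


def annotate(store, item_type, item_id, skip=None, **attributes):
    """Attach a skip flag and/or key-value pairs to one item.

    `store` is a plain dict standing in for the MetaDB; it maps
    (item_type, item_id) to that item's annotations.
    """
    if item_type not in ANNOTATABLE_TYPES:
        raise ValueError(f"cannot annotate items of type {item_type!r}")
    entry = store.setdefault((item_type, item_id), {"skip": False, "attributes": {}})
    if skip is not None:
        entry["skip"] = bool(skip)
    entry["attributes"].update(attributes)
    return entry


if __name__ == "__main__":
    store = {}
    annotate(store, "lane", 12, skip=True)              # mark a lane as skipped
    annotate(store, "lane", 12, reason="low quality")   # add a key-value pair
    print(store)
```

Keeping 'skip' separate from the free-form attributes is a design choice of this sketch only; it makes it easy for downstream code to filter out skipped items without inspecting every key.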

Developing for SeqWare

Although we provide the modules and workflows that UNC, UCLA, and OICR have built, SeqWare is geared towards building infrastructure rather than providing a one-stop shop for all possible analytical workflows. You will therefore want to take a look at the guides below to learn how to create modules and workflows that support the experimental designs for your own projects. If you extend SeqWare, we encourage you to become a developer and share your modules and workflows back with the community; see the Community Portal for more information.

The main developer page for SeqWare can be found here, and the source code can be downloaded via Subversion here.

Once you've read the deployment guide to see how the various pieces go together, take a look at the SeqWare Pipeline Developer Crash Course, a quick-start guide to creating workflows and modules. It will walk you through the creation of a very simple workflow (HelloWorld).

Developer Documentation

SeqWare MetaDB

SeqWare Pipeline

Query Engine

SeqWare WebService

For more developer guides please see the SeqWare Developer Guides page.

Developer Proposals

These are proposals and works in progress; please give feedback on the seqware-devel mailing list.

Administering SeqWare

This is a guide on how to set up SeqWare in a production capacity, based on the deployment of SeqWare at UNC for the Cancer Genome Atlas project. It should give you an idea of how the SeqWare project and its various components can be deployed in a real environment.

Deploying SeqWare at UNC

Here is another guide, still in progress, that describes the setup of SeqWare at OICR:

Deploying SeqWare at OICR

These guides lack specifics for security reasons, but they should give you an idea of how the setup works for genome centers producing a lot of data.

About SeqWare

SeqWare was created by Brian O'Connor while he was a postdoc at UCLA. He continued the work as a research associate at UNC and still develops it as a Software Architect at OICR in Toronto. If you would like to cite SeqWare, please use our publication here. For more information about SeqWare, including users, contributors, getting help, and news items, please see the Community Portal.

  • Release Feature Lists: a place to list the numerous To Do items. At some point this should be migrated to our Greenhopper instance at OICR.
  • Feature Backlog: a place to list the numerous feature requests and ideas for the future. At some point this should be migrated to our Greenhopper instance at OICR.