Seurat
---

About Seurat
--

Seurat is a sequence analysis program for somatic mutation and allelic
imbalance discovery in paired tumor and normal genome and transcriptome
data.

Copyright
--

Seurat is Copyright (c) 2012 by The Translational Genomics Research
Institute. All rights reserved. This License is limited to, and you may
use the Software solely for, your own internal and non-commercial use
for academic and research purposes. Without limiting the foregoing, you
may not use the Software as part of, or in any way in connection with
the production, marketing, sale or support of any commercial product or
service or for any governmental purposes. For commercial or governmental
use, please contact dcraig@tgen.org. By installing this Software you are
agreeing to the terms of the LICENSE file distributed with this
software.

This software contains portions of the Genome Analysis Toolkit (GATK)
which is Copyright (c) 2010, The Broad Institute and is used under
license.

Releases
--

1.0 -   Initial release.
1.1 -   BAQ support, initial indel support
2.0 -   indel support, overlapping transcript support, structural
        variation support, new allelic imbalance detection algorithm,
        major algorithm improvements, bug fixes, performance and output improvements

Usage
--

Seurat is a command-line Java application, packaged as a stand-alone JAR file.
It can be run from the operating system's command prompt by changing to the
directory where the JAR file is located and typing:

java -jar Seurat.jar -T Seurat -R (reference sequence FASTA file) -I:dna_normal
(indexed BAM of normal genome) -I:dna_tumor (indexed BAM of tumor genome)
[-I:rna_normal (indexed BAM of normal RNA)] [-I:rna_tumor (indexed BAM of tumor
RNA)] -o somatic_variants.vcf -go large_events.txt [ARGUMENTS...]

Where [ARGUMENTS...] are options picked from the section below.
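
For scripted runs, the command line above can be assembled programmatically.
The following is a minimal Python sketch, not part of Seurat: the function
name build_seurat_command and its parameter names are illustrative
assumptions, and only flags documented in this README are used.

```python
def build_seurat_command(reference, dna_normal, dna_tumor, vcf_out, gene_out,
                         rna_normal=None, rna_tumor=None, extra_args=(),
                         jar="Seurat.jar"):
    """Return the Seurat command line as a list of arguments (hypothetical helper)."""
    cmd = ["java", "-jar", jar, "-T", "Seurat", "-R", reference,
           "-I:dna_normal", dna_normal,
           "-I:dna_tumor", dna_tumor]
    # The RNA inputs are optional, so they are only added when given.
    if rna_normal:
        cmd += ["-I:rna_normal", rna_normal]
    if rna_tumor:
        cmd += ["-I:rna_tumor", rna_tumor]
    # Both outputs are listed as required arguments.
    cmd += ["-o", vcf_out, "-go", gene_out]
    # Anything from the Arguments section, e.g. ["-Q", "15"].
    cmd += list(extra_args)
    return cmd


cmd = build_seurat_command(
    "ref.fasta", "DNA_normal.BAM", "DNA_tumor.BAM",
    "somatic_variants.vcf", "large_events.txt",
    rna_tumor="RNA_tumor.BAM",
    extra_args=["-Q", "15", "-refseq", "refseq.rod"],
)
print(" ".join(cmd))
# To launch the run: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell
quoting problems when file paths contain spaces.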

Arguments
--

Required:

-o <out> Output VCF.
-go <gene_out> Output for non-small events. Most large event analyses
require the 'refseq' argument below.

Optional:

-refseq <refseq_file> Name of RefSeq transcript annotation file. If specified,
gene-wide events can be detected, and SNVs/LOH events will be annotated with the
gene name.
-Q <integer> Minimum Phred-scaled quality for reported events. Default = 10.
-mbq <integer> Minimum base quality required to consider a base for calling.
Default = 10.
-mmq <integer> Minimum mapping quality for reads to be considered in the pileup.
Default = 10.
-ref <true/false> Whether only reference-matching homozygous positions are
allowed in the normal sample for SNV discovery. Reduces false positives due to
faulty alignments. Default = true.
-alpha <integer> Alpha parameter of the beta-distribution used for evaluating
homozygosity likelihood. Default = 1.
-beta <integer> Beta parameter of the beta-distribution used for evaluating
homozygosity likelihood. Default = 301.
--both_strands Whether variant evidence must appear on both strands in the tumor
in order to be considered.
-coding_only Reduces full-genome and transcript analyses to coding regions of
genes.
-mm Maximum number of mismatches against the reference that are allowed in a read.
    Reads surpassing this number are filtered out.
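
The -Q, -mbq and -mmq thresholds above are quality scores. On the standard
Phred scale, a quality Q corresponds to an error probability of 10^(-Q/10), so
the default of 10 means roughly a 1-in-10 chance of error. The following is a
minimal Python sketch of that standard conversion. The function names are
illustrative and not part of Seurat, and the README does not state exactly how
Seurat derives its event quality.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred-scaled quality."""
    return -10 * math.log10(p)


# Error probabilities for the default threshold (10) and for the
# threshold used in the suggested command (-Q 15).
for q in (10, 15):
    print(q, phred_to_error_prob(q))
```

Raising -Q from 10 to 15 lowers the tolerated error probability from 0.1 to
about 0.032.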


Seurat also accepts global GATK arguments that can affect its functions. For
more information on the GATK framework, please visit
http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit

Example / Suggested use
--

java -jar Seurat.jar -T Seurat -R ref.fasta -I:dna_normal DNA_normal.BAM -I:dna_tumor
DNA_tumor.BAM -I:rna_tumor RNA_tumor.BAM -o somatic_variants.vcf -go large_events.txt
-Q 15 -refseq refseq.rod

Output
--

VCF (-o):

The text file follows the Variant Call Format (VCF) version 4.1, with one line
per call. Please refer to the VCF format definition at
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
for more information.

Output will look like the following records:

chr2	51001432	.	G	C	13.7	PASS	TYPE=somatic_SNV;PILEUP=ggggGCGG/ccgGgcCgGcgGccCgg;DP=17
chr2	204596009	.	T	G	20.4	PASS	TYPE=somatic_SNV;PILEUP=TttTtttttTttTtttttTTTttTtt/TTttttTtTttttttTTtTTTTTTTTTtGtTTtTtttgTGgGtt;DP=26
chr2	40279008	.	A	<DEL>	11.4	PASS	TYPE=somatic_deletion;PILEUP=aaaaAAAAAaaD/aADddddaaa;DP=10


The TYPE tag in the INFO field describes the somatic event that was detected.
The types currently are somatic_SNV, somatic_deletion, somatic_insertion and
LOH.

The ALT field describes either the variant detected in the tumor genome
(in the case of a somatic SNV event) or the variant allele that is lost in
the tumor (in the case of an LOH event). The strings "<INS>" and "<DEL>"
represent insertions and deletions, respectively.
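
The records shown above can be read with a few lines of code. The following
is a minimal Python sketch, not part of Seurat: the function name
parse_seurat_vcf_line and the dictionary keys are illustrative assumptions.
The PILEUP value is split on "/" into two strings, but this README does not
say which side belongs to which sample, so the sketch leaves them unlabeled.

```python
def parse_seurat_vcf_line(line):
    """Parse one data line of the VCF output into a dictionary (hypothetical helper)."""
    # The first eight VCF columns are CHROM POS ID REF ALT QUAL FILTER INFO.
    chrom, pos, vid, ref, alt, qual, filt, info_str = line.split()[:8]
    info = {}
    for item in info_str.split(";"):
        key, sep, value = item.partition("=")
        # INFO entries without "=" are flags in VCF.
        info[key] = value if sep else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt,
        "qual": float(qual),
        "filter": filt,
        "info": info,
    }


line = ("chr2\t51001432\t.\tG\tC\t13.7\tPASS\t"
        "TYPE=somatic_SNV;PILEUP=ggggGCGG/ccgGgcCgGcgGccCgg;DP=17")
record = parse_seurat_vcf_line(line)
print(record["info"]["TYPE"], record["qual"])
print(record["info"]["PILEUP"].split("/"))
```

When reading a whole file, lines starting with "#" are VCF header lines and
should be skipped before calling the parser.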

Large event list (-go):

A simple tab-delimited text file in the following format:

[region]    [gene name] [event name/description]    [quality]   [additional info fields]

Source: README.txt, updated 2012-06-21