Name | Modified | Size | Downloads / Week |
---|---|---|---|
2.2 | 2012-11-28 | ||
2.1 | 2012-08-21 | ||
2.0 | 2012-06-21 | ||
README.txt | 2012-06-21 | 4.9 kB | |
Totals: 4 Items | | 4.9 kB | 0 |
Seurat
---

About Seurat
--
Seurat is a sequence analysis program for somatic mutation and allelic
imbalance discovery in paired tumor and normal genome and transcriptome data.

Copyright
--
Seurat is Copyright (c) 2012 by The Translational Genomics Research Institute.
All rights reserved. This License is limited to, and you may use the Software
solely for, your own internal and non-commercial use for academic and research
purposes. Without limiting the foregoing, you may not use the Software as part
of, or in any way in connection with the production, marketing, sale or
support of any commercial product or service or for any governmental purposes.
For commercial or governmental use, please contact dcraig@tgen.org. By
installing this Software you are agreeing to the terms of the LICENSE file
distributed with this software.

This software contains portions of the Genome Analysis Toolkit (GATK) which is
Copyright (c) 2010, The Broad Institute and is used under license.

Releases
--
1.0 - Initial release.
1.1 - BAQ support, initial indel support
2.0 - indel support, overlapping transcript support, structural variation
      support, new allelic imbalance detection algorithm, major algorithm
      improvements, bug fixes, performance and output improvements

Usage
--
Seurat is a command-line Java application, packaged as a stand-alone JAR file.
It can be executed from the command prompt of the operating system by changing
to the directory where the JAR file is located and typing:

    java -jar Seurat.jar -T Seurat
         -R (reference sequence FASTA file)
         -I:dna_normal (indexed BAM of normal genome)
         -I:dna_tumor (indexed BAM of tumor genome)
         [-I:rna_normal (indexed BAM of normal RNA)]
         [-I:rna_tumor (indexed BAM of tumor RNA)]
         -o somatic_variants.vcf
         -go large_events.txt
         [ARGUMENTS...]

Where [ARGUMENTS...] are options picked from the section below.

Arguments
--
Required:
    -o <out>            Output VCF.
    -go <gene_out>      Output for non-small events.
                        Most large event analyses require the 'refseq'
                        argument below.

Optional:
    -refseq <refseq_file>
                        Name of RefSeq transcript annotation file. If
                        specified, gene-wide events can be detected, and
                        SNVs/LOH events will be annotated with the gene name.
    -Q <integer>        Minimum Phred-scaled quality for reported events.
                        Default = 10.
    -mbq <integer>      Minimum base quality required to consider a base for
                        calling. Default = 10.
    -mmq <integer>      Minimum mapping quality for reads to be considered in
                        the pileup. Default = 10.
    -ref <true/false>   Whether or not only reference-matching homozygous
                        positions are allowed on the normal, for SNV
                        discovery. Reduces false positives due to faulty
                        alignments. Default = true.
    -alpha <integer>    Alpha parameter of the beta-distribution used for
                        evaluating homozygosity likelihood. Default = 1.
    -beta <integer>     Beta parameter of the beta-distribution used for
                        evaluating homozygosity likelihood. Default = 301.
    --both_strands      Whether or not variant evidence needs to appear on
                        both strands on the tumor in order to be considered.
    -coding_only        Reduces full-genome and transcript analyses to coding
                        regions of genes.
    -mm                 Maximum number of mismatches against the reference
                        that are allowed in a read. Reads surpassing this
                        number are filtered out.

Seurat also accepts global GATK arguments that can affect its functions. For
more information on the GATK framework, please visit
http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit

Example / Suggested use
--
    java -jar Seurat.jar -T Seurat -R ref.fasta -I:dna_normal DNA_normal.BAM -I:dna_tumor DNA_tumor.BAM -I:rna_tumor RNA_tumor.BAM -o somatic_variants.vcf -go large_events.txt -Q 15 -refseq refseq.rod

Output
---
VCF (-o):
The text file follows the Variant Call Format (VCF) version 4.1, with one line
per call. Please refer to the VCF format definition at
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
for more information.
Output will look like the following records:

    chr2 51001432 . G C 13.7 PASS TYPE=somatic_SNV;PILEUP=ggggGCGG/ccgGgcCgGcgGccCgg;DP=17
    chr2 204596009 . T G 20.4 PASS TYPE=somatic_SNV;PILEUP=TttTtttttTttTtttttTTTttTtt/TTttttTtTttttttTTtTTTTTTTTTtGtTTtTtttgTGgGtt;DP=26
    chr2 40279008 . A <DEL> 11.4 PASS TYPE=somatic_deletion;PILEUP=aaaaAAAAAaaD/aADddddaaa;DP=10

The TYPE tag in the INFO field describes the somatic event that was detected.
The types currently are somatic_SNV, somatic_deletion, somatic_insertion and
LOH. The ALT genotype describes either the variant detected in the tumor
genome (in case of a somatic SNV event), or the variant allele that is lost in
the tumor (in case of an LOH event). The strings "<INS>" and "<DEL>" represent
indels.

Large event list (-go):
A simple tab-delimited text file in the following format:

    [region] [gene name] [event name/description] [quality] [additional info fields]
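
Reading the VCF output in a script
--
The VCF records shown above can be read with a few lines of Python. The
following is only a minimal sketch and is not part of Seurat: the function
names (parse_info, parse_record, read_calls) are hypothetical, and it assumes
each data line has at least the eight fixed VCF columns, with QUAL either a
number or ".". The PILEUP and DP values are kept as plain strings, since this
README does not define their layout.

```python
from collections import Counter


def parse_info(info):
    """Split an INFO string such as 'TYPE=LOH;DP=17' into a dict of tag -> value."""
    tags = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        tags[key] = value  # flag-style tags without '=' get an empty string
    return tags


def parse_record(line):
    """Parse the first eight columns of one VCF data line (hypothetical helper)."""
    # VCF columns are tab-delimited; split() also accepts the space-separated
    # form in which the records are printed in this README.
    chrom, pos, ident, ref, alt, qual, filt, info = line.split()[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": ident,
        "ref": ref,
        "alt": alt,
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": parse_info(info),
    }


def read_calls(lines):
    """Yield parsed records, skipping blank lines and '#' header lines."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            yield parse_record(line)


# Two of the example records from the Output section above.
SAMPLE_LINES = [
    "chr2 51001432 . G C 13.7 PASS TYPE=somatic_SNV;PILEUP=ggggGCGG/ccgGgcCgGcgGccCgg;DP=17",
    "chr2 40279008 . A <DEL> 11.4 PASS TYPE=somatic_deletion;PILEUP=aaaaAAAAAaaD/aADddddaaa;DP=10",
]

calls = list(read_calls(SAMPLE_LINES))
type_counts = Counter(call["info"]["TYPE"] for call in calls)
for event_type, count in sorted(type_counts.items()):
    print(event_type, count)
```

To process a real run, pass an open file handle for the file given to -o
(for example somatic_variants.vcf) to read_calls instead of SAMPLE_LINES.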