Name                              Modified    Size
taxidfile.txt                     2013-01-17  57.7 MB
taxon_hierarchy.txt               2013-01-17  312.5 kB
sample_usearch_results_97.txt     2013-01-17  28.0 MB
sample_summary_rdp_97.txt         2013-01-17  727 Bytes
sample_summary_strain_97..txt     2013-01-17  752 Bytes
sample_summary_taxonomy_97.txt    2013-01-17  1.7 kB
sample_assignment_97.txt          2013-01-17  828.2 kB
sample_summary_97.txt             2013-01-17  543 Bytes
vaginal_16S_V1V3_refdb1-1.fa      2013-01-17  666.2 kB
sample.fasta                      2013-01-17  4.9 MB
stirrups_pipeline_1.0.pl          2013-01-17  16.9 kB
stirrups_README.txt               2013-01-17  5.8 kB
Totals: 12 items, 92.4 MB
#############################################
# STIRRUPS README
#############################################

This document provides details on running
     the STIRRUPS script.

#############################################

Requirements:

	1.) STIRRUPS script
	2.) Taxonomic ID File
	3.) Taxonomic Hierarchy File
	4.) Reference Library (Fasta Format)
	5.) Sample Reads (Fasta Format)
	6.) USEARCH version 4 or higher, available in your PATH

#############################################

Functionality:

The STIRRUPS script takes a reference library and 
    a set of sample reads, both in FASTA format, and 
    runs a USEARCH alignment. It then parses the 
    USEARCH output, formats the results, and calculates 
    statistics on the percentage makeup of each species 
    and strain within each sample ID. 

The script can optionally use RDP data to
    perform additional analysis. If USEARCH has
    already been run, its execution can be skipped
    so that only the results analysis is performed.

#############################################

Usage:

The program can be called in its most basic
    fashion using the following form:

	    perl stirrups_pipeline_1.0.pl -l REFERENCE.fa -r READS.fa
	    
    Example:

	    perl stirrups_pipeline_1.0.pl -l vaginal_16S_V1V3_refdb1-1.fa -r sample.fasta
	    
In this example, vaginal_16S_V1V3_refdb1-1.fa is the FASTA formatted
    reference library and sample.fasta is the sample data to be processed.

Note that example input files and the resulting output files 
    (i.e., sample_summary_97.txt, sample_summary_rdp_97.txt, 
    sample_summary_taxonomy_97.txt, sample_assignment_97.txt, 
    and sample_usearch_results_97.txt) are provided.

#############################################

Additional Options:

The program supports a number of additional 
    options for finer control:

Options:
    -T VALUE    Identity Threshold [default = 97]
    -nousearch  Do not run USEARCH
    -rdp        Fasta contains RDP data
    -strain     Perform Strain Level Analysis

Threshold:

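The threshold option sets the minimum identity
    score for reads to cluster together. The value is
    also used in the output file names (e.g., the
    default of 97 produces sample_summary_97.txt).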

No USEARCH:

This option can be used if the USEARCH results are
    already available and the alignment step is not
    necessary. The script requires the USEARCH results
    file to follow this naming convention. Example:

      INPUT FILE NAME: reads.fa
    USEARCH FILE NAME: reads_usearch_results_[threshold_value].txt

To run the script without USEARCH, provide the
    base INPUT FILE NAME as normal and add the
    -nousearch option. The script will locate the
    appropriate USEARCH FILE NAME for results analysis.
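As an illustration of the naming rule above, this Python sketch derives the results file name the script is expected to look for. It assumes the reads file's extension is dropped and the results file sits in the same directory as the reads, as the example files sample.fasta and sample_usearch_results_97.txt suggest; usearch_results_path and nousearch_ready are hypothetical helper names, not part of STIRRUPS.

```python
import os


def usearch_results_path(reads_path: str, threshold: int = 97) -> str:
    """Expected USEARCH results file for a reads file (assumed convention).

    Drops the reads file's extension and appends
    "_usearch_results_<threshold>.txt", keeping the same directory.
    """
    base, _ext = os.path.splitext(reads_path)
    return f"{base}_usearch_results_{threshold}.txt"


def nousearch_ready(reads_path: str, threshold: int = 97) -> bool:
    """True if the precomputed results file exists, so -nousearch can be used."""
    return os.path.isfile(usearch_results_path(reads_path, threshold))


print(usearch_results_path("reads.fa"))          # reads_usearch_results_97.txt
print(usearch_results_path("sample.fasta", 95))  # sample_usearch_results_95.txt
```

Checking nousearch_ready before adding -nousearch avoids starting a run that would fail on a missing or misnamed results file.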

RDP:

The script can use RDP data to perform additional 
    results analysis. If the input file contains RDP
    information, use the -rdp option. This will 
    generate an additional RDP summary file and a 
    taxonomic summary file. 

Strain:

To generate an additional strain-level analysis,
    use the -strain option.

#############################################

Input Read File Format:

This file should be a FASTA formatted file in which
     each header contains a sample name followed by 
     a read name, separated by a "|".

     Examples:
	>SAMPLE_NAME|READ_NAME
	>MOCK|HPD98O302INMJN

If the file contains the optional RDP information,
     it should follow the read name, again delimited by "|":

     Examples:
	>SAMPLE_NAME|READ_NAME|RDP_TAXON|TAXON_LEVEL|RDP_CONFIDENCE_SCORE
	>MOCK|HPD98O302INMJN|Lactobacillaceae|family|0.84

RDP confidence scores range from 0.0 to 1.0.
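The header layout above can be checked before running the pipeline. This is a minimal Python sketch, not taken from the STIRRUPS source, that splits a read header into its two required fields and the three optional RDP fields. The ReadHeader class and parse_read_header function are hypothetical, and rejecting any other field count is an assumption about what the script expects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadHeader:
    sample: str
    read: str
    rdp_taxon: Optional[str] = None
    taxon_level: Optional[str] = None
    rdp_confidence: Optional[float] = None


def parse_read_header(line: str) -> ReadHeader:
    """Split '>SAMPLE|READ' or '>SAMPLE|READ|TAXON|LEVEL|CONFIDENCE' into fields."""
    fields = line.strip().lstrip(">").split("|")
    if len(fields) == 2:
        return ReadHeader(fields[0], fields[1])
    if len(fields) == 5:
        confidence = float(fields[4])
        # RDP confidence scores are documented as lying between 0.0 and 1.0
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"RDP confidence out of range: {confidence}")
        return ReadHeader(fields[0], fields[1], fields[2], fields[3], confidence)
    raise ValueError(f"unexpected header layout ({len(fields)} fields): {line!r}")


print(parse_read_header(">MOCK|HPD98O302INMJN"))
print(parse_read_header(">MOCK|HPD98O302INMJN|Lactobacillaceae|family|0.84"))
```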

#############################################

Input Reference File Format:

This file must also be FASTA formatted with
     headers that use the following format:

     Example:
	>gi|173695|gb|M59083.1|genus|Acetitomaculum|species|ruminis|strain|Acetomaculum ruminis|start|31|end|509|length|479|details|AETRR16S/31-509 Acetomaculum ruminis 16S ribosomal RNA
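The reference header is a flat sequence of key|value pairs (gi, gb, genus, species, strain, start, end, length, details). A minimal Python sketch, assuming that pairing holds for every header and that details is always the final key; parse_reference_header is a hypothetical helper, not part of STIRRUPS.

```python
from typing import Dict


def parse_reference_header(line: str) -> Dict[str, str]:
    """Turn a 'key|value|key|value|...' reference header into a dict.

    Everything after the 'details' key is kept verbatim, even if it
    contains '|' characters. A trailing key with no value is ignored.
    """
    fields = line.strip().lstrip(">").split("|")
    record: Dict[str, str] = {}
    i = 0
    while i + 1 < len(fields):
        key = fields[i]
        if key == "details":
            record[key] = "|".join(fields[i + 1:])
            break
        record[key] = fields[i + 1]
        i += 2
    return record


header = (">gi|173695|gb|M59083.1|genus|Acetitomaculum|species|ruminis|"
          "strain|Acetomaculum ruminis|start|31|end|509|length|479|"
          "details|AETRR16S/31-509 Acetomaculum ruminis 16S ribosomal RNA")
ref = parse_reference_header(header)
print(ref["genus"], ref["species"], ref["strain"])
```

All values are returned as strings; converting start, end, and length to integers is left to the caller.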

#############################################

Output Files:

Assignment File:
     
The assignment file describes which taxon each read was
    assigned to. It also provides the identity score of
    the read's alignment to that taxon.

Summary File:

The summary file describes the population of the 
    various clusters. It gives the name of the sample,
    the name of the taxon, the number of reads assigned to
    the taxon, the percentage of reads from the sample that
    mapped to that taxon, and the average identity score
    for the taxon. 
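The summary columns above can be sketched as a simple aggregation over per-read assignments. This hypothetical Python example is not the STIRRUPS implementation: it assumes each assignment is a (sample, taxon, identity) tuple and uses all assigned reads in a sample as the percentage denominator, since how STIRRUPS treats unassigned reads is not stated here. The demo data are invented for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Row = Tuple[str, str, int, float, float]


def summarize(assignments: Iterable[Tuple[str, str, float]]) -> List[Row]:
    """Aggregate (sample, taxon, identity) tuples into summary rows.

    Each row is (sample, taxon, read_count, percent_of_sample, mean_identity).
    The percentage denominator is every assignment seen for that sample.
    """
    counts = defaultdict(int)
    identity_sums = defaultdict(float)
    sample_totals = defaultdict(int)
    for sample, taxon, identity in assignments:
        counts[(sample, taxon)] += 1
        identity_sums[(sample, taxon)] += identity
        sample_totals[sample] += 1

    rows = []
    for (sample, taxon), n in sorted(counts.items()):
        percent = 100.0 * n / sample_totals[sample]
        mean_identity = identity_sums[(sample, taxon)] / n
        rows.append((sample, taxon, n, percent, mean_identity))
    return rows


demo = [  # hypothetical reads, not taken from the example output files
    ("MOCK", "taxon_A", 99.0),
    ("MOCK", "taxon_A", 98.0),
    ("MOCK", "taxon_B", 97.5),
    ("MOCK", "taxon_A", 100.0),
]
for row in summarize(demo):
    print(row)
```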

Taxonomy Summary File:

The taxonomy summary file provides the full taxonomic breakdown
    for each taxon in addition to the standard STIRRUPS results.

RDP Summary File:

The RDP summary file provides additional RDP categorization
    information along with the standard STIRRUPS results. 

Strain Summary File:

The strain summary file provides the standard STIRRUPS output
    with additional strain-level detail for each taxon. 

USEARCH Results File:

This file gives the raw output from the USEARCH program.

#############################################

Citation:

If you use the STIRRUPS method or software, 
or the 16S Vaginal Microbiome Database for your research, 
please cite:

       Fettweis JM, Serrano MG, Sheth NU, Mayer CM, Glascock AL, et al. (2012) Species-level classification of the vaginal microbiome. BMC Genomics 13: S17. doi:10.1186/1471-2164-13-S8-S17

Refer to the publication for additional information 
regarding the STIRRUPS method and appropriate usage. 
Note that the reference database file needs to be 
created or customized based on experimental design 
(e.g., select reference sequences to reflect body 
site or environment, trim reference sequences to 
primers, and cluster references into 'species-level taxa' 
based on sequencing orientation).

#############################################

Source: stirrups_README.txt, updated 2013-01-17