#############################################
# STIRRUPS README
#############################################
This document provides details on running
the STIRRUPS script.
#############################################
Requirements:
1.) STIRRUPS script
2.) Taxonomic ID File
3.) Taxonomic Hierarchy File
4.) Reference Library (Fasta Format)
5.) Sample Reads (Fasta Format)
6.) USEARCH version 4 or higher in path
#############################################
Functionality:
The STIRRUPS script takes a FASTA reference
library and FASTA sample reads and aligns the
reads against the references with USEARCH.
It then formats the USEARCH output and
calculates, for each sample ID, the
percentage of reads assigned to each species
and, optionally, each strain.
The script can optionally use RDP data to
perform additional analysis. If the USEARCH
alignment has already been run, USEARCH
execution can be skipped so that only the
results analysis is performed.
#############################################
Usage:
In its most basic form, the program is
called as follows:
perl stirrups_pipeline.pl -l REFERENCE.fa -r READS.fa
Example:
perl stirrups_pipeline_1.0.pl -l vaginal_16S_V1V3_refdb1-1.fa -r sample.fasta
In this example, vaginal_16S_V1V3_refdb1-1.fa is the FASTA formatted
reference library and sample.fasta is the sample data to be processed.
Note that example input files and the resulting output files
(i.e., sample_summary_97.txt, sample_summary_rdp_97.txt,
sample_summary_taxonomy_97.txt, sample_assignment_97.txt,
and sample_usearch_results_97.txt) are provided.
#############################################
Additional Options:
The program supports a number of additional
options for finer control:
Options:
-T VALUE Identity Threshold [default = 97]
-nousearch Do not run USEARCH
-rdp Fasta contains RDP data
-strain Perform Strain Level Analysis
Threshold:
The -T option sets the minimum identity score
(default 97) for reads to cluster together.
No USEARCH:
Use this option if the USEARCH results are
already available and the alignment step is not
necessary. The USEARCH results file must be named
as shown below for the script to find it. Example:
INPUT FILE NAME: reads.fa
USEARCH FILE NAME: reads_usearch_results_[threshold_value].txt
To run the script without USEARCH, provide the
base INPUT FILE NAME as normal and add the
-nousearch option. The script will locate the
appropriate USEARCH FILE NAME for results analysis.
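The naming rule above can be sketched in Python as
follows. This is an illustration only, not code from
the STIRRUPS script. The helper name
usearch_results_name is hypothetical, and the sketch
assumes that the final extension of the input file is
dropped, as in the reads.fa and sample.fasta examples
in this document.

```python
from pathlib import Path


def usearch_results_name(reads_file, threshold=97):
    """Return the USEARCH results file name expected for a reads file.

    Hypothetical helper: assumes the final extension of the input
    file name is dropped before the suffix is appended.
    """
    base = Path(reads_file).stem  # "reads.fa" -> "reads"
    return f"{base}_usearch_results_{threshold}.txt"


print(usearch_results_name("reads.fa"))      # reads_usearch_results_97.txt
print(usearch_results_name("sample.fasta"))  # sample_usearch_results_97.txt
```

Checking that a file with this name exists before
passing -nousearch is a cheap way to catch a
mismatched threshold value.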
RDP:
The script can use RDP data to perform additional
results analysis. If the input file contains RDP
information, use the -rdp option. This generates
an additional RDP summary file and a taxonomic
summary file.
Strain:
To generate an additional strain level analysis,
enable this feature by using the -strain option.
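The options described above can be combined on one
command line. The Python sketch below only assembles
the argument list and does not run anything. The
helper name build_stirrups_command is hypothetical,
the script name defaults to the one shown in the
Usage section, and the sketch assumes that the order
of the options does not matter.

```python
def build_stirrups_command(library, reads, threshold=None, nousearch=False,
                           rdp=False, strain=False,
                           script="stirrups_pipeline.pl"):
    """Assemble a STIRRUPS command line as a list of arguments.

    Hypothetical helper: only the flags documented in this README
    (-l, -r, -T, -nousearch, -rdp, -strain) are used.
    """
    cmd = ["perl", script, "-l", library, "-r", reads]
    if threshold is not None:
        cmd += ["-T", str(threshold)]  # omitted -> script default of 97
    if nousearch:
        cmd.append("-nousearch")
    if rdp:
        cmd.append("-rdp")
    if strain:
        cmd.append("-strain")
    return cmd


cmd = build_stirrups_command("REFERENCE.fa", "READS.fa", threshold=97, rdp=True)
print(" ".join(cmd))
# perl stirrups_pipeline.pl -l REFERENCE.fa -r READS.fa -T 97 -rdp
```

The resulting list can be passed to subprocess.run
from the Python standard library if the pipeline
should be launched from a wrapper script.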
#############################################
Input Read File Format:
This file should be a FASTA formatted file with each
header containing a sample name followed by
a read name. They should be separated by a "|".
Examples:
>SAMPLE_NAME|READ_NAME
>MOCK|HPD98O302INMJN
If the file contains the optional RDP information,
that should follow, again delimited by "|":
Examples:
>SAMPLE_NAME|READ_NAME|RDP_TAXON|TAXON_LEVEL|RDP_CONFIDENCE_SCORE
>MOCK|HPD98O302INMJN|Lactobacillaceae|family|0.84
RDP confidence scores range from 0.0 to 1.0.
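As a check on the header layout described above, the
following Python sketch splits a read header into its
fields. It is an illustration, not part of STIRRUPS.
The names ReadHeader and parse_read_header are
hypothetical, and the sketch assumes a header has
either exactly two fields or exactly five.

```python
from typing import NamedTuple, Optional


class ReadHeader(NamedTuple):
    sample: str
    read: str
    rdp_taxon: Optional[str] = None
    taxon_level: Optional[str] = None
    rdp_confidence: Optional[float] = None


def parse_read_header(line):
    """Split a '>'-prefixed read header on '|' into named fields."""
    fields = line.strip().lstrip(">").split("|")
    if len(fields) == 2:  # SAMPLE_NAME|READ_NAME
        return ReadHeader(fields[0], fields[1])
    if len(fields) == 5:  # with the optional RDP information
        confidence = float(fields[4])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"RDP confidence out of range: {confidence}")
        return ReadHeader(fields[0], fields[1], fields[2], fields[3], confidence)
    raise ValueError(f"unexpected number of fields in header: {line!r}")


header = parse_read_header(">MOCK|HPD98O302INMJN|Lactobacillaceae|family|0.84")
print(header.sample, header.rdp_taxon, header.rdp_confidence)
# MOCK Lactobacillaceae 0.84
```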
#############################################
Input Reference File Format:
This file must also be FASTA formatted with
headers that use the following format:
Example:
>gi|173695|gb|M59083.1|genus|Acetitomaculum|species|ruminis|strain|Acetomaculum ruminis|start|31|end|509|length|479|details|AETRR16S/31-509 Acetomaculum ruminis 16S ribosomal RNA
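The example header above reads as alternating
key|value pairs (gi, gb, genus, species, strain,
start, end, length, details). The Python sketch below
turns such a header into a dictionary. It is an
illustration only. The name parse_reference_header is
hypothetical, and the sketch assumes that no value,
including the details text, contains a "|".

```python
def parse_reference_header(line):
    """Parse a reference header of alternating key|value fields."""
    fields = line.strip().lstrip(">").split("|")
    if len(fields) % 2 != 0:
        raise ValueError(f"expected key|value pairs, got {len(fields)} fields")
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(fields[0::2], fields[1::2]))


example = (
    ">gi|173695|gb|M59083.1|genus|Acetitomaculum|species|ruminis"
    "|strain|Acetomaculum ruminis|start|31|end|509|length|479"
    "|details|AETRR16S/31-509 Acetomaculum ruminis 16S ribosomal RNA"
)
info = parse_reference_header(example)
print(info["genus"], info["species"])  # Acetitomaculum ruminis
# In this example, length equals end - start + 1.
print(int(info["end"]) - int(info["start"]) + 1 == int(info["length"]))  # True
```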
#############################################
Output Files:
Assignment File:
The assignment file lists the taxon to which each read
was assigned. It also provides the identity score of
the read against that taxon.
Summary File:
The summary file describes the population of the
various clusters. It gives the name of the sample,
the name of the taxon, the number of reads in the
taxon, the percentage of reads from the sample that
mapped to that taxon, and the average identity score
for the taxon.
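The summary statistics described above can be
illustrated with a short Python sketch. This is not
the STIRRUPS implementation. The function name
summarize, the (sample, taxon, identity) tuples, and
the taxon names taxon_A and taxon_B are hypothetical.
The sketch also assumes that percentages are taken
over the reads supplied for each sample. The script
itself may use a different denominator.

```python
from collections import defaultdict


def summarize(assignments):
    """Summarize (sample, taxon, identity) records per sample and taxon.

    Returns rows of (sample, taxon, read count, percent of the
    sample's reads, average identity), sorted by sample and taxon.
    """
    identities = defaultdict(list)   # (sample, taxon) -> identity scores
    sample_totals = defaultdict(int)  # sample -> number of reads seen
    for sample, taxon, identity in assignments:
        identities[(sample, taxon)].append(identity)
        sample_totals[sample] += 1

    rows = []
    for (sample, taxon), scores in sorted(identities.items()):
        count = len(scores)
        percent = 100.0 * count / sample_totals[sample]
        rows.append((sample, taxon, count, percent, sum(scores) / count))
    return rows


# Made-up records for illustration only.
records = [
    ("MOCK", "taxon_A", 99.0),
    ("MOCK", "taxon_A", 98.0),
    ("MOCK", "taxon_A", 97.0),
    ("MOCK", "taxon_B", 97.5),
]
for sample, taxon, count, percent, avg in summarize(records):
    print(f"{sample}\t{taxon}\t{count}\t{percent:.1f}\t{avg:.1f}")
# MOCK	taxon_A	3	75.0	98.0
# MOCK	taxon_B	1	25.0	97.5
```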
Taxonomy Summary File:
The taxonomy file provides the full taxonomic breakdown
for each taxon in addition to standard STIRRUPS results.
RDP Summary File:
The RDP summary file provides additional RDP categorization
information along with the standard STIRRUPS results.
Strain Summary File:
The strain summary file provides the standard STIRRUPS output
with additional strain level detail for the taxa.
USEARCH Results File:
This file gives the raw output from the USEARCH program.
#############################################
Citation:
If you use the STIRRUPS method or software,
or the 16S Vaginal Microbiome Database for your research,
please cite:
Fettweis JM, Serrano MG, Sheth NU, Mayer CM, Glascock AL, et al. (2012) Species-level classification of the vaginal microbiome. BMC Genomics 13: S17. doi:10.1186/1471-2164-13-S8-S17
Refer to the publication for additional information
regarding the STIRRUPS method and appropriate usage.
Note that the reference database file needs to be
created or customized based on experimental design
(e.g., select reference sequences to reflect body
site or environment, trim reference sequences to
primers, and cluster references into 'species-level taxa'
based on sequencing orientation).
#############################################