Clinical Pathoscope is a program to identify pathogens/commensals/contaminants in unassembled sequencing reads.
Download Clinical Pathoscope code from http://sourceforge.net/projects/pathoscope/files/clinical_pathoscope_v1.0.3.tar.gz/download
Extract the code to a separate folder.
For example, the following command extracts the files:
"tar xvf clinical_pathoscope_v1.0.3.tar.gz"
Download bowtie2 from http://sourceforge.net/projects/bowtie-bio/
Example sample fastq file can be downloaded from http://sourceforge.net/projects/pathoscope/files/simulated_sample.fastq.gz/download
Reference databases (human, viral, & bacterial) as well as their associated alignment indexes can be downloaded from http://www.bu.edu/jlab/wp-assets/databases.tar.gz
Prerequisite: Python 2.7.3 or later must be installed and available on your PATH (the Python installer usually takes care of this). For earlier versions of Python, you will need to install the argparse module: https://pypi.python.org/pypi/argparse
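The prerequisite above can be checked before running anything else. The following is only a minimal sketch, not part of Clinical Pathoscope: the function names check_prerequisites and has_argparse are made up for illustration, and the check covers only the two conditions stated above (interpreter version and availability of argparse).

```python
import sys


def has_argparse():
    """Return True if the argparse module can be imported."""
    try:
        import argparse  # noqa: F401
        return True
    except ImportError:
        return False


def check_prerequisites(version_info=None):
    """Return a list of problems; an empty list means the check passed.

    Hypothetical helper: it only mirrors the README statement that
    Python 2.7.3+ is needed, or an older Python plus the argparse module.
    """
    if version_info is None:
        version_info = sys.version_info
    version = tuple(version_info[:3])
    problems = []
    if version < (2, 7, 3) and not has_argparse():
        problems.append(
            "Python %d.%d.%d is older than 2.7.3 and argparse is not installed"
            % version
        )
    return problems


if __name__ == "__main__":
    found = check_prerequisites()
    if found:
        for problem in found:
            print(problem)
    else:
        print("Python prerequisites look fine")
```

Run it with the same python executable that you intend to use for runClinicalPathoscope.py, since that is the interpreter whose version matters.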
Change directory to where you extracted the code
Create a config file by filling in the necessary information shown in config.txt
Run runClinicalPathoscope.py with the config file ("python runClinicalPathoscope.py config.txt") to generate the shell script that runs Clinical Pathoscope for a particular sample.
TSV file format (you may need to rename this file with a .csv extension to open it in some spreadsheet programs, such as Excel or LibreOffice):
At the top of the file in the first row, there are two fields called "Total Number of Aligned Reads" and "Total Number of Mapped Genomes". They represent the total number of reads that are aligned and the total number of genomes to which those reads align from the given alignment file.
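As an illustration of the layout described above, the sketch below reads the first row separately from the rest of the file. It rests on assumptions that this README does not confirm: that each summary value sits in the cell next to its label, and that the second row holds the column names. The function name read_pathoscope_tsv and the sample contents (including the column names "Genome" and "Value") are hypothetical and only show the mechanics.

```python
import csv
import io

# Hypothetical sample; real files will have different columns and values.
SAMPLE = (
    "Total Number of Aligned Reads:\t1000\tTotal Number of Mapped Genomes:\t2\n"
    "Genome\tValue\n"
    "genome_a\t0.9\n"
    "genome_b\t0.1\n"
)


def read_pathoscope_tsv(handle):
    """Split a report into (summary_cells, rows).

    summary_cells is the raw list of cells from the first row.
    ASSUMPTION: the second row names the columns of the remaining rows.
    """
    reader = csv.reader(handle, delimiter="\t")
    summary_cells = next(reader)
    header = next(reader)
    rows = [dict(zip(header, row)) for row in reader if row]
    return summary_cells, rows


if __name__ == "__main__":
    summary, rows = read_pathoscope_tsv(io.StringIO(SAMPLE))
    print(summary)
    for row in rows:
        print(row)
```

For a real report, pass an open file handle instead of the io.StringIO object, and inspect summary_cells first to confirm where the two totals actually appear.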
Columns in the TSV file:
Updated alignment file:
Pathoscope will generate an updated alignment file in either .sam or BLAST (bl8) format, depending on the format of the input file. This updated file covers the reads in the input file, but replaces the previous alignment scores with post-Pathoscope reassignment scores. Alignments that do not reach the Pathoscope threshold value (parameter -s, default 0.01) are deleted from this file. For example, with the default threshold, the updated file will not retain any alignment whose reassignment probability is less than 1% after Pathoscope. This means that the updated file will likely be smaller than the original and will contain only the high-probability reassignments. The new file can then be used for downstream analyses such as SNP calling and genome/scaffold assembly.
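The effect of the threshold can be pictured with a small sketch. This is not Pathoscope's own code: the function filter_alignments, the tuple layout (read, genome, probability), and the sample records are all hypothetical, and only the rule itself (drop alignments whose reassignment probability is below the -s value, 0.01 by default) comes from the description above.

```python
DEFAULT_THRESHOLD = 0.01  # mirrors the documented default of parameter -s


def filter_alignments(alignments, threshold=DEFAULT_THRESHOLD):
    """Keep alignments whose reassignment probability reaches the threshold.

    alignments: iterable of (read_id, genome_id, probability) tuples.
    This tuple layout is an assumption made for illustration only.
    """
    return [a for a in alignments if a[2] >= threshold]


if __name__ == "__main__":
    # Hypothetical records: one read aligned to three genomes.
    example = [
        ("read_1", "genome_a", 0.985),
        ("read_1", "genome_b", 0.010),
        ("read_1", "genome_c", 0.005),
    ]
    for record in filter_alignments(example):
        print(record)
```

With the default threshold the third record is dropped, which is why the updated alignment file is usually smaller than the input.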
Shell script containing all commands and parameters that were executed during a given run. This allows the user to reproduce their exact analysis.
Clinical Pathoscope comes bundled with the original Pathoscope (Version 1.0), three prebuilt bowtie2 databases (human, bacteria, and virus), and one simulated dataset.
The human host library consisted of two sequences: the GRCh37/hg19 build of the human genome and the human ribosomal DNA sequence (GenBank:U13369).
The bacterial library was downloaded from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/all.fna.tar.gz, 12/15/12).
The viral library was also obtained from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/Viruses/all.fna.tar.gz, 1/10/13).
To use databases other than those provided with the software, the user must provide their own Bowtie2 indexes. See http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml for specific details regarding how to create an index using Bowtie2.
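As a rough sketch of building a custom index, the helper below wraps the bowtie2-build program, which takes a reference FASTA file followed by a base name for the index files. The file names custom_reference.fna and custom_index are placeholders, the helper functions are hypothetical, and the sketch assumes Python 3.3 or later (for shutil.which). Consult the Bowtie2 manual for the available options.

```python
import shutil
import subprocess


def bowtie2_build_command(reference_fasta, index_basename):
    """Return the argument list for: bowtie2-build <reference> <basename>."""
    return ["bowtie2-build", reference_fasta, index_basename]


def build_index(reference_fasta, index_basename):
    """Run bowtie2-build, failing early if it is not on the PATH."""
    if shutil.which("bowtie2-build") is None:
        raise RuntimeError("bowtie2-build was not found on your PATH")
    subprocess.check_call(bowtie2_build_command(reference_fasta, index_basename))


if __name__ == "__main__":
    # Placeholder names; substitute your own reference and index base name.
    print(" ".join(bowtie2_build_command("custom_reference.fna", "custom_index")))
```

Calling build_index with your own reference file then writes the index files under the chosen base name.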
For the simulated reads (simulated_sample.fastq), 10 million 100-base-pair (bp) reads were generated for each sample with 90% of reads originating from the host transcriptome (human RNA), 9% from bacterial genomes, and 1% from viral genomes.
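The read counts implied by these percentages can be worked out directly. The short sketch below only restates the arithmetic from the sentence above; how the reads are divided among the individual organisms is not stated here, so it is not computed.

```python
TOTAL_READS = 10000000  # 10 million reads per sample

# Percentages of reads by origin, as stated above.
PERCENT_BY_SOURCE = {"human": 90, "bacterial": 9, "viral": 1}


def reads_by_source(total_reads=TOTAL_READS, percents=PERCENT_BY_SOURCE):
    """Convert percentages into whole read counts."""
    return {name: total_reads * pct // 100 for name, pct in percents.items()}


if __name__ == "__main__":
    for name, count in sorted(reads_by_source().items()):
        print("%s: %d" % (name, count))
```

This gives 9,000,000 human reads, 900,000 bacterial reads, and 100,000 viral reads per sample.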
Simulated reads composition (Accession Number | Name):
Bacteria:
gi|296112228 | Moraxella_catarrhalis_RH4_chromosome,_complete_genome
gi|378696079 | Haemophilus_influenzae_10810,_complete_genome
gi|16271976 | Haemophilus_influenzae_Rd_KW20_chromosome,_complete_genome
gi|387787130 | Streptococcus_pneumoniae_ST556_chromosome,_complete_genome
gi|392427891 | Streptococcus_intermedius_JTH08,_complete_genome
Virus:
gi|49169782 | Human coronavirus NL63 (HCoV-NL63)
gi|9627719 | Human enterovirus A (HEV-A)
gi|160700581 | Human rhinovirus C (HRV-C)
gi|8486122,gi|8486125,gi|8486127,gi|8486129 | Influenza A virus (A/Puerto Rico/8/34/H1N1)
gi|8486131,gi|8486134,gi|8486136,gi|8486138 | (H1N1)
gi|77125236 | Human bocavirus (HBoV)
gi|56160876 | Human adenovirus type 7 (AdV7)
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Pathoscope is developed at the Johnson Lab at Boston University.
W. Evan Johnson, Ph.D.
Division of Computational Biomedicine
Boston University School of Medicine
72 E. Concord St., E-645
Boston, MA 02118
For support queries, please open a ticket or contact us at
jperezrogers@users.sourceforge.net
mani2012@users.sourceforge.net
https://sourceforge.net/p/pathoscope/tickets/