Name                                                            Modified    Size
README_02_09_2015                                               2015-11-20  7.5 kB
pathoqc_v0.1.3.tar.gz                                           2015-11-19  38.3 MB
pathoscope_2.0.5_clinical_pathoscope_1.0.3.tar.gz               2015-04-24  29.3 MB
pathoqc_v0.1.2.tar.gz                                           2015-04-07  38.5 MB
pathoscope_2.0.4_clinical_pathoscope_1.0.3.tar.gz               2015-03-27  29.3 MB
pathoscope_2.0.3_clinical_pathoscope_1.0.3.tar.gz               2015-02-09  29.3 MB
pathoscope_2.0.2_clinical_pathoscope_1.0.3.tar.gz               2014-10-14  29.4 MB
pathoscope_2.0.2_clinical_pathoscope_1.0.3_tutorial_2.0.tar.gz  2014-10-14  29.4 MB
pathoscope_2.0.1.tar.gz                                         2014-07-15  29.2 MB
clinical_pathoscope_v1.0.2.tar.gz                               2014-04-14  133.8 kB
pathoqc_v0.1.1.tar.gz                                           2014-04-09  76.9 MB
pathoscope_2.0.tar.gz                                           2014-02-18  28.6 MB
pathoscope2.0_tutorial.pdf                                      2014-02-18  3.1 MB
clinical_pathoscope_v1.0.1.tar.gz                               2013-11-18  143.4 kB
pathoscope_v1.tar.gz                                            2013-06-19  3.6 MB
pathoscope_v0.1.tar.gz                                          2013-03-31  3.6 MB
Totals: 16 items, 368.9 MB

Pathoscope 2.0
--------------

Introduction:
Pathoscope 2.0 consists of four core and two optional analysis modules for sequencing-based metagenomic profiling. The PathoLib module extracts genome reference libraries (target or host/filter) from all available sequences in the NCBI Nucleotide database that belong to a user-defined taxonomic clade. The PathoMap module aligns the reads to the target reference library and removes any reads that have sequence similarity with the host or filter genomes. The PathoID module resolves read ambiguity, identifies which of the target genomes are present in the sample, and estimates the proportion of reads originating from each genome. The PathoReport module provides two report files: 1) a summary report (.tsv) that contains the numbers and proportions of reads aligned to each genome identified in the sample, and 2) a detailed report (.xml) that includes read coverage, read assignments, and contiguous sequences generated by combining the reads. PathoDB is an optional module that provides additional annotation (organism taxonomic lineage, gene loci, protein products) for all sequences identified in the sample. PathoQC, the other optional module, can be used to preprocess the reads prior to alignment with PathoMap.

1. Installation

1.1 Download the code from http://sourceforge.net/projects/pathoscope/

1.2 Extract the code to a separate folder
    For example, run the following command, substituting the file name of the version you downloaded:
    "tar xvf pathoscope_2.0.tar.gz"

1.3 Optional: Download PathoQC (pathoqc_vXXX.tar.gz) from http://sourceforge.net/projects/pathoscope/
    Install the "pathoqc" sub-directory under the "pathoscope" directory.

2. Running

2.1 Prerequisite: Python 2.7.3 or later must be installed and available on your PATH (this is usually set up as part of the Python installation).
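
A quick way to check the prerequisite in step 2.1 is to compare the interpreter's version against the stated minimum. The sketch below is only an illustration: the helper name meets_minimum is hypothetical, and the check reflects nothing beyond the "2.7.3 or later" wording above. It does not tell you whether any particular newer Python release actually runs Pathoscope.

```python
import sys

# Minimum version stated in step 2.1 of this README.
MINIMUM = (2, 7, 3)


def meets_minimum(version_info, minimum=MINIMUM):
    """Return True if the first three version components are >= minimum."""
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    if meets_minimum(current):
        print("Python %d.%d.%d meets the stated minimum" % current)
    else:
        print("Python %d.%d.%d is older than 2.7.3" % current)
```

Run the script with the same "python" command that you intend to use for Pathoscope, so that the interpreter found on your PATH is the one being checked.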

2.2 Change directory to where you extracted the code 

2.3 Simply run "python pathoscope/pathoscope.py -h" for top level usage information.
Run "python pathoscope/pathoscope.py LIB -h" for detailed usage information to run patholib.
Run "python pathoscope/pathoscope.py MAP -h" for detailed usage information to run pathomap.
Run "python pathoscope/pathoscope.py ID -h" for detailed usage information to run pathoid.
Run "python pathoscope/pathoscope.py REP -h" for detailed usage information to run pathoreport.
If you have installed the PathoQC submodule:
Run "python pathoscope/pathoscope.py QC -h" for detailed usage information to run pathoqc.

2.4 There are also some unit tests for testing the validity of the functions. 
Change directory to "pathoscope/pathomap/bowtie2wrapper/unittest" and simply run "python testBowtie2Wrap.py".
Change directory to "pathoscope/pathoid/unittest" and simply run "python testPathoID.py".

3. Example

3.1 There is a sample alignment file called 'MAP_3852_align.sam' that is included with this package in the example folder to test the pathoid and pathoreport modules.
    You may also download the files called nt_ti.fa.gz and pathoscope2_example.tar.gz separately for running patholib and pathomap. 

3.2 Test using the example alignment file included in the package as follows:

    Suppose you have the alignment file 'MAP_3852_align.sam' in the 'example' directory and want the outputs generated in the 'results' directory. Then run the following commands.

pathoid and pathoreport:
Generate a TSV (tab-separated values) report file that can be opened in Excel, together with an updated alignment file, using the pathoid module:
    "python pathoscope/pathoscope.py ID -alignFile example/MAP_3852_align.sam -expTag 3852 -outDir results"

Generate an XML report file and a TSV report file using the pathoreport module:
    "python pathoscope/pathoscope.py REP -samfile results/updated_MAP_3852_align.sam -outDir results"
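
The two commands above form a small chain: the alignment file written by pathoid is the input to pathoreport. The following sketch builds both argument lists for other file names. The function names are hypothetical, and the "updated_" prefix for the intermediate file is inferred from the single example above, not from a documented rule, so check your output directory before relying on it.

```python
import os

SCRIPT = "pathoscope/pathoscope.py"  # script path used in the commands above


def id_command(align_file, exp_tag, out_dir):
    """Argument list for the pathoid step, using the flags shown above."""
    return ["python", SCRIPT, "ID", "-alignFile", align_file,
            "-expTag", str(exp_tag), "-outDir", out_dir]


def updated_alignment_path(align_file, out_dir):
    """Guess the path of the updated alignment file written by pathoid.

    Assumption: the 'updated_' prefix is inferred from the example above.
    """
    return os.path.join(out_dir, "updated_" + os.path.basename(align_file))


def rep_command(sam_file, out_dir):
    """Argument list for the pathoreport step, using the flags shown above."""
    return ["python", SCRIPT, "REP", "-samfile", sam_file, "-outDir", out_dir]


if __name__ == "__main__":
    align = os.path.join("example", "MAP_3852_align.sam")
    steps = [
        id_command(align, 3852, "results"),
        rep_command(updated_alignment_path(align, "results"), "results"),
    ]
    for argv in steps:
        # Printed only; pass argv to subprocess.check_call to run a step.
        print(" ".join(argv))
```

The script only prints the commands, so it can be tried without Pathoscope installed.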

3.3 TSV file format:

    At the top of the file in the first row, there are two fields called "Total Number of Aligned Reads" and "Total Number of Mapped Genomes". They represent the total number of reads that are aligned and the total number of genomes to which those reads align from the given alignment file.
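
    If you prefer a script to Excel, the report can be read with Python's csv module. The sketch below is written for Python 3 and is separate from Pathoscope itself. It assumes that the first row is the summary row described above, that the second row holds the column names, and that every later row describes one genome. The sample text, its values, and the exact layout of the summary row are made up for illustration, so compare them with a real report before relying on this.

```python
import csv
import io

# Hypothetical sample; real reports may lay out the summary row differently
# and contain more columns.
SAMPLE = (
    "Total Number of Aligned Reads:\t100\tTotal Number of Mapped Genomes:\t2\n"
    "Genome\tFinal Guess\tFinal Best Hit\n"
    "genome_A\t0.75\t0.8\n"
    "genome_B\t0.25\t0.2\n"
)


def read_report(handle):
    """Split a report into (summary fields, list of per-genome dicts).

    Values are kept as strings; convert them where needed.
    """
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if len(rows) < 2:
        raise ValueError("expected a summary row and a header row")
    summary, header = rows[0], rows[1]
    records = [dict(zip(header, row)) for row in rows[2:]]
    return summary, records


if __name__ == "__main__":
    summary, records = read_report(io.StringIO(SAMPLE))
    print(summary)
    for record in records:
        print(record["Genome"], float(record["Final Guess"]))
```

    For a file on disk, pass open(path, newline="") as the handle. Looking columns up by name instead of by position keeps the script working if the column order differs from the list in this section.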

Columns in the TSV file:
1. Genome:
    This is the name of the genome found in the alignment file.
2. Initial Guess:
    This represents the percentage of reads that are mapped to the genome in Column 1 (reads aligning to multiple genomes are assigned proportionally) before Pathoscope reassignment is performed.
3. Initial Best Hit:
    This represents the percentage of reads that are mapped to the genome in Column 1 after assigning each read uniquely to the genome with the highest score, before Pathoscope reassignment is performed.
4. Initial Best Hit Read Numbers:
    This represents the number of best hit reads that are mapped to the genome in Column 1 (may include a fraction when a read is aligned to multiple top hit genomes with the same highest score) before Pathoscope reassignment is performed.
5. Final Guess:
    This represents the percentage of reads that are mapped to the genome in Column 1 (reads aligning to multiple genomes are assigned proportionally) after Pathoscope reassignment is performed.
6. Final Best Hit:
    This represents the percentage of reads that are mapped to the genome in Column 1 after assigning each read uniquely to the genome with the highest score, after Pathoscope reassignment is performed.
7. Final Best Hit Read Numbers:
    This represents the number of best hit reads that are mapped to the genome in Column 1 (may include a fraction when a read is aligned to multiple top hit genomes with the same highest score) after Pathoscope reassignment is performed.
8. Initial 50%-100% Hit:
    This represents the percentage of reads that are mapped to the genome in Column 1 with an alignment hit score of 50%-100% to this genome before Pathoscope reassignment is performed.
9. Initial 1%-50% Hit:
    This represents the percentage of reads that are mapped to the genome in Column 1 with an alignment hit score of 1%-50% to this genome before Pathoscope reassignment is performed.
10. Final 50%-100% Hit:
    This represents the percentage of reads that are mapped to the genome in Column 1 with an alignment hit score of 50%-100% to this genome after Pathoscope reassignment is performed.
11. Final 1%-50% Hit:
    This represents the percentage of reads that are mapped to the genome in Column 1 with an alignment hit score of 1%-50% to this genome after Pathoscope reassignment is performed.
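
The difference between the "Guess" and "Best Hit" columns is easiest to see on a toy example. The sketch below is not Pathoscope's implementation and does not model the reassignment step. It only illustrates the two counting rules described above, under the assumption that "assigned proportionally" means in proportion to the alignment scores. The reads, genome names, and scores are invented.

```python
from collections import defaultdict

# Each toy read maps genome name -> alignment score (all values invented).
READS = [
    {"A": 30},
    {"A": 30, "B": 10},
    {"A": 20, "B": 20},
    {"B": 40},
]


def proportional_counts(reads):
    """Split every read across its genomes in proportion to the scores."""
    totals = defaultdict(float)
    for hits in reads:
        score_sum = float(sum(hits.values()))
        for genome, score in hits.items():
            totals[genome] += score / score_sum
    return dict(totals)


def best_hit_counts(reads):
    """Give every read to its top-scoring genome, splitting ties equally."""
    totals = defaultdict(float)
    for hits in reads:
        top = max(hits.values())
        winners = [genome for genome, score in hits.items() if score == top]
        for genome in winners:
            totals[genome] += 1.0 / len(winners)
    return dict(totals)


def as_percent(counts):
    """Convert read counts to percentages of all counted reads."""
    total = sum(counts.values())
    return {genome: 100.0 * n / total for genome, n in counts.items()}


if __name__ == "__main__":
    print("proportional:", as_percent(proportional_counts(READS)))
    print("best hit read numbers:", best_hit_counts(READS))
    print("best hit:", as_percent(best_hit_counts(READS)))
```

In this toy data the third read scores equally on both genomes, so it adds half a read to each. This is why the "Best Hit Read Numbers" columns can contain fractions.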


4. License: GNU-GPL
    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

5. Support and Contact

5.1 Pathoscope is developed at the JohnsonLab at Boston University.
W. Evan Johnson, Ph.D.
Division of Computational Biomedicine
Boston University School of Medicine
72 E. Concord St., E-645
Boston, MA 02118

Developers:
Solaiappan Manimaran
Changjin Hong

5.2 For support queries, please use the Pathoscope Google Groups support forum at the following location, or contact us at mani2012@users.sourceforge.net
https://groups.google.com/d/forum/pathoscope

Source: README_02_09_2015, updated 2015-11-20