File | Date | Commit |
---|---|---|
bin | 2018-04-25 | [259e48] Initial commit |
lib | 2018-04-25 | [259e48] Initial commit |
scripts | 2018-04-25 | [259e48] Initial commit |
test | 2018-04-25 | [259e48] Initial commit |
CRISPRtrack.py | 2018-04-25 | [259e48] Initial commit |
README.md | 2018-04-25 | [8026ba] update README |
example_command.sh | 2018-04-25 | [259e48] Initial commit |
=========================================
CRISPRtrack
Version: v1.0.0 (April 22, 2018)
Developers: Yuzhen Ye (yye@indiana.edu) and Tony J. Lam (tjlam@indiana.edu)
School of Informatics, Computing and Engineering, Indiana University, Bloomington
This work was supported by NIH grant 1R01AI108888 to YY
CRISPRtrack is free software, distributed under the terms of the GNU General Public License as published by
the Free Software Foundation.
==========================================
CRISPRtrack uses CRISPR spacers as molecular markers to track bacterial strains.
It can estimate the similarity between microbiomes based on the CRISPR spacers they share.
It can also quantify, from microbiome data, the retention of donor strains in recipients
that receive a microbiota transfer treatment such as fecal microbiota transfer (FMT).
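The core idea, comparing microbiomes by the spacers they share, can be illustrated with plain Python sets. This is a minimal sketch, not CRISPRtrack's implementation: this README does not state the similarity formula CRISPRtrack uses, so the Jaccard index and the donor-retention fraction below are assumptions, and the spacer identifiers are made up.

```python
# Minimal sketch: comparing two microbiomes by the CRISPR spacers they share.
# ASSUMPTION: the Jaccard index and the donor-retention fraction are illustrative
# measures; they are not necessarily the formulas CRISPRtrack uses.


def jaccard_similarity(spacers_a, spacers_b):
    """Shared spacers divided by all distinct spacers found in either sample."""
    a, b = set(spacers_a), set(spacers_b)
    union = a | b
    if not union:
        return 0.0  # neither sample has spacers: define similarity as 0
    return len(a & b) / float(len(union))


def donor_retention(donor_spacers, recipient_spacers):
    """Fraction of the donor's spacers that are also found in the recipient."""
    donor = set(donor_spacers)
    if not donor:
        return 0.0
    return len(donor & set(recipient_spacers)) / float(len(donor))


# Hypothetical spacer identifiers for a donor and a recipient sample.
donor = {"sp1", "sp2", "sp3", "sp4"}
recipient_after_fmt = {"sp2", "sp3", "sp9"}

print(jaccard_similarity(donor, recipient_after_fmt))  # 2 shared / 5 distinct = 0.4
print(donor_retention(donor, recipient_after_fmt))     # 2 of 4 donor spacers = 0.5
```

The two measures answer different questions: Jaccard is symmetric and penalizes spacers unique to either side, while the retention fraction only asks how much of the donor is still detectable in the recipient.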
CRISPRtrack offers two approaches for predicting CRISPR arrays: a de novo approach (default) and a reference-based approach (optional):
- de novo prediction uses CRISPRone;
- reference-based prediction uses CRISPRAlign.
The reference-based approach searches for CRISPR arrays whose repeats are similar to a set of reference CRISPR repeats.
The package includes a set of reference repeats for characterizing human gut microbiomes (gutref-expanded.fna, see the --ref_fast option below); these repeats were extracted from the genomes of human gut-associated bacteria.
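Reference repeat sets such as gutref-expanded.fna are FASTA files, so they are easy to inspect or filter before passing a custom set via --ref_fast. A minimal reader is sketched below; `read_fasta` is a hypothetical helper, not part of CRISPRtrack, and the example records are placeholders rather than real gut repeats.

```python
# Minimal sketch: reading CRISPR repeats from a FASTA file.
# read_fasta() is a hypothetical helper, not part of CRISPRtrack.
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs; multi-line sequences are joined and upper-cased."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)  # last record


# Placeholder records; the second sequence spans two lines.
example = io.StringIO(u">repeat_1\nACGTACGTAC\n>repeat_2\nacgt\nTTGA\n")
repeats = dict(read_fasta(example))
print(sorted(repeats.items()))
```

To load the bundled file instead, open it in the same way: `with open("CRISPRtrack/bin/gutref-expanded.fna") as fh: repeats = dict(read_fasta(fh))`.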
Dependencies: Python 2.7+, Java
usage: CRISPRtrack.py [-h] [-i INPUT_DIRECTORY] [-o OUTPUT_DIR] -m METADATA
[-r] [--ref_fast REF_FAST] [--CRISPRAlign CRISPRALIGN]
[--cdhit CDHIT] [--CRISPRone CRISPRONE]
CRISPRtrack, CRISPR based strain tracking
optional arguments:
-h, --help show this help message and exit
-i INPUT_DIRECTORY, --input_directory INPUT_DIRECTORY
Input directory of genome assembly files. (default:
current working directory).
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Output directory to create output files in (default:
current working directory).
-m METADATA, --metadata METADATA
Input metadata file. (required).
-r, --reference Run reference based (CRISPRAlign), default=FALSE
--ref_fast REF_FAST Input CRISPR repeats for reference based search in
fasta format, default = CRISPRtrack/bin/gutref-expanded.fna
--CRISPRAlign CRISPRALIGN
Path to CRISPRAlign. (default =
CRISPRtrack/bin/CRISPRAlign/CrisprAlign)
--cdhit CDHIT Path to CD-HIT. (default =
CRISPRtrack/bin/CRISPRone/bin/cd-hit-v4.6.1-2012-08-27/)
--CRISPRone CRISPRONE
Path to CRISPRone. (default =
CRISPRtrack/bin/CRISPRone/crisprone-local-nocas.php)
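When CRISPRtrack is run from a larger Python pipeline, it can help to assemble the command line as an argument list. The sketch below is hypothetical (`build_command` is not part of CRISPRtrack) and only uses the -m, -i, -o and -r flags listed above plus the -p flag described in this README.

```python
# Minimal sketch: assemble a CRISPRtrack command line as an argument list.
# build_command() is a hypothetical helper; it only uses flags documented in this README.


def build_command(metadata, input_dir=None, output_dir=None,
                  reference=False, pairwise=False, script="CRISPRtrack.py"):
    """Return the argument list for one CRISPRtrack run."""
    cmd = ["python", script, "-m", metadata]  # -m/--metadata is required
    if input_dir is not None:
        cmd += ["-i", input_dir]   # default: current working directory
    if output_dir is not None:
        cmd += ["-o", output_dir]  # default: current working directory
    if reference:
        cmd.append("-r")  # also run reference-based prediction (CRISPRAlign)
    if pairwise:
        cmd.append("-p")  # pairwise similarities between all samples
    return cmd


cmd = build_command("test/FMT_metadata_example1.txt", input_dir="test/", reference=True)
print(" ".join(cmd))
```

To actually run it, pass the list to `subprocess.check_call(cmd)`. The help output above has argparse's format, so the order of the options should not matter.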
Metadata file: a text file in space- or tab-delimited format, one sample per line, with the following columns:
<sample-name> <subject> <assembly-file> <donor> <donor/recipient> <date>
CRISPRtrack evaluates all assembly files listed in the metadata file.
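The column layout above can be read with a few lines of Python. This is a minimal sketch, not CRISPRtrack's own parser: `parse_metadata`, the field names, and the example rows are hypothetical, and it assumes each row has either all six columns or, for pairwise mode, only the first three.

```python
# Minimal sketch: reading a space- or tab-delimited metadata file.
# parse_metadata() and the field names are hypothetical, mirroring the README columns.
import io
from collections import namedtuple

COLUMNS = ("sample_name", "subject", "assembly_file", "donor",
           "donor_or_recipient", "date")
Sample = namedtuple("Sample", COLUMNS)


def parse_metadata(handle):
    """Return one Sample per non-blank line.

    Rows with only the first three columns (pairwise mode) get None
    for the donor, donor_or_recipient and date fields.
    """
    samples = []
    for lineno, line in enumerate(handle, 1):
        fields = line.split()  # no argument: splits on any run of spaces/tabs
        if not fields:
            continue  # ignore blank lines
        if len(fields) not in (3, len(COLUMNS)):
            raise ValueError("line %d: expected 3 or %d columns, got %d"
                             % (lineno, len(COLUMNS), len(fields)))
        fields += [None] * (len(COLUMNS) - len(fields))
        samples.append(Sample(*fields))
    return samples


# Made-up rows: the first is tab-delimited, the second space-delimited.
example = io.StringIO(u"D1_day0\tD1\tD1_day0.fa\tD1\tdonor\t2016-01-04\n"
                      u"R1_day30 R1 R1_day30.fa D1 recipient 2016-02-03\n")
for sample in parse_metadata(example):
    print("%s %s %s" % (sample.sample_name, sample.subject, sample.donor_or_recipient))
```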
Note that CRISPRtrack can estimate the similarity between any two microbiomes, not only between donor and recipient microbiomes.
In this case, run CRISPRtrack with the -p flag; only the first three columns of the metadata file are used:
<sample-name> <subject> <assembly-file>
The -p flag replaces the standard (donor-recipient) similarity output with pairwise similarities between all samples in the metadata file.
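Pairwise mode amounts to scoring every pair of samples. The sketch below assumes the Jaccard index on spacer sets, which may differ from CRISPRtrack's actual measure; `pairwise_similarities` and the spacer sets are hypothetical.

```python
# Minimal sketch of pairwise mode: a similarity score for every pair of samples.
# ASSUMPTION: Jaccard similarity of spacer sets; CRISPRtrack's own measure may differ.
from itertools import combinations


def pairwise_similarities(spacers_by_sample):
    """Map each (sample_a, sample_b) pair, names in sorted order, to its Jaccard similarity."""
    result = {}
    for a, b in combinations(sorted(spacers_by_sample), 2):
        sa, sb = set(spacers_by_sample[a]), set(spacers_by_sample[b])
        union = sa | sb
        result[(a, b)] = len(sa & sb) / float(len(union)) if union else 0.0
    return result


# Hypothetical spacer sets for three samples.
spacers = {"S1": {"x", "y"}, "S2": {"y", "z"}, "S3": {"w"}}
for (a, b), sim in sorted(pairwise_similarities(spacers).items()):
    print("%s\t%s\t%.3f" % (a, b, sim))
```

Because the Jaccard index is symmetric, unordered pairs (`itertools.combinations`) are enough; an asymmetric measure, such as the fraction of one sample's spacers found in another, would need ordered pairs (`itertools.permutations`).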
python CRISPRtrack.py -i test/ -m test/FMT_metadata_example1.txt -r
Runs CRISPRtrack with 'test/' as the directory containing the contigs and the current working directory as the output directory, uses both de novo and reference-based CRISPR prediction, and outputs the donor-recipient similarity table based on spacer content.
python CRISPRtrack.py -i test -o test -m test/FMT_metadata_example2.txt -p
Runs CRISPRtrack with 'test' as both the input and the output directory, uses de novo CRISPR prediction only, and outputs the pairwise similarity table based on spacer content.
There are two main outputs from CRISPRtrack.py:
1. spacer-subject table (`spacertable.<prediction type>.txt`)
2. sample similarity based on spacer content sharing (`sample_similarity_table.<prediction type>.txt`)
- (alternatively) pairwise sample similarity based on spacer content sharing (`pairwise_similarity_table.<prediction type>.txt`)
The spacer-subject table lists the spacers found in each subject (sample): rows are samples and columns are spacers.
The sample similarity table shows the similarity between the subjects' samples and their donors, based on the CRISPR spacers they share.
The pairwise similarity table shows the similarity between all permutations of sample pairs listed in the metadata file.
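A spacer-subject table of the shape described above (rows are samples, columns are spacers) can be sketched as a presence/absence matrix. The tab-separated layout, the "sample" header cell, and the 0/1 values are assumptions made for illustration; the exact format of CRISPRtrack's `spacertable.<prediction type>.txt` is not specified here, and `write_spacer_table` is hypothetical.

```python
# Minimal sketch: a binary spacer-subject table (rows = samples, columns = spacers).
# ASSUMPTION: tab-separated text with 1 = spacer present, 0 = absent.
import io


def write_spacer_table(spacers_by_sample, out):
    """Write a presence/absence matrix with samples and spacers in sorted order."""
    samples = sorted(spacers_by_sample)
    columns = sorted(set().union(*spacers_by_sample.values()))
    out.write(u"\t".join([u"sample"] + columns) + u"\n")
    for sample in samples:
        present = spacers_by_sample[sample]
        row = [u"1" if spacer in present else u"0" for spacer in columns]
        out.write(u"\t".join([sample] + row) + u"\n")


# Hypothetical spacer sets for a donor and a recipient sample.
buf = io.StringIO()
write_spacer_table({"R1": {"sp2", "sp3"}, "D1": {"sp1", "sp2"}}, buf)
print(buf.getvalue())
```

A binary matrix like this is a common input for ordination methods such as PCA, which is one of the example uses listed below.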
Example usage of the spacer-subject table:
- PCA plot showing the clustering of the samples based on their spacer profiles
Example usage of the sample similarity table:
- donor strain tracking plot (the recipient-donor microbiome similarity plot)
Dependencies for visualization scripts:
- R
Users can visualize the spacer sharing between microbiomes from the CRISPRtrack outputs with their tools of choice. This package includes some example visualization scripts for reference.
Tracking plot visualization example:
Rscript ./CRISPRtrack/scripts/tracking-plots.R sample_similarity_table.<prediction type>.txt
PCA of spacer clusters:
Rscript ./CRISPRtrack/scripts/tracking-plots.R sample_similarity_table.<prediction type>.txt