Scripts supporting “Conservation of Gene Cassettes Among Diverse Viruses of the Human Gut” by Samuel Minot, Gary Wu, James Lewis, and Frederic Bushman (2012), PLoS One

iterative_assembly.sh:
   OptItDBA: A BASH script that will iteratively assemble a set of paired-end metagenomic reads
This script requires three command-line arguments: the paired-end read 1 file, the paired-end read 2 file, and the output folder name. These are specified as follows:
>bash iterative_assembly.sh pe_read1.fastq pe_read2.fastq outputname
This script also requires that the following programs are installed and accessible on the PATH:
* R
* Python
* Samtools
* Bwa
* SOAPdenovo-63mer
It also requires that a set of supporting scripts (contained on this page) be located in, or linked into, the folder from which the assembly is run. In other words, they must be in the same place that iterative_assembly.sh is located (or linked to).
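
The PATH requirement above can be checked before starting an assembly. The following is a minimal sketch, not part of the distributed scripts: the function name check_tools is hypothetical, and the executable names (R, python, samtools, bwa, SOAPdenovo-63mer) are assumed to match the program names listed above on your system.

```shell
# Hypothetical helper (not part of OptItDBA): report any tool that is not on the PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumptions based on the list above; adjust to your installation.
check_tools R python samtools bwa SOAPdenovo-63mer \
    || echo "Install the missing tools before running iterative_assembly.sh" >&2
```

The check only confirms that each name resolves on the PATH; it does not verify versions or that the supporting scripts are in the working folder.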

Protein Cassette Discovery: 
	Unless otherwise noted, all of the BASH wrappers used below take a single input: $1
1. If it doesn’t already exist, make a table with the length of each contig from which ORFs are to be predicted. The Python script ‘fasta_name_len_table.py’ used above is appropriate for this. The table must be tab-delimited, with the contig name in the first column and the length in the third column.
2. Once glimmer is installed and tigr-glimmer is on your PATH, use this wrapper to run glimmer [glimmer-wrapper.sh]. It uses the translateFasta.R script [translateFasta.R], so you will need R installed, and you must place translateFasta.R in a folder of your choosing (updating glimmer-wrapper.sh to reflect this location).
a. This will make a file with the ORF sequences: *.fastp
b. And a file with the location of those ORFs on each contig: *.predict.formatted
3. With ORFs in hand, group them into clusters using this wrapper [uclust.sh]
a. This will make a file with the cluster into which each ORF has been placed: *.clstr.tsv
4. The following files will have been generated:
a. A table with the name and length of each contig, for example ‘test.length.table’
b. A table with the location of each ORF on each contig, for example ‘test.predict.formatted’
c. A table with the cluster that each ORF has been assigned to, for example ‘test.clstr.tsv’
5. To cluster the ORFs, execute this R script [protein_cassette.R]. It requires the files listed above, along with a base name for the output files (for example ‘outfp’), to be specified in the following manner.
a. From within R:
i. source('protein_cassette.R')
ii. module.wrapper(len.table='test.length.table',cluster.table='test.clstr.tsv',orf.pos='test.predict.formatted',fo.base='outfp')
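
Before calling module.wrapper, it can help to confirm that the three input tables from step 4 are present and that the length table has the layout described in step 1. The sketch below is an illustration only, not part of the distributed scripts: the function names are hypothetical, the file suffixes follow the ‘test.*’ examples above, and it assumes the length table has no header line.

```shell
# Hypothetical helper: check that a length table is tab-delimited with at least
# three columns, a contig name in column 1, and an integer length in column 3.
# Assumes the table has no header line.
check_length_table() {
    awk -F '\t' '
        NF < 3           { printf "line %d: fewer than 3 tab-delimited columns\n", NR; bad = 1; next }
        $1 == ""         { printf "line %d: empty contig name\n", NR; bad = 1 }
        $3 !~ /^[0-9]+$/ { printf "line %d: column 3 is not an integer length\n", NR; bad = 1 }
        END              { exit bad + 0 }
    ' "$1" >&2
}

# Hypothetical helper: given a base name such as "test", check that the three
# tables from step 4 exist and are non-empty, then validate the length table.
check_cassette_inputs() {
    base=$1
    status=0
    for suffix in length.table predict.formatted clstr.tsv; do
        if [ ! -s "$base.$suffix" ]; then
            echo "missing or empty: $base.$suffix" >&2
            status=1
        fi
    done
    [ "$status" -eq 0 ] || return 1
    check_length_table "$base.length.table"
}
```

For the example files above, this would be run as `check_cassette_inputs test`; a zero exit status means the tables are in place and the module.wrapper call from step 5 can be attempted.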
Source: Readme.txt, updated 2012-06-26