| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| Readme.txt | 2012-06-26 | 2.7 kB | |
| OptItDBA v1.0.0.zip | 2012-06-26 | 18.2 kB | |
| Totals: 2 Items | | 20.9 kB | 0 |
Scripts supporting "Conservation of Gene Cassettes Among Diverse Viruses of the Human Gut" by Samuel Minot, Gary Wu, James Lewis, and Frederic Bushman (2012) PLoS One

Iterative.assembly.sh:

OptItDBA: A BASH script that iteratively assembles a set of paired-end metagenomic reads.

This script requires three command-line arguments: the paired-end read 1 file, the paired-end read 2 file, and the output folder name. These are specified as follows:

    >bash iterative_assembly.sh pe_read1.fastq pe_read2.fastq outputname

This script also requires that the following programs are installed and accessible on the PATH:

* R
* Python
* Samtools
* Bwa
* SOAPdenovo-63mer

It also requires that the set of supporting scripts (contained on this page) be located in, or linked to, the folder where the assembly is run. In other words, they must be in the same place that iterative_assembly.sh is (or is linked to).

Protein Cassette Discovery:

Unless otherwise noted, all of the BASH wrappers used below take a single input: $1

1. If it doesn't already exist, make a table with the length of each contig that the ORFs are to be predicted from. The Python script fasta_name_len_table.py used above is appropriate for this. The table must be tab-delimited, with the contig name in the first column and the length in the third column.

2. Once glimmer is installed and tigr-glimmer is on your PATH, use this wrapper to run glimmer [glimmer-wrapper.sh]. It uses the translateFasta.R script, so you will need R installed. Place translateFasta.R in a folder of your choosing and update glimmer-wrapper.sh to reflect this location.
   a. This will make a file with the ORF sequences: *.fastp
   b. And a file with the location of those ORFs on each contig: *.predict.formatted

3. With ORFs in hand, group them into clusters using this wrapper [uclust.sh].
   a. This will make a file with the cluster into which each ORF has been placed: *.clstr.tsv

4. The following files will have been generated:
   a. A table with the name and length of each contig, for example test.length.table
   b. A table with the location of each ORF on each contig, for example test.predict.formatted
   c. A table with the cluster that each ORF has been assigned to, for example test.clstr.tsv

5. In order to cluster the ORFs, execute this R script [protein_cassette.R]. It requires the files listed above to be specified in the following manner, along with the name of the output files, for example outputfp.
   a. From within R:
      i. source('protein_cassette.R')
      ii. module.wrapper(len.table='test.length.table',cluster.table='test.clstr.tsv',orf.pos='test.predict.formatted',fo.base='outfp')
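
Before passing the length table to module.wrapper, it may help to confirm that it matches the layout required in step 1 (tab-delimited, contig name in the first column, length in the third). The following is a minimal sketch of such a check, not part of the distributed scripts. The function name read_length_table is hypothetical, and the sketch assumes the table has no header row. The second column is ignored because this README does not say what it holds.

```python
import csv
import os
import tempfile


def read_length_table(path):
    """Return {contig_name: length} from a tab-delimited length table.

    Assumes the layout described in step 1: contig name in column 1 and
    length in column 3, with no header row. Column 2 is ignored.
    """
    lengths = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_no, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) < 3:
                raise ValueError(
                    f"{path}:{line_no}: expected at least 3 tab-delimited columns"
                )
            name, length = row[0], row[2]
            if not length.strip().isdigit():
                raise ValueError(
                    f"{path}:{line_no}: column 3 is not a whole number: {length!r}"
                )
            lengths[name] = int(length)
    return lengths


if __name__ == "__main__":
    # Tiny made-up table written to a temporary file, for demonstration only.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".length.table", delete=False
    ) as tmp:
        tmp.write("contig_1\tx\t1500\ncontig_2\tx\t820\n")
    try:
        print(read_length_table(tmp.name))
    finally:
        os.remove(tmp.name)
```

In practice the function would be pointed at the real table, for example read_length_table('test.length.table'). A ValueError naming the offending line suggests the table should be regenerated before running protein_cassette.R.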