File                      Date        Author           Commit
lib                       2015-12-04  Joshua Burkhart  [a38ff8] changing filepaths & adding .pbs for ACISS exec...
test                      2013-09-10  Mosquito-Lab     [7538e3] updating unit tests..
.gitignore                2015-11-03  Joshua Burkhart  [ef930f] formatting, ignoring .zip files
.project                  2014-03-22  Joshua Burkhart  [f9ca90] adding aptana project file, removing exit
README.md                 2015-12-04  Joshua Burkhart  [b698d1] Updating .pbs in Readme
deet.rb                   2013-12-04  Joshua Burkhart  [b709db] ..
deet_local_db.rb          2016-04-14  Joshua Burkhart  [edad1b] chmod +x deet_local_db.rb
run-deet_local.pbs        2015-12-06  Joshua Burkhart  [1156a6] updating pbs for longfat
run-deet_local_photo.pbs  2016-04-14  Joshua Burkhart  [edad1b] chmod +x deet_local_db.rb

Read Me

DEET

A program that finds and annotates genes using contigs and singletons yielded from mosquito microarray assays.

Input

1. A FASTA file containing both contig and singleton sequences, specified with -f.

2. A directory containing either NimbleGen microarray or RNA-Seq data files,
   specified with -m or -r, respectively.

Examples

Microarray

$ ruby deet.rb \
-f ~/path/to/Contigs_and_Singletons.simple.fasta \
-m ~/path/to/microarray/ma_dat/photoperiod/

RNA-Seq

$ ruby deet.rb \
-f ~/path/to/contigs.fa \
-r ~/path/to/rna_seq/rna_dat/
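
The flags used in the examples above could be parsed along the following lines. This is a minimal sketch using Ruby's standard optparse library, not the parsing code in deet.rb. Only -f, -m, and -r come from this README; the helper name `parse_deet_args`, the long option names, and the rule that exactly one of -m or -r must be given are assumptions made for illustration.

```ruby
require 'optparse'

# Hypothetical helper: parse DEET-style flags from an argument array.
# Only -f, -m and -r come from the README; the long names are assumptions.
def parse_deet_args(argv)
  opts = {}
  parser = OptionParser.new do |o|
    o.banner = 'Usage: ruby deet.rb -f FASTA (-m MICROARRAY_DIR | -r RNASEQ_DIR)'
    o.on('-f', '--fasta FILE', 'FASTA file of contigs and singletons') { |v| opts[:fasta] = v }
    o.on('-m', '--microarray DIR', 'directory of microarray data files') { |v| opts[:microarray] = v }
    o.on('-r', '--rnaseq DIR', 'directory of RNA-Seq data files') { |v| opts[:rnaseq] = v }
  end
  parser.parse(argv)

  raise ArgumentError, 'a FASTA file is required (-f)' unless opts[:fasta]
  # Assumption: exactly one expression-data source is expected per run.
  unless opts[:microarray].nil? ^ opts[:rnaseq].nil?
    raise ArgumentError, 'specify exactly one of -m or -r'
  end
  opts
end
```

Calling `parse_deet_args(ARGV)` at the top of a script would return a hash with the `:fasta` key and one of `:microarray` or `:rnaseq`.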

Algorithm Overview

1. parse the provided FASTA file(s), storing sequences longer than MIN_SEQ_LEN
   (set to 100 by default)
2. report sequences shorter than or equal to MIN_SEQ_LEN in .seqs.csv
3. hash expression data by sequence id, setting each value to that sequence's expression pattern
   (where 0=under expressed, 1=over expressed, and X=not significantly expressed)
4. filter FASTA sequences, keeping only those that exist in the hash
5. query the NCBI web interface for each sequence
6. record unmatched sequences and those with a low e-value in .seqs.csv
   (E_LIM set to 1.0e-5 by default)
7. group sequences by accession number, then by expression signature
8. report groups of sequences in .result and .seqs.csv files
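
The parsing, length filtering, expression filtering, and grouping steps of the overview can be sketched as below. This is an illustration under stated assumptions, not the code in lib/: all function names are hypothetical, the sequence id is assumed to be the first word of each FASTA header, and the BLAST query is replaced by a plain `{ id => accession }` hash. Only MIN_SEQ_LEN and its default of 100 come from this README.

```ruby
MIN_SEQ_LEN = 100 # default stated in the README

# Parse FASTA text into { id => sequence }.
# Assumption: the id is the first word of the header line.
def parse_fasta(text)
  seqs = {}
  id = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?('>')
      id = line[1..-1].split.first
      seqs[id] = ''.dup
    elsif id
      seqs[id] << line # sequences may span several lines
    end
  end
  seqs
end

# Split into sequences longer than min_len (kept) and the rest (to report).
def partition_by_length(seqs, min_len = MIN_SEQ_LEN)
  kept, short = seqs.partition { |_id, seq| seq.length > min_len }
  [kept.to_h, short.to_h]
end

# Keep only sequences that have an expression pattern in the hash.
def filter_by_expression(seqs, expression)
  seqs.select { |id, _seq| expression.key?(id) }
end

# Group sequence ids by accession, then by expression signature.
# `hits` is a hypothetical { id => accession } hash standing in for BLAST results.
def group_hits(hits, expression)
  groups = Hash.new { |h, acc| h[acc] = Hash.new { |g, sig| g[sig] = [] } }
  hits.each { |id, acc| groups[acc][expression[id]] << id }
  groups
end
```

The nested hash returned by `group_hits` mirrors the "by accession number, then by expression signature" ordering, so each inner array holds the ids that share both an annotation and an expression pattern such as "01X".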

DEET Local DB Prep

Reference

http://www.ncbi.nlm.nih.gov/books/NBK1763/

Download and Combine Files

$ wget -r -nc -A "*.rna.fna.gz" -w 30 ftp://ftp.ncbi.nlm.nih.gov/refseq/release/invertebrate/
$ cd /N/u/joshburk/Mason/refseq_complete/ftp.ncbi.nlm.nih.gov/refseq/release/invertebrate
$ gunzip ./*
$ cat ./* > invertebrate.rna.fna

Make Blast DB

https://www.biostars.org/p/97829/

$ module load blast/2.2.29+
$ makeblastdb -in invertebrate.rna.fna -dbtype nucl -parse_seqids -out invertebrate_rna.fna

Make Header File

$ grep -e '^>' invertebrate.rna.fna > invertebrate_rna.fna.headers_only
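
A headers-only file like this can serve as a lookup from sequence identifier to description without scanning the full FASTA. The sketch below shows one way that might be done; it is an assumption about usage, not the code in lib/LocalDbAnnotFinder.rb. The function name `load_headers` is hypothetical, and the identifier is assumed to be the first whitespace-delimited token after ">".

```ruby
# Build { sequence_id => description } from FASTA header lines of the form
# ">ID description text". Lines that do not start with ">" are skipped.
# Assumption: the id is the first whitespace-delimited token after ">".
def load_headers(lines)
  lines.each_with_object({}) do |line, index|
    next unless line.start_with?('>')
    id, description = line[1..-1].strip.split(/\s+/, 2)
    index[id] = description.to_s if id
  end
end
```

Because the function accepts any enumerable of lines, it could be called as `load_headers(File.foreach('invertebrate_rna.fna.headers_only'))` to stream the file rather than load it all at once.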

Issues

1. Line 6 of lib/LocalDbAnnotFinder.rb must be updated to match the path to invertebrate_rna.fna.headers_only
2. Line 18 of lib/LocalDbBlaster.rb must be updated to match the paths to tblastx and invertebrate_rna.fna

Run DEET Against Local DB

$ module load ruby
$ ruby deet_local_db.rb \
-f /path/to/Contigs_and_Singletons.simple.fasta \
-m /path/to/new_photoperiod_ma-2015-05-05/new_photoperiod_ma/

run-deet_local.pbs

--
#PBS -N deet_local
#PBS -l walltime=24:00:00
#PBS -q fatnodes
#PBS -l vmem=512gb
#PBS -M burkhart.joshua@gmail.com
#PBS -m abe
#PBS -l nodes=1:ppn=32

module load ruby
module load blast/2.2.30+

cd ~/software_projects/DEET && \
ruby deet_local_db.rb \
-f ~/biting_microarray/fasta/454wyeomyia.CONTIGnATCG.fa \
-m ~/biting_microarray/ma/
--