DEET Code
Brought to you by:
joshuaburkhart
File | Date | Author | Commit |
---|---|---|---|
lib | 2015-12-04 | Joshua Burkhart | [a38ff8] changing filepaths & adding .pbs for ACISS exec... |
test | 2013-09-10 | Mosquito-Lab | [7538e3] updating unit tests.. |
.gitignore | 2015-11-03 | Joshua Burkhart | [ef930f] formatting, ignoring .zip files |
.project | 2014-03-22 | Joshua Burkhart | [f9ca90] adding aptana project file, removing exit |
README.md | 2015-12-04 | Joshua Burkhart | [b698d1] Updating .pbs in Readme |
deet.rb | 2013-12-04 | Joshua Burkhart | [b709db] .. |
deet_local_db.rb | 2016-04-14 | Joshua Burkhart | [edad1b] chmod +x deet_local_db.rb |
run-deet_local.pbs | 2015-12-06 | Joshua Burkhart | [1156a6] updating pbs for longfat |
run-deet_local_photo.pbs | 2016-04-14 | Joshua Burkhart | [edad1b] chmod +x deet_local_db.rb |
A program that finds and annotates genes using contigs and singletons yielded from mosquito microarray assays.
1. A fasta file containing both contig and singleton sequences, specified with -f.
2. A directory containing either Nimblegen Microarray or RNAseq data files,
specified with -m or -r, respectively.
$ ruby deet.rb \
-f ~/path/to/Contigs_and_Singletons.simple.fasta \
-m ~/path/to/microarray/ma_dat/photoperiod/
$ ruby deet.rb \
-f ~/path/to/contigs.fa \
-r ~/path/to/rna_seq/rna_dat/
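The two invocations above differ only in which expression-data flag is given. As a rough illustration of how such flags could be read with Ruby's standard OptionParser, here is a minimal sketch. It is not the code in deet.rb: the helper name parse_deet_args, the option keys, and the rule that exactly one of -m or -r must be present are assumptions made for the example; only the flag letters come from this README.

```ruby
require 'optparse'

# Hypothetical helper (not taken from deet.rb): turns DEET-style flags
# into a hash. Only the flag letters -f, -m, -r come from the README.
def parse_deet_args(argv)
  opts = {}
  parser = OptionParser.new do |o|
    o.banner = 'Usage: ruby deet.rb -f FASTA (-m MA_DIR | -r RNASEQ_DIR)'
    o.on('-f PATH', 'fasta file of contigs and singletons') { |v| opts[:fasta] = v }
    o.on('-m DIR', 'directory of Nimblegen microarray files') { |v| opts[:ma_dir] = v }
    o.on('-r DIR', 'directory of RNAseq files') { |v| opts[:rna_dir] = v }
  end
  parser.parse(argv) # non-destructive: argv itself is left untouched

  raise ArgumentError, 'missing -f' unless opts[:fasta]
  # Assumption: exactly one expression-data source is expected per run.
  unless opts.key?(:ma_dir) ^ opts.key?(:rna_dir)
    raise ArgumentError, 'give exactly one of -m or -r'
  end
  opts
end
```

Calling parse_deet_args(ARGV) at the top of a script would then yield a hash with :fasta and either :ma_dir or :rna_dir.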
1. parse the provided fasta file(s), storing sequences longer than MIN_SEQ_LEN
(set to 100 by default)
2. report sequences shorter than or equal to MIN_SEQ_LEN in .seqs.csv
3. hash expression data by sequence id, setting each value to that sequence's expression pattern
(where 0=under expressed, 1=over expressed, and X=not significantly expressed)
4. filter fasta sequences, ensuring their existence in the hash
5. query the NCBI web interface for each sequence
6. record unmatched sequences and those with a low e-value in .seqs.csv
(E_LIM set to 1.0e-5 by default)
7. group sequences by accession number, then by expression signature
8. report groups of sequences in .result and .seqs.csv files
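The length filter and the two-level grouping described in the steps above can be sketched in a few lines of Ruby. This is an illustration only, not the code under lib/: the function names, the use of the first word of the header as the sequence id, and the representation of an expression signature as a string such as "01X" are assumptions. MIN_SEQ_LEN and its default of 100 come from the list above.

```ruby
MIN_SEQ_LEN = 100 # default named in the steps above

# Parse FASTA text into { id => sequence }.
# Assumption: the id is the first whitespace-delimited word after '>'.
def parse_fasta(text)
  seqs = {}
  id = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?('>')
      id = line[1..-1].split.first
      seqs[id] = String.new
    elsif id
      seqs[id] << line
    end
  end
  seqs
end

# Split sequences into those longer than min_len (kept) and
# those shorter than or equal to it (reported as rejected).
def partition_by_length(seqs, min_len = MIN_SEQ_LEN)
  kept, rejected = seqs.partition { |_id, s| s.length > min_len }
  [kept.to_h, rejected.to_h]
end

# Group sequence ids by accession, then by expression signature.
# hits: { id => accession }, expr: { id => signature such as "01X" }.
# Ids without expression data are skipped.
def group_sequences(hits, expr)
  groups = Hash.new { |h, acc| h[acc] = Hash.new { |g, sig| g[sig] = [] } }
  hits.each do |id, acc|
    next unless expr.key?(id)
    groups[acc][expr[id]] << id
  end
  groups
end
```

A nested hash keeps the "accession first, signature second" ordering explicit, so a report writer could iterate accessions and emit one line per signature group.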
http://www.ncbi.nlm.nih.gov/books/NBK1763/
$ wget -r -nc -A "*.rna.fna.gz" -w 30 ftp://ftp.ncbi.nlm.nih.gov/refseq/release/invertebrate/
$ cd /N/u/joshburk/Mason/refseq_complete/ftp.ncbi.nlm.nih.gov/refseq/release/invertebrate
$ gunzip ./*
$ cat ./* > invertebrate.rna.fna
https://www.biostars.org/p/97829/
$ module load blast/2.2.29+
$ makeblastdb -in invertebrate.rna.fna -dbtype nucl -parse_seqids -out invertebrate_rna.fna
$ grep '^>' invertebrate.rna.fna > invertebrate_rna.fna.headers_only
1. Update line 6 of lib/LocalDbAnnotFinder.rb to match the path of invertebrate_rna.fna.headers_only
2. Update line 18 of lib/LocalDbBlaster.rb to match the paths of tblastx and invertebrate_rna.fna
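To show why a headers-only file is useful, here is a minimal sketch of how header lines could be indexed for annotation lookup. It does not reproduce lib/LocalDbAnnotFinder.rb; the function name build_header_index and the assumption that each header has the form ">ID description" are hypothetical, and real RefSeq headers may use a different layout.

```ruby
# Hypothetical sketch: index FASTA header lines by their first
# whitespace-delimited token, mapping it to the rest of the line.
# Assumes headers look like ">ID free-text description".
def build_header_index(lines)
  index = {}
  lines.each do |line|
    next unless line.start_with?('>') # ignore anything that is not a header
    id, desc = line[1..-1].strip.split(/\s+/, 2)
    index[id] = desc.to_s if id
  end
  index
end

# Usage with a file (the path is a placeholder):
# index = build_header_index(File.foreach('/path/to/invertebrate_rna.fna.headers_only'))
```

Because the function accepts any enumerable of lines, it can be fed from File.foreach without loading the whole file into memory first.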
$ module load ruby
$ ruby deet_local_db.rb \
-f /path/to/Contigs_and_Singletons.simple.fasta \
-m /path/to/new_photoperiod_ma-2015-05-05/new_photoperiod_ma/
--
#PBS -N deet_local
#PBS -l walltime=24:00:00
#PBS -q fatnodes
#PBS -l vmem=512gb
#PBS -M burkhart.joshua@gmail.com
#PBS -m abe
#PBS -l nodes=1:ppn=32
module load ruby
module load blast/2.2.30+
cd ~/software_projects/DEET && \
ruby deet_local_db.rb \
-f ~/biting_microarray/fasta/454wyeomyia.CONTIGnATCG.fa \
-m ~/biting_microarray/ma/
--