PRADA Wiki

PRADA : Pipeline for RNA-Sequencing Data Analysis

Brought to you by: rahulsimham, roelverhaak, syzheng, wandaliztorres

Attachments

pyPRADA.pdf (972724 bytes)

Documentation

Link to pyPRADA Wiki
Link to Documentation

PRADA
Overview
Massively parallel sequencing of cDNA reverse-transcribed from RNA (RNA-Seq) provides an accurate estimate of the quantity and composition of mRNAs. To characterize the transcriptome through the analysis of RNA-Seq data, we developed PRADA. PRADA focuses on the processing and analysis of gene expression estimates, supervised and unsupervised gene fusion identification, and supervised intragenic deletion identification. The BAM files generated by the pipeline are compatible with a range of tools for mutation calling and can also be used to obtain read counts for further downstream analysis.

Description
PRADA is a pipeline that analyzes paired-end RNA-Seq data to generate gene expression values (RPKM, reads per kilobase per million mapped reads) and gene-fusion candidates.
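
As a rough illustration of what an RPKM value represents, the sketch below applies the standard RPKM definition (read count scaled by gene length in kilobases and by library size in millions of mapped reads). It is not taken from the PRADA source and may differ from how the expression module computes its output; the function name and the example numbers are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases -> kilobases) * 1e6 (reads -> millions of reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical numbers, for illustration only:
# 500 reads on a 2,000 bp gene in a library of 10 million mapped reads.
example = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(example)
```

Because the value is normalized by both gene length and library size, it can be compared across genes and across samples more readily than a raw read count.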

Modules
PRADA currently provides seven modules to process RNA-Seq data and identify abnormalities:
preprocess: Generates aligned and recalibrated BAM files.
expression: Generates gene expression (RPKM) and quality metrics.
fusion: Identifies candidate gene fusions.
guess-ft: Supervised search for fusion transcripts.
guess-if: Supervised search for intragenic fusions.
homology: Calculates homology between two given genes.
frame: Predicts the functional consequence of a fusion transcript.

Installation
PRADA is written in the Python programming language and is intended to run in a command-line environment on UNIX or Linux operating systems. To run pyPRADA, download the pre-compiled package and unzip it to your preferred installation location.
The hg19 reference files are available for download here

Once the reference files are downloaded and extracted, generate index files for all the FASTA files in the reference folder:
pyPRADA_DIR/tools/bwa-0.5.7-mh/bwa index -a bwtsw HG19_REF/Ensembl64.transcriptome.fasta
pyPRADA_DIR/tools/bwa-0.5.7-mh/bwa index -a bwtsw HG19_REF/Ensembl64.transcriptome.formatted.fasta
pyPRADA_DIR/tools/bwa-0.5.7-mh/bwa index -a bwtsw HG19_REF/Ensembl64.transcriptome.plus.genome.fasta
pyPRADA_DIR/tools/bwa-0.5.7-mh/bwa index -a bwtsw HG19_REF/Homo_sapiens_assembly19.fasta
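
The four indexing commands differ only in the FASTA file name, so they can be generated in a loop. The following is a minimal sketch, not part of the PRADA distribution: the helper name build_index_commands is hypothetical, and pyPRADA_DIR and HG19_REF are the same placeholders used in the commands above and must be replaced with real paths. It only prints the commands; the commented-out line shows how they could be executed.

```python
# Placeholders copied from the commands above; replace with real locations.
PYPRADA_DIR = "pyPRADA_DIR"
HG19_REF = "HG19_REF"

# FASTA files listed in the installation instructions.
FASTA_FILES = [
    "Ensembl64.transcriptome.fasta",
    "Ensembl64.transcriptome.formatted.fasta",
    "Ensembl64.transcriptome.plus.genome.fasta",
    "Homo_sapiens_assembly19.fasta",
]


def build_index_commands(prada_dir=PYPRADA_DIR, ref_dir=HG19_REF):
    """Return one 'bwa index' argument list per reference FASTA file."""
    bwa = f"{prada_dir}/tools/bwa-0.5.7-mh/bwa"
    return [[bwa, "index", "-a", "bwtsw", f"{ref_dir}/{name}"]
            for name in FASTA_FILES]


commands = build_index_commands()
for cmd in commands:
    print(" ".join(cmd))
    # To actually run the command instead of printing it:
    # import subprocess; subprocess.run(cmd, check=True)
```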

Set the configuration file (ref.txt):
reference files
compdb_fasta HG19_REF/Ensembl64.transcriptome.plus.genome.fasta
compdb_fai HG19_REF/Ensembl64.transcriptome.plus.genome.fasta.fai
compdb_map HG19_REF/Ensembl64.transcriptome.plus.genome.map
genome_fasta HG19_REF/Homo_sapiens_assembly19.fasta
genome_gtf HG19_REF/Homo_sapiens.GRCh37.64.gtf
dbsnp_vcf HG19_REF/dbsnp_135.b37.vcf
select_tx HG19_REF/Ensembl64.selected.transcripts
feature_file HG19_REF/Ensembl64.canonical.gene.exons.tab.txt
tx_seq_file HG19_REF/Ensembl64.transcriptome.fasta
ref_anno HG19_REF/Ensembl64.transcriptome.annotations
ref_map HG19_REF/Ensembl64.transcriptome.formatted.map
ref_fasta HG19_REF/Ensembl64.transcriptome.formatted.fasta
cds_file HG19_REF/ensembl.hg19.cds.txt
txcat_file HG19_REF/Ensembl64_primary_transcript.txt

Preprocessing parameters when using a PBS system
pbs_queue long #queue name, for preprocessing module
pbs_email userid@mdanderson.org #email used in PBS for notification
parallel_n_threads 24 #number of cores used in alignment and recalibration
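
Before launching a run, it can help to confirm that ref.txt contains the expected keys. The sketch below reads the key/value layout shown above under two assumptions that are not confirmed by the PRADA source: each setting is a key followed by whitespace and a value, and everything after a # is a comment (including section labels). The function name parse_ref_config is hypothetical and is not PRADA's own parser.

```python
def parse_ref_config(text):
    """Parse 'key value #comment' lines into a dict (assumed ref.txt layout)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        parts = line.split(None, 1)          # split key from value on whitespace
        if len(parts) != 2:
            continue                         # skip blank and comment-only lines
        key, value = parts
        settings[key] = value.strip()
    return settings


# A few entries from the configuration above; the first line is written
# as a '#' comment, which is an assumption about the real file.
SAMPLE = """\
# reference files
genome_fasta HG19_REF/Homo_sapiens_assembly19.fasta
dbsnp_vcf HG19_REF/dbsnp_135.b37.vcf
pbs_queue long #queue name, for preprocessing module
parallel_n_threads 24 #number of cores used in alignment and recalibration
"""

config = parse_ref_config(SAMPLE)
n_threads = int(config["parallel_n_threads"])
print(config["genome_fasta"], config["pbs_queue"], n_threads)
```

A check such as os.path.exists on each path-valued entry could be added after parsing to catch missing reference files early.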

Development Information
Language: Python
Current Version: 1.1
Platforms: Unix (OpenPBS)
License: MIT
Status: Active
Last Updated: April 2013

References
Citations: No Formal Publications
Help and Support:
Contact: Roel Verhaak