Name | Modified | Size | Downloads / Week |
---|---|---|---|
Datasets | 2017-05-10 | | |
QSdpR_v3.2.tar.gz | 2017-05-10 | 3.1 MB | |
license.txt | 2017-03-29 | 35.8 kB | |
QSdpR_v3.1.tar.gz | 2017-03-29 | 3.1 MB | |
QSdpR_v3.0.tar.gz | 2017-03-18 | 3.1 MB | |
QSdpR_v2.0.tar.gz | 2016-05-28 | 3.1 MB | |
QSdpR_v1.0.tar.gz | 2016-05-17 | 3.1 MB | |
README.txt | 2016-05-16 | 4.2 kB | |
Totals: 8 Items | | 15.6 MB | 1 |
# QSdpR

## Description:
QSdpR is a software tool for viral quasispecies reconstruction. This project contains the implementation of a novel algorithm, known as QSdpR, based on a semidefinite relaxation of a correlation clustering framework. Specifically, the software has the following functions -
1) Reconstruct full-length haplotypes from a reference genome, aligned reads and variant information
2) Estimate the abundances of individual species in the mixture
3) Choose the optimal number of quasispecies strains present in the population

## Files:

### Source Files:
Python Scripts
- QSdpR_vcf_to_snp.py
- QSdpR_create_frag.py
- QSdpR_K_iter.py
- QSdpR_F.py
- QSdpR_F_class.py
- QSdpR_class.py
- QSdpR_func.py

Bash Scripts
- QSdpR_master.sh

C files
- quasi_clus.c

### Sample Data File:
- sample.fasta
- sample.sam
- sample.vcf

## Software Requirements:
* Python 2.7 or above (installation location should be included in $PATH)
* Python Packages : pysam
* Python Packages : pyvcf (included)
* ATLAS and LAPACK for gcc compilation
* Samtools

## OS Requirements:
This software has been tested on the Unix distributions Debian 7 and RHEL 6.

## Input Data Required:
* Reference genome in '*.fasta' format
* 'sam' file containing reads aligned to the above reference genome
* 'vcf' file containing polymorphisms (based on the above reference)

## Steps:
1) After unzipping, place the source files in a local directory. We will refer to the full path to this directory as SOURCE.
2) Transfer the input data files to another local directory. We will refer to the full path of this directory as DATA.
3) Denote the full path to the directory where Samtools is installed as SAMTOOLS.
4) [Optional] Edit the file QSdpR_master.sh to change the default values of the following parameters -
   - SNP quality threshold (cutoff threshold of SNP quality in VCF format)
   - sample probability (for random subsampling of reads)
   - read length thresholds (cutoff thresholds for read length filtering)

   Also, if ATLAS is installed in a non-standard location, change the variables $ATLAS_LIB and $ATLAS_INCL accordingly in this file.
5) Navigate to SOURCE and change the permission of the file QSdpR_master.sh as follows :

       chmod +x QSdpR_master.sh
6) Navigate to the DATA directory and type the following command to start the quasispecies reconstruction :

       SOURCE/QSdpR_master.sh K1 K2 SOURCE DATA NAME REG1 REG2 SAMTOOLS

   where,
   - 'K1, K2' - start and end values of the number of quasispecies variants for model selection (K1 > 1)
   - 'NAME' - user-defined name of the experiment
   - 'REG1, REG2' - start and end of the genomic region over which quasispecies reconstruction is to be performed

NOTE: Rename the input files *.fasta, *.sam and *.vcf to NAME.fasta, NAME.sam and NAME.vcf before running the program.

## Output Files:
Main file(s)
- NAME_K_recon.fasta - Full-length reconstructed FASTA file with 'K' haplotypes and their frequencies (e.g. sample_2_recon.fasta), K=K1,...,K2
- Optimal K

Secondary output files
- NAME.frags (read-snp format)
- NAME.snp (snp locations)
- NAME.Fstats (Pseudo F statistics)
- tmp_output_K.txt (contains MEC, error rate, runtime information)
- qsdpr_output_K.txt (snp-variant format)
- tmp_objective_K.txt
- run_QSdpR.sh
- read_snp_info.txt

## Sample Run:
Download the source files to a directory and call its full path as SOURCE. Download the sample data to another directory and call its full path as DATA.
Type the following commands in a Bash shell -

    chmod +x SOURCE/QSdpR_master.sh
    cd DATA/sample_data
    SOURCE/QSdpR_master.sh 2 10 SOURCE DATA sample 1 1000 SAMTOOLS

### Contact Information :
Somsubhra Barik
Electrical and Computer Engineering Department
The University of Texas at Austin
Austin, Texas 78712
USA
email: sbarik@utexas.edu
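
## Example: Scripting the Sample Run

The invocation described under Steps and Sample Run takes eight positional arguments, so it can be convenient to assemble it from a script and to check the NAME.fasta, NAME.sam and NAME.vcf naming requirement beforehand. The following is a minimal sketch under stated assumptions: `check_inputs` and `build_command` are hypothetical helper names that are not part of the QSdpR distribution, and all paths are placeholders to be replaced with your own SOURCE, DATA and SAMTOOLS locations. The sketch only builds and prints the command; it does not run QSdpR.

```python
import os


def check_inputs(data_dir, name):
    """Return the expected input files (NAME.fasta, NAME.sam, NAME.vcf)
    that are missing from data_dir. An empty list means all are present."""
    expected = [os.path.join(data_dir, name + ext)
                for ext in (".fasta", ".sam", ".vcf")]
    return [path for path in expected if not os.path.isfile(path)]


def build_command(source, data, name, k1, k2, reg1, reg2, samtools):
    """Assemble the argument list in the order given in the Steps section:
    QSdpR_master.sh K1 K2 SOURCE DATA NAME REG1 REG2 SAMTOOLS."""
    # The README states K1 > 1; no other range checks are assumed here.
    if k1 <= 1:
        raise ValueError("K1 must be greater than 1")
    return [
        os.path.join(source, "QSdpR_master.sh"),
        str(k1), str(k2),
        source, data, name,
        str(reg1), str(reg2),
        samtools,
    ]


if __name__ == "__main__":
    # Placeholder paths (assumptions); substitute your own directories.
    source = "/path/to/SOURCE"
    data = "/path/to/DATA"
    samtools = "/path/to/SAMTOOLS"

    # The sample run works from DATA/sample_data, so look for inputs there.
    missing = check_inputs(os.path.join(data, "sample_data"), "sample")
    if missing:
        print("Missing input files: " + ", ".join(missing))

    # Same values as the sample run: K from 2 to 10, region 1 to 1000.
    cmd = build_command(source, data, "sample", 2, 10, 1, 1000, samtools)
    print(" ".join(cmd))
```

To actually launch the reconstruction, the printed command can be pasted into a Bash shell, or the list can be passed to `subprocess.call` with `cwd` set to the data directory, after `chmod +x` has been applied to QSdpR_master.sh.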