Name | Modified | Size | Downloads / Week |
---|---|---|---|
HapAltMin.tar.gz | 2017-05-10 | 199.5 kB | |
license.txt | 2017-03-29 | 35.8 kB | |
README.txt | 2017-03-19 | 4.7 kB | |
Totals: 3 Items | | 240.1 kB | 0 |
# HapAltMin

## Description

HapAltMin is a software package for reconstructing the haplotypes of diploid organisms from next-generation sequencing (NGS) reads. It implements a matrix completion algorithm that uses an alternating minimization procedure to infer the phasing of the haplotypes. Specifically, the software does the following:

1. Infers the phases of the haplotypes in a diploid species from a set of NGS reads
2. Computes the overall MEC score associated with the resulting phasing

## Citation

Please cite the following if you plan to use HapAltMin for haplotype reconstruction.

```
S. Barik and H. Vikalo, "Binary matrix completion with performance guarantees for single individual haplotyping", to be to IEEE ICASSP 2017.
```

## Files

### Source Files

Python script - HapAltMin.py

### Sample data

chr22.frags

## Software Requirements:

* Python 2.7 or above (installation location should be included in $PATH)
* Python Packages : numpy
* Python Packages : libprism (customized and included)

## OS Requirements:

This software has been tested on an Ubuntu 16.04.1 LTS installation as well as in a Windows 8.1 environment.

## Input Data Required:

* '*.frags' file containing reads aligned to a reference genome

### Format of fragment input file (white space delimited, NOT tab delimited)

First line : Describes the number of fragments and the number of SNPs in the file

Subsequent lines : Each line describes a SNP fragment in the following format

`segment_num fragment_name start_site1 segment1 start_site2 segment2 ..... quality_score`

where,

* `segment_num` : Number of gap-free segments in the fragment.
* `fragment_name` : The name of the fragment.
* `start_site'i'` : The position of the first base of the i-th segment.
* `segment'i'` : The sequence of the i-th segment.

For reference, see Bansal, Vikas, and Vineet Bafna. "HapCUT: an efficient and accurate algorithm for the haplotype assembly problem." Bioinformatics 24.16 (2008): i153-i159.

# Steps

## Setup (for numpy installation)

```
sudo apt-get update
sudo apt-get install python-setuptools
sudo apt-get install python-numpy
```

## Installation and Running

Download and extract the software in a local directory.

```
wget "https://sourceforge.net/projects/hapaltmin/files/latest/download"
tar -zxvf HapAltMin_v1.x.tar.gz
cd HapAltMin_v1.x

# the input fragment file needs to be sorted by the position of the starting SNP
head -1 HapAltMin_data/chr22.frags > HapAltMin_data/chr22.SORTED.frags
tail -n+2 HapAltMin_data/chr22.frags | sort -n -k 3 >> HapAltMin_data/chr22.SORTED.frags

python HapAltMin_source/HapAltMin.py --reads HapAltMin_data/chr22.SORTED.frags --output chr22.txt
```

## Output

The output of the above command is the text file chr22.txt, which contains the phased haplotypes. Each SNP block begins with the following header:

`Block:'' Number of sites:'' Total MEC:''`

The header is followed by the phase of the haplotypes, one SNP per line, in the following format:

`SNP_ID A1 A2`

where SNP_ID refers to the index of each SNP, and A1 and A2 are either 0 or 1, indicating the phase of the haplotype at that SNP.

* The symbol '-' in place of A1 and A2 indicates that the corresponding SNP has not been phased (due to insufficient coverage)

## Help

For help, type `python HapAltMin_source/HapAltMin.py --help`

```
usage: hapaltmin.py [-h] --reads READS [READS ...] --output OUTPUT
                    [OUTPUT ...] [--maxIter MAXITER]
                    [--random_init RANDOM_INIT] [--clipping CLIPPING]
                    [--err_thresh ERR_THRESH] [--t_thresh T_THRESH]
                    [--verbose VERBOSE]

Single-individual haplotyping using Alternating Minimization

optional arguments:
  -h, --help            show this help message and exit
  --reads READS [READS ...]
                        path to reads file
  --output OUTPUT [OUTPUT ...]
                        output file containing reconstructed haplotypes
  --maxIter MAXITER     maximum number of iterations (default: 50)
  --random_init RANDOM_INIT
                        random initialization of iterates (default: False)
  --clipping CLIPPING   clipping at the initial step (default: True)
  --err_thresh ERR_THRESH
                        Relative error threshold (default: 10^-4)
  --t_thresh T_THRESH   max iterations with iterates not changing (default: 20)
  --verbose VERBOSE     print verbos details (default: False)
```

### Contact Information :

Somsubhra Barik
Electrical and Computer Engineering Department
The University of Texas at Austin
Austin, Texas 78712, USA
email: sbarik@utexas.edu
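
## Example: reading fragments and scoring a phasing

To make the fragment format and the MEC score more concrete, here is a minimal Python sketch. It is not taken from HapAltMin.py. It assumes that start sites are 1-based SNP indices, that alleles are encoded as the characters 0 and 1, and that the MEC score follows the usual definition for a diploid: for each fragment, the number of mismatches to the closer of two complementary haplotypes, summed over all fragments. The function names, fragment names, and quality strings are hypothetical. The first line of a real .frags file (fragment and SNP counts) would be skipped before calling `parse_fragment`.

```python
# Minimal sketch (not the HapAltMin implementation): parse fragment lines in
# the whitespace-delimited format described in this README and compute the
# MEC score of a candidate phasing.


def parse_fragment(line):
    """Parse one fragment line into (name, {snp_index: allele}, quality)."""
    fields = line.split()
    num_segments = int(fields[0])
    name = fields[1]
    alleles = {}
    for i in range(num_segments):
        start = int(fields[2 + 2 * i])   # assumed 1-based index of first SNP
        segment = fields[3 + 2 * i]      # string of '0'/'1' alleles
        for offset, allele in enumerate(segment):
            alleles[start + offset] = int(allele)
    quality = fields[2 + 2 * num_segments]  # trailing quality field
    return name, alleles, quality


def mec_score(fragments, haplotype):
    """Sum, over fragments, of mismatches to the closer of the two haplotypes.

    `haplotype` maps SNP index -> 0/1 for one haplotype; the other haplotype
    is assumed to be its complement. SNPs absent from `haplotype` (unphased)
    are ignored.
    """
    total = 0
    for alleles in fragments:
        covered = [(snp, a) for snp, a in alleles.items() if snp in haplotype]
        mismatches = sum(1 for snp, a in covered if a != haplotype[snp])
        # Mismatches against the complement are the remaining covered sites.
        total += min(mismatches, len(covered) - mismatches)
    return total


# Hypothetical fragment lines (names and quality strings are made up).
sample_lines = [
    "1 frag_a 1 010 55A",
    "2 frag_b 2 11 5 0 5A5",
    "1 frag_c 3 110 A55",
]
fragments = [parse_fragment(line)[1] for line in sample_lines]
haplotype = {1: 0, 2: 1, 3: 0, 4: 0, 5: 1}
print(mec_score(fragments, haplotype))
```

In this toy input, `frag_a` matches the haplotype exactly, `frag_c` matches its complement exactly, and `frag_b` is one mismatch away from the complement, so the script prints 1.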