# HapAltMin

## Description
HapAltMin is a software package for reconstructing the haplotypes of diploid organisms from next-generation sequencing (NGS) reads. It implements a matrix completion algorithm that uses alternating minimization to infer the haplotype phasing. Specifically, the software has the following functions:

- 1) Infer the phases of the haplotypes in a diploid species from a set of NGS reads
- 2) Compute the overall MEC (minimum error correction) score associated with the resulting phasing
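
For readers unfamiliar with the MEC score, the following is a minimal sketch of how it is commonly defined; it is not the code used in HapAltMin.py. It assumes each fragment is a dictionary mapping a SNP index to an allele (0 or 1) and the haplotype is a list of 0/1 values. The function name `mec_score` and this data layout are illustrative assumptions.

```python
def mec_score(fragments, haplotype):
    """Sum, over fragments, of the mismatches against the closer of the
    haplotype and its complement (the two haplotypes of a diploid)."""
    total = 0
    for frag in fragments:
        # Mismatches against the first haplotype at the covered SNPs.
        d1 = sum(1 for pos, allele in frag.items() if haplotype[pos] != allele)
        # Against the complementary haplotype, every covered site flips.
        d2 = len(frag) - d1
        total += min(d1, d2)
    return total


# Hypothetical toy data, made up for illustration only.
fragments = [
    {0: 0, 1: 0, 2: 1},   # agrees with the haplotype
    {1: 1, 2: 0, 3: 0},   # agrees with the complement
    {2: 1, 3: 0},         # one mismatch either way
]
haplotype = [0, 0, 1, 1]
print(mec_score(fragments, haplotype))  # prints 1
```

A lower MEC score means fewer read entries have to be corrected for the reads to be consistent with the reported pair of haplotypes.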

## Citation
Please cite the following if you plan to use HapAltMin for haplotype reconstruction. 

```
S. Barik and H. Vikalo, "Binary matrix completion with performance guarantees for single individual haplotyping", IEEE ICASSP 2017.
```

## Files
### Source Files
	Python script
		- HapAltMin.py
		
### Sample data
	chr22.frags 
	
## Software Requirements: 

* Python 2.7 or above (installation location should be included in $PATH)
* Python Packages : numpy
* Python Packages : libprism (customized and included)

## OS Requirements: 

This software has been tested on Ubuntu 16.04.1 LTS and on Windows 8.1.


## Input Data Required: 
* A `*.frags` file containing reads aligned to a reference genome

### Format of fragment input file
    (white space delimited, NOT tab delimited)
    First line : Describes the number of fragments and the number of SNPs in the file
    Subsequent lines : Each line describes a SNP fragment with the below format

        segment_num fragment_name start_site1 segment1 start_site2 segment2 ..... quality_score

        where,
        segment_num : The number of contiguous (gap-free) segments in the fragment.
        fragment_name : The name of the fragment.
        start_site'i' : The position of the first base of the i-th segment.
        segment'i' : The sequence of the i-th segment.

    For reference, see "Bansal, Vikas, and Vineet Bafna. HapCUT: an efficient and accurate algorithm for the haplotype assembly problem. Bioinformatics 24.16 (2008): i153-i159."
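
The line format above can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not the parser used by HapAltMin.py: the function name `parse_fragment_line` is hypothetical, and whatever follows the last segment is assumed to be the quality field.

```python
def parse_fragment_line(line):
    """Split one fragment line into its name, segments, and quality field."""
    tokens = line.split()          # any run of white space separates fields
    segment_num = int(tokens[0])
    fragment_name = tokens[1]
    segments = []
    for i in range(segment_num):
        start_site = int(tokens[2 + 2 * i])
        segment = tokens[3 + 2 * i]
        segments.append((start_site, segment))
    # Assumption: the token after the last segment is the quality field.
    rest = tokens[2 + 2 * segment_num:]
    quality = rest[0] if rest else None
    return fragment_name, segments, quality


# Hypothetical line: two segments starting at sites 3 and 7.
name, segments, quality = parse_fragment_line("2 read_17 3 010 7 11 ABCDE")
print(name, segments, quality)
```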
	
# Steps

## Setup  (for numpy installation)
```
sudo apt-get update
sudo apt-get install python-setuptools
sudo apt-get install python-numpy
```	
## Installation and Running 
Download the software and extract it into a local directory.
```
wget -O HapAltMin_v1.x.tar.gz "https://sourceforge.net/projects/hapaltmin/files/latest/download"
tar -zxvf HapAltMin_v1.x.tar.gz
cd HapAltMin_v1.x

# The input fragment file needs to be sorted by the position of the starting SNP
head -1 HapAltMin_data/chr22.frags > HapAltMin_data/chr22.SORTED.frags
tail -n+2 HapAltMin_data/chr22.frags | sort -n -k 3 >> HapAltMin_data/chr22.SORTED.frags

python HapAltMin_source/HapAltMin.py --reads HapAltMin_data/chr22.SORTED.frags --output chr22.txt
```
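
On systems where `head`, `tail`, and `sort` are not available (Windows 8.1 is listed among the tested environments), the sorting step can be sketched in Python instead. The function names `sort_frags` and `sort_frags_file` are hypothetical, and the sketch assumes the third white-space-delimited field of each fragment line is the starting position, as in the `sort -n -k 3` command above.

```python
def sort_frags(lines):
    """Return the lines with the header first and the fragments ordered by
    the numeric value of their third field (the first start site)."""
    header, fragments = lines[0], lines[1:]
    return [header] + sorted(fragments, key=lambda ln: int(ln.split()[2]))


def sort_frags_file(in_path, out_path):
    """Read a fragment file, sort it, and write the result to out_path."""
    with open(in_path) as src:
        lines = [ln for ln in src if ln.strip()]   # skip blank lines
    with open(out_path, "w") as dst:
        dst.writelines(sort_frags(lines))


# Hypothetical file contents, made up for illustration only.
demo = ["3 12\n", "1 readB 9 01 ##\n", "1 readA 2 110 ###\n", "1 readC 5 10 ##\n"]
print("".join(sort_frags(demo)), end="")
```

For the sample data this would be called as `sort_frags_file("HapAltMin_data/chr22.frags", "HapAltMin_data/chr22.SORTED.frags")`. Fragments with equal start positions keep their original relative order here, which may differ from the tie-breaking of `sort`.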
## Output 
The output of the above command is the text file chr22.txt, which contains the phased haplotypes. Each SNP block begins with the following header:

    Block:''  Number of sites:''  Total MEC:''

The header is followed by the phase of the haplotypes, one SNP per line, in the following format:

    SNP_ID A1 A2

where SNP_ID refers to the index of each SNP, and A1, A2 are either 0 or 1, indicating the phase of the haplotype at that SNP.

*  The symbol '-' in place of A1 and A2 indicates that the corresponding SNP has not been phased (due to insufficient coverage).
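
The output format described above could be read back into Python roughly as follows. This is a sketch under assumptions, not part of HapAltMin: the function name `parse_phased_output` is hypothetical, header lines are assumed to start with `Block`, and the sample text (including the header values) is made up for illustration.

```python
def parse_phased_output(lines):
    """Group 'SNP_ID A1 A2' lines into blocks; '-' becomes None."""
    def to_allele(text):
        return None if text == "-" else int(text)

    blocks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("Block"):
            # A header line opens a new block; its text is kept unparsed.
            blocks.append({"header": line, "snps": []})
            continue
        if not blocks:
            # Tolerate SNP lines that appear before any header.
            blocks.append({"header": None, "snps": []})
        snp_id, a1, a2 = line.split()
        blocks[-1]["snps"].append((snp_id, to_allele(a1), to_allele(a2)))
    return blocks


# Hypothetical output text, made up for illustration only.
sample = """Block: 1  Number of sites: 3  Total MEC: 2
1 0 1
2 1 0
3 - -
"""
for block in parse_phased_output(sample.splitlines()):
    print(block["header"], block["snps"])
```

Unphased SNPs are returned with `None` for both alleles, so they can be filtered out before any downstream comparison.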

## Help 
For help, type `python HapAltMin_source/HapAltMin.py --help`, which prints:
```
usage: hapaltmin.py [-h] --reads READS [READS ...] --output OUTPUT
                    [OUTPUT ...] [--maxIter MAXITER]
                    [--random_init RANDOM_INIT] [--clipping CLIPPING]
                    [--err_thresh ERR_THRESH] [--t_thresh T_THRESH]
                    [--verbose VERBOSE]

Single-individual haplotyping using Alternating Minimization

optional arguments:
  -h, --help            show this help message and exit
  --reads READS [READS ...]
                        path to reads file
  --output OUTPUT [OUTPUT ...]
                        output file containing reconstructed haplotypes
  --maxIter MAXITER     maximum number of iterations (default: 50)
  --random_init RANDOM_INIT
                        random initialization of iterates (default: False)
  --clipping CLIPPING   clipping at the initial step (default: True)
  --err_thresh ERR_THRESH
                        Relative error threshold (default: 10^-4)
  --t_thresh T_THRESH   max iterations with iterates not changing (default:
                        20)
  --verbose VERBOSE     print verbos details (default: False)
```
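
To make the option syntax concrete, the sketch below rebuilds a command-line interface with the same option names and defaults as the help text, using Python's standard `argparse` module. It is a reconstruction from the help output only, not the script's actual source; in particular, the `str2bool` helper and the way True/False values are interpreted are assumptions.

```python
import argparse


def str2bool(text):
    """Hypothetical helper: read 'True'/'False'-style option values."""
    return text.strip().lower() in ("true", "1", "yes")


def build_parser():
    p = argparse.ArgumentParser(
        prog="hapaltmin.py",
        description="Single-individual haplotyping using Alternating Minimization",
    )
    p.add_argument("--reads", nargs="+", required=True, help="path to reads file")
    p.add_argument("--output", nargs="+", required=True,
                   help="output file containing reconstructed haplotypes")
    p.add_argument("--maxIter", type=int, default=50)
    p.add_argument("--random_init", type=str2bool, default=False)
    p.add_argument("--clipping", type=str2bool, default=True)
    p.add_argument("--err_thresh", type=float, default=1e-4)
    p.add_argument("--t_thresh", type=int, default=20)
    p.add_argument("--verbose", type=str2bool, default=False)
    return p


# Every option other than -h takes a value, e.g. "--maxIter 100".
args = build_parser().parse_args(
    ["--reads", "HapAltMin_data/chr22.SORTED.frags",
     "--output", "chr22.txt", "--maxIter", "100"]
)
print(args.reads, args.output, args.maxIter, args.clipping)
```

Note that `--reads` and `--output` each accept one or more values, so they are returned as lists.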

### Contact Information : 

    Somsubhra Barik
    Electrical and Computer Engineering Department
    The University of Texas at Austin
    Austin
    Texas 78712
    USA
    email: sbarik@utexas.edu




  
  




Source: README.txt, updated 2017-03-19