# HapAltMin
## Description
HapAltMin is a software package for reconstructing the haplotypes of diploid organisms from next-generation sequencing (NGS) reads. It implements a matrix completion algorithm that uses an alternating minimization procedure to infer the phasing of the haplotypes. Specifically, the software has the following functions:
1. Infer the phases of the haplotypes in a diploid species from a set of NGS reads
2. Compute the overall MEC (minimum error correction) score associated with the resulting phasing
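To make the MEC score concrete: it is commonly defined as the total number of allele mismatches that remain when every read is assigned to whichever of the two haplotypes it agrees with best. The following is a minimal sketch of that definition, not code taken from HapAltMin.py. The function name `mec_score` and the read layout (a dict mapping SNP index to allele) are assumptions chosen for illustration, and the data are made up.

```python
def mec_score(reads, hap1, hap2):
    """Sum, over all reads, of the smaller mismatch count against hap1 and hap2.

    reads: list of dicts {snp_index: allele}, allele in {0, 1}
           (hypothetical layout, chosen only for this example).
    hap1, hap2: sequences of 0/1 alleles indexed by SNP.
    """
    total = 0
    for read in reads:
        d1 = sum(1 for site, allele in read.items() if hap1[site] != allele)
        d2 = sum(1 for site, allele in read.items() if hap2[site] != allele)
        total += min(d1, d2)
    return total


# Toy data (made up): two complementary haplotypes over four SNPs.
hap1 = [0, 1, 0, 1]
hap2 = [1, 0, 1, 0]
reads = [
    {0: 0, 1: 1, 2: 0},  # agrees with hap1 at every covered site
    {1: 0, 2: 1, 3: 1},  # closest to hap2, with one mismatch at site 3
    {2: 0, 3: 1},        # agrees with hap1 at every covered site
]
print(mec_score(reads, hap1, hap2))
```

A lower score means fewer read alleles have to be treated as sequencing errors to explain the data with the given pair of haplotypes.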
## Citation
Please cite the following if you plan to use HapAltMin for haplotype reconstruction.
```
S. Barik and H. Vikalo, "Binary matrix completion with performance guarantees for single individual haplotyping", in Proc. IEEE ICASSP 2017.
```
## Files
### Source Files
Python script
- HapAltMin.py
### Sample data
- chr22.frags
## Software Requirements:
* Python 2.7 or above (installation location should be included in $PATH)
* Python Packages : numpy
* Python Packages : libprism (customized and included)
## OS Requirements:
This software has been tested on Ubuntu 16.04.1 LTS and on Windows 8.1.
## Input Data Required:
* A '*.frags' file containing reads aligned to a reference genome
### Format of fragment input file
The file is white space delimited (NOT tab delimited).
* First line: the number of fragments and the number of SNPs in the file.
* Subsequent lines: each line describes one SNP fragment in the format below.

segment_num fragment_name start_site1 segment1 start_site2 segment2 ..... quality_score

where
* segment_num: the number of contiguous (gap-free) segments in the fragment.
* fragment_name: the name of the fragment.
* start_site'i': the position of the first base of the i-th segment.
* segment'i': the sequence of the i-th segment.

For reference, see Bansal, Vikas, and Vineet Bafna. "HapCUT: an efficient and accurate algorithm for the haplotype assembly problem." Bioinformatics 24.16 (2008): i153-i159.
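The line layout described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not the parser used inside HapAltMin.py: the function name `parse_frags` is hypothetical, the header is assumed to list the fragment count before the SNP count, and the sample lines are made up for illustration.

```python
def parse_frags(lines):
    """Parse fragment-file lines into (n_fragments, n_snps, fragments).

    Each fragment is returned as (name, [(start_site, segment), ...], quality).
    Assumes the header holds two integers: fragment count, then SNP count.
    """
    lines = [ln for ln in lines if ln.strip()]
    header = lines[0].split()
    n_fragments, n_snps = int(header[0]), int(header[1])
    fragments = []
    for line in lines[1:]:
        fields = line.split()  # splits on any run of white space
        segment_num = int(fields[0])
        name = fields[1]
        segments = []
        for i in range(segment_num):
            start_site = int(fields[2 + 2 * i])
            segment = fields[3 + 2 * i]
            segments.append((start_site, segment))
        # The quality field follows the last segment; tolerate its absence.
        q_index = 2 + 2 * segment_num
        quality = fields[q_index] if len(fields) > q_index else ""
        fragments.append((name, segments, quality))
    return n_fragments, n_snps, fragments


# Made-up sample: 2 fragments over 5 SNPs.
sample = """2 5
1 read_a 1 010 ;;;
2 read_b 2 11 5 0 ;;;
"""
n_fragments, n_snps, fragments = parse_frags(sample.splitlines())
print(fragments[1])
```

Here `read_b` has two segments, one starting at site 2 and one starting at site 5, which is how a gap inside a fragment is expressed in this format.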
# Steps
## Setup (for numpy installation)
```
sudo apt-get update
sudo apt-get install python-setuptools
sudo apt-get install python-numpy
```
## Installation and Running
Download the software and extract it into a local directory.
```
wget "https://sourceforge.net/projects/hapaltmin/files/latest/download"
tar -zxvf HapAltMin_v1.x.tar.gz
cd HapAltMin_v1.x
# the input fragment file needs to be sorted by the position of the starting SNP
head -1 HapAltMin_data/chr22.frags > HapAltMin_data/chr22.SORTED.frags
tail -n+2 HapAltMin_data/chr22.frags | sort -n -k 3 >> HapAltMin_data/chr22.SORTED.frags
python HapAltMin_source/HapAltMin.py --reads HapAltMin_data/chr22.SORTED.frags --output chr22.txt
```
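On systems where `head`, `tail` and `sort` are not available (for example, a plain Windows command prompt), the sorting step could be done in Python instead. The sketch below is an assumption about an equivalent procedure, not part of the HapAltMin distribution: the helper names `sort_frag_lines` and `sort_frags_file` are hypothetical. It keeps the header line in place and orders the remaining lines numerically by the third field (start_site1). Lines with equal start sites keep their original relative order, which may differ from the tie-breaking of `sort`.

```python
def sort_frag_lines(lines):
    """Return the header line followed by fragment lines sorted by field 3."""
    lines = [ln.rstrip("\r\n") for ln in lines if ln.strip()]
    header, body = lines[0], lines[1:]
    body.sort(key=lambda ln: int(ln.split()[2]))  # third field: start_site1
    return [header] + body


def sort_frags_file(src, dst):
    """Read src, sort its fragment lines, and write the result to dst."""
    with open(src) as fin:
        lines = fin.readlines()
    with open(dst, "w") as fout:
        fout.write("\n".join(sort_frag_lines(lines)) + "\n")


# Made-up lines illustrating the reordering.
demo = ["3 6\n", "1 r1 4 01 ;;\n", "1 r2 1 110 ;;;\n", "2 r3 2 0 5 1 ;;\n"]
for line in sort_frag_lines(demo):
    print(line)

# Example call with the sample data paths used above:
# sort_frags_file("HapAltMin_data/chr22.frags", "HapAltMin_data/chr22.SORTED.frags")
```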
## Output
The output of the above command is the text file chr22.txt, which contains the phased haplotypes. Each SNP block begins with a header of the form
Block:'' Number of sites:'' Total MEC:''
followed by the phase of the haplotypes, one SNP per line, in the format
SNP_ID A1 A2
where SNP_ID is the index of the SNP, and A1 and A2 are each either 0 or 1, indicating the phase of the haplotype at that SNP.
* The symbol '-' in place of A1 and A2 indicates that the corresponding SNP has not been phased (due to insufficient coverage)
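A downstream script might read this output roughly as follows. This is only a sketch: the exact spacing and values of the header line are assumptions based on the description above, the function name `parse_phasing` is hypothetical, and the sample text is made up rather than real HapAltMin output.

```python
import re


def parse_phasing(lines):
    """Group 'SNP_ID A1 A2' rows under the preceding 'Block: ...' header."""
    blocks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("Block"):
            # Assumed header layout: "... Total MEC: <integer>".
            mec = re.search(r"Total MEC:\s*(\d+)", line)
            blocks.append({
                "header": line,
                "mec": int(mec.group(1)) if mec else None,
                "sites": [],
            })
        elif blocks:
            fields = line.split()
            if len(fields) >= 3:
                blocks[-1]["sites"].append((fields[0], fields[1], fields[2]))
    return blocks


# Made-up sample in the assumed layout.
sample = """Block: 1 Number of sites: 3 Total MEC: 2
1 0 1
2 1 0
3 - -
"""
blocks = parse_phasing(sample.splitlines())
unphased = [snp_id for snp_id, a1, a2 in blocks[0]["sites"] if a1 == "-"]
print("MEC: %s, unphased SNPs: %s" % (blocks[0]["mec"], unphased))
```

Alleles are kept as strings so that the '-' marker for unphased SNPs survives parsing unchanged.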
## Help
For help, type `python HapAltMin_source/HapAltMin.py --help`:
```
usage: hapaltmin.py [-h] --reads READS [READS ...] --output OUTPUT
                    [OUTPUT ...] [--maxIter MAXITER]
                    [--random_init RANDOM_INIT] [--clipping CLIPPING]
                    [--err_thresh ERR_THRESH] [--t_thresh T_THRESH]
                    [--verbose VERBOSE]

Single-individual haplotyping using Alternating Minimization

optional arguments:
  -h, --help            show this help message and exit
  --reads READS [READS ...]
                        path to reads file
  --output OUTPUT [OUTPUT ...]
                        output file containing reconstructed haplotypes
  --maxIter MAXITER     maximum number of iterations (default: 50)
  --random_init RANDOM_INIT
                        random initialization of iterates (default: False)
  --clipping CLIPPING   clipping at the initial step (default: True)
  --err_thresh ERR_THRESH
                        Relative error threshold (default: 10^-4)
  --t_thresh T_THRESH   max iterations with iterates not changing (default:
                        20)
  --verbose VERBOSE     print verbos details (default: False)
```
## Contact Information
Somsubhra Barik
Electrical and Computer Engineering Department
The University of Texas at Austin
Austin
Texas 78712
USA
email: sbarik@utexas.edu