Name                         Modified     Size
software.zip                 2016-06-03   27.7 kB
README.txt                   2015-01-29   3.5 kB
Fosmid_all_chromosomes.zip   2015-01-29   7.8 MB
HuRef_all_chromosomes.zip    2015-01-29   59.3 MB
Simulated_data.zip           2015-01-29   18.2 MB
Totals: 5 items, 85.4 MB

SDhaP

SDhaP is a fast, accurate, and scalable haplotype assembly method for diploid and polyploid organisms. It formulates the haplotype assembly problem as a semi-definite program and exploits that program's special structure, namely the low rank of the fragment data matrix, to solve it rapidly and with high accuracy.

Two C programs are distributed: SDhaP (for diploids) and SDhaP_poly (for polyploids).

------------------------------------------------------------------------------------------------
Compiling SDhaP

After downloading the C source files, change to the directory containing SDhaP.c. To compile SDhaP, use

gcc SDhaP.c -o hap -L/usr/lib64/atlas -llapack -lcblas -I/usr/include/atlas

This produces the executable "hap". ATLAS and LAPACK must be installed on your machine for SDhaP to build and run.

Running SDhaP

To run SDhaP, use the following command

./hap /home/local/altair/Fosmid/chr1.matrix.SORTED /home/local/altair/Fosmid/phase.txt

Here chr1.matrix.SORTED is the input file and phase.txt is the output file to which the phased haplotypes are written.

---------------------------------------------------------------------------------------------------
Compiling SDhaP_poly

After downloading the C source files, change to the directory containing SDhaP_poly.c. To compile SDhaP_poly, use

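gcc SDhaP_poly.c -o hap -L/usr/lib64/atlas -llapack -lcblas -I/usr/include/atlas

This also produces an executable named "hap", so building SDhaP_poly in the same directory as SDhaP overwrites the diploid build; choose a different name after -o if you need both. ATLAS and LAPACK must be installed on your machine for SDhaP_poly to work.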

Running SDhaP_poly

To run SDhaP_poly, use the following command

./hap /home/local/altair/Fosmid/chr1.matrix.SORTED /home/local/altair/Fosmid/phase.txt 3

Here chr1.matrix.SORTED is the input file, phase.txt is the output file for the phased haplotypes, and 3 is the ploidy, which must always be provided.

--------------------------------------------------------------------------------------------------

Input file format

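The input file format is as follows (SDhaP and SDhaP_poly use the same layout, as the examples below show):

Number of reads
Number of columns (SNP positions)
One line per read:
Number of contiguous segments	Read identifier	Position of first SNP segment	Contiguous bases in read	Position of next SNP segment	Contiguous bases in read	.....	Quality scores (in FASTQ format)

The quality string contains one character per base listed in the segments.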


Example:

SDhaP (diploid):

5568 
22801
2 chr22_SPA9_8733 2 0 5 0 ==
1 chr22_SPH2_1940 3 100 C==



SDhaP_poly (polyploid):

4500
1000
2	chr3_1	1	24214	37	2243	IIIIIIIII
2	chr3_2	1	2421432	46	221413	IIIIIIIIIIIII
2	chr3_3	1	4331112	44	4112	IIIIIIIIIII
2	chr3_4	1	433111	40	41134	IIIIIIIIIII


For diploids, only the "0"/"1" allele encoding is currently accepted. For polyploids, the bases A, C, G and T are encoded as 1, 2, 3 and 4 respectively.


-------------------------------------------------------------------------------------------------

Output file format 

The output file contains the phased haplotypes for each haplotype block. Each block starts with a header line giving the length of the block, the number of reads, and the MEC (minimum error correction) score of the block. The phased haplotypes then follow, one SNP per line, in the format:
SNP position	first haplotype	second haplotype

Example:

Block 1	 Length of haplotype block 329	 Number of reads 93	 Total MEC 77
2	 0	 1
3	 1	 0
4	 1	 0
5	 0	 1
6	 1	 0
7	 0	 1
Block 2	 Length of haplotype block 50	 Number of reads 9	 Total MEC 2
1589	 1	 0
1590	 1	 0
1591	 1	 0
1592	 1	 0
1593	 0	 1
1594	 0	 1
1595	 0	 1

For ploidy K greater than two, each line lists K phased haplotypes instead of two.


--------------------------------------------------------------------------------------------------




Source: README.txt, updated 2015-01-29