Name | Modified | Size | Downloads / Week |
---|---|---|---|
software.zip | 2016-06-03 | 27.7 kB | |
README.txt | 2015-01-29 | 3.5 kB | |
Fosmid_all_chromosomes.zip | 2015-01-29 | 7.8 MB | |
HuRef_all_chromosomes.zip | 2015-01-29 | 59.3 MB | |
Simulated_data.zip | 2015-01-29 | 18.2 MB | |
Totals: 5 Items | | 85.4 MB | 0 |

SDhaP

SDhaP is a fast, accurate, and scalable haplotype assembly method for diploid and polyploid organisms. The method formulates the haplotype assembly problem as a semi-definite program and exploits its special structure, namely the low rank of the fragment data matrix, to solve it rapidly and with high accuracy.

Two C programs are distributed: SDhaP (for diploids) and SDhaP_poly (for polyploids).

------------------------------------------------------------------------------------------------

Compiling SDhaP

After downloading the C code, change to the directory in which SDhaP is stored. To compile SDhaP, use

    gcc SDhaP.c -o hap -L/usr/lib64/atlas -llapack -lcblas -I/usr/include/atlas

"hap" is the executable. Note that ATLAS and LAPACK need to be installed on your machine for SDhaP to work.

Running SDhaP

To run SDhaP, use the following command

    ./hap /home/local/altair/Fosmid/chr1.matrix.SORTED /home/local/altair/Fosmid/phase.txt

Counting "./hap" itself as the first argument, the second argument is the input file and the third argument is the output file containing the phased haplotypes.

---------------------------------------------------------------------------------------------------

Compiling SDhaP_poly

After downloading the C code, change to the directory in which SDhaP_poly is stored. To compile SDhaP_poly, use

    gcc SDhaP_poly.c -o hap -L/usr/lib64/atlas -llapack -lcblas -I/usr/include/atlas

"hap" is the executable. Note that ATLAS and LAPACK need to be installed on your machine for SDhaP_poly to work.

Running SDhaP_poly

To run SDhaP_poly, use the following command

    ./hap /home/local/altair/Fosmid/chr1.matrix.SORTED /home/local/altair/Fosmid/phase.txt 3

Counting "./hap" itself as the first argument, the second argument is the input file, the third argument is the output file containing the phased haplotypes, and the fourth argument is the ploidy (which must be provided).
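
When many chromosomes have to be phased, the command lines above can be driven from a script. The following is a minimal Python sketch and is not part of the SDhaP distribution: the helper names `build_hap_command` and `run_hap` are hypothetical, and the only thing assumed about the binaries is the argument order described above (input file, output file, then the ploidy for SDhaP_poly).

```python
import subprocess


def build_hap_command(executable, input_path, output_path, ploidy=None):
    """Build the argument list for SDhaP (ploidy=None) or SDhaP_poly.

    Hypothetical helper: it only mirrors the argument order given in
    this README and does not validate the input file.
    """
    command = [executable, input_path, output_path]
    if ploidy is not None:
        # SDhaP_poly expects the ploidy as an extra trailing argument.
        command.append(str(int(ploidy)))
    return command


def run_hap(executable, input_path, output_path, ploidy=None):
    """Run the compiled binary; raises CalledProcessError on a non-zero exit."""
    command = build_hap_command(executable, input_path, output_path, ploidy)
    subprocess.run(command, check=True)
```

For example, `build_hap_command("./hap", "chr1.matrix.SORTED", "phase.txt", ploidy=3)` produces the same argument list as the SDhaP_poly command shown above, with the README's directory prefix left out of the file names.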
--------------------------------------------------------------------------------------------------

Input file format

The input file for the diploid case begins with two header values:

Number of reads
Number of columns

These are followed by one record per read, with the fields in this order:

Number of contiguous segments   Read identifier   Position of first SNP segment   Contiguous bases in read   Position of next SNP segment   Contiguous bases in read   .....   Quality scores (in fastq format)

In the examples, the quality string has one character per base.

Example:

SDhaP (diploid)

    5568
    22801
    2 chr22_SPA9_8733 2 0 5 0 ==
    1 chr22_SPH2_1940 3 100 C==

SDhaP_poly (polyploid)

    4500
    1000
    2 chr3_1 1 24214 37 2243 IIIIIIIII
    2 chr3_2 1 2421432 46 221413 IIIIIIIIIIIII
    2 chr3_3 1 4331112 44 4112 IIIIIIIIIII
    2 chr3_4 1 433111 40 41134 IIIIIIIIIII

For diploids, presently only the "0" and "1" encoding of the bases is accepted. For polyploids, "A,C,G,T" can be encoded as "1,2,3,4".

-------------------------------------------------------------------------------------------------

Output file format

The output file contains the phased haplotypes for each haplotype block. For each block it first reports the length of the block, the number of reads and the MEC (minimum error correction) score. Subsequently, the phased haplotype is printed in the following format:

SNP position   first haplotype   second haplotype

Example:

    Block 1
    Length of haplotype block 329
    Number of reads 93
    Total MEC 77
    2 0 1
    3 1 0
    4 1 0
    5 0 1
    6 1 0
    7 0 1
    Block 2
    Length of haplotype block 50
    Number of reads 9
    Total MEC 2
    1589 1 0
    1590 1 0
    1591 1 0
    1592 1 0
    1593 0 1
    1594 0 1
    1595 0 1

For higher ploidy K, each row lists K phased haplotypes instead of 2.

--------------------------------------------------------------------------------------------------
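
The output layout described above is simple enough to read back into a script. The following is a minimal Python sketch, not part of the SDhaP distribution: the function name `parse_sdhap_output` and the dictionary keys are hypothetical. It assumes the output follows the example exactly (a "Block N" header, three labelled integer values, then rows of SNP position followed by one integer allele per haplotype). It splits on any whitespace, so it does not depend on where the line breaks fall.

```python
def parse_sdhap_output(text, ploidy=2):
    """Parse SDhaP-style output text into a list of block dictionaries.

    Assumption: every value is an integer and each haplotype row holds
    the SNP position followed by `ploidy` alleles, as in the README example.
    """
    labels = (
        ("length", "Length of haplotype block"),
        ("reads", "Number of reads"),
        ("mec", "Total MEC"),
    )
    tokens = text.split()
    blocks = []
    i = 0
    while i < len(tokens):
        if tokens[i] != "Block":
            raise ValueError(f"expected 'Block' at token {i}, got {tokens[i]!r}")
        block = {"index": int(tokens[i + 1])}
        i += 2
        # Read the three labelled summary values in their fixed order.
        for key, label in labels:
            words = label.split()
            if tokens[i:i + len(words)] != words:
                raise ValueError(f"expected {label!r} at token {i}")
            block[key] = int(tokens[i + len(words)])
            i += len(words) + 1
        # Read haplotype rows until the next block header or end of input.
        rows = []
        while i < len(tokens) and tokens[i] != "Block":
            row = tokens[i:i + 1 + ploidy]
            if len(row) != 1 + ploidy:
                raise ValueError("truncated haplotype row")
            rows.append((int(row[0]), tuple(int(a) for a in row[1:])))
            i += 1 + ploidy
        block["snps"] = rows
        blocks.append(block)
    return blocks


example = """Block 1
Length of haplotype block 329
Number of reads 93
Total MEC 77
2 0 1
3 1 0
Block 2
Length of haplotype block 50
Number of reads 9
Total MEC 2
1589 1 0
"""
blocks = parse_sdhap_output(example)
print(blocks[0]["mec"], blocks[1]["snps"][0])  # prints: 77 (1589, (1, 0))
```

For SDhaP_poly output, pass the ploidy used for the run (for example `ploidy=3`) so that each row is read as one position plus K alleles.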