
Best practices with SNPTools

Jin Yu

Step 1: EBD calculation

Input: BAM files
Output: EBD files

for i in $(cat "$BAM_list"); do
    echo "pileup $i" | msub -q analysis -d $PWD -V -l nodes=1:ppn=1,mem=10000M
done
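The loop above reads $BAM_list as a plain text file of BAM paths. A minimal sketch for building that file, assuming all BAM files sit somewhere under one directory and end in .bam; make_bam_list is a hypothetical helper, not part of SNPTools.

```shell
# Hypothetical helper (not part of SNPTools): write one BAM path per line to a
# list file, sorted so that the job submission order is reproducible.
# Assumption: every BAM file lives under $1 and has the ".bam" suffix;
# index files such as *.bam.bai do not match the -name pattern.
make_bam_list() {
    bam_dir="$1"
    out="$2"
    find "$bam_dir" -type f -name '*.bam' | sort > "$out"
}
```

For example, make_bam_list /path/to/bams bam.list followed by BAM_list=bam.list (the directory is a placeholder). Passing an absolute directory yields absolute paths in the list.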

Step 2: SNP calling

Input: EBD files, reference file, chromosome number
Output: SNP calls in VCF file

echo "varisite $EBD_list $Chr ~/reference/human_b37/${Chr}.fa" | msub -q analysis -d $PWD -V -l nodes=1:ppn=8
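Because varisite takes one chromosome per call, a genome can be processed as one job per chromosome. A sketch that prints one varisite command per autosome, assuming the reference is stored as one FASTA per chromosome named <chr>.fa, as in the command above; varisite_cmds is a hypothetical name, and the sex chromosomes are left out for simplicity.

```shell
# Print one varisite command line per autosome (1-22).
# Arguments: $1 = EBD list file, $2 = directory holding per-chromosome FASTA files.
# Assumption: FASTA files are named <chr>.fa, matching the Step 2 command.
varisite_cmds() {
    ebd_list="$1"
    ref_dir="$2"
    for chr in $(seq 1 22); do
        echo "varisite $ebd_list $chr $ref_dir/$chr.fa"
    done
}
```

Each printed line can then be piped to msub with the same options as in Step 2, for example from a while read -r cmd loop.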

Step 3: Genotype likelihood calculation

Input: Sample list, BAM files, SNP calls in VCF file
Output: GL in RAW files

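Note 1: the following script calculates the genotype likelihoods of each sample from all BAM files of that sample. BAM files are matched to a sample by searching $BAM_list for the sample name, so every BAM file name must contain its sample name, and no sample name should be a substring of another.
Note 2: check that the sites in the VCF file are in ascending position order.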

for sample in $(cat "$Sample_list"); do
    echo "bamodel $sample $VCF $(grep "$sample" "$BAM_list" | tr '\n' ' ')" | msub -q analysis -d $PWD -V -l nodes=1:ppn=1,pmem=8192M -N "$sample"
done
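The position-order check from Note 2 can be done with awk. A minimal sketch, assuming an uncompressed VCF (decompress a .vcf.gz first, for example with gzip -cd): it reports the first record whose position is smaller than the previous one on the same chromosome. It does not detect a chromosome whose records are split into non-adjacent blocks. vcf_is_sorted is a hypothetical helper name.

```shell
# Return 0 if the VCF records in $1 are in ascending position order within
# each chromosome; otherwise print the first offending record and return 1.
vcf_is_sorted() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        $1 != chrom { chrom = $1; prev = 0 }   # new chromosome: restart comparison
        $2 + 0 < prev {
            print "out of order at line " NR ": " $1 ":" $2
            bad = 1
            exit
        }
        { prev = $2 + 0 }
        END { exit bad + 0 }                   # 1 if an unsorted record was found
    ' "$1"
}
```

For example, vcf_is_sorted "$VCF" || echo "sort the VCF before running bamodel".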

Step 4: Combine the GLs of all individuals into one file

Input: RAW files, SNP calls in VCF file
Output: Prob file

echo "poprob $VCF $RAW_list $Prob -b 25600" | msub -q analysis -d $PWD -V -l nodes=1:ppn=8,mem=30G
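Because Step 4 combines the GLs of every individual, it is worth confirming that all Step 3 jobs have finished before submitting poprob. A sketch that builds $RAW_list in sample-list order and fails if any file is missing, assuming (hypothetically) that bamodel wrote one file per sample named <sample>.raw in the working directory; adjust the naming to whatever bamodel actually produced. make_raw_list is a hypothetical name.

```shell
# Hypothetical helper: write "<sample>.raw" for each sample listed in $1 to the
# list file $2, keeping the sample-list order. Returns 1, after reporting the
# missing files on stderr, if any expected RAW file does not exist yet.
# Assumption: bamodel output is named <sample>.raw in the current directory.
make_raw_list() {
    samples="$1"
    out="$2"
    missing=0
    : > "$out"                                  # start with an empty list
    while read -r s || [ -n "$s" ]; do          # also read a final unterminated line
        [ -z "$s" ] && continue                 # skip blank lines
        if [ -f "$s.raw" ]; then
            echo "$s.raw" >> "$out"
        else
            echo "missing RAW file for sample $s" >&2
            missing=1
        fi
    done < "$samples"
    return "$missing"
}
```

For example, make_raw_list "$Sample_list" raw.list && RAW_list=raw.list.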

Step 5: Divide the Prob file into bins for parallel imputation

Input: Prob file, chromosome number
Output: Bin files

echo "probin $Prob $Chr -f $Bin_directory" | msub -q analysis -d $PWD -V
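Step 6 takes the bin files as a list. A sketch for collecting them once probin has finished, assuming the bin files are the only files in $Bin_directory matching a given pattern; the default *.bin pattern and the name make_bin_list are hypothetical, so check what probin actually wrote before relying on it.

```shell
# Hypothetical helper: list the bin files in directory $1 (not recursing into
# subdirectories), one path per line, into the list file $2.
# Assumption: bin files match the pattern in $3, default "*.bin".
make_bin_list() {
    bin_dir="$1"
    out="$2"
    pattern="${3:-*.bin}"
    find "$bin_dir" -maxdepth 1 -type f -name "$pattern" | sort > "$out"
}
```

For example, make_bin_list "$Bin_directory" bin.list.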

Step 6: Imputation

Input: list of Bin files
Output: the imputation result of each bin, written to the same directory as the Bin file

Note: to use the CPUs of several nodes at once, it is suggested to split the Bin list into parts and submit the imputation job for each part to a different node.
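The split suggested in the note can be done with the POSIX split utility. A sketch, assuming the Bin list has one bin path per line; the part. prefix matches the part.* glob used in the submission loop of this step, and split_bin_list is a hypothetical name. Remove any part.* files left over from an earlier run first, or the loop will submit them too.

```shell
# Split the list file $1 into at most $2 parts of roughly equal size,
# named part.aa, part.ab, ... in the current directory.
split_bin_list() {
    list="$1"
    nparts="$2"
    total=$(wc -l < "$list")
    # Ceiling division, so that no more than $nparts files are produced.
    per_part=$(( (total + nparts - 1) / nparts ))
    [ "$per_part" -ge 1 ] || per_part=1
    split -l "$per_part" "$list" part.
}
```

For example, split_bin_list bin.list 4 produces four parts for a list of 10 bins (3, 3, 3 and 1 lines).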

for i in part.*; do
    echo "impute -l $i" | msub -q analysis -d $PWD -V -l nodes=1:ppn=4
done

Step 7: Join the haplotype blocks into one VCF

Input: Bin directory
Output: genotype/haplotype in VCF

echo "hapfuse $VCF $Bin_directory" | msub -q analysis -d $PWD -V

Appendix: Convert binary GL file to VCF format

Input: Prob file created from step 4
Output: population VCF file

echo "prob2vcf in.prob out.vcf.gz chr" | msub -q analysis -d $PWD -V
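As a quick sanity check on the prob2vcf output, you can count its records and compare the count with the number of sites in the VCF used in Step 4. A sketch that handles plain and gzip/bgzip-compressed files; count_vcf_records is a hypothetical name, and whether the two counts should match exactly is an assumption worth verifying on a small test run.

```shell
# Print the number of data records (non-header lines) in the VCF file $1.
# Assumption: compressed files end in ".gz"; bgzip output is gzip-compatible.
count_vcf_records() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | grep -vc '^#'
}
```

For example, compare count_vcf_records out.vcf.gz with count_vcf_records "$VCF".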
