From: Thu H. Le <le....@mb...> - 2017-02-27 21:20:09
Hi,

After imputation, I have 10 files for 5 animals: 5 imputed files and 5 original files. I want to merge the imputed files for the 5 animals into one file called impu.merge, and the original files for the 5 animals into one file called ori.merge, and then get the imputation accuracy as the correlation between impu.merge and ori.merge.

The two files passed to bcftools stats look like this:

- Original

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 105204405 138226506 138323507 138384403 154646809
    7 82 . A G 1132.85 PASS AN=10;AC=1 GT:AD:DP:GQ:PL 0/1:8,1:9:7:7,0,188 0/0:4,0:4:12:0,12,121 0/0:4,0:4:6:0,6,58 0/0:2,0:2:6:0,6,58 0/0:2,0:2:6:0,6,56
    7 324 rs324597822 C T 2376.35 PASS AN=10;AC=1 GT:AD:DP:GQ:PL 0/0:9,0:9:24:0,24,252 0/0:2,0:2:6:0,6,58 0/0:14,0:14:39:0,39,390 0/1:7,6:13:99:137,0,158 0/0:13,0:13:33:0,33,330
    7 411 rs332929486 C T 3110.59 PASS AN=10;AC=1 GT:AD:DP:GQ:PL 0/0:19,0:19:57:0,57,603 0/0:7,0:7:21:0,21,207 0/0:9,0:9:27:0,27,267 0/1:7,7:14:99:170,0,168 0/0:11,0:11:33:0,33,317

- Imputed

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 105204405 138226506 138323507 138384403 154646809
    7 82 . A G . PASS AR2=0;DR2=0;IMP;AF=0.18 GT:DS 0|0:0.32 0|0:0.29 0|0:0.34 0|0:0.4 0|0:0.36
    7 324 rs324597822 C T . PASS AR2=0;DR2=0;IMP;AF=0.002 GT:DS 0|0:0 0|0:0 0|0:0 0|1:0.81 0|0:0
    7 411 rs332929486 C T . PASS AR2=0;DR2=0;IMP;AF=0.002 GT:DS 0|0:0 0|0:0 0|0:0 0|1:0.86 0|0:0
    7 466 rs339994861 G A . PASS AR2=0;DR2=0;IMP;AF=0.002 GT:DS 0|0:0 0|0:0 0|0:0 0|1:0.76 0|0:0

I used bcftools merge and bcftools stats to get the correlation, but the result looks like this:

    # Definition of sets:
    # ID [2]id [3]tab-separated file names
    ID 0 ori.merge.vcf.gz
    ID 1 impu.merge.vcf.gz
    ID 2 ori.merge.vcf.gz impu.merge.vcf.gz
    # SN, Summary numbers:
    # SN [2]id [3]key [4]value
    SN 0 number of samples: 5
    SN 1 number of samples: 5
    SN 0 number of records: 0
    SN 0 number of no-ALTs: 0
    SN 0 number of SNPs: 0
    SN 0 number of MNPs: 0
    SN 0 number of indels: 0
    SN 0 number of others: 0
    SN 0 number of multiallelic sites: 0
    SN 0 number of multiallelic SNP sites: 0
    SN 1 number of records: 0
    SN 1 number of no-ALTs: 0
    SN 1 number of SNPs: 0
    SN 1 number of MNPs: 0
    SN 1 number of indels: 0
    SN 1 number of others: 0
    SN 1 number of multiallelic sites: 0
    SN 1 number of multiallelic SNP sites: 0
    SN 2 number of records: 557522
    SN 2 number of no-ALTs: 0
    SN 2 number of SNPs: 482188
    SN 2 number of MNPs: 0
    SN 2 number of indels: 75334
    SN 2 number of others: 0
    SN 2 number of multiallelic sites: 0
    SN 2 number of multiallelic SNP sites: 0
    # TSTV, transitions/transversions:
    # TSTV [2]id [3]ts [4]tv [5]ts/tv [6]ts (1st ALT) [7]tv (1st ALT) [8]ts/tv (1st ALT)
    TSTV 0 0 0 0.00 0 0 0.00
    TSTV 1 0 0 0.00 0 0 0.00
    TSTV 2 345017 137171 2.52 345017 137171 2.52
    # SiS, Singleton stats:
    # SiS [2]id [3]allele count [4]number of SNPs [5]number of transitions [6]number of transversions [7]number of indels [8]repeat-consistent [9]repeat-inconsistent [10]not applicable
    SiS 0 1 0 0 0 0 0 0 0
    SiS 1 1 0 0 0 0 0 0 0
    SiS 2 1 108177 77191 30986 16626 0 0 16626
    # AF, Stats by non-reference allele frequency:
    # AF [2]id [3]allele frequency [4]number of SNPs [5]number of transitions [6]number of transversions [7]number of indels [8]repeat-consistent [9]repeat-inconsistent [10]not applicable
    AF 2 0.000000 155706 110819 44887 24277 0 0 24277
    AF 2 19.000000 88639 63282 25357 14162 0 0 14162
    AF 2 24.000000 2060 1496 564 554 0 0 554
    AF 2 29.000000 66821 47973 18848 10480 0 0 10480
    AF 2 33.000000 37 23 14 27 0 0 27
    ...................................................................................................................
    AF 2 74.000000 615 433 182 167 0 0 167
    AF 2 79.000000 18261 13151 5110 2540 0 0 2540
    AF 2 82.000000 11 7 4 3 0 0 3
    AF 2 86.000000 539 387 152 108 0 0 108
    AF 2 89.000000 11415 8025 3390 1662 0 0 1662
    AF 2 99.000000 7271 5101 2170 1114 0 0 1114
    # QUAL, Stats by quality:
    # QUAL [2]id [3]Quality [4]number of SNPs [5]number of transitions (1st ALT) [6]number of transversions (1st ALT) [7]number of indels
    QUAL 2 30 43 20 23 9
    QUAL 2 31 33 21 12 10
    QUAL 2 32 34 14 20 17
    QUAL 2 33 42 27 15 16
    QUAL 2 34 41 19 22 10
    QUAL 2 35 30 15 15 9
    QUAL 2 36 40 20 20 11
    QUAL 2 37 47 31 16 8
    ....................................................................
    QUAL 2 996 35 29 6 9
    QUAL 2 997 42 30 12 5
    QUAL 2 998 461047 331219 129828 66436
    # IDD, InDel distribution:
    # IDD [2]id [3]length (deletions negative) [4]count
    IDD 2 -42 1
    IDD 2 -41 1
    IDD 2 -40 2
    IDD 2 -39 2
    IDD 2 -38 4
    .................................
    ST 2 A>T 15165
    ST 2 C>A 19417
    ST 2 C>G 15966
    ST 2 C>T 91789
    ST 2 G>A 92114
    ST 2 G>C 15556
    ST 2 G>T 19306
    ST 2 T>A 15383
    ST 2 T>C 80274
    ST 2 T>G 18253
    # DP, Depth distribution
    # DP [2]id [3]bin [4]number of genotypes [5]fraction of genotypes (%) [6]number of sites [7]fraction of sites (%)

I don't understand why:

1) ID 0 and ID 1 in the summary numbers have no records
2) the depth distribution is empty

If you know what is wrong, please let me know. I would really appreciate any suggestions!

Kind regards,
Thu
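
For concreteness, the correlation described above can be sketched in Python on the three example sites that appear in both excerpts. This is only an illustration under assumptions, not what bcftools stats computes: it assumes "accuracy" means the Pearson correlation between the ALT allele count taken from the original GT field and the imputed dosage in the DS field, and that both files list the samples in the same order. The function names (`gt_dosage`, `field_values`, `pearson`, `site_correlation`) are hypothetical helpers written for this sketch.

```python
import math

# Records copied from the excerpts above (sites 7:82, 7:324, 7:411).
ORIGINAL = [
    "7 82 . A G 1132.85 PASS AN=10;AC=1 GT:AD:DP:GQ:PL 0/1:8,1:9:7:7,0,188 0/0:4,0:4:12:0,12,121 0/0:4,0:4:6:0,6,58 0/0:2,0:2:6:0,6,58 0/0:2,0:2:6:0,6,56",
    "7 324 rs324597822 C T 2376.35 PASS AN=10;AC=1 GT:AD:DP:GQ:PL 0/0:9,0:9:24:0,24,252 0/0:2,0:2:6:0,6,58 0/0:14,0:14:39:0,39,390 0/1:7,6:13:99:137,0,158 0/0:13,0:13:33:0,33,330",
    "7 411 rs332929486 C T 3110.59 PASS AN=10;AC=1 GT:AD:DP:GQ:PL 0/0:19,0:19:57:0,57,603 0/0:7,0:7:21:0,21,207 0/0:9,0:9:27:0,27,267 0/1:7,7:14:99:170,0,168 0/0:11,0:11:33:0,33,317",
]
IMPUTED = [
    "7 82 . A G . PASS AR2=0;DR2=0;IMP;AF=0.18 GT:DS 0|0:0.32 0|0:0.29 0|0:0.34 0|0:0.4 0|0:0.36",
    "7 324 rs324597822 C T . PASS AR2=0;DR2=0;IMP;AF=0.002 GT:DS 0|0:0 0|0:0 0|0:0 0|1:0.81 0|0:0",
    "7 411 rs332929486 C T . PASS AR2=0;DR2=0;IMP;AF=0.002 GT:DS 0|0:0 0|0:0 0|0:0 0|1:0.86 0|0:0",
]


def gt_dosage(gt):
    """Count ALT alleles in a GT string such as 0/1 or 0|1; None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def field_values(record, key):
    """Return the per-sample values of FORMAT field `key` for one VCF record."""
    cols = record.split()
    idx = cols[8].split(":").index(key)  # column 9 is FORMAT
    return [sample.split(":")[idx] for sample in cols[9:]]


def pearson(x, y):
    """Pearson correlation; NaN when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def site_correlation(original_record, imputed_record):
    """Correlate original GT allele counts with imputed DS at one site."""
    truth, dosage = [], []
    for gt, ds in zip(field_values(original_record, "GT"),
                      field_values(imputed_record, "DS")):
        count = gt_dosage(gt)
        if count is None or ds == ".":
            continue  # skip samples with a missing value in either file
        truth.append(count)
        dosage.append(float(ds))
    return pearson(truth, dosage)


if __name__ == "__main__":
    for ori, imp in zip(ORIGINAL, IMPUTED):
        chrom, pos = ori.split()[:2]
        print(chrom, pos, site_correlation(ori, imp))
```

With only five samples per site the per-site values are very noisy; pooling genotypes across many sites, or computing the correlation per sample, are other common choices and would need a different grouping than the per-site loop shown here.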