From: Laurent F. <lau...@gm...> - 2011-02-28 12:30:48
Dear VCFTools Team,

I am trying to compare two VCF files containing the same individual (one with sequence data, the other with immunochip data) using:

    vcftools --diff --diff-site-discordance --diff-discordance-matrix

I know the documentation says that this functionality is likely buggy, so that might be the root of my confusion.

Here is the head of the .diff.sites file:

    CHROM  POS    FILES  MATCHING_ALLELES  N_COMMON_CALLED  N_DISCORD  DISCORDANCE
    1      10009  2      0                 0                0          nan
    1      10109  2      0                 0                0          nan
    1      10150  2      0                 0                0          nan
    1      10180  2      0                 0                0          nan
    1      10234  2      0                 0                0          nan
    ....

And here are the results of summing each of the columns:

    Loci in file 1 only:   119529 (2.911%)
    Loci in file 2 only:   978611 (23.832%)
    Loci in both files:   3008166 (73.257%)
    Matching Alleles:     3006754 (99.953%)
    Common called:        3008166 (100.000%)
    Discordant:             64541 (2.146%)
    Discordance:            64541 (2.146%)

I am not sure what exactly the MATCHING_ALLELES, N_COMMON_CALLED and N_DISCORD columns represent. Could you please explain them?

Below is the output for the .diff.discordance_matrix file:

                 N_0/0_file1  N_0/1_file1  N_1/1_file1  N_./._file1
    N_0/0_file2            0            0            0            0
    N_0/1_file2         2742      1879154         2914            0
    N_1/1_file2        54656         2817      1064471            0
    N_./._file2            0            0            0            0

Here I think everything is quite straightforward, except that I do not have any 0/0 loci in either of the two files, yet there are non-zero values in the N_0/0_file1 column. Could you please let me know how this could happen?

Thanks a lot,
Laurent

--
Laurent Francioli
PhD Student
UMC Utrecht
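
As a sanity check on the figures quoted above, the discordance matrix can be compared with the column sums from .diff.sites. The following is a minimal Python sketch that uses only the numbers from this message; the variable and function names are made up for illustration, and it does not say anything about how vcftools itself defines these columns.

```python
# Genotype labels in the order used by the .diff.discordance_matrix header.
GENOTYPES = ["0/0", "0/1", "1/1", "./."]

# Rows are file2 genotypes, columns are file1 genotypes.
# Values are copied from the matrix quoted in this message.
MATRIX = [
    [0, 0, 0, 0],
    [2742, 1879154, 2914, 0],
    [54656, 2817, 1064471, 0],
    [0, 0, 0, 0],
]

# Column sums of .diff.sites, also copied from this message.
LOCI_IN_BOTH = 3008166
MATCHING_ALLELES = 3006754
DISCORDANT = 64541


def summarize(matrix):
    """Return (total, concordant, off_diagonal) counts for a square matrix."""
    total = sum(sum(row) for row in matrix)
    concordant = sum(matrix[i][i] for i in range(len(matrix)))
    return total, concordant, total - concordant


total, concordant, off_diagonal = summarize(MATRIX)

# Sites present in both files whose alleles were not counted as matching.
non_matching = LOCI_IN_BOTH - MATCHING_ALLELES

print("matrix total:          ", total)
print("concordant (diagonal): ", concordant)
print("off-diagonal:          ", off_diagonal)
print("non-matching alleles:  ", non_matching)
print("off-diagonal + non-matching:", off_diagonal + non_matching)
print("reported discordant:        ", DISCORDANT)
```

With these figures the matrix total equals the Matching Alleles sum (3006754), and the off-diagonal count plus the 1412 sites without matching alleles equals the Discordant sum (64541). That suggests, but does not prove, that the matrix only covers sites with matching alleles and that N_DISCORD also counts the remaining shared sites.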