From: Óscar M. <om...@bi...> - 2020-06-23 15:20:16
Dear all,

I'm using VCFtools 0.1.17 to filter out loci with low (or extremely high) coverage, and it is surprisingly fast even with big data sets. First I compute the per-site mean depth values (and plot them) to choose a range of mean depth, and then I filter out the loci outside that range.

Example command lines:

    vcftools --vcf populations.snps.vcf --site-mean-depth --out dataset_m01r5p19R7h5
    vcftools --vcf populations.snps.vcf --min-meanDP 4 --max-meanDP 20 --recode --out dataset_m01r5p19R7h5_depth

The problem is that the number of loci with a given depth in the ".ldepth.mean" table is not the same as the number in the output file after pruning. For example, if I count the loci with MEAN_DEPTH above 4 in the table, there are 5926; but if I prune loci with "--min-meanDP 4", only 2157 loci are left in the output file.

I have tried this with many different data sets and combinations of values for --min-meanDP and --max-meanDP, and I always get far fewer loci in the output file than when I count them manually from the table.

What could be the reason for this? I can send you the log files or any other file if that helps.

Many thanks

--
Dr. Óscar Mira
Department of Biology
University of Zagreb
Rooseveltov trg 6
10000 Zagreb, Croatia
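
For anyone who wants to reproduce the manual count from the table, here is a minimal sketch in Python. It is not necessarily how the count above was obtained. It assumes the ".ldepth.mean" file is tab-separated with a header row containing a MEAN_DEPTH column, and it treats both bounds as inclusive, which is how the VCFtools manual describes --min-meanDP and --max-meanDP. The function name and its parameters are hypothetical, chosen only for illustration.

```python
import csv
import math


def count_sites_in_depth_range(path, min_mean_dp=None, max_mean_dp=None):
    """Count rows of a .ldepth.mean table whose MEAN_DEPTH is within the bounds.

    Assumptions (not taken from the original post): the file is tab-separated,
    has a header row with a MEAN_DEPTH column, and bounds are inclusive.
    A bound left as None is not applied.
    """
    kept = 0
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            try:
                depth = float(row["MEAN_DEPTH"])
            except (TypeError, ValueError):
                continue  # skip rows without a parsable depth value
            if math.isnan(depth):
                continue  # skip sites where no mean depth could be computed
            if min_mean_dp is not None and depth < min_mean_dp:
                continue
            if max_mean_dp is not None and depth > max_mean_dp:
                continue
            kept += 1
    return kept


# Example call, using the output prefix from the commands above:
# count_sites_in_depth_range("dataset_m01r5p19R7h5.ldepth.mean", 4, 20)
```

The result can then be compared with the number of sites that VCFtools reports as kept in the log of the --recode run.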