From: Petr D. <pd...@sa...> - 2011-05-18 15:40:40
Hi Mao,

this is a known issue. Alas, vcf-merge is slow, and in its current incarnation not much can be done about it. It parses every record completely in order to merge multiallelic sites correctly (the numeric indexes to ALT may change). There is some room for improvement, but I don't see it coming anytime soon.

For now, the best option is to run the program in parallel for each chromosome separately; see the -r option (or -c with older versions).

Best,
Petr

On Wed, 2011-05-18 at 15:27 +0200, Mao Jianfeng wrote:
> Dear all,
>
> This may not be a general question, but it is very serious for me. If
> you can, please show me some references or other material (links).
>
> I come from biology (population genetics) and know very little about
> computers compared to bioinformaticians. I would like to learn some
> more from you.
>
> Now, I would like to get your directions on speeding up VCFtools
> computation. My vcftools process usually runs on one core (I have
> 8 cores in my computer), although it does not ask for much memory.
>
> For example, when I use "merge-vcf" to merge 11 individual VCF files
> (SNPs for the whole genome of 200 Mb, each file for one specific
> individual), the process runs for more than 24 hours.
>
> I want to know:
>
> (1). Can I use more than one core in a VCFtools process? How do I
> do that?
> (2). Are there any alternatives to parallelization on a
> workstation? I have heard of something called a "cluster" computer;
> what is that? Can it speed up VCFtools computation?
>
> Thanks in advance.
>
> --
> Jian-Feng, Mao
>
> the Institute of Botany,
> Chinese Academy of Botany,
> _______________________________________________
> Vcftools-help mailing list
> Vcf...@li...
> https://lists.sourceforge.net/lists/listinfo/vcftools-help

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
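
The per-chromosome approach suggested in the reply could be sketched roughly as follows. This is a minimal illustration, not part of VCFtools: the function names, the output naming scheme, and the example file and chromosome names are all hypothetical. It assumes vcf-merge is on the PATH, that every input file is bgzip-compressed and tabix-indexed, and that the installed version accepts a chromosome name after -r (older versions would need -c instead, as noted in the reply).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_merge_commands(chromosomes, inputs, out_prefix="merged"):
    """Return one (argv, output_path) pair per chromosome.

    Assumption: `vcf-merge -r <chrom> <files...>` restricts the merge to
    that chromosome and writes the merged VCF to stdout.
    """
    jobs = []
    for chrom in chromosomes:
        argv = ["vcf-merge", "-r", str(chrom), *inputs]
        jobs.append((argv, f"{out_prefix}.{chrom}.vcf"))
    return jobs


def run_job(job):
    """Run one vcf-merge call, redirecting its stdout to the output file."""
    argv, out_path = job
    with open(out_path, "w") as out:
        subprocess.run(argv, stdout=out, check=True)
    return out_path


def merge_in_parallel(chromosomes, inputs, workers=8, out_prefix="merged"):
    """Run the per-chromosome merges concurrently.

    Threads suffice here because the heavy work happens in the
    vcf-merge child processes, not in Python itself.
    """
    jobs = build_merge_commands(chromosomes, inputs, out_prefix)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, jobs))
```

A call such as merge_in_parallel(["1", "2", "3"], ["ind01.vcf.gz", "ind02.vcf.gz"]) would start up to eight merges at once, one per chromosome, which matches the 8 cores mentioned in the question. The resulting per-chromosome files would still need to be joined afterwards, for example with vcf-concat.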