|
From: CHEN Yi-S. <Yi-...@3d...> - 2016-07-19 14:25:09
|
Hi All,

hts_idx_load_core seems to consume a significant share of the running time when VCF files are queried on small regions.

I have thousands of VCF files (more are coming); each vcf.gz is about 100 MB and each vcf.gz.tbi about 1 MB. A common use case is to query these VCF files on nearby small regions, for example the CDS regions of a gene. My queries run from C++ code linked to htslib, and I found that hts_idx_load_core takes a lot of the running time because it always loads the whole tbi, even when the query regions are all on a single chromosome.

I tinkered with hts_idx_load_core so that it loads only the chromosome of interest instead of the whole tbi, and with that change the VCF queries run 10 times faster.

Is this the right approach to speeding up VCF queries? Any suggestions are welcome.

Thanks,
Yi-Shiou Chen
3DS.COM/BIOVIA<http://www.3ds.com/BIOVIA>
|
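To make the idea of loading only the chromosome of interest concrete, here is a minimal, self-contained sketch. It is not the actual change to hts_idx_load_core and it does not use htslib. The uncompressed toy layout, the types, and the functions `load_one_reference` and `make_toy_index` are all hypothetical, only loosely modeled on a per-reference bin and linear index. A real .tbi is BGZF-compressed, so skipped data would probably still have to be read and decompressed; the saving there would come from not allocating and storing it.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy index layout (an assumption for illustration, not the real tbi format):
//   int32 n_ref
//   per reference: int32 n_bin
//                  per bin: uint32 id, int32 n_chunk, n_chunk x (uint64 beg, uint64 end)
//                  int32 n_intv, n_intv x uint64 offset
struct Chunk { uint64_t beg, end; };
struct Bin { uint32_t id; std::vector<Chunk> chunks; };
struct RefIndex { std::vector<Bin> bins; std::vector<uint64_t> linear; };

template <typename T>
T read_pod(std::istream& in) {
    T v{};
    in.read(reinterpret_cast<char*>(&v), sizeof v);
    if (!in) throw std::runtime_error("truncated index");
    return v;
}

template <typename T>
void write_pod(std::ostream& out, T v) {
    out.write(reinterpret_cast<const char*>(&v), sizeof v);
}

// Load only reference `target`. Earlier references are skipped by seeking
// past their payload; later references are never read at all.
RefIndex load_one_reference(std::istream& in, int32_t target) {
    const int32_t n_ref = read_pod<int32_t>(in);
    if (target < 0 || target >= n_ref) throw std::out_of_range("bad reference id");
    RefIndex out;
    for (int32_t r = 0; r <= target; ++r) {
        const bool keep = (r == target);
        const int32_t n_bin = read_pod<int32_t>(in);
        for (int32_t b = 0; b < n_bin; ++b) {
            const uint32_t id = read_pod<uint32_t>(in);
            const int32_t n_chunk = read_pod<int32_t>(in);
            if (!keep) {  // skip 16 bytes per chunk without storing anything
                in.seekg(static_cast<std::streamoff>(n_chunk) * 16, std::ios::cur);
                continue;
            }
            Bin bin{id, {}};
            for (int32_t c = 0; c < n_chunk; ++c) {
                const uint64_t beg = read_pod<uint64_t>(in);
                const uint64_t end = read_pod<uint64_t>(in);
                bin.chunks.push_back({beg, end});
            }
            out.bins.push_back(std::move(bin));
        }
        const int32_t n_intv = read_pod<int32_t>(in);
        if (!keep) {      // skip 8 bytes per linear-index entry
            in.seekg(static_cast<std::streamoff>(n_intv) * 8, std::ios::cur);
            continue;
        }
        for (int32_t i = 0; i < n_intv; ++i)
            out.linear.push_back(read_pod<uint64_t>(in));
    }
    return out;
}

// Build a small in-memory toy index with `n_ref` references (demo data only).
std::string make_toy_index(int32_t n_ref) {
    assert(n_ref >= 0);
    std::ostringstream out;
    write_pod<int32_t>(out, n_ref);
    for (int32_t r = 0; r < n_ref; ++r) {
        write_pod<int32_t>(out, 2);                        // two bins per reference
        for (uint32_t b = 0; b < 2; ++b) {
            write_pod<uint32_t>(out, 4681 + b);            // bin id
            write_pod<int32_t>(out, 1);                    // one chunk per bin
            write_pod<uint64_t>(out, 1000u * r + b);       // chunk begin
            write_pod<uint64_t>(out, 1000u * r + b + 10);  // chunk end
        }
        write_pod<int32_t>(out, r + 1);                    // r + 1 linear entries
        for (int32_t i = 0; i <= r; ++i)
            write_pod<uint64_t>(out, 100u * r + i);
    }
    return out.str();
}
```

A caller would wrap the bytes from `make_toy_index(3)` in a `std::istringstream` and call `load_one_reference(in, 1)` to get the bins and linear index for the second reference only. Whether this skipping pays off against the real index depends on how much of the load time goes to allocation rather than decompression, which this sketch does not measure.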