#7 Remove variant slowness

v1.0_(example)
open
nobody
query (1)
5
2012-12-10
2011-11-17
Bo Peng
No

When I delete a large proportion of the variants in a project, 'vtools remove variants table' takes a very long time. This is because the query

'DELETE FROM variant WHERE variant_id IN (SELECT variant_id FROM table)'

needs to update the indexes for every record, and removing records one by one is slow. A better method would be to

1. create a new table temp (with the same structure as variant)
2. INSERT INTO temp SELECT * FROM variant WHERE variant_id NOT IN (SELECT variant_id FROM table);
3. DROP TABLE variant;
4. rename temp to variant
5. rebuild indexes.
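
A minimal sketch of the five steps above, using Python's sqlite3 module. It assumes an SQLite database, and that the copy should keep the rows whose variant_id is not listed in the removal table. The function name, the variant_new table, the to_remove table, the variant_pos_idx index, and the simplified schema in demo() are hypothetical and are not taken from the vtools source.

```python
import re
import sqlite3


def remove_variants_by_rebuild(conn, removal_table):
    """Remove the variants listed in removal_table by rebuilding table variant.

    removal_table is interpolated into SQL, so it must be a trusted name.
    """
    cur = conn.cursor()
    # Remember how the table and its explicit indexes were defined.
    table_sql = cur.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'variant'"
    ).fetchone()[0]
    index_sqls = [row[0] for row in cur.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'variant' AND sql IS NOT NULL")]
    # Step 1: same structure under a temporary name (hypothetical: variant_new).
    new_sql = re.sub(r'^CREATE TABLE\s+"?variant"?',
                     'CREATE TABLE variant_new', table_sql, count=1)
    with conn:  # commit on success, roll back on error
        cur.execute(new_sql)
        # Step 2: copy only the variants that are NOT scheduled for removal.
        cur.execute(
            "INSERT INTO variant_new SELECT * FROM variant "
            f"WHERE variant_id NOT IN (SELECT variant_id FROM {removal_table})")
        # Step 3: drop the old table; its indexes are dropped with it.
        cur.execute("DROP TABLE variant")
        # Step 4: move the copy into place.
        cur.execute("ALTER TABLE variant_new RENAME TO variant")
        # Step 5: rebuild the indexes once, after all rows are in place.
        for sql in index_sqls:
            cur.execute(sql)


def demo():
    """Run the rebuild on a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variant (variant_id INTEGER PRIMARY KEY, chr TEXT, pos INTEGER)")
    conn.execute("CREATE INDEX variant_pos_idx ON variant (chr, pos)")
    conn.executemany("INSERT INTO variant VALUES (?, ?, ?)",
                     [(i, "1", 100 * i) for i in range(1, 6)])
    conn.execute("CREATE TABLE to_remove (variant_id INTEGER)")
    conn.executemany("INSERT INTO to_remove VALUES (?)", [(1,), (3,), (5,)])
    conn.commit()

    remove_variants_by_rebuild(conn, "to_remove")

    remaining = [r[0] for r in conn.execute(
        "SELECT variant_id FROM variant ORDER BY variant_id")]
    indexes = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'variant' AND sql IS NOT NULL")]
    conn.close()
    return remaining, indexes
```

The point of the design is that the indexes are built once over the surviving rows instead of being updated for each deleted record, which is why it should pay off mainly when a large share of the table is removed.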

Discussion

  • Gao Wang

    Gao Wang - 2011-11-17

    Agreed. But I see that the slowest part is removing variants from samples: processing each genotype table and getting rid of the variants. That takes about 3 hrs on my SSD.

     
  • Bo Peng

    Bo Peng - 2011-11-17

    I have found that 'vtools init --parent' is much faster at producing a project with the required variants, so 'vtools remove variants' should only be used when a small number of variants is removed (e.g. for quality control).

     
  • Bo Peng

    Bo Peng - 2011-11-17

    The slowness in removing variants from samples is because the genotype tables do not have an index. This can be easily fixed.
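
    A rough sketch of what such a fix could look like, again with Python's sqlite3 module. It assumes one table per sample whose name starts with genotype_ and which has a variant_id column; the table names, index names, and helper functions are hypothetical and may not match the actual vtools layout.

```python
import sqlite3


def index_genotype_tables(conn, prefix="genotype_"):
    """Create an index on variant_id for every table whose name starts with prefix."""
    tables = sorted(
        row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")
        if row[0].startswith(prefix))
    with conn:
        for table in tables:
            conn.execute(
                f'CREATE INDEX IF NOT EXISTS "{table}_variant_idx" '
                f'ON "{table}" (variant_id)')
    return tables


def remove_from_genotypes(conn, tables, variant_ids):
    """Delete the given variants from each genotype table."""
    with conn:
        for table in tables:
            # With the index in place, each lookup by variant_id can avoid
            # scanning the whole genotype table.
            conn.executemany(
                f'DELETE FROM "{table}" WHERE variant_id = ?',
                [(v,) for v in variant_ids])


def demo_genotypes():
    """Two tiny genotype tables in memory (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    for sample, ids in (("genotype_1", [1, 2, 3]), ("genotype_2", [2, 3, 4])):
        conn.execute(f'CREATE TABLE "{sample}" (variant_id INTEGER, GT INTEGER)')
        conn.executemany(f'INSERT INTO "{sample}" VALUES (?, 1)',
                         [(i,) for i in ids])
    conn.commit()

    tables = index_genotype_tables(conn)
    remove_from_genotypes(conn, tables, [1, 3])

    remaining = {
        t: [r[0] for r in conn.execute(
            f'SELECT variant_id FROM "{t}" ORDER BY variant_id')]
        for t in tables}
    conn.close()
    return tables, remaining
```

    Whether the index helps enough in practice depends on how many variants are removed per table; for very large removals the rebuild approach from the ticket description may still be the better fit.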

     