#7 Remove variant slowness

Milestone: v1.0_(example)
Status: open
Owner: nobody
Labels: query (1)
Priority: 5
Updated: 2012-12-10
Created: 2011-11-17
Creator: Bo Peng
Private: No

When I delete a large proportion of variants from a project, 'vtools remove variants table' takes a very long time. This is because the query

'DELETE FROM variant WHERE variant_id IN (SELECT variant_id FROM table)'

needs to update the indexes for every record, and removing records one by one is slow. A better method would be to

1. create a new table temp (with structure from variant)
2. INSERT INTO temp SELECT * FROM variant WHERE variant_id NOT IN (SELECT variant_id FROM table);
3. DROP TABLE variant;
4. rename temp to variant
5. rebuild indexes.
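
The five steps above can be sketched with Python's standard sqlite3 module. This is only an illustration, not vtools code: the function name `remove_variants_by_rebuild`, the scratch table name `temp_variant`, and the `id_table` argument (the table listing the variants to remove) are assumptions made for the example. It copies the rows to keep, so the filter is NOT IN.

```python
import sqlite3


def remove_variants_by_rebuild(conn, id_table):
    """Remove the variants listed in id_table by rebuilding the variant table.

    Hypothetical helper: id_table is interpolated as an identifier, so it
    must be a trusted table name.
    """
    cur = conn.cursor()
    # Remember how the table and its explicit indexes were defined.
    cur.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'variant'"
    )
    table_sql = cur.fetchone()[0]
    cur.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'variant' AND sql IS NOT NULL"
    )
    index_sqls = [row[0] for row in cur.fetchall()]

    # 1. Create an empty copy with the same structure (no indexes yet).
    cur.execute(table_sql.replace("variant", "temp_variant", 1))
    # 2. Copy only the variants that are NOT scheduled for removal.
    cur.execute(
        "INSERT INTO temp_variant SELECT * FROM variant "
        f"WHERE variant_id NOT IN (SELECT variant_id FROM {id_table})"
    )
    # 3. Drop the old table; SQLite drops its indexes along with it.
    cur.execute("DROP TABLE variant")
    # 4. Move the copy into place.
    cur.execute("ALTER TABLE temp_variant RENAME TO variant")
    # 5. Rebuild each index once, over the surviving rows only.
    for sql in index_sqls:
        cur.execute(sql)
    conn.commit()
```

Building the indexes once at the end avoids the per-record index maintenance that the DELETE query pays for. The trade-off is that every kept row is rewritten, so this approach is aimed at the case described here, where a large proportion of the variants is removed.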

Discussion

  • Gao Wang

    Gao Wang - 2011-11-17

    Agreed. But I see that the slowest part is removing variants from samples: processing each genotype table and getting rid of the variants. That takes about 3 hrs on my SSD.

     
  • Bo Peng

    Bo Peng - 2011-11-17

    I have found that 'vtools init --parent' is much faster at creating a project with the required variants, so the command 'vtools remove variants' should be used only when a small number of variants is removed (e.g. for quality control).

     
  • Bo Peng

    Bo Peng - 2011-11-17

    Removing variants from samples is slow because the genotype tables do not have an index. This can be easily fixed.
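
A minimal sketch of that kind of fix, using Python's standard sqlite3 module. It is not vtools code: the function name, the `<table>_index` naming scheme, and the assumption that the genotype tables and the table of IDs to remove are reachable through one connection are all made up for illustration. Whether SQLite actually uses the index for a given DELETE depends on the query planner, which can be checked with EXPLAIN QUERY PLAN.

```python
import sqlite3


def remove_variants_from_genotypes(conn, genotype_tables, id_table):
    """Delete the variants listed in id_table from each genotype table.

    Hypothetical helper: table names are interpolated as identifiers,
    so they must come from a trusted source.
    """
    cur = conn.cursor()
    for table in genotype_tables:
        # Index variant_id so the planner has the option of locating
        # matching rows by lookup instead of scanning the whole table.
        cur.execute(
            f"CREATE INDEX IF NOT EXISTS {table}_index ON {table} (variant_id)"
        )
        cur.execute(
            f"DELETE FROM {table} "
            f"WHERE variant_id IN (SELECT variant_id FROM {id_table})"
        )
    conn.commit()
```

The index is created once per genotype table and kept, so later removals on the same table can reuse it.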