vcftools / Feature Requests / #19 vcf2fq -- use INDELs in consensus sequence

The current behaviour of the vcf2fq procedure is to identify the regions surrounding INDELs and convert each INDEL, plus its surrounding region, to lowercase. As a result, the generated consensus sequence is exactly the same length as the input sequence, but differs from the original sequence only at SNPs (rather than at both SNPs and INDELs).
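To make the current behaviour concrete, here is a minimal Python sketch of a length-preserving consensus. It is not the Perl code in vcfutils.pl. SNPs are substituted, and the bases around each INDEL are lowercased instead of being inserted or deleted. The function name `mask_consensus`, the 0-based coordinates and the `window` size are illustrative assumptions.

```python
from typing import Dict, Iterable


def mask_consensus(ref: str, snps: Dict[int, str],
                   indel_positions: Iterable[int], window: int = 3) -> str:
    """Length-preserving consensus (illustrative sketch only).

    ref: reference sequence
    snps: 0-based position -> alternate base
    indel_positions: 0-based INDEL start positions
    window: hypothetical number of bases masked on each side of an INDEL
    """
    seq = list(ref)
    # SNPs are the only changes that alter the bases themselves.
    for pos, alt in snps.items():
        seq[pos] = alt
    # INDELs are never applied; the surrounding region is only lowercased.
    for pos in indel_positions:
        start = max(0, pos - window)
        end = min(len(seq), pos + window + 1)
        for i in range(start, end):
            seq[i] = seq[i].lower()
    return "".join(seq)
```

For example, `mask_consensus("ACGTACGTAC", {1: "T"}, [6], window=1)` returns `"ATGTAcgtAC"`: the SNP is applied, the INDEL region is only lowercased, and the length stays at 10.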

The attached patch changes this behaviour so that INDELs with a high likelihood are included in the final consensus sequence (the likelihood threshold can be adjusted with the '-L' option).
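Below is a minimal Python sketch of the proposed behaviour. It assumes VCF-style records (1-based POS, REF and a single ALT allele). A hypothetical `min_qual` cut-off stands in for the likelihood threshold, and the sketch does not reproduce how the patch or its '-L' option actually scores INDELs. Variants are applied from right to left, so an INDEL never shifts the coordinates of variants that are still to be applied.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    pos: int     # 1-based position, as in the VCF POS column
    ref: str     # REF allele
    alt: str     # single ALT allele (multi-allelic sites not handled here)
    qual: float  # hypothetical stand-in for the patch's "likelihood"


def apply_variants(ref_seq: str, variants: List[Variant],
                   min_qual: float = 20.0) -> str:
    """Build a consensus that includes INDELs (illustrative sketch only)."""
    seq = ref_seq
    # 0-based start of the leftmost variant applied so far.
    next_start = len(ref_seq)
    # Right-to-left: edits only change the sequence at or after `start`,
    # so positions of variants further left remain valid.
    for v in sorted(variants, key=lambda rec: rec.pos, reverse=True):
        if v.qual < min_qual:
            continue  # below the (hypothetical) threshold
        start = v.pos - 1
        end = start + len(v.ref)
        if end > next_start:
            continue  # overlaps a variant that was already applied
        if seq[start:end].upper() != v.ref.upper():
            raise ValueError(f"REF allele mismatch at position {v.pos}")
        seq = seq[:start] + v.alt + seq[end:]
        next_start = start
    return seq
```

With `ref_seq = "ACGTACGTAC"`, a 2-base insertion `Variant(3, "G", "GTT", 50.0)`, a 1-base deletion `Variant(6, "CG", "C", 40.0)` and a low-quality SNP `Variant(9, "A", "T", 5.0)`, the result is `"ACGTTTACTAC"`. The consensus grows from 10 to 11 bases, and the SNP is ignored.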

In addition, the patch allows a reference FASTA sequence to be provided, so that the VCF file only needs to contain non-reference (variant) sites. Previously, the VCF file needed one line of information for every base.
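The following minimal Python sketch shows why a reference FASTA removes the need for one VCF line per base: any position that does not appear in the VCF is simply copied from the reference. It assumes the reference has already been loaded into a dict mapping sequence name to sequence. For brevity it applies only single-base SNP records and skips INDEL records. Both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Records = Dict[str, List[Tuple[int, str, str]]]


def parse_vcf_records(vcf_text: str) -> Records:
    """Group VCF data lines by CHROM as (POS, REF, ALT); '#' lines are skipped."""
    by_chrom: Dict[str, List[Tuple[int, str, str]]] = defaultdict(list)
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, ...
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        by_chrom[chrom].append((pos, ref, alt))
    return dict(by_chrom)


def snp_consensus(reference: Dict[str, str], records: Records) -> Dict[str, str]:
    """Start from the reference; only positions listed in the VCF can change."""
    out = {}
    for name, seq in reference.items():
        bases = list(seq)
        for pos, ref, alt in records.get(name, []):
            # Single-base substitutions only; INDELs and '.' ALTs are skipped.
            if len(ref) == 1 and len(alt) == 1 and alt != ".":
                bases[pos - 1] = alt
        out[name] = "".join(bases)
    return out
```

Only sites that differ from the reference need a VCF line; every other base is taken from the reference sequence as-is.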

While it should work correctly for multi-FASTA files, this code has only been tested on small single-chromosome sequences (mitochondria, enterococci).

1 Attachment

vcf2fq_indels.diff

Discussion

David Eccles (gringer) - 2013-02-21

I've modified the diff file to account for additional fields in FASTA file headers that are not part of the sequence name.

vcf2fq_indels_v2.diff
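Here is one way to handle such headers, sketched in Python. It assumes the sequence name is the first whitespace-delimited word after '>'. That is a common convention, and not necessarily the exact rule the patch uses. Under this assumption, a header such as '>chrM Homo sapiens mitochondrion' matches a VCF CHROM value of 'chrM'. The parser also reads multi-FASTA input. The function name `read_fasta` is hypothetical.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse (multi-)FASTA text into {name: sequence}.

    The name is the first whitespace-delimited token of each header, so any
    extra description fields after it are ignored.
    """
    seqs: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            seqs[name] = []
        elif name is not None:
            # Sequence lines may be wrapped; collect them for joining.
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}
```

For example, `read_fasta(">chrM Homo sapiens mitochondrion\nACGT\nAC\n>chr2 extra fields\nGG\n")` returns `{"chrM": "ACGTAC", "chr2": "GG"}`.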

Adam Auton - 2013-02-21

assigned_to: Petr Danecek
Gabriel - 2015-05-04

Dear David,
Thanks for this useful script. I would like to use it, but I am having problems applying the changes in "vcf2fq_indels_v2.diff" to the "vcfutils.pl" file. I would be grateful if you could send me your modified version of vcfutils.pl.
Thank you very much.
Gabriel
