From: Petr D. <pd...@sa...> - 2011-02-17 13:09:54
Hi,
there is a vcf-subset script which can subset VCF files by SNPs or indels.
It would be straightforward to do the same for other variants as well.
On Thu, 2011-02-17 at 11:34 +0100, Mao Jianfeng wrote:
> Dear vcftools listers,
>
> I do my NGS analysis with VCFTools, and I would like to keep the
> variant definitions and statistics consistent throughout.
>
> Sorry for my simple question.
>
> I have annotated my VCF file. Now I want to count the number of
> variants of each kind (SNPs, indels and other variants in my case)
> that overlap CDS/introns/exons/genes. But I do not know how to tell
> the variant types apart when counting. I guess I may need to split my
> VCF file (with multiple individuals genotyped) into three subsets,
> one each for SNPs, indels and other variants.
You could use vcf-annotate to add information about cds/intron/exon/genes
to the INFO column of your VCF file. The vcf-stats script can then be
used to get the numbers for each category.
> I think this should be easy to implement with VCFTools, because
> vcf-stats can already identify each of the different variant types.
> Could you send me a few lines of code to do this subsetting by variant type?
Not tested, but something like this should give you a good start:
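use Vcf;
my $vcf = Vcf->new(file=>$file);
$vcf->parse_header();                     # must be called before reading data lines
print $vcf->format_header();              # keep the header in the output
while (my $line=$vcf->next_data_array)
{
    my $ref = $$line[3];                  # REF column (0-based index)
    my (@alts) = split(/,/,$$line[4]);    # ALT column, comma-separated alleles
    my $ok_to_print = 0;
    for my $alt (@alts)
    {
        my ($type,$len,$ht) = $vcf->event_type($ref,$alt);
        # More info about return type in `perldoc Vcf.pm`
        # Set $ok_to_print = 1 here when $type is the one you want to keep.
    }
    # ..
    if ($ok_to_print)
    {
        print $vcf->format_line($line);
    }
}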
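If loading Vcf.pm is not convenient, the counting step can be approximated in plain Perl. The sketch below is not Vcf.pm's event_type and will not match it in every case: classify_allele, classify_record and the labels 'snp', 'indel', 'ref', 'other' and 'mixed' are hypothetical names made up for illustration, and the rules are a simplification that looks only at the REF and ALT columns of an uncompressed VCF.

```perl
use strict;
use warnings;

# Hypothetical helper, a simplified stand-in for Vcf.pm's event_type.
# Returns 'ref', 'snp', 'indel' or 'other' for one REF/ALT pair.
sub classify_allele
{
    my ($ref,$alt) = @_;
    return 'ref'   if $alt eq '.' || uc($alt) eq uc($ref);
    # Symbolic alleles (<DEL>), breakends and '*' are not plain sequence.
    return 'other' if $ref !~ /^[ACGTN]+$/i || $alt !~ /^[ACGTN]+$/i;
    return 'snp'   if length($ref) == 1 && length($alt) == 1;
    return 'indel' if length($ref) != length($alt);
    return 'other';    # same length, longer than 1: MNP or complex change
}

# Classify one tab-separated data line by its REF (4th) and ALT (5th) columns.
# Records whose ALT alleles fall into different classes are labelled 'mixed'.
sub classify_record
{
    my ($line) = @_;
    chomp($line);
    my @cols = split(/\t/,$line);
    my ($ref,$alts) = @cols[3,4];
    my %seen;
    for my $alt (split(/,/,$alts))
    {
        $seen{ classify_allele($ref,$alt) } = 1;
    }
    my @types = keys %seen;
    return @types == 1 ? $types[0] : 'mixed';
}

# Count the records of each class in the file given on the command line.
sub main
{
    my ($file) = @_;
    return 1 unless defined $file;    # nothing to do without an input file
    open(my $fh,'<',$file) or die "Cannot open $file: $!";
    my %count;
    while (my $line = <$fh>)
    {
        next if $line =~ /^#/;        # skip header lines
        $count{ classify_record($line) }++;
    }
    close($fh);
    printf("%s\t%d\n",$_,$count{$_}) for sort keys %count;
    return 1;
}

main(@ARGV);
```

Saved under any name, say count_types.pl, it would be run as `perl count_types.pl in.vcf` and prints one tab-separated line per class. To get three separate files instead of counts, print each line to a filehandle chosen by the returned label.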
Petr
> I am a population geneticist and have few bioinformatics programming
> skills of my own.
>
> I look forward to your reply. Thank you in advance.
>