bioruan

Summarizes and plots (auto-detecting variable type) all or specified fields in a VCF file, except the CHROM, POS and ID fields, which I think are not necessary.
Extracts arbitrary fixed fields or values in sample fields to a TAB-delimited file.
Natively supports multi-sample VCF files.

Before using:
chmod +x vcf-summarize.sh

Simplest usage: vcf-summarize.sh -f filename.vcf -a #extract and summarize all fields and subfields
For large files: nohup vcf-summarize.sh -f filename.vcf -a &

Usage
-f [required] Takes 1 file. The target VCF file. Supports plain text, gz and bz files.
-a [optional] Takes 0 arguments. If specified, extracts and summarizes all variables.
-q [optional] Takes 0 arguments. If specified, skips the REF, ALT, QUAL and FILTER fields.
-c [optional] Takes 1 string. e.g. -c "chr1" limits the analysis to records whose CHROM field equals chr1.
-i [optional] Takes 1 string. e.g. -i "CHROM POS" uses CHR_POS [default] as the index of the extracted tables. You can choose any combination of "CHROM POS ID REF ALT". The idea is to generate a unique index with the smallest number of fields.
-I [optional] Takes 1 string. e.g. -I "AN DB" extracts and summarizes the AN and DB subfields in the INFO field. Overrides option -a, which analyzes all subfields in INFO.
-F [optional] Takes 1 string. e.g. -F "GT AD DP" extracts and summarizes the GT, AD and DP subfields in the sample columns. Overrides option -a, which analyzes all subfields in the sample columns.
-s [optional] Takes 0 arguments. If specified, only data extraction is done; summarization and plotting are suppressed.
-o [optional] Takes 1 string. The output directory name. Default is vcfsummarize.
-h [optional] Shows this help.
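
To make the idea behind -i and -I more concrete, here is a minimal sketch of extracting one INFO subfield into a TAB-delimited table keyed by an index. It is not the code used by vcf-summarize.sh. The function name extract_info_subfield is hypothetical, joining CHROM and POS with "_" is an assumption about the index format, and writing "1" for a present flag subfield (such as DB) and "." for a missing one is likewise an assumption.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of vcf-summarize.sh.
# Usage: extract_info_subfield KEY < file.vcf
# Prints "CHROM_POS<TAB>value" per record; "." when KEY is absent (assumed convention).
extract_info_subfield() {
    awk -F'\t' -v key="$1" '
        /^#/ { next }                      # skip meta-information and header lines
        {
            value = "."
            n = split($8, parts, ";")      # column 8 of a VCF record is INFO
            for (i = 1; i <= n; i++) {
                eq = index(parts[i], "=")
                if (eq == 0) {             # flag subfield without a value, e.g. DB
                    if (parts[i] == key) value = "1"
                } else if (substr(parts[i], 1, eq - 1) == key) {
                    value = substr(parts[i], eq + 1)
                }
            }
            print $1 "_" $2 "\t" value     # index built from CHROM and POS
        }'
}

# Tiny made-up input with two records, for illustration only
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tG\t50\tPASS\tAN=2;DB\nchr1\t200\t.\tC\tT\t60\tPASS\tAN=4\n' |
    extract_info_subfield AN
```

With the tool itself, the corresponding request would be along the lines of vcf-summarize.sh -f filename.vcf -i "CHROM POS" -I "AN DB", using only the options listed above.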

This is a personal effort without any funding support.
Report suggestions and bugs to ruansun@163.com

Note: Two small bugs were fixed on Jun 11 2014.
The first bug could cause the program to crash if names in FORMAT contain "_" (i.e. an underscore). You are OK if your FORMAT field does not have names with "_".
The second bug could cause a sample field (e.g. AD) to be filled with the value from another field (e.g. DP) if the AD field does not exist for that variant locus. This generally won't affect GT extraction.
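
The second bug is typical of reading sample values by position rather than by name. As an illustration only (this is not the script's actual fix), a sample subfield can be looked up by matching its name against the FORMAT column of the same record, so that a missing AD yields a placeholder instead of the DP value. The function name sample_subfield and the "." placeholder are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of vcf-summarize.sh.
# Usage: sample_subfield KEY FORMAT SAMPLE
# Looks up KEY by name in the record's own FORMAT string and prints the
# matching sample value, or "." if KEY is not listed for this record.
sample_subfield() {
    awk -v key="$1" -v fmt="$2" -v smp="$3" 'BEGIN {
        n = split(fmt, names, ":")         # e.g. GT:AD:DP
        split(smp, values, ":")            # e.g. 0/1:10,5:15
        result = "."
        for (i = 1; i <= n; i++)
            if (names[i] == key && values[i] != "") result = values[i]
        print result
    }'
}

sample_subfield DP "GT:AD:DP" "0/1:10,5:15"   # prints 15
sample_subfield AD "GT:DP" "0/1:15"           # prints . (AD absent, DP is not substituted)
```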

