COVA: Comparison of Variants and functional annotation for next-generation sequencing

What is COVA?

COVA is a variant annotation and comparison tool for next-generation sequencing (NGS) data. It annotates the effects of variants on genes and compares them across multiple samples, which helps pinpoint the causal variant(s) underlying a phenotype.

Features

Comparison: Compares variants across multiple samples.
Multiple species: Supports multiple genetic code (codon) tables.
Structural variants: Annotates structural variants.

Typical usage

Input: The inputs are predicted variants (SNPs, insertions, deletions and structural variants). Input files are usually generated from the results of a sequencing experiment. COVA can annotate the following variant file formats: SAMtools pileup, VCF, MAQ, BreakDancer, and GFF3-formatted gene coverage generated by coverageBed.

Output: COVA analyzes the input variants, annotates their effects on genes and compares them across multiple samples. Output files are comma-delimited (CSV), so you can analyze the results in Excel.

Getting Started

Availability and requirements

Operating system: Platform independent. Tested on Mac OS X and Red Hat Linux.
Programming language: Ruby 1.8.7 or 1.9.x
Other requirements: The RubyGems package manager and the BioRuby 1.4.x library.
License: MIT
Any restrictions to use by non-academics: None

Installation

Before using COVA, install BioRuby with the following command:

gem install bio

Download the program from the “Files” page, then unzip the ZIP file and move its contents to the directory where you want to install the program. On a Unix or Mac system, the commands are:

unzip COVA_version.zip
mv COVA_version /path/to/install

To test the installation, run the following command; it should print the list of available options:

ruby /path_to_COVA/cova.rb -h

Preparation of reference files

COVA uses annotation data in GenBank format, which can be downloaded from the NCBI website. Once you have downloaded the GenBank file(s), tell the program which reference files to use: create a tab-delimited text file named 'reflist.txt' with content like the following:

#Chr    Codon    Path
chr     11       /path_to_genbank/NC_000964.gbk
p1      11       /path_to_genbank/NC_0009xx.gbk
p2      11       /path_to_genbank/NC_0009xx.gbk

Chromosome names must match those used in the variant files.
You must specify the genetic code (codon table) number for each chromosome.
Table 11 is the code used for bacteria, archaea, prokaryotic viruses and chloroplast proteins. Details of each table are available at http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
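Because a space typed in place of a tab is an easy mistake in 'reflist.txt', it can help to check the file before running COVA. The following is a minimal sketch written for a current Ruby interpreter, not COVA's own parser: `parse_reflist` and `RefEntry` are hypothetical names, and the sketch assumes only what is described above (three tab-separated columns Chr, Codon and Path, with `#` marking the header line).

```ruby
# Hypothetical checker for a reflist.txt-style file (not part of COVA).
# Assumed layout: Chr<TAB>Codon<TAB>Path, '#' lines are headers/comments.
RefEntry = Struct.new(:chr, :codon_table, :path)

def parse_reflist(text)
  entries = []
  text.each_line.with_index(1) do |line, lineno|
    line = line.chomp
    # Skip blank lines and the '#' header/comment lines
    next if line.strip.empty? || line.start_with?('#')
    fields = line.split("\t")
    # A line separated by spaces instead of tabs ends up as a single field
    unless fields.size == 3
      raise ArgumentError, "line #{lineno}: expected 3 tab-separated fields, got #{fields.size}"
    end
    chr, codon, path = fields
    # The codon column must be an NCBI genetic code number such as 11
    unless codon =~ /\A\d+\z/
      raise ArgumentError, "line #{lineno}: codon table '#{codon}' is not an integer"
    end
    entries << RefEntry.new(chr, codon.to_i, path)
  end
  entries
end

sample = "#Chr\tCodon\tPath\nchr\t11\t/path_to_genbank/NC_000964.gbk\n"
parse_reflist(sample).each do |e|
  puts "#{e.chr}: table #{e.codon_table}, #{e.path}"
end
# prints "chr: table 11, /path_to_genbank/NC_000964.gbk"
```

A line separated by spaces rather than tabs raises an `ArgumentError` that names the offending line number.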

Preparation of variant files

COVA supports the following variant file types:

Format	File type*
VCF (Variant Call Format) genotype calls (.vcf file)	vcf
SAMtools genotype-calling pileup format (.pileup file)	pileup
MAQ genotype-calling format (cns.snp)	maq
BreakDancer structural variant format	breakd
GFF3-formatted gene coverage generated by coverageBed	gffcov

*The file type is specified in ‘varlist.txt’, described below.

Once you have prepared the variant file(s), tell the program which variant files belong to each sample. Each sample can have several variant files of different types, and any number of samples can be listed. Create a tab-delimited text file named 'varlist.txt' with content like the following:

#Name   Filetype Path
wt      pileup  /path_to_file/wt.pileup
wt      breakd  /path_to_file/wt.breakdancer
wt      gffcov  /path_to_file/wt.coverage.gff
mut1    pileup  /path_to_file/mut1.pileup
mut1    breakd  /path_to_file/mut1.breakdancer
mut1    gffcov  /path_to_file/mut1.coverage.gff
mut2    pileup  /path_to_file/mut2.pileup
mut2    breakd  /path_to_file/mut2.breakdancer
mut2    gffcov  /path_to_file/mut2.coverage.gff
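The following minimal sketch, a hypothetical helper rather than COVA's own code, reads a 'varlist.txt'-style file and groups the files by sample name. It assumes the three tab-separated columns shown above and accepts only the file type keywords from the table of supported formats. Ruby hashes keep insertion order, so the samples come back in the order they first appear in the file.

```ruby
# Hypothetical varlist.txt reader (not part of COVA).
# Assumed layout: Name<TAB>Filetype<TAB>Path, '#' lines are headers/comments.
VALID_TYPES = %w[vcf pileup maq breakd gffcov].freeze

def parse_varlist(text)
  samples = {} # sample name => [[file type, path], ...], in file order
  text.each_line.with_index(1) do |line, lineno|
    line = line.chomp
    next if line.strip.empty? || line.start_with?('#')
    name, type, path = line.split("\t")
    # Reject rows with missing columns or an unknown file type keyword
    unless path && VALID_TYPES.include?(type)
      raise ArgumentError, "line #{lineno}: expected Name<TAB>Filetype<TAB>Path with a known file type"
    end
    (samples[name] ||= []) << [type, path]
  end
  samples
end

text = "#Name\tFiletype\tPath\n" \
       "wt\tpileup\t/path_to_file/wt.pileup\n" \
       "wt\tbreakd\t/path_to_file/wt.breakdancer\n" \
       "mut1\tpileup\t/path_to_file/mut1.pileup\n"
samples = parse_varlist(text)
puts "samples: #{samples.keys.join(', ')}"  # prints "samples: wt, mut1"
puts "wt files: #{samples['wt'].size}"      # prints "wt files: 2"
```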

The first sample listed (‘wt’ in this example) is treated as the parental sample. When COVA compares variants across all samples, variants shared with the parental sample are flagged.
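The flagging idea can be illustrated with a set lookup. This is a minimal sketch, not COVA's actual algorithm: it assumes a variant is identified by chromosome, position, reference allele and alternative allele (COVA's own matching rules may differ), and `Variant`, `flag_parental` and the variants in the usage example are made up for illustration.

```ruby
require 'set'

# Hypothetical variant key: COVA's real matching criteria may differ.
# Struct provides value-based ==, eql? and hash, so it works in a Set.
Variant = Struct.new(:chr, :pos, :ref, :alt)

# calls: { sample name => [Variant, ...] }, parental sample first.
# Returns { sample => [[variant, shared_with_parent?], ...] } for the
# non-parental samples.
def flag_parental(calls)
  parent, *others = calls.keys
  parental = Set.new(calls[parent])
  others.map do |name|
    [name, calls[name].map { |v| [v, parental.include?(v)] }]
  end.to_h
end

# Made-up variants for illustration only
calls = {
  "wt"   => [Variant.new("chr", 1200, "A", "G")],
  "mut1" => [Variant.new("chr", 1200, "A", "G"), Variant.new("chr", 5400, "C", "T")]
}
flag_parental(calls)["mut1"].each do |v, shared|
  puts "#{v.chr}:#{v.pos} #{v.ref}>#{v.alt} #{shared ? 'shared with parent' : 'sample-specific'}"
end
# prints:
# chr:1200 A>G shared with parent
# chr:5400 C>T sample-specific
```

Variants flagged as shared with the parent are usually less interesting when looking for a mutation that explains a mutant phenotype, which is why the sample-specific ones stand out.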

Running COVA

Once the two tab-delimited text files, 'reflist.txt' and 'varlist.txt', are ready, you can annotate and compare the variant files:

ruby /path_to_COVA/cova.rb -o outdir -r reflist.txt -v varlist.txt

Output files are written to the ‘outdir’ directory.

Output files

Output file 1 (annotated variant files of each sample)

The first file contains annotations for all variants (SNPs/indels), such as the variant type, its probability and its effect on genes. A file of this type is generated for each sample from its vcf/pileup variant file. The file is comma-delimited (CSV), so you can open it in Excel.