File Date Author Commit
 src 2017-04-22 biolex biolex [bd3afe] version++
 README 2017-04-22 biolex biolex [bd3afe] version++

Read Me

ConsPred2 1.05

ConsPred2 combines several gene annotations into one consensus using highly customizable
rules. The annotations must be provided in the gff file format and can be complete genes or
fragments. The customizable rules cover weighting of annotation sources, declaring blacklists
and different filtering steps.
Some of the annotation sources can also be used for validation.
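
As an illustration of the input format, the following minimal sketch reads one line of a
gff file into a small record. It is not taken from the ConsPred2 source; the names Feature
and parse_gff_line are hypothetical, and only the standard tab-separated gff columns
(sequence id, source, type, start, end, score, strand, phase, attributes) are assumed.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    """Hypothetical record for one gff data line (not a ConsPred2 type)."""
    seqid: str
    source: str
    type: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str


def parse_gff_line(line: str) -> Optional[Feature]:
    """Return a Feature for a data line, or None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 tab-separated columns, got {len(cols)}")
    # Columns: seqid, source, type, start, end, score, strand, phase[, attributes]
    return Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]), cols[6])
```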

For questions and feedback please contact:
        Alexander Platzer ( alexander.platzer@univie.ac.at )


INTRODUCTION

The tool's name is short for CONSensus PREDiction of gene annotation.

ConsPred2 can combine gene annotations in various ways. The customizable rules include:
	- weighting/ranking of the different annotation sources for TIS (translation initiation
	  site) and for presence
	- use of fragmentary sources
	- filtering or checking of gene lengths and correct start/stop codons
	- different maximal overlaps among predicted genes and with a blacklist
	- use of some annotation sources for validation
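
To make the length and start/stop-codon rule concrete, here is a minimal sketch of such a
check. It is an illustration only, not the ConsPred2 implementation: the function names, the
codon sets and the default min_length are assumptions, and the stop codon is assumed to be
included in the gene coordinates.

```python
START_CODONS = {"ATG", "GTG", "TTG"}   # assumption: common bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}    # assumption: standard stop codons

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_complete_gene(genome: str, start: int, end: int, strand: str,
                     min_length: int = 90) -> bool:
    """Check length, reading frame and start/stop codon of one candidate gene.

    start and end are 1-based and inclusive, as in gff.
    """
    seq = genome[start - 1:end].upper()
    if strand == "-":
        seq = reverse_complement(seq)
    # Too short, or not a whole number of codons
    if len(seq) < min_length or len(seq) % 3 != 0:
        return False
    return seq[:3] in START_CODONS and seq[-3:] in STOP_CODONS
```

A candidate that fails this check could still be kept as a fragment, depending on the rules
chosen.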
	
The tool can be seen as a collection/sequence of set operations that make, or can make, sense
for gene annotation. Its main purpose is combining protein-coding genes, but by changing the
rules it can be used for any combination, as long as the annotation sources are present as
gff files.
The annotation sources are divided into fragments and genes, into predicted and validated,
and into blacklisted. Genes always have a valid start and stop codon, whereas fragments only
report that a protein is encoded there, not whether it is complete. The blacklist is for
regions where certainly no protein is encoded (e.g. tRNAs, rRNAs). Validated sources can be
used for annotation, where they always have higher priority than the predicted class of
annotations, and for validation.
The latter is implemented because usually the first question is whether a certain combined
annotation makes sense, given an annotation with much stronger (wet lab) support.
All annotations are regarded as incomplete; in other words, presence always takes precedence.
Therefore, the gene annotation sources are always OR-combined. The overlap filter and other
filtering steps prevent the combination from being just a concatenation of the annotation
sources.
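
The idea of an OR-combination restricted by overlap filters can be sketched as follows. This
is a strongly simplified illustration under stated assumptions, not the algorithm ConsPred2
actually uses: genes are plain (start, end) tuples, strand and TIS ranking are ignored, and
the names combine, max_gene_overlap and max_blacklist_overlap are hypothetical.

```python
def overlap(a, b):
    """Number of positions shared by two (start, end) intervals (1-based, inclusive)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def combine(sources, blacklist=(), max_gene_overlap=0, max_blacklist_overlap=0):
    """OR-combine annotation sources, highest-priority source first.

    sources is a list of gene lists ordered from highest to lowest priority
    (e.g. validated before predicted). A gene is accepted unless it overlaps
    a blacklisted region or an already accepted gene by more than allowed.
    """
    accepted = []
    for genes in sources:
        for gene in genes:
            if any(overlap(gene, b) > max_blacklist_overlap for b in blacklist):
                continue  # region where no protein is expected
            if any(overlap(gene, g) > max_gene_overlap for g in accepted):
                continue  # a higher-priority gene already covers this region
            accepted.append(gene)
    return sorted(accepted)


validated = [(100, 400)]
predicted = [(150, 450), (500, 800), (900, 1000)]
rna = [(880, 950)]
consensus = combine([validated, predicted], blacklist=rna)
# consensus == [(100, 400), (500, 800)]
```

Without the two overlap checks the result would simply be the concatenation of all sources;
with them, lower-priority genes only fill regions that are still free.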

The '2' in ConsPred2 refers to the design. ConsPred2 is used in the ConsPred pipeline, but
only part of its capabilities is used there. Standalone usage is particularly useful for
trying different parameters.

The tool is quite efficient and takes just seconds for the average bacterial input.


PREREQUISITES

None. The tool is a single executable, which is also provided precompiled.
If you are using an OS other than Linux, or the executable does not work, you need to
compile the source code. This requires a C++ compiler and the corresponding libraries or,
for convenience, a C++ environment with a GUI, e.g. Eclipse.


INSTALLATION

Copy the executable into the analysis folder. No further action is needed.
The executable can either be taken from the binaries/ folder or be built from the
src folder with
g++ * -o ConsPred2



APPLICATION

Create a new folder in which the annotation will be run and collect all source
annotation/input data there.
Prepare a config file (use the config file in the demo run as a template; its documentation
is in the comments therein).
The config file uses the ini file format.

usage: 
./ConsPred2 <config file>

e.g. at the demo-run:
./ConsPred2 config/config_NC013093.dat

The file extension does not matter; the file only has to be in the ini file format.
See https://en.wikipedia.org/wiki/INI_file
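
For readers unfamiliar with the ini format, the sketch below shows its general shape
(sections in square brackets, key = value pairs, comment lines) and how such a file can be
read with Python's standard configparser module. The section and key names are hypothetical
and chosen for illustration only; the actual ConsPred2 keys are documented in the demo-run
config file.

```python
import configparser

# Hypothetical example content; these are NOT the real ConsPred2 keys.
EXAMPLE = """
; comment lines start with ';' or '#'
[sources]
predicted = input/toolA.gff
blacklist = input/rna.gff

[filter]
min_length = 90
"""


def load_config(text: str) -> dict:
    """Parse ini-formatted text into a plain dict of sections."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = load_config(EXAMPLE)
# All values are read as strings, e.g. config["filter"]["min_length"] == "90"
```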


EXAMPLE DATA

A demo run with example data is in the archive in the Files section.



LICENSE

https://creativecommons.org/licenses/by/4.0/