ConsPred2
ConsPred2 combines several annotations (GFF files) into one, based on rules.
Brought to you by: biolex
| File | Date | Author | Commit |
|---|---|---|---|
| src | 2017-04-22 | | [bd3afe] version++ |
| README | 2017-04-22 | | [bd3afe] version++ |
ConsPred2 1.05
ConsPred2 combines several gene annotations into one consensus using highly customizable
rules. The annotations must be in the GFF file format and can be complete genes or
fragments. The customizable rules cover weighting of annotation sources, declaring blacklists,
and different filtering steps.
Some of the annotation sources can also be used for validation.
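Since every input is a GFF file, it may help to see what a single record carries. The following C++ sketch parses one GFF feature line into its nine tab-separated columns (GFF3 layout). The names `GffFeature` and `parseGffLine` are hypothetical and chosen for illustration; this is not ConsPred2's internal code.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One feature line of a GFF file, following the nine-column GFF3 layout.
// Illustrative only; not ConsPred2's internal representation.
struct GffFeature {
    std::string seqid, source, type;  // columns 1-3
    long start = 0, end = 0;          // columns 4-5, 1-based and inclusive
    char strand = '.';                // column 7: '+', '-' or '.'
    std::string attributes;           // column 9, e.g. "ID=gene1"
};

// Parses one tab-separated GFF line into `out`. Returns false for comment
// lines, lines without exactly 9 columns, or invalid coordinates.
bool parseGffLine(const std::string& line, GffFeature& out) {
    if (line.empty() || line[0] == '#') return false;
    std::vector<std::string> cols;
    std::stringstream ss(line);
    std::string col;
    while (std::getline(ss, col, '\t')) cols.push_back(col);
    if (cols.size() != 9) return false;
    try {
        out.start = std::stol(cols[3]);
        out.end = std::stol(cols[4]);
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return false;
    }
    if (out.start < 1 || out.start > out.end) return false;
    out.seqid = cols[0];
    out.source = cols[1];
    out.type = cols[2];
    out.strand = cols[6].empty() ? '.' : cols[6][0];
    out.attributes = cols[8];
    return true;
}
```

The source column (column 2) is what a rule-based combiner would typically key its per-source weights on.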
For questions and feedback please contact:
Alexander Platzer ( alexander.platzer@univie.ac.at )
INTRODUCTION
The tool's name is short for CONSensus PREDiction of gene annotation.
ConsPred2 can combine gene annotations in various ways; the customizable rules include:
- weighting/ranking of the different annotation sources for TIS (translation initiation site) and for presence
- using fragmentary sources
- filtering or checking of gene lengths and correct start/stop codons
- using different maximal overlaps between predicted genes and with a blacklist
- using some annotation sources for validation
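The gene length and start/stop codon checks in the list above can be illustrated with a short C++ sketch. It assumes the coding sequence is already extracted on the coding strand, accepts only the three most common bacterial start codons (ATG, GTG, TTG), and takes a hypothetical `minLength` parameter; ConsPred2's actual accepted codons and thresholds may differ.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Assumption of this sketch: only the three most common bacterial start codons.
bool isStartCodon(const std::string& c) {
    return c == "ATG" || c == "GTG" || c == "TTG";
}

bool isStopCodon(const std::string& c) {
    return c == "TAA" || c == "TAG" || c == "TGA";
}

// Checks a coding sequence (start codon through stop codon, coding strand):
// length is a multiple of 3 and at least minLength, it begins with a start
// codon, ends with a stop codon, and has no premature in-frame stop.
bool passesGeneChecks(std::string cds, std::size_t minLength) {
    for (char& ch : cds)
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    if (cds.size() < 6 || cds.size() < minLength || cds.size() % 3 != 0) return false;
    if (!isStartCodon(cds.substr(0, 3))) return false;
    if (!isStopCodon(cds.substr(cds.size() - 3))) return false;
    for (std::size_t i = 3; i + 3 < cds.size(); i += 3)  // all codons except the last
        if (isStopCodon(cds.substr(i, 3))) return false;  // premature stop
    return true;
}
```

A fragment, by contrast, would skip the start/stop part of this check, since it only reports that a protein is encoded in the region.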
The tool can be seen as a collection/sequence of set operations that make sense for gene
annotation. Its main purpose is combining protein-coding genes, but by changing the rules it
can be used for any combination, as long as the annotation sources are available as GFF
files.
The annotation sources are divided into fragments and genes, into predicted and validated,
and into blacklisted. Genes always have a valid start and stop codon, whereas fragments only
report that a protein is encoded there, not whether it is complete. The blacklist is for
regions where certainly no protein is encoded (e.g. tRNAs, rRNAs). Validated sources can be
used for annotation, where they always have higher priority than the predicted class of
annotations, and for validation.
The latter is implemented because the first question is usually whether a certain combined
annotation makes sense, given an annotation with much higher (wet-lab) support.
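One way to express such a validation is the fraction of validated genes that the combined annotation recovers. The C++ sketch below counts a validated gene as recovered when the consensus contains a gene with the same strand and stop-codon position. This matching criterion is an assumption of the sketch (stop positions are commonly compared because start sites are harder to predict), and `Gene` and `validatedRecall` are hypothetical names, not ConsPred2's output or code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Minimal gene record for this sketch: 1-based inclusive coordinates and strand.
struct Gene {
    long start, end;
    char strand;  // '+' or '-'
};

// Stop-codon position: 'end' on the + strand, 'start' on the - strand.
long stopPos(const Gene& g) { return g.strand == '+' ? g.end : g.start; }

// Fraction of validated genes whose (strand, stop position) also occurs in the
// consensus. Returns 0.0 when there are no validated genes.
double validatedRecall(const std::vector<Gene>& consensus,
                       const std::vector<Gene>& validated) {
    if (validated.empty()) return 0.0;
    std::set<std::pair<char, long>> stops;
    for (const Gene& g : consensus) stops.insert({g.strand, stopPos(g)});
    std::size_t hits = 0;
    for (const Gene& g : validated)
        if (stops.count({g.strand, stopPos(g)}) > 0) ++hits;
    return static_cast<double>(hits) / static_cast<double>(validated.size());
}
```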
All annotations are considered incomplete; in other words, presence always takes precedence.
Therefore, the gene annotation sources are always OR-combined. The overlap filter and the
other filtering steps prevent the combination from being just a concatenation of the
annotation sources.
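The interplay of OR-combination, source priority and the two maximal overlaps can be sketched in C++ as a greedy filter. Every gene from every source is a candidate; candidates are visited in priority order and kept unless they overlap an already kept gene, or a blacklisted region, by more than the respective limit. The names `Region`, `Gene`, `combine`, `maxGeneOverlap` and `maxBlacklistOverlap` are hypothetical, strand is ignored for brevity, and this greedy scheme is an illustration, not necessarily ConsPred2's exact algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Region {
    long start, end;  // 1-based, inclusive
};

struct Gene {
    Region r;
    int priority;  // higher = more trusted source (e.g. validated > predicted)
};

// Number of shared base pairs of two regions (0 if disjoint).
long overlapLen(const Region& a, const Region& b) {
    long lo = std::max(a.start, b.start), hi = std::min(a.end, b.end);
    return hi >= lo ? hi - lo + 1 : 0;
}

// OR-combines all candidates, then filters: higher-priority genes are kept
// first, so a lower-ranked duplicate (e.g. same gene, other TIS) is rejected
// by the gene overlap limit.
std::vector<Gene> combine(std::vector<Gene> candidates,
                          const std::vector<Region>& blacklist,
                          long maxGeneOverlap, long maxBlacklistOverlap) {
    std::stable_sort(candidates.begin(), candidates.end(),
                     [](const Gene& a, const Gene& b) { return a.priority > b.priority; });
    std::vector<Gene> kept;
    for (const Gene& c : candidates) {
        bool ok = std::none_of(blacklist.begin(), blacklist.end(), [&](const Region& b) {
            return overlapLen(c.r, b) > maxBlacklistOverlap;
        });
        ok = ok && std::none_of(kept.begin(), kept.end(), [&](const Gene& k) {
            return overlapLen(c.r, k.r) > maxGeneOverlap;
        });
        if (ok) kept.push_back(c);
    }
    return kept;
}
```

Using a small gene overlap limit still allows the short overlaps that are common between neighbouring bacterial genes, while a blacklist limit of 0 removes any candidate touching, say, an rRNA region.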
The '2' in ConsPred2 refers to the design. It is used in the ConsPred pipeline, but only part
of its capabilities are used there. Standalone usage is very useful when trying different
parameters.
The tool is quite efficient and takes just seconds for the input of an average bacterial genome.
PREREQUISITES
None; the tool is a single executable, which is also provided precompiled.
If you are using an OS other than Linux, or the executable does not work, you need to
compile the source code. This requires a C++ compiler and the corresponding libraries or,
for convenience, a C++ environment with a GUI, e.g. Eclipse.
INSTALLATION
Copy the executable into the analysis folder. No further action is needed.
The executable can either be taken from the binaries/ folder or be built inside the src
folder with
g++ * -o ConsPred2
APPLICATION
Create a new folder for the annotation and collect all source annotations/input data
there.
Prepare a config file (use the config file of the demo run as a template; the documentation
is in the comments therein).
The config file uses the INI file format.
usage:
./ConsPred2 <config file>
e.g. for the demo run:
./ConsPred2 config/config_NC013093.dat
The file extension does not matter; the file only has to be in the INI file format.
See https://en.wikipedia.org/wiki/INI_file
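For readers unfamiliar with the INI format, the following C++ sketch reads `[section]` headers and `key=value` lines into a map keyed as `section.key`, skipping comment lines that start with `;` or `#`. It follows the generic INI conventions described at the link above; `readIni` is a hypothetical helper, not ConsPred2's own config reader, and the actual section and key names are those documented in the demo-run config file.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Removes leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Reads INI text into a map of "section.key" -> value. Keys before the first
// section header are stored without a prefix.
std::map<std::string, std::string> readIni(std::istream& in) {
    std::map<std::string, std::string> values;
    std::string line, section;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == ';' || line[0] == '#') continue;  // blank or comment
        if (line.front() == '[' && line.back() == ']') {
            section = trim(line.substr(1, line.size() - 2));
            continue;
        }
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;  // not a key=value line
        std::string key = trim(line.substr(0, eq));
        values[section.empty() ? key : section + "." + key] = trim(line.substr(eq + 1));
    }
    return values;
}
```

Because the reader only looks at the content, a file named `config_NC013093.dat` is handled exactly like one ending in `.ini`.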
EXAMPLE DATA
A demo run with example data is available as an archive in the Files section.
LICENSE
https://creativecommons.org/licenses/by/4.0/