This software creates a relational database in PostgreSQL that hosts complete bacterial genomes. Besides the database, it includes supporting tools, such as a parser that converts EMBL or GBK files to the CpDB relational schema. Once a genome is in CpDB, you can extract any number of reports from it using SQL. This software is part of Anderson Santos's Ph.D. in Bioinformatics, developed under the Corynebacterium pseudotuberculosis (Cp) pangenome project. The Cp pangenome project delivered fifteen bacterial strains to the scientific community, deposited in the GenBank database between 2009 and 2012. The thesis was written in Brazilian Portuguese. However, an English book chapter explaining the software is available at this address. CpDB is the backbone of the Pannotator software. Both programs are still alive and kicking.
svn checkout svn://svn.code.sf.net/p/cpdbtutorial/code/ cpdbtutorial-code
Note: The downloadable zip is a copy of the result of the checkout above.
For Ubuntu 10 or later:
1. Install flex, bison, and the build tools, then build:
sudo apt install flex
sudo apt install bison
sudo apt install build-essential
./make
Enjoy it.
BUG 1) Large EMBL/GenBank text qualifiers (more than 255 characters) are not
supported. They will almost certainly cause a stack overflow. Workaround: remove
from the target EMBL/GenBank file any text qualifiers whose text is longer than
255 characters.
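
The workaround for BUG 1 can be automated. The following is only a minimal sketch in Python and is not part of the CpDB distribution. It assumes that qualifiers start at column 22 (21 spaces in GenBank files, "FT" plus 19 spaces in EMBL files), that text qualifiers are enclosed in double quotes, and that the 255-character limit applies to the joined text of a qualifier including its continuation lines. The function names strip_long_qualifiers and clean_file are hypothetical.

```python
import re

# Assumption: a qualifier starts at column 22, i.e. after 21 spaces
# (GenBank) or after "FT" plus 19 spaces (EMBL).
QUALIFIER = re.compile(r'^(?:FT {19}| {21})/\w+=?')


def strip_long_qualifiers(lines, limit=255):
    """Return the lines without qualifiers whose text exceeds `limit`."""
    kept = []
    i = 0
    while i < len(lines):
        match = QUALIFIER.match(lines[i])
        if not match:
            # Not the start of a qualifier: copy the line unchanged.
            kept.append(lines[i])
            i += 1
            continue
        block = [lines[i]]
        value = lines[i][match.end():].rstrip("\r\n")
        if value.startswith('"'):
            # A quoted value is still open while the number of quote
            # characters is odd (embedded quotes are written doubled).
            # Limitation: a doubled quote split across two lines is
            # not handled by this sketch.
            while value.count('"') % 2 == 1 and i + 1 < len(lines):
                i += 1
                block.append(lines[i])
                # Drop the 21-character prefix of the continuation line.
                value += lines[i][21:].rstrip("\r\n")
        i += 1
        if len(value.strip('"')) <= limit:
            kept.extend(block)
    return kept


def clean_file(source_path, target_path, limit=255):
    """Write a copy of source_path without the long text qualifiers."""
    with open(source_path) as source:
        lines = source.readlines()
    with open(target_path, "w") as target:
        target.writelines(strip_long_qualifiers(lines, limit))
```

For example, clean_file("genome.gbk", "genome.clean.gbk") would write a cleaned copy and leave the original file untouched (both file names are placeholders). Keep in mind that protein /translation values are often longer than 255 characters, so under the assumptions above they would be removed as well.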