Genome Database - A tool to create a local database of reference genome sequences

Usage: java -jar path/to/GenomeDatabase.jar [options]

By Marc Strous, 2016

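This tool downloads fasta files of the protein and RNA sequences encoded in reference
genomes at NCBI. You select the relevant genomes with a set of queries. Each query has
four comma-separated fields: a taxonomic rank, a taxon name at that rank, the rank at
which a single representative genome is kept, and the download method (ftp or elink).
Examples of valid queries are: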

superkingdom,Bacteria,genus,ftp
superkingdom,Archaea,genus,ftp
superkingdom,Eukaryota,phylum,ftp
superkingdom,Viruses,family,elink

The first query would download (with ftp) all available reference genomes of the
superkingdom Bacteria, limited to one genome per genus. The second query would do
the same for the superkingdom Archaea. The third would download eukaryotic reference
genomes, a single representative for each phylum. The fourth would download the
available viral genomes, one representative per family, via the NCBI elink tool.

Multiple queries can be concatenated, separated by "~".
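
As a sketch of how such a query string could be assembled in a shell script, the
snippet below joins individual queries with "~". The helper name join_queries is
hypothetical and not part of GenomeDatabase. Keeping the result in double quotes
ensures the string reaches the program unchanged when it is passed on later.

```shell
# join_queries is a hypothetical helper, not part of GenomeDatabase.
# It joins its arguments with "~". The function body is a subshell,
# so the IFS change does not leak into the calling script.
join_queries() (
    IFS='~'
    printf '%s\n' "$*"
)

queries=$(join_queries \
    "superkingdom,Bacteria,genus,ftp" \
    "superkingdom,Viruses,family,elink")
echo "$queries"
```

This prints superkingdom,Bacteria,genus,ftp~superkingdom,Viruses,family,elink,
which has the same form as the default value of the -update option.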

The program creates three files: a protein fasta file of all protein-coding genes
of all genomes ("genome-database.faa"), a nucleotide fasta file of all RNA genes
(rRNA, tRNA, etc.) of all genomes ("genome-database.fna") and a taxonomy file
("genome-taxonomy.txt") that lists all downloaded taxa.

If you run the tool in the same folder multiple times, the changes will be
incremental, i.e. information already downloaded will not be downloaded again.
This way, you can easily keep your database up to date.
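
One way to check the result of a run, or to compare the database before and after
an incremental update, is to count the fasta records in the two sequence files. The
sketch below assumes standard fasta formatting (one header line starting with ">"
per sequence) and that it is run in the database folder. The helper name
count_fasta_records is hypothetical.

```shell
# count_fasta_records is a hypothetical helper, not part of GenomeDatabase.
# It counts fasta header lines (lines starting with ">") in the given file.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Report the number of sequences in each output file that is present.
for f in genome-database.faa genome-database.fna; do
    if [ -f "$f" ]; then
        echo "$f: $(count_fasta_records "$f") sequences"
    fi
done
```

Files that do not exist yet are skipped silently, so the loop can also be run
before the first download.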

Optionally, you can use this tool to format your database for diamond searches and
to extract specific genes using an HMM profile database, with hmmsearch. These
options require that the "hmmer" programs and "diamond" are on your PATH.

==========================================
Depends:

wget
hmmer, version 3.1b (optional)
diamond (optional)
an internet connection

==========================================
Options:

-update [queries]     Updates the database by downloading newly available information
                      from the NCBI with the queries provided. Default:
                      superkingdom,Bacteria,genus,ftp~superkingdom,Archaea,genus,ftp

-dir [/path/to/dir]   Builds the database in the specified folder (if omitted, the
                      database is built in the current directory).

-hmm [/path/to/file]  Extracts the genes in the local database that hit an HMM profile.

-e [evalue]           E-value cutoff for hmm searches (default 1e-25).

-diamond [block-size] Creates a diamond database with the specified block size. See
                      the diamond manual for the default value and for choosing a
                      suitable block size.

-processors [x]       Uses x processors to create the diamond database.
                      Default: 4.

-help                 Prints this text.
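
The options above could be used in invocations like the following sketch. The
wrapper run_genome_database, the jar location, the folder name refdb and the
profile file name are placeholders chosen for illustration; only the option names
come from the list above. The sketch also assumes that the jar is started with
"java -jar".

```shell
# run_genome_database is a hypothetical wrapper, not part of GenomeDatabase.
# It checks that the jar exists before starting it (assuming "java -jar" is
# the right way to launch this jar) and forwards all other arguments unchanged.
run_genome_database() {
    jar=$1
    shift
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    java -jar "$jar" "$@"
}

# Example calls (commented out; adjust the placeholder paths first).
# Whether several options can be combined in one call is an assumption.
# run_genome_database path/to/GenomeDatabase.jar -dir refdb \
#     -update "superkingdom,Bacteria,genus,ftp~superkingdom,Archaea,genus,ftp"
# run_genome_database path/to/GenomeDatabase.jar -dir refdb \
#     -hmm path/to/profiles.hmm -e 1e-30
```

The query string is quoted so that the shell passes it to the program as a single
argument.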

Copyright Marc Strous, 2016
Source: README.txt, updated 2016-09-14