GenomeDatabase - Browse Files at SourceForge.net

Name	Modified	Size
GenomeDatabase-0.1.1.zip	2016-11-21	191.9 kB
README.txt	2016-09-14	2.9 kB
GenomeDatabase-0.1.zip	2016-09-14	192.3 kB
Totals: 3 Items		387.1 kB

Genome Database - A tool to create a local database of reference genome sequences

Usage: java -jar path/to/GenomeDatabase.jar [options]

By Marc Strous, 2016

This tool enables you to download fasta files of protein and RNA sequences encoded
in reference genomes at NCBI. You can select relevant genomes with a set of queries.
Each query has four fields, separated by commas. Examples of valid queries are:

superkingdom,Bacteria,genus,ftp
superkingdom,Archaea,genus,ftp
superkingdom,Eukaryota,phylum,ftp
superkingdom,Viruses,family,elink

The first query would download (with ftp) all available reference genomes of the
superkingdom Bacteria, limited to one genome per genus. The second query would do
the same for superkingdom Archaea. The third would download eukaryotic genomes,
limited to a single representative for each phylum. The fourth would download viral
genomes, one representative per family, via the NCBI elink tool.

Multiple queries can be concatenated, separated by "~".
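
The query format described above can be sketched in Python as follows. This is
only an illustration of the text format, not the tool's own parser. The field
names (group_rank, group_name, pick_rank, method) are hypothetical labels
inferred from the examples; the README does not name the four fields.

```python
from typing import List, NamedTuple


class Query(NamedTuple):
    # Hypothetical field names: the README only states that a query has four fields.
    group_rank: str   # e.g. "superkingdom"
    group_name: str   # e.g. "Bacteria"
    pick_rank: str    # one genome is kept per taxon of this rank, e.g. "genus"
    method: str       # download method, e.g. "ftp" or "elink"


def parse_queries(text: str) -> List[Query]:
    """Split a "~"-separated query string into four-field queries."""
    queries = []
    for part in text.split("~"):
        fields = [field.strip() for field in part.split(",")]
        if len(fields) != 4:
            raise ValueError(
                f"expected 4 comma-separated fields, got {len(fields)}: {part!r}"
            )
        queries.append(Query(*fields))
    return queries


if __name__ == "__main__":
    text = "superkingdom,Bacteria,genus,ftp~superkingdom,Viruses,family,elink"
    for query in parse_queries(text):
        print(query)
```

Checking a query string this way before passing it to -update may catch a
missing field or a stray separator early.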

The program creates three files: a protein fasta file of all protein coding genes
of all genomes ("genome-database.faa"), a nucleotide fasta file of all RNA genes 
(rRNA, tRNA, etc) of all genomes ("genome-database.fna") and a taxonomy file 
("genome-taxonomy.txt") that lists all downloaded taxa.
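
As a quick sanity check after a run, the number of sequences in the two fasta
files can be counted. The sketch below assumes only the standard fasta
convention that every record starts with a ">" header line; the helper name
count_fasta_records is hypothetical and not part of the tool.

```python
from pathlib import Path


def count_fasta_records(path) -> int:
    """Count sequences in a fasta file by counting header lines (starting with '>')."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # File names as given in the README; run this in the database folder.
    for name in ("genome-database.faa", "genome-database.fna"):
        if Path(name).exists():
            print(name, count_fasta_records(name))
        else:
            print(name, "not found")
```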

If you run the tool in the same folder multiple times, the changes will be
incremental, i.e. the information already downloaded will not be downloaded again.
This way, you can easily keep your database up to date.

Optionally, you can use this tool to format your database for diamond searches and
you can extract specific genes using an HMM profile database, with hmmsearch. These
options require that the "hmmer" programs and "diamond" are in your path.

==========================================
Depends:

wget
hmmer, version 3.1b (optional)
diamond (optional)
an internet connection
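
Whether the listed programs are reachable on the path can be checked before
running the tool. The following is a minimal sketch using Python's
shutil.which. The executable names "hmmsearch" and "diamond" follow the
programs mentioned in this README, "java" is included because the usage line
invokes it, and the helper name find_tools is hypothetical.

```python
import shutil

# "java" and "wget" are needed for any run; the others only for -hmm and -diamond.
REQUIRED = ["java", "wget"]
OPTIONAL = ["hmmsearch", "diamond"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on the path."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, location in find_tools(names).items():
            print(f"{label:8s} {name:10s} {location or 'NOT FOUND'}")
```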

==========================================
Options:

-update [queries]     Updates the database by downloading newly available information
                      from the NCBI with the queries provided. Default:
                      superkingdom,Bacteria,genus,ftp~superkingdom,Archaea,genus,ftp

-dir [/path/to/dir]   Builds the database in the specified folder (if omitted, will
                      build the database in the present dir).

-hmm [/path/to/file]  Extracts the genes in the local database that hit an HMM profile.

-e [evalue]           Evalue cutoff for hmm searches (default 1e-25).

-diamond [block-size] Creates a diamond database with the specified block size. See
                      the diamond manual for the default value and for choosing the
                      correct block size.

-processors [x]       Uses x processors to create the diamond database.
                      Default 4.

-help                 Print this text
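
The options above can be combined into one command line. The sketch below only
assembles and prints the argument list. It assumes the jar is launched with
"java -jar", and the helper name build_command and its parameter names are
hypothetical; only the option flags themselves come from this README.

```python
import shlex


def build_command(jar_path, queries=None, directory=None, hmm=None,
                  evalue=None, diamond_block_size=None, processors=None):
    """Assemble a GenomeDatabase command line from the documented options."""
    cmd = ["java", "-jar", jar_path]  # assumption: the jar is run via "java -jar"
    if queries is not None:
        cmd += ["-update", queries]
    if directory is not None:
        cmd += ["-dir", directory]
    if hmm is not None:
        cmd += ["-hmm", hmm]
    if evalue is not None:
        cmd += ["-e", str(evalue)]
    if diamond_block_size is not None:
        cmd += ["-diamond", str(diamond_block_size)]
    if processors is not None:
        cmd += ["-processors", str(processors)]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "path/to/GenomeDatabase.jar",
        queries="superkingdom,Bacteria,genus,ftp~superkingdom,Archaea,genus,ftp",
        directory="/path/to/dir",
    )
    # Printed rather than executed; pass cmd to subprocess.run to actually start it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids having the shell interpret the "~"
separator in the query string.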

Copyright Marc Strous, 2016

Source: README.txt, updated 2016-09-14

GenomeDatabase Files

For creating a local database of reference genomes
