GenomeDownloader is a command-line Perl program that downloads genomic data from NCBI using wget. It was completely rewritten in October 2017 to work with the "new" data organization structure at NCBI. Downloads can be restricted by assembly completion level (Contig, Scaffold, Chromosome or Complete Genome).
Genomic data can be downloaded for all organisms belonging to a given taxon, specified by name or by taxon identifier (e.g., Mammalia or 40674), and downloads can be limited to certain kinds of files (e.g., faa, or faa,gbff). Search terms can also be used to further limit results.
This program runs on Linux. It could probably be made to run on Mac OS with a few modifications, provided that its dependencies are met.
Features
- Automatically searches for and retrieves genomic data from NCBI
- Searches using NCBI's taxonomic information, either as a name or as a taxon identifier number
- Retrieves either all files for each genome, or only files ending in user-defined extensions (e.g., use fna,gbff to download the genome FASTA and GenBank files)
- A user-provided list of search terms (e.g., Strep) further limits which genomes are retrieved
- Multiple search terms can be provided, one per line in a file, or a single term can be given on the command line
- Can also limit downloads to certain assembly completion levels (Contig, Scaffold, Chromosome or Complete Genome)
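
The way the three filters above combine can be illustrated with a small Python sketch. This is not GenomeDownloader's own code (the program is written in Perl); the `Assembly` record, the `select_files` function, the sample file names, the case-insensitive substring matching of search terms, and the handling of a trailing `.gz` are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Assembly:
    """Hypothetical record for one genome assembly (not the tool's own data model)."""
    organism: str
    level: str          # e.g. "Contig", "Scaffold", "Chromosome", "Complete Genome"
    files: List[str]    # names of the files available for this assembly


def select_files(
    assemblies: Iterable[Assembly],
    levels: Optional[set] = None,
    terms: Optional[List[str]] = None,
    extensions: Optional[List[str]] = None,
) -> List[Tuple[str, str]]:
    """Return (organism, file name) pairs that pass every filter that was given.

    A filter left as None is not applied.
    """
    selected = []
    for asm in assemblies:
        # Filter 1: assembly completion level.
        if levels and asm.level not in levels:
            continue
        # Filter 2: search terms (assumed: case-insensitive substring match).
        if terms and not any(t.lower() in asm.organism.lower() for t in terms):
            continue
        # Filter 3: file extensions (assumed: an optional ".gz" suffix is ignored).
        for name in asm.files:
            base = name[:-3] if name.endswith(".gz") else name
            if extensions is None or any(base.endswith("." + e) for e in extensions):
                selected.append((asm.organism, name))
    return selected


# Made-up example data; the file names are placeholders.
assemblies = [
    Assembly("Streptococcus pneumoniae", "Complete Genome",
             ["a_genomic.fna.gz", "a_protein.faa.gz", "a_genomic.gbff.gz"]),
    Assembly("Streptomyces coelicolor", "Contig", ["b_genomic.fna.gz"]),
    Assembly("Escherichia coli", "Complete Genome", ["c_genomic.fna.gz"]),
]

for organism, name in select_files(
    assemblies,
    levels={"Complete Genome"},
    terms=["Strep"],
    extensions=["fna", "gbff"],
):
    print(organism, name)
```

In this sketch the filters are conjunctive: an assembly must match the requested completion level and at least one search term before any of its files are considered, and only then is the extension list applied.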