CDSbank
multi-sequence extraction, filtering & formatting
CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and the amino acid sequence for each protein annotated in GenBank. CDSbank also stores GenBank feature annotation, a flag indicating incomplete 5’ and 3’ ends, full taxonomic data, and a heuristic that ranks species by scientific interest. This rich information allows fully automated data set preparation with a level of sophistication that meets or exceeds manual processing. Defaults ensure ease of use for typical...