Evo 2 is a DNA language model for long-context genome modeling and biological sequence design across all domains of life. It models DNA at single-nucleotide resolution and supports context windows of up to one million base pairs, which suits it to tasks that span very large genomic regions. According to the repository, it uses the StripedHyena 2 architecture, was pretrained with Savanna, and was trained autoregressively on the OpenGenome2 dataset, which contains 8.8 trillion tokens. The codebase focuses on local inference and generation through the Vortex inference stack and is not itself a full training framework, although it points users to training and fine-tuning resources. It supports several ways of working with the model: forward passes, embeddings, generation workflows, notebooks, hosted APIs, and self-hosted deployment through NVIDIA NIM.
Features
- Single-nucleotide DNA modeling
- Context length up to 1 million base pairs
- Local inference and generation through Vortex
- Support for forward passes, embeddings, and generation
- Hosted API and NVIDIA NIM deployment options
- Published checkpoints for multiple model variants
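
To make "single-nucleotide modeling" and autoregressive scoring concrete, here is a minimal, self-contained sketch in plain Python. It does not use the Evo 2 package or the Vortex stack. The vocabulary, the function names (`tokenize`, `fit_bigram`, `sequence_log_likelihood`), and the bigram model are all hypothetical stand-ins chosen for illustration; the real tokenizer and model are far larger and may encode bases differently. The sketch shows only the general idea: one token per base, and a sequence score built from next-token probabilities.

```python
import math
from typing import List

# Hypothetical 4-letter vocabulary, for illustration only.
# The actual Evo 2 tokenizer may use a different encoding.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}


def tokenize(sequence: str) -> List[int]:
    """Map each nucleotide to exactly one token ID (single-nucleotide resolution)."""
    return [VOCAB[base] for base in sequence.upper()]


def fit_bigram(tokens: List[int], pseudocount: float = 1.0) -> List[List[float]]:
    """Toy stand-in for a trained model: estimate P(next | previous) from bigram counts."""
    size = len(VOCAB)
    counts = [[pseudocount] * size for _ in range(size)]
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1.0
    # Normalize each row so it is a probability distribution over the next base.
    return [[c / sum(row) for c in row] for row in counts]


def sequence_log_likelihood(tokens: List[int], probs: List[List[float]]) -> float:
    """Autoregressive score: sum of log P(token_t | token_{t-1}) for t >= 1."""
    return sum(math.log(probs[prev][nxt]) for prev, nxt in zip(tokens, tokens[1:]))


if __name__ == "__main__":
    probs = fit_bigram(tokenize("ACGTACGTACGTACGT"))
    print(len(tokenize("ACGTAC")))  # one token per base
    print(sequence_log_likelihood(tokenize("ACGTACGT"), probs))
    print(sequence_log_likelihood(tokenize("AATTCCGG"), probs))
```

Under this one-token-per-base assumption, a context window of one million base pairs corresponds to roughly one million tokens. The toy model assigns a higher log-likelihood to a sequence that resembles its training data than to a rearranged one, which is the same kind of comparison a forward pass through a real DNA language model enables at much larger scale.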