fairseq2
FAIR Sequence Modeling Toolkit 2
...Built from the ground up for scalability, composability, and research flexibility, fairseq2 supports a broad range of language, speech, and multimodal content generation tasks, including instruction fine-tuning, reinforcement learning from human feedback (RLHF), and large-scale multilingual modeling. Unlike the original fairseq—which evolved into a large, monolithic codebase—fairseq2 introduces a clean, plugin-oriented architecture designed for long-term maintainability and rapid experimentation. It supports multi-GPU and multi-node distributed training using DDP, FSDP, and tensor parallelism, capable of scaling up to 70B+ parameter models. The framework integrates seamlessly with PyTorch 2.x features such as torch.compile, Fully Sharded Data Parallel (FSDP), and modern configuration management.