DeepSWE-Preview is a 32.8B-parameter open-source coding agent trained solely with reinforcement learning (RL) to perform complex software engineering (SWE) tasks. Built on top of Qwen3-32B, it achieves 59% accuracy on the SWE-Bench-Verified benchmark—currently the highest reported among open-weight models. The agent navigates and edits large codebases within the R2E-Gym environment, using tools such as a file editor, bash execution, and search. Training relies on sparse reward signals and policy gradient strategies adapted from GRPO, DAPO, Dr.GRPO, and RLOO; hybrid test-time scaling is applied at evaluation time to push accuracy further.

DeepSWE-Preview demonstrates strong reasoning, file navigation, and patch-submission skills, making it well suited to agent-based code repair, debugging, and PR generation across real-world repositories. The model can be served with platforms like vLLM and Hugging Face TGI, with support for 64k context length and OpenAI-compatible APIs.
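Because the model is served behind OpenAI-compatible APIs, a client only needs to construct a standard chat-completions request. The sketch below builds such a request payload; the model identifier, sampling parameters, and endpoint conventions are assumptions for illustration, not values confirmed by this card.

```python
import json

# Hedged sketch: build a chat-completions request body for an
# OpenAI-compatible server (e.g. vLLM's `/v1/chat/completions`).
# The model id below is an assumption, not taken from this card.
def build_chat_request(prompt: str, max_tokens: int = 4096) -> str:
    payload = {
        "model": "agentica-org/DeepSWE-Preview",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.6,  # illustrative sampling choice
    }
    return json.dumps(payload)

request_body = build_chat_request("Fix the failing test in utils.py")
```

The resulting JSON string can be POSTed to the server's chat-completions route with any HTTP client; the 64k context window leaves ample room for long repository context in the prompt.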
Features
- Trained entirely via reinforcement learning with no supervised fine-tuning
- #1 performance on SWE-Bench-Verified (59%, with hybrid test-time scaling) among open-weight agents
- Built on Qwen3-32B with thinking mode and 64k context support
- Uses R2E-Gym tools like file editor, bash, and search for task completion
- Employs enhanced GRPO-based RL algorithm for stable and efficient training
- Includes hybrid test-time scaling that lifts accuracy well beyond single-rollout pass@1
- Sparse, outcome-based rewards (granted only when the full test suite passes) simulate realistic software engineering feedback
- Openly licensed (MIT) for accessible and extensible AI development
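The sparse reward signal mentioned above can be sketched as a simple outcome check: the agent earns reward only when its patch makes every test pass, with no partial credit. The function name and shape here are illustrative assumptions, not the actual training code.

```python
# Hedged sketch of a sparse, outcome-based reward: reward is granted
# only when the patched repository passes all tests; partial progress
# earns nothing. Names are illustrative, not from the training code.
def sparse_reward(test_results: list[bool]) -> float:
    """Return 1.0 only if there are tests and all of them pass."""
    return 1.0 if test_results and all(test_results) else 0.0
```

All-or-nothing rewards like this are harder to learn from than dense shaped signals, but they are faithful to how real SWE work is judged and leave less room for reward hacking.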
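The GRPO-based training bullet can be illustrated by the group-normalized advantage at the core of GRPO-style methods: several rollouts are sampled per task, and each rollout is scored by how its reward compares to the group. This is a deliberate simplification; DeepSWE's actual recipe layers further modifications (drawn from DAPO, Dr.GRPO, and RLOO) on top.

```python
import statistics

# Simplified sketch of GRPO-style group-normalized advantages:
# each rollout's reward is centered and scaled by the group's
# statistics, so rollouts are scored relative to their peers.
def group_advantages(rewards: list[float]) -> list[float]:
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # all rollouts scored the same: no learning signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]
```

With sparse binary rewards, a group where some rollouts pass and others fail yields positive advantages for the passing trajectories and negative ones for the rest; a group that uniformly fails (or uniformly succeeds) contributes no gradient signal.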