VibeThinker
Diversity-driven optimization and large-model reasoning ability
...It contains about 1.5 billion parameters, far fewer than many “frontier” models, yet it is explicitly optimized for reasoning, mathematics, and code generation rather than general open-domain chat. Its innovation lies in the training methodology, which the team calls the Spectrum-to-Signal Principle (SSP): a first stage (the “spectrum” phase) emphasizes diversity of reasoning paths, and a second stage (the “signal” phase) uses reinforcement techniques to refine the model toward correct answers. The result is a model that outperforms many much larger models on domain-specific benchmarks, which suggests that small models, trained carefully and with the right objectives, can achieve high performance on reasoning-centric tasks.
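To make the two-stage idea concrete, here is a minimal sketch of the kind of quantity each phase might care about. It is an illustration under stated assumptions, not the team's actual training code: the text above does not say how diversity is measured or which reinforcement method is used. The sketch assumes that the "spectrum" phase can be summarized with the standard pass@k estimate (the chance that at least one of k sampled solutions is correct), and that the "signal" phase can be approximated by mean-centered rewards within a group of samples. The function names `pass_at_k` and `group_advantages` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled solutions, c of which are correct.

    Assumption: used here as a stand-in for the "spectrum" objective,
    since it rewards a model whose samples cover at least one correct path.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def group_advantages(rewards: list[float]) -> list[float]:
    """Center each reward on the group mean.

    Assumption: a simplified stand-in for the "signal" phase, where
    correct samples get a positive weight and incorrect ones a negative one.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


if __name__ == "__main__":
    # Toy data: 8 sampled solutions to one problem, 2 of them correct.
    correct_flags = [1, 0, 0, 1, 0, 0, 0, 0]
    n, c = len(correct_flags), sum(correct_flags)

    # Spectrum view: how likely is a correct path among k samples?
    for k in (1, 4, 8):
        print(f"pass@{k} = {pass_at_k(n, c, k):.3f}")

    # Signal view: which samples would be reinforced?
    print(group_advantages([float(f) for f in correct_flags]))
```

In this toy setting a model can have low single-sample accuracy while still covering a correct path among several samples; the second step then shifts weight toward the samples that were actually correct.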