Kimi K2 Thinking
Kimi K2 Thinking is an open-source reasoning model from Moonshot AI, designed for long-horizon, multi-step workflows in which the model interleaves chain-of-thought reasoning with tool calls across hundreds of sequential steps. It uses a mixture-of-experts architecture with about 1 trillion total parameters, of which only about 32 billion are activated per forward pass. This keeps inference efficient while preserving a very large overall capacity. It supports a context window of up to 256,000 tokens, so it can handle very long inputs and reasoning chains without losing coherence. Native INT4 quantization reduces inference latency and memory use with no reported loss in performance. Kimi K2 Thinking is built explicitly for agentic workflows: it can call external tools autonomously, manage sequential logic steps (up to 200 to 300 tool calls in a single chain), and keep its reasoning consistent throughout.
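To make the "1 trillion total, 32 billion active" and INT4 figures concrete, here is a back-of-the-envelope calculation using only the numbers quoted above. It assumes every weight is stored at the same bit width, which is a simplification: real checkpoints typically keep some tensors at higher precision and add quantization scales, so actual file sizes differ.

```python
# Rough sizing for a sparse MoE model from its headline parameter counts.
# Assumption: all weights share one bit width; quantization scales,
# higher-precision tensors, and activations/KV cache are ignored.

def active_fraction(total_params: float, active_params: float) -> float:
    """Share of parameters that participate in a single forward pass."""
    return active_params / total_params


def weight_bytes(num_params: float, bits_per_param: int) -> float:
    """Raw storage for the weights alone."""
    return num_params * bits_per_param / 8


TOTAL = 1e12   # ~1 trillion total parameters
ACTIVE = 32e9  # ~32 billion activated per pass

print(f"active share: {active_fraction(TOTAL, ACTIVE):.1%}")     # 3.2%
print(f"BF16 weights: {weight_bytes(TOTAL, 16) / 1e9:,.0f} GB")  # 2,000 GB
print(f"INT4 weights: {weight_bytes(TOTAL, 4) / 1e9:,.0f} GB")   # 500 GB
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts raw weight storage by a factor of four, while each token only touches about 3% of the parameters.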
GLM-5.1
GLM-5.1 is the latest iteration of Z.ai’s GLM series, a frontier-level, agent-oriented model optimized for coding, reasoning, and long-horizon workflows. It builds on the GLM-5 architecture, whose Mixture-of-Experts (MoE) design delivers high performance while keeping inference costs down, and it is part of a broader push toward open-weight, developer-accessible models. A core focus of GLM-5.1 is agentic behavior: it can plan, execute, and iterate across multi-step tasks rather than simply responding to single prompts. It is designed to handle complex workflows such as debugging code, navigating repositories, and executing chained operations while sustaining context. Compared with earlier GLM models, GLM-5.1 is more reliable in long interactions, maintaining coherence across extended sessions and reducing breakdowns in multi-step reasoning.
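The plan/execute/iterate pattern described above is usually driven by a loop outside the model: ask the model for its next step, run any tool it requests, feed the result back, and stop on a final answer or when a step budget runs out. The sketch below is provider-agnostic and hypothetical: `Step`, `model_step`, the transcript format, and the tool registry are illustrative names, not part of any Z.ai API, and the default budget is arbitrary.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """One model turn: either a tool request or a final answer (hypothetical schema)."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None


def run_agent(model_step: Callable[[list], Step],
              tools: dict[str, Callable[..., str]],
              task: str,
              max_steps: int = 100) -> str:
    """Call the model, execute any requested tool, append the observation
    to the transcript, and stop on a final answer or when the budget runs out."""
    transcript: list = [("user", task)]
    for _ in range(max_steps):
        step = model_step(transcript)
        if step.answer is not None:
            return step.answer
        if step.tool not in tools:
            observation = f"error: unknown tool {step.tool!r}"
        else:
            observation = tools[step.tool](**step.args)
        transcript.append(("tool", step.tool, observation))
    raise RuntimeError(f"no final answer after {max_steps} steps")


# Usage with a scripted stand-in for the model:
def fake_model(transcript: list) -> Step:
    # First turn: request a directory listing. Second turn: answer from it.
    if len(transcript) == 1:
        return Step(tool="ls", args={"path": "src"})
    return Step(answer=f"found: {transcript[-1][2]}")


print(run_agent(fake_model, {"ls": lambda path: f"{path}/main.py"}, "list src"))
# found: src/main.py
```

Returning tool errors as observations, rather than raising, lets the model see its mistake and recover on the next step, which matters in long chains of tool calls.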
LongCat-2.0
LongCat-2.0 is a Mixture-of-Experts language model with 1.6 trillion total parameters, of which about 48 billion are activated per token, and it performs strongly on coding and agentic tasks. It is a substantial step up from previous LongCat models, combining a large-scale sparse architecture with dedicated post-training for real-world software engineering, tool use, long-context reasoning, and multi-step agent workflows. The model is trained and deployed entirely on AI ASIC superpods; pretraining spanned more than 35 trillion tokens and millions of accelerator-hours, demonstrating frontier-scale training on alternative hardware platforms. To strengthen long-horizon tasks, it introduces LongCat Sparse Attention and is trained on hundreds of billions of tokens of 1M-token-context data, giving it native support for ultra-long-context tasks and reliable long-document understanding.
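Why sparse attention matters at a 1M-token context can be seen from a simple count of query-key scores. The sketch assumes a generic scheme in which each query attends to a fixed budget of `k` keys; the text does not describe how LongCat Sparse Attention actually selects keys, so `k = 2048` is a hypothetical value chosen only for illustration.

```python
# Count query-key score computations for one attention head in one layer.
# Assumptions: dense attention is counted without a causal mask (a mask
# roughly halves it), and the sparse scheme lets each query score at most
# `keys_per_query` keys. This is not LongCat's actual selection rule.

def dense_pairs(n_tokens: int) -> int:
    """Full attention scores every query against every key."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Each query scores at most `keys_per_query` keys."""
    return n_tokens * min(keys_per_query, n_tokens)


n = 1_000_000  # ~1M-token context
k = 2_048      # hypothetical per-query key budget

print(f"dense : {dense_pairs(n):.3e}")                         # 1.000e+12
print(f"sparse: {sparse_pairs(n, k):.3e}")                     # 2.048e+09
print(f"ratio : {dense_pairs(n) / sparse_pairs(n, k):.0f}x")   # 488x
```

Dense cost grows quadratically with context length, while a fixed per-query budget grows only linearly, which is the general motivation for sparse attention at this scale.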
Nemotron 3 Super
Nemotron 3 Super is part of NVIDIA’s Nemotron 3 family of open models, designed to enable agentic AI systems that can reason, plan, and execute multi-step workflows in complex environments. It uses a hybrid Mamba-Transformer Mixture-of-Experts architecture that combines the efficiency of Mamba state-space layers with the contextual modeling of transformer attention, allowing it to process long sequences and complex reasoning tasks with high accuracy and throughput. Because only a subset of the model's parameters is activated for each token, the architecture improves computational efficiency while preserving strong reasoning and enabling scalable inference for large workloads. Nemotron 3 Super has roughly 120 billion parameters, about 12 billion of which are active during inference, which speeds up multi-step reasoning and collaborative agent interactions over large contexts.
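"Activating only a subset of parameters per token" is typically done with a router that scores all experts and runs only the top few. The sketch below shows a common generic design, top-k selection followed by a softmax over the selected scores, with toy scalar "experts". It is an illustration under that assumption, not NVIDIA's router, and it does not model the Mamba or attention layers.

```python
import math


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts for one token and renormalize
    their scores with a softmax over just the selected experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    m = max(router_logits[i] for i in chosen)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_layer(x: float, router_logits: list[float], experts, k: int = 2) -> float:
    """Only the k routed experts are evaluated; the others stay idle for this token."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]  # toy experts
logits = [0.1, 2.0, -1.0, 2.0]

print(top_k_route(logits, 2))            # [(1, 0.5), (3, 0.5)]
print(moe_layer(1.0, logits, experts))   # 3.0
```

With 4 experts and `k = 2`, half the expert parameters are touched per token; production MoE models use many more experts per layer, which is how the active share can fall to around a tenth of the total.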