Alternatives to DeepSeek R1

Compare DeepSeek R1 alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to DeepSeek R1 in 2026. Compare features, ratings, user reviews, pricing, and more from DeepSeek R1 competitors and alternatives in order to make an informed decision for your business.

  • 1
    Mistral AI

    Mistral AI

    Mistral AI

    Mistral AI is a pioneering artificial intelligence startup specializing in open-source generative AI. The company offers a range of customizable, enterprise-grade AI solutions deployable across various platforms, including on-premises, cloud, edge, and devices. Flagship products include "Le Chat," a multilingual AI assistant designed to enhance productivity in both personal and professional contexts, and "La Plateforme," a developer platform that enables the creation and deployment of AI-powered applications. Committed to transparency and innovation, Mistral AI positions itself as a leading independent AI lab, contributing significantly to open-source AI and policy development.
  • 2
    Kimi K2

    Kimi K2

    Moonshot AI

    Kimi K2 is a state-of-the-art open source large language model series built on a mixture-of-experts (MoE) architecture, featuring 1 trillion total parameters and 32 billion activated parameters for task-specific efficiency. Trained with the Muon optimizer on over 15.5 trillion tokens and stabilized by MuonClip’s attention-logit clamping, it delivers exceptional performance in frontier knowledge, reasoning, mathematics, coding, and general agentic workflows. Moonshot AI provides two variants, Kimi-K2-Base for research-level fine-tuning and Kimi-K2-Instruct pre-trained for immediate chat and tool-driven interactions, enabling both custom development and drop-in agentic capabilities. Benchmarks show it outperforms leading open source peers and rivals top proprietary models in coding tasks and complex task breakdowns, while its 128 K-token context length, tool-calling API compatibility, and support for industry-standard inference engines.
  • 3
    Kimi K2 Thinking

    Kimi K2 Thinking

    Moonshot AI

    Kimi K2 Thinking is an advanced open source reasoning model developed by Moonshot AI, designed specifically for long-horizon, multi-step workflows where the system interleaves chain-of-thought processes with tool invocation across hundreds of sequential tasks. The model uses a mixture-of-experts architecture with a total of 1 trillion parameters, yet only about 32 billion parameters are activated per inference pass, optimizing efficiency while maintaining vast capacity. It supports a context window of up to 256,000 tokens, enabling the handling of extremely long inputs and reasoning chains without losing coherence. Native INT4 quantization is built in, which reduces inference latency and memory usage without performance degradation. Kimi K2 Thinking is explicitly built for agentic workflows; it can autonomously call external tools, manage sequential logic steps (up to and typically between 200-300 tool calls in a single chain), and maintain consistent reasoning.
  • 4
    Magistral

    Magistral

    Mistral AI

    Magistral is Mistral AI’s first reasoning‑focused language model family, released in two sizes: Magistral Small, a 24 B‑parameter open‑weight model under Apache 2.0 (downloadable on Hugging Face), and Magistral Medium, a more capable enterprise version available via Mistral’s API, Le Chat platform, and major cloud marketplaces. Built for domain‑specific, transparent, multilingual reasoning across tasks like math, physics, structured calculations, programmatic logic, decision trees, and rule‑based systems, Magistral produces chain‑of‑thought outputs in the user’s language that you can follow and verify. This launch marks a shift toward compact yet powerful transparent AI reasoning. Magistral Medium is currently available in preview on Le Chat, the API, SageMaker, WatsonX, Azure AI, and Google Cloud Marketplace. Magistral is ideal for general-purpose use requiring longer thought processing and better accuracy than with non-reasoning LLMs.
  • 5
    Manus AI
    Manus is a versatile general AI agent that bridges the gap between thought and action, seamlessly executing tasks in both professional and personal contexts. From data analysis and travel planning to educational material creation and stock insights, Manus helps users get things done while they focus on other priorities. With its ability to perform complex research, design interactive presentations, and analyze market trends, Manus is designed to improve productivity and efficiency. It also generates clear, actionable insights, making it an essential tool for professionals and individuals seeking to simplify their workflows and gain deeper insights. Manus Desktop with the “My Computer” capability enables an AI agent to operate directly on a user’s local machine rather than being confined to the cloud. It interacts with files, applications, and development environments through command line execution, allowing seamless control over local workflows.
  • 6
    Mistral Large 3
    Mistral Large 3 is a next-generation, open multimodal AI model built with a powerful sparse Mixture-of-Experts architecture featuring 41B active parameters out of 675B total. Designed from scratch on NVIDIA H200 GPUs, it delivers frontier-level reasoning, multilingual performance, and advanced image understanding while remaining fully open-weight under the Apache 2.0 license. The model achieves top-tier results on modern instruction benchmarks, positioning it among the strongest permissively licensed foundation models available today. With native support across vLLM, TensorRT-LLM, and major cloud providers, Mistral Large 3 offers exceptional accessibility and performance efficiency. Its design enables enterprise-grade customization, letting teams fine-tune or adapt the model for domain-specific workflows and proprietary applications. Mistral Large 3 represents a major advancement in open AI, offering frontier intelligence without sacrificing transparency or control.
  • 7
    Open R1

    Open R1

    Open R1

    Open R1 is a community-driven, open-source initiative aimed at replicating the advanced AI capabilities of DeepSeek-R1 through transparent methodologies. You can try Open R1 AI model or DeepSeek R1 free online chat on Open R1. The project offers a comprehensive implementation of DeepSeek-R1's reasoning-optimized training pipeline, including tools for GRPO training, SFT fine-tuning, and synthetic data generation, all under the MIT license. While the original training data remains proprietary, Open R1 provides the complete toolchain for users to develop and fine-tune their own models.
  • 8
    OpenAI deep research
    OpenAI's deep research is an AI-powered tool designed to autonomously conduct complex, multi-step research tasks across various domains, such as science, coding, and mathematics. By analyzing user-provided inputs—such as questions, text documents, images, PDFs, or spreadsheets—the system formulates a structured research plan, gathers relevant information, and delivers comprehensive responses within minutes. It also provides process summaries with citations, helping users verify sources. While this tool significantly accelerates research efficiency, it may occasionally produce inaccuracies or struggle to differentiate between authoritative sources and misinformation. Currently available to ChatGPT Pro users, deep research represents a step toward AI-driven knowledge discovery, with ongoing improvements planned for accuracy and response time.
  • 9
    OpenAI o1-pro
    OpenAI o1-pro is the enhanced version of OpenAI's o1 model, designed to tackle more complex and demanding tasks with greater reliability. It features significant performance improvements over its predecessor, the o1 preview, with a notable 34% reduction in major errors and the ability to think 50% faster. This model excels in areas like math, physics, and coding, where it can provide detailed and accurate solutions. Additionally, the o1-pro mode can process multimodal inputs, including text and images, and is particularly adept at reasoning tasks that require deep thought and problem-solving. It's accessible through a ChatGPT Pro subscription, offering unlimited usage and enhanced capabilities for users needing advanced AI assistance.
    Starting Price: $200/month
  • 10
    OpenAI o3
    OpenAI o3 is an advanced AI model designed to enhance reasoning capabilities by breaking down complex instructions into smaller, more manageable steps. It offers significant improvements over previous AI iterations, excelling in coding tasks, competitive programming, and achieving high scores in mathematics and science benchmarks. Available for widespread use, OpenAI o3 supports advanced AI-driven problem-solving and decision-making processes. The model incorporates deliberative alignment techniques to ensure its responses align with established safety and ethical guidelines, making it a powerful tool for developers, researchers, and enterprises seeking sophisticated AI solutions.
    Starting Price: $2 per 1 million tokens
  • 11
    OpenAI o3-mini
    OpenAI o3-mini is a lightweight version of the advanced o3 AI model, offering powerful reasoning capabilities in a more efficient and accessible package. Designed to break down complex instructions into smaller, manageable steps, o3-mini excels in coding tasks, competitive programming, and problem-solving in mathematics and science. This compact model provides the same high-level precision and logic as its larger counterpart but with reduced computational requirements, making it ideal for use in resource-constrained environments. With built-in deliberative alignment, o3-mini ensures safe, ethical, and context-aware decision-making, making it a versatile tool for developers, researchers, and businesses seeking a balance between performance and efficiency.
  • 12
    OpenAI o3-mini-high
    The o3-mini-high model from OpenAI advances AI reasoning by refining deep problem-solving in coding, mathematics, and complex tasks. It features adaptive thinking time with adjustable reasoning modes (low, medium, high) to optimize performance based on task complexity. Outperforming the o1 series by 200 Elo points on Codeforces, it delivers high efficiency at a lower cost while maintaining speed and accuracy. As part of the o3 family, it pushes AI problem-solving boundaries while remaining accessible, offering a free tier and expanded limits for Plus subscribers.
  • 13
    OpenAI o4-mini-high
    OpenAI o4-mini-high is an enhanced version of the o4-mini, optimized for higher reasoning capacity and performance. It maintains the same compact size but significantly boosts its ability to handle more complex tasks with improved efficiency. Whether you're dealing with large datasets, advanced mathematical computations, or intricate coding problems, o4-mini-high provides faster, more accurate responses, making it perfect for high-demand applications.
  • 14
    Llama 4 Behemoth
    Llama 4 Behemoth is Meta's most powerful AI model to date, featuring a massive 288 billion active parameters. It excels in multimodal tasks, outperforming previous models like GPT-4.5 and Gemini 2.0 Pro across multiple STEM-focused benchmarks such as MATH-500 and GPQA Diamond. As the teacher model for the Llama 4 series, Behemoth sets the foundation for models like Llama 4 Maverick and Llama 4 Scout. While still in training, Llama 4 Behemoth demonstrates unmatched intelligence, pushing the boundaries of AI in fields like math, multilinguality, and image understanding.
  • 15
    Janus-Pro-7B
    Janus-Pro-7B is an innovative open-source multimodal AI model from DeepSeek, designed to excel in both understanding and generating content across text, images, and videos. It leverages a unique autoregressive architecture with separate pathways for visual encoding, enabling high performance in tasks ranging from text-to-image generation to complex visual comprehension. This model outperforms competitors like DALL-E 3 and Stable Diffusion in various benchmarks, offering scalability with versions from 1 billion to 7 billion parameters. Licensed under the MIT License, Janus-Pro-7B is freely available for both academic and commercial use, providing a significant leap in AI capabilities while being accessible on major operating systems like Linux, MacOS, and Windows through Docker.
  • 16
    Phi-4-mini-reasoning
    Phi-4-mini-reasoning is a 3.8-billion parameter transformer-based language model optimized for mathematical reasoning and step-by-step problem solving in environments with constrained computing or latency. Fine-tuned with synthetic data generated by the DeepSeek-R1 model, it balances efficiency with advanced reasoning ability. Trained on over one million diverse math problems spanning multiple levels of difficulty from middle school to Ph.D. level, Phi-4-mini-reasoning outperforms its base model on long sentence generation across various evaluations and surpasses larger models like OpenThinker-7B, Llama-3.2-3B-instruct, and DeepSeek-R1. It features a 128K-token context window and supports function calling, enabling integration with external tools and APIs. Phi-4-mini-reasoning can be quantized using Microsoft Olive or Apple MLX Framework for deployment on edge devices such as IoT, laptops, and mobile devices.
  • 17
    Phi-4-reasoning
    Phi-4-reasoning is a 14-billion parameter transformer-based language model optimized for complex reasoning tasks, including math, coding, algorithmic problem solving, and planning. Trained via supervised fine-tuning of Phi-4 on carefully curated "teachable" prompts and reasoning demonstrations generated using o3-mini, it generates detailed reasoning chains that effectively leverage inference-time compute. Phi-4-reasoning incorporates outcome-based reinforcement learning to produce longer reasoning traces. It outperforms significantly larger open-weight models such as DeepSeek-R1-Distill-Llama-70B and approaches the performance levels of the full DeepSeek-R1 model across a wide range of reasoning tasks. Phi-4-reasoning is designed for environments with constrained computing or latency. Fine-tuned with synthetic data generated by DeepSeek-R1, it provides high-quality, step-by-step problem solving.
  • 18
    Phi-4-reasoning-plus
    Phi-4-reasoning-plus is a 14-billion parameter open-weight reasoning model that builds upon Phi-4-reasoning capabilities. It is further trained with reinforcement learning to utilize more inference-time compute, using 1.5x more tokens than Phi-4-reasoning, to deliver higher accuracy. Despite its significantly smaller size, Phi-4-reasoning-plus achieves better performance than OpenAI o1-mini and DeepSeek-R1 at most benchmarks, including mathematical reasoning and Ph.D. level science questions. It surpasses the full DeepSeek-R1 model (with 671 billion parameters) on the AIME 2025 test, the 2025 qualifier for the USA Math Olympiad. Phi-4-reasoning-plus is available on Azure AI Foundry and HuggingFace.
  • 19
    Selene 1
    Atla's Selene 1 API offers state-of-the-art AI evaluation models, enabling developers to define custom evaluation criteria and obtain precise judgments on their AI applications' performance. Selene outperforms frontier models on commonly used evaluation benchmarks, ensuring accurate and reliable assessments. Users can customize evaluations to their specific use cases through the Alignment Platform, allowing for fine-grained analysis and tailored scoring formats. The API provides actionable critiques alongside accurate evaluation scores, facilitating seamless integration into existing workflows. Pre-built metrics, such as relevance, correctness, helpfulness, faithfulness, logical coherence, and conciseness, are available to address common evaluation scenarios, including detecting hallucinations in retrieval-augmented generation applications or comparing outputs to ground truth data.
  • 20
    QwQ-32B

    QwQ-32B

    Alibaba

    ​QwQ-32B is an advanced reasoning model developed by Alibaba Cloud's Qwen team, designed to enhance AI's problem-solving capabilities. With 32 billion parameters, it achieves performance comparable to state-of-the-art models like DeepSeek's R1, which has 671 billion parameters. This efficiency is achieved through optimized parameter utilization, allowing QwQ-32B to perform complex tasks such as mathematical reasoning, coding, and general problem-solving with fewer resources. The model supports a context length of up to 32,000 tokens, enabling it to process extensive input data effectively. QwQ-32B is accessible via Alibaba's chatbot service, Qwen Chat, and is open sourced under the Apache 2.0 license, promoting collaboration and further development within the AI community.
  • 21
    QwQ-Max-Preview
    QwQ-Max-Preview is an advanced AI model built on the Qwen2.5-Max architecture, designed to excel in deep reasoning, mathematical problem-solving, coding, and agent-related tasks. This preview version offers a sneak peek at its capabilities, which include improved performance in a wide range of general-domain tasks and the ability to handle complex workflows. QwQ-Max-Preview is slated for an official open-source release under the Apache 2.0 license, offering further advancements and refinements in its full version. It also paves the way for a more accessible AI ecosystem, with the upcoming launch of the Qwen Chat app and smaller variants of the model like QwQ-32B, aimed at developers seeking local deployment options.
  • 22
    Qwen2.5-1M

    Qwen2.5-1M

    Alibaba

    Qwen2.5-1M is an open-source language model developed by the Qwen team, designed to handle context lengths of up to one million tokens. This release includes two model variants, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, marking the first time Qwen models have been upgraded to support such extensive context lengths. To facilitate efficient deployment, the team has also open-sourced an inference framework based on vLLM, integrated with sparse attention methods, enabling processing of 1M-token inputs with a 3x to 7x speed improvement. Comprehensive technical details, including design insights and ablation experiments, are available in the accompanying technical report.
  • 23
    Qwen2.5-Max
    Qwen2.5-Max is a large-scale Mixture-of-Experts (MoE) model developed by the Qwen team, pretrained on over 20 trillion tokens and further refined through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). In evaluations, it outperforms models like DeepSeek V3 in benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond, while also demonstrating competitive results in other assessments, including MMLU-Pro. Qwen2.5-Max is accessible via API through Alibaba Cloud and can be explored interactively on Qwen Chat.
  • 24
    Qwen2.5-VL

    Qwen2.5-VL

    Alibaba

    Qwen2.5-VL is the latest vision-language model from the Qwen series, representing a significant advancement over its predecessor, Qwen2-VL. This model excels in visual understanding, capable of recognizing a wide array of objects, including text, charts, icons, graphics, and layouts within images. It functions as a visual agent, capable of reasoning and dynamically directing tools, enabling applications such as computer and phone usage. Qwen2.5-VL can comprehend videos exceeding one hour in length and can pinpoint relevant segments within them. Additionally, it accurately localizes objects in images by generating bounding boxes or points and provides stable JSON outputs for coordinates and attributes. The model also supports structured outputs for data like scanned invoices, forms, and tables, benefiting sectors such as finance and commerce. Available in base and instruct versions across 3B, 7B, and 72B sizes, Qwen2.5-VL is accessible through platforms like Hugging Face and ModelScope.
  • 25
    Qwen3

    Qwen3

    Alibaba

    Qwen3, the latest iteration of the Qwen family of large language models, introduces groundbreaking features that enhance performance across coding, math, and general capabilities. With models like the Qwen3-235B-A22B and Qwen3-30B-A3B, Qwen3 achieves impressive results compared to top-tier models, thanks to its hybrid thinking modes that allow users to control the balance between deep reasoning and quick responses. The platform supports 119 languages and dialects, making it an ideal choice for global applications. Its pre-training process, which uses 36 trillion tokens, enables robust performance, and advanced reinforcement learning (RL) techniques continue to refine its capabilities. Available on platforms like Hugging Face and ModelScope, Qwen3 offers a powerful tool for developers and researchers working in diverse fields.
  • 26
    R1 1776

    R1 1776

    Perplexity AI

    Perplexity AI has open-sourced R1 1776, a large language model (LLM) based on DeepSeek R1 designed to enhance transparency and foster community collaboration in AI development. This release allows researchers and developers to access the model's architecture and codebase, enabling them to contribute to its improvement and adaptation for various applications. By sharing R1 1776 openly, Perplexity AI aims to promote innovation and ethical practices within the AI community.
  • 27
    Sarvam 30B
    Sarvam-30B is an open source, next-generation large language model designed as a unified system for both real-time conversational AI and deep reasoning workloads, built with a strong focus on multilingual intelligence and practical deployment. The 30B model is optimized for speed and efficiency, using a Mixture-of-Experts (MoE) architecture that activates only a subset of parameters per request, enabling high throughput, low latency, and deployment even in resource-constrained environments such as local machines or edge systems. It delivers strong performance in conversational tasks, coding, and reasoning while achieving state-of-the-art results across more than 20 Indian languages, making it highly effective for multilingual applications and voice-based systems. It represents a dual-tier architecture, a fast, deployable “conversational workhorse”, leveraging MoE designs to reduce compute cost while maintaining high performance.
  • 28
    gpt-oss-120b
    gpt-oss-120b is a reasoning model engineered for deep, transparent thinking, delivering full chain-of-thought explanations, adjustable reasoning depth, and structured outputs, while natively invoking tools like web search and Python execution via the API. Built to slot seamlessly into self-hosted or edge deployments, it eliminates dependence on proprietary infrastructure. Although it includes default safety guardrails, its open-weight architecture allows fine-tuning that could override built-in controls, so implementers are responsible for adding input filtering, output monitoring, and governance measures to achieve enterprise-grade security. As a community–driven model card rather than a managed service spec, it emphasizes transparency, customization, and the need for downstream safety practices.
  • 29
    gpt-oss-20b
    gpt-oss-20b is a 20-billion-parameter, text-only reasoning model released under the Apache 2.0 license and governed by OpenAI’s gpt-oss usage policy, built to enable seamless integration into custom AI workflows via the Responses API without reliance on proprietary infrastructure. Trained for robust instruction following, it supports adjustable reasoning effort, full chain-of-thought outputs, and native tool use (including web search and Python execution), producing structured, explainable answers. Developers must implement their own deployment safeguards, such as input filtering, output monitoring, and usage policies, to match the system-level protections of hosted offerings and mitigate risks from malicious or unintended behaviors. Its open-weight design makes it ideal for on-premises or edge deployments where control, customization, and transparency are paramount.
  • 30
    Tülu 3
    Tülu 3 is an advanced instruction-following language model developed by the Allen Institute for AI (Ai2), designed to enhance capabilities in areas such as knowledge, reasoning, mathematics, coding, and safety. Built upon the Llama 3 Base, Tülu 3 employs a comprehensive four-stage post-training process: meticulous prompt curation and synthesis, supervised fine-tuning on a diverse set of prompts and completions, preference tuning using both off- and on-policy data, and a novel reinforcement learning approach to bolster specific skills with verifiable rewards. This open-source model distinguishes itself by providing full transparency, including access to training data, code, and evaluation tools, thereby closing the performance gap between open and proprietary fine-tuning methods. Evaluations indicate that Tülu 3 outperforms other open-weight models of similar size, such as Llama 3.1-Instruct and Qwen2.5-Instruct, across various benchmarks.
  • 31
    Yi-Lightning

    Yi-Lightning

    Yi-Lightning

    Yi-Lightning, developed by 01.AI under the leadership of Kai-Fu Lee, represents the latest advancement in large language models with a focus on high performance and cost-efficiency. It boasts a maximum context length of 16K tokens and is priced at $0.14 per million tokens for both input and output, making it remarkably competitive. Yi-Lightning leverages an enhanced Mixture-of-Experts (MoE) architecture, incorporating fine-grained expert segmentation and advanced routing strategies, which contribute to its efficiency in training and inference. This model has excelled in various domains, achieving top rankings in categories like Chinese, math, coding, and hard prompts on the chatbot arena, where it secured the 6th position overall and 9th in style control. Its development included comprehensive pre-training, supervised fine-tuning, and reinforcement learning from human feedback, ensuring both performance and safety, with optimizations in memory usage and inference speed.
  • 32
    Claude Sonnet 3.5
    Claude Sonnet 3.5 sets new industry benchmarks for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval). It shows marked improvement in grasping nuance, humor, and complex instructions, and is exceptional at writing high-quality content with a natural, relatable tone. Claude Sonnet 3.5 operates at twice the speed of Claude Opus 3. This performance boost, combined with cost-effective pricing, makes Claude Sonnet 3.5 ideal for complex tasks such as context-sensitive customer support and orchestrating multi-step workflows.
  • 33
    Claude Sonnet 3.7
    Claude Sonnet 3.7, developed by Anthropic, is a cutting-edge AI model that combines rapid response with deep reflective reasoning. This innovative model allows users to toggle between quick, efficient responses and more thoughtful, reflective answers, making it ideal for complex problem-solving. By allowing Claude to self-reflect before answering, it excels at tasks that require high-level reasoning and nuanced understanding. With its ability to engage in deeper thought processes, Claude Sonnet 3.7 enhances tasks such as coding, natural language processing, and critical thinking applications. Available across various platforms, it offers a powerful tool for professionals and organizations seeking a high-performance, adaptable AI.
  • 34
    Claude Opus 4.1
    Claude Opus 4.1 is an incremental upgrade to Claude Opus 4 that boosts coding, agentic reasoning, and data-analysis performance without changing deployment complexity. It raises coding accuracy to 74.5 percent on SWE-bench Verified and sharpens in-depth research and detailed tracking for agentic search tasks. GitHub reports notable gains in multi-file code refactoring, while Rakuten Group highlights its precision in pinpointing exact corrections within large codebases without introducing bugs. Independent benchmarks show about a one-standard-deviation improvement on junior developer tests compared to Opus 4, mirroring major leaps seen in prior Claude releases.
  • 35
    Claude Opus 4.5
    Claude Opus 4.5 is Anthropic’s newest flagship model, delivering major improvements in reasoning, coding, agentic workflows, and real-world problem solving. It outperforms previous models and leading competitors on benchmarks such as SWE-bench, multilingual coding tests, and advanced agent evaluations. Opus 4.5 also introduces stronger safety features, including significantly higher resistance to prompt injection and improved alignment across sensitive tasks. Developers gain new controls through the Claude API—like effort parameters, context compaction, and advanced tool use—allowing for more efficient, longer-running agentic workflows. Product updates across Claude, Claude Code, the Chrome extension, and Excel integrations expand how users interact with the model for software engineering, research, and everyday productivity. Overall, Claude Opus 4.5 marks a substantial step forward in capability, reliability, and usability for developers, enterprises, and end users.
  • 36
    Claude Sonnet 4
    Claude Sonnet 4, the latest evolution of Anthropic’s language models, offers a significant upgrade in coding, reasoning, and performance. Designed for diverse use cases, Sonnet 4 builds upon the success of its predecessor, Claude Sonnet 3.7, delivering more precise responses and better task execution. With a state-of-the-art 72.7% performance on the SWE-bench, it stands out in agentic scenarios, offering enhanced steerability and clear reasoning capabilities. Whether handling software development, multi-feature app creation, or complex problem-solving, Claude Sonnet 4 ensures higher code quality, reduced errors, and a smoother development process.
    Starting Price: $3 / 1 million tokens (input)
  • 37
    Claude Sonnet 4.5
    Claude Sonnet 4.5 is Anthropic’s latest frontier model, designed to excel in long-horizon coding, agentic workflows, and intensive computer use while maintaining safety and alignment. It achieves state-of-the-art performance on the SWE-bench Verified benchmark (for software engineering) and leads on OSWorld (a computer use benchmark), with the ability to sustain focus over 30 hours on complex, multi-step tasks. The model introduces improvements in tool handling, memory management, and context processing, enabling more sophisticated reasoning, better domain understanding (from finance and law to STEM), and deeper code comprehension. It supports context editing and memory tools to sustain long conversations or multi-agent tasks, and allows code execution and file creation within Claude apps. Sonnet 4.5 is deployed at AI Safety Level 3 (ASL-3), with classifiers protecting against inputs or outputs tied to risky domains, and includes mitigations against prompt injection.
  • 38
    Command A

    Command A

    Cohere AI

    Command A, introduced by Cohere, is a high-performance AI model designed to maximize efficiency with minimal computational resources. This model outperforms or matches other top-tier models like GPT-4 and DeepSeek-V3 in agentic enterprise tasks while significantly reducing compute costs. It is tailored for applications requiring fast, efficient AI-driven solutions, providing businesses with the capability to perform advanced tasks across various domains, all while optimizing performance and computational demands.
    Starting Price: $2.50 / 1M tokens
  • 39
    DeepCoder

    DeepCoder

    Agentica Project

    DeepCoder is a fully open source code-reasoning and generation model released by Agentica Project in collaboration with Together AI. It is fine-tuned from DeepSeek-R1-Distilled-Qwen-14B using distributed reinforcement learning, achieving a 60.6% accuracy on LiveCodeBench (representing an 8% improvement over the base), a performance level that matches that of proprietary models such as o3-mini (2025-01-031 Low) and o1 while using only 14 billion parameters. It was trained over 2.5 weeks on 32 H100 GPUs with a curated dataset of roughly 24,000 coding problems drawn from verified sources (including TACO-Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench submissions), each problem requiring a verifiable solution and at least five unit tests to ensure reliability for RL training. To handle long-range context, DeepCoder employs techniques such as iterative context lengthening and overlong filtering.
  • 40
    DeepSWE

    DeepSWE

    Agentica Project

    DeepSWE is a fully open source, state-of-the-art coding agent built on top of the Qwen3-32B foundation model and trained exclusively via reinforcement learning (RL), without supervised finetuning or distillation from proprietary models. It is developed using rLLM, Agentica’s open source RL framework for language agents. DeepSWE operates as an agent; it interacts with a simulated development environment (via the R2E-Gym environment) using a suite of tools (file editor, search, shell-execution, submit/finish), enabling it to navigate codebases, edit multiple files, compile/run tests, and iteratively produce patches or complete engineering tasks. DeepSWE exhibits emergent behaviors beyond simple code generation; when presented with bugs or feature requests, the agent reasons about edge cases, seeks existing tests in the repository, proposes patches, writes extra tests for regressions, and dynamically adjusts its “thinking” effort.
  • 41
    DeepScaleR

    DeepScaleR

    Agentica Project

    DeepScaleR is a 1.5-billion-parameter language model fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B using distributed reinforcement learning and a novel iterative context-lengthening strategy that gradually increases its context window from 8K to 24K tokens during training. It was trained on ~40,000 carefully curated mathematical problems drawn from competition-level datasets like AIME (1984–2023), AMC (pre-2023), Omni-MATH, and STILL. DeepScaleR achieves 43.1% accuracy on AIME 2024, a roughly 14.3 percentage point boost over the base model, and surpasses the performance of the proprietary O1-Preview model despite its much smaller size. It also posts strong results on a suite of math benchmarks (e.g., MATH-500, AMC 2023, Minerva Math, OlympiadBench), demonstrating that small, efficient models tuned with RL can match or exceed larger baselines on reasoning tasks.
  • 42
    DeepSeek Coder
    DeepSeek Coder is a cutting-edge software tool designed to revolutionize the landscape of data analysis and coding. By leveraging advanced machine learning algorithms and natural language processing capabilities, it empowers users to seamlessly integrate data querying, analysis, and visualization into their workflow. The intuitive interface of DeepSeek Coder enables both novice and experienced programmers to efficiently write, test, and optimize code. Its robust set of features includes real-time syntax checking, intelligent code completion, and comprehensive debugging tools, all designed to streamline the coding process. Additionally, DeepSeek Coder's ability to understand and interpret complex data sets ensures that users can derive meaningful insights and create sophisticated data-driven applications with ease.
  • 43
    DeepSeek-Coder-V2
    DeepSeek-Coder-V2 is an open source code language model designed to excel in programming and mathematical reasoning tasks. It features a Mixture-of-Experts (MoE) architecture with 236 billion total parameters and 21 billion activated parameters per token, enabling efficient processing and high performance. The model was trained on an extensive dataset of 6 trillion tokens, enhancing its capabilities in code generation and mathematical problem-solving. DeepSeek-Coder-V2 supports over 300 programming languages and has demonstrated superior performance on benchmarks such surpassing other models. It is available in multiple variants, including DeepSeek-Coder-V2-Instruct, optimized for instruction-based tasks; DeepSeek-Coder-V2-Base, suitable for general text generation; and lightweight versions like DeepSeek-Coder-V2-Lite-Base and DeepSeek-Coder-V2-Lite-Instruct, designed for environments with limited computational resources.
  • 44
    DeepSeek R2

    DeepSeek R2

    DeepSeek

    DeepSeek R2 is the anticipated successor to DeepSeek R1, a groundbreaking AI reasoning model launched in January 2025 by the Chinese AI startup DeepSeek. Building on R1’s success, which disrupted the AI industry with its cost-effective performance rivaling top-tier models like OpenAI’s o1, R2 promises a quantum leap in capabilities. It is expected to deliver exceptional speed and human-like reasoning, excelling in complex tasks such as advanced coding and high-level mathematical problem-solving. Leveraging DeepSeek’s innovative Mixture-of-Experts architecture and efficient training methods, R2 aims to outperform its predecessor while maintaining a low computational footprint, potentially expanding its reasoning abilities to languages beyond English.
  • 45
    DeepSeek-V3

    DeepSeek-V3

    DeepSeek

    DeepSeek-V3 is a state-of-the-art AI model designed to deliver unparalleled performance in natural language understanding, advanced reasoning, and decision-making tasks. Leveraging next-generation neural architectures, it integrates extensive datasets and fine-tuned algorithms to tackle complex challenges across diverse domains such as research, development, business intelligence, and automation. With a focus on scalability and efficiency, DeepSeek-V3 provides developers and enterprises with cutting-edge tools to accelerate innovation and achieve transformative outcomes.
  • 46
    DeepSeek V3.1
    DeepSeek V3.1 is a groundbreaking open-weight large language model featuring a massive 685-billion parameters and an extended 128,000‑token context window, enabling it to process documents equivalent to 400-page books in a single prompt. It delivers integrated capabilities for chat, reasoning, and code generation within a unified hybrid architecture, seamlessly blending these functions into one coherent model. V3.1 supports a variety of tensor formats to give developers flexibility in optimizing performance across different hardware. Early benchmark results show robust performance, including a 71.6% score on the Aider coding benchmark, putting it on par with or ahead of systems like Claude Opus 4 and doing so at a far lower cost. Made available under an open source license on Hugging Face with minimal fanfare, DeepSeek V3.1 is poised to reshape access to high-performance AI, challenging traditional proprietary models.
  • 47
    DeepSeek-V3.2
    DeepSeek-V3.2 is a next-generation open large language model designed for efficient reasoning, complex problem solving, and advanced agentic behavior. It introduces DeepSeek Sparse Attention (DSA), a long-context attention mechanism that dramatically reduces computation while preserving performance. The model is trained with a scalable reinforcement learning framework, allowing it to achieve results competitive with GPT-5 and even surpass it in its Speciale variant. DeepSeek-V3.2 also includes a large-scale agent task synthesis pipeline that generates structured reasoning and tool-use demonstrations for post-training. The model features an updated chat template with new tool-calling logic and the optional developer role for agent workflows. With gold-medal performance in the IMO and IOI 2025 competitions, DeepSeek-V3.2 demonstrates elite reasoning capabilities for both research and applied AI scenarios.
  • 48
    DeepSeek-V3.2-Exp
    Introducing DeepSeek-V3.2-Exp, our latest experimental model built on V3.1-Terminus, debuting DeepSeek Sparse Attention (DSA) for faster and more efficient inference and training on long contexts. DSA enables fine-grained sparse attention with minimal loss in output quality, boosting performance for long-context tasks while reducing compute costs. Benchmarks indicate that V3.2-Exp performs on par with V3.1-Terminus despite these efficiency gains. The model is now live across app, web, and API. Alongside this, the DeepSeek API prices have been cut by over 50% immediately to make access more affordable. For a transitional period, users can still access V3.1-Terminus via a temporary API endpoint until October 15, 2025. DeepSeek welcomes feedback on DSA via its feedback portal. In conjunction with the release, DeepSeek-V3.2-Exp has been open-sourced: the model weights and supporting technology (including key GPU kernels in TileLang and CUDA) are available on Hugging Face.
  • 49
    DeepSeek-V3.2-Speciale
    DeepSeek-V3.2-Speciale is a high-compute variant of the DeepSeek-V3.2 model, created specifically for deep reasoning and advanced problem-solving tasks. It builds on DeepSeek Sparse Attention (DSA), a custom long-context attention mechanism that reduces computational overhead while preserving high performance. Through a large-scale reinforcement learning framework and extensive post-training compute, the Speciale variant surpasses GPT-5 on reasoning benchmarks and matches the capabilities of Gemini-3.0-Pro. The model achieved gold-medal performance in the International Mathematical Olympiad (IMO) 2025 and International Olympiad in Informatics (IOI) 2025. DeepSeek-V3.2-Speciale does not support tool-calling, making it purely optimized for uninterrupted reasoning and analytical accuracy. Released under the MIT license, it provides researchers and developers an open, state-of-the-art model focused entirely on high-precision reasoning.
  • 50
    DeepSeek-V4

    DeepSeek-V4

    DeepSeek

    DeepSeek-V4 is a next-generation open-source language model designed for high-performance reasoning, coding, and long-context intelligence. It introduces a powerful architecture with up to one million token context length, enabling seamless handling of large datasets and complex multi-step workflows. The model comes in two variants: DeepSeek-V4-Pro for maximum performance and DeepSeek-V4-Flash for efficiency and speed. DeepSeek-V4-Pro features 1.6 trillion total parameters with 49 billion activated, delivering near state-of-the-art performance comparable to leading closed-source models. It excels in agentic coding, mathematical reasoning, and world knowledge tasks. The model integrates advanced attention mechanisms, including token-wise compression and sparse attention, significantly reducing compute and memory costs. It is also optimized for AI agents, supporting tool use and multi-step workflows.