Alternatives to Composer 2.5
Compare Composer 2.5 alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Composer 2.5 in 2026. Compare features, ratings, user reviews, pricing, and more from Composer 2.5 competitors and alternatives in order to make an informed decision for your business.
-
1
Claude Code
Anthropic
Claude Code is an AI-powered coding agent designed to work directly inside your existing development environment. It goes beyond simple autocomplete by understanding entire codebases and helping developers build, debug, refactor, and ship features faster. Developers can interact with Claude Code from the terminal, IDEs, Slack, or the web, making it easy to stay in flow without switching tools. By describing tasks in natural language, users can let Claude handle code exploration, modifications, and explanations. Claude Code can analyze project structure, dependencies, and architecture to onboard developers quickly. It integrates with common command-line tools, version control systems, and testing workflows. This makes it a powerful companion for both individual developers and teams working on complex software projects.Starting Price: $20/month -
2
Claude Opus 4.6
Anthropic
Claude Opus 4.6 is an advanced AI model developed by Anthropic, designed for high-level reasoning, coding, and knowledge work tasks. It introduces significant improvements in coding, debugging, and code review capabilities. The model can handle long, complex workflows and sustain agentic tasks with greater reliability. It features a 1 million token context window in beta, enabling it to process and retain large amounts of information. Claude Opus 4.6 is optimized for tasks such as financial analysis, research, and document creation. It also integrates with tools like Excel and PowerPoint for enhanced productivity. Overall, it is a state-of-the-art AI model built for complex, real-world professional applications. -
3
Claude Opus 4.7
Anthropic
Claude Opus 4.7 is the latest Anthropic AI model release designed to significantly improve performance in advanced software engineering and complex problem-solving tasks. It builds upon the previous Opus 4.6 model by delivering stronger results on difficult coding challenges and long-running workflows. The model is known for its ability to follow instructions precisely and verify its own outputs for greater reliability. It also introduces enhanced multimodal capabilities, particularly in processing high-resolution images with improved accuracy. Opus 4.7 supports more detailed visual tasks such as analyzing dense screenshots and extracting data from complex diagrams. In professional settings, it produces higher-quality outputs including documents, presentations, and user interfaces. The model includes updated safety features that detect and block high-risk cybersecurity-related requests.Starting Price: $5 per million tokens (input) -
4
Claude Sonnet 4.6
Anthropic
Claude Sonnet 4.6 is Anthropic’s most advanced Sonnet model to date, delivering significant upgrades across coding, computer use, long-context reasoning, agent planning, and knowledge work. It introduces a 1 million token context window in beta, allowing users to analyze entire codebases, lengthy contracts, or large research collections in a single session. The model demonstrates major improvements in instruction following, consistency, and reduced hallucinations compared to previous Sonnet versions. In developer testing, users strongly preferred Sonnet 4.6 over Sonnet 4.5 and even favored it over Opus 4.5 in many coding scenarios. Its enhanced computer-use capabilities enable it to interact with real software interfaces similarly to a human, improving automation for legacy systems without APIs. Sonnet 4.6 also performs strongly on major benchmarks, approaching Opus-level intelligence at a more accessible price point. -
5
Claude Mythos
Anthropic
Claude Mythos Preview is a highly advanced AI model developed with strong capabilities in cybersecurity, particularly in identifying and exploiting software vulnerabilities. It demonstrates the ability to autonomously discover zero-day vulnerabilities across major operating systems, browsers, and critical software systems. The model can also generate complex exploit chains, including privilege escalation and remote code execution attacks. Its capabilities extend beyond vulnerability detection to reverse engineering and exploit development in both open-source and closed-source environments. Mythos Preview operates through agentic workflows, enabling it to analyze codebases, test hypotheses, and validate exploits independently. These abilities represent a significant leap compared to previous models, which struggled with exploit generation. Overall, Claude Mythos Preview highlights a new era where AI can both strengthen and challenge global cybersecurity practices. -
6
Cursor
Cursor
Cursor is an advanced AI-powered IDE designed to make developers exponentially more productive. Built with deep codebase understanding and intelligent automation, it combines natural language interaction with precise, context-aware editing tools. Its Agent feature acts as a human-AI coding partner capable of planning and executing entire development workflows, while the Tab model delivers remarkably accurate autocompletion and targeted suggestions. Cursor seamlessly integrates across environments—from GitHub and Slack to the command line—ensuring AI assistance is available wherever you code. Supporting leading models like GPT-5, Claude Sonnet, Gemini Pro, and Grok Code, it gives developers full control over autonomy and model selection. Fast, intuitive, and built for serious builders, Cursor is redefining what an IDE can be.Starting Price: $20 per month -
7
Composer 2
Cursor
Composer 2 is an advanced AI coding model integrated into Cursor, designed to deliver high-level programming performance at a cost-efficient price. It is trained on long-horizon coding tasks, enabling it to solve complex problems that require multiple steps and actions. The model demonstrates strong improvements across key benchmarks, including Terminal-Bench and SWE-bench Multilingual. With enhanced intelligence and efficiency, it provides faster and more accurate code generation. Composer 2 combines strong performance with affordable pricing, making it accessible for developers and teams.Starting Price: $0.50/M input -
8
Grok 4.3
xAI
Grok 4.3 is the latest iteration of xAI’s Grok model, designed to deliver improved reasoning, real-time information access, and advanced task automation. It builds on earlier Grok 4 models by enhancing performance in complex problem-solving, coding, and analytical workflows. The model is integrated with real-time web and X (formerly Twitter) data, allowing it to provide up-to-date insights and answers. Grok 4.3 supports multimodal capabilities, enabling it to work with text, images, and other data types. It operates within the SuperGrok Heavy tier, offering access to more powerful compute and advanced features. The model is designed to handle long-context tasks and multi-step reasoning with greater accuracy. It also supports tool use and integrations, enabling it to interact with external systems and automate workflows. Overall, Grok 4.3 is positioned as a high-performance AI assistant for real-time, data-driven tasks. -
9
Kimi K2.5
Moonshot AI
Kimi K2.5 is a next-generation multimodal AI model designed for advanced reasoning, coding, and visual understanding tasks. It features a native multimodal architecture that supports both text and visual inputs, enabling image and video comprehension alongside natural language processing. Kimi K2.5 delivers open-source state-of-the-art performance in agent workflows, software development, and general intelligence tasks. The model offers ultra-long context support with a 256K token window, making it suitable for large documents and complex conversations. It includes long-thinking capabilities that allow multi-step reasoning and tool invocation for solving challenging problems. Kimi K2.5 is fully compatible with the OpenAI API format, allowing developers to switch seamlessly with minimal changes. With strong performance, flexibility, and developer-focused tooling, Kimi K2.5 is built for production-grade AI applications.Starting Price: Free -
10
Kimi K2.6
Moonshot AI
Kimi K2.6 is a next-generation agentic AI model developed by Moonshot AI, designed to push forward real-world execution, coding, and multi-step reasoning beyond earlier K2 and K2.5 versions. It builds on a Mixture-of-Experts architecture and the multimodal, agent-first foundation of the Kimi series, combining language understanding, coding, and tool use into a single system capable of planning and executing complex workflows. It introduces deeper reasoning capabilities and significantly improved agent planning, allowing it to break down tasks, coordinate tools, and handle multi-file or multi-step problems with greater accuracy and efficiency. It supports advanced tool calling with high reliability, enabling integration with external systems such as web search or APIs, and includes built-in validation mechanisms to ensure correct execution formats.Starting Price: Free -
11
MiniMax M2.7
MiniMax
MiniMax M2.7 is an advanced AI model designed to enhance real-world productivity across coding, search, and office workflows. It is trained with reinforcement learning across numerous real-world environments, enabling it to handle complex, multi-step tasks effectively. The model excels in problem-solving by breaking down challenges before generating solutions across multiple programming languages. It delivers high-speed performance with rapid token generation, allowing tasks to be completed efficiently. With optimized reasoning and cost-effective pricing, it provides powerful capabilities while minimizing resource usage. It also achieves strong performance in software engineering benchmarks, reducing incident response time and improving development efficiency. Additionally, it supports advanced agentic workflows and professional-grade office tasks, making it highly versatile for modern work environments.Starting Price: Free -
12
GLM-5.1
Zhipu AI
GLM-5.1 is the latest iteration of Z.ai’s GLM series, designed as a frontier-level, agent-oriented AI model optimized for coding, reasoning, and long-horizon workflows. It builds on the GLM-5 architecture, which uses a Mixture-of-Experts (MoE) design to deliver high performance while keeping inference costs efficient, and is part of a broader push toward open-weight, developer-accessible models. A core focus of GLM-5.1 is enabling agentic behavior, meaning it can plan, execute, and iterate across multi-step tasks rather than simply responding to single prompts. It is specifically designed to handle complex workflows such as debugging code, navigating repositories, and executing chained operations with sustained context. Compared to earlier models, GLM-5.1 improves reliability in long interactions, maintaining coherence across extended sessions and reducing breakdowns in multi-step reasoning.Starting Price: Free -
13
GPT-5.4
OpenAI
GPT-5.4 is an advanced artificial intelligence model developed by OpenAI to support complex professional and technical work. The model combines improvements in reasoning, coding, and agent-based workflows into a single system designed for real-world productivity tasks. GPT-5.4 can generate, analyze, and edit documents, spreadsheets, presentations, and other work outputs with greater accuracy and efficiency. It also features improved tool integration, enabling the model to interact with software environments and external tools to complete multi-step workflows. With enhanced context capabilities supporting up to one million tokens, GPT-5.4 can process and reason over very large amounts of information. The model also improves factual accuracy and reduces errors compared to earlier versions. By combining strong reasoning, coding ability, and tool use, GPT-5.4 helps users complete complex tasks faster and with fewer iterations. -
14
GPT-5.5
OpenAI
GPT-5.5 is an advanced AI model designed to handle complex, real-world tasks with greater autonomy and efficiency. It quickly understands user intent and can execute multi-step workflows such as coding, research, data analysis, and document creation with minimal guidance. Instead of requiring step-by-step instructions, GPT-5.5 plans tasks, uses tools, evaluates outputs, and continues working until completion. It excels in knowledge work, software development, and analytical problem-solving, helping users move from idea to execution faster. The model is built to operate across tools and environments, making it highly effective for modern digital workflows. With strong reasoning and persistence, GPT-5.5 enables individuals and teams to complete demanding work more efficiently and accurately.Starting Price: $5 per 1M tokens (input) -
15
Gemini 3.1 Pro
Google
Gemini 3.1 Pro is Google’s upgraded core intelligence model designed for complex tasks that require advanced reasoning. Building on the Gemini 3 series, it delivers significant improvements in problem-solving performance and logical pattern recognition. On the ARC-AGI-2 benchmark, Gemini 3.1 Pro achieved a verified score of 77.1%, more than doubling the reasoning performance of Gemini 3 Pro. The model is engineered for challenges where simple answers are insufficient, enabling deeper analysis, synthesis, and creative output. It can generate practical outputs such as animated, website-ready SVGs directly from text prompts, combining intelligence with real-world usability. Gemini 3.1 Pro is rolling out in preview across consumer, developer, and enterprise platforms including the Gemini app, NotebookLM, Gemini API, Gemini Enterprise Agent Platform, and Android Studio. With expanded access for Google AI Pro and Ultra users, 3.1 Pro sets a stronger baseline for agentic workflows. -
16
Gemini 3.5 Flash
Google
Gemini 3.5 Flash is Google’s latest frontier AI model designed to combine advanced intelligence, high-speed performance, and agentic workflow execution for developers, enterprises, and everyday users. Built as part of the Gemini 3.5 family, the model excels at coding, long-horizon reasoning, multimodal understanding, and complex multi-step automation tasks while delivering significantly faster output speeds than many competing frontier models. Gemini 3.5 Flash powers AI agents capable of planning, executing, and managing workflows such as application development, codebase maintenance, data analysis, and financial document preparation through the Antigravity harness. The model also supports rich multimodal experiences by generating interactive graphics, dynamic web interfaces, animations, and advanced visual content. Gemini 3.5 Flash is integrated across Google products including the Gemini app, Google Search AI Mode, Google Antigravity, Google AI Studio, Android Studio, and more.Starting Price: $1.50 per 1M tokens (input) -
17
Qwen3.6
Alibaba
Qwen3.6 is a large language model developed by Alibaba as part of its Qwen AI model family, designed for real-world applications and advanced reasoning tasks. It focuses on improving stability, usability, and performance compared to earlier versions. The model supports multimodal capabilities, allowing it to process and reason across text, images, and other data types. Qwen3.6 is particularly strong in coding and developer workflows, offering improved accuracy for complex programming tasks. It uses a mixture-of-experts architecture, enabling efficient performance while maintaining large-scale model capabilities. The model is designed to be deployable in production environments, including enterprise and cloud-based systems. It can be integrated into applications or run locally using open-weight variants. Overall, Qwen3.6 delivers a powerful, efficient, and versatile AI solution for modern use cases.Starting Price: Free -
18
MiMo-V2.5
Xiaomi Technology
Xiaomi MiMo-V2.5 is an advanced open-source AI model designed to combine strong agentic capabilities with native multimodal understanding. It can process and reason across text, images, and audio within a single unified system. The model uses a sparse Mixture-of-Experts architecture with hundreds of billions of parameters for efficient performance. It supports an extended context window of up to one million tokens, enabling long and complex workflows. MiMo-V2.5 is built to handle tasks such as coding, reasoning, and multimodal analysis with high accuracy. It incorporates dedicated visual and audio encoders to enhance perception and cross-modal reasoning. The model demonstrates strong benchmark performance across coding, reasoning, and multimodal tasks. By combining multimodality, efficiency, and agentic intelligence, MiMo-V2.5 advances the capabilities of open-source AI systems. -
19
MiMo-V2.5-Pro
Xiaomi Technology
Xiaomi MiMo-V2.5-Pro is an advanced open-source AI model designed to handle complex, long-horizon tasks with strong agentic capabilities. It features a Mixture-of-Experts architecture with over one trillion parameters and a large context window of up to one million tokens. The model is built to perform sophisticated reasoning, coding, and problem-solving across extended workflows. It demonstrates high performance on benchmark tests related to software engineering, reasoning, and general intelligence. MiMo-V2.5-Pro can autonomously complete complex projects, such as building full software systems or optimizing engineering designs. It uses hybrid attention mechanisms to balance efficiency and performance across long contexts. The model is also optimized for token efficiency, reducing computational cost while maintaining strong results. By combining scalability, efficiency, and advanced reasoning, MiMo-V2.5-Pro represents a major step forward in open-source AI models. -
20
SWE-1.6
Cognition
SWE-1.6 is an engineering–focused AI model developed by Cognition and integrated into the Windsurf environment, designed to optimize both raw intelligence and what the company calls “model UX,” or the overall feel and efficiency of interacting with an AI agent. It represents a new iteration in the SWE model family, improving performance on benchmarks such as SWE-Bench Pro by over 10% compared to SWE-1.5 while maintaining similar underlying capabilities. It was trained from scratch to jointly improve reasoning quality and user experience, addressing issues observed in earlier versions such as overthinking simple problems, taking too many steps, looping in repetitive reasoning, and relying excessively on terminal commands instead of specialized tools. SWE-1.6 introduces behavioral improvements such as more frequent parallel tool usage, faster context retrieval, and reduced need for user input, resulting in smoother and more efficient workflows. -
21
Composer 1.5
Cursor
Composer 1.5 is the latest agentic coding model from Cursor that balances speed and intelligence for everyday code tasks by scaling reinforcement learning approximately 20x more than its predecessor, enabling stronger performance on real-world programming challenges. It’s designed as a “thinking model” that generates internal reasoning tokens to analyze a user’s codebase and plan next steps, responding quickly to simple problems and engaging deeper reasoning on complex ones, while remaining interactive and fast for daily development workflows. To handle long-running tasks, Composer 1.5 introduces self-summarization, allowing the model to compress and carry forward context when it reaches context limits, which helps maintain accuracy across varying input lengths. Internal benchmarks show it surpasses Composer 1 in coding tasks, especially on more difficult issues, making it more capable for interactive use within Cursor’s environment. -
22
Composer 1
Cursor
Composer is Cursor’s custom-built agentic AI model optimized specifically for software engineering tasks and designed to power fast, interactive coding assistance directly within the Cursor IDE, a VS Code-derived editor enhanced with intelligent automation. It is a mixture-of-experts model trained with reinforcement learning (RL) on real-world coding problems across large codebases, so it can produce high-speed, context-aware responses, from code edits and planning to answers that understand project structure, tools, and conventions, with generation speeds roughly four times faster than similar models in benchmarks. Composer is specialized for development workflows, leveraging long-context understanding, semantic search, and limited tool access (like file editing and terminal commands) so it can solve complex engineering requests with efficient and practical outputs.Starting Price: $20 per month -
23
MiniMax M2.5
MiniMax
MiniMax M2.5 is a frontier AI model engineered for real-world productivity across coding, agentic workflows, search, and office tasks. Extensively trained with reinforcement learning in hundreds of thousands of real-world environments, it achieves state-of-the-art performance in benchmarks such as SWE-Bench Verified and BrowseComp. The model demonstrates strong architectural thinking, decomposing complex problems before generating code across more than ten programming languages. M2.5 operates at high throughput speeds of up to 100 tokens per second, enabling faster completion of multi-step tasks. It is optimized for efficient reasoning, reducing token usage and execution time compared to previous versions. With dramatically lower pricing than competing frontier models, it delivers powerful performance at minimal cost. Integrated into MiniMax Agent, M2.5 supports professional-grade office workflows, financial modeling, and autonomous task execution.Starting Price: Free -
24
DeepCoder
Agentica Project
DeepCoder is a fully open source code-reasoning and generation model released by Agentica Project in collaboration with Together AI. It is fine-tuned from DeepSeek-R1-Distilled-Qwen-14B using distributed reinforcement learning, achieving a 60.6% accuracy on LiveCodeBench (representing an 8% improvement over the base), a performance level that matches that of proprietary models such as o3-mini (2025-01-031 Low) and o1 while using only 14 billion parameters. It was trained over 2.5 weeks on 32 H100 GPUs with a curated dataset of roughly 24,000 coding problems drawn from verified sources (including TACO-Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench submissions), each problem requiring a verifiable solution and at least five unit tests to ensure reliability for RL training. To handle long-range context, DeepCoder employs techniques such as iterative context lengthening and overlong filtering.Starting Price: Free -
25
Reka Flash 3
Reka
Reka Flash 3 is a 21-billion-parameter multimodal AI model developed by Reka AI, designed to excel in general chat, coding, instruction following, and function calling. It processes and reasons with text, images, video, and audio inputs, offering a compact, general-purpose solution for various applications. Trained from scratch on diverse datasets, including publicly accessible and synthetic data, Reka Flash 3 underwent instruction tuning on curated, high-quality data to optimize performance. The final training stage involved reinforcement learning using REINFORCE Leave One-Out (RLOO) with both model-based and rule-based rewards, enhancing its reasoning capabilities. With a context length of 32,000 tokens, Reka Flash 3 performs competitively with proprietary models like OpenAI's o1-mini, making it suitable for low-latency or on-device deployments. The model's full precision requires 39GB (fp16), but it can be compressed to as small as 11GB using 4-bit quantization. -
26
GLM-5
Zhipu AI
GLM-5 is Z.ai’s latest large language model built for complex systems engineering and long-horizon agentic tasks. It scales significantly beyond GLM-4.5, increasing total parameters and training data while integrating DeepSeek Sparse Attention to reduce deployment costs without sacrificing long-context capacity. The model combines enhanced pre-training with a new asynchronous reinforcement learning infrastructure called slime, improving training efficiency and post-training refinement. GLM-5 achieves best-in-class performance among open-source models across reasoning, coding, and agent benchmarks, narrowing the gap with leading frontier models. It ranks highly on evaluations such as Vending Bench 2, demonstrating strong long-term planning and operational capabilities. The model is open-sourced under the MIT License.Starting Price: Free -
27
Qwen3-Coder
Qwen
Qwen3‑Coder is an agentic code model available in multiple sizes, led by the 480B‑parameter Mixture‑of‑Experts variant (35B active) that natively supports 256K‑token contexts (extendable to 1M) and achieves state‑of‑the‑art results comparable to Claude Sonnet 4. Pre‑training on 7.5T tokens (70 % code) and synthetic data cleaned via Qwen2.5‑Coder optimized both coding proficiency and general abilities, while post‑training employs large‑scale, execution‑driven reinforcement learning, scaling test‑case generation for diverse coding challenges, and long‑horizon RL across 20,000 parallel environments to excel on multi‑turn software‑engineering benchmarks like SWE‑Bench Verified without test‑time scaling. Alongside the model, the open source Qwen Code CLI (forked from Gemini Code) unleashes Qwen3‑Coder in agentic workflows with customized prompts, function calling protocols, and seamless integration with Node.js, OpenAI SDKs, and environment variables.Starting Price: Free -
28
OpenAI o1
OpenAI
OpenAI o1 represents a new series of AI models designed by OpenAI, focusing on enhanced reasoning capabilities. These models, including o1-preview and o1-mini, are trained using a novel reinforcement learning approach to spend more time "thinking" through problems before providing answers. This approach allows o1 to excel in complex problem-solving tasks in areas like coding, mathematics, and science, outperforming previous models like GPT-4o in certain benchmarks. The o1 series aims to tackle challenges that require deeper thought processes, marking a significant step towards AI systems that can reason more like humans, although it's still in the preview stage with ongoing improvements and evaluations. -
29
DeepSWE
Agentica Project
DeepSWE is a fully open source, state-of-the-art coding agent built on top of the Qwen3-32B foundation model and trained exclusively via reinforcement learning (RL), without supervised finetuning or distillation from proprietary models. It is developed using rLLM, Agentica’s open source RL framework for language agents. DeepSWE operates as an agent; it interacts with a simulated development environment (via the R2E-Gym environment) using a suite of tools (file editor, search, shell-execution, submit/finish), enabling it to navigate codebases, edit multiple files, compile/run tests, and iteratively produce patches or complete engineering tasks. DeepSWE exhibits emergent behaviors beyond simple code generation; when presented with bugs or feature requests, the agent reasons about edge cases, seeks existing tests in the repository, proposes patches, writes extra tests for regressions, and dynamically adjusts its “thinking” effort.Starting Price: Free -
30
Grok 4.1 Fast
xAI
Grok 4.1 Fast is an xAI model designed to deliver advanced tool-calling capabilities with a massive 2-million-token context window. It excels at complex real-world tasks such as customer support, finance, troubleshooting, and dynamic agent workflows. The model pairs seamlessly with the new Agent Tools API, which enables real-time web search, X search, file retrieval, and secure code execution. This combination gives developers the power to build fully autonomous, production-grade agents that plan, reason, and use tools effectively. Grok 4.1 Fast is trained with long-horizon reinforcement learning, ensuring stable multi-turn accuracy even across extremely long prompts. With its speed, cost-efficiency, and high benchmark scores, it sets a new standard for scalable enterprise-grade AI agents. -
31
Grok 3 Think
xAI
Grok 3 Think, the latest iteration of xAI's AI model, is designed to enhance reasoning capabilities using advanced reinforcement learning. It can think through complex problems for extended periods, from seconds to minutes, improving its answers by backtracking, exploring alternatives, and refining its approach. This model, trained on an unprecedented scale, delivers remarkable performance in tasks such as mathematics, coding, and world knowledge, showing impressive results in competitions like the American Invitational Mathematics Examination. Grok 3 Think not only provides accurate solutions but also offers transparency by allowing users to inspect the reasoning behind its decisions, setting a new standard for AI problem-solving.Starting Price: Free -
32
KAT-Coder-Pro V2
StreamLake
KAT-Coder is an agentic AI coding system designed to go beyond traditional autocomplete tools by enabling end-to-end software development workflows driven by reasoning, planning, and execution. It is positioned as a flagship coding model within the KAT ecosystem, built specifically for “agentic coding,” where the model does not just generate snippets but can diagnose issues, propose fixes, run tests, and iterate across multiple files as part of a continuous development loop. It integrates directly with developer environments through API endpoints and proxy layers compatible with tools like Claude Code, allowing seamless use inside existing IDE workflows without changing the interface developers are already familiar with. KAT-Coder is trained using a multi-stage pipeline that includes supervised fine-tuning and large-scale reinforcement learning, enabling it to understand programming context, and reason over complex tasks.Starting Price: $0.30 per month -
33
ERNIE 5.1
Baidu
ERNIE 5.1 is Baidu’s latest large language model designed to deliver advanced reasoning, agentic AI capabilities, creative writing, and world knowledge performance while operating with significantly improved efficiency. The model builds on the foundation of ERNIE 5.0 while reducing total parameters and training costs, allowing it to achieve flagship-level intelligence at a fraction of the computational expense of comparable models. ERNIE 5.1 performs strongly across international benchmarks for reasoning, search, knowledge, and agentic tasks, ranking among the top global AI models and leading among Chinese-developed models on multiple leaderboards. The platform introduces a new fully asynchronous reinforcement learning infrastructure that improves training efficiency, scalability, and stability for complex long-horizon AI tasks. ERNIE 5.1 also features advanced creative writing capabilities. -
34
Tülu 3
Ai2
Tülu 3 is an advanced instruction-following language model developed by the Allen Institute for AI (Ai2), designed to enhance capabilities in areas such as knowledge, reasoning, mathematics, coding, and safety. Built upon the Llama 3 Base, Tülu 3 employs a comprehensive four-stage post-training process: meticulous prompt curation and synthesis, supervised fine-tuning on a diverse set of prompts and completions, preference tuning using both off- and on-policy data, and a novel reinforcement learning approach to bolster specific skills with verifiable rewards. This open-source model distinguishes itself by providing full transparency, including access to training data, code, and evaluation tools, thereby closing the performance gap between open and proprietary fine-tuning methods. Evaluations indicate that Tülu 3 outperforms other open-weight models of similar size, such as Llama 3.1-Instruct and Qwen2.5-Instruct, across various benchmarks.Starting Price: Free -
35
Grok Code Fast 1
xAI
Grok Code Fast 1 is a high-speed, economical reasoning model designed specifically for agentic coding workflows. Unlike traditional models that can feel slow in tool-based loops, it delivers near-instant responses, excelling in everyday software development tasks. Built from scratch with a programming-rich corpus and refined on real-world pull requests, it supports languages like TypeScript, Python, Java, Rust, C++, and Go. Developers can use it for everything from zero-to-one project building to precise bug fixes and codebase Q&A. With optimized inference and caching techniques, it achieves impressive responsiveness and a 90%+ cache hit rate when integrated with partners like GitHub Copilot, Cursor, and Cline. Offered at just $0.20 per million input tokens and $1.50 per million output tokens, Grok Code Fast 1 strikes a strong balance between speed, performance, and affordability.Starting Price: $0.20 per million input tokens -
36
SWE-1.5
Cognition
SWE-1.5 is the latest agent-model release by Cognition, purpose-built for software engineering and characterized by a “frontier-size” architecture comprising hundreds of billions of parameters and optimized end-to-end (model, inference engine, and agent harness) for both speed and intelligence. It achieves near-state-of-the-art coding performance and sets a new benchmark in latency, delivering inference speeds up to 950 tokens/second, roughly six times faster than its predecessor Haiku 4.5 and thirteen times faster than Sonnet 4.5. The model was trained using extensive reinforcement learning in realistic coding-agent environments with multi-turn workflows, unit tests, quality rubrics, and browser-based agentic execution; it also benefits from tightly integrated software tooling and high-throughput hardware (including thousands of GB200 NVL72 chips and a custom hypervisor infrastructure). -
37
LTM-1
Magic AI
Magic’s LTM-1 enables 50x larger context windows than transformers. Magic's trained a Large Language Model (LLM) that’s able to take in the gigantic amounts of context when generating suggestions. For our coding assistant, this means Magic can now see your entire repository of code. Larger context windows can allow AI models to reference more explicit, factual information and their own action history. We hope to be able to utilize this research to improve reliability and coherence. -
38
Qwen3
Alibaba
Qwen3, the latest iteration of the Qwen family of large language models, introduces groundbreaking features that enhance performance across coding, math, and general capabilities. With models like the Qwen3-235B-A22B and Qwen3-30B-A3B, Qwen3 achieves impressive results compared to top-tier models, thanks to its hybrid thinking modes that allow users to control the balance between deep reasoning and quick responses. The platform supports 119 languages and dialects, making it an ideal choice for global applications. Its pre-training process, which uses 36 trillion tokens, enables robust performance, and advanced reinforcement learning (RL) techniques continue to refine its capabilities. Available on platforms like Hugging Face and ModelScope, Qwen3 offers a powerful tool for developers and researchers working in diverse fields.Starting Price: Free -
39
Yi-Lightning
Yi-Lightning
Yi-Lightning, developed by 01.AI under the leadership of Kai-Fu Lee, represents the latest advancement in large language models with a focus on high performance and cost-efficiency. It boasts a maximum context length of 16K tokens and is priced at $0.14 per million tokens for both input and output, making it remarkably competitive. Yi-Lightning leverages an enhanced Mixture-of-Experts (MoE) architecture, incorporating fine-grained expert segmentation and advanced routing strategies, which contribute to its efficiency in training and inference. This model has excelled in various domains, achieving top rankings in categories like Chinese, math, coding, and hard prompts on the chatbot arena, where it secured the 6th position overall and 9th in style control. Its development included comprehensive pre-training, supervised fine-tuning, and reinforcement learning from human feedback, ensuring both performance and safety, with optimizations in memory usage and inference speed. -
40
GLM-4.7
Zhipu AI
GLM-4.7 is an advanced large language model designed to significantly elevate coding, reasoning, and agentic task performance. It delivers major improvements over GLM-4.6 in multilingual coding, terminal-based tasks, and real-world software engineering benchmarks such as SWE-bench and Terminal Bench. GLM-4.7 supports “thinking before acting,” enabling more stable, accurate, and controllable behavior in complex coding and agent workflows. The model also introduces strong gains in UI and frontend generation, producing cleaner webpages, better layouts, and more polished slides. Enhanced tool-using capabilities allow GLM-4.7 to perform more effectively in web browsing, automation, and agent benchmarks. Its reasoning and mathematical performance has improved substantially, showing strong results on advanced evaluation suites. GLM-4.7 is available via Z.ai, API platforms, coding agents, and local deployment for flexible adoption.Starting Price: Free -
41
Olmo 2
Ai2
Olmo 2 is a family of fully open language models developed by the Allen Institute for AI (AI2), designed to provide researchers and developers with transparent access to training data, open-source code, reproducible training recipes, and comprehensive evaluations. These models are trained on up to 5 trillion tokens and are competitive with leading open-weight models like Llama 3.1 on English academic benchmarks. Olmo 2 emphasizes training stability, implementing techniques to prevent loss spikes during long training runs, and utilizes staged training interventions during late pretraining to address capability deficiencies. The models incorporate state-of-the-art post-training methodologies from AI2's Tülu 3, resulting in the creation of Olmo 2-Instruct models. An actionable evaluation framework, the Open Language Modeling Evaluation System (OLMES), was established to guide improvements through development stages, consisting of 20 evaluation benchmarks assessing core capabilities. -
42
Granite Code
IBM
We introduce the Granite series of decoder-only code models for code generative tasks (e.g., fixing bugs, explaining code, documenting code), trained with code written in 116 programming languages. A comprehensive evaluation of the Granite Code model family on diverse tasks demonstrates that our models consistently reach state-of-the-art performance among available open source code LLMs. The key advantages of Granite Code models include: All-rounder Code LLM: Granite Code models achieve competitive or state-of-the-art performance on different kinds of code-related tasks, including code generation, explanation, fixing, editing, translation, and more. Demonstrating their ability to solve diverse coding tasks. Trustworthy Enterprise-Grade LLM: All our models are trained on license-permissible data collected following IBM's AI Ethics principles and guided by IBM’s Corporate Legal team for trustworthy enterprise usage.Starting Price: Free -
43
Qwen3.6-27B
Alibaba
Qwen3.6-27B is a dense, open source multimodal language model in the Qwen3.6 series, designed to deliver flagship-level performance in coding, reasoning, and agent-based workflows while maintaining a relatively efficient parameter size of 27 billion. It is positioned as a high-performance general model that “punches above its weight,” achieving results competitive with or superior to significantly larger models on key benchmarks, particularly in agentic coding tasks. It supports both thinking and non-thinking modes, allowing it to dynamically balance deep reasoning with fast responses depending on the task, and integrates capabilities across text and multimodal inputs such as images and video. Built as part of the Qwen3.6 family, the model emphasizes real-world usability, stability, and developer productivity, incorporating improvements driven by community feedback and practical deployment needs.Starting Price: Free -
44
Qwen3.5
Alibaba
Qwen3.5 is a next-generation open-weight multimodal large language model designed to power native vision-language agents. The flagship release, Qwen3.5-397B-A17B, combines a hybrid linear attention architecture with sparse mixture-of-experts, activating only 17 billion parameters per forward pass out of 397 billion total to maximize efficiency. It delivers strong benchmark performance across reasoning, coding, multilingual understanding, visual reasoning, and agent-based tasks. The model expands language support from 119 to 201 languages and dialects while introducing a 1M-token context window in its hosted version, Qwen3.5-Plus. Built for multimodal tasks, it processes text, images, and video with advanced spatial reasoning and tool integration. Qwen3.5 also incorporates scalable reinforcement learning environments to improve general agent capabilities. Designed for developers and enterprises, it enables efficient, tool-augmented, multimodal AI workflows.Starting Price: Free -
45
Devstral 2
Mistral AI
Devstral 2 is a next-generation, open source agentic AI model tailored for software engineering: it doesn’t just suggest code snippets, it understands and acts across entire codebases, enabling multi-file edits, bug fixes, refactoring, dependency resolution, and context-aware code generation. The Devstral 2 family includes a large 123-billion-parameter model as well as a smaller 24-billion-parameter variant (“Devstral Small 2”), giving teams flexibility; the larger model excels in heavy-duty coding tasks requiring deep context, while the smaller one can run on more modest hardware. With a vast context window of up to 256 K tokens, Devstral 2 can reason across extensive repositories, track project history, and maintain a consistent understanding of lengthy files, an advantage for complex, real-world projects. The CLI tracks project metadata, Git statuses, and directory structure to give the model context, making “vibe-coding” more powerful.Starting Price: Free -
46
Qwen2.5-Max
Alibaba
Qwen2.5-Max is a large-scale Mixture-of-Experts (MoE) model developed by the Qwen team, pretrained on over 20 trillion tokens and further refined through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). In evaluations, it outperforms models like DeepSeek V3 in benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond, while also demonstrating competitive results in other assessments, including MMLU-Pro. Qwen2.5-Max is accessible via API through Alibaba Cloud and can be explored interactively on Qwen Chat.Starting Price: Free -
47
GLM-4.1V
Zhipu AI
GLM-4.1V is a vision-language model, providing a powerful, compact multimodal model designed for reasoning and perception across images, text, and documents. The 9-billion-parameter variant (GLM-4.1V-9B-Thinking) is built on the GLM-4-9B foundation and enhanced through a specialized training paradigm using Reinforcement Learning with Curriculum Sampling (RLCS). It supports a 64k-token context window and accepts high-resolution inputs (up to 4K images, any aspect ratio), enabling it to handle complex tasks such as optical character recognition, image captioning, chart and document parsing, video and scene understanding, GUI-agent workflows (e.g., interpreting screenshots, recognizing UI elements), and general vision-language reasoning. In benchmark evaluations at the 10 B-parameter scale, GLM-4.1V-9B-Thinking achieved top performance on 23 of 28 tasks.Starting Price: Free -
48
CodeGemma
Google
CodeGemma is a collection of powerful, lightweight models that can perform a variety of coding tasks like fill-in-the-middle code completion, code generation, natural language understanding, mathematical reasoning, and instruction following. CodeGemma has 3 model variants, a 7B pre-trained variant that specializes in code completion and generation from code prefixes and/or suffixes, a 7B instruction-tuned variant for natural language-to-code chat and instruction following; and a state-of-the-art 2B pre-trained variant that provides up to 2x faster code completion. Complete lines, and functions, and even generate entire blocks of code, whether you're working locally or using Google Cloud resources. Trained on 500 billion tokens of primarily English language data from web documents, mathematics, and code, CodeGemma models generate code that's not only more syntactically correct but also semantically meaningful, reducing errors and debugging time. -
49
DeepSeek-V4-Pro
DeepSeek
DeepSeek-V4-Pro is a large-scale Mixture-of-Experts (MoE) language model designed for advanced reasoning, coding, and long-context understanding. It features 1.6 trillion total parameters with 49 billion activated parameters, enabling high performance while maintaining efficiency. The model supports an exceptionally large context window of up to one million tokens, allowing it to process extensive documents and workflows. It uses a hybrid attention architecture to optimize long-context performance and reduce computational cost. DeepSeek-V4-Pro is trained on over 32 trillion tokens, improving its knowledge and reasoning capabilities. It also includes advanced optimization techniques for stability and faster convergence during training. The model supports multiple reasoning modes, allowing users to balance speed and accuracy based on their needs. Overall, it provides a powerful open-source solution for complex AI tasks and large-scale applications.Starting Price: Free -
50
GPT-4o mini
OpenAI
A small model with superior textual intelligence and multimodal reasoning. GPT-4o mini enables a broad range of tasks with its low cost and latency, such as applications that chain or parallelize multiple model calls (e.g., calling multiple APIs), pass a large volume of context to the model (e.g., full code base or conversation history), or interact with customers through fast, real-time text responses (e.g., customer support chatbots). Today, GPT-4o mini supports text and vision in the API, with support for text, image, video and audio inputs and outputs coming in the future. The model has a context window of 128K tokens, supports up to 16K output tokens per request, and has knowledge up to October 2023. Thanks to the improved tokenizer shared with GPT-4o, handling non-English text is now even more cost effective.