Alternatives to Ornith-1.0

Compare Ornith-1.0 alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Ornith-1.0 in 2026. Compare features, ratings, user reviews, pricing, and more from Ornith-1.0 competitors and alternatives in order to make an informed decision for your business.

  • 1
    Qwen3.5

    Qwen3.5

    Alibaba

    Qwen3.5 is a next-generation open-weight multimodal large language model designed to power native vision-language agents. The flagship release, Qwen3.5-397B-A17B, combines a hybrid linear attention architecture with sparse mixture-of-experts, activating only 17 billion parameters per forward pass out of 397 billion total to maximize efficiency. It delivers strong benchmark performance across reasoning, coding, multilingual understanding, visual reasoning, and agent-based tasks. The model expands language support from 119 to 201 languages and dialects while introducing a 1M-token context window in its hosted version, Qwen3.5-Plus. Built for multimodal tasks, it processes text, images, and video with advanced spatial reasoning and tool integration. Qwen3.5 also incorporates scalable reinforcement learning environments to improve general agent capabilities. Designed for developers and enterprises, it enables efficient, tool-augmented, multimodal AI workflows.
  • 2
    Qwen3.6

    Qwen3.6

    Alibaba

    Qwen3.6 is a large language model developed by Alibaba as part of its Qwen AI model family, designed for real-world applications and advanced reasoning tasks. It focuses on improving stability, usability, and performance compared to earlier versions. The model supports multimodal capabilities, allowing it to process and reason across text, images, and other data types. Qwen3.6 is particularly strong in coding and developer workflows, offering improved accuracy for complex programming tasks. It uses a mixture-of-experts architecture, enabling efficient performance while maintaining large-scale model capabilities. The model is designed to be deployable in production environments, including enterprise and cloud-based systems. It can be integrated into applications or run locally using open-weight variants. Overall, Qwen3.6 delivers a powerful, efficient, and versatile AI solution for modern use cases.
  • 3
    Qwen3.7-Max
    Qwen3.7-Max is Qwen’s latest proprietary model designed for the agent era, built to be a versatile agent foundation that is equally capable of writing and debugging code, automating office workflows, and sustaining autonomous browser sessions over long horizons. It reaches frontier-level coding performance, with stronger results across software engineering, terminal tasks, GUI grounding, web browsing, and agentic tool use. Qwen3.7-Max is designed to reduce the gap between model intelligence and real agent execution by supporting planning, long-context reasoning, reliable function calling, and multi-step task completion across complex workflows. It also strengthens multimodal and document-oriented work through Qwen Studio, which supports chatbot interaction, image and video understanding, image generation, document processing, presentation generation, coding assistance, deep research, and web development.
  • 4
    Claude Fable 5
    Claude Fable 5 is an advanced AI model from Anthropic designed to assist with software engineering, research, knowledge work, vision tasks, and complex reasoning. Built on the Mythos-class architecture, it delivers significantly improved performance across coding, analysis, and long-context workflows. The model can handle extended autonomous tasks while maintaining focus and consistency over large amounts of information. Claude Fable 5 integrates advanced reasoning, multimodal understanding, and memory capabilities to support professional and enterprise use cases. Anthropic has implemented specialized safeguards that automatically route certain high-risk cybersecurity, biology, chemistry, and model distillation requests to a different model. Claude Fable 5 helps organizations and professionals accelerate complex work while maintaining strong safety and governance controls.
    Starting Price: $10 per 1 million (input)
  • 5
    Claude Opus 4.7
    Claude Opus 4.7 is the latest Anthropic AI model release designed to significantly improve performance in advanced software engineering and complex problem-solving tasks. It builds upon the previous Opus 4.6 model by delivering stronger results on difficult coding challenges and long-running workflows. The model is known for its ability to follow instructions precisely and verify its own outputs for greater reliability. It also introduces enhanced multimodal capabilities, particularly in processing high-resolution images with improved accuracy. Opus 4.7 supports more detailed visual tasks such as analyzing dense screenshots and extracting data from complex diagrams. In professional settings, it produces higher-quality outputs including documents, presentations, and user interfaces. The model includes updated safety features that detect and block high-risk cybersecurity-related requests.
    Starting Price: $5 per million tokens (input)
  • 6
    Claude Opus 4.8
    Claude Opus 4.8 is a powerful AI model from Anthropic designed to deliver stronger coding, reasoning, agentic workflows, and advanced collaboration capabilities for developers, enterprises, and AI-powered productivity tasks. The model builds on Claude Opus 4.7 with improvements across coding benchmarks, practical knowledge work, alignment, and reliability while maintaining the same pricing structure. Claude Opus 4.8 introduces enhanced honesty and reasoning behavior, making it less likely to generate unsupported claims or overlook flaws during complex tasks such as software development and agent execution. The release also includes new features such as effort control settings, fast mode for lower-cost high-speed processing, and dynamic workflows in Claude Code that allow the system to coordinate hundreds of parallel subagents for large-scale tasks.
    Starting Price: $5 per 1M (input)
  • 7
    Composer 2.5
    Composer 2.5 is the latest AI coding model released by Cursor, offering major improvements in intelligence, collaboration, and long-task performance compared to Composer 2. The model is designed to follow complex instructions more accurately while providing a smoother and more natural user experience during coding sessions. Cursor enhanced Composer 2.5 through larger-scale training, more advanced reinforcement learning environments, and improved behavioral tuning focused on communication and effort calibration. The model uses targeted reinforcement learning with textual feedback to correct specific mistakes during training, helping it avoid issues like invalid tool calls or poor coding behavior. Composer 2.5 was also trained using significantly more synthetic coding tasks, enabling it to handle increasingly difficult programming challenges and real-world development scenarios.
    Starting Price: $0.50/M input
  • 8
    DeepSeek-V4-Pro
    DeepSeek-V4-Pro is a large-scale Mixture-of-Experts (MoE) language model designed for advanced reasoning, coding, and long-context understanding. It features 1.6 trillion total parameters with 49 billion activated parameters, enabling high performance while maintaining efficiency. The model supports an exceptionally large context window of up to one million tokens, allowing it to process extensive documents and workflows. It uses a hybrid attention architecture to optimize long-context performance and reduce computational cost. DeepSeek-V4-Pro is trained on over 32 trillion tokens, improving its knowledge and reasoning capabilities. It also includes advanced optimization techniques for stability and faster convergence during training. The model supports multiple reasoning modes, allowing users to balance speed and accuracy based on their needs. Overall, it provides a powerful open-source solution for complex AI tasks and large-scale applications.
  • 9
    GPT-5.5

    GPT-5.5

    OpenAI

    GPT-5.5 is an advanced AI model designed to handle complex, real-world tasks with greater autonomy and efficiency. It quickly understands user intent and can execute multi-step workflows such as coding, research, data analysis, and document creation with minimal guidance. Instead of requiring step-by-step instructions, GPT-5.5 plans tasks, uses tools, evaluates outputs, and continues working until completion. It excels in knowledge work, software development, and analytical problem-solving, helping users move from idea to execution faster. The model is built to operate across tools and environments, making it highly effective for modern digital workflows. With strong reasoning and persistence, GPT-5.5 enables individuals and teams to complete demanding work more efficiently and accurately.
    Starting Price: $5 per 1M tokens (input)
  • 10
    GPT-5.6 Terra
    GPT-5.6 Terra is a balanced model in the GPT-5.6 series designed for everyday work, coding, agentic workflows, cybersecurity support, biology analysis, and enterprise automation. It sits between GPT-5.6 Sol, the flagship model, and GPT-5.6 Luna, the faster and lower-cost option. Terra is positioned to deliver competitive performance to GPT-5.5 while being significantly cheaper to run. The model supports improved reasoning, coding, tool coordination, long-horizon workflows, and legitimate defensive security work. It is part of a model family built with layered safeguards, including trained refusals, real-time misuse classifiers, account-level review, differentiated access, monitoring, and continued red-team testing. GPT-5.6 Terra helps developers, enterprises, and technical teams access strong AI capabilities with a more practical balance of intelligence, speed, and cost.
    Starting Price: $2.50 per 1M tokens (input)
  • 11
    Kimi K2.7 Code

    Kimi K2.7 Code

    Moonshot AI

    Kimi K2.7 Code is an open-source, coding-focused agentic AI model developed by Moonshot AI for long-horizon software engineering tasks. It is designed to improve coding performance, agent workflows, and real-world development assistance compared with earlier Kimi K2 versions. The model supports a 256K context window, making it useful for working with large codebases, long technical documents, and complex multi-step programming tasks. Kimi K2.7 Code is available through Kimi Code and API access, with OpenAI- and Anthropic-compatible options for easier integration into developer workflows. It is also listed on Hugging Face and supports deployment through inference engines such as vLLM, SGLang, and KTransformers. With improved agentic capabilities, long-context support, and reduced thinking-token usage compared with K2.6, Kimi K2.7 Code gives developers a flexible open-source option for AI-assisted coding.
  • 12
    Hy3

    Hy3

    Tencent

    Hy3 preview is Tencent Hy’s most intelligent model in the Hy series to date, built as a 295B-parameter Mixture-of-Experts model with 21B activated parameters, 3.8B MTP layer parameters, and support for up to a 256K token context window. As the first model trained on Tencent Hy’s rebuilt infrastructure, Hy3 preview is designed to improve real-world usability across complex reasoning, instruction following, context learning, coding, agent capabilities, and overall inference performance. It integrates both fast and slow thinking capabilities, allowing direct responses for simpler tasks and deeper reasoning for complex math, coding, and reasoning work. The model is built around well-rounded capabilities across long-context understanding, instruction following, tool use, and agent workflows, with evaluation focused not only on standard benchmarks but also on authentic business and development scenarios.
  • 13
    Gemma 4

    Gemma 4

    Google

    Gemma 4 is an AI model introduced by Google and built on the Gemini architecture to deliver improved performance and flexibility. The model is designed to run efficiently on a single GPU or TPU, making it more accessible to developers and researchers. Gemma 4 enhances capabilities in natural language understanding and text generation, supporting a wide range of AI-driven applications. Its architecture allows it to handle complex tasks while maintaining efficient resource usage. Developers can use the model to build applications that rely on advanced language processing and automation. The design emphasizes scalability so that it can support both smaller projects and larger AI systems. By combining efficiency with powerful language capabilities, Gemma 4 helps advance the development of modern AI solutions.
  • 14
    GLM-5.2

    GLM-5.2

    Zhipu AI

    GLM-5.2 is an advanced AI foundation model designed to support complex reasoning, coding, and long-range agentic tasks. It helps developers, teams, and organizations build intelligent systems that can understand instructions, solve technical problems, and assist with demanding workflows. The model is especially useful for software engineering, automation, research, and productivity-focused applications. GLM-5.2 is built to handle large amounts of context, making it suitable for projects that require deeper understanding across extended conversations, documents, or codebases. Its mixture-of-experts design helps balance strong performance with more efficient model operation. GLM-5.2 gives businesses and developers a powerful AI tool for creating smarter applications, improving technical workflows, and supporting advanced digital experiences.
  • 15
    Grok 4.3
    Grok 4.3 is the latest iteration of xAI’s Grok model, designed to deliver improved reasoning, real-time information access, and advanced task automation. It builds on earlier Grok 4 models by enhancing performance in complex problem-solving, coding, and analytical workflows. The model is integrated with real-time web and X (formerly Twitter) data, allowing it to provide up-to-date insights and answers. Grok 4.3 supports multimodal capabilities, enabling it to work with text, images, and other data types. It operates within the SuperGrok Heavy tier, offering access to more powerful compute and advanced features. The model is designed to handle long-context tasks and multi-step reasoning with greater accuracy. It also supports tool use and integrations, enabling it to interact with external systems and automate workflows. Overall, Grok 4.3 is positioned as a high-performance AI assistant for real-time, data-driven tasks.
  • 16
    Grok Build 0.1
    Grok Build 0.1 is a specialized AI coding model from xAI designed for agentic software engineering workflows and multi-step development tasks. The model is optimized to help coding agents perform actions such as planning, debugging, implementing changes, and iterating on code rather than simply generating one-time code responses. It supports both text and image inputs while producing text-based outputs, making it useful for analyzing code, screenshots, and technical documentation. Grok Build 0.1 includes support for tool use, structured outputs, function calling, and large-context reasoning capabilities. With a context window of up to 256,000 tokens, the model can process large codebases and complex projects within a single workflow. The platform is built for developers and engineering teams seeking faster and more capable AI-assisted software development.
    Starting Price: $1 per 1M tokens (input)
  • 17
    Ling 2.6

    Ling 2.6

    Ant Group

    Ling 2.6 is a general-purpose large language model series independently developed and open-sourced by Ant Group, built on a Mixture of Experts architecture and designed for inference efficiency, long context modeling, training technology, and AI Agent collaborative reasoning. Ling’s MoE architecture routes each token to activate only the most relevant expert subnetworks, compressing actual computation to a minimal fraction while maintaining large-scale model capacity. The Ling 2.6 series further advances long-sequence modeling, with Ling-2.6-1T supporting up to a 1M native context window and the official API exposing a 256K context window, while Ling-2.6-flash provides a native 256K context window capable of processing approximately 200,000 characters of long-form input. The models are designed for reliable long-range information retrieval, with no noticeable degradation whether information appears at the beginning, middle, or end of the context.
    Starting Price: $0.0028 per 1M tokens
  • 18
    MiniMax M3

    MiniMax M3

    MiniMax

    MiniMax M3 is an open-weight multimodal AI model designed for coding, agentic workflows, long-context reasoning, and complex automation tasks. The model combines frontier-level coding performance, native multimodal understanding, and a context window of up to 1 million tokens. MiniMax M3 uses MiniMax Sparse Attention to improve long-context efficiency while reducing compute requirements for large-scale inputs. It supports text, image, and video understanding, making it useful for workflows that combine code, documents, visual references, and tool-driven tasks. The model is built for repository-scale reasoning, software engineering, autonomous task execution, tool calling, and multi-step agent workflows. MiniMax M3 helps developers, AI teams, and enterprises build capable agents that can reason across large contexts and work with multimodal information.
  • 19
    Qwen2

    Qwen2

    Alibaba

    Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud. Qwen2 is a series of large language models developed by the Qwen team at Alibaba Cloud. It includes both base language models and instruction-tuned models, ranging from 0.5 billion to 72 billion parameters, and features both dense models and a Mixture-of-Experts model. The Qwen2 series is designed to surpass most previous open-weight models, including its predecessor Qwen1.5, and to compete with proprietary models across a broad spectrum of benchmarks in language understanding, generation, multilingual capabilities, coding, mathematics, and reasoning.
  • 20
    Qwen3.6-27B
    Qwen3.6-27B is a dense, open source multimodal language model in the Qwen3.6 series, designed to deliver flagship-level performance in coding, reasoning, and agent-based workflows while maintaining a relatively efficient parameter size of 27 billion. It is positioned as a high-performance general model that “punches above its weight,” achieving results competitive with or superior to significantly larger models on key benchmarks, particularly in agentic coding tasks. It supports both thinking and non-thinking modes, allowing it to dynamically balance deep reasoning with fast responses depending on the task, and integrates capabilities across text and multimodal inputs such as images and video. Built as part of the Qwen3.6 family, the model emphasizes real-world usability, stability, and developer productivity, incorporating improvements driven by community feedback and practical deployment needs.
  • 21
    CodeGemma
    CodeGemma is a collection of powerful, lightweight models that can perform a variety of coding tasks like fill-in-the-middle code completion, code generation, natural language understanding, mathematical reasoning, and instruction following. CodeGemma has 3 model variants, a 7B pre-trained variant that specializes in code completion and generation from code prefixes and/or suffixes, a 7B instruction-tuned variant for natural language-to-code chat and instruction following; and a state-of-the-art 2B pre-trained variant that provides up to 2x faster code completion. Complete lines, and functions, and even generate entire blocks of code, whether you're working locally or using Google Cloud resources. Trained on 500 billion tokens of primarily English language data from web documents, mathematics, and code, CodeGemma models generate code that's not only more syntactically correct but also semantically meaningful, reducing errors and debugging time.
  • 22
    Kimi K2

    Kimi K2

    Moonshot AI

    Kimi K2 is a state-of-the-art open source large language model series built on a mixture-of-experts (MoE) architecture, featuring 1 trillion total parameters and 32 billion activated parameters for task-specific efficiency. Trained with the Muon optimizer on over 15.5 trillion tokens and stabilized by MuonClip’s attention-logit clamping, it delivers exceptional performance in frontier knowledge, reasoning, mathematics, coding, and general agentic workflows. Moonshot AI provides two variants, Kimi-K2-Base for research-level fine-tuning and Kimi-K2-Instruct pre-trained for immediate chat and tool-driven interactions, enabling both custom development and drop-in agentic capabilities. Benchmarks show it outperforms leading open source peers and rivals top proprietary models in coding tasks and complex task breakdowns, while its 128 K-token context length, tool-calling API compatibility, and support for industry-standard inference engines.
  • 23
    Qwen3.6-Max-Preview
    Qwen3.6-Max-Preview is a next-generation frontier language model designed to push the limits of intelligence, instruction following, and real-world agent capabilities within the Qwen ecosystem. Building on the Qwen3 series, this preview release introduces stronger world knowledge, sharper instruction alignment, and significant improvements in agentic coding performance, enabling the model to better handle complex, multi-step tasks and software engineering workflows. It is engineered for advanced reasoning and execution scenarios, where the model not only generates responses but also interacts with tools, processes long contexts, and supports structured problem-solving across domains such as coding, research, and enterprise workflows. The architecture continues the Qwen focus on large-scale, high-efficiency models capable of handling extensive context windows and delivering consistent performance across multilingual and knowledge-intensive tasks.
  • 24
    CodeQwen

    CodeQwen

    Alibaba

    CodeQwen is the code version of Qwen, the large language model series developed by the Qwen team, Alibaba Cloud. It is a transformer-based decoder-only language model pre-trained on a large amount of data of codes. Strong code generation capabilities and competitive performance across a series of benchmarks. Supporting long context understanding and generation with the context length of 64K tokens. CodeQwen supports 92 coding languages and provides excellent performance in text-to-SQL, bug fixes, etc. You can just write several lines of code with transformers to chat with CodeQwen. Essentially, we build the tokenizer and the model from pre-trained methods, and we use the generate method to perform chatting with the help of the chat template provided by the tokenizer. We apply the ChatML template for chat models following our previous practice. The model completes the code snippets according to the given prompts, without any additional formatting.
  • 25
    Qwen3-Coder
    Qwen3‑Coder is an agentic code model available in multiple sizes, led by the 480B‑parameter Mixture‑of‑Experts variant (35B active) that natively supports 256K‑token contexts (extendable to 1M) and achieves state‑of‑the‑art results comparable to Claude Sonnet 4. Pre‑training on 7.5T tokens (70 % code) and synthetic data cleaned via Qwen2.5‑Coder optimized both coding proficiency and general abilities, while post‑training employs large‑scale, execution‑driven reinforcement learning, scaling test‑case generation for diverse coding challenges, and long‑horizon RL across 20,000 parallel environments to excel on multi‑turn software‑engineering benchmarks like SWE‑Bench Verified without test‑time scaling. Alongside the model, the open source Qwen Code CLI (forked from Gemini Code) unleashes Qwen3‑Coder in agentic workflows with customized prompts, function calling protocols, and seamless integration with Node.js, OpenAI SDKs, and environment variables.
  • 26
    Mixtral 8x22B

    Mixtral 8x22B

    Mistral AI

    Mixtral 8x22B is our latest open model. It sets a new standard for performance and efficiency within the AI community. It is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. It is fluent in English, French, Italian, German, and Spanish. It has strong mathematics and coding capabilities. It is natively capable of function calling; along with the constrained output mode implemented on la Plateforme, this enables application development and tech stack modernization at scale. Its 64K tokens context window allows precise information recall from large documents. We build models that offer unmatched cost efficiency for their respective sizes, delivering the best performance-to-cost ratio within models provided by the community. Mixtral 8x22B is a natural continuation of our open model family. Its sparse activation patterns make it faster than any dense 70B model.
  • 27
    SubQ 1.1 Small

    SubQ 1.1 Small

    Subquadratic

    SubQ 1.1 Small is a long-context AI model from Subquadratic designed to reason over complete enterprise artifacts such as codebases, document collections, contracts, and financial filings. It uses Subquadratic Sparse Attention, or SSA, to reduce the high compute costs normally associated with processing very large context windows. The model delivers near-perfect long-context retrieval across 1M, 2M, 6M, and 12M token tests while using far less attention compute than dense attention. SubQ 1.1 Small also maintains strong general reasoning, coding, knowledge, and agentic task performance across multiple benchmarks. Its capabilities make it useful for financial analysis, legal review, contract work, software engineering, due diligence, and other workflows where information is spread across large artifacts. SubQ is built for organizations that want to move beyond fragmented retrieval pipelines and enable direct reasoning over massive bodies of information.
  • 28
    Ling 2.6 Flash
    Ling 2.6 Flash is the latest cost-effective model in the Ling series, built on a Mixture of Experts architecture with 104B total parameters and 7.4B activated parameters. It is designed to achieve an optimal balance between inference performance and compute cost, making it suitable for general-purpose scenarios where strong reasoning capability, high throughput, and efficient deployment matter. Ling’s MoE architecture routes each token to activate only the most relevant expert subnetworks, compressing actual computation to a minimal fraction while maintaining large-scale model capacity. Ling 2.6 Flash provides a native 256K context window and can process approximately 200,000 characters of long-form input, with reliable long-range information retrieval whether key information appears at the beginning, middle, or end of the context. Its aggregate benchmark performance is comparable to or exceeds 40B-class Dense models.
    Starting Price: $0.00037 per 1M tokens
  • 29
    Qwen3

    Qwen3

    Alibaba

    Qwen3, the latest iteration of the Qwen family of large language models, introduces groundbreaking features that enhance performance across coding, math, and general capabilities. With models like the Qwen3-235B-A22B and Qwen3-30B-A3B, Qwen3 achieves impressive results compared to top-tier models, thanks to its hybrid thinking modes that allow users to control the balance between deep reasoning and quick responses. The platform supports 119 languages and dialects, making it an ideal choice for global applications. Its pre-training process, which uses 36 trillion tokens, enables robust performance, and advanced reinforcement learning (RL) techniques continue to refine its capabilities. Available on platforms like Hugging Face and ModelScope, Qwen3 offers a powerful tool for developers and researchers working in diverse fields.
  • 30
    QwQ-Max-Preview
    QwQ-Max-Preview is an advanced AI model built on the Qwen2.5-Max architecture, designed to excel in deep reasoning, mathematical problem-solving, coding, and agent-related tasks. This preview version offers a sneak peek at its capabilities, which include improved performance in a wide range of general-domain tasks and the ability to handle complex workflows. QwQ-Max-Preview is slated for an official open-source release under the Apache 2.0 license, offering further advancements and refinements in its full version. It also paves the way for a more accessible AI ecosystem, with the upcoming launch of the Qwen Chat app and smaller variants of the model like QwQ-32B, aimed at developers seeking local deployment options.
  • 31
    Qwen2.5-Max
    Qwen2.5-Max is a large-scale Mixture-of-Experts (MoE) model developed by the Qwen team, pretrained on over 20 trillion tokens and further refined through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). In evaluations, it outperforms models like DeepSeek V3 in benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond, while also demonstrating competitive results in other assessments, including MMLU-Pro. Qwen2.5-Max is accessible via API through Alibaba Cloud and can be explored interactively on Qwen Chat.
  • 32
    Step 3.5 Flash
    Step 3.5 Flash is an advanced open source foundation language model engineered for frontier reasoning and agentic capabilities with exceptional efficiency, built on a sparse Mixture of Experts (MoE) architecture that selectively activates only about 11 billion of its ~196 billion parameters per token to deliver high-density intelligence and real-time responsiveness. Its 3-way Multi-Token Prediction (MTP-3) enables generation throughput in the hundreds of tokens per second for complex multi-step reasoning chains and task execution, and it supports efficient long contexts with a hybrid sliding window attention approach that reduces computational overhead across large datasets or codebases. It demonstrates robust performance on benchmarks for reasoning, coding, and agentic tasks, rivaling or exceeding many larger proprietary models, and includes a scalable reinforcement learning framework for consistent self-improvement.
  • 33
    GLM-5

    GLM-5

    Zhipu AI

    GLM-5 is Z.ai’s latest large language model built for complex systems engineering and long-horizon agentic tasks. It scales significantly beyond GLM-4.5, increasing total parameters and training data while integrating DeepSeek Sparse Attention to reduce deployment costs without sacrificing long-context capacity. The model combines enhanced pre-training with a new asynchronous reinforcement learning infrastructure called slime, improving training efficiency and post-training refinement. GLM-5 achieves best-in-class performance among open-source models across reasoning, coding, and agent benchmarks, narrowing the gap with leading frontier models. It ranks highly on evaluations such as Vending Bench 2, demonstrating strong long-term planning and operational capabilities. The model is open-sourced under the MIT License.
  • 34
    Qwen3.7-Plus
    Qwen3.7-Plus is a multimodal agent model that unifies vision and language into a single, versatile agent foundation. Building on Qwen3.7’s agentic intelligence, it extends Qwen’s capabilities into visual understanding, visual reasoning, grounded interaction, and multimodal tool use, enabling agents to perceive, analyze, and act across text, images, documents, screens, and complex real-world contexts. It is designed for tasks that require more than static question answering, including visual search, document comprehension, chart and table analysis, screen understanding, GUI interaction, image-grounded reasoning, and agent workflows that combine perception with planning and execution. Qwen3.7-Plus strengthens the connection between language reasoning and visual evidence, allowing users to ask questions about images, interpret dense multimodal inputs, extract structured information, and generate responses that reflect both context and visual details.
  • 35
    ByteDance Seed
    Seed Diffusion Preview is a large-scale, code-focused language model that uses discrete-state diffusion to generate code non-sequentially, achieving dramatically faster inference without sacrificing quality by decoupling generation from the token-by-token bottleneck of autoregressive models. It combines a two-stage curriculum, mask-based corruption followed by edit-based augmentation, to robustly train a standard dense Transformer, striking a balance between speed and accuracy and avoiding shortcuts like carry-over unmasking to preserve principled density estimation. The model delivers an inference speed of 2,146 tokens/sec on H20 GPUs, outperforming contemporary diffusion baselines while matching or exceeding their accuracy on standard code benchmarks, including editing tasks, thereby establishing a new speed-quality Pareto frontier and demonstrating discrete diffusion’s practical viability for real-world code generation.
  • 36
    Molmo 2
    Molmo 2 is a new suite of state-of-the-art open vision-language models with fully open weights, training data, and training code that extends the original Molmo family’s grounded image understanding to video and multi-image inputs, enabling advanced video understanding, pointing, tracking, dense captioning, and question-answering capabilities; all with strong spatial and temporal reasoning across frames. Molmo 2 includes three variants: an 8 billion-parameter model optimized for overall video grounding and QA, a 4 billion-parameter version designed for efficiency, and a 7 billion-parameter Olmo-backed model offering a fully open end-to-end architecture including the underlying language model. These models outperform earlier Molmo versions on core benchmarks and set new open-model high-water marks for image and video understanding tasks, often competing with substantially larger proprietary systems while training on a fraction of the data used by comparable closed models.
  • 37
    AutoScientist

    AutoScientist

    AutoScientist

    AutoScientist is a system that self-improves and automates the full research loop behind model training and alignment, making it possible for more teams to shape and refine the AI they depend on. Model training and reinforcement learning are among the most powerful ways to shape a model, but they are also among the hardest to get right outside a frontier lab because attempts can fail through catastrophic forgetting, overfitting on small or low-quality datasets, and conflicting training signals. AutoScientist co-optimizes data and model training recipes automatically, self-improving across both until quality converges on the user’s objective. Where Adaptive Data shapes the inputs, AutoScientist shapes the model, running the full research loop end-to-end so users walk away with models adapted to their goal. The loop runs itself: data and recipes are co-optimized in lockstep, iterating until the model converges on the behavior described.
  • 38
    Amazon Nova 2 Pro
    Amazon Nova 2 Pro is Amazon’s most advanced reasoning model, designed to handle highly complex, multimodal tasks across text, images, video, and speech with exceptional accuracy. It excels in deep problem-solving scenarios such as agentic coding, multi-document analysis, long-range planning, and advanced math. With benchmark performance equal or superior to leading models like Claude Sonnet 4.5, GPT-5.1, and Gemini Pro, Nova 2 Pro delivers top-tier intelligence across a wide range of enterprise workloads. The model includes built-in web grounding and code execution, ensuring responses remain factual, current, and contextually accurate. Nova 2 Pro can also serve as a “teacher model,” enabling knowledge distillation into smaller, purpose-built variants for specific domains. It is engineered for organizations that require precision, reliability, and frontier-level reasoning in mission-critical AI applications.
  • 39
    MiniMax M2.5
    MiniMax M2.5 is a frontier AI model engineered for real-world productivity across coding, agentic workflows, search, and office tasks. Extensively trained with reinforcement learning in hundreds of thousands of real-world environments, it achieves state-of-the-art performance in benchmarks such as SWE-Bench Verified and BrowseComp. The model demonstrates strong architectural thinking, decomposing complex problems before generating code across more than ten programming languages. M2.5 operates at high throughput speeds of up to 100 tokens per second, enabling faster completion of multi-step tasks. It is optimized for efficient reasoning, reducing token usage and execution time compared to previous versions. With dramatically lower pricing than competing frontier models, it delivers powerful performance at minimal cost. Integrated into MiniMax Agent, M2.5 supports professional-grade office workflows, financial modeling, and autonomous task execution.
  • 40
    Qwen3.6-35B-A3B
    Qwen3.5-35B-A3B is part of the Qwen3.5 “Medium” model series, designed as a highly efficient, multimodal foundation model that balances strong reasoning ability with practical deployment requirements. It uses a Mixture-of-Experts (MoE) architecture with 35 billion total parameters but activates only about 3 billion per token, allowing it to deliver performance comparable to much larger models while significantly reducing computational cost. The model integrates a hybrid attention mechanism that combines linear attention with standard attention layers, enabling efficient long-context processing and improved scalability for complex tasks. As a native vision-language model, it can process both text and visual inputs, supporting use cases such as multimodal reasoning, coding, and agent-based workflows. It is designed to function as a general-purpose “AI agent,” capable of planning, tool use, and structured problem solving rather than just conversational responses.
  • 41
    Laguna XS.2

    Laguna XS.2

    Poolside

    Laguna XS.2 is Poolside’s open-weight agentic coding model, built as the lightest and fastest model in the Laguna family. It is a 33B total-parameter Mixture of Experts model with 3B activated parameters, trained completely in-house on 30T tokens. As Poolside’s newest generation model open to the community, Laguna XS.2 is a second-generation architecture and the company’s first open-weight model, built on the lessons learned from training Laguna M.1 across synthetic data and reinforcement learning. The model is designed for agentic coding workflows, where it can code, act, iterate quickly, and perform best inside Poolside’s coding agent. Laguna XS.2 is positioned as a strong model for rapid agentic iteration, especially for developers and teams that need a compact, efficient coding model rather than a heavier frontier system. It is released under an Apache 2.0 license, allowing the community to evaluate, fine-tune, quantize, serve, and build on the weights.
  • 42
    Qwen3.6-Plus
    Qwen3.6-Plus is an advanced AI model developed by Alibaba Cloud, designed to power real-world intelligent agents and complex workflows. It introduces significant improvements in agentic coding, enabling developers to handle everything from frontend development to large-scale codebase management. The model features a massive 1 million token context window, allowing it to process and reason over long and complex inputs. It integrates reasoning, memory, and execution capabilities to deliver highly accurate and reliable results. Qwen3.6-Plus also enhances multimodal capabilities, enabling it to understand and analyze images, videos, and documents. The platform is optimized for real-world applications, including automation, planning, and tool-based workflows. Overall, it provides a powerful foundation for building next-generation AI agents and intelligent systems.
  • 43
    Qwen3-Coder-Next
    Qwen3-Coder-Next is an open-weight language model specifically designed for coding agents and local development that delivers advanced coding reasoning, complex tool usage, and robust performance on long-horizon programming tasks with high efficiency, using a mixture-of-experts architecture that balances powerful capabilities with resource-friendly operation. It provides enhanced agentic coding abilities that help software developers, AI system builders, and automated coding workflows generate, debug, and reason about code with deep contextual understanding while recovering from execution errors, making it well-suited for autonomous coding agents and development-oriented applications. By achieving strong performance comparable to much larger parameter models while requiring fewer active parameters, Qwen3-Coder-Next enables cost-effective deployment for dynamic and complex programming workloads in research and production environments.
  • 44
    Devstral Small 2
    Devstral Small 2 is the compact, 24 billion-parameter variant of the new coding-focused model family from Mistral AI, released under the permissive Apache 2.0 license to enable both local deployment and API use. Alongside its larger sibling (Devstral 2), this model brings “agentic coding” capabilities to environments with modest compute: it supports a large 256K-token context window, enabling it to understand and make changes across entire codebases. On the standard code-generation benchmark (SWE-Bench Verified), Devstral Small 2 scores around 68.0%, placing it among open-weight models many times its size. Because of its reduced size and efficient design, Devstral Small 2 can run on a single GPU or even CPU-only setups, making it practical for developers, small teams, or hobbyists without access to data-center hardware. Despite its compact footprint, Devstral Small 2 retains key capabilities of larger models; it can reason across multiple files and track dependencies.
  • 45
    Qwen-7B

    Qwen-7B

    Alibaba

    Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc. Additionally, based on the pretrained Qwen-7B, we release Qwen-7B-Chat, a large-model-based AI assistant, which is trained with alignment techniques. The features of the Qwen-7B series include: Trained with high-quality pretraining data. We have pretrained Qwen-7B on a self-constructed large-scale high-quality dataset of over 2.2 trillion tokens. The dataset includes plain texts and codes, and it covers a wide range of domains, including general domain data and professional domain data. Strong performance. In comparison with the models of the similar model size, we outperform the competitors on a series of benchmark datasets, which evaluates natural language understanding, mathematics, coding, etc. And more.
  • 46
    North Mini Code
    North Mini Code is Cohere’s first agentic coding model for developers and the inaugural member of its next generation of powerful models. Small, efficient, and open-source, it is built for the sovereign developer ecosystem and designed to deliver strong software development performance without requiring extensive hardware. North Mini Code is a mixture-of-experts model with 30B total parameters and 3B active parameters, giving developers access to agentic coding capabilities in a compact and efficient form. The model is optimized for code generation, agentic software engineering, and terminal tasks, with a 256K total context length and up to 64K maximum generation. It is built for real-world developer workflows, including understanding and orchestrating sub-agents, mapping system architecture, running code reviews, and supporting coding agents that need to reason through complex software tasks.
  • 47
    Qwen

    Qwen

    Alibaba

    Qwen is a powerful, free AI assistant built on the advanced Qwen model series, designed to help anyone with creativity, research, problem-solving, and everyday tasks. While Qwen Chat is the main interface for most users, Qwen itself powers a broad range of intelligent capabilities including image generation, deep research, website creation, advanced reasoning, and context-aware search. Its multimodal intelligence enables Qwen to understand and process text, images, audio, and video simultaneously for richer insights. Qwen is available on web, desktop, and mobile, ensuring seamless access across all devices. For developers, the Qwen API provides OpenAI-compatible endpoints, making integration simple and allowing Qwen’s intelligence to power apps, services, and automation. Whether you're chatting through Qwen Chat or building with the Qwen API, Qwen delivers fast, flexible, and highly capable AI support.
  • 48
    Claude Haiku 4.5
    Anthropic has launched Claude Haiku 4.5, its latest small-language model designed to deliver near-frontier performance at significantly lower cost. The model provides similar coding and reasoning quality as the company’s mid-tier Sonnet 4, yet it runs at roughly one-third of the cost and more than twice the speed. In benchmarks cited by Anthropic, Haiku 4.5 meets or exceeds Sonnet 4’s performance in key tasks such as code generation and multi-step “computer use” workflows. It is optimized for real-time, low-latency scenarios such as chat assistants, customer service agents, and pair-programming support. Haiku 4.5 is made available via the Claude API under the identifier “claude-haiku-4-5” and supports large-scale deployments where cost, responsiveness, and near-frontier intelligence matter. Claude Haiku 4.5 is available now on Claude Code and our apps. Its efficiency means you can accomplish more within your usage limits while maintaining premium model performance.
    Starting Price: $1 per million input tokens
  • 49
    GLM-5.1

    GLM-5.1

    Zhipu AI

    GLM-5.1 is the latest iteration of Z.ai’s GLM series, designed as a frontier-level, agent-oriented AI model optimized for coding, reasoning, and long-horizon workflows. It builds on the GLM-5 architecture, which uses a Mixture-of-Experts (MoE) design to deliver high performance while keeping inference costs efficient, and is part of a broader push toward open-weight, developer-accessible models. A core focus of GLM-5.1 is enabling agentic behavior, meaning it can plan, execute, and iterate across multi-step tasks rather than simply responding to single prompts. It is specifically designed to handle complex workflows such as debugging code, navigating repositories, and executing chained operations with sustained context. Compared to earlier models, GLM-5.1 improves reliability in long interactions, maintaining coherence across extended sessions and reducing breakdowns in multi-step reasoning.
  • 50
    ZeroGPU

    ZeroGPU

    ZeroGPU

    ZeroGPU is a compute efficiency layer for AI inference that helps AI applications reduce inference costs by moving high-volume tasks to specialized models across an edge-powered inference network. It is built around the idea that most production AI workloads do not need frontier-scale reasoning; tasks such as document analysis, content summarization, page classification, signal extraction, PII detection, web content processing, query routing, and message moderation can often run on smaller, task-specific models instead of expensive frontier models. ZeroGPU helps developers identify workloads that do not require deep reasoning, route them to specialized small language models and nano models, execute them across optimized servers, approved edge capacity, and cloud fallback, then measure cost reduction, latency improvement, avoided frontier-model calls, and model performance.