Compare the Top Government AI Coding Models as of May 2026 - Page 4

  • 1
    GLM-4.5V-Flash
    GLM-4.5V-Flash is an open source vision-language model, designed to bring strong multimodal capabilities into a lightweight, deployable package. It supports image, video, document, and GUI inputs, enabling tasks such as scene understanding, chart and document parsing, screen reading, and multi-image analysis. Compared to larger models in the series, GLM-4.5V-Flash offers a compact footprint while retaining core VLM capabilities like visual reasoning, video understanding, GUI task handling, and complex document parsing. It can serve in “GUI agent” workflows, meaning it can interpret screenshots or desktop captures, recognize icons or UI elements, and assist with automated desktop or web-based tasks. Although it forgoes some of the largest-model performance gains, GLM-4.5V-Flash remains versatile for real-world multimodal tasks where efficiency, lower resource usage, and broad modality support are prioritized.
    Starting Price: Free
  • 2
    GLM-4.5V

    GLM-4.5V

    Zhipu AI

    GLM-4.5V builds on the GLM-4.5-Air foundation, using a Mixture-of-Experts (MoE) architecture with 106 billion total parameters and 12 billion activation parameters. It achieves state-of-the-art performance among open-source VLMs of similar scale across 42 public benchmarks, excelling in image, video, document, and GUI-based tasks. It supports a broad range of multimodal capabilities, including image reasoning (scene understanding, spatial recognition, multi-image analysis), video understanding (segmentation, event recognition), complex chart and long-document parsing, GUI-agent workflows (screen reading, icon recognition, desktop automation), and precise visual grounding (e.g., locating objects and returning bounding boxes). GLM-4.5V also introduces a “Thinking Mode” switch, allowing users to choose between fast responses or deeper reasoning when needed.
    Starting Price: Free
  • 3
    GLM-4.7

    GLM-4.7

    Zhipu AI

    GLM-4.7 is an advanced large language model designed to significantly elevate coding, reasoning, and agentic task performance. It delivers major improvements over GLM-4.6 in multilingual coding, terminal-based tasks, and real-world software engineering benchmarks such as SWE-bench and Terminal Bench. GLM-4.7 supports “thinking before acting,” enabling more stable, accurate, and controllable behavior in complex coding and agent workflows. The model also introduces strong gains in UI and frontend generation, producing cleaner webpages, better layouts, and more polished slides. Enhanced tool-using capabilities allow GLM-4.7 to perform more effectively in web browsing, automation, and agent benchmarks. Its reasoning and mathematical performance has improved substantially, showing strong results on advanced evaluation suites. GLM-4.7 is available via Z.ai, API platforms, coding agents, and local deployment for flexible adoption.
    Starting Price: Free
  • 4
    MiniMax-M2.1
    MiniMax-M2.1 is an open-source, agentic large language model designed for advanced coding, tool use, and long-horizon planning. It was released to the community to make high-performance AI agents more transparent, controllable, and accessible. The model is optimized for robustness in software engineering, instruction following, and complex multi-step workflows. MiniMax-M2.1 supports multilingual development and performs strongly across real-world coding scenarios. It is suitable for building autonomous applications that require reasoning, planning, and execution. The model weights are fully open, enabling local deployment and customization. MiniMax-M2.1 represents a major step toward democratizing top-tier agent capabilities.
    Starting Price: Free
  • 5
    Xiaomi MiMo

    Xiaomi MiMo

    Xiaomi Technology

    The Xiaomi MiMo API open platform is a developer-oriented interface for accessing and integrating Xiaomi’s MiMo family of AI models, including reasoning and language models such as MiMo-V2-Flash, into applications and services through standardized APIs and cloud endpoints, enabling developers to build AI-enabled features like conversational agents, reasoning workflows, code assistance, and search-augmented tasks without managing model infrastructure themselves. It offers REST-style API access with authentication, request signing, and structured responses so software can send prompts and receive generated text or processed outputs programmatically, and it supports common operations like text generation, prompt handling, and inference over MiMo models. By providing documentation and onboarding tools, the open platform lets teams integrate Xiaomi’s latest open source large language models, which leverage Mixture-of-Experts (MoE) architectures.
    Starting Price: Free
  • 6
    Composer 1
    Composer is Cursor’s custom-built agentic AI model optimized specifically for software engineering tasks and designed to power fast, interactive coding assistance directly within the Cursor IDE, a VS Code-derived editor enhanced with intelligent automation. It is a mixture-of-experts model trained with reinforcement learning (RL) on real-world coding problems across large codebases, so it can produce high-speed, context-aware responses, from code edits and planning to answers that understand project structure, tools, and conventions, with generation speeds roughly four times faster than similar models in benchmarks. Composer is specialized for development workflows, leveraging long-context understanding, semantic search, and limited tool access (like file editing and terminal commands) so it can solve complex engineering requests with efficient and practical outputs.
    Starting Price: $20 per month
  • 7
    SERA

    SERA

    Ai2

    Open Coding Agents are a family of fully open, high-performance AI coding models and an associated training method released by the Allen Institute for AI that make building, customizing, and training coding agents on any repository remarkably accessible, affordable, and transparent; the platform includes models, code, training recipes, and tools that can be launched with minimal setup so users can tailor agents to their own codebases and engineering conventions for tasks like code generation, code review, debugging, maintenance, and code explanation. These agents break from the traditional closed, expensive systems by offering an open pipeline from models to training data and enabling fine-tuning on internal code to teach agents about organization-specific APIs, patterns, and workflows; the first release, SERA (Soft-verified Efficient Repository Agents), achieves state-of-the-art performance on coding benchmarks at a fraction of the typical compute cost.
    Starting Price: Free
  • 8
    Qwen3-Coder-Next
    Qwen3-Coder-Next is an open-weight language model specifically designed for coding agents and local development that delivers advanced coding reasoning, complex tool usage, and robust performance on long-horizon programming tasks with high efficiency, using a mixture-of-experts architecture that balances powerful capabilities with resource-friendly operation. It provides enhanced agentic coding abilities that help software developers, AI system builders, and automated coding workflows generate, debug, and reason about code with deep contextual understanding while recovering from execution errors, making it well-suited for autonomous coding agents and development-oriented applications. By achieving strong performance comparable to much larger parameter models while requiring fewer active parameters, Qwen3-Coder-Next enables cost-effective deployment for dynamic and complex programming workloads in research and production environments.
    Starting Price: Free
  • 9
    MiniMax M2.5
    MiniMax M2.5 is a frontier AI model engineered for real-world productivity across coding, agentic workflows, search, and office tasks. Extensively trained with reinforcement learning in hundreds of thousands of real-world environments, it achieves state-of-the-art performance in benchmarks such as SWE-Bench Verified and BrowseComp. The model demonstrates strong architectural thinking, decomposing complex problems before generating code across more than ten programming languages. M2.5 operates at high throughput speeds of up to 100 tokens per second, enabling faster completion of multi-step tasks. It is optimized for efficient reasoning, reducing token usage and execution time compared to previous versions. With dramatically lower pricing than competing frontier models, it delivers powerful performance at minimal cost. Integrated into MiniMax Agent, M2.5 supports professional-grade office workflows, financial modeling, and autonomous task execution.
    Starting Price: Free
  • 10
    DeepSeek-V4

    DeepSeek-V4

    DeepSeek

    DeepSeek-V4 is a next-generation open-source language model designed for high-performance reasoning, coding, and long-context intelligence. It introduces a powerful architecture with up to one million token context length, enabling seamless handling of large datasets and complex multi-step workflows. The model comes in two variants: DeepSeek-V4-Pro for maximum performance and DeepSeek-V4-Flash for efficiency and speed. DeepSeek-V4-Pro features 1.6 trillion total parameters with 49 billion activated, delivering near state-of-the-art performance comparable to leading closed-source models. It excels in agentic coding, mathematical reasoning, and world knowledge tasks. The model integrates advanced attention mechanisms, including token-wise compression and sparse attention, significantly reducing compute and memory costs. It is also optimized for AI agents, supporting tool use and multi-step workflows.
    Starting Price: Free
  • 11
    Qwen3.5

    Qwen3.5

    Alibaba

    Qwen3.5 is a next-generation open-weight multimodal large language model designed to power native vision-language agents. The flagship release, Qwen3.5-397B-A17B, combines a hybrid linear attention architecture with sparse mixture-of-experts, activating only 17 billion parameters per forward pass out of 397 billion total to maximize efficiency. It delivers strong benchmark performance across reasoning, coding, multilingual understanding, visual reasoning, and agent-based tasks. The model expands language support from 119 to 201 languages and dialects while introducing a 1M-token context window in its hosted version, Qwen3.5-Plus. Built for multimodal tasks, it processes text, images, and video with advanced spatial reasoning and tool integration. Qwen3.5 also incorporates scalable reinforcement learning environments to improve general agent capabilities. Designed for developers and enterprises, it enables efficient, tool-augmented, multimodal AI workflows.
    Starting Price: Free
  • 12
    Leanstral

    Leanstral

    Mistral AI

    Leanstral is an open-source code agent developed by Mistral AI specifically designed to work with the Lean 4 proof assistant. The model focuses on generating code while also formally verifying its correctness against strict mathematical or software specifications. Unlike traditional coding assistants, Leanstral integrates directly with formal proof systems to ensure that generated code satisfies defined logical requirements. Its architecture is optimized for proof engineering tasks and operates efficiently with sparse model parameters. Leanstral is released under the Apache 2.0 license, making it freely accessible for developers, researchers, and organizations to use and customize. The model is designed to operate within real-world formal repositories rather than isolated problem environments. By combining code generation with formal verification, Leanstral aims to reduce the need for manual human review in complex software and mathematical development.
    Starting Price: Free
  • 13
    MiniMax M2.7
    MiniMax M2.7 is an advanced AI model designed to enhance real-world productivity across coding, search, and office workflows. It is trained with reinforcement learning across numerous real-world environments, enabling it to handle complex, multi-step tasks effectively. The model excels in problem-solving by breaking down challenges before generating solutions across multiple programming languages. It delivers high-speed performance with rapid token generation, allowing tasks to be completed efficiently. With optimized reasoning and cost-effective pricing, it provides powerful capabilities while minimizing resource usage. It also achieves strong performance in software engineering benchmarks, reducing incident response time and improving development efficiency. Additionally, it supports advanced agentic workflows and professional-grade office tasks, making it highly versatile for modern work environments.
    Starting Price: Free
  • 14
    MiMo-V2-Pro

    MiMo-V2-Pro

    Xiaomi Technology

    Xiaomi MiMo-V2-Pro is a flagship AI foundation model designed to power real-world agentic workflows and complex task execution. It is built to function as the core intelligence behind agent systems, enabling orchestration of multi-step processes and production-level tasks. The model demonstrates strong capabilities in coding, tool usage, and search-based tasks, performing competitively on global benchmarks. With its large-scale architecture and extended context window, it can handle long and complex interactions efficiently. MiMo-V2-Pro is optimized for practical applications, delivering reliable performance across development, automation, and enterprise workflows.
    Starting Price: $1/million tokens
  • 15
    Mercury Edit 2
    Mercury Edit 2 is part of Inception Labs’ Mercury family of AI models, designed to perform high-speed reasoning, coding, and editing tasks using a fundamentally different architecture from traditional large language models. It builds on Mercury 2, a diffusion-based reasoning model that generates and refines entire outputs in parallel rather than producing text token by token, enabling significantly faster performance and more responsive editing workflows. Instead of acting like a sequential “typewriter,” the system behaves more like an editor, starting with a rough draft and iteratively improving it across multiple tokens at once, which allows for real-time interaction and rapid iteration in tasks such as code editing, content generation, and agent-based workflows. This architecture delivers throughput of up to around 1,000 tokens per second, making it several times faster than conventional models while maintaining competitive reasoning quality across benchmarks.
    Starting Price: $0.25 per 1M input tokens
  • 16
    Qwen3.6-35B-A3B
    Qwen3.5-35B-A3B is part of the Qwen3.5 “Medium” model series, designed as a highly efficient, multimodal foundation model that balances strong reasoning ability with practical deployment requirements. It uses a Mixture-of-Experts (MoE) architecture with 35 billion total parameters but activates only about 3 billion per token, allowing it to deliver performance comparable to much larger models while significantly reducing computational cost. The model integrates a hybrid attention mechanism that combines linear attention with standard attention layers, enabling efficient long-context processing and improved scalability for complex tasks. As a native vision-language model, it can process both text and visual inputs, supporting use cases such as multimodal reasoning, coding, and agent-based workflows. It is designed to function as a general-purpose “AI agent,” capable of planning, tool use, and structured problem solving rather than just conversational responses.
    Starting Price: Free
  • 17
    GPT-5.5

    GPT-5.5

    OpenAI

    GPT-5.5 is an advanced AI model designed to handle complex, real-world tasks with greater autonomy and efficiency. It quickly understands user intent and can execute multi-step workflows such as coding, research, data analysis, and document creation with minimal guidance. Instead of requiring step-by-step instructions, GPT-5.5 plans tasks, uses tools, evaluates outputs, and continues working until completion. It excels in knowledge work, software development, and analytical problem-solving, helping users move from idea to execution faster. The model is built to operate across tools and environments, making it highly effective for modern digital workflows. With strong reasoning and persistence, GPT-5.5 enables individuals and teams to complete demanding work more efficiently and accurately.
    Starting Price: $5 per 1M tokens (input)
  • 18
    GPT-5.5 Pro
    GPT-5.5 Pro is an advanced AI model designed to handle complex, real-world work with greater autonomy and efficiency. It understands user intent quickly and can execute multi-step tasks such as coding, research, data analysis, and document creation with minimal guidance. The model is built to plan, use tools, and refine its outputs until tasks are complete. It excels in knowledge work, software development, and analytical problem-solving. With strong reasoning and persistence, GPT-5.5 Pro can manage long-running workflows across tools and systems. It delivers high-quality results while maintaining speed and efficiency. Overall, it enables individuals and teams to complete demanding tasks faster and more accurately.
    Starting Price: $30 per 1M tokens (input)
  • 19
    Qwen3.6-27B
    Qwen3.6-27B is a dense, open source multimodal language model in the Qwen3.6 series, designed to deliver flagship-level performance in coding, reasoning, and agent-based workflows while maintaining a relatively efficient parameter size of 27 billion. It is positioned as a high-performance general model that “punches above its weight,” achieving results competitive with or superior to significantly larger models on key benchmarks, particularly in agentic coding tasks. It supports both thinking and non-thinking modes, allowing it to dynamically balance deep reasoning with fast responses depending on the task, and integrates capabilities across text and multimodal inputs such as images and video. Built as part of the Qwen3.6 family, the model emphasizes real-world usability, stability, and developer productivity, incorporating improvements driven by community feedback and practical deployment needs.
    Starting Price: Free
  • 20
    KAT-Coder-Pro V2
    KAT-Coder is an agentic AI coding system designed to go beyond traditional autocomplete tools by enabling end-to-end software development workflows driven by reasoning, planning, and execution. It is positioned as a flagship coding model within the KAT ecosystem, built specifically for “agentic coding,” where the model does not just generate snippets but can diagnose issues, propose fixes, run tests, and iterate across multiple files as part of a continuous development loop. It integrates directly with developer environments through API endpoints and proxy layers compatible with tools like Claude Code, allowing seamless use inside existing IDE workflows without changing the interface developers are already familiar with. KAT-Coder is trained using a multi-stage pipeline that includes supervised fine-tuning and large-scale reinforcement learning, enabling it to understand programming context, and reason over complex tasks.
    Starting Price: $0.30 per month
  • 21
    DeepSeek-V4-Pro
    DeepSeek-V4-Pro is a large-scale Mixture-of-Experts (MoE) language model designed for advanced reasoning, coding, and long-context understanding. It features 1.6 trillion total parameters with 49 billion activated parameters, enabling high performance while maintaining efficiency. The model supports an exceptionally large context window of up to one million tokens, allowing it to process extensive documents and workflows. It uses a hybrid attention architecture to optimize long-context performance and reduce computational cost. DeepSeek-V4-Pro is trained on over 32 trillion tokens, improving its knowledge and reasoning capabilities. It also includes advanced optimization techniques for stability and faster convergence during training. The model supports multiple reasoning modes, allowing users to balance speed and accuracy based on their needs. Overall, it provides a powerful open-source solution for complex AI tasks and large-scale applications.
    Starting Price: Free
  • 22
    DeepSeek-V4-Flash
    DeepSeek-V4-Flash is a high-efficiency Mixture-of-Experts (MoE) language model designed for fast, scalable reasoning and text generation. It features 284 billion total parameters with 13 billion activated parameters, delivering strong performance while optimizing computational cost. The model supports an extensive context window of up to one million tokens, enabling it to process large documents and complex workflows with ease. Its hybrid attention architecture enhances long-context efficiency by reducing memory and compute requirements. Trained on over 32 trillion tokens, DeepSeek-V4-Flash demonstrates solid capabilities across knowledge, reasoning, and coding tasks. It is designed for scenarios where speed and efficiency are critical, offering a balance between performance and resource usage. The model also supports multiple reasoning modes, allowing users to adjust between faster outputs and deeper analysis.
    Starting Price: Free
  • 23
    CodeGen

    CodeGen

    Salesforce

    CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.
    Starting Price: Free
  • 24
    StarCoder

    StarCoder

    BigCode

    StarCoder and StarCoderBase are Large Language Models for Code (Code LLMs) trained on permissively licensed data from GitHub, including from 80+ programming languages, Git commits, GitHub issues, and Jupyter notebooks. Similar to LLaMA, we trained a ~15B parameter model for 1 trillion tokens. We fine-tuned StarCoderBase model for 35B Python tokens, resulting in a new model that we call StarCoder. We found that StarCoderBase outperforms existing open Code LLMs on popular programming benchmarks and matches or surpasses closed models such as code-cushman-001 from OpenAI (the original Codex model that powered early versions of GitHub Copilot). With a context length of over 8,000 tokens, the StarCoder models can process more input than any other open LLM, enabling a wide range of interesting applications. For example, by prompting the StarCoder models with a series of dialogues, we enabled them to act as a technical assistant.
    Starting Price: Free
  • 25
    Llama 2
    The next generation of our open source large language model. This release includes model weights and starting code for pretrained and fine-tuned Llama language models — ranging from 7B to 70B parameters. Llama 2 pretrained models are trained on 2 trillion tokens, and have double the context length than Llama 1. Its fine-tuned models have been trained on over 1 million human annotations. Llama 2 outperforms other open source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests. Llama 2 was pretrained on publicly available online data sources. The fine-tuned model, Llama-2-chat, leverages publicly available instruction datasets and over 1 million human annotations. We have a broad range of supporters around the world who believe in our open approach to today’s AI — companies that have given early feedback and are excited to build with Llama 2.
    Starting Price: Free
  • 26
    Code Llama
    Code Llama is a large language model (LLM) that can use text prompts to generate code. Code Llama is state-of-the-art for publicly available LLMs on code tasks, and has the potential to make workflows faster and more efficient for current developers and lower the barrier to entry for people who are learning to code. Code Llama has the potential to be used as a productivity and educational tool to help programmers write more robust, well-documented software. Code Llama is a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural language prompts. Code Llama is free for research and commercial use. Code Llama is built on top of Llama 2 and is available in three models: Code Llama, the foundational code model; Codel Llama - Python specialized for Python; and Code Llama - Instruct, which is fine-tuned for understanding natural language instructions.
    Starting Price: Free
  • 27
    ChatGPT Enterprise
    Enterprise-grade security & privacy and the most powerful version of ChatGPT yet. 1. Customer prompts or data are not used for training models 2. Data encryption at rest (AES-256) and in transit (TLS 1.2+) 3. SOC 2 compliant 4. Dedicated admin console and easy bulk member management 5. SSO and Domain Verification 6. Analytics dashboard to understand usage 7. Unlimited, high-speed access to GPT-4 and Advanced Data Analysis* 8. 32k token context windows for 4X longer inputs and memory 9. Shareable chat templates for your company to collaborate
    Starting Price: $60/user/month
  • 28
    GPT-5

    GPT-5

    OpenAI

    GPT-5 is OpenAI’s most advanced AI model, delivering smarter, faster, and more useful responses across a wide range of topics including math, science, finance, and law. It features built-in thinking capabilities that allow it to provide expert-level answers and perform complex reasoning. GPT-5 can handle long context lengths and generate detailed outputs, making it ideal for coding, research, and creative writing. The model includes a ‘verbosity’ parameter for customizable response length and improved personality control. It integrates with business tools like Google Drive and SharePoint to provide context-aware answers while respecting security permissions. Available to everyone, GPT-5 empowers users to collaborate with an AI assistant that feels like a knowledgeable colleague.
    Starting Price: $1.25 per 1M tokens
  • 29
    OpenAI o3
    OpenAI o3 is an advanced AI model designed to enhance reasoning capabilities by breaking down complex instructions into smaller, more manageable steps. It offers significant improvements over previous AI iterations, excelling in coding tasks, competitive programming, and achieving high scores in mathematics and science benchmarks. Available for widespread use, OpenAI o3 supports advanced AI-driven problem-solving and decision-making processes. The model incorporates deliberative alignment techniques to ensure its responses align with established safety and ethical guidelines, making it a powerful tool for developers, researchers, and enterprises seeking sophisticated AI solutions.
    Starting Price: $2 per 1 million tokens
  • 30
    Yi-Large
    Yi-Large is a proprietary large language model developed by 01.AI, offering a 32k context length with both input and output costs at $2 per million tokens. It stands out with its advanced capabilities in natural language processing, common-sense reasoning, and multilingual support, performing on par with leading models like GPT-4 and Claude3 in various benchmarks. Yi-Large is designed for tasks requiring complex inference, prediction, and language understanding, making it suitable for applications like knowledge search, data classification, and creating human-like chatbots. Its architecture is based on a decoder-only transformer with enhancements such as pre-normalization and Group Query Attention, and it has been trained on a vast, high-quality multilingual dataset. This model's versatility and cost-efficiency make it a strong contender in the AI market, particularly for enterprises aiming to deploy AI solutions globally.
    Starting Price: $0.19 per 1M input token
MongoDB Logo MongoDB