Compare the Top On-Premises AI Coding Models as of April 2026 - Page 2

  • 1
    GLM-4.1V

    GLM-4.1V

    Zhipu AI

    GLM-4.1V is a vision-language model, providing a powerful, compact multimodal model designed for reasoning and perception across images, text, and documents. The 9-billion-parameter variant (GLM-4.1V-9B-Thinking) is built on the GLM-4-9B foundation and enhanced through a specialized training paradigm using Reinforcement Learning with Curriculum Sampling (RLCS). It supports a 64k-token context window and accepts high-resolution inputs (up to 4K images, any aspect ratio), enabling it to handle complex tasks such as optical character recognition, image captioning, chart and document parsing, video and scene understanding, GUI-agent workflows (e.g., interpreting screenshots, recognizing UI elements), and general vision-language reasoning. In benchmark evaluations at the 10 B-parameter scale, GLM-4.1V-9B-Thinking achieved top performance on 23 of 28 tasks.
    Starting Price: Free
  • 2
    GLM-4.5V-Flash
    GLM-4.5V-Flash is an open source vision-language model, designed to bring strong multimodal capabilities into a lightweight, deployable package. It supports image, video, document, and GUI inputs, enabling tasks such as scene understanding, chart and document parsing, screen reading, and multi-image analysis. Compared to larger models in the series, GLM-4.5V-Flash offers a compact footprint while retaining core VLM capabilities like visual reasoning, video understanding, GUI task handling, and complex document parsing. It can serve in “GUI agent” workflows, meaning it can interpret screenshots or desktop captures, recognize icons or UI elements, and assist with automated desktop or web-based tasks. Although it forgoes some of the largest-model performance gains, GLM-4.5V-Flash remains versatile for real-world multimodal tasks where efficiency, lower resource usage, and broad modality support are prioritized.
    Starting Price: Free
  • 3
    GLM-4.5V

    GLM-4.5V

    Zhipu AI

    GLM-4.5V builds on the GLM-4.5-Air foundation, using a Mixture-of-Experts (MoE) architecture with 106 billion total parameters and 12 billion activation parameters. It achieves state-of-the-art performance among open-source VLMs of similar scale across 42 public benchmarks, excelling in image, video, document, and GUI-based tasks. It supports a broad range of multimodal capabilities, including image reasoning (scene understanding, spatial recognition, multi-image analysis), video understanding (segmentation, event recognition), complex chart and long-document parsing, GUI-agent workflows (screen reading, icon recognition, desktop automation), and precise visual grounding (e.g., locating objects and returning bounding boxes). GLM-4.5V also introduces a “Thinking Mode” switch, allowing users to choose between fast responses or deeper reasoning when needed.
    Starting Price: Free
  • 4
    GLM-4.7

    GLM-4.7

    Zhipu AI

    GLM-4.7 is an advanced large language model designed to significantly elevate coding, reasoning, and agentic task performance. It delivers major improvements over GLM-4.6 in multilingual coding, terminal-based tasks, and real-world software engineering benchmarks such as SWE-bench and Terminal Bench. GLM-4.7 supports “thinking before acting,” enabling more stable, accurate, and controllable behavior in complex coding and agent workflows. The model also introduces strong gains in UI and frontend generation, producing cleaner webpages, better layouts, and more polished slides. Enhanced tool-using capabilities allow GLM-4.7 to perform more effectively in web browsing, automation, and agent benchmarks. Its reasoning and mathematical performance has improved substantially, showing strong results on advanced evaluation suites. GLM-4.7 is available via Z.ai, API platforms, coding agents, and local deployment for flexible adoption.
    Starting Price: Free
  • 5
    GLM-5

    GLM-5

    Zhipu AI

    GLM-5 is Z.ai’s latest large language model built for complex systems engineering and long-horizon agentic tasks. It scales significantly beyond GLM-4.5, increasing total parameters and training data while integrating DeepSeek Sparse Attention to reduce deployment costs without sacrificing long-context capacity. The model combines enhanced pre-training with a new asynchronous reinforcement learning infrastructure called slime, improving training efficiency and post-training refinement. GLM-5 achieves best-in-class performance among open-source models across reasoning, coding, and agent benchmarks, narrowing the gap with leading frontier models. It ranks highly on evaluations such as Vending Bench 2, demonstrating strong long-term planning and operational capabilities. The model is open-sourced under the MIT License.
    Starting Price: Free
  • 6
    Qwen3.5

    Qwen3.5

    Alibaba

    Qwen3.5 is a next-generation open-weight multimodal large language model designed to power native vision-language agents. The flagship release, Qwen3.5-397B-A17B, combines a hybrid linear attention architecture with sparse mixture-of-experts, activating only 17 billion parameters per forward pass out of 397 billion total to maximize efficiency. It delivers strong benchmark performance across reasoning, coding, multilingual understanding, visual reasoning, and agent-based tasks. The model expands language support from 119 to 201 languages and dialects while introducing a 1M-token context window in its hosted version, Qwen3.5-Plus. Built for multimodal tasks, it processes text, images, and video with advanced spatial reasoning and tool integration. Qwen3.5 also incorporates scalable reinforcement learning environments to improve general agent capabilities. Designed for developers and enterprises, it enables efficient, tool-augmented, multimodal AI workflows.
    Starting Price: Free
  • 7
    Leanstral

    Leanstral

    Mistral AI

    Leanstral is an open-source code agent developed by Mistral AI specifically designed to work with the Lean 4 proof assistant. The model focuses on generating code while also formally verifying its correctness against strict mathematical or software specifications. Unlike traditional coding assistants, Leanstral integrates directly with formal proof systems to ensure that generated code satisfies defined logical requirements. Its architecture is optimized for proof engineering tasks and operates efficiently with sparse model parameters. Leanstral is released under the Apache 2.0 license, making it freely accessible for developers, researchers, and organizations to use and customize. The model is designed to operate within real-world formal repositories rather than isolated problem environments. By combining code generation with formal verification, Leanstral aims to reduce the need for manual human review in complex software and mathematical development.
    Starting Price: Free
  • 8
    MiniMax M2.7
    MiniMax M2.7 is an advanced AI model designed to enhance real-world productivity across coding, search, and office workflows. It is trained with reinforcement learning across numerous real-world environments, enabling it to handle complex, multi-step tasks effectively. The model excels in problem-solving by breaking down challenges before generating solutions across multiple programming languages. It delivers high-speed performance with rapid token generation, allowing tasks to be completed efficiently. With optimized reasoning and cost-effective pricing, it provides powerful capabilities while minimizing resource usage. It also achieves strong performance in software engineering benchmarks, reducing incident response time and improving development efficiency. Additionally, it supports advanced agentic workflows and professional-grade office tasks, making it highly versatile for modern work environments.
    Starting Price: Free
  • 9
    CodeGen

    CodeGen

    Salesforce

    CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.
    Starting Price: Free
  • 10
    StarCoder

    StarCoder

    BigCode

    StarCoder and StarCoderBase are Large Language Models for Code (Code LLMs) trained on permissively licensed data from GitHub, including from 80+ programming languages, Git commits, GitHub issues, and Jupyter notebooks. Similar to LLaMA, we trained a ~15B parameter model for 1 trillion tokens. We fine-tuned StarCoderBase model for 35B Python tokens, resulting in a new model that we call StarCoder. We found that StarCoderBase outperforms existing open Code LLMs on popular programming benchmarks and matches or surpasses closed models such as code-cushman-001 from OpenAI (the original Codex model that powered early versions of GitHub Copilot). With a context length of over 8,000 tokens, the StarCoder models can process more input than any other open LLM, enabling a wide range of interesting applications. For example, by prompting the StarCoder models with a series of dialogues, we enabled them to act as a technical assistant.
    Starting Price: Free
  • 11
    Llama 2
    The next generation of our open source large language model. This release includes model weights and starting code for pretrained and fine-tuned Llama language models — ranging from 7B to 70B parameters. Llama 2 pretrained models are trained on 2 trillion tokens, and have double the context length than Llama 1. Its fine-tuned models have been trained on over 1 million human annotations. Llama 2 outperforms other open source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests. Llama 2 was pretrained on publicly available online data sources. The fine-tuned model, Llama-2-chat, leverages publicly available instruction datasets and over 1 million human annotations. We have a broad range of supporters around the world who believe in our open approach to today’s AI — companies that have given early feedback and are excited to build with Llama 2.
    Starting Price: Free
  • 12
    Code Llama
    Code Llama is a large language model (LLM) that can use text prompts to generate code. Code Llama is state-of-the-art for publicly available LLMs on code tasks, and has the potential to make workflows faster and more efficient for current developers and lower the barrier to entry for people who are learning to code. Code Llama has the potential to be used as a productivity and educational tool to help programmers write more robust, well-documented software. Code Llama is a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural language prompts. Code Llama is free for research and commercial use. Code Llama is built on top of Llama 2 and is available in three models: Code Llama, the foundational code model; Codel Llama - Python specialized for Python; and Code Llama - Instruct, which is fine-tuned for understanding natural language instructions.
    Starting Price: Free
  • 13
    PaLM 2

    PaLM 2

    Google

    PaLM 2 is our next generation large language model that builds on Google’s legacy of breakthrough research in machine learning and responsible AI. It excels at advanced reasoning tasks, including code and math, classification and question answering, translation and multilingual proficiency, and natural language generation better than our previous state-of-the-art LLMs, including PaLM. It can accomplish these tasks because of the way it was built – bringing together compute-optimal scaling, an improved dataset mixture, and model architecture improvements. PaLM 2 is grounded in Google’s approach to building and deploying AI responsibly. It was evaluated rigorously for its potential harms and biases, capabilities and downstream uses in research and in-product applications. It’s being used in other state-of-the-art models, like Med-PaLM 2 and Sec-PaLM, and is powering generative AI features and tools at Google, like Bard and the PaLM API.
  • 14
    Amazon Nova
    Amazon Nova is a new generation of state-of-the-art (SOTA) foundation models (FMs) that deliver frontier intelligence and industry leading price-performance, available exclusively on Amazon Bedrock. Amazon Nova Micro, Amazon Nova Lite, and Amazon Nova Pro are understanding models that accept text, image, or video inputs and generate text output. They provide a broad selection of capability, accuracy, speed, and cost operation points. Amazon Nova Micro is a text only model that delivers the lowest latency responses at very low cost. Amazon Nova Lite is a very low-cost multimodal model that is lightning fast for processing image, video, and text inputs. Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Pro’s capabilities, coupled with its industry-leading speed and cost efficiency, makes it a compelling model for almost any task, including video summarization, Q&A, math & more.
  • 15
    LTM-1

    LTM-1

    Magic AI

    Magic’s LTM-1 enables 50x larger context windows than transformers. Magic's trained a Large Language Model (LLM) that’s able to take in the gigantic amounts of context when generating suggestions. For our coding assistant, this means Magic can now see your entire repository of code. Larger context windows can allow AI models to reference more explicit, factual information and their own action history. We hope to be able to utilize this research to improve reliability and coherence.
MongoDB Logo MongoDB