Compare the Top AI Coding Models that integrate with Together AI as of July 2026

This is a list of AI Coding Models that integrate with Together AI. The products that work with Together AI are listed below.

What are AI Coding Models for Together AI?

AI coding models are machine learning models specifically trained to assist with software development tasks, such as code generation, bug detection, code completion, and optimization. These models are often built using large datasets of source code and can understand programming languages, patterns, and frameworks. AI coding models can write code based on user prompts, suggest syntax or entire functions, and help developers improve their code through real-time suggestions. Compare and read user reviews of the best AI Coding Models for Together AI currently available using the table below. This list is updated regularly.

  • 1
    Kimi K2.7 Code

    Moonshot AI

    Kimi K2.7 Code is an open-source, coding-focused agentic AI model developed by Moonshot AI for long-horizon software engineering tasks. It is designed to improve coding performance, agent workflows, and real-world development assistance compared with earlier Kimi K2 versions. The model supports a 256K context window, making it useful for working with large codebases, long technical documents, and complex multi-step programming tasks. Kimi K2.7 Code is available through Kimi Code and API access, with OpenAI- and Anthropic-compatible options for easier integration into developer workflows. It is also listed on Hugging Face and supports deployment through inference engines such as vLLM, SGLang, and KTransformers. With improved agentic capabilities, long-context support, and reduced thinking-token usage compared with K2.6, Kimi K2.7 Code gives developers a flexible open-source option for AI-assisted coding.
    Starting Price: Free
  • 2
    GLM-5.2

    Zhipu AI

    GLM-5.2 is an advanced AI foundation model designed to support complex reasoning, coding, and long-range agentic tasks. It helps developers, teams, and organizations build intelligent systems that can understand instructions, solve technical problems, and assist with demanding workflows. The model is especially useful for software engineering, automation, research, and productivity-focused applications. GLM-5.2 is built to handle large amounts of context, making it suitable for projects that require deeper understanding across extended conversations, documents, or codebases. Its mixture-of-experts design helps balance strong performance with more efficient model operation. GLM-5.2 gives businesses and developers a powerful AI tool for creating smarter applications, improving technical workflows, and supporting advanced digital experiences.
    Starting Price: Free
  • 3
    Kimi K2.5

    Moonshot AI

    Kimi K2.5 is a next-generation multimodal AI model designed for advanced reasoning, coding, and visual understanding tasks. It features a native multimodal architecture that supports both text and visual inputs, enabling image and video comprehension alongside natural language processing. Kimi K2.5 delivers open-source state-of-the-art performance in agent workflows, software development, and general intelligence tasks. The model offers ultra-long context support with a 256K token window, making it suitable for large documents and complex conversations. It includes long-thinking capabilities that allow multi-step reasoning and tool invocation for solving challenging problems. Kimi K2.5 is fully compatible with the OpenAI API format, allowing developers to switch seamlessly with minimal changes. With strong performance, flexibility, and developer-focused tooling, Kimi K2.5 is built for production-grade AI applications.
    Starting Price: Free
  • 4
    GLM-5.1

    Zhipu AI

    GLM-5.1 is the latest iteration of Z.ai’s GLM series, designed as a frontier-level, agent-oriented AI model optimized for coding, reasoning, and long-horizon workflows. It builds on the GLM-5 architecture, which uses a Mixture-of-Experts (MoE) design to deliver high performance while keeping inference costs efficient, and is part of a broader push toward open-weight, developer-accessible models. A core focus of GLM-5.1 is enabling agentic behavior, meaning it can plan, execute, and iterate across multi-step tasks rather than simply responding to single prompts. It is specifically designed to handle complex workflows such as debugging code, navigating repositories, and executing chained operations with sustained context. Compared to earlier models, GLM-5.1 improves reliability in long interactions, maintaining coherence across extended sessions and reducing breakdowns in multi-step reasoning.
    Starting Price: Free
  • 5
    Kimi K2.6

    Moonshot AI

    Kimi K2.6 is a next-generation agentic AI model developed by Moonshot AI, designed to push forward real-world execution, coding, and multi-step reasoning beyond earlier K2 and K2.5 versions. It builds on a Mixture-of-Experts architecture and the multimodal, agent-first foundation of the Kimi series, combining language understanding, coding, and tool use into a single system capable of planning and executing complex workflows. It introduces deeper reasoning capabilities and significantly improved agent planning, allowing it to break down tasks, coordinate tools, and handle multi-file or multi-step problems with greater accuracy and efficiency. It supports advanced tool calling with high reliability, enabling integration with external systems such as web search or APIs, and includes built-in validation mechanisms to ensure correct execution formats.
    Starting Price: Free
  • 6
    MiniMax M3

    MiniMax

    MiniMax M3 is an open-weight multimodal AI model designed for coding, agentic workflows, long-context reasoning, and complex automation tasks. The model combines frontier-level coding performance, native multimodal understanding, and a context window of up to 1 million tokens. MiniMax M3 uses MiniMax Sparse Attention to improve long-context efficiency while reducing compute requirements for large-scale inputs. It supports text, image, and video understanding, making it useful for workflows that combine code, documents, visual references, and tool-driven tasks. The model is built for repository-scale reasoning, software engineering, autonomous task execution, tool calling, and multi-step agent workflows. MiniMax M3 helps developers, AI teams, and enterprises build capable agents that can reason across large contexts and work with multimodal information.
    Starting Price: Free
  • 7
    Qwen3-Coder
    Qwen3-Coder is an agentic code model available in multiple sizes, led by the 480B-parameter Mixture-of-Experts variant (35B active), which natively supports 256K-token contexts (extendable to 1M) and achieves results comparable to Claude Sonnet 4. Pre-training used 7.5T tokens (70% code) along with synthetic data cleaned via Qwen2.5-Coder, improving both coding proficiency and general abilities. Post-training employs large-scale, execution-driven reinforcement learning, scaled test-case generation for diverse coding challenges, and long-horizon RL across 20,000 parallel environments, so the model performs well on multi-turn software-engineering benchmarks such as SWE-Bench Verified without test-time scaling. Alongside the model, the open source Qwen Code CLI (forked from Gemini CLI) runs Qwen3-Coder in agentic workflows with customized prompts, function calling protocols, and integration with Node.js, OpenAI SDKs, and environment variables.
    Starting Price: Free
  • 8
    DeepCoder

    Agentica Project

    DeepCoder is a fully open source code-reasoning and generation model released by Agentica Project in collaboration with Together AI. It is fine-tuned from DeepSeek-R1-Distill-Qwen-14B using distributed reinforcement learning and achieves 60.6% accuracy on LiveCodeBench (an 8% improvement over the base model), a level that matches proprietary models such as o3-mini (2025-01-31, Low) and o1 while using only 14 billion parameters. It was trained over 2.5 weeks on 32 H100 GPUs with a curated dataset of roughly 24,000 coding problems drawn from verified sources (including TACO-Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench submissions), each problem requiring a verifiable solution and at least five unit tests to ensure reliability for RL training. To handle long-range context, DeepCoder employs techniques such as iterative context lengthening and overlong filtering.
    Starting Price: Free
  • 9
    DeepSWE

    Agentica Project

    DeepSWE is a fully open source, state-of-the-art coding agent built on top of the Qwen3-32B foundation model and trained exclusively via reinforcement learning (RL), without supervised finetuning or distillation from proprietary models. It is developed using rLLM, Agentica’s open source RL framework for language agents. DeepSWE operates as an agent; it interacts with a simulated development environment (via the R2E-Gym environment) using a suite of tools (file editor, search, shell-execution, submit/finish), enabling it to navigate codebases, edit multiple files, compile/run tests, and iteratively produce patches or complete engineering tasks. DeepSWE exhibits emergent behaviors beyond simple code generation; when presented with bugs or feature requests, the agent reasons about edge cases, seeks existing tests in the repository, proposes patches, writes extra tests for regressions, and dynamically adjusts its “thinking” effort.
    Starting Price: Free
  • 10
    DeepSeek-V4

    DeepSeek

    DeepSeek-V4 is a next-generation open-source language model designed for high-performance reasoning, coding, and long-context intelligence. It introduces a powerful architecture with up to one million token context length, enabling seamless handling of large datasets and complex multi-step workflows. The model comes in two variants: DeepSeek-V4-Pro for maximum performance and DeepSeek-V4-Flash for efficiency and speed. DeepSeek-V4-Pro features 1.6 trillion total parameters with 49 billion activated, delivering near state-of-the-art performance comparable to leading closed-source models. It excels in agentic coding, mathematical reasoning, and world knowledge tasks. The model integrates advanced attention mechanisms, including token-wise compression and sparse attention, significantly reducing compute and memory costs. It is also optimized for AI agents, supporting tool use and multi-step workflows.
    Starting Price: Free
  • 11
    Qwen3.5

    Alibaba

    Qwen3.5 is a next-generation open-weight multimodal large language model designed to power native vision-language agents. The flagship release, Qwen3.5-397B-A17B, combines a hybrid linear attention architecture with sparse mixture-of-experts, activating only 17 billion parameters per forward pass out of 397 billion total to maximize efficiency. It delivers strong benchmark performance across reasoning, coding, multilingual understanding, visual reasoning, and agent-based tasks. The model expands language support from 119 to 201 languages and dialects while introducing a 1M-token context window in its hosted version, Qwen3.5-Plus. Built for multimodal tasks, it processes text, images, and video with advanced spatial reasoning and tool integration. Qwen3.5 also incorporates scalable reinforcement learning environments to improve general agent capabilities. Designed for developers and enterprises, it enables efficient, tool-augmented, multimodal AI workflows.
    Starting Price: Free
  • 12
    MiniMax M2.7
    MiniMax M2.7 is an advanced AI model designed to enhance real-world productivity across coding, search, and office workflows. It is trained with reinforcement learning across numerous real-world environments, enabling it to handle complex, multi-step tasks effectively. The model excels in problem-solving by breaking down challenges before generating solutions across multiple programming languages. It delivers high-speed performance with rapid token generation, allowing tasks to be completed efficiently. With optimized reasoning and cost-effective pricing, it provides powerful capabilities while minimizing resource usage. It also achieves strong performance in software engineering benchmarks, reducing incident response time and improving development efficiency. Additionally, it supports advanced agentic workflows and professional-grade office tasks, making it highly versatile for modern work environments.
    Starting Price: Free
  • 13
    DeepSeek-V4-Pro
    DeepSeek-V4-Pro is a large-scale Mixture-of-Experts (MoE) language model designed for advanced reasoning, coding, and long-context understanding. It features 1.6 trillion total parameters with 49 billion activated parameters, enabling high performance while maintaining efficiency. The model supports an exceptionally large context window of up to one million tokens, allowing it to process extensive documents and workflows. It uses a hybrid attention architecture to optimize long-context performance and reduce computational cost. DeepSeek-V4-Pro is trained on over 32 trillion tokens, improving its knowledge and reasoning capabilities. It also includes advanced optimization techniques for stability and faster convergence during training. The model supports multiple reasoning modes, allowing users to balance speed and accuracy based on their needs. Overall, it provides a powerful open-source solution for complex AI tasks and large-scale applications.
    Starting Price: Free
  • 14
    DeepSeek-V4-Flash
    DeepSeek-V4-Flash is a high-efficiency Mixture-of-Experts (MoE) language model designed for fast, scalable reasoning and text generation. It features 284 billion total parameters with 13 billion activated parameters, delivering strong performance while optimizing computational cost. The model supports an extensive context window of up to one million tokens, enabling it to process large documents and complex workflows with ease. Its hybrid attention architecture enhances long-context efficiency by reducing memory and compute requirements. Trained on over 32 trillion tokens, DeepSeek-V4-Flash demonstrates solid capabilities across knowledge, reasoning, and coding tasks. It is designed for scenarios where speed and efficiency are critical, offering a balance between performance and resource usage. The model also supports multiple reasoning modes, allowing users to adjust between faster outputs and deeper analysis.
    Starting Price: Free
  • 15
    Nemotron 3 Super
    Nemotron 3 Super is part of NVIDIA’s Nemotron 3 family of open models designed to enable advanced agentic AI systems that can reason, plan, and execute multi-step workflows across complex environments. The model introduces a hybrid Mamba-Transformer Mixture-of-Experts architecture that combines the efficiency of state-space Mamba layers with the contextual understanding of transformer attention, allowing it to process long sequences and complex reasoning tasks with high accuracy and throughput. This architecture activates only a subset of model parameters for each token, improving computational efficiency while maintaining strong reasoning capabilities and enabling scalable inference for large workloads. Nemotron 3 Super contains roughly 120 billion parameters with around 12 billion active during inference, accelerating multi-step reasoning and collaborative agent interactions across large contexts.
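
Several of the listings above quote both a total and an activated parameter count for Mixture-of-Experts models. As a rough illustration only, the sketch below computes the share of parameters active per token from those quoted figures. The parameter counts are copied from the listings (the Nemotron 3 Super figures are described there as approximate); the names `MODELS`, `active_fraction`, and `summarize` are hypothetical helpers introduced for this example and are not part of any vendor's tooling.

```python
# Parameter counts in billions, as quoted in the listings above.
# Each value is (total parameters, activated parameters).
MODELS = {
    "Qwen3-Coder (480B variant)": (480, 35),
    "DeepSeek-V4-Pro": (1600, 49),      # listed as 1.6 trillion total
    "DeepSeek-V4-Flash": (284, 13),
    "Qwen3.5-397B-A17B": (397, 17),
    "Nemotron 3 Super": (120, 12),      # listing gives approximate figures
}


def active_fraction(total_b, active_b):
    """Return the fraction of parameters activated per token."""
    return active_b / total_b


def summarize(models):
    """Return (name, total, active, percent_active) rows, sparsest first."""
    rows = []
    for name, (total, active) in models.items():
        pct = round(100 * active_fraction(total, active), 1)
        rows.append((name, total, active, pct))
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    for name, total, active, pct in summarize(MODELS):
        print(f"{name:28s} {active:>3}B of {total:>5}B active ({pct}%)")
```

A lower percentage means a sparser model in this sense; it says nothing by itself about quality, latency, or cost on any particular provider.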