Compare the Top AI Models that integrate with Ollama as of September 2025

This a list of AI Models that integrate with Ollama. Use the filters on the left to add additional filters for products that have integrations with Ollama. View the products that work with Ollama in the table below.

What are AI Models for Ollama?

AI models are systems designed to simulate human intelligence by learning from data and solving complex tasks. They include specialized types like Large Language Models (LLMs) for text generation, image models for visual recognition and editing, and video models for processing and analyzing dynamic content. These models power applications such as chatbots, facial recognition, video summarization, and personalized recommendations. Their capabilities rely on advanced algorithms, extensive training datasets, and robust computational resources. AI models are transforming industries by automating processes, enhancing decision-making, and enabling creative innovations. Compare and read user reviews of the best AI Models for Ollama currently available using the table below. This list is updated regularly.

  • 1
    CodeQwen

    CodeQwen

    Alibaba

    CodeQwen is the code version of Qwen, the large language model series developed by the Qwen team, Alibaba Cloud. It is a transformer-based decoder-only language model pre-trained on a large amount of data of codes. Strong code generation capabilities and competitive performance across a series of benchmarks. Supporting long context understanding and generation with the context length of 64K tokens. CodeQwen supports 92 coding languages and provides excellent performance in text-to-SQL, bug fixes, etc. You can just write several lines of code with transformers to chat with CodeQwen. Essentially, we build the tokenizer and the model from pre-trained methods, and we use the generate method to perform chatting with the help of the chat template provided by the tokenizer. We apply the ChatML template for chat models following our previous practice. The model completes the code snippets according to the given prompts, without any additional formatting.
    Starting Price: Free
  • 2
    EXAONE Deep
    EXAONE Deep is a series of reasoning-enhanced language models developed by LG AI Research, featuring parameter sizes of 2.4 billion, 7.8 billion, and 32 billion. These models demonstrate superior capabilities in various reasoning tasks, including math and coding benchmarks. Notably, EXAONE Deep 2.4B outperforms other models of comparable size, EXAONE Deep 7.8B surpasses both open-weight models of similar scale and the proprietary reasoning model OpenAI o1-mini, and EXAONE Deep 32B shows competitive performance against leading open-weight models. The repository provides comprehensive documentation covering performance evaluations, quickstart guides for using EXAONE Deep models with Transformers, explanations of quantized EXAONE Deep weights in AWQ and GGUF formats, and instructions for running EXAONE Deep models locally using frameworks like llama.cpp and Ollama.
    Starting Price: Free
  • 3
    Devstral

    Devstral

    Mistral AI

    Devstral is an open source, agentic large language model (LLM) developed by Mistral AI in collaboration with All Hands AI, specifically designed for software engineering tasks. It excels at navigating complex codebases, editing multiple files, and resolving real-world issues, outperforming all open source models on the SWE-Bench Verified benchmark with a score of 46.8%. Devstral is fine-tuned from Mistral-Small-3.1 and features a long context window of up to 128,000 tokens. It is optimized for local deployment on high-end hardware, such as a Mac with 32GB RAM or an Nvidia RTX 4090 GPU, and is compatible with inference frameworks like vLLM, Transformers, and Ollama. Released under the Apache 2.0 license, Devstral is available for free and can be accessed via Hugging Face, Ollama, Kaggle, Unsloth, and LM Studio.
    Starting Price: $0.1 per million input tokens
  • 4
    Llama 2
    The next generation of our open source large language model. This release includes model weights and starting code for pretrained and fine-tuned Llama language models — ranging from 7B to 70B parameters. Llama 2 pretrained models are trained on 2 trillion tokens, and have double the context length than Llama 1. Its fine-tuned models have been trained on over 1 million human annotations. Llama 2 outperforms other open source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests. Llama 2 was pretrained on publicly available online data sources. The fine-tuned model, Llama-2-chat, leverages publicly available instruction datasets and over 1 million human annotations. We have a broad range of supporters around the world who believe in our open approach to today’s AI — companies that have given early feedback and are excited to build with Llama 2.
    Starting Price: Free
  • 5
    Code Llama
    Code Llama is a large language model (LLM) that can use text prompts to generate code. Code Llama is state-of-the-art for publicly available LLMs on code tasks, and has the potential to make workflows faster and more efficient for current developers and lower the barrier to entry for people who are learning to code. Code Llama has the potential to be used as a productivity and educational tool to help programmers write more robust, well-documented software. Code Llama is a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural language prompts. Code Llama is free for research and commercial use. Code Llama is built on top of Llama 2 and is available in three models: Code Llama, the foundational code model; Codel Llama - Python specialized for Python; and Code Llama - Instruct, which is fine-tuned for understanding natural language instructions.
    Starting Price: Free
  • 6
    Gemma 3

    Gemma 3

    Google

    Gemma 3, introduced by Google, is a new AI model built on the Gemini 2.0 architecture, designed to offer enhanced performance and versatility. This model is capable of running efficiently on a single GPU or TPU, making it accessible for a wide range of developers and researchers. Gemma 3 focuses on improving natural language understanding, generation, and other AI-driven tasks. By offering scalable, powerful AI capabilities, Gemma 3 aims to advance the development of AI systems across various industries and use cases.
    Starting Price: Free
  • 7
    Gemma 3n

    Gemma 3n

    Google DeepMind

    Gemma 3n is our state-of-the-art open multimodal model, engineered for on-device performance and efficiency. Made for responsive, low-footprint local inference, Gemma 3n empowers a new wave of intelligent, on-the-go applications. It analyzes and responds to combined images and text, with video and audio coming soon. Build intelligent, interactive features that put user privacy first and work reliably offline. Mobile-first architecture, with a significantly reduced memory footprint. Co-designed by Google's mobile hardware teams and industry leaders. 4B active memory footprint with the ability to create submodels for quality-latency tradeoffs. Gemma 3n is our first open model built on this groundbreaking, shared architecture, allowing developers to begin experimenting with this technology today in an early preview.
  • Previous
  • You're on page 1
  • Next