Showing 51 open source projects for "llama.cpp"

  • 1
    llama.cpp

    Port of Facebook's LLaMA model in C/C++

    The llama.cpp project enables the inference of Meta's LLaMA model (and other models) in pure C/C++ without requiring a Python runtime. It is designed for efficient and fast model execution, offering easy integration for applications needing LLM-based capabilities. The repository focuses on providing a highly optimized and portable implementation for running large language models directly within C/C++ environments.
    Downloads: 249 This Week
  • 2
    llama.cpp

    LLM inference in C/C++

    llama.cpp is a high-performance C and C++ project for running large language models locally and in the cloud with minimal setup. It is built around efficient inference, broad hardware support, and the GGUF model format. The project supports many model families and has become a major foundation for local AI tools, model serving, and embedded inference workflows.
    Downloads: 10 This Week
  • 3
    llama.cpp Python Bindings

    Python bindings for llama.cpp

    llama-cpp-python provides Python bindings for llama.cpp, enabling the integration of LLaMA (Large Language Model Meta AI) language models into Python applications. This facilitates the use of LLaMA's capabilities in natural language processing tasks within Python environments.
    Downloads: 7 This Week
  • 4
    Maid

    Maid is a cross-platform Flutter app for interfacing with GGUF

    Maid is a free and open source cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama, Mistral, Google Gemini, and OpenAI models remotely. Maid supports SillyTavern character cards, so you can interact with all your favorite characters.
    Downloads: 33 This Week
  • 5
    LLamaSharp

    C#/.NET binding of llama.cpp, including LLaMa/GPT model inference

    The C#/.NET binding of llama.cpp. It provides APIs to run inference on LLaMA models and deploy them in a local environment. It works on Windows, Linux, and macOS without requiring you to compile llama.cpp yourself. Its performance is close to that of llama.cpp. It also integrates with other projects such as BotSharp to provide higher-level applications and UI.
    Downloads: 1 This Week
  • 7
    GPT4All

    Run Local LLMs on Any Device. Open-source

    ...The software provides a simple, user-friendly application that can be downloaded and run on various platforms, including Windows, macOS, and Ubuntu, without requiring specialized hardware. It integrates with the llama.cpp implementation and supports multiple LLMs, allowing users to interact with AI models privately. This project also supports Python integrations for easy automation and customization. GPT4All is ideal for individuals and businesses seeking private, offline access to powerful LLMs.
    Downloads: 105 This Week
  • 8
    OpenJarvis

    Personal AI, On Personal Devices

    ...The framework provides shared primitives for building local-first agents, along with evaluation tools that measure performance using metrics such as energy consumption, latency, cost, and accuracy. OpenJarvis integrates with local inference engines like Ollama, vLLM, SGLang, and llama.cpp to run language models directly on personal hardware. It also includes a learning loop that allows models to improve over time using locally generated interaction traces. By prioritizing local execution and efficiency, OpenJarvis aims to provide a foundation for privacy-preserving personal AI assistants.
    Downloads: 115 This Week
  • 9
    wllama

    WebAssembly binding for llama.cpp - Enabling on-browser LLM inference

    wllama is a WebAssembly-based library that enables large language model inference directly inside a web browser. Built as a binding for the llama.cpp inference engine, the project allows developers to run LLM models locally without requiring a server backend or dedicated GPU hardware. The library leverages WebAssembly SIMD capabilities to achieve efficient execution within modern browsers while maintaining compatibility across platforms. By running models locally on the user’s device, wllama enables privacy-preserving AI applications that do not require sending data to remote servers. ...
    Downloads: 0 This Week
  • 10
    Clippy

    Clippy, now with some AI

    ...The project serves as both a playful homage to the early days of personal computing and a practical demonstration of local AI inference. Clippy integrates with the llama.cpp runtime to run models directly on a user’s computer without requiring cloud-based AI services. It supports models in the GGUF format, which allows it to run many publicly available open-source LLMs efficiently on consumer hardware. Users interact with the system through a simple animated assistant interface that can answer questions, generate text, and perform conversational tasks. ...
    Downloads: 31 This Week
  • 11
    LocalAI

    The free, Open Source alternative to OpenAI, Claude and others

    ...LocalAI can run on consumer-grade hardware and does not necessarily require a GPU, making it accessible for local development and private deployments. It integrates with multiple backends like llama.cpp, transformers, and diffusers to support different AI workloads. With its self-hosted architecture and OpenAI-compatible API, LocalAI enables developers to build secure, local-first AI applications.
    Downloads: 31 This Week
  • 12
    OpenMonoAgent

    Terminal-native coding agent powered by local LLMs

    OpenMonoAgent.ai is a self-hosted coding agent designed to run entirely on the user’s own hardware. It pairs a .NET CLI with a local llama.cpp inference server so developers can use agentic coding workflows without cloud subscriptions or per-token billing. The project emphasizes privacy, local control, and ownership of the model, compute, and project data. It includes a terminal-native workflow, built-in tools, Docker sandboxing, and code intelligence features. The system can run on CPU or GPU and is designed to auto-configure itself when possible. ...
    Downloads: 6 This Week
  • 13
    OuteTTS

    Interface for OuteTTS models

    ...It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation so you can focus on integrating speech into your application rather than wiring up low-level engines. The project supports multiple backends including llama.cpp (Python bindings and server), Hugging Face Transformers, ExLlamaV2, vLLM, and a JavaScript interface via Transformers.js, allowing it to run on CPUs, NVIDIA CUDA GPUs, AMD ROCm, Vulkan-capable GPUs, and Apple Metal. It also includes a notion of speaker profiles: you can create a speaker from a short audio sample, save it as JSON, and reuse it for consistent voice identity across generations and sessions. ...
    Downloads: 0 This Week
  • 14
    llama.vscode

    VS Code extension for LLM-assisted code/text completion

    llama.vscode is a Visual Studio Code extension that provides AI-assisted coding features powered primarily by locally running language models. The extension is designed to be lightweight and efficient, enabling developers to use AI tools even on consumer-grade hardware. It integrates with the llama.cpp runtime to run language models locally, eliminating the need to rely entirely on external APIs or cloud providers. The extension supports common AI development features such as code completion, conversational chat assistance, and AI-assisted code editing directly within the IDE. Developers can select and manage models through a configuration interface that automatically downloads and runs the required models locally. ...
    Downloads: 3 This Week
  • 15
    Paddler

    Open-source LLM load balancer and serving platform for hosting LLMs

    ...The system acts as a specialized load balancer and serving layer for language models, enabling organizations to run inference workloads without relying on external API providers. It supports running models locally through engines such as llama.cpp while distributing requests across multiple compute nodes to improve performance and reliability. The architecture is designed with privacy and cost control in mind, making it suitable for organizations that handle sensitive data or require predictable operational costs. Paddler also includes tools for monitoring, request buffering, and autoscaling integration so that deployments can adapt dynamically to changing workloads. ...
    Downloads: 2 This Week
  • 16
    CoPaw

    Your Personal AI Assistant; easy to install, deploy on local or cloud

    ...Built by the AgentScope team, it connects to multiple chat platforms, including DingTalk, Feishu, QQ, Discord, iMessage, and more, through a single unified assistant. CoPaw supports both cloud-based LLM providers and fully local model backends such as llama.cpp, MLX, and Ollama, allowing you to operate without API keys if preferred. It includes a browser-based Console for chatting, configuring models, managing memory, and extending capabilities with custom skills. With built-in cron scheduling, heartbeat check-ins, and extensible skill loading, CoPaw grows with your workflow over time. ...
    Downloads: 12 This Week
  • 17
    mac code

    Claude Code, but it runs on your Mac for free

    ...It operates as a CLI-based assistant that routes user prompts into different execution paths such as chat, shell commands, or web search, functioning as a multi-purpose development agent. The system integrates with inference engines like llama.cpp and Apple’s MLX framework, allowing users to run models up to 35B parameters locally with varying performance trade-offs.
    Downloads: 0 This Week
  • 18
    llama.vim

    Vim plugin for LLM-assisted code/text completion

    ...The plugin enables developers to access AI-assisted text and code completion features without leaving their terminal-based development environment. Instead of relying on remote AI services, the plugin is designed to work with locally running LLM inference engines such as llama.cpp. This approach allows developers to benefit from AI-assisted coding features while maintaining full control over their data and avoiding external API dependencies. The plugin focuses on simplicity and performance, providing fast completions and editing assistance even on consumer-grade hardware. By integrating AI functionality directly into Vim workflows, the tool enables developers to write and edit code more efficiently while staying within a familiar development interface.
    Downloads: 0 This Week
  • 19
    FreedomGPT

    React and Electron-based app that executes the FreedomGPT LLM locally

    ...The app's setup is simple, and it includes clear installation guides for both macOS and Windows platforms, as well as detailed instructions for building necessary libraries like llama.cpp.
    Downloads: 14 This Week
  • 20
    llamafile

    Distribute and run LLMs with a single file

    llamafile lets you distribute and run LLMs with a single file. Our goal is to make open LLMs much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation. The easiest way to try it for yourself is to download our example llamafile for the LLaVA model (license: LLaMA 2, OpenAI). LLaVA is a new LLM that can do more than just chat; you can also upload images and ask it questions about them. ...
    Downloads: 45 This Week
  • 21
    node-llama-cpp

    Run AI models locally on your machine with node.js bindings for llama

    node-llama-cpp is a JavaScript and Node.js binding that allows developers to run large language models locally using the high-performance inference engine provided by llama.cpp. The library enables applications built with Node.js to interact directly with local LLM models without requiring a remote API or external service. By using native bindings and optimized model execution, the framework allows developers to integrate advanced language model capabilities into desktop applications, server software, and command-line tools. ...
    Downloads: 0 This Week
  • 22
    Harbor LLM

    Run a full local LLM stack with one command using Docker

    ...With a single command, users can start preconfigured tools like Ollama and Open WebUI, enabling chat, workflows, and integrations immediately. Harbor supports multiple inference engines, including llama.cpp and vLLM, and connects them seamlessly to user interfaces. It also includes tools for web retrieval, image generation, voice interaction, and workflow automation. Built on Docker, Harbor allows services to run in isolated containers while communicating over a local network. It is intended for local development and experimentation rather than production deployment, giving developers a flexible way to explore AI systems, test configurations, and manage complex LLM stacks without manual wiring or setup overhead.
    Downloads: 0 This Week
  • 23
    qvac-fabric-llm.cpp

    QVAC Fabric: cross-platform LLM inference and fine-tuning

    qvac-fabric-llm.cpp is a cross-platform large language model inference and fine-tuning engine built as an advanced fork of llama.cpp, designed to run efficiently across desktops, mobile devices, and heterogeneous GPU environments. The project focuses on removing hardware limitations traditionally associated with LLM deployment by enabling support for a wide range of backends, including Vulkan, Metal, CUDA, and CPU, making it accessible on devices ranging from smartphones to enterprise servers. ...
    Downloads: 0 This Week
  • 24
    DevoxxGenie

    DevoxxGenie is a plugin for IntelliJ IDEA that uses local LLMs

    Devoxx Genie is a fully Java-based LLM Code Assistant plugin for IntelliJ IDEA. It is designed to integrate with local LLM providers such as Ollama, LMStudio, GPT4All, llama.cpp, and Exo, as well as cloud-based LLMs such as OpenAI, Anthropic, Mistral, Groq, Gemini, DeepInfra, DeepSeek, OpenRouter, and Azure OpenAI.
    Downloads: 1 This Week
  • 25
    MindWork AI Studio

    Independent cross-platform desktop app for local and cloud LLMs

    ...It is built with a strong focus on accessibility and democratization, enabling users to run AI workflows even on low-cost hardware while maintaining flexibility in choosing providers such as OpenAI, Gemini, Anthropic, and self-hosted solutions like Ollama or llama.cpp. The platform introduces a concept of “assistants,” which abstract prompting into reusable tools for tasks like translation, summarization, or document analysis, making it easier for non-technical users to leverage AI capabilities. It also incorporates advanced features such as retrieval-augmented generation, plugin extensibility, and support for multiple data sources, allowing users to integrate their own files and knowledge bases into conversations.
    Downloads: 7 This Week