Showing 75 open source projects for "llama-cpp-python.whl"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Cloud-based help desk software with ServoDesk Icon
    Cloud-based help desk software with ServoDesk

    Full access to Enterprise features. No credit card required.

    What if You Could Automate 90% of Your Repetitive Tasks in Under 30 Days? At ServoDesk, we help businesses like yours automate operations with AI, allowing you to cut service times in half and increase productivity by 25% - without hiring more staff.
    Try ServoDesk for free
  • 1
    bert4torch

    bert4torch

    An elegent pytorch implement of transformers

    An elegant PyTorch implement of transformers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Jan.ai

    Jan.ai

    Open source alternative to ChatGPT that runs 100% offline

    ...It allows you to download and run LLMs (local language models) offline while also offering optional integration with cloud-based model providers—giving you full control over your data and AI interactions. Download and run LLMs (Llama, Gemma, Qwen, GPT-oss etc.) from HuggingFace. Connect to GPT models via OpenAI, Claude models via Anthropic, Mistral, Groq, and others. Create specialized AI assistants for your tasks. MCP integration for agentic capabilities.
    Downloads: 30 This Week
    Last Update:
    See Project
  • 3
    DeepSeek R1

    DeepSeek R1

    Open-source, high-performance AI model with advanced reasoning

    ...This approach has resulted in performance comparable to leading models like OpenAI's o1, while maintaining cost-efficiency. To further support the research community, DeepSeek has released distilled versions of the model based on architectures such as LLaMA and Qwen.
    Downloads: 68 This Week
    Last Update:
    See Project
  • 4
    Orpheus TTS

    Orpheus TTS

    Towards Human-Sounding Speech

    Orpheus TTS is a state-of-the-art open-source text-to-speech system built on a Llama-3B backbone, treating speech synthesis as a large language model problem instead of a traditional TTS pipeline. It is designed to produce human-like speech with natural intonation, emotion, and rhythm, targeting quality comparable to or better than many closed-source systems. The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research preview, and includes data-processing scripts so users can train or finetune their own variants. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Business Automation Software for SMBs Icon
    Business Automation Software for SMBs

    Fed up with not having the time, money and resources to grow your business?

    The only software you need to increase cash flow, optimize resource utilization, and take control of your assets and inventory.
    Learn More
  • 5
    GPT4All

    GPT4All

    Run Local LLMs on Any Device. Open-source

    GPT4All is an open-source project that allows users to run large language models (LLMs) locally on their desktops or laptops, eliminating the need for API calls or GPUs. The software provides a simple, user-friendly application that can be downloaded and run on various platforms, including Windows, macOS, and Ubuntu, without requiring specialized hardware. It integrates with the llama.cpp implementation and supports multiple LLMs, allowing users to interact with AI models privately. This...
    Downloads: 107 This Week
    Last Update:
    See Project
  • 6
    llamafile

    llamafile

    Distribute and run LLMs with a single file

    ...We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation. The easiest way to try it for yourself is to download our example llamafile for the LLaVA model (license: LLaMA 2, OpenAI). LLaVA is a new LLM that can do more than just chat; you can also upload images and ask it questions about them. With llamafile, this all happens locally; no data ever leaves your computer.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 7
    LocalAI

    LocalAI

    Self-hosted, community-driven, local OpenAI compatible API

    ...Drop-in replacement for OpenAI running LLMs on consumer-grade hardware. Free Open Source OpenAI alternative. No GPU is required. Runs ggml, GPTQ, onnx, TF compatible models: llama, gpt4all, rwkv, whisper, vicuna, koala, gpt4all-j, cerebras, falcon, dolly, starcoder, and many others. LocalAI is a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs (and not only) locally or on-prem with consumer-grade hardware, supporting multiple model families that are compatible with the ggml format. ...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 8
    SillyTavern

    SillyTavern

    LLM Frontend for Power Users

    Mobile-friendly, Multi-API (KoboldAI/CPP, Horde, NovelAI, Ooba, OpenAI, OpenRouter, Claude, Scale), VN-like Waifu Mode, Horde SD, System TTS, WorldInfo (lorebooks), customizable UI, auto-translate, and more prompt options than you'd ever want or need. Optional Extras server for more SD/TTS options + ChromaDB/Summarize. SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. ...
    Downloads: 152 This Week
    Last Update:
    See Project
  • 9
    Axolotl

    Axolotl

    Go ahead and axolotl questions

    Axolotl is a powerful and flexible framework for fine-tuning large language models on custom datasets. Built for researchers and developers, Axolotl simplifies the process of adapting LLMs for specific tasks, including chat, code generation, and instruction following. It supports a wide variety of model architectures and offers out-of-the-box optimization strategies for efficient training.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Smart Business Texting that Generates Pipeline Icon
    Smart Business Texting that Generates Pipeline

    Create and convert pipeline at scale through industry leading SMS campaigns, automation, and conversation management.

    TextUs is the leading text messaging service provider for businesses that want to engage in real-time conversations with customers, leads, employees and candidates. Text messaging is one of the most engaging ways to communicate with customers, candidates, employees and leads. 1:1, two-way messaging encourages response and engagement. Text messages help teams get 10x the response rate over phone and email. Business text messaging has become a more viable form of communication than traditional mediums. The TextUs user experience is intentionally designed to resemble the familiar SMS inbox, allowing users to easily manage contacts, conversations, and campaigns. Work right from your desktop with the TextUs web app or use the Chrome extension alongside your ATS or CRM. Leverage the mobile app for on-the-go sending and responding.
    Learn More
  • 10
    LazyLLM

    LazyLLM

    Easiest and laziest way for building multi-agent LLMs applications

    LazyLLM is an optimized, lightweight LLM server designed for easy and fast deployment of large language models. It is fully compatible with the OpenAI API specification, enabling developers to integrate their own models into applications that normally rely on OpenAI’s endpoints. LazyLLM emphasizes low resource usage and fast inference while supporting multiple models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Lepton AI

    Lepton AI

    A Pythonic framework to simplify AI service building

    A Pythonic framework to simplify AI service building. Cutting-edge AI inference and training, unmatched cloud-native experience, and top-tier GPU infrastructure. Ensure 99.9% uptime with comprehensive health checks and automatic repairs.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    llama2-webui

    llama2-webui

    Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere

    Running Llama 2 with gradio web UI on GPU or CPU from anywhere (Linux/Windows/Mac).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    SGLang

    SGLang

    SGLang is a fast serving framework for large language models

    SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    OpenLLM

    OpenLLM

    Operating LLMs in production

    ...With OpenLLM, you can run inference with any open-source large-language models, deploy to the cloud or on-premises, and build powerful AI apps. Built-in supports a wide range of open-source LLMs and model runtime, including Llama 2, StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more. Serve LLMs over RESTful API or gRPC with one command, query via WebUI, CLI, our Python/Javascript client, or any HTTP client.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    BrowserAI

    BrowserAI

    Run local LLMs like llama, deepseek, kokoro etc. inside your browser

    BrowserAI is a cutting-edge platform that allows users to run large language models (LLMs) directly in their web browser without the need for a server. It leverages WebGPU for accelerated performance and supports offline functionality, making it a highly efficient and privacy-conscious solution. The platform provides a developer-friendly SDK with pre-configured popular models, and it allows for seamless switching between MLC and Transformer engines. Additionally, it supports features such as...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    promptfoo

    promptfoo

    Evaluate and compare LLM outputs, catch regressions, improve prompts

    ...Use built-in metrics, LLM-graded evals, or define your own custom metrics. Compare prompts and model outputs side-by-side, or integrate the library into your existing test/CI workflow. Use OpenAI, Anthropic, and open-source models like Llama and Vicuna, or integrate custom API providers for any LLM API.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    h2oGPT

    h2oGPT

    Private chat with local GPT with document, images, video, etc.

    h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18
    Pruna AI

    Pruna AI

    Pruna is a model optimization framework built for developers

    Pruna is an open-source, self-hostable AI inference engine designed to help teams deploy and manage large language models (LLMs) efficiently across private or hybrid infrastructures. Built with performance and developer ergonomics in mind, Pruna simplifies inference workflows by enabling multi-model orchestration, autoscaling, GPU resource allocation, and compatibility with popular open-source models. It is ideal for companies or teams looking to reduce reliance on external APIs while...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Curated Transformers

    Curated Transformers

    PyTorch library of curated Transformer models and their components

    ...It provides state-of-the-art models that are composed of a set of reusable components. Supports state-of-the-art transformer models, including LLMs such as Falcon, Llama, and Dolly v2. Implementing a feature or bugfix benefits all models. For example, all models support 4/8-bit inference through the bitsandbytes library and each model can use the PyTorch meta device to avoid unnecessary allocations and initialization.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Agents-Flex

    Agents-Flex

    Agents-Flex is an elegant LLM Application Framework like LangChain

    Agents-Flex includes a variety of network protocols for connecting LLMs, such as HTTP, SSE and WS. Its simple and flexible design allows developers to easily connect to various LLMs, including OpenAI, LLama, and other AI. Agents-Flex provides a rich set of development templates and Prompt Frameworks, including FEW-SHOT, CRISPE, BROKE, and ICIO. Developers can also customize their own unique prompt templates. Agents-Flex has a very flexible Function Calling component. It supports local method definitions, parsing, callbacks through LLMs, and executing local methods to obtain results. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    Speech-AI-Forge

    Speech-AI-Forge

    Speech-AI-Forge is a project developed around TTS generation model

    ...It is model-agnostic and advertises support for a variety of TTS and speech models such as ChatTTS, CosyVoice, Fish-Speech, FireredTTS and others, as well as Whisper-based ASR, giving you a flexible playground for experimenting with different speech stacks. The project also integrates with general-purpose LLMs (for example GPT- or LLaMA-style models), which can be used to pre-process text, manage conversations.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    MCP Hub

    MCP Hub

    An MCP client for Neovim that seamlessly integrates MCP servers

    mcphub.nvim is an MCP (Model Context Protocol) client plugin for Neovim that seamlessly integrates MCP servers into your editing workflow with an intuitive interface for managing, testing, and using MCP servers with your favorite chat plugins. Create your first MCP capable agent you need only 6 lines of code. Works with any langchain-supported LLM that supports tool calling (OpenAI, Anthropic, Groq, LLama etc.) Explore MCP capabilities and generate starter code with the interactive code builder. An MCP client for Neovim that seamlessly integrates MCP servers into your editing workflow with an intuitive interface for managing, testing, and using MCP servers with your favorite chat plugins.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    CogVLM2

    CogVLM2

    GPT4V-level open-source multi-modal model based on Llama3-8B

    CogVLM2 is the second generation of the CogVLM vision-language model series, developed by ZhipuAI and released in 2024. Built on Meta-Llama-3-8B-Instruct, CogVLM2 significantly improves over its predecessor by providing stronger performance across multimodal benchmarks such as TextVQA, DocVQA, and ChartQA, while introducing extended context length support of up to 8K tokens and high-resolution image input up to 1344×1344. The series includes models for both image understanding and video understanding, with CogVLM2-Video supporting up to 1-minute videos by analyzing keyframes. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Groq AppGen

    Groq AppGen

    Project showcasing Llama 3.3 70B HTML codegen abilities

    Groq AppGen is an interactive web application (built with Next.js and TypeScript) that uses Groq’s LLM API to generate or modify web application code based on natural-language prompts. Essentially, you tell the app what kind of web app or page you want (in plain English), and groq-appgen will produce HTML/JSX code scaffolding, layout, and optionally application logic accordingly. It supports iterative feedback: you can refine your prompt, adjust parameters or requirements, and have the app...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 8 This Week
    Last Update:
    See Project