• Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    bitnet.cpp

    bitnet.cpp

    Official inference framework for 1-bit LLMs

    ...The project’s focus on extreme quantization dramatically reduces memory footprint and energy consumption compared with traditional 16-bit or 32-bit LLMs, making it practical to deploy advanced language understanding and generation models on everyday machines. BitNet is built to scale across architectures, with configurable kernels and tiling strategies that adapt to different hardware, and it supports large models with impressive throughput even on modest resources.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    Z80-μLM

    Z80-μLM

    Z80-μLM is a 2-bit quantized language model

    Z80-μLM is a retro-computing AI project that demonstrates a tiny language model (Z80-μLM) engineered to run on an 8-bit Z80 CPU by aggressively quantizing weights down to 2-bit precision. The repository provides a complete workflow where you train or fine-tune conversational models in Python, then export them into a format that can be executed on classic Z80 systems. A key deliverable is producing CP/M-compatible .COM binaries, enabling a genuinely vintage “chat with your computer” experience on real hardware or accurate emulators. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    AIMET

    AIMET

    AIMET is a library that provides advanced quantization and compression

    ...Quantized inference is significantly faster than floating point inference. For example, models that we’ve run on the Qualcomm® Hexagon™ DSP rather than on the Qualcomm® Kryo™ CPU have resulted in a 5x to 15x speedup. Plus, an 8-bit model also has a 4x smaller memory footprint relative to a 32-bit model. However, often when quantizing a machine learning model (e.g., from 32-bit floating point to an 8-bit fixed point value), the model accuracy is sacrificed.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 4
    Text Generation Web UI

    Text Generation Web UI

    Oobabooga - The definitive Web UI for local AI, with powerful features

    ...Markdown output for GALACTICA, including LaTeX rendering. Custom chat characters. Advanced chat features (send images, get audio responses with TTS). Very efficient text streaming. Parameter presets, 8-bit mode. Layers splitting across GPU(s), CPU, and disk. CPU mode, FlexGen, DeepSpeed ZeRO-3, API with streaming and without streaming. LLaMA model, including 4-bit GPTQ. RWKV model, LoRA (loading and training), Softprompts, and extensions.
    Downloads: 29 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 5
    CogVLM

    CogVLM

    A state-of-the-art open visual language model

    ...The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490×490 inputs; CogAgent-18B extends this to 1120×1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks. The repo provides multiple ways to run models (CLI, web demo, and OpenAI-Vision–style APIs), along with quantization options that reduce VRAM needs (e.g., 4-bit). It includes checkpoints for chat, base, and grounding variants, plus recipes for model-parallel inference and LoRA fine-tuning. The documentation covers task prompts for general dialogue, visual grounding (box→caption, caption→box, caption+boxes), and GUI agent workflows that produce structured actions with bounding boxes.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    bitsandbytes

    bitsandbytes

    Accessible large language models via k-bit quantization for PyTorch

    bitsandbytes is an open-source library designed to make training and inference of large neural networks more efficient by dramatically reducing memory usage. Built primarily for the PyTorch ecosystem, the library introduces advanced quantization techniques that allow models to operate using reduced numerical precision while maintaining high accuracy. These optimizations enable large language models and other deep learning architectures to run on hardware with limited memory resources,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    AutoGPTQ

    AutoGPTQ

    An easy-to-use LLMs quantization package with user-friendly apis

    AutoGPTQ is an implementation of GPTQ (Quantized GPT) that optimizes large language models (LLMs) for faster inference by reducing their computational footprint while maintaining accuracy.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    Transformer Engine

    Transformer Engine

    A library for accelerating Transformer models on NVIDIA GPUs

    Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference. TE provides a collection of highly optimized building blocks for popular Transformer architectures and an automatic mixed precision-like API that can be used seamlessly with your framework-specific code.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 9
    VibeVoice ComfyUI

    VibeVoice ComfyUI

    ComfyUI integration for Microsoft's VibeVoice text-to-speech model

    VibeVoice ComfyUI is a comprehensive wrapper that integrates Microsoft’s VibeVoice text-to-speech models directly into ComfyUI workflows. It exposes VibeVoice as a set of custom nodes so you can build single-speaker and multi-speaker voice generation pipelines visually, combining TTS with other audio or generative components. The integration supports high-quality single-speaker synthesis as well as scripted multi-speaker conversations, with optional voice cloning from audio samples for each...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 10
    Curated Transformers

    Curated Transformers

    PyTorch library of curated Transformer models and their components

    ...Supports state-of-the-art transformer models, including LLMs such as Falcon, Llama, and Dolly v2. Implementing a feature or bugfix benefits all models. For example, all models support 4/8-bit inference through the bitsandbytes library and each model can use the PyTorch meta device to avoid unnecessary allocations and initialization.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    Papermerge

    Papermerge

    Open Source Document Management System for Digital Archives

    Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full text, tags and metadata-based search. Papermerge is free and...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 12
    SageAttention

    SageAttention

    NeurIPS2025 Spotlight] Quantized Attention

    SageAttention is an open-source optimization library designed to accelerate the attention mechanism used in transformer-based neural networks. Since attention operations are often the most computationally expensive component of modern AI models, SageAttention introduces quantization techniques that significantly reduce computational overhead while preserving model accuracy. The system achieves this by using low-precision numerical formats such as INT4, FP8, or INT8 to represent key matrices...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    Watermark Anything

    Watermark Anything

    Official implementation of Watermark Anything with Localized Messages

    Watermark Anything (WAM) is an advanced deep learning framework for embedding and detecting localized watermarks in digital images. Developed by Facebook Research, it provides a robust, flexible system that allows users to insert one or multiple watermarks within selected image regions while maintaining visual quality and recoverability. Unlike traditional watermarking methods that rely on uniform embedding, WAM supports spatially localized watermarks, enabling targeted protection of...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    Ludwig AI

    Ludwig AI

    Low-code framework for building custom LLMs, neural networks

    ...Comprehensive config validation detects invalid parameter combinations and prevents runtime failures. Automatic batch size selection, distributed training (DDP, DeepSpeed), parameter efficient fine-tuning (PEFT), 4-bit quantization (QLoRA), and larger-than-memory datasets. Retain full control of your models down to the activation functions. Support for hyperparameter optimization, explainability, and rich metric visualizations. Experiment with different model architectures, tasks, features, and modalities with just a few parameter changes in the config. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    FastDeploy

    FastDeploy

    High-performance Inference and Deployment Toolkit for LLMs and VLMs

    FastDeploy is an open-source inference and deployment toolkit designed to simplify the process of running and serving deep learning models across a wide range of hardware platforms. Developed within the PaddlePaddle ecosystem, the toolkit focuses on providing high-performance deployment capabilities for modern AI models including large language models and vision-language systems. The platform enables developers to deploy trained models quickly using optimized inference pipelines that support...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 16
    ChatGLM3

    ChatGLM3

    ChatGLM3 series: Open Bilingual Chat LLMs | Open Source Bilingual Chat

    ...The family includes base and long-context variants (8K/32K/128K). The repo ships Python APIs, CLI and web demos (Gradio/Streamlit), an OpenAI-format API server, and a compact fine-tuning kit. Quantization (4/8-bit), CPU/MPS support, and accelerator backends (TensorRT-LLM, OpenVINO, chatglm.cpp) enable lightweight local or edge deployment.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    TurboDiffusion

    TurboDiffusion

    100–200× Acceleration for Video Diffusion Models

    TurboDiffusion is an advanced open-source framework designed to dramatically accelerate video diffusion model generation, aiming for performance improvements on the order of 100–200× compared with traditional implementations while retaining high output quality. It achieves this by combining a suite of algorithmic and engineering optimizations, including attention acceleration techniques, efficient step distillation methods, and quantization strategies that reduce computational overhead. The...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    FastChat

    FastChat

    Open platform for training, serving, and evaluating language models

    ...This requires 8-bit compression to be enabled and the bitsandbytes package to be installed, which is only available on linux operating systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    LLaVA

    LLaVA

    Visual Instruction Tuning: Large Language-and-Vision Assistant

    Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    Recurrent Interface Network (RIN)

    Recurrent Interface Network (RIN)

    Implementation of Recurrent Interface Network (RIN)

    ...The author unawaredly reinvented the induced set-attention block from the set transformers paper. They also combine this with the self-conditioning technique from the Bit Diffusion paper, specifically for the latents. The last ingredient seems to be a new noise function based around the sigmoid, which the author claims is better than cosine scheduler for larger images. The big surprise is that the generations can reach this level of fidelity. Will need to verify this on my own machine. Additionally, we will try adding an extra linear attention on the main branch as well as self-conditioning in the pixel space. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    EasyTTS

    EasyTTS

    Text to Speech Utility

    EasyTTS is a text to speech app for 64 bit Windows that offers online and offline text-to-speech, with settings for how fast the voice is. It supports languages other than English but only if you are connected to the Internet. These are Spanish, Portuguese, Russian, French, and Mandarin (?) Chinese.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 23
    ollama_manager_gui

    ollama_manager_gui

    A graphical manager for ollama that can manage your LLMs

    This app will help install ollama and LLMs using the gui provided by this app. It checks for ollama when launched and if it doesn't exist it will help by bringing you to the ollama site for download. This app is heavily upgraded and now also works properly on Linux. It now has progress bars and many many many improvements. It can launch the LLM by clicking the link. it can launch multiple LLMs in separate windows. It can also remove an installed LLM. There is a confirmation...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    llama2-webui

    llama2-webui

    Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere

    Running Llama 2 with gradio web UI on GPU or CPU from anywhere (Linux/Windows/Mac).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Basaran

    Basaran

    Basaran, an open-source alternative to the OpenAI text completion API

    ...Stream generation using various decoding strategies. Support both decoder-only and encoder-decoder models. Detokenizer that handles surrogates and whitespace. Multi-GPU support with optional 8-bit quantization. Real-time partial progress using server-sent events. Compatible with OpenAI API and client libraries. Comes with a fancy web-based playground. Docker images are available on Docker Hub and GitHub Packages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB