Showing 689 open source projects for "ai model"

  • 1
    Universal Commerce Protocol (UCP)

    The common language for platforms, agents and businesses.

    Universal Commerce Protocol (UCP) is an open standard designed to unify how platforms, businesses, and payment providers interact across the modern commerce ecosystem. It provides a common language that eliminates fragmented, custom integrations and enables seamless interoperability between diverse commerce systems. Built for an increasingly agentic web, UCP supports AI-driven platforms that can discover products, manage carts, and complete transactions securely on a user’s behalf. Its...
    Downloads: 3 This Week
  • 2
    Unsloth Studio

    Unified web UI for training and running open models locally

    Unsloth Studio is a web-based interface for running and training AI models locally with a unified and user-friendly experience. It allows users to work with a wide range of models for text, audio, vision, embeddings, and more without relying heavily on cloud infrastructure. Built on top of the Unsloth framework, it focuses on high-performance training with reduced VRAM usage and faster speeds compared to traditional methods. The platform supports fine-tuning, pretraining, and reinforcement...
    Downloads: 14 This Week
  • 3
    HunyuanVideo-Avatar

    Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model

    HunyuanVideo-Avatar is a multimodal diffusion transformer (MM-DiT) model by Tencent Hunyuan for animating static avatar images into dynamic, emotion-controllable, and multi-character dialogue videos, conditioned on audio. It addresses challenges of motion realism, identity consistency, and emotional alignment. Innovations include a character image injection module, an Audio Emotion Module for transferring emotion cues, and a Face-Aware Audio Adapter to isolate audio effects on faces,...
    Downloads: 0 This Week
  • 4
    Parallax

    Parallax is a distributed model serving framework

    Parallax is a decentralized inference framework designed to run large language models across distributed computing resources. Instead of relying on centralized GPU clusters in data centers, the system allows multiple heterogeneous machines to collaborate in serving AI inference workloads. Parallax divides model layers across different nodes and dynamically coordinates them to form a complete inference pipeline. A two-stage scheduling architecture determines how model layers are allocated to available hardware and how requests are routed across nodes during execution. This scheduling system optimizes latency, throughput, and hardware utilization even when nodes have different computational capabilities. ...
    Downloads: 3 This Week
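The layer-splitting idea in the Parallax description can be illustrated with a small sketch. This is not Parallax's actual scheduler or API: the function name `allocate_layers`, the node names, and the single "capacity" score per node are all assumptions made for illustration. It only shows the first scheduling stage described above (deciding which node hosts which layers), by giving each node a contiguous slice of layers roughly proportional to its capacity.

```python
from typing import Dict, Tuple


def allocate_layers(num_layers: int, capacities: Dict[str, float]) -> Dict[str, Tuple[int, int]]:
    """Give each node a contiguous [start, end) slice of model layers.

    Slice sizes are roughly proportional to each node's capacity score
    (a hypothetical number, e.g. relative throughput). A node with a very
    small score may receive an empty slice.
    """
    total = sum(capacities.values())
    if num_layers <= 0 or total <= 0:
        raise ValueError("num_layers and total capacity must be positive")

    allocation: Dict[str, Tuple[int, int]] = {}
    nodes = list(capacities.items())
    start = 0
    cumulative = 0.0
    for index, (node, capacity) in enumerate(nodes):
        cumulative += capacity
        if index == len(nodes) - 1:
            end = num_layers  # the last node always takes the remaining layers
        else:
            # Round the cumulative share so the slices tile the model exactly.
            end = round(num_layers * cumulative / total)
        allocation[node] = (start, end)
        start = end
    return allocation


# Three heterogeneous machines; "gpu-large" is assumed twice as capable.
plan = allocate_layers(32, {"gpu-large": 2.0, "gpu-small": 1.0, "laptop": 1.0})
for node, (first, last) in plan.items():
    print(f"{node}: layers {first}..{last - 1}")
```

A request would then pass through the nodes in slice order to form the full inference pipeline; routing between replicas, the second stage mentioned in the description, is left out of this sketch.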
  • 5
    Stable Diffusion WebUI Docker

    Easy Docker setup for Stable Diffusion with user-friendly UI

    Stable Diffusion WebUI Docker is a Docker-based repository that simplifies running Stable Diffusion with rich user interfaces by packaging multiple popular web UIs into an easy-to-deploy containerized solution. It integrates leading community UIs like AUTOMATIC1111 and ComfyUI into a Docker Compose setup that can be started with a single command, abstracting away dependency installation and environment configuration. Users can choose which UI profile they want to run — for example, full...
    Downloads: 8 This Week
  • 6
    verl-agent

    Designed for training LLM/VLM agents via RL

    verl-agent is an open-source reinforcement learning framework designed to train large language model agents and vision-language model agents for complex interactive environments. Built as an extension of the veRL reinforcement learning infrastructure, the project focuses on enabling scalable training for agents that perform multi-step reasoning and decision-making tasks. The framework supports multi-turn interactions between agents and their environments, allowing the system to receive...
    Downloads: 2 This Week
  • 7
    Qwen-VL

    Chat & pretrained large vision language model

    Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios. Qwen-VL supports multilingual inputs...
    Downloads: 5 This Week
  • 8
    CogVLM

    A state-of-the-art open visual language model

    CogVLM is an open-source visual–language model suite—and its GUI-oriented sibling CogAgent—aimed at image understanding, grounding, and multi-turn dialogue, with optional agent actions on real UI screenshots. The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490×490 inputs; CogAgent-18B extends this to 1120×1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks. The repo provides multiple ways to run models...
    Downloads: 2 This Week
  • 9
    VisualGLM-6B

    Chinese and English multimodal conversational language model

    VisualGLM-6B is an open-source multimodal conversational language model developed by ZhipuAI that supports both images and text in Chinese and English. It builds on the ChatGLM-6B backbone, with 6.2 billion language parameters, and incorporates a BLIP2-Qformer visual module to connect vision and language. In total, the model has 7.8 billion parameters. Trained on a large bilingual dataset — including 30 million high-quality Chinese image-text pairs from CogView and 300 million English pairs...
    Downloads: 2 This Week
  • 10
    ComfyUI-LTXVideo

    LTX-Video Support for ComfyUI

    ComfyUI-LTXVideo is a bridge between ComfyUI’s node-based generative workflow environment and the LTX-Video multimedia processing framework, enabling creators to orchestrate complex video tasks within a visual graph paradigm. Instead of writing code to apply effects, transitions, edits, and data flows, users can assemble nodes that represent video inputs, transformations, and outputs, letting them prototype and automate video production pipelines visually. This integration empowers...
    Downloads: 4 This Week
  • 11
    Qwen-Image-Layered

    Qwen-Image-Layered: Layered Decomposition for Inherent Editability

    Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image...
    Downloads: 17 This Week
  • 12
    InternLM-XComposer-2.5

    InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System

    InternLM-XComposer is an open-source multimodal AI system designed to generate long-form content that combines text with visual elements such as images and diagrams. The model is built on top of the InternLM language model architecture and extends its capabilities to handle multimodal inputs and outputs. Instead of producing only textual responses, the system can generate visually enriched documents such as illustrated articles, presentations, and educational materials. ...
    Downloads: 0 This Week
  • 13
    PAL MCP

    The power of Claude Code / GeminiCLI / CodexCLI

    PAL MCP is an open-source Model Context Protocol (MCP) server designed to act as a powerful middleware layer that connects AI clients and tools—like Claude Code, Codex CLI, Cursor, and IDE plugins—to a broad range of underlying AI models, enabling collaborative multi-model workflows rather than relying on a single model. It lets developers orchestrate interactions across multiple models (including Gemini, OpenAI, Grok, Azure, Ollama, OpenRouter, and custom/self-hosted models), preserving conversation context seamlessly as tasks evolve and substeps run across tools. ...
    Downloads: 0 This Week
  • 14
    Gemini Fullstack LangGraph Quickstart

    Get started w/ building Fullstack Agents using Gemini 2.5 & LangGraph

    ...It then iteratively refines its search until it produces a comprehensive, well-cited answer synthesized by the Gemini model. The repository provides both a browser-based chat interface and a command-line script (cli_research.py) for executing research queries directly. For production deployment, the backend integrates with Redis and PostgreSQL to manage persistent memory, streaming outputs, & background task coordination.
    Downloads: 2 This Week
  • 15
    Pal

    A personal context-agent that learns how you work

    Pal is an open-source AI personal agent built within the Agno ecosystem that functions as an intelligent digital assistant designed to learn from user activity over time. The system acts as an AI-powered “second brain” capable of capturing, organizing, and retrieving personal knowledge such as notes, bookmarks, research findings, people, and meeting information.
    Downloads: 1 This Week
  • 16
    Aden Hive

    Outcome driven agent development framework that evolves

    Hive is an open-source agent development framework that helps developers build autonomous, reliable, self-improving AI agents by letting them describe goals in ordinary natural language instead of hand-coding detailed workflows. Rather than manually defining execution graphs, Hive’s coding agent generates the agent graph, connection code, and test cases based on your high-level objectives, enabling outcome-driven agent creation that fits real business processes. Once deployed, agents can...
    Downloads: 11 This Week
  • 17
    Fara-7B

    An Efficient Agentic Model for Computer Use

    Fara-7B is Microsoft's 7-billion-parameter agentic small language model built for computer use. Rather than only generating text, it operates a web browser on the user's behalf: it perceives the page through screenshots and predicts actions such as clicking, typing, and scrolling to complete tasks.
    Downloads: 0 This Week
  • 18
    Step-Video-T2V

    State-of-the-art (SoTA) text-to-video pre-trained model

    Step-Video-T2V is a state-of-the-art text-to-video foundation model developed to generate videos from natural-language prompts; its 30B-parameter architecture is designed to produce coherent, temporally extended video sequences — up to around 204 frames — based on input text. Under the hood it uses a compressed latent representation (a Video-VAE) to reduce spatial and temporal redundancy, and a denoising diffusion (or similar) process over that latent space to generate smooth, plausible...
    Downloads: 3 This Week
  • 19
    LiveKit Agents

    Framework for building realtime multimodal voice AI agent apps

    ...LiveKit Agents provides libraries and tooling that allow developers to combine speech-to-text, large language models, and text-to-speech services to build interactive AI experiences. It is designed to run server-side and can integrate with various AI model providers and realtime APIs to support different application requirements. LiveKit Agents also includes tools for scheduling and managing agent tasks, making it easier to connect users to automated assistants in live communication scenarios.
    Downloads: 1 This Week
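The speech-to-text, language model, and text-to-speech chain described for LiveKit Agents can be sketched in a few lines. This sketch deliberately does not use the LiveKit Agents API: `VoicePipeline` and the three stand-in functions are hypothetical names, and each stage is a plain callable so the flow of one conversational turn is visible.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """One conversational turn: audio in, audio out (hypothetical helper)."""
    stt: Callable[[bytes], str]   # speech-to-text stage
    llm: Callable[[str], str]     # language model stage
    tts: Callable[[str], bytes]   # text-to-speech stage

    def respond(self, audio: bytes) -> bytes:
        transcript = self.stt(audio)   # what the user said
        reply = self.llm(transcript)   # what the agent answers
        return self.tts(reply)         # the answer rendered as audio


# Stand-in stages: real providers would replace these three functions.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
print(pipeline.respond(b"hello"))
```

Keeping each stage behind a plain callable mirrors the description's point that different model providers can be swapped in; a realtime system would additionally stream audio rather than process whole turns.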
  • 20
    UCP Python SDK

    The official Python SDK for UCP

    The UCP Python SDK is the official Python client library for the Universal Commerce Protocol (UCP), and it simplifies building UCP-compliant applications in Python. UCP itself is a modern, open-source standard that enables commerce interactions between platforms, AI agents, merchants, and payment providers without requiring bespoke integrations for every participant in the commerce ecosystem. This SDK provides Pydantic models for UCP schemas, making it easy for Python...
    Downloads: 1 This Week
  • 21
    FlowLens MCP

    Open-source MCP server that gives your coding agent

    ...The MCP server then loads this captured “flow” and exposes it to the AI agent via the Model Context Protocol (MCP), letting the agent examine, search, filter, and reason about the session just as a human developer would, without needing the agent to re-run the flow or rely on minimal reproduction data (logs, screenshots).
    Downloads: 0 This Week
  • 22
    Depth Pro

    Sharp Monocular Metric Depth in Less Than a Second

    Depth Pro is a foundation model for zero-shot metric monocular depth estimation, producing sharp, high-frequency depth maps with absolute scale from a single image. Unlike many prior approaches, it does not require camera intrinsics or extra metadata, yet still outputs metric depth suitable for downstream 3D tasks. Apple highlights both accuracy and speed: the model can synthesize a ~2.25-megapixel depth map in around 0.3 seconds on a standard GPU, enabling near real-time applications. The...
    Downloads: 3 This Week
  • 23
    HunyuanVideo-I2V

    A Customizable Image-to-Video Model based on HunyuanVideo

    HunyuanVideo-I2V is a customizable image-to-video generation framework from Tencent Hunyuan, built on their HunyuanVideo foundation. It extends video generation so that given a static reference image plus an optional prompt, it generates a video sequence that preserves the reference image’s identity (especially in the first frame) and allows stylized effects via LoRA adapters. The repository includes pretrained weights, inference and sampling scripts, training code for LoRA effects, and...
    Downloads: 6 This Week
  • 24
    Intel LLM Library for PyTorch

    Accelerate local LLM inference and finetuning

    Intel LLM Library for PyTorch is an open-source acceleration library developed to optimize large language model inference and fine-tuning on Intel hardware platforms. Built as an extension of the PyTorch ecosystem, the library enables developers to run modern transformer models efficiently on Intel CPUs, GPUs, and specialized AI accelerators. The framework provides hardware-aware optimizations and low-precision computation techniques that significantly improve the performance of large language models while reducing memory consumption. ...
    Downloads: 1 This Week
  • 25
    Image-Editor

    AI based photo editing website for changing image background

    ...Image-Editor uses cv2, the Python module of OpenCV, which provides an efficient way to work with images and videos and includes a wide range of image processing and computer vision algorithms for reading, writing, filtering, and displaying images. For background removal in real-time video streams, Image-Editor uses MediaPipe's selfie_segmentation model, a deep neural network that separates the person from the background so the background can be removed.
    Downloads: 5 This Week