Open Source Python Artificial Intelligence Software - Page 4

Python Artificial Intelligence Software

Browse free open source Python Artificial Intelligence Software and projects below.

  • 1
    OmniVoice

    High-Quality Voice Cloning TTS for 600+ Languages

    The OmniVoice project is a cutting-edge multilingual text-to-speech system designed to generate high-quality speech across more than 600 languages. Built on a diffusion language model-style architecture, it combines scalability with strong performance, enabling both natural-sounding voice synthesis and efficient inference speeds. One of its most notable capabilities is zero-shot voice cloning, allowing users to replicate a speaker’s voice using only a short reference audio clip. In addition, it supports voice design through configurable attributes such as gender, accent, pitch, and speaking style, giving users fine-grained control over generated speech. The system also includes advanced features like non-verbal expression tags and pronunciation overrides, enabling expressive and precise output. With support for both API-based and command-line usage, it is designed for research, production, and experimentation alike.
    Downloads: 30 This Week
  • 2
    deepface

    A Lightweight Face Recognition and Facial Attribute Analysis Framework

    DeepFace is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for Python. It is a hybrid face recognition framework wrapping state-of-the-art models: VGG-Face, FaceNet, OpenFace, DeepFace, DeepID, ArcFace, Dlib, SFace and GhostFaceNet. Experiments show that human beings have 97.53% accuracy on facial recognition tasks, whereas these models have already reached and surpassed that accuracy level.
    Downloads: 29 This Week
  • 3
    Basic Pitch

    A lightweight audio-to-MIDI converter with pitch bend detection

    Basic Pitch is a Python library for Automatic Music Transcription (AMT), using a lightweight neural network developed by Spotify's Audio Intelligence Lab. It's small, easy to use, pip install-able and npm install-able via its sibling repo. Basic Pitch may be simple, but it is far from "basic"! It is efficient, and its multipitch support, its ability to generalize across instruments, and its note accuracy compete with much larger and more resource-hungry AMT systems. Provide a compatible audio file and basic-pitch will generate a MIDI file, complete with pitch bends. Basic Pitch is instrument-agnostic and supports polyphonic instruments, so you can freely enjoy transcription of all your favorite music, no matter what instrument is used. Basic Pitch works best on one instrument at a time.
    Downloads: 28 This Week
  • 4
    Open Notebook

    An Open Source implementation of Notebook LM with more flexibility

    Open Notebook is an open-source, privacy-focused alternative to Google’s Notebook LM that gives users full control over their research and AI workflows. Designed to be self-hosted, it ensures complete data sovereignty by keeping your content local or within your own infrastructure. The platform supports 16+ AI providers—including OpenAI, Anthropic, Ollama, Google, and LM Studio—allowing flexible model choice and cost optimization. Open Notebook enables users to organize and analyze multi-modal content such as PDFs, videos, audio files, web pages, and Office documents. It combines full-text and vector search with context-aware AI chat to deliver insights grounded in your own research materials. With advanced features like multi-speaker podcast generation, customizable content transformations, and a comprehensive REST API, Open Notebook provides a powerful and extensible research environment.
    Downloads: 28 This Week
  • 5
    Hunyuan3D 2.0

    High-Resolution 3D Assets Generation with Large Scale Diffusion Models

    The Hunyuan3D-2 model, developed by Tencent, is designed for generating high-resolution 3D assets using large-scale diffusion models. This model offers advanced capabilities for creating detailed 3D models, including texture enhancements, multi-view shape generation, and rapid inference for real-time applications. It is particularly useful for industries requiring high-quality 3D content, such as gaming, film, and virtual reality. Hunyuan3D-2 supports various enhancements and is available for deployment through tools like Blender and Hugging Face. It includes a user-friendly production tool, Hunyuan3D-Studio, for manipulating and animating meshes. Shape generation via the DiT model is condition-aligned, so the generated mesh follows the input images or prompts.
    Downloads: 27 This Week
  • 6
    Aider

    Aider is AI pair programming in your terminal

    Aider is an AI pair programming tool that runs directly in your terminal, helping developers build new projects or extend existing codebases faster and more confidently. It works alongside you like a coding partner, using powerful large language models to understand your code and implement precise changes. Aider creates a structured map of your entire repository, allowing it to handle large and complex projects effectively. It supports over 100 programming languages, making it flexible for nearly any development stack. With built-in Git integration, Aider keeps you in control by automatically committing clean, reversible changes. Whether you’re coding locally or in the cloud, Aider turns natural language requests into reliable, production-ready code.
    Downloads: 26 This Week
  • 7
    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short phrase or exemplars, scaling to a vastly larger set of categories than traditional closed-set models. This capability is grounded in a new data engine that automatically annotated over four million unique concepts, producing a massive open-vocabulary segmentation dataset and enabling the model to achieve 75–80% of human performance on the SA-CO benchmark, which itself spans 270K unique concepts.
    Downloads: 26 This Week
  • 8
    Voice-Pro

    Comprehensive Gradio WebUI for audio processing

    Voice-Pro is a Gradio WebUI for transcription, translation and text-to-speech. It can be installed with one click. It creates a virtual environment using Miniconda that runs completely separately from the Windows system (fully portable). It supports real-time transcription and translation, as well as batch mode.
    Downloads: 26 This Week
  • 9
    WhisperJAV

    Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

    WhisperJAV is an open-source speech transcription pipeline designed specifically for generating subtitles for Japanese adult video content. The project addresses challenges that standard speech recognition models face when transcribing this type of audio, which often includes low signal-to-noise ratios and large numbers of non-verbal vocalizations. Traditional automatic speech recognition systems can misinterpret these sounds as words, leading to inaccurate transcripts. WhisperJAV introduces a specialized pipeline that separates text generation from timestamp alignment, allowing the system to generate transcripts and then align them with audio using forced alignment techniques. The framework supports several speech recognition models, including Qwen-based ASR systems and fine-tuned Whisper models trained on domain-specific dialogue.
    Downloads: 26 This Week
  • 10
    Stable Diffusion

    High-Resolution Image Synthesis with Latent Diffusion Models

    The Stable Diffusion project (Version 2), developed by Stability AI, is an image synthesis model that uses latent diffusion techniques for high-resolution image generation. It generates images from text input, making it flexible for various creative applications. The repository contains pretrained models, various checkpoints, and tools to facilitate image generation tasks, such as fine-tuning and modifying the models. Stability AI's approach to image synthesis has contributed to creating detailed, scalable images while maintaining efficiency.
    Downloads: 234 This Week
  • 11
    CogVideo

    Text and image to video generation: CogVideoX and CogVideo

    CogVideo is an open-source family of advanced video generation models that can create videos from text, images, or existing video inputs. Built on large-scale Transformer and diffusion architectures, it enables multimodal generation across text-to-video, image-to-video, and video continuation tasks. The latest CogVideoX models offer higher resolution outputs, longer video durations, and improved controllability through prompt engineering. The project includes tools for inference, fine-tuning, and optimization, making it suitable for both research and production use. It supports efficient deployment on a range of GPUs, including consumer hardware with quantization techniques. Overall, CogVideo provides a powerful framework for generating high-quality AI videos and experimenting with cutting-edge multimodal AI systems.
    Downloads: 24 This Week
  • 12
    video-use

    Edit videos with Claude Code

    Video Use is an open-source AI-powered video editing tool that allows users to transform raw footage into polished videos using natural language commands. Designed to work with Claude Code, it automates the entire editing process—from cutting clips to rendering the final output—without requiring manual timelines or complex software interfaces. The system intelligently analyzes audio transcripts and visual cues to make precise, context-aware editing decisions. It supports a wide range of content types, including interviews, tutorials, montages, and talking-head videos. By combining structured text representations with on-demand visual previews, it minimizes processing overhead while maintaining high-quality results. Overall, Video Use reimagines video editing as an AI-driven, conversational workflow.
    Downloads: 24 This Week
  • 13
    OpenManus

    Open-source AI agent framework

    OpenManus is an open-source AI agent framework designed to autonomously execute complex, multi-step tasks by combining reasoning, planning, and tool use. It enables developers to build agents that can think, act, and iterate toward goals rather than simply responding to prompts. The platform emphasizes task decomposition, allowing agents to break down objectives into smaller steps and execute them sequentially or recursively. OpenManus supports integration with external tools, APIs, and environments, making it suitable for real-world automation workflows. It is built to be flexible and extensible, enabling customization of agent behaviors, tools, and reasoning strategies. Overall, OpenManus provides a foundation for creating more capable, autonomous AI systems that can handle dynamic and goal-driven tasks.
    Downloads: 23 This Week
  • 14
    Agent Zero

    Agent Zero AI framework

    Agent Zero is not a predefined agentic framework. It is designed to be dynamic, organically growing, and learning as you use it. Agent Zero is fully transparent, readable, comprehensible, customizable and interactive. Agent Zero uses the computer as a tool to accomplish its (your) tasks. Agents can communicate with their superiors and subordinates, asking questions, giving instructions, and providing guidance. Instruct your agents in the system prompt on how to communicate effectively. The terminal interface is real-time streamed and interactive. You can stop and intervene at any point. If you see your agent heading in the wrong direction, just stop and tell it right away. There is a lot of freedom in this framework. You can instruct your agents to regularly report back to superiors asking for permission to continue. You can instruct them to use point-scoring systems when deciding when to delegate subtasks. Superiors can double-check subordinates' results and disputes.
    Downloads: 22 This Week
  • 15
    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 22 This Week
  • 16
    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    VoxCPM is a tokenizer-free text-to-speech system that models speech in a continuous space, aiming for extremely realistic, context-aware synthesis and true-to-life zero-shot voice cloning. Instead of converting speech into discrete tokens, it uses an end-to-end diffusion-autoregressive architecture built on the MiniCPM-4 backbone, combining hierarchical language modeling, finite scalar quantization (FSQ), and local Diffusion Transformers. This design helps decouple semantic and acoustic information while preserving fine-grained prosody, leading to more stable and expressive generation than many discrete-token systems. Trained on a large 1.8-million-hour bilingual corpus, VoxCPM can infer appropriate speaking style from context, dynamically adjusting intonation, rhythm, and emotional tone. It supports zero-shot voice cloning from a short reference audio clip, capturing timbre, accent, and pacing to closely mimic a target speaker without per-speaker fine-tuning.
    Downloads: 22 This Week
  • 17
    FastSD CPU

    Fast stable diffusion on CPU and AI PC

    FastSD CPU is an optimized fork of Stable Diffusion designed to run efficiently on CPUs and devices without dedicated GPUs by leveraging Latent Consistency Models and Adversarial Diffusion Distillation techniques that accelerate inference. It focuses on bringing fast text-to-image generation to mainstream hardware like desktop CPUs, lower-end laptops, or edge devices without requiring high-end graphics processors. The repository contains multiple interfaces including a desktop GUI for simple generation, an advanced web-based UI with support for extensions like LoRA and ControlNet, and a command-line interface for scripted usage or server deployments. With support for performance-oriented libraries such as OpenVINO and hardware acceleration on platforms like Intel AI PCs, FastSD CPU aims to shrink generation times dramatically compared with naive CPU implementations.
    Downloads: 21 This Week
  • 18
    LLaMA 3

    The official Meta Llama 3 GitHub site

    This repository is the former home for Llama 3 model artifacts and getting-started code, covering pre-trained and instruction-tuned variants across multiple parameter sizes. It introduced the public packaging of weights, licenses, and quickstart examples that helped developers fine-tune or run the models locally and on common serving stacks. As the Llama stack evolved, Meta consolidated repositories and marked this one deprecated, pointing users to newer, centralized hubs for models, utilities, and docs. Even as a deprecated repo, it documents the transition path and preserves references that clarify how Llama 3 releases map into the current ecosystem. Practically, it functioned as a bridge between Llama 2 and later Llama releases by standardizing distribution and starter code for inference and fine-tuning. Teams still treat it as historical reference material for version lineage and migration notes.
    Downloads: 21 This Week
  • 19
    Lemonade

    Lemonade helps users run local LLMs with the highest performance

    Lemonade is a local LLM runtime that aims to deliver the highest possible performance on your own hardware by auto-configuring state-of-the-art inference engines for both NPUs and GPUs. The project positions itself as a “local LLM server” you can run on laptops and workstations, abstracting away backend differences while giving you a single place to serve and manage models. Its README emphasizes real-world adoption across startups, research groups, and large companies, signaling a focus on practical deployments rather than toy demos. The repository highlights easy onboarding with downloads, docs, and a Discord for support, suggesting an active user community. Messaging centers on getting the best throughput and latency out of modern accelerators without users having to hand-tune kernels or flags. Releases further reinforce the “server” framing, pointing developers toward a service that can be integrated into apps and tools.
    Downloads: 20 This Week
  • 20
    Unsloth Studio

    Unified web UI for training and running open models locally

    Unsloth Studio is a web-based interface for running and training AI models locally with a unified and user-friendly experience. It allows users to work with a wide range of models for text, audio, vision, embeddings, and more without relying heavily on cloud infrastructure. Built on top of the Unsloth framework, it focuses on high-performance training with reduced VRAM usage and faster speeds compared to traditional methods. The platform supports fine-tuning, pretraining, and reinforcement learning workflows, making it suitable for both experimentation and production use. Users can interact with models through chat, upload files like PDFs or images, and execute code within the environment to improve outputs. By combining powerful optimization techniques with an intuitive UI, Unsloth Studio simplifies the process of building and customizing AI models locally.
    Downloads: 20 This Week
  • 21
    VoxCPM2

    Tokenizer-Free TTS for Multilingual Speech Generation

    VoxCPM2 is an advanced open-source text-to-speech system that redefines speech synthesis by eliminating traditional tokenization and instead generating continuous speech representations through a diffusion-based autoregressive architecture. Built on top of the MiniCPM model family, it enables highly natural, expressive, and context-aware speech generation that adapts tone, emotion, and pacing directly from input text. The system is trained on massive multilingual datasets, enabling support for dozens of languages and dialects while maintaining high fidelity and realism in generated audio. VoxCPM2 stands out for its ability to perform voice cloning with minimal input, capturing not only the speaker’s timbre but also nuanced features such as rhythm, accent, and emotional delivery. It also introduces voice design capabilities, allowing users to generate entirely new voices from natural language descriptions without requiring reference audio.
    Downloads: 20 This Week
  • 22
    deepfakes_faceswap

    Deepfakes Software For All

    Faceswap is the leading free and open source multi-platform deepfakes software. When faceswapping was first developed and published, the technology was groundbreaking and a huge step in AI development. It was also completely ignored outside of academia because the code was confusing and fragmentary. It required a thorough understanding of complicated AI techniques and took a lot of effort to figure out, until one individual brought it together into a single, cohesive collection.
    Downloads: 20 This Week
  • 23
    ebook2audiobook

    Generate audiobooks from e-books, voice cloning & 1107+ languages

    ebook2audiobook is a tool to convert legally obtained eBooks (non-DRM) into fully narrated audiobooks, complete with chapters and metadata. It automates the pipeline: it reads the eBook file, splits it into appropriate segments (chapters, paragraphs), uses text-to-speech (TTS) models to synthesize audio, optionally applies voice cloning, and outputs a final audiobook — ideal for people who prefer listening over reading, or for accessibility purposes. The tool supports a wide array of underlying TTS backends (XTTSv2, Bark, VITS, Fairseq, Tacotron2, YourTTS and more), which gives flexibility depending on hardware availability, voice preference, and language. It also supports a huge number of languages — apparently “+1110 languages and dialects” in its supported set — making it suitable for eBooks in many languages.
    Downloads: 20 This Week
  • 24
    Claw Code

    AI agent harness for AI coding agents

    Claw Code is an open-source AI agent harness project focused on building better tools for orchestrating and managing autonomous coding agents. It originated as a clean-room reimplementation inspired by the architecture of Claude Code, aiming to replicate core concepts without using proprietary code. The project provides a Python-based foundation for experimenting with agent workflows, tool integration, and task execution pipelines. It emphasizes harness engineering—how agents are structured, how they interact with tools, and how they maintain context during execution. The system is being actively expanded, with a Rust-based runtime in development to improve performance and memory safety. Overall, Claw Code serves as a research-driven platform for advancing agent-based software development systems.
    Downloads: 19 This Week
  • 25
    Open-LLM-VTuber

    Open source AI VTuber platform with voice chat and Live2D avatars

    Open-LLM-VTuber is an open source platform designed to create AI-powered VTuber characters that can interact with users through voice and animated avatars. It enables hands-free conversations with large language models by combining speech recognition, language processing, and text-to-speech synthesis into a single system. Users can speak directly to the AI character, and the system can respond with a generated voice while animating a Live2D avatar to simulate a talking virtual personality. Open-LLM-VTuber is modular, allowing developers to swap or configure different language models, speech recognition engines, and voice synthesis systems depending on their needs. It can run locally and supports both offline and online AI services, giving users flexibility in how models and resources are used. Open-LLM-VTuber was originally inspired by the goal of recreating an AI VTuber experience using open source tools that work across multiple operating systems.
    Downloads: 19 This Week