Browse free open source Python AI Models and projects below. Use the toggles on the left to filter open source Python AI Models by OS, license, language, programming language, and project status.

  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    HunyuanVideo-Avatar

    HunyuanVideo-Avatar

    Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model

    HunyuanVideo-Avatar is a multimodal diffusion transformer (MM-DiT) model by Tencent Hunyuan for animating static avatar images into dynamic, emotion-controllable, and multi-character dialogue videos, conditioned on audio. It addresses challenges of motion realism, identity consistency, and emotional alignment. Innovations include a character image injection module, an Audio Emotion Module for transferring emotion cues, and a Face-Aware Audio Adapter to isolate audio effects on faces, enabling multiple characters to be animated in a scene. Character image injection module for better consistency between training and inference conditioning. Emotion control by extracting emotion reference images and transferring emotional style into video sequences.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    HunyuanVideo-Foley

    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional use. Hybrid architecture combining multimodal transformer blocks and unimodal refinement blocks. Temporal alignment via frame-level synchronization modules (e.g. Synchformer).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    IQuest-Coder-V1 Model Family

    IQuest-Coder-V1 Model Family

    New family of code large language models (LLMs)

    IQuest-Coder-V1 is a cutting-edge family of open-source large language models specifically engineered for code generation, deep code understanding, and autonomous software engineering tasks. These models range from tens of billions to smaller footprints and are trained on a novel code-flow multi-stage paradigm that captures how real software evolves over time — not just static code snapshots — giving them a deeper semantic understanding of programming logic. They support native long contexts up to 128K tokens, enabling them to reason across large codebases and multi-file interactions without context fragmentation, and include “Thinking” variants optimized for complex reasoning and “Loop” variants with recurrent mechanisms to improve inference efficiency. IQuest-Coder-V1 delivers state-of-the-art performance on multiple coding benchmarks, demonstrating strong results in competitive programming, tool use, and agentic code generation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Improved Diffusion

    Improved Diffusion

    Release for Improved Denoising Diffusion Probabilistic Models

    improved-diffusion is an open source implementation of diffusion probabilistic models created by OpenAI. These models, also known as score-based generative models, are a class of generative models that have shown strong performance in producing high-quality synthetic data such as images. The repository provides code for training and sampling diffusion models with improved techniques that enhance stability, efficiency, and output fidelity. It includes scripts for setting up training runs, generating samples, and reproducing results from OpenAI’s research on diffusion-based generation. The implementation is intended for researchers and practitioners who want to explore the theoretical and practical aspects of diffusion models in deep learning. By making this code available, OpenAI provides a foundation for further experimentation and development in generative modeling research.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    Improved GAN

    Improved GAN

    Code for the paper "Improved Techniques for Training GANs"

    Improved-GAN is the official code release from OpenAI accompanying the research paper Improved Techniques for Training GANs. It provides implementations of experiments conducted on datasets such as MNIST, SVHN, CIFAR-10, and ImageNet. The project focuses on demonstrating enhanced training methods for Generative Adversarial Networks, addressing stability and performance issues that were common in earlier GAN models. The repository includes training scripts, evaluation methods, and pretrained configurations for reproducing experimental results. By offering structured experiments across multiple datasets, it allows researchers to study and replicate the improvements described in the paper. Although the project is archived and not actively maintained, it remains a reference point in the history of GAN research, influencing subsequent model training approaches.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    InfoGAN

    InfoGAN

    Code for reproducing key results in the paper

    The InfoGAN repository contains the original implementation used to reproduce the results in the paper “InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets”. InfoGAN is a variant of the GAN (Generative Adversarial Network) architecture that aims to learn disentangled and interpretable latent representations by maximizing the mutual information between a subset of the latent codes and the generated outputs. That extra incentive encourages the generator to structure its latent space in a way where certain latent variables control meaningful, distinct factors (e.g. rotation, width, stroke thickness) in the output images. The repository includes code for experiments (e.g. on MNIST), launcher scripts, and some tests. It depends on a development version of TensorFlow (the code expects features not in older stable releases), and also uses other libraries like prettytensor and progressbar.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Janus

    Janus

    Unified Multimodal Understanding and Generation Models

    Janus is a sophisticated open-source project from DeepSeek AI that aims to unify both visual understanding and image generation in a single model architecture. Rather than having separate systems for “look and describe” and “prompt and generate”, Janus uses an autoregressive transformer framework with a decoupled visual encoder—allowing it to ingest images for comprehension and to produce images from text prompts with shared internal representations. The design tackles long-standing conflicts in multimodal models: namely that the visual encoder has to serve both analysis (understanding) and synthesis (generation) roles. By splitting those pathways but keeping one unified core transformer, Janus maintains flexibility and achieves strong performance across tasks previously requiring distinct architectures. The repository includes pretrained checkpoints (for example 1.3B and 7B parameter versions), a Gradio demo, and guidance for local deployment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    LaMDA-pytorch

    LaMDA-pytorch

    Open-source pre-training implementation of Google's LaMDA in PyTorch

    Open-source pre-training implementation of Google's LaMDA research paper in PyTorch. The totally not sentient AI. This repository will cover the 2B parameter implementation of the pre-training architecture as that is likely what most can afford to train. You can review Google's latest blog post from 2022 which details LaMDA here. You can also view their previous blog post from 2021 on the model.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Large Concept Model

    Large Concept Model

    Language modeling in a sentence representation space

    Large Concept Model is a research codebase centered on concept-centric representation learning at scale, aiming to capture shared structure across many categories and modalities. It organizes training around concepts (rather than just raw labels), encouraging models to understand attributes, relations, and compositional structure that transfer across tasks. The repository provides training loops, data tooling, and evaluation routines to learn and probe these concept embeddings, typically from large image–text or weakly supervised corpora. It includes utilities to build concept vocabularies, map supervision signals to those vocabularies, and measure zero-shot or few-shot generalization. Probing tools help diagnose what the model knows—e.g., attribute recognition, relation understanding, or compositionality—so you can iterate on data and objectives. The design is modular, making it straightforward to swap backbones, change objectives, or integrate retrieval components.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 10
    Leanstral

    Leanstral

    Open-source code agent designed for Lean 4

    Leanstral is an open-weight large language model developed by Mistral AI and specifically designed as a code agent for the Lean 4 proof assistant, enabling advanced interaction with formal mathematics and program verification systems. The model is built to understand and generate Lean 4 code, which is used to express complex mathematical constructs as well as formal software specifications. By focusing on theorem proving and formal reasoning, Leanstral represents a specialized direction within large language models, targeting domains that require strict correctness and logical rigor rather than general conversational tasks. It leverages modern large-scale architectures, likely incorporating mixture-of-experts techniques, to balance efficiency and capability while handling structured symbolic reasoning tasks. The model can assist in writing proofs, exploring mathematical structures, and validating logical properties in code.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Ling-V2

    Ling-V2

    Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI

    Ling-V2 is an open-source family of Mixture-of-Experts (MoE) large language models developed by the InclusionAI research organization with the goal of combining state-of-the-art performance, efficiency, and openness for next-generation AI applications. It introduces highly sparse architectures where only a fraction of the model’s parameters are activated per input token, enabling models like Ling-mini-2.0 to achieve reasoning and instruction-following capabilities on par with much larger dense models while remaining significantly more computationally efficient. Trained on more than 20 trillion tokens of high-quality data and enhanced through multi-stage supervised fine-tuning and reinforcement learning, Ling-V2’s models demonstrate strong general reasoning, mathematical problem-solving, coding understanding, and knowledge-intensive task performance.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    LongCat-Image

    LongCat-Image

    Foundation model for image generation

    LongCat-Image is an open-source foundation model for image generation and editing created by the LongCat team at Meituan, designed to deliver high-quality visual outputs while remaining efficient and accessible for developers and researchers. Rather than relying on massive parameter counts typical of many cutting-edge models, LongCat-Image achieves strong photorealism, stable structure, and accurate bilingual (Chinese and English) text rendering with a more compact ~6-billion parameter architecture, making it competitive with much larger alternatives despite its relatively lean design. The model excels at both text-to-image generation and instruction-guided image editing, offering users versatile capabilities for creative and practical tasks—whether generating art, mockups, or adjusting existing visuals with fine control.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    MAE (Masked Autoencoders)

    MAE (Masked Autoencoders)

    PyTorch implementation of MAE

    MAE (Masked Autoencoders) is a self-supervised learning framework for visual representation learning using masked image modeling. It trains a Vision Transformer (ViT) by randomly masking a high percentage of image patches (typically 75%) and reconstructing the missing content from the remaining visible patches. This forces the model to learn semantic structure and global context without supervision. The encoder processes only the visible patches, while a lightweight decoder reconstructs the full image—making pretraining computationally efficient. After pretraining, the encoder serves as a powerful backbone for downstream tasks like image classification, segmentation, and detection, achieving top performance with minimal fine-tuning. The repository provides pretrained models, fine-tuning scripts, evaluation protocols, and visualization tools for reconstruction quality and learned features.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    MUSE

    MUSE

    A library for Multilingual Unsupervised or Supervised word Embeddings

    MUSE is a framework for learning multilingual word embeddings that live in a shared space, enabling bilingual lexicon induction, cross-lingual retrieval, and zero-shot transfer. It supports both supervised alignment with seed dictionaries and unsupervised alignment that starts without parallel data by using adversarial initialization followed by Procrustes refinement. The code can align pre-trained monolingual embeddings (such as fastText) across dozens of languages and provides standardized evaluation scripts and dictionaries. By mapping languages into a common vector space, MUSE makes it straightforward to build cross-lingual applications where resources are scarce for some languages. The training and evaluation pipeline is lightweight and fast, so experimenting with different languages or initialization strategies is easy. Beyond dictionary induction, the learned embeddings are often used as building blocks for downstream tasks like classification, retrieval, or machine translation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Mask2Former

    Mask2Former

    Code release for "Masked-attention Mask Transformer

    Mask2Former is a unified segmentation architecture that handles semantic, instance, and panoptic segmentation with one model and one training recipe. Its core idea is to cast segmentation as mask classification: a transformer decoder predicts a set of mask queries, each with an associated class score, eliminating the need for task-specific heads. A pixel decoder fuses multi-scale features and feeds masked attention in the transformer so each query focuses computation on its current spatial support. This leads to accurate masks with sharp boundaries and strong small-object performance while remaining efficient on high-resolution inputs. The project provides extensive configurations and pretrained models across popular benchmarks like COCO, ADE20K, and Cityscapes. Built on top of Detectron2, it includes training scripts, inference tools, and visualization utilities that make experimentation straightforward.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    MaskFormer

    MaskFormer

    Per-Pixel Classification is Not All You Need for Semantic Segmentation

    MaskFormer is a unified framework for image segmentation developed by Facebook Research, designed to bridge the gap between semantic, instance, and panoptic segmentation within a single architecture. Unlike traditional segmentation pipelines that treat these tasks separately, MaskFormer reformulates segmentation as a mask classification problem, enabling a consistent and efficient approach across multiple segmentation domains. Built on top of Detectron2, it supports a wide range of datasets including ADE20K, Cityscapes, COCO-Stuff, and Mapillary Vistas, and provides pretrained baselines for each. The model achieves strong performance and scalability while simplifying training and evaluation workflows. Its successor, Mask2Former, extends the same meta-architecture to achieve state-of-the-art results across all major segmentation benchmarks. MaskFormer’s modular design, dataset integration, and compatibility with existing Detectron2 models make it an essential research tool.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Mellum-4b-base

    Mellum-4b-base

    JetBrains’ 4B parameter code model for completions

    Mellum-4b-base is JetBrains’ first open-source large language model designed and optimized for code-related tasks. Built with 4 billion parameters and a LLaMA-style architecture, it was trained on over 4.2 trillion tokens across multiple programming languages, including datasets such as The Stack, StarCoder, and CommitPack. With a context window of 8,192 tokens, it excels at code completion, fill-in-the-middle tasks, and intelligent code suggestions for professional developer tools and IDEs. The model is efficient for both cloud inference with vLLM and local deployment using llama.cpp or Ollama, thanks to its bf16 precision and AMP training. While the base model is not fine-tuned for downstream tasks, it is designed to be easily adapted through supervised fine-tuning (SFT) or reinforcement learning (RL). Benchmarks on RepoBench, SAFIM, and HumanEval demonstrate its competitive performance, with specialized fine-tuned versions for Python already showing strong improvements.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Mesh R-CNN

    Mesh R-CNN

    code for Mesh R-CNN, ICCV 2019

    Mesh R-CNN is a 3D reconstruction and object understanding framework developed by Facebook Research that extends Mask R-CNN into the 3D domain. Built on top of Detectron2 and PyTorch3D, Mesh R-CNN enables end-to-end 3D mesh prediction directly from single RGB images. The model learns to detect, segment, and reconstruct detailed 3D mesh representations of objects in natural images, bridging the gap between 2D perception and 3D understanding. Unlike voxel-based or point-based approaches, Mesh R-CNN uses a differentiable mesh representation, allowing it to efficiently refine surface geometry while maintaining high spatial detail. The system combines 2D detection from Mask R-CNN with 3D reasoning modules that output full mesh reconstructions aligned with the input image. It has been evaluated on datasets such as Pix3D, where it demonstrates state-of-the-art performance in reconstructing real-world object geometry.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    MetaCLIP

    MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution

    MetaCLIP is a research codebase that extends the CLIP framework into a meta-learning / continual learning regime, aiming to adapt CLIP-style models to new tasks or domains efficiently. The goal is to preserve CLIP’s strong zero-shot transfer capability while enabling fast adaptation to domain shifts or novel class sets with minimal data and without catastrophic forgetting. The repository provides training logic, adaptation strategies (e.g. prompt tuning, adapter modules), and evaluation across base and target domains to measure how well the model retains its general knowledge while specializing as needed. It includes utilities to fine-tune vision-language embeddings, compute prompt or adapter updates, and benchmark across transfer and retention metrics. MetaCLIP is especially suited for real-world settings where a model must continuously incorporate new visual categories or domains over time.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Metaseq

    Metaseq

    Repo for external large-scale work

    Metaseq is a flexible, high-performance framework for training and serving large-scale sequence models, such as language models, translation systems, and instruction-tuned LLMs. Built on top of PyTorch, it provides distributed training, model sharding, mixed-precision computation, and memory-efficient checkpointing to support models with hundreds of billions of parameters. The framework was used internally at Meta to train models like OPT (Open Pre-trained Transformer) and serves as a reference implementation for scaling transformer architectures efficiently across GPUs and nodes. It supports both pretraining and fine-tuning workflows with data pipelines for text, multilingual corpora, and custom tokenization schemes. Metaseq also includes APIs for evaluation, generation, and model serving, enabling seamless transitions from training to inference.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    MiniCPM-o

    MiniCPM-o

    A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

    MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports both text and audio inputs to generate outputs in various forms, including voice cloning, emotion control, and interactive role-playing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    MiniCPM4

    MiniCPM4

    Ultra-Efficient LLMs on End Device

    MiniCPM4 is part of the MiniCPM family of ultra-efficient large language models designed specifically for high performance on edge devices and resource-constrained environments. Unlike traditional large-scale models that require extensive computational resources, MiniCPM4 focuses on delivering competitive reasoning and language capabilities while maintaining significantly lower latency and higher efficiency. It achieves this through optimized architectures, scalable training strategies, and techniques such as long-context pretraining and YaRN-based length extension, allowing it to handle sequences up to 128K tokens effectively. The model demonstrates strong performance across tasks such as long-text comprehension, reasoning, and general language generation, often outperforming similar-sized models in both speed and accuracy. MiniCPM4 is available in multiple parameter sizes, making it adaptable to different deployment scenarios ranging from mobile devices to GPUs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    MiniCPM4.1

    MiniCPM4.1

    Achieving 3+ generation speedup on reasoning tasks

    MiniCPM4.1 is an enhanced iteration of the MiniCPM4 architecture, introducing improvements in reasoning capabilities, inference speed, and hybrid operation modes that allow dynamic switching between deep reasoning and standard generation. It builds upon the same efficiency-focused philosophy but further optimizes decoding performance, achieving substantial speed gains in reasoning-intensive tasks while maintaining high-quality outputs. One of its key innovations is the hybrid reasoning mode, which allows developers to control whether the model engages in deeper reasoning processes or faster responses depending on the use case. The model also supports both dense and sparse attention mechanisms, enabling more efficient computation depending on the selected inference framework. With improved pretraining on longer sequences and enhanced scaling techniques, MiniCPM4.1 delivers better performance in long-context tasks and complex problem solving.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    MiniMax-01

    MiniMax-01

    Large-language-model & vision-language-model based on Linear Attention

    MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it. MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel strategies such as LASP+, varlen ring attention, and Expert Tensor Parallelism, enabling a training context of 1 million tokens and up to 4 million tokens at inference. MiniMax-VL-01 extends this core by adding a 303M-parameter Vision Transformer and a two-layer MLP projector in a ViT–MLP–LLM framework, allowing the model to process images at dynamic resolutions up to 2016×2016.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    MiniMax-M1

    MiniMax-M1

    Open-weight, large-scale hybrid-attention reasoning model

    MiniMax-M1 is presented as the world’s first open-weight, large-scale hybrid-attention reasoning model, designed to push the frontier of long-context, tool-using, and deeply “thinking” language models. It is built on the MiniMax-Text-01 foundation and keeps the same massive parameter budget, but reworks the attention and training setup for better reasoning and test-time compute scaling. Architecturally, it combines Mixture-of-Experts layers with lightning attention, enabling the model to support a native context length of 1 million tokens while using far fewer FLOPs than comparable reasoning models for very long generations. The team emphasizes efficient scaling of test-time compute: at 100K-token generation lengths, M1 reportedly uses only about 25 percent of the FLOPs of some competing models, making extended “think step” traces more feasible. M1 is further trained with large-scale reinforcement learning over diverse tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB