Showing 54 open source projects for "framework"

  • 1
    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    ...It uses a two-stage architecture: a generative LLM first converts text into intermediate speech token sequences, and a flow-based neural model then converts those tokens into natural audio waveforms, enabling rich prosody and voice character even for unseen speakers. The system introduces a multi-reward reinforcement learning framework that jointly optimizes for voice similarity, emotional expressiveness, pronunciation, and intelligibility, yielding output that can rival commercial options in naturalness and expressiveness. GLM-TTS also supports phoneme-level control and hybrid text + phoneme input, giving developers precise control over pronunciation, which is critical for multilingual or polyphone-rich languages.
    Downloads: 1 This Week
  • 2
    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    ...It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding, and long-document interpretation. GLM-4.5V emerged from a training framework that leverages scalable reinforcement learning (with curriculum sampling) to boost performance across tasks ranging from STEM problem solving to long-context reasoning, giving it broad applicability beyond narrow benchmarks. When it was released, it achieved state-of-the-art results on a large collection of public multimodal benchmarks for open-source models.
    Downloads: 1 This Week
  • 3
    Watermark Anything

    Official implementation of Watermark Anything with Localized Messages

    Watermark Anything (WAM) is an advanced deep learning framework for embedding and detecting localized watermarks in digital images. Developed by Facebook Research, it provides a robust, flexible system that allows users to insert one or multiple watermarks within selected image regions while maintaining visual quality and recoverability. Unlike traditional watermarking methods that rely on uniform embedding, WAM supports spatially localized watermarks, enabling targeted protection of specific image regions or objects. ...
    Downloads: 0 This Week
  • 4
    InstantCharacter

    Personalize Any Characters with a Scalable Diffusion Transformer

    InstantCharacter is a tuning-free diffusion transformer framework created by the Tencent Hunyuan / InstantX team. It generates images of a specific character (subject) from a single reference image while preserving identity and character features. It works by adapting a base image generation model with a lightweight adapter, so you can produce character-preserving generations in various downstream tasks (e.g. changing pose, clothing, or scene) without full fine-tuning of the base model. Demo scripts and a pipeline API (via infer_demo.py and pipeline.py) are included. ...
    Downloads: 0 This Week
  • 5
    4M

    4M: Massively Multimodal Masked Modeling

    4M is a training framework for “any-to-any” vision foundation models that uses tokenization and masking to scale across many modalities and tasks. The same model family can classify, segment, detect, caption, and even generate images, with a single interface for both discriminative and generative use. The repository releases code and models for multiple variants (e.g., 4M-7 and 4M-21), emphasizing transfer to unseen tasks and modalities.
    Downloads: 0 This Week
  • 6
    MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution

    MetaCLIP is a research codebase that accompanies the ICLR 2024 Spotlight paper "Demystifying CLIP Data". Rather than changing the CLIP model or its training objective, it focuses on how the training data is curated: raw image-text pairs are matched against a metadata list of concepts and then balanced so that very frequent concepts do not dominate the resulting distribution. The repository provides the curation and training code, the metadata, and the resulting data distribution. ...
    Downloads: 0 This Week
  • 7
    HunyuanVideo-I2V

    A Customizable Image-to-Video Model based on HunyuanVideo

    HunyuanVideo-I2V is a customizable image-to-video generation framework from Tencent Hunyuan, built on their HunyuanVideo foundation. It extends video generation so that given a static reference image plus an optional prompt, it generates a video sequence that preserves the reference image’s identity (especially in the first frame) and allows stylized effects via LoRA adapters. The repository includes pretrained weights, inference and sampling scripts, training code for LoRA effects, and support for parallel inference via xDiT. ...
    Downloads: 0 This Week
  • 8
    Perception Models

    State-of-the-art Image & Video CLIP, Multimodal Large Language Models

    Perception Models is a state-of-the-art framework developed by Facebook Research for advanced image and video perception tasks. It introduces two primary components: the Perception Encoder (PE) for visual feature extraction and the Perception Language Model (PLM) for multimodal decoding and reasoning. The PE module is a family of vision encoders designed to excel in image and video understanding, surpassing models like SigLIP2, InternVideo2, and DINOv2 across multiple benchmarks. ...
    Downloads: 0 This Week
  • 9
    Step1X-Edit

    A SOTA open-source image editing model

    Step1X-Edit is a state-of-the-art open-source image editing model/framework that uses a multimodal large language model (LLM) together with a diffusion-based image decoder to let users edit images simply via natural-language instructions plus a reference image. You supply an existing image and a textual command — e.g. “add a ruby pendant on the girl’s neck” or “make the background a sunset over mountains” — and the model interprets the instruction, computes a latent embedding combining the image content and user intent, then decodes a new image implementing the edit. ...
    Downloads: 0 This Week
  • 10
    Step1X-3D

    High-Fidelity and Controllable Generation of Textured 3D Assets

    Step1X-3D is an open-source framework for generating high-fidelity textured 3D assets from scratch (both geometry and surface textures) using modern generative AI techniques. It uses a two-stage architecture: a geometry generation stage, in which a VAE-DiT model outputs a watertight 3D representation (e.g. a TSDF surface), and a texture synthesis stage, in which a diffusion-based texture module conditions on the geometry and, optionally, on reference input or prompts to produce view-consistent textures. ...
    Downloads: 0 This Week
  • 11
    Qwen-Audio

    Chat & pretrained large audio language model proposed by Alibaba Cloud

    Qwen-Audio is a large audio-language model developed by Alibaba Cloud. It accepts various types of audio input (speech, natural sounds, music, singing) along with text input, and outputs text. There is also an instruction-tuned version, Qwen-Audio-Chat, which supports multi-round conversational interaction, combined audio and text input, creative tasks, and reasoning over audio. It uses multi-task training over more than 30 different audio tasks and achieves strong performance across multiple benchmarks...
    Downloads: 0 This Week
  • 12
    GPT Discord Bot

    Example Discord bot written in Python that uses the completions API

    ...The bot uses the Chat Completions API (defaulting to gpt-3.5-turbo) to carry out conversational interactions and the Moderations API to filter user messages. It is built on top of the discord.py framework and the OpenAI Python library, providing a simple, extensible template for building AI-powered Discord applications. The bot supports a /chat command that spawns a public thread, carries full conversation context across messages, and gracefully closes the thread when context or message limits are reached. Developers can customize system instructions through a config file and modify the model used for responses. ...
    Downloads: 7 This Week
  • 13
    MiniMax-01

    Large-language-model & vision-language-model based on Linear Attention

    ...It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel strategies such as LASP+, varlen ring attention, and Expert Tensor Parallelism, enabling a training context of 1 million tokens and up to 4 million tokens at inference. MiniMax-VL-01 extends this core by adding a 303M-parameter Vision Transformer and a two-layer MLP projector in a ViT–MLP–LLM framework, allowing the model to process images at dynamic resolutions up to 2016×2016.
    Downloads: 0 This Week
  • 14
    Demucs

    Code for the paper Hybrid Spectrogram and Waveform Source Separation

    Demucs (Deep Extractor for Music Sources) is a deep-learning framework for music source separation—extracting individual instrument or vocal tracks from a mixed audio file. The system is based on a U-Net-like convolutional architecture combined with recurrent and transformer elements to capture both short-term and long-term temporal structure. It processes raw waveforms directly rather than spectrograms, allowing for higher-quality reconstruction and fewer artifacts in separated tracks. ...
    Downloads: 58 This Week
  • 15
    GLM-130B

    GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

    ...The model supports efficient inference via INT8 and INT4 quantization, reducing hardware requirements from 8× A100 GPUs to as little as a single server with 4× RTX 3090s. Built on the SwissArmyTransformer (SAT) framework and compatible with DeepSpeed and FasterTransformer, it supports high-speed inference (up to 2.5× faster) and reproducible evaluation across 30+ benchmark tasks.
    Downloads: 3 This Week
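    A rough sketch of the arithmetic behind the quantization claim above, counting weights only. The 24 GiB per RTX 3090 figure and the helper name weight_gib are assumptions made for illustration; they are not taken from the listing or from the project.

```python
# Back-of-the-envelope weight memory for a 130B-parameter model.
# Assumption: weights only; activations, caches and runtime overhead are ignored.
PARAMS = 130e9


def weight_gib(params: float, bits_per_weight: int) -> float:
    """Return the weight footprint in GiB for a given precision."""
    return params * bits_per_weight / 8 / 2**30


for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_gib(PARAMS, bits):.0f} GiB")

# Assumption: 24 GiB of memory per RTX 3090, so four cards give 96 GiB in total.
BUDGET_4X3090_GIB = 4 * 24
fits_int8 = weight_gib(PARAMS, 8) <= BUDGET_4X3090_GIB
fits_int4 = weight_gib(PARAMS, 4) <= BUDGET_4X3090_GIB
```

    Under these assumptions the weights come to roughly 242 GiB at FP16, 121 GiB at INT8, and 61 GiB at INT4, so only the INT4 weights fit within the 96 GiB of four 24 GiB cards.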
  • 16
    Metaseq

    Repo for external large-scale work

    Metaseq is a flexible, high-performance framework for training and serving large-scale sequence models, such as language models, translation systems, and instruction-tuned LLMs. Built on top of PyTorch, it provides distributed training, model sharding, mixed-precision computation, and memory-efficient checkpointing to support models with hundreds of billions of parameters. The framework was used internally at Meta to train models like OPT (Open Pre-trained Transformer) and serves as a reference implementation for scaling transformer architectures efficiently across GPUs and nodes. ...
    Downloads: 0 This Week
  • 17
    ToMe (Token Merging)

    A method to increase the speed and lower the memory footprint

    ToMe (Token Merging) is a PyTorch-based optimization framework designed to significantly accelerate Vision Transformer (ViT) architectures without retraining. Developed by researchers at Facebook (Meta AI), ToMe introduces an efficient technique that merges similar tokens within transformer layers, reducing redundant computation while preserving model accuracy. This approach differs from token pruning, which removes background tokens entirely; instead, ToMe merges tokens based on feature similarity, allowing it to compress both foreground and background information efficiently. ...
    Downloads: 0 This Week
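    The idea of merging tokens by feature similarity can be sketched in plain Python. This is a simplified illustration, not the repository's API: the function names are hypothetical, tokens are split into two alternating sets, and each merge is a plain running mean of the paired vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_tokens(tokens, r):
    """Merge the r most similar (A, B) token pairs by averaging (hypothetical helper).

    tokens: list of equal-length feature vectors.
    Returns a list that is shorter by min(r, len(A)) tokens; order is not preserved.
    """
    a_set, b_set = tokens[0::2], tokens[1::2]  # alternate tokens into two sets
    if not b_set:
        return list(tokens)
    # For every token in A, find its most similar partner in B.
    best = []
    for i, a in enumerate(a_set):
        j = max(range(len(b_set)), key=lambda k: cosine(a, b_set[k]))
        best.append((cosine(a, b_set[j]), i, j))
    best.sort(reverse=True)  # most similar pairs first
    to_merge = best[:r]
    merged_a = {i for _, i, _ in to_merge}
    # Fold each selected A token into its B partner as a running mean.
    sums = [list(b) for b in b_set]
    counts = [1] * len(b_set)
    for _, i, j in to_merge:
        sums[j] = [s + x for s, x in zip(sums[j], a_set[i])]
        counts[j] += 1
    new_b = [[s / c for s in vec] for vec, c in zip(sums, counts)]
    kept_a = [a for i, a in enumerate(a_set) if i not in merged_a]
    return kept_a + new_b


tokens = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 0.9]]
merged = merge_tokens(tokens, r=1)
```

    Unlike pruning, no token is discarded here: the merged token carries the average of both inputs, which is the property the description above emphasizes.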
  • 18
    GPT-NeoX

    Implementation of model parallel autoregressive transformers on GPUs

    This repository records EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training. For those looking for a TPU-centric codebase, we recommend Mesh Transformer JAX. ...
    Downloads: 2 This Week
  • 19
    ConvNeXt V2

    Code release for ConvNeXt V2 model

    ConvNeXt V2 is an evolution of the ConvNeXt architecture that co-designs convolutional networks alongside self-supervised learning. The V2 version introduces a fully convolutional masked autoencoder (FCMAE) framework where parts of the image are masked and the network reconstructs the missing content, marrying convolutional inductive bias with powerful pretraining. A key innovation is a new Global Response Normalization (GRN) layer added to the ConvNeXt backbone, which enhances feature competition across channels. The result is a convnet that competes strongly with transformer architectures on recognition benchmarks while being efficient and hardware-friendly. ...
    Downloads: 2 This Week
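    A minimal sketch of how a Global Response Normalization step can be written, assuming the usual three-part formulation (per-channel global L2 norm, division by the mean norm across channels, then a learnable scale and shift with a residual connection). The function name, the nested-list layout, and the eps value are assumptions for illustration, not the project's code.

```python
import math


def grn(x, gamma, beta, eps=1e-6):
    """Apply GRN to an H x W x C feature map stored as nested lists (sketch)."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    # 1) Global aggregation: L2 norm of each channel over all spatial positions.
    g = [
        math.sqrt(sum(x[i][j][k] ** 2 for i in range(h) for j in range(w)))
        for k in range(c)
    ]
    # 2) Divisive normalization: each channel's norm relative to the channel mean.
    mean_g = sum(g) / c
    n = [gk / (mean_g + eps) for gk in g]
    # 3) Calibration: learnable scale and shift, plus a residual connection.
    return [
        [
            [gamma[k] * x[i][j][k] * n[k] + beta[k] + x[i][j][k] for k in range(c)]
            for j in range(w)
        ]
        for i in range(h)
    ]


feature_map = [[[3.0, 1.0]]]  # a single pixel with two channels
out = grn(feature_map, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

    Channels whose global response is above the mean are amplified and the others are damped, which is one way to read the "feature competition across channels" mentioned above. With gamma and beta set to zero the step reduces to the identity.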
  • 20
    minGPT

    A minimal PyTorch re-implementation of the OpenAI GPT

    ...Because the whole model is around 300 lines of code, users can follow each step—from embedding lookup, positional encodings, multi-head attention, feed-forward layers, to output heads—and thus demystify how GPT-style models work beneath the surface. It provides a practical sandbox for experimentation, letting learners tweak the architecture, dataset, or training loop without being overwhelmed by framework abstraction.
    Downloads: 2 This Week
  • 21
    MAE (Masked Autoencoders)

    PyTorch implementation of MAE

    MAE (Masked Autoencoders) is a self-supervised learning framework for visual representation learning using masked image modeling. It trains a Vision Transformer (ViT) by randomly masking a high percentage of image patches (typically 75%) and reconstructing the missing content from the remaining visible patches. This forces the model to learn semantic structure and global context without supervision.
    Downloads: 0 This Week
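    The random masking step described above can be sketched as follows. The function name, the seed parameter, and the 196-patch example (a 14 x 14 grid) are assumptions for illustration; only the 75% ratio comes from the description.

```python
import random


def random_masking(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into a visible set and a masked set (sketch)."""
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation of patch indices
    keep = sorted(order[:num_keep])    # patches the encoder would see
    masked = sorted(order[num_keep:])  # patches to be reconstructed
    return keep, masked


# Assumed example: 196 patches, of which 75% are masked.
keep, masked = random_masking(196, mask_ratio=0.75, seed=0)
```

    With these numbers 49 patches stay visible and 147 are masked, so an encoder that processes only the visible patches handles a quarter of the input.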
  • 22
    MaskFormer

    Per-Pixel Classification is Not All You Need for Semantic Segmentation

    MaskFormer is a unified framework for image segmentation developed by Facebook Research, designed to bridge the gap between semantic, instance, and panoptic segmentation within a single architecture. Unlike traditional segmentation pipelines that treat these tasks separately, MaskFormer reformulates segmentation as a mask classification problem, enabling a consistent and efficient approach across multiple segmentation domains.
    Downloads: 2 This Week
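    To make "mask classification" concrete, here is a minimal sketch of one way per-pixel semantic labels can be recovered from a set of predicted (class distribution, mask) pairs: each pixel's score for a class is the sum, over all predictions, of class probability times mask probability. The function name and the flattened-pixel layout are assumptions for illustration, not the project's API.

```python
def semantic_inference(class_probs, mask_probs):
    """Combine per-prediction class and mask probabilities into pixel labels.

    class_probs[q][c]: probability of class c for prediction q
                       (assumes any "no object" class has already been dropped).
    mask_probs[q][p]:  mask probability of prediction q at flattened pixel p.
    Returns one class index per pixel.
    """
    num_queries = len(class_probs)
    num_classes = len(class_probs[0])
    num_pixels = len(mask_probs[0])
    labels = []
    for p in range(num_pixels):
        # Score of class c at pixel p: sum over predictions of p(class) * p(mask).
        scores = [
            sum(class_probs[q][c] * mask_probs[q][p] for q in range(num_queries))
            for c in range(num_classes)
        ]
        labels.append(max(range(num_classes), key=scores.__getitem__))
    return labels


class_probs = [[0.9, 0.1], [0.2, 0.8]]
mask_probs = [[0.9, 0.8, 0.1], [0.1, 0.2, 0.9]]
labels = semantic_inference(class_probs, mask_probs)
```

    Because every prediction is just a mask with a class label, the same set of outputs can in principle be read out as semantic, instance, or panoptic results, which is the unification the description refers to.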
  • 23
    Multi-Agent Emergence Environments

    Environment generation code for the paper "Emergent Tool Use"

    ...Developers can construct custom environments by combining modular components such as Boxes, Ramps, and RandomWalls using a flexible layering approach that reduces code duplication. The framework includes several predefined environments—such as Hide and Seek, Box Locking, Blueprint Construction, and Shelter Construction—that model distinct problem-solving and collaboration scenarios.
    Downloads: 1 This Week
  • 24
    PyTorch GAN Zoo

    A mix of GAN implementations including progressive growing

    ...In addition to core GAN training, the repository includes tools for model evaluation, such as Inception Score and SWD metrics, as well as advanced features like GDPP for diverse generation and AC-GAN conditioning for class-specific synthesis. The framework also supports “inspirational generation,” enabling style or content transfer from reference images through pre-trained models.
    Downloads: 0 This Week
  • 25
    DeepSDF

    Learning Continuous Signed Distance Functions for Shape Representation

    DeepSDF is a deep learning framework for continuous 3D shape representation using Signed Distance Functions (SDFs), as presented in the CVPR 2019 paper DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation by Park et al. The framework learns a continuous implicit function that maps 3D coordinates to their corresponding signed distances from object surfaces, allowing compact, high-fidelity shape modeling.
    Downloads: 1 This Week
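    For readers unfamiliar with signed distance functions, a closed-form example may help. The sketch below is not DeepSDF itself: it uses the analytic SDF of a sphere to show the kind of mapping (3D coordinate to signed distance) that the framework learns with a neural network. The function name and defaults are illustrative.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from a 3D point to a sphere's surface.

    Negative inside the shape, zero on the surface, positive outside.
    """
    return math.dist(point, center) - radius


inside = sphere_sdf((0.0, 0.0, 0.0))
on_surface = sphere_sdf((1.0, 0.0, 0.0))
outside = sphere_sdf((2.0, 0.0, 0.0))
```

    The surface is the set of points where the function is zero, so a learned function of this form represents a shape continuously rather than as a fixed-resolution grid or mesh.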