Tiny vision language model
AlphaFold 3 inference pipeline
The most powerful local music generation model
General-purpose image editing model that delivers high-fidelity
Python inference and LoRA trainer package for the LTX-2 audio–video
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Advanced language and coding AI model
Official inference repo for FLUX.2 models
A Family of Open Sourced Music Foundation Models
Official implementation of Watermark Anything with Localized Messages
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Chat & pretrained large audio language model proposed by Alibaba Cloud
Tooling for the Common Objects In 3D dataset
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Uncommon Objects in 3D dataset
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Foundation model for image generation
A SOTA open-source image editing model
Open-source framework for intelligent speech interaction
LLM-based Reinforcement Learning audio edit model
High-Resolution Image Synthesis with Latent Diffusion Models
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Let us control diffusion models
A method to increase the speed and lower the memory footprint
Per-Pixel Classification is Not All You Need for Semantic Segmentation